| Main Authors: | Alzahrani, Reem; Alshanqiti, Hassan; Hemid, Bushra Bin; Alyafeai, Zaid; Eldesokey, Abdelrahman; Ghanem, Bernard |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.17826 |
Similar Items
Mind-the-Glitch: Visual Correspondence for Detecting Inconsistencies in Subject-Driven Generation
by: Eldesokey, Abdelrahman, et al.
Published: (2025)
NearID: Identity Representation Learning via Near-identity Distractors
by: Cvejic, Aleksandar, et al.
Published: (2026)
SoccerLens: Grounded Soccer Video Understanding Beyond Accuracy
by: Elsharkawi, Ismael, et al.
Published: (2026)
LatentMan: Generating Consistent Animated Characters using Image Diffusion Models
by: Eldesokey, Abdelrahman, et al.
Published: (2023)
Build-A-Scene: Interactive 3D Layout Control for Diffusion-Based Image Generation
by: Eldesokey, Abdelrahman, et al.
Published: (2024)
Skill-Aligned Annotation for Reliable Evaluation in Text-to-Image Generation
by: Eldesokey, Abdelrahman, et al.
Published: (2026)
PartEdit: Fine-Grained Image Editing using Pre-Trained Diffusion Models
by: Cvejic, Aleksandar, et al.
Published: (2025)
EditCLIP: Representation Learning for Image Editing
by: Wang, Qian, et al.
Published: (2025)
LineCounter: Learning Handwritten Text Line Segmentation by Counting
by: Li, Deng, et al.
Published: (2021)
Out-of-Distribution Segmentation via Wasserstein-Based Evidential Uncertainty
by: Brosch, Arnold, et al.
Published: (2025)
GroundCount: Grounding Vision-Language Models with Object Detection for Mitigating Counting Hallucinations
by: Chen, Boyuan, et al.
Published: (2026)
BOLT: Boost Large Vision-Language Model Without Training for Long-form Video Understanding
by: Liu, Shuming, et al.
Published: (2025)
ZeroKey: Point-Level Reasoning and Zero-Shot 3D Keypoint Detection from Large Language Models
by: Gong, Bingchen, et al.
Published: (2024)
MultiCounter: Multiple Action Agnostic Repetition Counting in Untrimmed Videos
by: Tang, Yin, et al.
Published: (2024)
Understanding Counting Mechanisms in Large Language and Vision-Language Models
by: Hasani, Hosein, et al.
Published: (2025)
Your Vision-Language Model Can't Even Count to 20: Exposing the Failures of VLMs in Compositional Counting
by: Guo, Xuyang, et al.
Published: (2025)
$β$-CLIP: Text-Conditioned Contrastive Learning for Multi-Granular Vision-Language Alignment
by: Zohra, Fatimah, et al.
Published: (2025)
Point, Segment and Count: A Generalized Framework for Object Counting
by: Huang, Zhizhong, et al.
Published: (2023)
Counting Through Occlusion: Framework for Open World Amodal Counting
by: Arib, Safaeid Hossain, et al.
Published: (2025)
Unveiling the Visual Counting Bottleneck in Vision-Language Models
by: Pang, Xingzhou, et al.
Published: (2026)
SAVeS: Steering Safety Judgments in Vision-Language Models via Semantic Cues
by: Hinojosa, Carlos, et al.
Published: (2026)
Zero-Shot Video Semantic Segmentation based on Pre-Trained Diffusion Models
by: Wang, Qian, et al.
Published: (2024)
CountLLM: Towards Generalizable Repetitive Action Counting via Large Language Model
by: Yao, Ziyu, et al.
Published: (2025)
mmCounter: Static People Counting in Dense Indoor Scenarios Using mmWave Radar
by: Toha, Tarik Reza, et al.
Published: (2025)
LVLM-COUNT: Enhancing the Counting Ability of Large Vision-Language Models
by: Qharabagh, Muhammad Fetrat, et al.
Published: (2024)
CountSteer: Steering Attention for Object Counting in Diffusion Models
by: Boo, Hyemin, et al.
Published: (2025)
FocalCount: Towards Class-Count Imbalance in Class-Agnostic Counting
by: Zhu, Huilin, et al.
Published: (2025)
Querying Autonomous Vehicle Point Clouds: Enhanced by 3D Object Counting with CounterNet
by: Zhang, Xiaoyu, et al.
Published: (2025)
Can Vision-Language Models Count? A Synthetic Benchmark and Analysis of Attention-Based Interventions
by: Sengupta, Saurav, et al.
Published: (2025)
Counting Circuits: Mechanistic Interpretability of Visual Reasoning in Large Vision-Language Models
by: Che, Liwei, et al.
Published: (2026)
Assessing the Visual Enumeration Abilities of Specialized Counting Architectures and Vision-Language Models
by: Hou, Kuinan, et al.
Published: (2025)
AvatarMMC: 3D Head Avatar Generation and Editing with Multi-Modal Conditioning
by: Para, Wamiq Reyaz, et al.
Published: (2024)
CountFormer: Multi-View Crowd Counting Transformer
by: Mo, Hong, et al.
Published: (2024)
Rethinking Cell Counting Methods: Decoupling Counting and Localization
by: Zheng, Zixuan, et al.
Published: (2025)
CountGD: Multi-Modal Open-World Counting
by: Amini-Naieni, Niki, et al.
Published: (2024)
CountGD++: Generalized Prompting for Open-World Counting
by: Amini-Naieni, Niki, et al.
Published: (2025)
Counting Hallucinations in Diffusion Models
by: Fu, Shuai, et al.
Published: (2025)
Consistency and Uncertainty: Identifying Unreliable Responses From Black-Box Vision-Language Models for Selective Visual Question Answering
by: Khan, Zaid, et al.
Published: (2024)
Beam-Guided Knowledge Replay for Knowledge-Rich Image Captioning using Vision-Language Model
by: AlJunaid, Reem, et al.
Published: (2025)
CountMamba: Exploring Multi-directional Selective State-Space Models for Plant Counting
by: He, Hulingxiao, et al.
Published: (2024)