Library Catalog

Bibliographic Details
Main Authors:	Alzahrani, Reem; Alshanqiti, Hassan; Hemid, Bushra Bin; Alyafeai, Zaid; Eldesokey, Abdelrahman; Ghanem, Bernard
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2605.17826

Similar Items

Mind-the-Glitch: Visual Correspondence for Detecting Inconsistencies in Subject-Driven Generation
by: Eldesokey, Abdelrahman, et al.
Published: (2025)

NearID: Identity Representation Learning via Near-identity Distractors
by: Cvejic, Aleksandar, et al.
Published: (2026)

SoccerLens: Grounded Soccer Video Understanding Beyond Accuracy
by: Elsharkawi, Ismael, et al.
Published: (2026)

LatentMan: Generating Consistent Animated Characters using Image Diffusion Models
by: Eldesokey, Abdelrahman, et al.
Published: (2023)

Build-A-Scene: Interactive 3D Layout Control for Diffusion-Based Image Generation
by: Eldesokey, Abdelrahman, et al.
Published: (2024)

Skill-Aligned Annotation for Reliable Evaluation in Text-to-Image Generation
by: Eldesokey, Abdelrahman, et al.
Published: (2026)

PartEdit: Fine-Grained Image Editing using Pre-Trained Diffusion Models
by: Cvejic, Aleksandar, et al.
Published: (2025)

EditCLIP: Representation Learning for Image Editing
by: Wang, Qian, et al.
Published: (2025)

LineCounter: Learning Handwritten Text Line Segmentation by Counting
by: Li, Deng, et al.
Published: (2021)

Out-of-Distribution Segmentation via Wasserstein-Based Evidential Uncertainty
by: Brosch, Arnold, et al.
Published: (2025)

GroundCount: Grounding Vision-Language Models with Object Detection for Mitigating Counting Hallucinations
by: Chen, Boyuan, et al.
Published: (2026)

BOLT: Boost Large Vision-Language Model Without Training for Long-form Video Understanding
by: Liu, Shuming, et al.
Published: (2025)

ZeroKey: Point-Level Reasoning and Zero-Shot 3D Keypoint Detection from Large Language Models
by: Gong, Bingchen, et al.
Published: (2024)

MultiCounter: Multiple Action Agnostic Repetition Counting in Untrimmed Videos
by: Tang, Yin, et al.
Published: (2024)

Understanding Counting Mechanisms in Large Language and Vision-Language Models
by: Hasani, Hosein, et al.
Published: (2025)

Your Vision-Language Model Can't Even Count to 20: Exposing the Failures of VLMs in Compositional Counting
by: Guo, Xuyang, et al.
Published: (2025)

$β$-CLIP: Text-Conditioned Contrastive Learning for Multi-Granular Vision-Language Alignment
by: Zohra, Fatimah, et al.
Published: (2025)

Point, Segment and Count: A Generalized Framework for Object Counting
by: Huang, Zhizhong, et al.
Published: (2023)

Counting Through Occlusion: Framework for Open World Amodal Counting
by: Arib, Safaeid Hossain, et al.
Published: (2025)

Unveiling the Visual Counting Bottleneck in Vision-Language Models
by: Pang, Xingzhou, et al.
Published: (2026)

SAVeS: Steering Safety Judgments in Vision-Language Models via Semantic Cues
by: Hinojosa, Carlos, et al.
Published: (2026)

Zero-Shot Video Semantic Segmentation based on Pre-Trained Diffusion Models
by: Wang, Qian, et al.
Published: (2024)

CountLLM: Towards Generalizable Repetitive Action Counting via Large Language Model
by: Yao, Ziyu, et al.
Published: (2025)

mmCounter: Static People Counting in Dense Indoor Scenarios Using mmWave Radar
by: Toha, Tarik Reza, et al.
Published: (2025)

LVLM-COUNT: Enhancing the Counting Ability of Large Vision-Language Models
by: Qharabagh, Muhammad Fetrat, et al.
Published: (2024)

CountSteer: Steering Attention for Object Counting in Diffusion Models
by: Boo, Hyemin, et al.
Published: (2025)

FocalCount: Towards Class-Count Imbalance in Class-Agnostic Counting
by: Zhu, Huilin, et al.
Published: (2025)

Querying Autonomous Vehicle Point Clouds: Enhanced by 3D Object Counting with CounterNet
by: Zhang, Xiaoyu, et al.
Published: (2025)

Can Vision-Language Models Count? A Synthetic Benchmark and Analysis of Attention-Based Interventions
by: Sengupta, Saurav, et al.
Published: (2025)

Counting Circuits: Mechanistic Interpretability of Visual Reasoning in Large Vision-Language Models
by: Che, Liwei, et al.
Published: (2026)

Assessing the Visual Enumeration Abilities of Specialized Counting Architectures and Vision-Language Models
by: Hou, Kuinan, et al.
Published: (2025)

AvatarMMC: 3D Head Avatar Generation and Editing with Multi-Modal Conditioning
by: Para, Wamiq Reyaz, et al.
Published: (2024)

CountFormer: Multi-View Crowd Counting Transformer
by: Mo, Hong, et al.
Published: (2024)

Rethinking Cell Counting Methods: Decoupling Counting and Localization
by: Zheng, Zixuan, et al.
Published: (2025)

CountGD: Multi-Modal Open-World Counting
by: Amini-Naieni, Niki, et al.
Published: (2024)

CountGD++: Generalized Prompting for Open-World Counting
by: Amini-Naieni, Niki, et al.
Published: (2025)

Counting Hallucinations in Diffusion Models
by: Fu, Shuai, et al.
Published: (2025)

Consistency and Uncertainty: Identifying Unreliable Responses From Black-Box Vision-Language Models for Selective Visual Question Answering
by: Khan, Zaid, et al.
Published: (2024)

Beam-Guided Knowledge Replay for Knowledge-Rich Image Captioning using Vision-Language Model
by: AlJunaid, Reem, et al.
Published: (2025)

CountMamba: Exploring Multi-directional Selective State-Space Models for Plant Counting
by: He, Hulingxiao, et al.
Published: (2024)