:: Library Catalog

Bibliographic Details
Main Authors:	Xu, Qin; Zhu, Lili; Cheng, Xiaoxia; Jiang, Bo
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2508.06959

Similar Items

Hard to See, Hard to Label: Generative and Symbolic Acquisition for Subtle Visual Phenomena
by: Prasad, Renjith, et al.
Published: (2026)

Saccadic Vision for Fine-Grained Visual Classification
by: Schmidt, Johann, et al.
Published: (2025)

Beyond Text-Visual Attention: Exploiting Visual Cues for Effective Token Pruning in VLMs
by: Zhang, Qizhe, et al.
Published: (2024)

SpatialReward: Verifiable Spatial Reward Modeling for Fine-Grained Spatial Consistency in Text-to-Image Generation
by: Zhou, Sashuai, et al.
Published: (2026)

Context-Semantic Quality Awareness Network for Fine-Grained Visual Categorization
by: Xu, Qin, et al.
Published: (2024)

SASP: Strip-Aware Spatial Perception for Fine-Grained Bird Image Classification
by: Wang, Zheng
Published: (2025)

See Further, Think Deeper: Advancing VLM's Reasoning Ability with Low-level Visual Cues and Reflection
by: Wu, Zhiheng, et al.
Published: (2026)

On the Reliability of Cue Conflict and Beyond
by: Kim, Pum Jun, et al.
Published: (2026)

See Further for Parameter Efficient Fine-tuning by Standing on the Shoulders of Decomposition
by: Si, Chongjie, et al.
Published: (2024)

Seeing Beyond Frames: Zero-Shot Pedestrian Intention Prediction with Raw Temporal Video and Multimodal Cues
by: Zambare, Pallavi, et al.
Published: (2025)

Enhancing Fine-Grained Visual Recognition in the Low-Data Regime Through Feature Magnitude Regularization
by: Chapman, Avraham, et al.
Published: (2024)

Res-VMamba: Fine-Grained Food Category Visual Classification Using Selective State Space Models with Deep Residual Learning
by: Chen, Chi-Sheng, et al.
Published: (2024)

ProtoVQA: An Adaptable Prototypical Framework for Explainable Fine-Grained Visual Question Answering
by: Diao, Xingjian, et al.
Published: (2025)

FILA: Fine-Grained Vision Language Models
by: Zhu, Shiding, et al.
Published: (2024)

H3Former: Hypergraph-based Semantic-Aware Aggregation via Hyperbolic Hierarchical Contrastive Loss for Fine-Grained Visual Classification
by: Zhang, Yongji, et al.
Published: (2025)

Fine-Grained ImageNet Classification in the Wild
by: Lymperaiou, Maria, et al.
Published: (2023)

MEET: A Million-Scale Dataset for Fine-Grained Geospatial Scene Classification with Zoom-Free Remote Sensing Imagery
by: Li, Yansheng, et al.
Published: (2025)

PixelSmile: Toward Fine-Grained Facial Expression Editing
by: Hua, Jiabin, et al.
Published: (2026)

Beyond Spatial Frequency: Pixel-wise Temporal Frequency-based Deepfake Video Detection
by: Kim, Taehoon, et al.
Published: (2025)

Helping CLIP See Both the Forest and the Trees: A Decomposition and Description Approach
by: Xue, Leyan, et al.
Published: (2025)

Beyond Visual Cues: Synchronously Exploring Target-Centric Semantics for Vision-Language Tracking
by: Ge, Jiawei, et al.
Published: (2023)

From Coarse to Nuanced: Cross-Modal Alignment of Fine-Grained Linguistic Cues and Visual Salient Regions for Dynamic Emotion Recognition
by: Liu, Yu, et al.
Published: (2025)

FG-CLIP: Fine-Grained Visual and Textual Alignment
by: Xie, Chunyu, et al.
Published: (2025)

Salient Mask-Guided Vision Transformer for Fine-Grained Classification
by: Demidov, Dmitry, et al.
Published: (2023)

PAND: Prompt-Aware Neighborhood Distillation for Lightweight Fine-Grained Visual Classification
by: Luo, Qiuming, et al.
Published: (2026)

Fake-in-Facext: Towards Fine-Grained Explainable DeepFake Analysis
by: Qin, Lixiong, et al.
Published: (2025)

Seeing Through the Brain: New Insights from Decoding Visual Stimuli with fMRI
by: Huang, Zheng, et al.
Published: (2025)

LensWalk: Agentic Video Understanding by Planning How You See in Videos
by: Li, Keliang, et al.
Published: (2026)

Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens
by: Qin, Yiming, et al.
Published: (2025)

See Before You Code: Learning Visual Priors for Spatially Aware Educational Animation Generation
by: Li, Yuejia, et al.
Published: (2026)

Fine-Grained Knowledge Structuring and Retrieval for Visual Question Answering
by: Zhang, Zhengxuan, et al.
Published: (2025)

Feature-Enhanced TResNet for Fine-Grained Food Image Classification
by: Liu, Lulu, et al.
Published: (2025)

A Simple Interpretable Transformer for Fine-Grained Image Classification and Analysis
by: Paul, Dipanjyoti, et al.
Published: (2023)

Analyzing Fine-Grained Alignment and Enhancing Vision Understanding in Multimodal Language Models
by: Jiang, Jiachen, et al.
Published: (2025)

Fourier Compressor: Frequency-Domain Visual Token Compression for Vision-Language Models
by: Wang, Huanyu, et al.
Published: (2025)

MMDocBench: Benchmarking Large Vision-Language Models for Fine-Grained Visual Document Understanding
by: Zhu, Fengbin, et al.
Published: (2024)

VLM-SubtleBench: How Far Are VLMs from Human-Level Subtle Comparative Reasoning?
by: Kim, Minkyu, et al.
Published: (2026)

Decoupling Amplitude and Phase Attention in Frequency Domain for RGB-Event based Visual Object Tracking
by: Wang, Shiao, et al.
Published: (2026)

Show, Tell and Summarize: Dense Video Captioning Using Visual Cue Aided Sentence Summarization
by: Zhang, Zhiwang, et al.
Published: (2025)

ProtoQuant: Quantization of Prototypical Parts For General and Fine-Grained Image Classification
by: Janusz, Mikołaj, et al.
Published: (2026)