| Main Authors: | Xu, Qin; Zhu, Lili; Cheng, Xiaoxia; Jiang, Bo |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2508.06959 |
Similar Items
Hard to See, Hard to Label: Generative and Symbolic Acquisition for Subtle Visual Phenomena
by: Prasad, Renjith, et al.
Published: (2026)
Saccadic Vision for Fine-Grained Visual Classification
by: Schmidt, Johann, et al.
Published: (2025)
Beyond Text-Visual Attention: Exploiting Visual Cues for Effective Token Pruning in VLMs
by: Zhang, Qizhe, et al.
Published: (2024)
SpatialReward: Verifiable Spatial Reward Modeling for Fine-Grained Spatial Consistency in Text-to-Image Generation
by: Zhou, Sashuai, et al.
Published: (2026)
Context-Semantic Quality Awareness Network for Fine-Grained Visual Categorization
by: Xu, Qin, et al.
Published: (2024)
SASP: Strip-Aware Spatial Perception for Fine-Grained Bird Image Classification
by: Wang, Zheng
Published: (2025)
See Further, Think Deeper: Advancing VLM's Reasoning Ability with Low-level Visual Cues and Reflection
by: Wu, Zhiheng, et al.
Published: (2026)
On the Reliability of Cue Conflict and Beyond
by: Kim, Pum Jun, et al.
Published: (2026)
See Further for Parameter Efficient Fine-tuning by Standing on the Shoulders of Decomposition
by: Si, Chongjie, et al.
Published: (2024)
Seeing Beyond Frames: Zero-Shot Pedestrian Intention Prediction with Raw Temporal Video and Multimodal Cues
by: Zambare, Pallavi, et al.
Published: (2025)
Enhancing Fine-Grained Visual Recognition in the Low-Data Regime Through Feature Magnitude Regularization
by: Chapman, Avraham, et al.
Published: (2024)
Res-VMamba: Fine-Grained Food Category Visual Classification Using Selective State Space Models with Deep Residual Learning
by: Chen, Chi-Sheng, et al.
Published: (2024)
ProtoVQA: An Adaptable Prototypical Framework for Explainable Fine-Grained Visual Question Answering
by: Diao, Xingjian, et al.
Published: (2025)
FILA: Fine-Grained Vision Language Models
by: Zhu, Shiding, et al.
Published: (2024)
H3Former: Hypergraph-based Semantic-Aware Aggregation via Hyperbolic Hierarchical Contrastive Loss for Fine-Grained Visual Classification
by: Zhang, Yongji, et al.
Published: (2025)
Fine-Grained ImageNet Classification in the Wild
by: Lymperaiou, Maria, et al.
Published: (2023)
MEET: A Million-Scale Dataset for Fine-Grained Geospatial Scene Classification with Zoom-Free Remote Sensing Imagery
by: Li, Yansheng, et al.
Published: (2025)
PixelSmile: Toward Fine-Grained Facial Expression Editing
by: Hua, Jiabin, et al.
Published: (2026)
Beyond Spatial Frequency: Pixel-wise Temporal Frequency-based Deepfake Video Detection
by: Kim, Taehoon, et al.
Published: (2025)
Helping CLIP See Both the Forest and the Trees: A Decomposition and Description Approach
by: Xue, Leyan, et al.
Published: (2025)
Beyond Visual Cues: Synchronously Exploring Target-Centric Semantics for Vision-Language Tracking
by: Ge, Jiawei, et al.
Published: (2023)
From Coarse to Nuanced: Cross-Modal Alignment of Fine-Grained Linguistic Cues and Visual Salient Regions for Dynamic Emotion Recognition
by: Liu, Yu, et al.
Published: (2025)
FG-CLIP: Fine-Grained Visual and Textual Alignment
by: Xie, Chunyu, et al.
Published: (2025)
Salient Mask-Guided Vision Transformer for Fine-Grained Classification
by: Demidov, Dmitry, et al.
Published: (2023)
PAND: Prompt-Aware Neighborhood Distillation for Lightweight Fine-Grained Visual Classification
by: Luo, Qiuming, et al.
Published: (2026)
Fake-in-Facext: Towards Fine-Grained Explainable DeepFake Analysis
by: Qin, Lixiong, et al.
Published: (2025)
Seeing Through the Brain: New Insights from Decoding Visual Stimuli with fMRI
by: Huang, Zheng, et al.
Published: (2025)
LensWalk: Agentic Video Understanding by Planning How You See in Videos
by: Li, Keliang, et al.
Published: (2026)
Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens
by: Qin, Yiming, et al.
Published: (2025)
See Before You Code: Learning Visual Priors for Spatially Aware Educational Animation Generation
by: Li, Yuejia, et al.
Published: (2026)
Fine-Grained Knowledge Structuring and Retrieval for Visual Question Answering
by: Zhang, Zhengxuan, et al.
Published: (2025)
Feature-Enhanced TResNet for Fine-Grained Food Image Classification
by: Liu, Lulu, et al.
Published: (2025)
A Simple Interpretable Transformer for Fine-Grained Image Classification and Analysis
by: Paul, Dipanjyoti, et al.
Published: (2023)
Analyzing Fine-Grained Alignment and Enhancing Vision Understanding in Multimodal Language Models
by: Jiang, Jiachen, et al.
Published: (2025)
Fourier Compressor: Frequency-Domain Visual Token Compression for Vision-Language Models
by: Wang, Huanyu, et al.
Published: (2025)
MMDocBench: Benchmarking Large Vision-Language Models for Fine-Grained Visual Document Understanding
by: Zhu, Fengbin, et al.
Published: (2024)
VLM-SubtleBench: How Far Are VLMs from Human-Level Subtle Comparative Reasoning?
by: Kim, Minkyu, et al.
Published: (2026)
Decoupling Amplitude and Phase Attention in Frequency Domain for RGB-Event based Visual Object Tracking
by: Wang, Shiao, et al.
Published: (2026)
Show, Tell and Summarize: Dense Video Captioning Using Visual Cue Aided Sentence Summarization
by: Zhang, Zhiwang, et al.
Published: (2025)
ProtoQuant: Quantization of Prototypical Parts For General and Fine-Grained Image Classification
by: Janusz, Mikołaj, et al.
Published: (2026)