| Main Authors: | Tang, Jingyi, Jiang, Shuai, Su, Fei, Zhao, Zhicheng |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2603.07077 |
Similar Items
MindShot: A Few-Shot Brain Decoding Framework via Transferring Cross-Subject Prior and Distilling Frequency Domain Knowledge
by: Jiang, Shuai, et al.
Published: (2024)
Aligning What Vision-Language Models See and Perceive with Adaptive Information Flow
by: Liu, Chengxin, et al.
Published: (2026)
See What I Mean: Aligning Vision and Language Representations for Video Fine-grained Object Understanding
by: Sun, Boyuan, et al.
Published: (2026)
FrameOracle: Learning What to See and How Much to See in Videos
by: Li, Chaoyu, et al.
Published: (2025)
ViBE: Visual-to-M/EEG Brain Encoding via Spatio-Temporal VAE and Distribution-Aligned Projection
by: Xu, Ganxi, et al.
Published: (2026)
Hierarchical IoU Tracking based on Interval
by: Du, Yunhao, et al.
Published: (2024)
YYDS: Visible-Infrared Person Re-Identification with Coarse Descriptions
by: Du, Yunhao, et al.
Published: (2024)
Vision-Language Models Can't See the Obvious
by: Dahou, Yasser, et al.
Published: (2025)
Learning More by Seeing Less: Structure First Learning for Efficient, Transferable, and Human-Aligned Vision
by: Li, Tianqin, et al.
Published: (2025)
Filter or Compensate: Towards Invariant Representation from Distribution Shift for Anomaly Detection
by: Chen, Zining, et al.
Published: (2024)
Can Graphs Help Vision SSMs See Better?
by: Parikh, Dhruv, et al.
Published: (2026)
Can You Learn to See Without Images? Procedural Warm-Up for Vision Transformers
by: Shinnick, Zachary, et al.
Published: (2025)
Modality and Task Adaptation for Enhanced Zero-shot Composed Image Retrieval
by: Li, Haiwen, et al.
Published: (2024)
iKUN: Speak to Trackers without Retraining
by: Du, Yunhao, et al.
Published: (2023)
Bring Adaptive Binding Prototypes to Generalized Referring Expression Segmentation
by: Li, Weize, et al.
Published: (2024)
All-in-One: Transferring Vision Foundation Models into Stereo Matching
by: Zhou, Jingyi, et al.
Published: (2024)
How Well Can Vision Language Models See Image Details?
by: Gou, Chenhui, et al.
Published: (2024)
PracticalDG: Perturbation Distillation on Vision-Language Models for Hybrid Domain Generalization
by: Chen, Zining, et al.
Published: (2024)
ViEEG: Hierarchical Visual Neural Representation for EEG Brain Decoding
by: Liu, Minxu, et al.
Published: (2025)
Can Current AI Models Count What We Mean, Not What They See? A Benchmark and Systematic Evaluation
by: Nguyen, Gia Khanh, et al.
Published: (2025)
Mind the Gap: Aligning Vision Foundation Models to Image Feature Matching
by: Liu, Yuhan, et al.
Published: (2025)
Time Blindness: Why Video-Language Models Can't See What Humans Can?
by: Upadhyay, Ujjwal, et al.
Published: (2025)
Towards Neural Foundation Models for Vision: Aligning EEG, MEG, and fMRI Representations for Decoding, Encoding, and Modality Conversion
by: Ferrante, Matteo, et al.
Published: (2024)
Automatic Synthesis of High-Quality Triplet Data for Composed Image Retrieval
by: Li, Haiwen, et al.
Published: (2025)
Data-Efficient Generalization for Zero-shot Composed Image Retrieval
by: Chen, Zining, et al.
Published: (2025)
Rethinking Two-Stage Referring-by-Tracking in Referring Multi-Object Tracking: Make it Strong Again
by: Li, Weize, et al.
Published: (2025)
Can Retrieval Heads See Images? Multimodal Retrieval Heads in Long-Context Vision-Language Models
by: Li, Aaron Branson Cigres, et al.
Published: (2026)
See What Matters: Differentiable Grid Sample Pruning for Generalizable Vision-Language-Action Model
by: Feng, Yixu, et al.
Published: (2026)
Learning What Helps: Task-Aligned Context Selection for Vision Tasks
by: Guo, Jingyu, et al.
Published: (2025)
CHRep: Cross-modal Histology Representation and Post-hoc Calibration for Spatial Gene Expression Prediction
by: Wang, Changfan, et al.
Published: (2026)
What Do You See in Vehicle? Comprehensive Vision Solution for In-Vehicle Gaze Estimation
by: Cheng, Yihua, et al.
Published: (2024)
Brain3D: EEG-to-3D Decoding of Visual Representations via Multimodal Reasoning
by: Balloni, Emanuele, et al.
Published: (2026)
Achieving More Human Brain-Like Vision via Human EEG Representational Alignment
by: Lu, Zitong, et al.
Published: (2024)
"See the World, Discover Knowledge": A Chinese Factuality Evaluation for Large Vision Language Models
by: Gu, Jihao, et al.
Published: (2025)
Conv-INR: Convolutional Implicit Neural Representation for Multimodal Visual Signals
by: Cai, Zhicheng
Published: (2024)
Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations
by: Han, Jiaming, et al.
Published: (2025)
Brain-CLIPLM: Decoding Compressed Semantic Representations in EEG for Language Reconstruction
by: Yang, Xiaoli, et al.
Published: (2026)
Eye-See-You: Reverse Pass-Through VR and Head Avatars
by: Dash, Ankan, et al.
Published: (2025)
Boundary-Refined Prototype Generation: A General End-to-End Paradigm for Semi-Supervised Semantic Segmentation
by: Dong, Junhao, et al.
Published: (2023)
VLIC: Vision-Language Models As Perceptual Judges for Human-Aligned Image Compression
by: Sargent, Kyle, et al.
Published: (2025)