| Main Authors: | Ha, Hyeonjeong, Ge, Jinjin, Feng, Bo, Ma, Kaixin, Chakraborty, Gargi |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2601.01095 |
Similar Items
MARVEL: Multidimensional Abstraction and Reasoning through Visual Evaluation and Learning
by: Jiang, Yifan, et al.
Published: (2024)
Entity-Centric Reinforcement Learning for Object Manipulation from Pixels
by: Haramati, Dan, et al.
Published: (2024)
Constructive Distortion: Improving MLLMs with Attention-Guided Image Warping
by: Dalal, Dwip, et al.
Published: (2025)
SPoRC-VIST: A Benchmark for Evaluating Generative Natural Narrative in Vision-Language Models
by: Zeng, Yunlin
Published: (2026)
ReasoningTrack: Chain-of-Thought Reasoning for Long-term Vision-Language Tracking
by: Wang, Xiao, et al.
Published: (2025)
HoneyBee: Data Recipes for Vision-Language Reasoners
by: Bansal, Hritik, et al.
Published: (2025)
Evaluating Object-Centric Models beyond Object Discovery
by: Singh, Krishnakant, et al.
Published: (2026)
NarrLV: Towards a Comprehensive Narrative-Centric Evaluation for Long Video Generation
by: Feng, X., et al.
Published: (2025)
Reasoning-Enhanced Object-Centric Learning for Videos
by: Li, Jian, et al.
Published: (2024)
EchoAgent: Guideline-Centric Reasoning Agent for Echocardiography Measurement and Interpretation
by: Daghyani, Matin, et al.
Published: (2025)
Towards Sparse Video Understanding and Reasoning
by: Xu, Chenwei, et al.
Published: (2026)
Oh-A-DINO: Understanding and Enhancing Attribute-Level Information in Self-Supervised Object-Centric Representations
by: Wagner, Stefan Sylvius, et al.
Published: (2025)
Energy-Based Transformers are Scalable Learners and Thinkers
by: Gladstone, Alexi, et al.
Published: (2025)
PerceptionComp: A Video Benchmark for Complex Perception-Centric Reasoning
by: Li, Shaoxuan, et al.
Published: (2026)
Trajectory Consistency for One-Step Generation on Euler Mean Flows
by: Li, Zhiqi, et al.
Published: (2026)
Learning Privacy from Visual Entities
by: Xompero, Alessio, et al.
Published: (2025)
UtilGen: Utility-Centric Generative Data Augmentation with Dual-Level Task Adaptation
by: Guo, Jiyu, et al.
Published: (2025)
A Review of Driver Gaze Estimation and Application in Gaze Behavior Understanding
by: Sharma, Pavan Kumar, et al.
Published: (2023)
MM-PoisonRAG: Disrupting Multimodal RAG with Local and Global Poisoning Attacks
by: Ha, Hyeonjeong, et al.
Published: (2025)
Understanding Dataset Distillation via Spectral Filtering
by: Bo, Deyu, et al.
Published: (2025)
Simplified priors for Object-Centric Learning
by: Patil, Vihang, et al.
Published: (2024)
Why Does RL Generalize Better Than SFT? A Data-Centric Perspective on VLM Post-Training
by: Lu, Aojun, et al.
Published: (2026)
Evaluating the Robustness of Off-Road Autonomous Driving Segmentation against Adversarial Attacks: A Dataset-Centric analysis
by: Deoli, Pankaj, et al.
Published: (2024)
MINERVA: Evaluating Complex Video Reasoning
by: Nagrani, Arsha, et al.
Published: (2025)
ChartAgent: A Chart Understanding Framework with Tool Integrated Reasoning
by: Wang, Boran, et al.
Published: (2025)
TextSquare: Scaling up Text-Centric Visual Instruction Tuning
by: Tang, Jingqun, et al.
Published: (2024)
Are Object-Centric Representations Better At Compositional Generalization?
by: Kapl, Ferdinand, et al.
Published: (2026)
Grouped Discrete Representation for Object-Centric Learning
by: Zhao, Rongzhen, et al.
Published: (2024)
Zero-Shot Object-Centric Representation Learning
by: Didolkar, Aniket, et al.
Published: (2024)
Object-Centric Relational Representations for Image Generation
by: Butera, Luca, et al.
Published: (2023)
Object-Centric Diffusion for Efficient Video Editing
by: Kahatapitiya, Kumara, et al.
Published: (2024)
Understanding Model Reprogramming for CLIP via Decoupling Visual Prompts
by: Cai, Chengyi, et al.
Published: (2025)
SOHES: Self-supervised Open-world Hierarchical Entity Segmentation
by: Cao, Shengcao, et al.
Published: (2024)
Exploiting Lightweight Hierarchical ViT and Dynamic Framework for Efficient Visual Tracking
by: Kang, Ben, et al.
Published: (2025)
Generating Fine Details of Entity Interactions
by: Gu, Xinyi, et al.
Published: (2025)
Optimized Weighted Voting System for Brain Tumor Classification Using MRI Images
by: Vu, Ha Anh
Published: (2026)
Object-Centric Cropping for Visual Few-Shot Classification
by: Abdali, Aymane, et al.
Published: (2025)
Unsupervised 4D Cardiac Motion Tracking with Spatiotemporal Optical Flow Networks
by: Teng, Long, et al.
Published: (2024)
Understanding Data Influence with Differential Approximation
by: Tan, Haoru, et al.
Published: (2025)
ChartGaze: Enhancing Chart Understanding in LVLMs with Eye-Tracking Guided Attention Refinement
by: Salamatian, Ali, et al.
Published: (2025)