:: Library Catalog

Bibliographic Details
Main Authors:	Ha, Hyeonjeong; Ge, Jinjin; Feng, Bo; Ma, Kaixin; Chakraborty, Gargi
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition; Machine Learning
Online Access:	https://arxiv.org/abs/2601.01095

Similar Items

MARVEL: Multidimensional Abstraction and Reasoning through Visual Evaluation and Learning
by: Jiang, Yifan, et al.
Published: (2024)

Entity-Centric Reinforcement Learning for Object Manipulation from Pixels
by: Haramati, Dan, et al.
Published: (2024)

Constructive Distortion: Improving MLLMs with Attention-Guided Image Warping
by: Dalal, Dwip, et al.
Published: (2025)

SPoRC-VIST: A Benchmark for Evaluating Generative Natural Narrative in Vision-Language Models
by: Zeng, Yunlin
Published: (2026)

ReasoningTrack: Chain-of-Thought Reasoning for Long-term Vision-Language Tracking
by: Wang, Xiao, et al.
Published: (2025)

HoneyBee: Data Recipes for Vision-Language Reasoners
by: Bansal, Hritik, et al.
Published: (2025)

Evaluating Object-Centric Models beyond Object Discovery
by: Singh, Krishnakant, et al.
Published: (2026)

NarrLV: Towards a Comprehensive Narrative-Centric Evaluation for Long Video Generation
by: Feng, X., et al.
Published: (2025)

Reasoning-Enhanced Object-Centric Learning for Videos
by: Li, Jian, et al.
Published: (2024)

EchoAgent: Guideline-Centric Reasoning Agent for Echocardiography Measurement and Interpretation
by: Daghyani, Matin, et al.
Published: (2025)

Towards Sparse Video Understanding and Reasoning
by: Xu, Chenwei, et al.
Published: (2026)

Oh-A-DINO: Understanding and Enhancing Attribute-Level Information in Self-Supervised Object-Centric Representations
by: Wagner, Stefan Sylvius, et al.
Published: (2025)

Energy-Based Transformers are Scalable Learners and Thinkers
by: Gladstone, Alexi, et al.
Published: (2025)

PerceptionComp: A Video Benchmark for Complex Perception-Centric Reasoning
by: Li, Shaoxuan, et al.
Published: (2026)

Trajectory Consistency for One-Step Generation on Euler Mean Flows
by: Li, Zhiqi, et al.
Published: (2026)

Learning Privacy from Visual Entities
by: Xompero, Alessio, et al.
Published: (2025)

UtilGen: Utility-Centric Generative Data Augmentation with Dual-Level Task Adaptation
by: Guo, Jiyu, et al.
Published: (2025)

A Review of Driver Gaze Estimation and Application in Gaze Behavior Understanding
by: Sharma, Pavan Kumar, et al.
Published: (2023)

MM-PoisonRAG: Disrupting Multimodal RAG with Local and Global Poisoning Attacks
by: Ha, Hyeonjeong, et al.
Published: (2025)

Understanding Dataset Distillation via Spectral Filtering
by: Bo, Deyu, et al.
Published: (2025)

Simplified priors for Object-Centric Learning
by: Patil, Vihang, et al.
Published: (2024)

Why Does RL Generalize Better Than SFT? A Data-Centric Perspective on VLM Post-Training
by: Lu, Aojun, et al.
Published: (2026)

Evaluating the Robustness of Off-Road Autonomous Driving Segmentation against Adversarial Attacks: A Dataset-Centric analysis
by: Deoli, Pankaj, et al.
Published: (2024)

MINERVA: Evaluating Complex Video Reasoning
by: Nagrani, Arsha, et al.
Published: (2025)

ChartAgent: A Chart Understanding Framework with Tool Integrated Reasoning
by: Wang, Boran, et al.
Published: (2025)

TextSquare: Scaling up Text-Centric Visual Instruction Tuning
by: Tang, Jingqun, et al.
Published: (2024)

Are Object-Centric Representations Better At Compositional Generalization?
by: Kapl, Ferdinand, et al.
Published: (2026)

Grouped Discrete Representation for Object-Centric Learning
by: Zhao, Rongzhen, et al.
Published: (2024)

Zero-Shot Object-Centric Representation Learning
by: Didolkar, Aniket, et al.
Published: (2024)

Object-Centric Relational Representations for Image Generation
by: Butera, Luca, et al.
Published: (2023)

Object-Centric Diffusion for Efficient Video Editing
by: Kahatapitiya, Kumara, et al.
Published: (2024)

Understanding Model Reprogramming for CLIP via Decoupling Visual Prompts
by: Cai, Chengyi, et al.
Published: (2025)

SOHES: Self-supervised Open-world Hierarchical Entity Segmentation
by: Cao, Shengcao, et al.
Published: (2024)

Exploiting Lightweight Hierarchical ViT and Dynamic Framework for Efficient Visual Tracking
by: Kang, Ben, et al.
Published: (2025)

Generating Fine Details of Entity Interactions
by: Gu, Xinyi, et al.
Published: (2025)

Optimized Weighted Voting System for Brain Tumor Classification Using MRI Images
by: Vu, Ha Anh
Published: (2026)

Object-Centric Cropping for Visual Few-Shot Classification
by: Abdali, Aymane, et al.
Published: (2025)

Unsupervised 4D Cardiac Motion Tracking with Spatiotemporal Optical Flow Networks
by: Teng, Long, et al.
Published: (2024)

Understanding Data Influence with Differential Approximation
by: Tan, Haoru, et al.
Published: (2025)

ChartGaze: Enhancing Chart Understanding in LVLMs with Eye-Tracking Guided Attention Refinement
by: Salamatian, Ali, et al.
Published: (2025)