:: Library Catalog

Bibliographic Details
Main Authors:	Xu, Huilin; Chen, Tao; Xu, Feng
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2403.10079

Similar Items

Weakly Supervised Concept Learning for Object-centric Visual Reasoning
by: Tiwari, Sparsh, et al.
Published: (2026)

Aerial Vision-Language Navigation with a Unified Framework for Spatial, Temporal and Embodied Reasoning
by: Xu, Huilin, et al.
Published: (2025)

RELO: Reinforcement Learning to Localize for Visual Object Tracking
by: Chen, Xin, et al.
Published: (2026)

Learning Disentangled Representation in Object-Centric Models for Visual Dynamics Prediction via Transformers
by: Gandhi, Sanket, et al.
Published: (2024)

Object-centric Binding in Contrastive Language-Image Pretraining
by: Assouel, Rim, et al.
Published: (2025)

SlotPi: Physics-informed Object-centric Reasoning Models
by: Li, Jian, et al.
Published: (2025)

Serial Over Parallel: Learning Continual Unification for Multi-Modal Visual Object Tracking and Benchmarking
by: Tang, Zhangyong, et al.
Published: (2025)

Object Isolated Attention for Consistent Story Visualization
by: Luo, Xiangyang, et al.
Published: (2025)

Towards Visual Discrimination and Reasoning of Real-World Physical Dynamics: Physics-Grounded Anomaly Detection
by: Li, Wenqiao, et al.
Published: (2025)

Successes and Limitations of Object-centric Models at Compositional Generalisation
by: Montero, Milton L., et al.
Published: (2024)

SlotLifter: Slot-guided Feature Lifting for Learning Object-centric Radiance Fields
by: Liu, Yu, et al.
Published: (2024)

Dynamic Attention Mechanism in Spatiotemporal Memory Networks for Object Tracking
by: Zhou, Meng, et al.
Published: (2025)

DORSal: Diffusion for Object-centric Representations of Scenes et al
by: Jabri, Allan, et al.
Published: (2023)

EfficientVMamba: Atrous Selective Scan for Light Weight Visual Mamba
by: Pei, Xiaohuan, et al.
Published: (2024)

Mitigating Object Hallucinations in Large Vision-Language Models via Attention Calibration
by: Zhu, Younan, et al.
Published: (2025)

Towards End-to-End Neuromorphic Event-based 3D Object Reconstruction Without Physical Priors
by: Xu, Chuanzhi, et al.
Published: (2025)

CFMD: Dynamic Cross-layer Feature Fusion for Salient Object Detection
by: Lian, Jin, et al.
Published: (2025)

Adversarial Error Correction for Visual Autoregressive Generation
by: Bi, Ligong, et al.
Published: (2026)

UI-Vision: A Desktop-centric GUI Benchmark for Visual Perception and Interaction
by: Nayak, Shravan, et al.
Published: (2025)

ProVision: Programmatically Scaling Vision-centric Instruction Data for Multimodal Language Models
by: Zhang, Jieyu, et al.
Published: (2024)

MITracker: Multi-View Integration for Visual Object Tracking
by: Xu, Mengjie, et al.
Published: (2025)

Visual Grounding for Object-Level Generalization in Reinforcement Learning
by: Jiang, Haobin, et al.
Published: (2024)

OCK: Unsupervised Dynamic Video Prediction with Object-Centric Kinematics
by: Song, Yeon-Ji, et al.
Published: (2024)

Edge-case Synthesis for Fisheye Object Detection: A Data-centric Perspective
by: Kim, Seunghyeon, et al.
Published: (2025)

LocalMamba: Visual State Space Model with Windowed Selective Scan
by: Huang, Tao, et al.
Published: (2024)

Query-centric Audio-Visual Cognition Network for Moment Retrieval, Segmentation and Step-Captioning
by: Tu, Yunbin, et al.
Published: (2024)

EvObj: Learning Evolving Object-centric Representations for 3D Instance Segmentation without Scene Supervision
by: Chen, Jiahao, et al.
Published: (2026)

IRNet: Iterative Refinement Network for Noisy Partial Label Learning
by: Lian, Zheng, et al.
Published: (2022)

iPad: Iterative Proposal-centric End-to-End Autonomous Driving
by: Guo, Ke, et al.
Published: (2025)

Learning Dynamic Collaborative Network for Semi-supervised 3D Vessel Segmentation
by: Xu, Jiao, et al.
Published: (2026)

Adaptive Runge-Kutta Dynamics for Spatiotemporal Prediction
by: Zhao, Xuanle, et al.
Published: (2024)

InterDreamer: Zero-Shot Text to 3D Dynamic Human-Object Interaction
by: Xu, Sirui, et al.
Published: (2024)

CogFlow: Bridging Perception and Reasoning through Knowledge Internalization for Visual Mathematical Problem Solving
by: Chen, Shuhang, et al.
Published: (2026)

SORT3D: Spatial Object-centric Reasoning Toolbox for Zero-Shot 3D Grounding Using Large Language Models
by: Zantout, Nader, et al.
Published: (2025)

Cross-View Referring Multi-Object Tracking
by: Chen, Sijia, et al.
Published: (2024)

A Novel Multi-layer Task-centric and Data Quality Framework for Autonomous Driving
by: Zhou, Yuhan, et al.
Published: (2025)

High-fidelity Person-centric Subject-to-Image Synthesis
by: Wang, Yibin, et al.
Published: (2023)

Let's Think with Images Efficiently! An Interleaved-Modal Chain-of-Thought Reasoning Framework with Dynamic and Precise Visual Thoughts
by: Liu, Xu, et al.
Published: (2026)

IBISAgent: Reinforcing Pixel-Level Visual Reasoning in MLLMs for Universal Biomedical Object Referring and Segmentation
by: Jiang, Yankai, et al.
Published: (2026)

MedEyes: Learning Dynamic Visual Focus for Medical Progressive Diagnosis
by: Zhu, Chunzheng, et al.
Published: (2025)