| Main Authors: | Xu, Huilin, Chen, Tao, Xu, Feng |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2403.10079 |
Similar Items
Weakly Supervised Concept Learning for Object-centric Visual Reasoning
by: Tiwari, Sparsh, et al.
Published: (2026)
Aerial Vision-Language Navigation with a Unified Framework for Spatial, Temporal and Embodied Reasoning
by: Xu, Huilin, et al.
Published: (2025)
RELO: Reinforcement Learning to Localize for Visual Object Tracking
by: Chen, Xin, et al.
Published: (2026)
Learning Disentangled Representation in Object-Centric Models for Visual Dynamics Prediction via Transformers
by: Gandhi, Sanket, et al.
Published: (2024)
Object-centric Binding in Contrastive Language-Image Pretraining
by: Assouel, Rim, et al.
Published: (2025)
SlotPi: Physics-informed Object-centric Reasoning Models
by: Li, Jian, et al.
Published: (2025)
Serial Over Parallel: Learning Continual Unification for Multi-Modal Visual Object Tracking and Benchmarking
by: Tang, Zhangyong, et al.
Published: (2025)
Object Isolated Attention for Consistent Story Visualization
by: Luo, Xiangyang, et al.
Published: (2025)
Towards Visual Discrimination and Reasoning of Real-World Physical Dynamics: Physics-Grounded Anomaly Detection
by: Li, Wenqiao, et al.
Published: (2025)
Successes and Limitations of Object-centric Models at Compositional Generalisation
by: Montero, Milton L., et al.
Published: (2024)
SlotLifter: Slot-guided Feature Lifting for Learning Object-centric Radiance Fields
by: Liu, Yu, et al.
Published: (2024)
Dynamic Attention Mechanism in Spatiotemporal Memory Networks for Object Tracking
by: Zhou, Meng, et al.
Published: (2025)
DORSal: Diffusion for Object-centric Representations of Scenes et al
by: Jabri, Allan, et al.
Published: (2023)
EfficientVMamba: Atrous Selective Scan for Light Weight Visual Mamba
by: Pei, Xiaohuan, et al.
Published: (2024)
Mitigating Object Hallucinations in Large Vision-Language Models via Attention Calibration
by: Zhu, Younan, et al.
Published: (2025)
Towards End-to-End Neuromorphic Event-based 3D Object Reconstruction Without Physical Priors
by: Xu, Chuanzhi, et al.
Published: (2025)
CFMD: Dynamic Cross-layer Feature Fusion for Salient Object Detection
by: Lian, Jin, et al.
Published: (2025)
Adversarial Error Correction for Visual Autoregressive Generation
by: Bi, Ligong, et al.
Published: (2026)
UI-Vision: A Desktop-centric GUI Benchmark for Visual Perception and Interaction
by: Nayak, Shravan, et al.
Published: (2025)
ProVision: Programmatically Scaling Vision-centric Instruction Data for Multimodal Language Models
by: Zhang, Jieyu, et al.
Published: (2024)
MITracker: Multi-View Integration for Visual Object Tracking
by: Xu, Mengjie, et al.
Published: (2025)
Visual Grounding for Object-Level Generalization in Reinforcement Learning
by: Jiang, Haobin, et al.
Published: (2024)
OCK: Unsupervised Dynamic Video Prediction with Object-Centric Kinematics
by: Song, Yeon-Ji, et al.
Published: (2024)
Edge-case Synthesis for Fisheye Object Detection: A Data-centric Perspective
by: Kim, Seunghyeon, et al.
Published: (2025)
LocalMamba: Visual State Space Model with Windowed Selective Scan
by: Huang, Tao, et al.
Published: (2024)
Query-centric Audio-Visual Cognition Network for Moment Retrieval, Segmentation and Step-Captioning
by: Tu, Yunbin, et al.
Published: (2024)
EvObj: Learning Evolving Object-centric Representations for 3D Instance Segmentation without Scene Supervision
by: Chen, Jiahao, et al.
Published: (2026)
IRNet: Iterative Refinement Network for Noisy Partial Label Learning
by: Lian, Zheng, et al.
Published: (2022)
iPad: Iterative Proposal-centric End-to-End Autonomous Driving
by: Guo, Ke, et al.
Published: (2025)
Learning Dynamic Collaborative Network for Semi-supervised 3D Vessel Segmentation
by: Xu, Jiao, et al.
Published: (2026)
Adaptive Runge-Kutta Dynamics for Spatiotemporal Prediction
by: Zhao, Xuanle, et al.
Published: (2024)
InterDreamer: Zero-Shot Text to 3D Dynamic Human-Object Interaction
by: Xu, Sirui, et al.
Published: (2024)
CogFlow: Bridging Perception and Reasoning through Knowledge Internalization for Visual Mathematical Problem Solving
by: Chen, Shuhang, et al.
Published: (2026)
SORT3D: Spatial Object-centric Reasoning Toolbox for Zero-Shot 3D Grounding Using Large Language Models
by: Zantout, Nader, et al.
Published: (2025)
Cross-View Referring Multi-Object Tracking
by: Chen, Sijia, et al.
Published: (2024)
A Novel Multi-layer Task-centric and Data Quality Framework for Autonomous Driving
by: Zhou, Yuhan, et al.
Published: (2025)
High-fidelity Person-centric Subject-to-Image Synthesis
by: Wang, Yibin, et al.
Published: (2023)
Let's Think with Images Efficiently! An Interleaved-Modal Chain-of-Thought Reasoning Framework with Dynamic and Precise Visual Thoughts
by: Liu, Xu, et al.
Published: (2026)
IBISAgent: Reinforcing Pixel-Level Visual Reasoning in MLLMs for Universal Biomedical Object Referring and Segmentation
by: Jiang, Yankai, et al.
Published: (2026)
MedEyes: Learning Dynamic Visual Focus for Medical Progressive Diagnosis
by: Zhu, Chunzheng, et al.
Published: (2025)