:: Library Catalog

Bibliographic Details
Main Authors:	Yu, En; Zhao, Liang; Wei, Yana; Yang, Jinrong; Wu, Dongming; Kong, Lingyu; Wei, Haoran; Wang, Tiancai; Ge, Zheng; Zhang, Xiangyu; Tao, Wenbing
Format:	Preprint
Published:	2023
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2312.00589

Similar Items

Unhackable Temporal Rewarding for Scalable Video MLLMs
by: Yu, En, et al.
Published: (2025)

Small Language Model Meets with Reinforced Vision Vocabulary
by: Wei, Haoran, et al.
Published: (2024)

Perception-R1: Pioneering Perception Policy with Reinforcement Learning
by: Yu, En, et al.
Published: (2025)

OneChart: Purify the Chart Structural Extraction via One Auxiliary Token
by: Chen, Jinyue, et al.
Published: (2024)

Perception in Reflection
by: Wei, Yana, et al.
Published: (2025)

DreamLLM: Synergistic Multimodal Comprehension and Creation
by: Dong, Runpei, et al.
Published: (2023)

Focus Anywhere for Fine-grained Multi-page Document Understanding
by: Liu, Chenglong, et al.
Published: (2024)

Cross-View Referring Multi-Object Tracking
by: Chen, Sijia, et al.
Published: (2024)

General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
by: Wei, Haoran, et al.
Published: (2024)

ReaMOT: A Benchmark and Framework for Reasoning-based Multi-Object Tracking
by: Chen, Sijia, et al.
Published: (2025)

PerPO: Perceptual Preference Optimization via Discriminative Rewarding
by: Zhu, Zining, et al.
Published: (2025)

Delving into the Trajectory Long-tail Distribution for Muti-object Tracking
by: Chen, Sijia, et al.
Published: (2024)

Disentangling Instance and Scene Contexts for 3D Semantic Scene Completion
by: Liu, Enyu, et al.
Published: (2025)

OVTR: End-to-End Open-Vocabulary Multiple Object Tracking with Transformer
by: Li, Jinyang, et al.
Published: (2025)

Reconstructive Visual Instruction Tuning
by: Wang, Haochen, et al.
Published: (2024)

Language Prompt for Autonomous Driving
by: Wu, Dongming, et al.
Published: (2023)

The impact of baffle and taper channel tilt angle on the output performance of proton‐exchange membrane fuel cells
by: Tiancai Cheng, et al.
Published: (2024)

Thinking by Doing: Building Efficient World Model Reasoning in LLMs via Multi-turn Interaction
by: Shu, Bao, et al.
Published: (2025)

Foresight Diffusion: Improving Sampling Consistency in Predictive Diffusion Models
by: Zhang, Yu, et al.
Published: (2025)

Is a 3D-Tokenized LLM the Key to Reliable Autonomous Driving?
by: Bai, Yifan, et al.
Published: (2024)

ORMOT: A Dataset and Framework for Omnidirectional Referring Multi-Object Tracking
by: Chen, Sijia, et al.
Published: (2026)

Merlin: Multi-View Representation Learning for Robust Multivariate Time Series Forecasting with Unfixed Missing Rates
by: Yu, Chengqing, et al.
Published: (2025)

DEGround: An Effective Baseline for Ego-centric 3D Visual Grounding with a Homogeneous Framework
by: Zhang, Yani, et al.
Published: (2025)

Vision Enhancing LLMs: Empowering Multimodal Knowledge Storage and Sharing in LLMs
by: Li, Yunxin, et al.
Published: (2023)

Empowering Multimodal LLMs with External Tools: A Comprehensive Survey
by: An, Wenbin, et al.
Published: (2025)

ForgerySleuth: Empowering Multimodal Large Language Models for Image Manipulation Detection
by: Sun, Zhihao, et al.
Published: (2024)

Research on the Flexural Performance and Degree of Composite Action of Precast Concrete Sandwich Panels With Concrete Ribs
by: Qi Ge, et al.
Published: (2025)

DRMOT: A Dataset and Framework for RGBD Referring Multi-Object Tracking
by: Chen, Sijia, et al.
Published: (2026)

Edge-Cloud Collaborative Pothole Detection via Onboard Event Screening and Federated Temporal Segmentation
by: Wu, Yingjie, et al.
Published: (2026)

Can Multimodal LLMs Perform Time Series Anomaly Detection?
by: Xu, Xiongxiao, et al.
Published: (2025)

Beyond the Cartesian Illusion: Testing Two-Stage Multi-Modal Theory of Mind under Perceptual Bottlenecks
by: Zhou, Yajing, et al.
Published: (2026)

Generative Visual Foresight Meets Task-Agnostic Pose Estimation in Robotic Table-Top Manipulation
by: Zhang, Chuye, et al.
Published: (2025)

Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning
by: Wei, Yana, et al.
Published: (2025)

Multipole expansion of the gravitational field in a general class of fourth-order theories of gravity and the application in gyroscopic precession
by: Wu, Bofeng, et al.
Published: (2023)

TeamLLM: A Human-Like Team-Oriented Collaboration Framework for Multi-Step Contextualized Tasks
by: Wang, Xiangyu, et al.
Published: (2026)

Foresight Prediction Enhanced Live-Streaming Recommendation
by: Cao, Jiangxia, et al.
Published: (2025)

Slow Perception: Let's Perceive Geometric Figures Step-by-step
by: Wei, Haoran, et al.
Published: (2024)

MindCine: Multimodal EEG-to-Video Reconstruction with Large-Scale Pretrained Models
by: Zhou, Tian-Yi, et al.
Published: (2026)

X-Ray Polarization Study of Pulsar Wind Nebulae with eXTP: Simulation Results and Scientific Prospects
by: Liu, Kuan, et al.
Published: (2026)

Quantum Merlin-Arthur with an internally separable proof
by: Bassirian, Roozbeh, et al.
Published: (2024)