| Main Authors: | Yu, En, Zhao, Liang, Wei, Yana, Yang, Jinrong, Wu, Dongming, Kong, Lingyu, Wei, Haoran, Wang, Tiancai, Ge, Zheng, Zhang, Xiangyu, Tao, Wenbing |
|---|---|
| Format: | Preprint |
| Published: | 2023 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2312.00589 |
Similar Items
Unhackable Temporal Rewarding for Scalable Video MLLMs
by: Yu, En, et al.
Published: (2025)
Small Language Model Meets with Reinforced Vision Vocabulary
by: Wei, Haoran, et al.
Published: (2024)
Perception-R1: Pioneering Perception Policy with Reinforcement Learning
by: Yu, En, et al.
Published: (2025)
OneChart: Purify the Chart Structural Extraction via One Auxiliary Token
by: Chen, Jinyue, et al.
Published: (2024)
Perception in Reflection
by: Wei, Yana, et al.
Published: (2025)
DreamLLM: Synergistic Multimodal Comprehension and Creation
by: Dong, Runpei, et al.
Published: (2023)
Focus Anywhere for Fine-grained Multi-page Document Understanding
by: Liu, Chenglong, et al.
Published: (2024)
Cross-View Referring Multi-Object Tracking
by: Chen, Sijia, et al.
Published: (2024)
General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
by: Wei, Haoran, et al.
Published: (2024)
ReaMOT: A Benchmark and Framework for Reasoning-based Multi-Object Tracking
by: Chen, Sijia, et al.
Published: (2025)
PerPO: Perceptual Preference Optimization via Discriminative Rewarding
by: Zhu, Zining, et al.
Published: (2025)
Delving into the Trajectory Long-tail Distribution for Muti-object Tracking
by: Chen, Sijia, et al.
Published: (2024)
Disentangling Instance and Scene Contexts for 3D Semantic Scene Completion
by: Liu, Enyu, et al.
Published: (2025)
OVTR: End-to-End Open-Vocabulary Multiple Object Tracking with Transformer
by: Li, Jinyang, et al.
Published: (2025)
Reconstructive Visual Instruction Tuning
by: Wang, Haochen, et al.
Published: (2024)
Language Prompt for Autonomous Driving
by: Wu, Dongming, et al.
Published: (2023)
The impact of baffle and taper channel tilt angle on the output performance of proton‐exchange membrane fuel cells
by: Tiancai Cheng, et al.
Published: (2024)
Thinking by Doing: Building Efficient World Model Reasoning in LLMs via Multi-turn Interaction
by: Shu, Bao, et al.
Published: (2025)
Foresight Diffusion: Improving Sampling Consistency in Predictive Diffusion Models
by: Zhang, Yu, et al.
Published: (2025)
Is a 3D-Tokenized LLM the Key to Reliable Autonomous Driving?
by: Bai, Yifan, et al.
Published: (2024)
ORMOT: A Dataset and Framework for Omnidirectional Referring Multi-Object Tracking
by: Chen, Sijia, et al.
Published: (2026)
Merlin: Multi-View Representation Learning for Robust Multivariate Time Series Forecasting with Unfixed Missing Rates
by: Yu, Chengqing, et al.
Published: (2025)
DEGround: An Effective Baseline for Ego-centric 3D Visual Grounding with a Homogeneous Framework
by: Zhang, Yani, et al.
Published: (2025)
Vision Enhancing LLMs: Empowering Multimodal Knowledge Storage and Sharing in LLMs
by: Li, Yunxin, et al.
Published: (2023)
Empowering Multimodal LLMs with External Tools: A Comprehensive Survey
by: An, Wenbin, et al.
Published: (2025)
ForgerySleuth: Empowering Multimodal Large Language Models for Image Manipulation Detection
by: Sun, Zhihao, et al.
Published: (2024)
Research on the Flexural Performance and Degree of Composite Action of Precast Concrete Sandwich Panels With Concrete Ribs
by: Qi Ge, et al.
Published: (2025)
DRMOT: A Dataset and Framework for RGBD Referring Multi-Object Tracking
by: Chen, Sijia, et al.
Published: (2026)
Edge-Cloud Collaborative Pothole Detection via Onboard Event Screening and Federated Temporal Segmentation
by: Wu, Yingjie, et al.
Published: (2026)
Can Multimodal LLMs Perform Time Series Anomaly Detection?
by: Xu, Xiongxiao, et al.
Published: (2025)
Beyond the Cartesian Illusion: Testing Two-Stage Multi-Modal Theory of Mind under Perceptual Bottlenecks
by: Zhou, Yajing, et al.
Published: (2026)
Generative Visual Foresight Meets Task-Agnostic Pose Estimation in Robotic Table-Top Manipulation
by: Zhang, Chuye, et al.
Published: (2025)
Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning
by: Wei, Yana, et al.
Published: (2025)
Multipole expansion of the gravitational field in a general class of fourth-order theories of gravity and the application in gyroscopic precession
by: Wu, Bofeng, et al.
Published: (2023)
TeamLLM: A Human-Like Team-Oriented Collaboration Framework for Multi-Step Contextualized Tasks
by: Wang, Xiangyu, et al.
Published: (2026)
Foresight Prediction Enhanced Live-Streaming Recommendation
by: Cao, Jiangxia, et al.
Published: (2025)
Slow Perception: Let's Perceive Geometric Figures Step-by-step
by: Wei, Haoran, et al.
Published: (2024)
MindCine: Multimodal EEG-to-Video Reconstruction with Large-Scale Pretrained Models
by: Zhou, Tian-Yi, et al.
Published: (2026)
X-Ray Polarization Study of Pulsar Wind Nebulae with eXTP: Simulation Results and Scientific Prospects
by: Liu, Kuan, et al.
Published: (2026)
Quantum Merlin-Arthur with an internally separable proof
by: Bassirian, Roozbeh, et al.
Published: (2024)