| Main Authors: | Xiang, Jiannan, Liu, Guangyi, Gu, Yi, Gao, Qiyue, Ning, Yuting, Zha, Yuheng, Feng, Zeyu, Tao, Tianhua, Hao, Shibo, Shi, Yemin, Liu, Zhengzhong, Xing, Eric P., Hu, Zhiting |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2406.09455 |
Similar Items
Vision-G1: Towards General Vision Language Reasoning with Multi-Domain Data Curation
by: Zha, Yuheng, et al.
Published: (2025)
World Reasoning Arena
by: PAN Team, et al.
Published: (2026)
Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play
by: Shi, Yemin, et al.
Published: (2025)
PAN: A World Model for General, Interactable, and Long-Horizon World Simulation
by: PAN Team, et al.
Published: (2025)
Decentralized Arena: Towards Democratic and Scalable Automatic Evaluation of Language Models
by: Yin, Yanbin, et al.
Published: (2025)
Critiques of World Models
by: Xing, Eric, et al.
Published: (2025)
ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings
by: Hao, Shibo, et al.
Published: (2023)
General Agentic Planning Through Simulative Reasoning with World Models
by: Deng, Mingkai, et al.
Published: (2025)
Do Vision-Language Models Have Internal World Models? Towards an Atomic Evaluation
by: Gao, Qiyue, et al.
Published: (2025)
LLM Reasoners: New Evaluation, Library, and Analysis of Step-by-Step Reasoning with Large Language Models
by: Hao, Shibo, et al.
Published: (2024)
3D CoCa: Contrastive Learners are 3D Captioners
by: Huang, Ting, et al.
Published: (2025)
MultiHateLoc: Towards Temporal Localisation of Multimodal Hate Content in Online Videos
by: Sun, Qiyue, et al.
Published: (2025)
VSA: Faster Video Diffusion with Trainable Sparse Attention
by: Zhang, Peiyuan, et al.
Published: (2025)
Revisiting Reinforcement Learning for LLM Reasoning from A Cross-Domain Perspective
by: Cheng, Zhoujun, et al.
Published: (2025)
WorldMemArena: Evaluating Multimodal Agent Memory Through Action-World Interaction
by: Liu, Chengzhi, et al.
Published: (2026)
SlimPajama-DC: Understanding Data Combinations for LLM Training
by: Shen, Zhiqiang, et al.
Published: (2023)
MWM: Mobile World Models for Action-Conditioned Consistent Prediction
by: Yan, Han, et al.
Published: (2026)
LangCoop: Collaborative Driving with Language
by: Gao, Xiangbo, et al.
Published: (2025)
How Confident are Video Models? Empowering Video Models to Express their Uncertainty
by: Mei, Zhiting, et al.
Published: (2025)
Incantation: Natural Language as the Action Interface for Multi-Entity Video World Models
by: Zhu, Shangwen, et al.
Published: (2026)
RoboTrustBench: Benchmarking the Trustworthiness of Video World Models for Robotic Manipulation
by: Li, Huiqiong, et al.
Published: (2026)
Towards Self-Refinement of Vision-Language Models with Triangular Consistency
by: Deng, Yunlong, et al.
Published: (2025)
SPA: Towards A Computational Friendly Cloud-Base and On-Devices Collaboration Seq2seq Personalized Generation with Casual Inference
by: Liu, Yanming, et al.
Published: (2024)
CocoaBench: Evaluating Unified Digital Agents in the Wild
by: CocoaBench Team, et al.
Published: (2026)
Dynamic Rewarding with Prompt Optimization Enables Tuning-free Self-Alignment of Language Models
by: Singla, Somanshu, et al.
Published: (2024)
Towards Commonsense Knowledge based Fuzzy Systems for Supporting Size-Related Fine-Grained Object Detection
by: Zhang, Pu, et al.
Published: (2023)
ANVIL: Accelerator-Native Video Interpolation via Codec Motion Vector Priors
by: Liu, Shibo
Published: (2026)
Token Level Routing Inference System for Edge Devices
by: She, Jianshu, et al.
Published: (2025)
Synthesizing Privacy-Preserving Text Data via Finetuning without Finetuning Billion-Scale LLMs
by: Tan, Bowen, et al.
Published: (2025)
PISCO: Precise Video Instance Insertion with Sparse Control
by: Gao, Xiangbo, et al.
Published: (2026)
VisualTrans: A Benchmark for Real-World Visual Transformation Reasoning
by: Ji, Yuheng, et al.
Published: (2025)
HiLight: Technical Report on the Motern AI Video Language Model
by: Wang, Zhiting, et al.
Published: (2024)
Crystal: Illuminating LLM Abilities on Language and Code
by: Tao, Tianhua, et al.
Published: (2024)
Delta Forcing: Trust Region Steering for Interactive Autoregressive Video Generation
by: Wu, Yuheng, et al.
Published: (2026)
Olaf-World: Orienting Latent Actions for Video World Modeling
by: Jiang, Yuxin, et al.
Published: (2026)
PanoWorld: Towards Spatial Supersensing in 360° Panorama World
by: Wang, Changpeng, et al.
Published: (2026)
Markovian Pandora's box
by: Yang, Yuanyuan, et al.
Published: (2025)
World Models That Know When They Don't Know - Controllable Video Generation with Calibrated Uncertainty
by: Mei, Zhiting, et al.
Published: (2025)
Background Fades, Foreground Leads: Curriculum-Guided Background Pruning for Efficient Foreground-Centric Collaborative Perception
by: Wu, Yuheng, et al.
Published: (2025)
BiasGuard: A Reasoning-enhanced Bias Detection Tool For Large Language Models
by: Fan, Zhiting, et al.
Published: (2025)