:: Library Catalog

Bibliographic Details
Main Authors:	Xiang, Jiannan; Liu, Guangyi; Gu, Yi; Gao, Qiyue; Ning, Yuting; Zha, Yuheng; Feng, Zeyu; Tao, Tianhua; Hao, Shibo; Shi, Yemin; Liu, Zhengzhong; Xing, Eric P.; Hu, Zhiting
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence; Computation and Language
Online Access:	https://arxiv.org/abs/2406.09455

Similar Items

Vision-G1: Towards General Vision Language Reasoning with Multi-Domain Data Curation
by: Zha, Yuheng, et al.
Published: (2025)

World Reasoning Arena
by: PAN Team, et al.
Published: (2026)

Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play
by: Shi, Yemin, et al.
Published: (2025)

PAN: A World Model for General, Interactable, and Long-Horizon World Simulation
by: PAN Team, et al.
Published: (2025)

Decentralized Arena: Towards Democratic and Scalable Automatic Evaluation of Language Models
by: Yin, Yanbin, et al.
Published: (2025)

Critiques of World Models
by: Xing, Eric, et al.
Published: (2025)

ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings
by: Hao, Shibo, et al.
Published: (2023)

General Agentic Planning Through Simulative Reasoning with World Models
by: Deng, Mingkai, et al.
Published: (2025)

Do Vision-Language Models Have Internal World Models? Towards an Atomic Evaluation
by: Gao, Qiyue, et al.
Published: (2025)

LLM Reasoners: New Evaluation, Library, and Analysis of Step-by-Step Reasoning with Large Language Models
by: Hao, Shibo, et al.
Published: (2024)

3D CoCa: Contrastive Learners are 3D Captioners
by: Huang, Ting, et al.
Published: (2025)

MultiHateLoc: Towards Temporal Localisation of Multimodal Hate Content in Online Videos
by: Sun, Qiyue, et al.
Published: (2025)

VSA: Faster Video Diffusion with Trainable Sparse Attention
by: Zhang, Peiyuan, et al.
Published: (2025)

Revisiting Reinforcement Learning for LLM Reasoning from A Cross-Domain Perspective
by: Cheng, Zhoujun, et al.
Published: (2025)

WorldMemArena: Evaluating Multimodal Agent Memory Through Action-World Interaction
by: Liu, Chengzhi, et al.
Published: (2026)

SlimPajama-DC: Understanding Data Combinations for LLM Training
by: Shen, Zhiqiang, et al.
Published: (2023)

MWM: Mobile World Models for Action-Conditioned Consistent Prediction
by: Yan, Han, et al.
Published: (2026)

LangCoop: Collaborative Driving with Language
by: Gao, Xiangbo, et al.
Published: (2025)

How Confident are Video Models? Empowering Video Models to Express their Uncertainty
by: Mei, Zhiting, et al.
Published: (2025)

Incantation: Natural Language as the Action Interface for Multi-Entity Video World Models
by: Zhu, Shangwen, et al.
Published: (2026)

RoboTrustBench: Benchmarking the Trustworthiness of Video World Models for Robotic Manipulation
by: Li, Huiqiong, et al.
Published: (2026)

Towards Self-Refinement of Vision-Language Models with Triangular Consistency
by: Deng, Yunlong, et al.
Published: (2025)

SPA: Towards A Computational Friendly Cloud-Base and On-Devices Collaboration Seq2seq Personalized Generation with Casual Inference
by: Liu, Yanming, et al.
Published: (2024)

CocoaBench: Evaluating Unified Digital Agents in the Wild
by: CocoaBench Team, et al.
Published: (2026)

Dynamic Rewarding with Prompt Optimization Enables Tuning-free Self-Alignment of Language Models
by: Singla, Somanshu, et al.
Published: (2024)

Towards Commonsense Knowledge based Fuzzy Systems for Supporting Size-Related Fine-Grained Object Detection
by: Zhang, Pu, et al.
Published: (2023)

ANVIL: Accelerator-Native Video Interpolation via Codec Motion Vector Priors
by: Liu, Shibo
Published: (2026)

Token Level Routing Inference System for Edge Devices
by: She, Jianshu, et al.
Published: (2025)

Synthesizing Privacy-Preserving Text Data via Finetuning without Finetuning Billion-Scale LLMs
by: Tan, Bowen, et al.
Published: (2025)

PISCO: Precise Video Instance Insertion with Sparse Control
by: Gao, Xiangbo, et al.
Published: (2026)

VisualTrans: A Benchmark for Real-World Visual Transformation Reasoning
by: Ji, Yuheng, et al.
Published: (2025)

HiLight: Technical Report on the Motern AI Video Language Model
by: Wang, Zhiting, et al.
Published: (2024)

Crystal: Illuminating LLM Abilities on Language and Code
by: Tao, Tianhua, et al.
Published: (2024)

Delta Forcing: Trust Region Steering for Interactive Autoregressive Video Generation
by: Wu, Yuheng, et al.
Published: (2026)

Olaf-World: Orienting Latent Actions for Video World Modeling
by: Jiang, Yuxin, et al.
Published: (2026)

PanoWorld: Towards Spatial Supersensing in 360° Panorama World
by: Wang, Changpeng, et al.
Published: (2026)

Markovian Pandora's box
by: Yang, Yuanyuan, et al.
Published: (2025)

World Models That Know When They Don't Know - Controllable Video Generation with Calibrated Uncertainty
by: Mei, Zhiting, et al.
Published: (2025)

Background Fades, Foreground Leads: Curriculum-Guided Background Pruning for Efficient Foreground-Centric Collaborative Perception
by: Wu, Yuheng, et al.
Published: (2025)

BiasGuard: A Reasoning-enhanced Bias Detection Tool For Large Language Models
by: Fan, Zhiting, et al.
Published: (2025)