| Main Authors: | Liang, Yihao, Wang, Ze, Chen, Hao, Sun, Ximeng, Wu, Jialian, Yu, Xiaodong, Liu, Jiang, Barsoum, Emad, Liu, Zicheng, Jha, Niraj K. |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2601.02236 |
Similar Items
XModBench: Benchmarking Cross-Modal Capabilities and Consistency in Omni-Language Models
by: Wang, Xingrui, et al.
Published: (2025)
KeyVID: Keyframe-Aware Video Diffusion for Audio-Synchronized Visual Animation
by: Wang, Xingrui, et al.
Published: (2025)
VideoSeek: Long-Horizon Video Agent with Tool-Guided Seeking
by: Lin, Jingyang, et al.
Published: (2026)
Learning from Online Videos at Inference Time for Computer-Use Agents
by: Liu, Yujian, et al.
Published: (2025)
Latent Visual Reasoning
by: Li, Bangzheng, et al.
Published: (2025)
Instella-T2I: Pushing the Limits of 1D Discrete Latent Space Image Generation
by: Wang, Ze, et al.
Published: (2025)
ImageDoctor: Diagnosing Text-to-Image Generation via Grounded Image Reasoning
by: Guo, Yuxiang, et al.
Published: (2025)
Unleashing Hour-Scale Video Training for Long Video-Language Understanding
by: Lin, Jingyang, et al.
Published: (2025)
Agent Laboratory: Using LLM Agents as Research Assistants
by: Schmidgall, Samuel, et al.
Published: (2025)
Self-Taught Agentic Long Context Understanding
by: Zhuang, Yufan, et al.
Published: (2025)
DRIFT: Transferring Reasoning Priors for Efficient MLLM Fine-Tuning
by: Huang, Chao, et al.
Published: (2025)
MOVi: Training-free Text-conditioned Multi-Object Video Generation
by: Rahman, Aimon, et al.
Published: (2025)
TTT-Bench: A Benchmark for Evaluating Reasoning Ability with Simple and Novel Tic-Tac-Toe-style Games
by: Mishra, Prakamya, et al.
Published: (2025)
Instella: Fully Open Language Models with Stellar Performance
by: Liu, Jiang, et al.
Published: (2025)
DART: aDaptive Accept RejecT for non-linear top-K subset identification
by: Agarwal, Mridul, et al.
Published: (2020)
HEED: Density-Weighted Residual Alignment for Hybrid Vision-Language Model Distillation
by: Liang, Yihao, et al.
Published: (2026)
Reliable Use of Lemmas via Eligibility Reasoning and Section-Aware Reinforcement Learning
by: Xu, Zhikun, et al.
Published: (2026)
Stabilizing Efficient Reasoning with Step-Level Advantage Selection
by: Wang, Han, et al.
Published: (2026)
SoftVQ-VAE: Efficient 1-Dimensional Continuous Tokenizer
by: Chen, Hao, et al.
Published: (2024)
TaDA: Training-free recipe for Decoding with Adaptive KV Cache Compression and Mean-centering
by: Joshi, Vinay, et al.
Published: (2025)
CaptionQA: Is Your Caption as Useful as the Image Itself?
by: Yang, Shijia, et al.
Published: (2025)
APRIL: Active Partial Rollouts in Reinforcement Learning to Tame Long-tail Generation
by: Zhou, Yuzhen, et al.
Published: (2025)
TermiGen: High-Fidelity Environment and Robust Trajectory Synthesis for Terminal Agents
by: Zhu, Kaijie, et al.
Published: (2026)
AdaptEvolve: Improving Efficiency of Evolutionary AI Agents through Adaptive Model Selection
by: Ray, Pretam, et al.
Published: (2026)
Pause and Think: A Dataset and Benchmark for Video-Grounded Assistive Action Suggestion
by: Singh, Shivam, et al.
Published: (2026)
SAND-Math: Using LLMs to Generate Novel, Difficult and Useful Mathematics Questions and Answers
by: Manem, Chaitanya, et al.
Published: (2025)
PARD-2: Target-Aligned Parallel Draft Model for Dual-Mode Speculative Decoding
by: An, Zihao, et al.
Published: (2026)
PR2: Predictive Routing Replay for MoE-Based LLM Reinforcement Learning
by: Dong, Daize, et al.
Published: (2026)
SpecVLM: Fast Speculative Decoding in Vision-Language Models
by: Huang, Haiduo, et al.
Published: (2025)
DUET-VLM: Dual stage Unified Efficient Token reduction for VLM Training and Inference
by: Singh, Aditya Kumar, et al.
Published: (2026)
Taming Diffusion Prior for Image Super-Resolution with Domain Shift SDEs
by: Cui, Qinpeng, et al.
Published: (2024)
An Alternative Trajectory for Generative AI
by: Belova, Margarita, et al.
Published: (2026)
DiffSparse: Accelerating Diffusion Transformers with Learned Token Sparsity
by: Zhu, Haowei, et al.
Published: (2026)
Uncertainty-Aware Transformers: Conformal Prediction for Language Models
by: Vellore, Abhiram, et al.
Published: (2026)
Tuning Timestep-Distilled Diffusion Model Using Pairwise Sample Optimization
by: Miao, Zichen, et al.
Published: (2024)
PARD: Accelerating LLM Inference with Low-Cost PARallel Draft Model Adaptation
by: An, Zihao, et al.
Published: (2025)
COMFORT: A Continual Fine-Tuning Framework for Foundation Models Targeted at Consumer Healthcare
by: Li, Chia-Hao, et al.
Published: (2024)
PAGE: Domain-Incremental Adaptation with Past-Agnostic Generative Replay for Smart Healthcare
by: Li, Chia-Hao, et al.
Published: (2024)
DOCTOR: A Multi-Disease Detection Continual Learning Framework Based on Wearable Medical Sensors
by: Li, Chia-Hao, et al.
Published: (2023)
E-MMDiT: Revisiting Multimodal Diffusion Transformer Design for Fast Image Synthesis under Limited Resources
by: Shen, Tong, et al.
Published: (2025)