| Main Authors: | Huang, Chao, Zhang, Zeliang, Liu, Jiang, Sun, Ximeng, Wu, Jialian, Yu, Xiaodong, Wang, Ze, Xu, Chenliang, Barsoum, Emad, Liu, Zicheng |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2510.15050 |
Similar Items
Latent Visual Reasoning
by: Li, Bangzheng, et al.
Published: (2025)
ImageDoctor: Diagnosing Text-to-Image Generation via Grounded Image Reasoning
by: Guo, Yuxiang, et al.
Published: (2025)
XModBench: Benchmarking Cross-Modal Capabilities and Consistency in Omni-Language Models
by: Wang, Xingrui, et al.
Published: (2025)
VideoSeek: Long-Horizon Video Agent with Tool-Guided Seeking
by: Lin, Jingyang, et al.
Published: (2026)
TTT-Bench: A Benchmark for Evaluating Reasoning Ability with Simple and Novel Tic-Tac-Toe-style Games
by: Mishra, Prakamya, et al.
Published: (2025)
Learning from Online Videos at Inference Time for Computer-Use Agents
by: Liu, Yujian, et al.
Published: (2025)
Agent Laboratory: Using LLM Agents as Research Assistants
by: Schmidgall, Samuel, et al.
Published: (2025)
Instella-T2I: Pushing the Limits of 1D Discrete Latent Space Image Generation
by: Wang, Ze, et al.
Published: (2025)
Self-Taught Agentic Long Context Understanding
by: Zhuang, Yufan, et al.
Published: (2025)
KeyVID: Keyframe-Aware Video Diffusion for Audio-Synchronized Visual Animation
by: Wang, Xingrui, et al.
Published: (2025)
CD4LM: Consistency Distillation and aDaptive Decoding for Diffusion Language Models
by: Liang, Yihao, et al.
Published: (2026)
MOVi: Training-free Text-conditioned Multi-Object Video Generation
by: Rahman, Aimon, et al.
Published: (2025)
Stabilizing Efficient Reasoning with Step-Level Advantage Selection
by: Wang, Han, et al.
Published: (2026)
Unleashing Hour-Scale Video Training for Long Video-Language Understanding
by: Lin, Jingyang, et al.
Published: (2025)
Reliable Use of Lemmas via Eligibility Reasoning and Section-Aware Reinforcement Learning
by: Xu, Zhikun, et al.
Published: (2026)
Instella: Fully Open Language Models with Stellar Performance
by: Liu, Jiang, et al.
Published: (2025)
SoftVQ-VAE: Efficient 1-Dimensional Continuous Tokenizer
by: Chen, Hao, et al.
Published: (2024)
Training Large Reasoning Models Efficiently via Progressive Thought Encoding
by: Zhang, Zeliang, et al.
Published: (2026)
CaptionQA: Is Your Caption as Useful as the Image Itself?
by: Yang, Shijia, et al.
Published: (2025)
TermiGen: High-Fidelity Environment and Robust Trajectory Synthesis for Terminal Agents
by: Zhu, Kaijie, et al.
Published: (2026)
APRIL: Active Partial Rollouts in Reinforcement Learning to Tame Long-tail Generation
by: Zhou, Yuzhen, et al.
Published: (2025)
AdaptEvolve: Improving Efficiency of Evolutionary AI Agents through Adaptive Model Selection
by: Ray, Pretam, et al.
Published: (2026)
TaDA: Training-free recipe for Decoding with Adaptive KV Cache Compression and Mean-centering
by: Joshi, Vinay, et al.
Published: (2025)
Harnessing the Computation Redundancy in ViTs to Boost Adversarial Transferability
by: Liu, Jiani, et al.
Published: (2025)
Diversifying the Expert Knowledge for Task-Agnostic Pruning in Sparse Mixture-of-Experts
by: Zhang, Zeliang, et al.
Published: (2024)
Learning to Transform Dynamically for Better Adversarial Transferability
by: Zhu, Rongyi, et al.
Published: (2024)
DUET-VLM: Dual stage Unified Efficient Token reduction for VLM Training and Inference
by: Singh, Aditya Kumar, et al.
Published: (2026)
PR2: Predictive Routing Replay for MoE-Based LLM Reinforcement Learning
by: Dong, Daize, et al.
Published: (2026)
Pause and Think: A Dataset and Benchmark for Video-Grounded Assistive Action Suggestion
by: Singh, Shivam, et al.
Published: (2026)
Why Instruction-Based Unlearning Fails in Diffusion Models?
by: Zhang, Zeliang, et al.
Published: (2026)
OPENXRD: A Comprehensive Benchmark Framework for LLM/MLLM XRD Question Answering
by: Vosoughi, Ali, et al.
Published: (2025)
SAND-Math: Using LLMs to Generate Novel, Difficult and Useful Mathematics Questions and Answers
by: Manem, Chaitanya, et al.
Published: (2025)
Taming Diffusion Prior for Image Super-Resolution with Domain Shift SDEs
by: Cui, Qinpeng, et al.
Published: (2024)
Can CLIP Count Stars? An Empirical Study on Quantity Bias in CLIP
by: Zhang, Zeliang, et al.
Published: (2024)
Geo-Expert: Towards Expert-Level Geological Reasoning via Parameter-Efficient Fine-Tuning
by: Guo, Chenyou, et al.
Published: (2026)
Treat Visual Tokens as Text? But Your MLLM Only Needs Fewer Efforts to See
by: Zhang, Zeliang, et al.
Published: (2024)
Approximated Likelihood Ratio: A Forward-Only and Parallel Framework for Boosting Neural Network Training
by: Zhang, Zeliang, et al.
Published: (2024)
Forward Learning with Differential Privacy
by: Feng, Mingqian, et al.
Published: (2025)
Do More Details Always Introduce More Hallucinations in LVLM-based Image Captioning?
by: Feng, Mingqian, et al.
Published: (2024)
Rethinking Audio-Visual Adversarial Vulnerability from Temporal and Modality Perspectives
by: Zhang, Zeliang, et al.
Published: (2025)