:: Library Catalog

Bibliographic Details
Main Authors:	Huang, Chao; Zhang, Zeliang; Liu, Jiang; Sun, Ximeng; Wu, Jialian; Yu, Xiaodong; Wang, Ze; Xu, Chenliang; Barsoum, Emad; Liu, Zicheng
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2510.15050

Similar Items

Latent Visual Reasoning
by: Li, Bangzheng, et al.
Published: (2025)

ImageDoctor: Diagnosing Text-to-Image Generation via Grounded Image Reasoning
by: Guo, Yuxiang, et al.
Published: (2025)

XModBench: Benchmarking Cross-Modal Capabilities and Consistency in Omni-Language Models
by: Wang, Xingrui, et al.
Published: (2025)

VideoSeek: Long-Horizon Video Agent with Tool-Guided Seeking
by: Lin, Jingyang, et al.
Published: (2026)

TTT-Bench: A Benchmark for Evaluating Reasoning Ability with Simple and Novel Tic-Tac-Toe-style Games
by: Mishra, Prakamya, et al.
Published: (2025)

Learning from Online Videos at Inference Time for Computer-Use Agents
by: Liu, Yujian, et al.
Published: (2025)

Agent Laboratory: Using LLM Agents as Research Assistants
by: Schmidgall, Samuel, et al.
Published: (2025)

Instella-T2I: Pushing the Limits of 1D Discrete Latent Space Image Generation
by: Wang, Ze, et al.
Published: (2025)

Self-Taught Agentic Long Context Understanding
by: Zhuang, Yufan, et al.
Published: (2025)

KeyVID: Keyframe-Aware Video Diffusion for Audio-Synchronized Visual Animation
by: Wang, Xingrui, et al.
Published: (2025)

CD4LM: Consistency Distillation and aDaptive Decoding for Diffusion Language Models
by: Liang, Yihao, et al.
Published: (2026)

MOVi: Training-free Text-conditioned Multi-Object Video Generation
by: Rahman, Aimon, et al.
Published: (2025)

Stabilizing Efficient Reasoning with Step-Level Advantage Selection
by: Wang, Han, et al.
Published: (2026)

Unleashing Hour-Scale Video Training for Long Video-Language Understanding
by: Lin, Jingyang, et al.
Published: (2025)

Reliable Use of Lemmas via Eligibility Reasoning and Section-Aware Reinforcement Learning
by: Xu, Zhikun, et al.
Published: (2026)

Instella: Fully Open Language Models with Stellar Performance
by: Liu, Jiang, et al.
Published: (2025)

SoftVQ-VAE: Efficient 1-Dimensional Continuous Tokenizer
by: Chen, Hao, et al.
Published: (2024)

Training Large Reasoning Models Efficiently via Progressive Thought Encoding
by: Zhang, Zeliang, et al.
Published: (2026)

CaptionQA: Is Your Caption as Useful as the Image Itself?
by: Yang, Shijia, et al.
Published: (2025)

TermiGen: High-Fidelity Environment and Robust Trajectory Synthesis for Terminal Agents
by: Zhu, Kaijie, et al.
Published: (2026)

APRIL: Active Partial Rollouts in Reinforcement Learning to Tame Long-tail Generation
by: Zhou, Yuzhen, et al.
Published: (2025)

AdaptEvolve: Improving Efficiency of Evolutionary AI Agents through Adaptive Model Selection
by: Ray, Pretam, et al.
Published: (2026)

TaDA: Training-free recipe for Decoding with Adaptive KV Cache Compression and Mean-centering
by: Joshi, Vinay, et al.
Published: (2025)

Harnessing the Computation Redundancy in ViTs to Boost Adversarial Transferability
by: Liu, Jiani, et al.
Published: (2025)

Diversifying the Expert Knowledge for Task-Agnostic Pruning in Sparse Mixture-of-Experts
by: Zhang, Zeliang, et al.
Published: (2024)

Learning to Transform Dynamically for Better Adversarial Transferability
by: Zhu, Rongyi, et al.
Published: (2024)

DUET-VLM: Dual stage Unified Efficient Token reduction for VLM Training and Inference
by: Singh, Aditya Kumar, et al.
Published: (2026)

PR2: Predictive Routing Replay for MoE-Based LLM Reinforcement Learning
by: Dong, Daize, et al.
Published: (2026)

Pause and Think: A Dataset and Benchmark for Video-Grounded Assistive Action Suggestion
by: Singh, Shivam, et al.
Published: (2026)

Why Instruction-Based Unlearning Fails in Diffusion Models?
by: Zhang, Zeliang, et al.
Published: (2026)

OPENXRD: A Comprehensive Benchmark Framework for LLM/MLLM XRD Question Answering
by: Vosoughi, Ali, et al.
Published: (2025)

SAND-Math: Using LLMs to Generate Novel, Difficult and Useful Mathematics Questions and Answers
by: Manem, Chaitanya, et al.
Published: (2025)

Taming Diffusion Prior for Image Super-Resolution with Domain Shift SDEs
by: Cui, Qinpeng, et al.
Published: (2024)

Can CLIP Count Stars? An Empirical Study on Quantity Bias in CLIP
by: Zhang, Zeliang, et al.
Published: (2024)

Geo-Expert: Towards Expert-Level Geological Reasoning via Parameter-Efficient Fine-Tuning
by: Guo, Chenyou, et al.
Published: (2026)

Treat Visual Tokens as Text? But Your MLLM Only Needs Fewer Efforts to See
by: Zhang, Zeliang, et al.
Published: (2024)

Approximated Likelihood Ratio: A Forward-Only and Parallel Framework for Boosting Neural Network Training
by: Zhang, Zeliang, et al.
Published: (2024)

Forward Learning with Differential Privacy
by: Feng, Mingqian, et al.
Published: (2025)

Do More Details Always Introduce More Hallucinations in LVLM-based Image Captioning?
by: Feng, Mingqian, et al.
Published: (2024)

Rethinking Audio-Visual Adversarial Vulnerability from Temporal and Modality Perspectives
by: Zhang, Zeliang, et al.
Published: (2025)