| Main Authors: | Chen, Ruizhe, Fan, Zhiting, Luo, Tianze, Zou, Heqing, Feng, Zhaopeng, Xie, Guiyang, Zhang, Hansheng, Wang, Zhuochen, Liu, Zuozhu, Zhang, Huaijian |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2507.18100 |
Similar Items
From Seconds to Hours: Reviewing MultiModal Large Language Models on Comprehensive Long Video Understanding
by: Zou, Heqing, et al.
Published: (2024)
HLV-1K: A Large-scale Hour-Long Video Benchmark for Time-Specific Long Video Understanding
by: Zou, Heqing, et al.
Published: (2025)
Video-STR: Reinforcing MLLMs in Video Spatio-Temporal Reasoning with Relation Graph
by: Wang, Wentao, et al.
Published: (2025)
BiasGuard: A Reasoning-enhanced Bias Detection Tool For Large Language Models
by: Fan, Zhiting, et al.
Published: (2025)
Learning Transferable Temporal Primitives for Video Reasoning via Synthetic Videos
by: Jiang, Songtao, et al.
Published: (2026)
Med-R2: An Adversarial Benchmark for Evidence-Grounded Reasoning in Medical VLMs
by: Ma, Wen, et al.
Published: (2026)
FairMT-Bench: Benchmarking Fairness for Multi-turn Dialogue in Conversational LLMs
by: Fan, Zhiting, et al.
Published: (2024)
BiasAlert: A Plug-and-play Tool for Social Bias Detection in LLMs
by: Fan, Zhiting, et al.
Published: (2024)
Ladder: A Model-Agnostic Framework Boosting LLM-based Machine Translation to the Next Level
by: Feng, Zhaopeng, et al.
Published: (2024)
Thinking With Bounding Boxes: Enhancing Spatio-Temporal Video Grounding via Reinforcement Fine-Tuning
by: Gu, Xin, et al.
Published: (2025)
FAIntbench: A Holistic and Precise Benchmark for Bias Evaluation in Text-to-Image Models
by: Luo, Hanjun, et al.
Published: (2024)
Context-Guided Spatio-Temporal Video Grounding
by: Gu, Xin, et al.
Published: (2024)
Tempo-R0: A Video-MLLM for Temporal Video Grounding through Efficient Temporal Sensing Reinforcement Learning
by: Yue, Feng, et al.
Published: (2025)
Med-U1: Incentivizing Unified Medical Reasoning in LLMs via Large-scale Reinforcement Learning
by: Zhang, Xiaotian, et al.
Published: (2025)
Open-o3-Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence
by: Meng, Jiahao, et al.
Published: (2025)
CAPO: Reinforcing Consistent Reasoning in Medical Decision-Making
by: Jiang, Songtao, et al.
Published: (2025)
VersusDebias: Universal Zero-Shot Debiasing for Text-to-Image Models via SLM-Based Prompt Engineering and Generative Adversary
by: Luo, Hanjun, et al.
Published: (2024)
VideoTG-R1: Boosting Video Temporal Grounding via Curriculum Reinforcement Learning on Reflected Boundary Annotations
by: Dong, Lu, et al.
Published: (2025)
MUSEG: Reinforcing Video Temporal Understanding via Timestamp-Aware Multi-Segment Grounding
by: Luo, Fuwen, et al.
Published: (2025)
Towards Long-Form Spatio-Temporal Video Grounding
by: Gu, Xin, et al.
Published: (2026)
ViaRL: Adaptive Temporal Grounding via Visual Iterated Amplification Reinforcement Learning
by: Xu, Ziqiang, et al.
Published: (2025)
MT-R1-Zero: Advancing LLM-based Machine Translation via R1-Zero-like Reinforcement Learning
by: Feng, Zhaopeng, et al.
Published: (2025)
Towards Temporal Compositional Reasoning in Long-Form Sports Videos
by: Cao, Siyu, et al.
Published: (2026)
FairSteer: Inference Time Debiasing for LLMs with Dynamic Activation Steering
by: Li, Yichen, et al.
Published: (2025)
OmniSTVG: Toward Spatio-Temporal Omni-Object Video Grounding
by: Yao, Jiali, et al.
Published: (2025)
EvoGround: Self-Evolving Video Agents for Video Temporal Grounding
by: Jung, Minjoon, et al.
Published: (2026)
Decoupling Dark Knowledge via Block-wise Logit Distillation for Feature-level Alignment
by: Yu, Chengting, et al.
Published: (2024)
Persona-judge: Personalized Alignment of Large Language Models via Token-level Self-judgment
by: Zhang, Xiaotian, et al.
Published: (2025)
Knowing Your Target: Target-Aware Transformer Makes Better Spatio-Temporal Video Grounding
by: Gu, Xin, et al.
Published: (2025)
End-to-End Streaming Video Temporal Action Segmentation with Reinforce Learning
by: Zhang, Jinrong, et al.
Published: (2023)
Med-GLIP: Advancing Medical Language-Image Pre-training with Large-scale Grounded Dataset
by: Deng, Ziye, et al.
Published: (2025)
Text-based Talking Video Editing with Cascaded Conditional Diffusion
by: Han, Bo, et al.
Published: (2024)
Temporally Grounding Instructional Diagrams in Unconstrained Videos
by: Zhang, Jiahao, et al.
Published: (2024)
EC-Guide: A Comprehensive E-Commerce Guide for Instruction Tuning and Quantization
by: Feng, Zhaopeng, et al.
Published: (2024)
On Occlusions in Video Action Detection: Benchmark Datasets And Training Recipes
by: Modi, Rajat, et al.
Published: (2024)
DCR: Divide-and-Conquer Reasoning for Multi-choice Question Answering with LLMs
by: Meng, Zijie, et al.
Published: (2024)
Bridging Time and Space: Decoupled Spatio-Temporal Alignment for Video Grounding
by: Tu, Xuezhen, et al.
Published: (2026)
RecipeGen: A Step-Aligned Multimodal Benchmark for Real-World Recipe Generation
by: Zhang, Ruoxuan, et al.
Published: (2025)
Temporal-Oriented Recipe for Transferring Large Vision-Language Model to Video Understanding
by: Nguyen, Thong, et al.
Published: (2025)
STVG-R1: Incentivizing Instance-Level Reasoning and Grounding in Videos via Reinforcement Learning
by: Zhang, Xiaowen, et al.
Published: (2026)