| Main authors: | Wang, Ziyue; Jin, Sheng; Zuo, Zhongrong; Wu, Jiawei; Qiu, Han; She, Qi; Zhang, Hao; Jiang, Xudong |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online access: | https://arxiv.org/abs/2601.19686 |
Similar items
EvoVid: Temporal-Centric Self-Evolution for Video Large Language Models
by: Huang, Shiqi, et al.
Published: (2026)
Frame-Voyager: Learning to Query Frames for Video Large Language Models
by: Yu, Sicheng, et al.
Published: (2024)
Reasoning Physical Video Generation with Diffusion Timestep Tokens via Reinforcement Learning
by: Lin, Wang, et al.
Published: (2025)
Video-in-the-Loop: Span-Grounded Long Video QA with Interleaved Reasoning
by: Wang, Chendong, et al.
Published: (2025)
TokenDial: Continuous Attribute Control in Text-to-Video via Spatiotemporal Token Offsets
by: Liu, Zhixuan, et al.
Published: (2026)
VideoRFT: Incentivizing Video Reasoning Capability in MLLMs via Reinforced Fine-Tuning
by: Wang, Qi, et al.
Published: (2025)
Decoupled Seg Tokens Make Stronger Reasoning Video Segmenter and Grounder
by: Jisheng, Dang, et al.
Published: (2025)
VILP: Imitation Learning with Latent Video Planning
by: Xu, Zhengtong, et al.
Published: (2025)
STAR-Pose: Efficient Low-Resolution Video Human Pose Estimation via Spatial-Temporal Adaptive Super-Resolution
by: Jin, Yucheng, et al.
Published: (2025)
Generative Neural Video Compression via Video Diffusion Prior
by: Mao, Qi, et al.
Published: (2025)
Video-R1: Reinforcing Video Reasoning in MLLMs
by: Feng, Kaituo, et al.
Published: (2025)
SpaceR: Reinforcing MLLMs in Video Spatial Reasoning
by: Ouyang, Kun, et al.
Published: (2025)
OmniTokenizer: A Joint Image-Video Tokenizer for Visual Generation
by: Wang, Junke, et al.
Published: (2024)
Reinforcing Video Reasoning with Focused Thinking
by: Dang, Jisheng, et al.
Published: (2025)
LoViC: Efficient Long Video Generation with Context Compression
by: Jiang, Jiaxiu, et al.
Published: (2025)
Veda: Scalable Video Diffusion via Distilled Sparse Attention
by: Han, Shihao, et al.
Published: (2026)
GA2-CLIP: Generic Attribute Anchor for Efficient Prompt Tuning in Video-Language Models
by: Wang, Bin, et al.
Published: (2025)
Leveraging Vision-Language Large Models for Interpretable Video Action Recognition with Semantic Tokenization
by: Peng, Jingwei, et al.
Published: (2025)
Neural-Symbolic VideoQA: Learning Compositional Spatio-Temporal Reasoning for Real-world Video Question Answering
by: Liang, Lili, et al.
Published: (2024)
VideoSeg-R1: Reasoning Video Object Segmentation via Reinforcement Learning
by: Xu, Zishan, et al.
Published: (2025)
PruneVid: Visual Token Pruning for Efficient Video Large Language Models
by: Huang, Xiaohu, et al.
Published: (2024)
AgentCVR: Active Multi-Agent Cross-Video Reasoning via Script-Simulated Reinforcement Learning
by: Qiu, Yilun, et al.
Published: (2026)
Video-Thinker: Sparking "Thinking with Videos" via Reinforcement Learning
by: Wang, Shijian, et al.
Published: (2025)
VISD: Enhancing Video Reasoning via Structured Self-Distillation
by: Lin, Hao, et al.
Published: (2026)
VideoOrion: Tokenizing Object Dynamics in Videos
by: Feng, Yicheng, et al.
Published: (2024)
VideoForest: Person-Anchored Hierarchical Reasoning for Cross-Video Question Answering
by: Meng, Yiran, et al.
Published: (2025)
Taming Hallucinations: Boosting MLLMs' Video Understanding via Counterfactual Video Generation
by: Huang, Zhe, et al.
Published: (2025)
Video-MTR: Reinforced Multi-Turn Reasoning for Long Video Understanding
by: Xie, Yuan, et al.
Published: (2025)
KTV: Keyframes and Key Tokens Selection for Efficient Training-Free Video LLMs
by: Song, Baiyang, et al.
Published: (2026)
CRISP: Contrastive Residual Injection and Semantic Prompting for Continual Video Instance Segmentation
by: Liu, Baichen, et al.
Published: (2025)
MARC: Memory-Augmented RL Token Compression for Efficient Video Understanding
by: Wu, Peiran, et al.
Published: (2025)
FrameMind: Frame-Interleaved Video Reasoning via Reinforcement Learning
by: Ge, Haonan, et al.
Published: (2025)
InfoTok: Adaptive Discrete Video Tokenizer via Information-Theoretic Compression
by: Ye, Haotian, et al.
Published: (2025)
ProTA: Probabilistic Token Aggregation for Text-Video Retrieval
by: Fang, Han, et al.
Published: (2024)
ViSS-R1: Self-Supervised Reinforcement Video Reasoning
by: Fang, Bo, et al.
Published: (2025)
Token Merging via Spatiotemporal Information Mining for Surgical Video Understanding
by: Jiang, Xixi, et al.
Published: (2025)
Learning Transferable Temporal Primitives for Video Reasoning via Synthetic Videos
by: Jiang, Songtao, et al.
Published: (2026)
CETCAM: Camera-Controllable Video Generation via Consistent and Extensible Tokenization
by: Zhao, Zelin, et al.
Published: (2025)
Thinking With Videos: Multimodal Tool-Augmented Reinforcement Learning for Long Video Reasoning
by: Zhang, Haoji, et al.
Published: (2025)
Video-R4: Reinforcing Text-Rich Video Reasoning with Visual Rumination
by: Tang, Yolo Y., et al.
Published: (2025)