| Main Authors: | Sridhar, Deepak; Bhardwaj, Kartikeya; Jeyaraj, Jeya Pradha; Vasconcelos, Nuno; Nayak, Ankita; Teague, Harris |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2510.17045 |
Similar Items
VideoGuide: Improving Video Diffusion Models without Training Through a Teacher's Guide
by: Lee, Dohun, et al.
Published: (2024)
Prompt Sliders for Fine-Grained Control, Editing and Erasing of Concepts in Diffusion Models
by: Sridhar, Deepak, et al.
Published: (2024)
Can ChatGPT Learn My Life From a Week of First-Person Video?
by: Harris, Keegan
Published: (2025)
PipeFlow: Pipelined Processing and Motion-Aware Frame Selection for Long-Form Video Editing
by: Munir, Mustafa, et al.
Published: (2025)
V-REX: Benchmarking Exploratory Visual Reasoning via Chain-of-Questions
by: Fan, Chenrui, et al.
Published: (2025)
Proactive Agents for Multi-Turn Text-to-Image Generation Under Uncertainty
by: Hahn, Meera, et al.
Published: (2024)
Enhancing Long Video Generation Consistency without Tuning
by: Li, Xingyao, et al.
Published: (2024)
Transporting Task Vectors across Different Architectures without Training
by: Rinaldi, Filippo, et al.
Published: (2026)
What's Holding Back Latent Visual Reasoning?
by: Viveiros, André G., et al.
Published: (2026)
Iterative Refinement Improves Compositional Image Generation
by: Jaiswal, Shantanu, et al.
Published: (2026)
ObjectAlign: Neuro-Symbolic Object Consistency Verification and Correction
by: Munir, Mustafa, et al.
Published: (2025)
Low-cost Robust Night-time Aerial Material Segmentation through Hyperspectral Data and Sparse Spatio-Temporal Learning
by: Bajaj, Chandrajit, et al.
Published: (2024)
Video Diffusion Alignment via Reward Gradients
by: Prabhudesai, Mihir, et al.
Published: (2024)
Point-RTD: Replaced Token Denoising for Pretraining Transformer Models on Point Clouds
by: Stone, Gunner, et al.
Published: (2025)
Reasoning-Enhanced Object-Centric Learning for Videos
by: Li, Jian, et al.
Published: (2024)
CUA-Suite: Massive Human-annotated Video Demonstrations for Computer-Use Agents
by: Jian, Xiangru, et al.
Published: (2026)
Solving Physics Olympiad via Reinforcement Learning on Physics Simulators
by: Prabhudesai, Mihir, et al.
Published: (2026)
Training Video Foundation Models with NVIDIA NeMo
by: Patel, Zeeshan, et al.
Published: (2025)
Transferring Visual Explainability of Self-Explaining Models to Prediction-Only Models without Additional Training
by: Yoshikawa, Yuya, et al.
Published: (2025)
A Very Big Video Reasoning Suite
by: Wang, Maijunxian, et al.
Published: (2026)
Trained Models Tell Us How to Make Them Robust to Spurious Correlation without Group Annotation
by: Ghaznavi, Mahdi, et al.
Published: (2024)
VELOCITI: Benchmarking Video-Language Compositional Reasoning with Strict Entailment
by: Saravanan, Darshana, et al.
Published: (2024)
VIDEOP2R: Video Understanding from Perception to Reasoning
by: Jiang, Yifan, et al.
Published: (2025)
Diffusion Adversarial Post-Training for One-Step Video Generation
by: Lin, Shanchuan, et al.
Published: (2025)
VideoGEM: Training-free Action Grounding in Videos
by: Vogel, Felix, et al.
Published: (2025)
Understanding the Role of Hallucination in Reinforcement Post-Training of Multimodal Reasoning Models
by: Zhang, Gengwei, et al.
Published: (2026)
FACTR: Force-Attending Curriculum Training for Contact-Rich Policy Learning
by: Liu, Jason Jingzhou, et al.
Published: (2025)
Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion
by: Huang, Xun, et al.
Published: (2025)
Autoregressive Adversarial Post-Training for Real-Time Interactive Video Generation
by: Lin, Shanchuan, et al.
Published: (2025)
DISCO: Mitigating Bias in Deep Learning with Conditional Distance Correlation
by: Kavak, Emre, et al.
Published: (2025)
SCHEME: Scalable Channel Mixer for Vision Transformers
by: Sridhar, Deepak, et al.
Published: (2023)
Rethinking Chain-of-Thought Reasoning for Videos
by: Zhong, Yiwu, et al.
Published: (2025)
ViPRA: Video Prediction for Robot Actions
by: Routray, Sandeep, et al.
Published: (2025)
MoReVQA: Exploring Modular Reasoning Models for Video Question Answering
by: Min, Juhong, et al.
Published: (2024)
EgoMAGIC- An Egocentric Video Field Medicine Dataset for Training Perception Algorithms
by: VanVoorst, Brian, et al.
Published: (2026)
SAFREE: Training-Free and Adaptive Guard for Safe Text-to-Image And Video Generation
by: Yoon, Jaehong, et al.
Published: (2024)
Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training
by: Wan, David, et al.
Published: (2024)
Diving into Self-Evolving Training for Multimodal Reasoning
by: Liu, Wei, et al.
Published: (2024)
GreenFactory: Ensembling Zero-Cost Proxies to Estimate Performance of Neural Networks
by: Cortês, Gabriel, et al.
Published: (2025)
Oh! We Freeze: Improving Quantized Knowledge Distillation via Signal Propagation Analysis for Large Language Models
by: Bhardwaj, Kartikeya, et al.
Published: (2024)