Library Catalog

Bibliographic Details
Main Authors:	Sridhar, Deepak; Bhardwaj, Kartikeya; Jeyaraj, Jeya Pradha; Vasconcelos, Nuno; Nayak, Ankita; Teague, Harris
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence; Machine Learning
Online Access:	https://arxiv.org/abs/2510.17045

Similar Items

VideoGuide: Improving Video Diffusion Models without Training Through a Teacher's Guide
by: Lee, Dohun, et al.
Published: (2024)

Prompt Sliders for Fine-Grained Control, Editing and Erasing of Concepts in Diffusion Models
by: Sridhar, Deepak, et al.
Published: (2024)

Can ChatGPT Learn My Life From a Week of First-Person Video?
by: Harris, Keegan
Published: (2025)

PipeFlow: Pipelined Processing and Motion-Aware Frame Selection for Long-Form Video Editing
by: Munir, Mustafa, et al.
Published: (2025)

V-REX: Benchmarking Exploratory Visual Reasoning via Chain-of-Questions
by: Fan, Chenrui, et al.
Published: (2025)

Proactive Agents for Multi-Turn Text-to-Image Generation Under Uncertainty
by: Hahn, Meera, et al.
Published: (2024)

Enhancing Long Video Generation Consistency without Tuning
by: Li, Xingyao, et al.
Published: (2024)

Transporting Task Vectors across Different Architectures without Training
by: Rinaldi, Filippo, et al.
Published: (2026)

What's Holding Back Latent Visual Reasoning?
by: Viveiros, André G., et al.
Published: (2026)

Iterative Refinement Improves Compositional Image Generation
by: Jaiswal, Shantanu, et al.
Published: (2026)

ObjectAlign: Neuro-Symbolic Object Consistency Verification and Correction
by: Munir, Mustafa, et al.
Published: (2025)

Low-cost Robust Night-time Aerial Material Segmentation through Hyperspectral Data and Sparse Spatio-Temporal Learning
by: Bajaj, Chandrajit, et al.
Published: (2024)

Video Diffusion Alignment via Reward Gradients
by: Prabhudesai, Mihir, et al.
Published: (2024)

Point-RTD: Replaced Token Denoising for Pretraining Transformer Models on Point Clouds
by: Stone, Gunner, et al.
Published: (2025)

Reasoning-Enhanced Object-Centric Learning for Videos
by: Li, Jian, et al.
Published: (2024)

CUA-Suite: Massive Human-annotated Video Demonstrations for Computer-Use Agents
by: Jian, Xiangru, et al.
Published: (2026)

Solving Physics Olympiad via Reinforcement Learning on Physics Simulators
by: Prabhudesai, Mihir, et al.
Published: (2026)

Training Video Foundation Models with NVIDIA NeMo
by: Patel, Zeeshan, et al.
Published: (2025)

Transferring Visual Explainability of Self-Explaining Models to Prediction-Only Models without Additional Training
by: Yoshikawa, Yuya, et al.
Published: (2025)

A Very Big Video Reasoning Suite
by: Wang, Maijunxian, et al.
Published: (2026)

Trained Models Tell Us How to Make Them Robust to Spurious Correlation without Group Annotation
by: Ghaznavi, Mahdi, et al.
Published: (2024)

VELOCITI: Benchmarking Video-Language Compositional Reasoning with Strict Entailment
by: Saravanan, Darshana, et al.
Published: (2024)

VIDEOP2R: Video Understanding from Perception to Reasoning
by: Jiang, Yifan, et al.
Published: (2025)

Diffusion Adversarial Post-Training for One-Step Video Generation
by: Lin, Shanchuan, et al.
Published: (2025)

VideoGEM: Training-free Action Grounding in Videos
by: Vogel, Felix, et al.
Published: (2025)

Understanding the Role of Hallucination in Reinforcement Post-Training of Multimodal Reasoning Models
by: Zhang, Gengwei, et al.
Published: (2026)

FACTR: Force-Attending Curriculum Training for Contact-Rich Policy Learning
by: Liu, Jason Jingzhou, et al.
Published: (2025)

Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion
by: Huang, Xun, et al.
Published: (2025)

Autoregressive Adversarial Post-Training for Real-Time Interactive Video Generation
by: Lin, Shanchuan, et al.
Published: (2025)

DISCO: Mitigating Bias in Deep Learning with Conditional Distance Correlation
by: Kavak, Emre, et al.
Published: (2025)

SCHEME: Scalable Channel Mixer for Vision Transformers
by: Sridhar, Deepak, et al.
Published: (2023)

Rethinking Chain-of-Thought Reasoning for Videos
by: Zhong, Yiwu, et al.
Published: (2025)

ViPRA: Video Prediction for Robot Actions
by: Routray, Sandeep, et al.
Published: (2025)

MoReVQA: Exploring Modular Reasoning Models for Video Question Answering
by: Min, Juhong, et al.
Published: (2024)

EgoMAGIC- An Egocentric Video Field Medicine Dataset for Training Perception Algorithms
by: VanVoorst, Brian, et al.
Published: (2026)

SAFREE: Training-Free and Adaptive Guard for Safe Text-to-Image And Video Generation
by: Yoon, Jaehong, et al.
Published: (2024)

Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training
by: Wan, David, et al.
Published: (2024)

Diving into Self-Evolving Training for Multimodal Reasoning
by: Liu, Wei, et al.
Published: (2024)

GreenFactory: Ensembling Zero-Cost Proxies to Estimate Performance of Neural Networks
by: Cortês, Gabriel, et al.
Published: (2025)

Oh! We Freeze: Improving Quantized Knowledge Distillation via Signal Propagation Analysis for Large Language Models
by: Bhardwaj, Kartikeya, et al.
Published: (2024)