| Main Authors: | Li, Pengyi, Abdullaeva, Irina, Gambashidze, Alexander, Kuznetsov, Andrey, Oseledets, Ivan |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2502.03183 |
Similar Items
Test-Time Reasoning Through Visual Human Preferences with VLMs and Soft Rewards
by: Gambashidze, Alexander, et al.
Published: (2025)
Listener-Rewarded Thinking in VLMs for Image Preferences
by: Gambashidze, Alexander, et al.
Published: (2025)
Back to Basics: Revisiting Exploration in Reinforcement Learning for LLM Reasoning via Generative Probabilities
by: Li, Pengyi, et al.
Published: (2026)
Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models
by: Li, Pengyi, et al.
Published: (2025)
OmniFusion Technical Report
by: Goncharova, Elizaveta, et al.
Published: (2024)
From Frames to Clips: Training-free Adaptive Key Clip Selection for Long-Form Video Understanding
by: Sun, Guangyu, et al.
Published: (2025)
CoMa: Contextual Massing Generation with Vision-Language Models
by: Maslov, Evgenii, et al.
Published: (2026)
MindShift: Analyzing Language Models' Reactions to Psychological Prompts
by: Vasiliuk, Anton, et al.
Published: (2025)
Tensor-Train Point Cloud Compression and Efficient Approximate Nearest-Neighbor Search
by: Novikov, Georgii, et al.
Published: (2024)
Enhancing Train-Free Infinite-Frame Generation for Consistent Long Videos
by: Feng, X., et al.
Published: (2026)
Training-Free Action Recognition and Goal Inference with Dynamic Frame Selection
by: Keat, Ee Yeo, et al.
Published: (2024)
Aligning Diffusion Models with Noise-Conditioned Perception
by: Gambashidze, Alexander, et al.
Published: (2024)
KTV: Keyframes and Key Tokens Selection for Efficient Training-Free Video LLMs
by: Song, Baiyang, et al.
Published: (2026)
NoReGeo: Non-Reasoning Geometry Benchmark
by: Abdullaeva, Irina, et al.
Published: (2026)
Efficient Distribution Matching of Representations via Noise-Injected Deep InfoMax
by: Butakov, Ivan, et al.
Published: (2024)
ESQA: Event Sequences Question Answering
by: Abdullaeva, Irina, et al.
Published: (2024)
M-LLM Based Video Frame Selection for Efficient Video Understanding
by: Hu, Kai, et al.
Published: (2025)
Event-Anchored Frame Selection for Effective Long-Video Understanding
by: Chen, Wang, et al.
Published: (2026)
Adaptive Greedy Frame Selection for Long Video Understanding
by: Huang, Yuning, et al.
Published: (2026)
Weak-to-Strong 3D Object Detection with X-Ray Distillation
by: Gambashidze, Alexander, et al.
Published: (2024)
Frame Guidance: Training-Free Guidance for Frame-Level Control in Video Diffusion Models
by: Jang, Sangwon, et al.
Published: (2025)
KFS-Bench: Comprehensive Evaluation of Key Frame Sampling in Long Video Understanding
by: Li, Zongyao, et al.
Published: (2025)
From Captions to Keyframes: KeyScore for Multimodal Frame Scoring and Video-Language Understanding
by: Lin, Shih-Yao, et al.
Published: (2025)
Speech-to-LaTeX: New Models and Datasets for Converting Spoken Equations and Sentences
by: Korzh, Dmitrii, et al.
Published: (2025)
Think-Clip-Sample: Slow-Fast Frame Selection for Video Understanding
by: Tan, Wenhui, et al.
Published: (2026)
DynImg: Key Frames with Visual Prompts are Good Representation for Multi-Modal Video Understanding
by: Bao, Xiaoyi, et al.
Published: (2025)
Spread them Apart: Towards Robust Watermarking of Generated Content
by: Pautov, Mikhail, et al.
Published: (2025)
Wavelet-based Frame Selection by Detecting Semantic Boundary for Long Video Understanding
by: Chen, Wang, et al.
Published: (2026)
DiffuseSlide: Training-Free High Frame Rate Video Generation Diffusion
by: Hwang, Geunmin, et al.
Published: (2025)
Frame by Familiar Frame: Understanding Replication in Video Diffusion Models
by: Rahman, Aimon, et al.
Published: (2024)
FRAG: Frame Selection Augmented Generation for Long Video and Long Document Understanding
by: Huang, De-An, et al.
Published: (2025)
Geological Field Restoration through the Lens of Image Inpainting
by: Trifonov, Vladislav, et al.
Published: (2025)
OCC-RAG: Optimal Cognitive Core for Faithful Question Answering
by: Savkin, Maksim, et al.
Published: (2026)
A case study of spatiotemporal forecasting techniques for weather forecasting
by: Sofi, Shakir Showkat, et al.
Published: (2022)
Shot-Aware Frame Sampling for Video Understanding
by: Zhao, Mengyu, et al.
Published: (2026)
Latent Inter-Frame Pruning: A Training-Free Method Bridging Traditional Video Compression and Modern Diffusion Transformers for Efficient Generation
by: Menn, Dennis, et al.
Published: (2026)
One Token per Highly Selective Frame: Towards Extreme Compression for Long Video Understanding
by: Zhang, Zheyu, et al.
Published: (2026)
Graph-to-Frame RAG: Visual-Space Knowledge Fusion for Training-Free and Auditable Video Reasoning
by: Yang, Songyuan, et al.
Published: (2026)
Generative Frame Sampler for Long Video Understanding
by: Yao, Linli, et al.
Published: (2025)
Logit-KL Flow Matching: Non-Autoregressive Text Generation via Sampling-Hybrid Inference
by: Sevriugov, Egor, et al.
Published: (2024)