:: Library Catalog

Bibliographic Details
Main Authors:	Li, Pengyi, Abdullaeva, Irina, Gambashidze, Alexander, Kuznetsov, Andrey, Oseledets, Ivan
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Machine Learning
Online Access:	https://arxiv.org/abs/2502.03183

Similar Items

Test-Time Reasoning Through Visual Human Preferences with VLMs and Soft Rewards
by: Gambashidze, Alexander, et al.
Published: (2025)

Listener-Rewarded Thinking in VLMs for Image Preferences
by: Gambashidze, Alexander, et al.
Published: (2025)

Back to Basics: Revisiting Exploration in Reinforcement Learning for LLM Reasoning via Generative Probabilities
by: Li, Pengyi, et al.
Published: (2026)

Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models
by: Li, Pengyi, et al.
Published: (2025)

OmniFusion Technical Report
by: Goncharova, Elizaveta, et al.
Published: (2024)

From Frames to Clips: Training-free Adaptive Key Clip Selection for Long-Form Video Understanding
by: Sun, Guangyu, et al.
Published: (2025)

CoMa: Contextual Massing Generation with Vision-Language Models
by: Maslov, Evgenii, et al.
Published: (2026)

MindShift: Analyzing Language Models' Reactions to Psychological Prompts
by: Vasiliuk, Anton, et al.
Published: (2025)

Tensor-Train Point Cloud Compression and Efficient Approximate Nearest-Neighbor Search
by: Novikov, Georgii, et al.
Published: (2024)

Enhancing Train-Free Infinite-Frame Generation for Consistent Long Videos
by: Feng, X., et al.
Published: (2026)

Training-Free Action Recognition and Goal Inference with Dynamic Frame Selection
by: Keat, Ee Yeo, et al.
Published: (2024)

Aligning Diffusion Models with Noise-Conditioned Perception
by: Gambashidze, Alexander, et al.
Published: (2024)

KTV: Keyframes and Key Tokens Selection for Efficient Training-Free Video LLMs
by: Song, Baiyang, et al.
Published: (2026)

NoReGeo: Non-Reasoning Geometry Benchmark
by: Abdullaeva, Irina, et al.
Published: (2026)

Efficient Distribution Matching of Representations via Noise-Injected Deep InfoMax
by: Butakov, Ivan, et al.
Published: (2024)

ESQA: Event Sequences Question Answering
by: Abdullaeva, Irina, et al.
Published: (2024)

M-LLM Based Video Frame Selection for Efficient Video Understanding
by: Hu, Kai, et al.
Published: (2025)

Event-Anchored Frame Selection for Effective Long-Video Understanding
by: Chen, Wang, et al.
Published: (2026)

Adaptive Greedy Frame Selection for Long Video Understanding
by: Huang, Yuning, et al.
Published: (2026)

Weak-to-Strong 3D Object Detection with X-Ray Distillation
by: Gambashidze, Alexander, et al.
Published: (2024)

Frame Guidance: Training-Free Guidance for Frame-Level Control in Video Diffusion Models
by: Jang, Sangwon, et al.
Published: (2025)

KFS-Bench: Comprehensive Evaluation of Key Frame Sampling in Long Video Understanding
by: Li, Zongyao, et al.
Published: (2025)

From Captions to Keyframes: KeyScore for Multimodal Frame Scoring and Video-Language Understanding
by: Lin, Shih-Yao, et al.
Published: (2025)

Speech-to-LaTeX: New Models and Datasets for Converting Spoken Equations and Sentences
by: Korzh, Dmitrii, et al.
Published: (2025)

Think-Clip-Sample: Slow-Fast Frame Selection for Video Understanding
by: Tan, Wenhui, et al.
Published: (2026)

DynImg: Key Frames with Visual Prompts are Good Representation for Multi-Modal Video Understanding
by: Bao, Xiaoyi, et al.
Published: (2025)

Spread them Apart: Towards Robust Watermarking of Generated Content
by: Pautov, Mikhail, et al.
Published: (2025)

Wavelet-based Frame Selection by Detecting Semantic Boundary for Long Video Understanding
by: Chen, Wang, et al.
Published: (2026)

DiffuseSlide: Training-Free High Frame Rate Video Generation Diffusion
by: Hwang, Geunmin, et al.
Published: (2025)

Frame by Familiar Frame: Understanding Replication in Video Diffusion Models
by: Rahman, Aimon, et al.
Published: (2024)

FRAG: Frame Selection Augmented Generation for Long Video and Long Document Understanding
by: Huang, De-An, et al.
Published: (2025)

Geological Field Restoration through the Lens of Image Inpainting
by: Trifonov, Vladislav, et al.
Published: (2025)

OCC-RAG: Optimal Cognitive Core for Faithful Question Answering
by: Savkin, Maksim, et al.
Published: (2026)

A case study of spatiotemporal forecasting techniques for weather forecasting
by: Sofi, Shakir Showkat, et al.
Published: (2022)

Shot-Aware Frame Sampling for Video Understanding
by: Zhao, Mengyu, et al.
Published: (2026)

Latent Inter-Frame Pruning: A Training-Free Method Bridging Traditional Video Compression and Modern Diffusion Transformers for Efficient Generation
by: Menn, Dennis, et al.
Published: (2026)

One Token per Highly Selective Frame: Towards Extreme Compression for Long Video Understanding
by: Zhang, Zheyu, et al.
Published: (2026)

Graph-to-Frame RAG: Visual-Space Knowledge Fusion for Training-Free and Auditable Video Reasoning
by: Yang, Songyuan, et al.
Published: (2026)

Generative Frame Sampler for Long Video Understanding
by: Yao, Linli, et al.
Published: (2025)

Logit-KL Flow Matching: Non-Autoregressive Text Generation via Sampling-Hybrid Inference
by: Sevriugov, Egor, et al.
Published: (2024)