Library Catalog

Bibliographic Details
Main Authors:	Hu, Kai; Gao, Feng; Nie, Xiaohan; Zhou, Peng; Tran, Son; Neiman, Tal; Wang, Lingyun; Shah, Mubarak; Hamid, Raffay; Yin, Bing; Chilimbi, Trishul
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition, Artificial Intelligence
Online Access:	https://arxiv.org/abs/2502.19680

Similar Items

CoLLM: A Large Language Model for Composed Image Retrieval
by: Huynh, Chuong, et al.
Published: (2025)

DreamBlend: Advancing Personalized Fine-tuning of Text-to-Image Diffusion Models
by: Ram, Shwetha, et al.
Published: (2024)

X-Former: Unifying Contrastive and Reconstruction Learning for MLLMs
by: Swetha, Sirnam, et al.
Published: (2024)

Open Vocabulary Multi-Label Video Classification
by: Gupta, Rohit, et al.
Published: (2024)

VidLA: Video-Language Alignment at Scale
by: Rizve, Mamshad Nayeem, et al.
Published: (2024)

Dictionary-Aligned Concept Control for Safeguarding Multimodal LLMs
by: Luo, Jinqi, et al.
Published: (2026)

ViLL-E: Video LLM Embeddings for Retrieval
by: Gupta, Rohit, et al.
Published: (2026)

From Frames to Clips: Training-free Adaptive Key Clip Selection for Long-Form Video Understanding
by: Sun, Guangyu, et al.
Published: (2025)

CompLLM: Compression for Long Context Q&A
by: Berton, Gabriele, et al.
Published: (2025)

Attend Locally, Remember Linearly: Linear Attention as Cross-Frame Memory for Autoregressive Video Diffusion
by: Li, Kunyang, et al.
Published: (2026)

Evolutionary Contrastive Distillation for Language Model Alignment
by: Katz-Samuels, Julian, et al.
Published: (2024)

Privacy Beyond Pixels: Latent Anonymization for Privacy-Preserving Video Understanding
by: Fioresi, Joseph, et al.
Published: (2025)

AutoMixAlign: Adaptive Data Mixing for Multi-Task Preference Optimization in LLMs
by: Corrado, Nicholas E., et al.
Published: (2025)

Adaptive Greedy Frame Selection for Long Video Understanding
by: Huang, Yuning, et al.
Published: (2026)

TimeLogic: A Temporal Logic Benchmark for Video QA
by: Swetha, Sirnam, et al.
Published: (2025)

GIFT: Global Irreplaceability Frame Targeting for Efficient Video Understanding
by: Ma, Junpeng, et al.
Published: (2026)

From Play to Replay: Composed Video Retrieval for Temporally Fine-Grained Videos
by: Gupta, Animesh, et al.
Published: (2025)

Temporally Consistent Referring Video Object Segmentation with Hybrid Memory
by: Miao, Bo, et al.
Published: (2024)

Event-Anchored Frame Selection for Effective Long-Video Understanding
by: Chen, Wang, et al.
Published: (2026)

GAReT: Cross-view Video Geolocalization with Adapters and Auto-Regressive Transformers
by: Pillai, Manu S, et al.
Published: (2024)

Improving LLM Video Understanding with 16 Frames Per Second
by: Li, Yixuan, et al.
Published: (2025)

Robust Multi-Task Learning with Excess Risks
by: He, Yifei, et al.
Published: (2024)

PackCache: A Training-Free Acceleration Method for Unified Autoregressive Video Generation via Compact KV-Cache
by: Li, Kunyang, et al.
Published: (2026)

Investigating Memorization in Video Diffusion Models
by: Chen, Chen, et al.
Published: (2024)

Think-Clip-Sample: Slow-Fast Frame Selection for Video Understanding
by: Tan, Wenhui, et al.
Published: (2026)

Q-Frame: Query-aware Frame Selection and Multi-Resolution Adaptation for Video-LLMs
by: Zhang, Shaojie, et al.
Published: (2025)

LADDER: An Efficient Framework for Video Frame Interpolation
by: Shen, Tong, et al.
Published: (2024)

GVD: Guiding Video Diffusion Model for Scalable Video Distillation
by: Li, Kunyang, et al.
Published: (2025)

TimeMarker: A Versatile Video-LLM for Long and Short Video Understanding with Superior Temporal Localization Ability
by: Chen, Shimin, et al.
Published: (2024)

InfoPO: On Mutual Information Maximization for Large Language Model Alignment
by: Xiao, Teng, et al.
Published: (2025)

CityGuessr: City-Level Video Geo-Localization on a Global Scale
by: Kulkarni, Parth Parag, et al.
Published: (2024)

FrameDiT: Diffusion Transformer with Matrix Attention for Efficient Video Generation
by: Le, Minh Khoa, et al.
Published: (2026)

VideoScan: Enabling Efficient Streaming Video Understanding via Frame-level Semantic Carriers
by: Li, Ruanjun, et al.
Published: (2025)

Leveraging Pre-Trained Visual Models for AI-Generated Video Detection
by: Veeramachaneni, Keerthi, et al.
Published: (2025)

Frame by Familiar Frame: Understanding Replication in Video Diffusion Models
by: Rahman, Aimon, et al.
Published: (2024)

Wavelet-based Frame Selection by Detecting Semantic Boundary for Long Video Understanding
by: Chen, Wang, et al.
Published: (2026)

FRAG: Frame Selection Augmented Generation for Long Video and Long Document Understanding
by: Huang, De-An, et al.
Published: (2025)

Exploring Reasoning-Infused Text Embedding with Large Language Models for Zero-Shot Dense Retrieval
by: Liu, Yuxiang, et al.
Published: (2025)

HFS: Holistic Query-Aware Frame Selection for Efficient Video Reasoning
by: Yang, Yiqing, et al.
Published: (2025)

Self-Distilled Masked Auto-Encoders are Efficient Video Anomaly Detectors
by: Ristea, Nicolae-Catalin, et al.
Published: (2023)