:: Library Catalog

Bibliographic Details
Main Authors:	Chang, Aiden; De Melo, Celso; Lukin, Stephanie M.
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2509.16421

Similar Items

Visual Agentic Memory: Enabling Online Long Video Understanding via Online Indexing, Hierarchical Memory, and Agentic Retrieval
by: Li, Aiden Yiliu, et al.
Published: (2026)

What and When to Look?: Temporal Span Proposal Network for Video Relation Detection
by: Woo, Sangmin, et al.
Published: (2021)

Look Twice: Training-Free Evidence Highlighting in Multimodal Large Language Models
by: Morini, Marco, et al.
Published: (2026)

ViLCo-Bench: VIdeo Language COntinual learning Benchmark
by: Tang, Tianqi, et al.
Published: (2024)

Unleash the Potential of CLIP for Video Highlight Detection
by: Han, Donghoon, et al.
Published: (2024)

Video Diffusion Models Excel at Tracking Similar-Looking Objects Without Supervision
by: Zhang, Chenshuang, et al.
Published: (2025)

LookAhead Tuning: Safer Language Models via Partial Answer Previews
by: Liu, Kangwei, et al.
Published: (2025)

Look Where It Matters: High-Resolution Crops Retrieval for Efficient VLMs
by: Shabtay, Nimrod, et al.
Published: (2026)

Spatial457: A Diagnostic Benchmark for 6D Spatial Reasoning of Large Multimodal Models
by: Wang, Xingrui, et al.
Published: (2025)

Unsupervised Transcript-assisted Video Summarization and Highlight Detection
by: Barbakos, Spyros, et al.
Published: (2025)

Thinking Ahead: Foresight Intelligence in MLLMs and World Models
by: Gong, Zhantao, et al.
Published: (2025)

A Modern Look at Simplicity Bias in Image Classification Tasks
by: Chang, Xiaoguang, et al.
Published: (2025)

AI-Generated Images: What Humans and Machines See When They Look at the Same Image
by: Poletti, Silvia, et al.
Published: (2026)

GPTSee: Enhancing Moment Retrieval and Highlight Detection via Description-Based Similarity Features
by: Sun, Yunzhuo, et al.
Published: (2024)

Predicting the Next Action by Modeling the Abstract Goal
by: Roy, Debaditya, et al.
Published: (2022)

What Matters in Range View 3D Object Detection
by: Wilson, Benjamin, et al.
Published: (2024)

LatentPilot: Scene-Aware Vision-and-Language Navigation by Dreaming Ahead with Latent Visual Reasoning
by: Hao, Haihong, et al.
Published: (2026)

Task-Driven Exploration: Decoupling and Inter-Task Feedback for Joint Moment Retrieval and Highlight Detection
by: Yang, Jin, et al.
Published: (2024)

Watch Video, Catch Keyword: Context-aware Keyword Attention for Moment Retrieval and Highlight Detection
by: Um, Sung Jin, et al.
Published: (2025)

Overcoming Semantic Dilution in Transformer-Based Next Frame Prediction
by: Nguyen, Hy, et al.
Published: (2025)

Automated Detection of Sport Highlights from Audio and Video Sources
by: Della Santa, Francesco, et al.
Published: (2025)

VideoLights: Feature Refinement and Cross-Task Alignment Transformer for Joint Video Highlight Detection and Moment Retrieval
by: Paul, Dhiman, et al.
Published: (2024)

Memorize What Matters: Emergent Scene Decomposition from Multitraverse
by: Li, Yiming, et al.
Published: (2024)

Next Block Prediction: Video Generation via Semi-Autoregressive Modeling
by: Ren, Shuhuai, et al.
Published: (2025)

Modality Translation for Object Detection Adaptation Without Forgetting Prior Knowledge
by: Medeiros, Heitor Rapela, et al.
Published: (2024)

ETA: Efficiency through Thinking Ahead, A Dual Approach to Self-Driving with Large Models
by: Hamdan, Shadi, et al.
Published: (2025)

Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
by: Zhou, Chunting, et al.
Published: (2024)

VideoAR: Autoregressive Video Generation via Next-Frame & Scale Prediction
by: Ji, Longbin, et al.
Published: (2026)

Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction
by: Tian, Keyu, et al.
Published: (2024)

Generating Narrated Lecture Videos from Slides with Synchronized Highlights
by: Holmberg, Alexander
Published: (2025)

Fostering Video Reasoning via Next-Event Prediction
by: Wang, Haonan, et al.
Published: (2025)

Looking into Concept Explanation Methods for Diabetic Retinopathy Classification
by: Storås, Andrea M., et al.
Published: (2024)

What Makes a Maze Look Like a Maze?
by: Hsu, Joy, et al.
Published: (2024)

Learning What Matters: Prioritized Concept Learning via Relative Error-driven Sample Selection
by: Chandhok, Shivam, et al.
Published: (2025)

What Matters in Practical Learned Image Compression
by: Tatwawadi, Kedar, et al.
Published: (2026)

Object Aware Egocentric Online Action Detection
by: An, Joungbin, et al.
Published: (2024)

What to Do Next? Memorizing skills from Egocentric Instructional Video
by: Bi, Jing, et al.
Published: (2025)

What Happens Next? Anticipating Future Motion by Generating Point Trajectories
by: Boduljak, Gabrijel, et al.
Published: (2025)

What Matters for Scalable and Robust Learning in End-to-End Driving Planners?
by: Holtz, David, et al.
Published: (2026)

What Matters to You? Towards Visual Representation Alignment for Robot Learning
by: Tian, Ran, et al.
Published: (2023)