:: Library Catalog

Bibliographic Details
Main Authors:	Qiao, Changze; Lu, Mingming
Format:	Preprint
Published:	2025
Subjects:	Artificial Intelligence; Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2505.22006

Similar Items

Enhancing Long Video Understanding via Hierarchical Event-Based Memory
by: Cheng, Dingxin, et al.
Published: (2024)

Designing Domain-Specific Agents via Hierarchical Task Abstraction Mechanism
by: Li, Kaiyu, et al.
Published: (2025)

Evaluation Agent: Efficient and Promptable Evaluation Framework for Visual Generative Models
by: Zhang, Fan, et al.
Published: (2024)

HRGS: Hierarchical Gaussian Splatting for Memory-Efficient High-Resolution 3D Reconstruction
by: Li, Changbai, et al.
Published: (2025)

Chain-of-Memory: Enhancing GUI Agents for Cross-Application Navigation
by: Gao, Xinzge, et al.
Published: (2025)

HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding
by: Zhang, Haowei, et al.
Published: (2026)

MASR: Self-Reflective Reasoning through Multimodal Hierarchical Attention Focusing for Agent-based Video Understanding
by: Cao, Shiwen, et al.
Published: (2025)

Memo: Training Memory-Efficient Embodied Agents with Reinforcement Learning
by: Gupta, Gunshi, et al.
Published: (2025)

MM-WebAgent: A Hierarchical Multimodal Web Agent for Webpage Generation
by: Li, Yan, et al.
Published: (2026)

ParkingE2E: Camera-based End-to-end Parking Network, from Images to Planning
by: Li, Changze, et al.
Published: (2024)

Beyond Static Visual Tokens: Structured Sequential Visual Chain-of-Thought Reasoning
by: Guo, Guangfu, et al.
Published: (2026)

Spatia: Video Generation with Updatable Spatial Memory
by: Zhao, Jinjing, et al.
Published: (2025)

FluxMem: Adaptive Hierarchical Memory for Streaming Video Understanding
by: Xie, Yiweng, et al.
Published: (2026)

SWIFT: Prompt-Adaptive Memory for Efficient Interactive Long Video Generation
by: Tan, Shanwen, et al.
Published: (2026)

VIB-Probe: Detecting and Mitigating Hallucinations in Vision-Language Models via Variational Information Bottleneck
by: Zhang, Feiran, et al.
Published: (2026)

Aquila: A Hierarchically Aligned Visual-Language Model for Enhanced Remote Sensing Image Comprehension
by: Lu, Kaixuan, et al.
Published: (2024)

E$^3$C: Video Generation with 3D Environmental Memory and Ego-Exo Human Pose Control
by: Gu, Qiao, et al.
Published: (2026)

Visual Agentic Memory: Enabling Online Long Video Understanding via Online Indexing, Hierarchical Memory, and Agentic Retrieval
by: Li, Aiden Yiliu, et al.
Published: (2026)

MUSES: 3D-Controllable Image Generation via Multi-Modal Agent Collaboration
by: Ding, Yanbo, et al.
Published: (2024)

Face-MakeUp: Multimodal Facial Prompts for Text-to-Image Generation
by: Dai, Dawei, et al.
Published: (2025)

HiTVideo: Hierarchical Tokenizers for Enhancing Text-to-Video Generation with Autoregressive Large Language Models
by: Zhou, Ziqin, et al.
Published: (2025)

Pack and Force Your Memory: Long-form and Consistent Video Generation
by: Wu, Xiaofei, et al.
Published: (2025)

Enhanced Anime Image Generation Using USE-CMHSA-GAN
by: Lu, J.
Published: (2024)

Shallow Features Matter: Hierarchical Memory with Heterogeneous Interaction for Unsupervised Video Object Segmentation
by: Xiangyu, Zheng, et al.
Published: (2025)

COLI: A Hierarchical Efficient Compressor for Large Images
by: Wang, Haoran, et al.
Published: (2025)

H-MBA: Hierarchical MamBa Adaptation for Multi-Modal Video Understanding in Autonomous Driving
by: Chen, Siran, et al.
Published: (2025)

EM-Vid: Training-Free Entity-Centric Memory for Efficient and Consistent Multi-Shot Video Generation
by: Vandersanden, Jente, et al.
Published: (2026)

Sharp Eyes and Memory for VideoLLMs: Information-Aware Visual Token Pruning for Efficient and Reliable VideoLLM Reasoning
by: Qin, Jialong, et al.
Published: (2025)

Memory-Efficient Fine-Tuning for Quantized Diffusion Model
by: Ryu, Hyogon, et al.
Published: (2024)

Memory-Efficient Prompt Tuning for Incremental Histopathology Classification
by: Zhu, Yu, et al.
Published: (2024)

Event-Causal RAG: A Retrieval-Augmented Generation Framework for Long Video Reasoning in Complex Scenarios
by: Yan, Peizheng, et al.
Published: (2026)

HMAR: Efficient Hierarchical Masked Auto-Regressive Image Generation
by: Kumbong, Hermann, et al.
Published: (2025)

From Structure to Detail: Hierarchical Distillation for Efficient Diffusion Model
by: Cheng, Hanbo, et al.
Published: (2025)

LocationAgent: A Hierarchical Agent for Image Geolocation via Decoupling Strategy and Evidence from Parametric Knowledge
by: Li, Qiujun, et al.
Published: (2026)

LightMem: Lightweight and Efficient Memory-Augmented Generation
by: Fang, Jizhan, et al.
Published: (2025)

EVA: Efficient Reinforcement Learning for End-to-End Video Agent
by: Zhang, Yaolun, et al.
Published: (2026)

Enhancing Scene Graph Generation with Hierarchical Relationships and Commonsense Knowledge
by: Jiang, Bowen, et al.
Published: (2023)

ProtoGS: Efficient and High-Quality Rendering with 3D Gaussian Prototypes
by: Gao, Zhengqing, et al.
Published: (2025)

MATA: A Trainable Hierarchical Automaton System for Multi-Agent Visual Reasoning
by: Cai, Zhixi, et al.
Published: (2026)

Efficient Large-Deformation Medical Image Registration via Recurrent Dynamic Correlation
by: Li, Tianran, et al.
Published: (2025)