| Main Authors: | Qiao, Changze, Lu, Mingming |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2505.22006 |
Similar Items
Enhancing Long Video Understanding via Hierarchical Event-Based Memory
by: Cheng, Dingxin, et al.
Published: (2024)
Designing Domain-Specific Agents via Hierarchical Task Abstraction Mechanism
by: Li, Kaiyu, et al.
Published: (2025)
Evaluation Agent: Efficient and Promptable Evaluation Framework for Visual Generative Models
by: Zhang, Fan, et al.
Published: (2024)
HRGS: Hierarchical Gaussian Splatting for Memory-Efficient High-Resolution 3D Reconstruction
by: Li, Changbai, et al.
Published: (2025)
Chain-of-Memory: Enhancing GUI Agents for Cross-Application Navigation
by: Gao, Xinzge, et al.
Published: (2025)
HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding
by: Zhang, Haowei, et al.
Published: (2026)
MASR: Self-Reflective Reasoning through Multimodal Hierarchical Attention Focusing for Agent-based Video Understanding
by: Cao, Shiwen, et al.
Published: (2025)
Memo: Training Memory-Efficient Embodied Agents with Reinforcement Learning
by: Gupta, Gunshi, et al.
Published: (2025)
MM-WebAgent: A Hierarchical Multimodal Web Agent for Webpage Generation
by: Li, Yan, et al.
Published: (2026)
ParkingE2E: Camera-based End-to-end Parking Network, from Images to Planning
by: Li, Changze, et al.
Published: (2024)
Beyond Static Visual Tokens: Structured Sequential Visual Chain-of-Thought Reasoning
by: Guo, Guangfu, et al.
Published: (2026)
Spatia: Video Generation with Updatable Spatial Memory
by: Zhao, Jinjing, et al.
Published: (2025)
FluxMem: Adaptive Hierarchical Memory for Streaming Video Understanding
by: Xie, Yiweng, et al.
Published: (2026)
SWIFT: Prompt-Adaptive Memory for Efficient Interactive Long Video Generation
by: Tan, Shanwen, et al.
Published: (2026)
VIB-Probe: Detecting and Mitigating Hallucinations in Vision-Language Models via Variational Information Bottleneck
by: Zhang, Feiran, et al.
Published: (2026)
Aquila: A Hierarchically Aligned Visual-Language Model for Enhanced Remote Sensing Image Comprehension
by: Lu, Kaixuan, et al.
Published: (2024)
E$^3$C: Video Generation with 3D Environmental Memory and Ego-Exo Human Pose Control
by: Gu, Qiao, et al.
Published: (2026)
Visual Agentic Memory: Enabling Online Long Video Understanding via Online Indexing, Hierarchical Memory, and Agentic Retrieval
by: Li, Aiden Yiliu, et al.
Published: (2026)
MUSES: 3D-Controllable Image Generation via Multi-Modal Agent Collaboration
by: Ding, Yanbo, et al.
Published: (2024)
Face-MakeUp: Multimodal Facial Prompts for Text-to-Image Generation
by: Dai, Dawei, et al.
Published: (2025)
HiTVideo: Hierarchical Tokenizers for Enhancing Text-to-Video Generation with Autoregressive Large Language Models
by: Zhou, Ziqin, et al.
Published: (2025)
Pack and Force Your Memory: Long-form and Consistent Video Generation
by: Wu, Xiaofei, et al.
Published: (2025)
Enhanced Anime Image Generation Using USE-CMHSA-GAN
by: Lu, J.
Published: (2024)
Shallow Features Matter: Hierarchical Memory with Heterogeneous Interaction for Unsupervised Video Object Segmentation
by: Xiangyu, Zheng, et al.
Published: (2025)
COLI: A Hierarchical Efficient Compressor for Large Images
by: Wang, Haoran, et al.
Published: (2025)
H-MBA: Hierarchical MamBa Adaptation for Multi-Modal Video Understanding in Autonomous Driving
by: Chen, Siran, et al.
Published: (2025)
EM-Vid: Training-Free Entity-Centric Memory for Efficient and Consistent Multi-Shot Video Generation
by: Vandersanden, Jente, et al.
Published: (2026)
Sharp Eyes and Memory for VideoLLMs: Information-Aware Visual Token Pruning for Efficient and Reliable VideoLLM Reasoning
by: Qin, Jialong, et al.
Published: (2025)
Memory-Efficient Fine-Tuning for Quantized Diffusion Model
by: Ryu, Hyogon, et al.
Published: (2024)
Memory-Efficient Prompt Tuning for Incremental Histopathology Classification
by: Zhu, Yu, et al.
Published: (2024)
Event-Causal RAG: A Retrieval-Augmented Generation Framework for Long Video Reasoning in Complex Scenarios
by: Yan, Peizheng, et al.
Published: (2026)
HMAR: Efficient Hierarchical Masked Auto-Regressive Image Generation
by: Kumbong, Hermann, et al.
Published: (2025)
From Structure to Detail: Hierarchical Distillation for Efficient Diffusion Model
by: Cheng, Hanbo, et al.
Published: (2025)
LocationAgent: A Hierarchical Agent for Image Geolocation via Decoupling Strategy and Evidence from Parametric Knowledge
by: Li, Qiujun, et al.
Published: (2026)
LightMem: Lightweight and Efficient Memory-Augmented Generation
by: Fang, Jizhan, et al.
Published: (2025)
EVA: Efficient Reinforcement Learning for End-to-End Video Agent
by: Zhang, Yaolun, et al.
Published: (2026)
Enhancing Scene Graph Generation with Hierarchical Relationships and Commonsense Knowledge
by: Jiang, Bowen, et al.
Published: (2023)
ProtoGS: Efficient and High-Quality Rendering with 3D Gaussian Prototypes
by: Gao, Zhengqing, et al.
Published: (2025)
MATA: A Trainable Hierarchical Automaton System for Multi-Agent Visual Reasoning
by: Cai, Zhixi, et al.
Published: (2026)
Efficient Large-Deformation Medical Image Registration via Recurrent Dynamic Correlation
by: Li, Tianran, et al.
Published: (2025)