| Main Authors: | Hu, Wenbo, Hong, Yining, Wang, Yanjun, Gao, Leison, Wei, Zibu, Yao, Xingcheng, Peng, Nanyun, Bitton, Yonatan, Szpektor, Idan, Chang, Kai-Wei |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2505.22657 |
Similar Items
TALC: Time-Aligned Captions for Multi-Scene Text-to-Video Generation
by: Bansal, Hritik, et al.
Published: (2024)
Seeing Isn't Knowing: Do VLMs Know When Not to Answer Spatial Questions (and Why)?
by: Zhang, Yue, et al.
Published: (2026)
Contrastive Sequential-Diffusion Learning: Non-linear and Multi-Scene Instructional Video Synthesis
by: Ramos, Vasco, et al.
Published: (2024)
Bridging the Visual Gap: Fine-Tuning Multimodal Models with Knowledge-Adapted Captions
by: Yanuka, Moran, et al.
Published: (2024)
LongMemEval-V2: Evaluating Long-Term Agent Memory Toward Experienced Colleagues
by: Wu, Di, et al.
Published: (2026)
Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision
by: Zohar, Orr, et al.
Published: (2024)
Mismatch Quest: Visual and Textual Feedback for Image-Text Misalignment
by: Gordon, Brian, et al.
Published: (2023)
Error-Driven Scene Editing for 3D Grounding in Large Language Models
by: Zhang, Yue, et al.
Published: (2025)
Unblocking Fine-Grained Evaluation of Detailed Captions: An Explaining AutoRater and Critic-and-Revise Pipeline
by: Gordon, Brian, et al.
Published: (2025)
Visual Riddles: a Commonsense and World Knowledge Challenge for Large Vision and Language Models
by: Bitton-Guetta, Nitzan, et al.
Published: (2024)
LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory
by: Wu, Di, et al.
Published: (2024)
Distinguishing Ignorance from Error in LLM Hallucinations
by: Simhi, Adi, et al.
Published: (2024)
Constructing Benchmarks and Interventions for Combating Hallucinations in LLMs
by: Simhi, Adi, et al.
Published: (2024)
RefVNLI: Towards Scalable Evaluation of Subject-driven Text-to-image Generation
by: Slobodkin, Aviv, et al.
Published: (2025)
Generating Coherent Sequences of Visual Illustrations for Real-World Manual Tasks
by: Bordalo, João, et al.
Published: (2024)
Mem-Gallery: Benchmarking Multimodal Long-Term Conversational Memory for MLLM Agents
by: Bei, Yuanchen, et al.
Published: (2026)
Embodied Web Agents: Bridging Physical-Digital Realms for Integrated Agent Intelligence
by: Hong, Yining, et al.
Published: (2025)
ManagerBench: Evaluating the Safety-Pragmatism Trade-off in Autonomous LLMs
by: Simhi, Adi, et al.
Published: (2025)
HyperMem: Hypergraph Memory for Long-Term Conversations
by: Yue, Juwei, et al.
Published: (2026)
TiMem: Temporal-Hierarchical Memory Consolidation for Long-Horizon Conversational Agents
by: Li, Kai, et al.
Published: (2026)
ES-Mem: Event Segmentation-Based Memory for Long-Term Dialogue Agents
by: Zou, Huhai, et al.
Published: (2026)
VideoPhy-2: A Challenging Action-Centric Physical Commonsense Evaluation in Video Generation
by: Bansal, Hritik, et al.
Published: (2025)
3D-Mem: 3D Scene Memory for Embodied Exploration and Reasoning
by: Yang, Yuncong, et al.
Published: (2024)
TeleMem: Building Long-Term and Multimodal Memory for Agentic AI
by: Chen, Chunliang, et al.
Published: (2025)
DimMem: Dimensional Structuring for Efficient Long-Term Agent Memory
by: Qiu, Wentao, et al.
Published: (2026)
MemRouter: Memory-as-Embedding Routing for Long-Term Conversational Agents
by: Hu, Tianyu, et al.
Published: (2026)
EviMem: Evidence-Gap-Driven Iterative Retrieval for Long-Term Conversational Memory
by: Li, Yuyang, et al.
Published: (2026)
MemReader: From Passive to Active Extraction for Long-Term Agent Memory
by: Kang, Jingyi, et al.
Published: (2026)
Mem2ActBench: A Benchmark for Evaluating Long-Term Memory Utilization in Task-Oriented Autonomous Agents
by: Shen, Yiting, et al.
Published: (2026)
DLLM Agent: See Farther, Run Faster
by: Zhen, Huiling, et al.
Published: (2026)
LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations
by: Orgad, Hadas, et al.
Published: (2024)
Beyond the Noise: Aligning Prompts with Latent Representations in Diffusion Models
by: Ramos, Vasco, et al.
Published: (2025)
OpenVLThinkerV2: A Generalist Multimodal Reasoning Model for Multi-domain Visual Tasks
by: Hu, Wenbo, et al.
Published: (2026)
Inside-Out: Hidden Factual Knowledge in LLMs
by: Gekhman, Zorik, et al.
Published: (2025)
CarMem: Enhancing Long-Term Memory in LLM Voice Assistants through Category-Bounding
by: Kirmayr, Johannes, et al.
Published: (2025)
Chem3DLLM: 3D Multimodal Large Language Models for Chemistry
by: Jiang, Lei, et al.
Published: (2025)
MemGuard: Preventing Memory Contamination in Long-Term Memory-Augmented Large Language Models
by: Ha, Hyeonjeong, et al.
Published: (2026)
Matryoshka Query Transformer for Large Vision-Language Models
by: Hu, Wenbo, et al.
Published: (2024)
MemLens: Benchmarking Multimodal Long-Term Memory in Large Vision-Language Models
by: Ren, Xiyu, et al.
Published: (2026)
Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory
by: Chhikara, Prateek, et al.
Published: (2025)