:: Library Catalog

Bibliographic Details
Main Authors:	Sarch, Gabriel; Jang, Lawrence; Tarr, Michael J.; Cohen, William W.; Marino, Kenneth; Fragkiadaki, Katerina
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence; Machine Learning
Online Access:	https://arxiv.org/abs/2406.14596

Similar Items

HELPER-X: A Unified Instructable Embodied Agent to Tackle Four Interactive Vision-Language Domains with Memory-Augmented Language Models
by: Sarch, Gabriel, et al.
Published: (2024)

Grounded Reinforcement Learning for Visual Reasoning
by: Sarch, Gabriel, et al.
Published: (2025)

ODIN: A Single Model for 2D and 3D Segmentation
by: Jain, Ayush, et al.
Published: (2024)

DreamScene4D: Dynamic Multi-Object Scene Generation from Monocular Videos
by: Chu, Wen-Hsuan, et al.
Published: (2024)

Reanimating Images using Neural Representations of Dynamic Stimuli
by: Yeung, Jacob, et al.
Published: (2024)

TAPIP3D: Tracking Any Point in Persistent 3D Geometry
by: Zhang, Bowei, et al.
Published: (2025)

3D Diffuser Actor: Policy Diffusion with 3D Scene Representations
by: Ke, Tsung-Wei, et al.
Published: (2024)

Learning to Assist: Physics-Grounded Human-Human Control via Multi-Agent Reinforcement Learning
by: Shibata, Yuto, et al.
Published: (2026)

Dex4D: Task-Agnostic Point Track Policy for Sim-to-Real Dexterous Manipulation
by: Kuang, Yuxuan, et al.
Published: (2026)

Generative 4D Scene Gaussian Splatting with Object View-Synthesis Priors
by: Chu, Wen-Hsuan, et al.
Published: (2025)

Aligning Text-to-Image Diffusion Models with Reward Backpropagation
by: Prabhudesai, Mihir, et al.
Published: (2023)

ADVEDM: Fine-grained Adversarial Attack against VLM-based Embodied Agents
by: Wang, Yichen, et al.
Published: (2025)

PartCrafter: Structured 3D Mesh Generation via Compositional Latent Diffusion Transformers
by: Lin, Yuchen, et al.
Published: (2025)

GTR: Guided Thought Reinforcement Prevents Thought Collapse in RL-based VLM Agent Training
by: Wei, Tong, et al.
Published: (2025)

Zero-Shot Open-Vocabulary Tracking with Large Pre-Trained Models
by: Chu, Wen-Hsuan, et al.
Published: (2023)

Vero: An Open RL Recipe for General Visual Reasoning
by: Sarch, Gabriel, et al.
Published: (2026)

RoboTAG: End-to-end Robot Configuration Estimation via Topological Alignment Graph
by: Liu, Yifan, et al.
Published: (2025)

Video Diffusion Alignment via Reward Gradients
by: Prabhudesai, Mihir, et al.
Published: (2024)

Diffusion Beats Autoregressive in Data-Constrained Settings
by: Prabhudesai, Mihir, et al.
Published: (2025)

Unified Multimodal Discrete Diffusion
by: Swerdlow, Alexander, et al.
Published: (2025)

Grounding Task Assistance with Multimodal Cues from a Single Demonstration
by: Sarch, Gabriel, et al.
Published: (2025)

Out of Sight, Not Out of Context? Egocentric Spatial Reasoning in VLMs Across Disjoint Frames
by: Ravi, Sahithya, et al.
Published: (2025)

Ella: Embodied Social Agents with Lifelong Memory
by: Zhang, Hongxin, et al.
Published: (2025)

Enhancing Video-LLM Reasoning via Agent-of-Thoughts Distillation
by: Shi, Yudi, et al.
Published: (2024)

EMemBench: Interactive Benchmarking of Episodic Memory for VLM Agents
by: Li, Xinze, et al.
Published: (2026)

VLM-KD: Knowledge Distillation from VLM for Long-Tail Visual Recognition
by: Zhang, Zaiwei, et al.
Published: (2024)

FindingDory: A Benchmark to Evaluate Memory in Embodied Agents
by: Yadav, Karmesh, et al.
Published: (2025)

Iterative Refinement Improves Compositional Image Generation
by: Jaiswal, Shantanu, et al.
Published: (2026)

Embodied VideoAgent: Persistent Memory from Egocentric Videos and Embodied Sensors Enables Dynamic Scene Understanding
by: Fan, Yue, et al.
Published: (2024)

AtlasVA: Self-Evolving Visual Skill Memory for Teacher-Free VLM Agents
by: Wang, Pan, et al.
Published: (2026)

BEAT: Visual Backdoor Attacks on VLM-based Embodied Agents via Contrastive Trigger Learning
by: Zhan, Qiusi, et al.
Published: (2025)

Video Depth without Video Models
by: Ke, Bingxin, et al.
Published: (2024)

MoVieS: Motion-Aware 4D Dynamic View Synthesis in One Second
by: Lin, Chenguo, et al.
Published: (2025)

Embodied Tree of Thoughts: Deliberate Manipulation Planning with Embodied World Model
by: Xu, Wenjiang, et al.
Published: (2025)

IS-Bench: Evaluating Interactive Safety of VLM-Driven Embodied Agents in Daily Household Tasks
by: Lu, Xiaoya, et al.
Published: (2025)

Memo: Training Memory-Efficient Embodied Agents with Reinforcement Learning
by: Gupta, Gunshi, et al.
Published: (2025)

Energy-based Models are Zero-Shot Planners for Compositional Scene Rearrangement
by: Gkanatsios, Nikolaos, et al.
Published: (2023)

TimeWarp: Evaluating Web Agents by Revisiting the Past
by: Ishmam, Md Farhan, et al.
Published: (2026)

RoboStressBench: Benchmarking VLM Robustness to Physical Visual Stress in Embodied Scenes
by: Wu, Leyi, et al.
Published: (2026)

HomeGuard: VLM-based Embodied Safeguard for Identifying Contextual Risk in Household Task
by: Lu, Xiaoya, et al.
Published: (2026)