| Main Authors: | Sarch, Gabriel, Jang, Lawrence, Tarr, Michael J., Cohen, William W., Marino, Kenneth, Fragkiadaki, Katerina |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2406.14596 |
Similar Items
HELPER-X: A Unified Instructable Embodied Agent to Tackle Four Interactive Vision-Language Domains with Memory-Augmented Language Models
by: Sarch, Gabriel, et al.
Published: (2024)
Grounded Reinforcement Learning for Visual Reasoning
by: Sarch, Gabriel, et al.
Published: (2025)
ODIN: A Single Model for 2D and 3D Segmentation
by: Jain, Ayush, et al.
Published: (2024)
DreamScene4D: Dynamic Multi-Object Scene Generation from Monocular Videos
by: Chu, Wen-Hsuan, et al.
Published: (2024)
Reanimating Images using Neural Representations of Dynamic Stimuli
by: Yeung, Jacob, et al.
Published: (2024)
TAPIP3D: Tracking Any Point in Persistent 3D Geometry
by: Zhang, Bowei, et al.
Published: (2025)
3D Diffuser Actor: Policy Diffusion with 3D Scene Representations
by: Ke, Tsung-Wei, et al.
Published: (2024)
Learning to Assist: Physics-Grounded Human-Human Control via Multi-Agent Reinforcement Learning
by: Shibata, Yuto, et al.
Published: (2026)
Dex4D: Task-Agnostic Point Track Policy for Sim-to-Real Dexterous Manipulation
by: Kuang, Yuxuan, et al.
Published: (2026)
Generative 4D Scene Gaussian Splatting with Object View-Synthesis Priors
by: Chu, Wen-Hsuan, et al.
Published: (2025)
Aligning Text-to-Image Diffusion Models with Reward Backpropagation
by: Prabhudesai, Mihir, et al.
Published: (2023)
ADVEDM: Fine-grained Adversarial Attack against VLM-based Embodied Agents
by: Wang, Yichen, et al.
Published: (2025)
PartCrafter: Structured 3D Mesh Generation via Compositional Latent Diffusion Transformers
by: Lin, Yuchen, et al.
Published: (2025)
GTR: Guided Thought Reinforcement Prevents Thought Collapse in RL-based VLM Agent Training
by: Wei, Tong, et al.
Published: (2025)
Zero-Shot Open-Vocabulary Tracking with Large Pre-Trained Models
by: Chu, Wen-Hsuan, et al.
Published: (2023)
Vero: An Open RL Recipe for General Visual Reasoning
by: Sarch, Gabriel, et al.
Published: (2026)
RoboTAG: End-to-end Robot Configuration Estimation via Topological Alignment Graph
by: Liu, Yifan, et al.
Published: (2025)
Video Diffusion Alignment via Reward Gradients
by: Prabhudesai, Mihir, et al.
Published: (2024)
Diffusion Beats Autoregressive in Data-Constrained Settings
by: Prabhudesai, Mihir, et al.
Published: (2025)
Unified Multimodal Discrete Diffusion
by: Swerdlow, Alexander, et al.
Published: (2025)
Grounding Task Assistance with Multimodal Cues from a Single Demonstration
by: Sarch, Gabriel, et al.
Published: (2025)
Out of Sight, Not Out of Context? Egocentric Spatial Reasoning in VLMs Across Disjoint Frames
by: Ravi, Sahithya, et al.
Published: (2025)
Ella: Embodied Social Agents with Lifelong Memory
by: Zhang, Hongxin, et al.
Published: (2025)
Enhancing Video-LLM Reasoning via Agent-of-Thoughts Distillation
by: Shi, Yudi, et al.
Published: (2024)
EMemBench: Interactive Benchmarking of Episodic Memory for VLM Agents
by: Li, Xinze, et al.
Published: (2026)
VLM-KD: Knowledge Distillation from VLM for Long-Tail Visual Recognition
by: Zhang, Zaiwei, et al.
Published: (2024)
FindingDory: A Benchmark to Evaluate Memory in Embodied Agents
by: Yadav, Karmesh, et al.
Published: (2025)
Iterative Refinement Improves Compositional Image Generation
by: Jaiswal, Shantanu, et al.
Published: (2026)
Embodied VideoAgent: Persistent Memory from Egocentric Videos and Embodied Sensors Enables Dynamic Scene Understanding
by: Fan, Yue, et al.
Published: (2024)
AtlasVA: Self-Evolving Visual Skill Memory for Teacher-Free VLM Agents
by: Wang, Pan, et al.
Published: (2026)
BEAT: Visual Backdoor Attacks on VLM-based Embodied Agents via Contrastive Trigger Learning
by: Zhan, Qiusi, et al.
Published: (2025)
Video Depth without Video Models
by: Ke, Bingxin, et al.
Published: (2024)
MoVieS: Motion-Aware 4D Dynamic View Synthesis in One Second
by: Lin, Chenguo, et al.
Published: (2025)
Embodied Tree of Thoughts: Deliberate Manipulation Planning with Embodied World Model
by: Xu, Wenjiang, et al.
Published: (2025)
IS-Bench: Evaluating Interactive Safety of VLM-Driven Embodied Agents in Daily Household Tasks
by: Lu, Xiaoya, et al.
Published: (2025)
Memo: Training Memory-Efficient Embodied Agents with Reinforcement Learning
by: Gupta, Gunshi, et al.
Published: (2025)
Energy-based Models are Zero-Shot Planners for Compositional Scene Rearrangement
by: Gkanatsios, Nikolaos, et al.
Published: (2023)
TimeWarp: Evaluating Web Agents by Revisiting the Past
by: Ishmam, Md Farhan, et al.
Published: (2026)
RoboStressBench: Benchmarking VLM Robustness to Physical Visual Stress in Embodied Scenes
by: Wu, Leyi, et al.
Published: (2026)
HomeGuard: VLM-based Embodied Safeguard for Identifying Contextual Risk in Household Task
by: Lu, Xiaoya, et al.
Published: (2026)