:: Library Catalog

Bibliographic Details
Main Authors:	Wang, Renhao; Geng, Haoran; Li, Tingle; Wang, Feishi; Anumanchipalli, Gopala; Darrell, Trevor; Li, Boyi; Abbeel, Pieter; Malik, Jitendra; Efros, Alexei A.
Format:	Preprint
Published:	2025
Subjects:	Robotics; Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2507.02864

Similar Items

Self-Supervised Audio-Visual Soundscape Stylization
by: Li, Tingle, et al.
Published: (2024)

Audio Texture Manipulation by Exemplar-Based Analogy
by: Cheng, Kan Jen, et al.
Published: (2025)

Prioritized Generative Replay
by: Wang, Renhao, et al.
Published: (2024)

Interactive Task Planning with Language Models
by: Li, Boyi, et al.
Published: (2023)

Sounding that Object: Interactive Object-Aware Image to Audio Generation
by: Li, Tingle, et al.
Published: (2025)

ViTacFormer: Learning Cross-Modal Representation for Visuo-Tactile Dexterous Manipulation
by: Heng, Liang, et al.
Published: (2025)

Rodrigues Network for Learning Robot Actions
by: Zhang, Jialiang, et al.
Published: (2025)

Synthesizing Moving People with 3D Control
by: Li, Boyi, et al.
Published: (2024)

D-REX: Differentiable Real-to-Sim-to-Real Engine for Learning Dexterous Grasping
by: Lou, Haozhe, et al.
Published: (2026)

DexGarmentLab: Dexterous Garment Manipulation Environment with Generalizable Policy
by: Wang, Yuran, et al.
Published: (2025)

Rethinking Patch Dependence for Masked Autoencoders
by: Fu, Letian, et al.
Published: (2024)

SkillBlender: Towards Versatile Humanoid Whole-Body Loco-Manipulation via Skill Blending
by: Kuang, Yuxuan, et al.
Published: (2025)

Deep Sensorimotor Control by Imitating Predictive Models of Human Motion
by: Singh, Himanshu Gaurav, et al.
Published: (2025)

DIPOLE: Fusing Vision and Geometry for Robust Visuomotor Generalization
by: Tang, Yikai, et al.
Published: (2025)

Multi-Objective Learning for Diffusion Models: A Statistical Theory under Semi-Supervised Learning
by: Cheng, Ziheng, et al.
Published: (2026)

StyleStream: Real-Time Zero-Shot Voice Style Conversion
by: Liu, Yisi, et al.
Published: (2026)

End-to-end RL Improves Dexterous Grasping Policies
by: Singh, Ritvik, et al.
Published: (2025)

Full-Duplex-Bench: A Benchmark to Evaluate Full-duplex Spoken Dialogue Models on Turn-taking Capabilities
by: Lin, Guan-Ting, et al.
Published: (2025)

Learning Humanoid Locomotion over Challenging Terrain
by: Radosavovic, Ilija, et al.
Published: (2024)

Re-evaluating the Need for Multimodal Signals in Unsupervised Grammar Induction
by: Li, Boyi, et al.
Published: (2022)

Towards Hierarchical Spoken Language Dysfluency Modeling
by: Lian, Jiachen, et al.
Published: (2024)

Large Video Planner Enables Generalizable Robot Control
by: Chen, Boyuan, et al.
Published: (2025)

It's Never Too Late: Noise Optimization for Collapse Recovery in Trained Diffusion Models
by: Harrington, Anne, et al.
Published: (2025)

Closing the Visual Sim-to-Real Gap with Object-Composable NeRFs
by: Mishra, Nikhil, et al.
Published: (2024)

How to Peel with a Knife: Aligning Fine-Grained Manipulation with Human Preference
by: Lin, Toru, et al.
Published: (2026)

Twisting Lids Off with Two Hands
by: Lin, Toru, et al.
Published: (2024)

Visual Imitation Enables Contextual Humanoid Control
by: Allshire, Arthur, et al.
Published: (2025)

RT-VC: Real-Time Zero-Shot Voice Conversion with Speech Articulatory Coding
by: Liu, Yisi, et al.
Published: (2025)

xT: Nested Tokenization for Larger Context in Large Images
by: Gupta, Ritwik, et al.
Published: (2024)

LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models
by: Lian, Long, et al.
Published: (2023)

Learning Sim-to-Real Humanoid Locomotion in 15 Minutes
by: Seo, Younggyo, et al.
Published: (2025)

Schrodinger Audio-Visual Editor: Object-Level Audiovisual Removal
by: Xu, Weihan, et al.
Published: (2025)

From Generated Human Videos to Physically Plausible Robot Trajectories
by: Ni, James, et al.
Published: (2025)

A Unified Framework for Model Editing
by: Gupta, Akshat, et al.
Published: (2024)

Is Bigger Edit Batch Size Always Better? -- An Empirical Study on Model Editing with Llama-3
by: Yoon, Junsang, et al.
Published: (2024)

Rebuilding ROME : Resolving Model Collapse during Sequential Model Editing
by: Gupta, Akshat, et al.
Published: (2024)

Geometric Interpretation of Layer Normalization and a Comparative Analysis with RMSNorm
by: Gupta, Akshat, et al.
Published: (2024)

Model Editing at Scale leads to Gradual and Catastrophic Forgetting
by: Gupta, Akshat, et al.
Published: (2024)

Self-Assessment Tests are Unreliable Measures of LLM Personality
by: Gupta, Akshat, et al.
Published: (2023)

Multimodal Segmentation for Vocal Tract Modeling
by: Jain, Rishi, et al.
Published: (2024)