:: Library Catalog

Bibliographic Details
Main Authors:	Li, Longfei; Fan, Zhiwen; Cong, Wenyan; Liu, Xinhang; Yin, Yuyang; Foutter, Matt; Pan, Panwang; You, Chenyu; Wang, Yue; Wang, Zhangyang; Zhao, Yao; Pavone, Marco; Wei, Yunchao
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2507.07978

Similar Items

E3D-Bench: A Benchmark for End-to-End 3D Geometric Foundation Models
by: Cong, Wenyan, et al.
Published: (2025)

4DGen: Grounded 4D Content Generation with Spatial-temporal Consistency
by: Yin, Yuyang, et al.
Published: (2023)

InstantSplat: Sparse-view Gaussian Splatting in Seconds
by: Fan, Zhiwen, et al.
Published: (2024)

Can Test-Time Scaling Improve World Foundation Model?
by: Cong, Wenyan, et al.
Published: (2025)

VideoLifter: Lifting Videos to 3D with Fast Hierarchical Stereo Alignment
by: Cong, Wenyan, et al.
Published: (2025)

Large Spatial Model: End-to-end Unposed Images to Semantic 3D
by: Fan, Zhiwen, et al.
Published: (2024)

RoboMonkey: Scaling Test-Time Sampling and Verification for Vision-Language-Action Models
by: Kwok, Jacky, et al.
Published: (2025)

StereoWorld: Geometry-Aware Monocular-to-Stereo Video Generation
by: Xing, Ke, et al.
Published: (2025)

Real-Time Anomaly Detection and Reactive Planning with Large Language Models
by: Sinha, Rohan, et al.
Published: (2024)

Diffusion4D: Fast Spatial-temporal Consistent 4D Generation via Video Diffusion Models
by: Liang, Hanwen, et al.
Published: (2024)

Learning Traffic Crashes as Language: Datasets, Benchmarks, and What-if Causal Analyses
by: Fan, Zhiwen, et al.
Published: (2024)

FSGS: Real-Time Few-shot View Synthesis using Gaussian Splatting
by: Zhu, Zehao, et al.
Published: (2023)

Realistic Extreme Behavior Generation for Improved AV Testing
by: Dyro, Robert, et al.
Published: (2024)

Vision Foundation Model Embedding-Based Semantic Anomaly Detection
by: Ronecker, Max Peter, et al.
Published: (2025)

PanoWorld-X: Generating Explorable Panoramic Worlds via Sphere-Aware Video Diffusion
by: Yin, Yuyang, et al.
Published: (2025)

4K4DGen: Panoramic 4D Generation at 4K Resolution
by: Li, Renjie, et al.
Published: (2024)

DynamicVerse: A Physically-Aware Multimodal Framework for 4D World Modeling
by: Wen, Kairun, et al.
Published: (2025)

SpatialTree: How Spatial Abilities Branch Out in MLLMs
by: Xiao, Yuxi, et al.
Published: (2025)

Space-LLaVA: a Vision-Language Model Adapted to Extraterrestrial Applications
by: Foutter, Matthew, et al.
Published: (2024)

ReachBot Field Tests in a Mojave Desert Lava Tube as a Martian Analog
by: Chen, Tony G., et al.
Published: (2024)

Egocentric World Model for Photorealistic Hand-Object Interaction Synthesis
by: Li, Dayou, et al.
Published: (2026)

Martian Exploration of Lava Tubes (MELT) with ReachBot: Scientific Investigation and Concept of Operations
by: Di, Julia, et al.
Published: (2024)

INR-Arch: A Dataflow Architecture and Compiler for Arbitrary-Order Gradient Computations in Implicit Neural Representation Processing
by: Abi-Karam, Stefan, et al.
Published: (2023)

CIPHER: Culvert Inspection through Pairwise Frame Selection and High-Efficiency Reconstruction
by: Lee, Seoyoung, et al.
Published: (2026)

LLM-AutoDiff: Auto-Differentiate Any LLM Workflow
by: Yin, Li, et al.
Published: (2025)

PlainQAFact: Retrieval-augmented Factual Consistency Evaluation Metric for Biomedical Plain Language Summarization
by: You, Zhiwen, et al.
Published: (2025)

GaussianStego: A Generalizable Stenography Pipeline for Generative 3D Gaussians Splatting
by: Li, Chenxin, et al.
Published: (2024)

CryoFastAR: Fast Cryo-EM Ab Initio Reconstruction Made Easy
by: Zhang, Jiakai, et al.
Published: (2025)

Expressive Gaussian Human Avatars from Monocular RGB Video
by: Hu, Hezhen, et al.
Published: (2024)

Extrapolated Urban View Synthesis Benchmark
by: Han, Xiangyu, et al.
Published: (2024)

InstructLayout: Instruction-Driven 2D and 3D Layout Synthesis with Semantic Graph Prior
by: Lin, Chenguo, et al.
Published: (2024)

PACE: Pacing Operator Learning to Accurate Optical Field Simulation for Complicated Photonic Devices
by: Zhu, Hanqing, et al.
Published: (2024)

Feature4X: Bridging Any Monocular Video to 4D Agentic AI with Versatile Gaussian Feature Fields
by: Zhou, Shijie, et al.
Published: (2025)

Scale Where It Matters: Training-Free Localized Scaling for Diffusion Models
by: Ren, Qin, et al.
Published: (2025)

A Stabilized High‐Order Spectral Model With Adaptive Residual‐Based Artificial Viscosity for Fully‐Nonlinear Free‐Surface Flow
by: Cong, Longfei, et al.
Published: (2025)

InfoAffect: Affective Annotations of Infographics in Information Spread
by: Fu, Zihang, et al.
Published: (2025)

Enhance-A-Video: Better Generated Video for Free
by: Luo, Yang, et al.
Published: (2025)

APOLLO: SGD-like Memory, AdamW-level Performance
by: Zhu, Hanqing, et al.
Published: (2024)

HumanCrafter: Synergizing Generalizable Human Reconstruction and Semantic 3D Segmentation
by: Pan, Panwang, et al.
Published: (2025)

Characterizing the current systems in the Martian ionosphere
by: Gao, Jiawei, et al.
Published: (2024)