Library Catalog

Bibliographic Details
Main Authors:	Gu, Jeffrey; Jeon, Minkyu; Ma, Ambri; Yeung-Levy, Serena; Zhong, Ellen D.
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2512.06332

Similar Items

CryoBench: Diverse and challenging datasets for the heterogeneity problem in cryo-EM
by: Jeon, Minkyu, et al.
Published: (2024)

Foundation Models Secretly Understand Neural Network Weights: Enhancing Hypernetwork Architectures with Foundation Models
by: Gu, Jeffrey, et al.
Published: (2025)

Downscaling Intelligence: Exploring Perception and Reasoning Bottlenecks in Small Multimodal Models
by: Endo, Mark, et al.
Published: (2025)

Cryo-forum: A framework for orientation recovery with uncertainty measure with the application in cryo-EM image analysis
by: Chung, Szu-Chi
Published: (2023)

Continuous Perception Matters: Diagnosing Temporal Integration Failures in Multimodal Models
by: Wang, Zeyu, et al.
Published: (2024)

Zero-shot Action Localization via the Confidence of Large Vision-Language Models
by: Aklilu, Josiah, et al.
Published: (2024)

Multi-Human Mesh Recovery with Transformers
by: Wang, Zeyu, et al.
Published: (2024)

Feather the Throttle: Revisiting Visual Token Pruning for Vision-Language Model Acceleration
by: Endo, Mark, et al.
Published: (2024)

Revisiting Active Learning in the Era of Vision Foundation Models
by: Gupte, Sanket Rajan, et al.
Published: (2024)

Diffusion-HPC: Synthetic Data Generation for Human Mesh Recovery in Challenging Domains
by: Weng, Zhenzhen, et al.
Published: (2023)

CryoSplat: Gaussian Splatting for Cryo-EM Homogeneous Reconstruction
by: Chen, Suyi, et al.
Published: (2025)

CryoFastAR: Fast Cryo-EM Ab Initio Reconstruction Made Easy
by: Zhang, Jiakai, et al.
Published: (2025)

The Impact of Image Resolution on Biomedical Multimodal Large Language Models
by: Chen, Liangyu, et al.
Published: (2025)

Just Shift It: Test-Time Prototype Shifting for Zero-Shot Generalization with Vision-Language Models
by: Sui, Elaine, et al.
Published: (2024)

DeforHMR: Vision Transformer with Deformable Cross-Attention for 3D Human Mesh Recovery
by: Heo, Jaewoo, et al.
Published: (2024)

Connect, Collapse, Corrupt: Learning Cross-Modal Tasks with Uni-Modal Data
by: Zhang, Yuhui, et al.
Published: (2024)

Multiscale guidance of protein structure prediction with heterogeneous cryo-EM data
by: Raghu, Rishwanth, et al.
Published: (2025)

Motion Diffusion-Guided 3D Global HMR from a Dynamic Camera
by: Heo, Jaewoo, et al.
Published: (2024)

Viewpoint Textual Inversion: Discovering Scene Representations and 3D View Control in 2D Diffusion Models
by: Burgess, James, et al.
Published: (2023)

Protein Graph Neural Networks for Heterogeneous Cryo-EM Reconstruction
by: Krook, Jonathan, et al.
Published: (2026)

CryoSPIN: Improving Ab-Initio Cryo-EM Reconstruction with Semi-Amortized Pose Inference
by: Shekarforoush, Shayan, et al.
Published: (2024)

DRACO: A Denoising-Reconstruction Autoencoder for Cryo-EM
by: Shen, Yingjun, et al.
Published: (2024)

Robust single-particle cryo-EM image denoising and restoration
by: Zhang, Jing, et al.
Published: (2024)

CHIMERA: Adaptive Cache Injection and Semantic Anchor Prompting for Zero-shot Image Morphing with Morphing-oriented Metrics
by: Kye, Dahyeon, et al.
Published: (2025)

Depth-guided NeRF Training via Earth Mover's Distance
by: Rau, Anita, et al.
Published: (2024)

VideoAgent: Long-form Video Understanding with Large Language Model as Agent
by: Wang, Xiaohan, et al.
Published: (2024)

μ-Bench: A Vision-Language Benchmark for Microscopy Understanding
by: Lozano, Alejandro, et al.
Published: (2024)

Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision
by: Zohar, Orr, et al.
Published: (2024)

Seeing Is Believing? A Benchmark for Multimodal Large Language Models on Visual Illusions and Anomalies
by: Hou, Wenjin, et al.
Published: (2026)

NegVQA: Can Vision Language Models Understand Negation?
by: Zhang, Yuhui, et al.
Published: (2025)

R3eVision: A Survey on Robust Rendering, Restoration, and Enhancement for 3D Low-Level Vision
by: Kwon, Weeyoung, et al.
Published: (2025)

Fine-tuning MLLMs Without Forgetting Is Easier Than You Think
by: Li, He, et al.
Published: (2026)

Anny-Fit: All-Age Human Mesh Recovery
by: Bravo-Sánchez, Laura, et al.
Published: (2026)

Resolving compositional and conformational heterogeneity in cryo-EM with deformable 3D Gaussian representations
by: He, Bintao, et al.
Published: (2025)

CryoMAE: Few-Shot Cryo-EM Particle Picking with Masked Autoencoders
by: Xu, Chentianye, et al.
Published: (2024)

V-GRPO: Online Reinforcement Learning for Denoising Generative Models Is Easier than You Think
by: Tang, Bingda, et al.
Published: (2026)

GEM: 3D Gaussian Splatting for Efficient and Accurate Cryo-EM Reconstruction
by: Qu, Huaizhi, et al.
Published: (2025)

Template-Free Single-View 3D Human Digitalization with Diffusion-Guided LRM
by: Weng, Zhenzhen, et al.
Published: (2024)

Transductive Visual Programming: Evolving Tool Libraries from Experience for Spatial Reasoning
by: Wu, Shengguang, et al.
Published: (2025)

Data or Language Supervision: What Makes CLIP Better than DINO?
by: Liu, Yiming, et al.
Published: (2025)