Library Catalog

Bibliographic Details
Main Authors:	Huang, Weikai, Zhang, Jieyu, Li, Sijun, Jia, Taoyang, Duan, Jiafei, Cheng, Yunqian, Cho, Jaemin, Wallingford, Matthew, Soraki, Rustin, Kim, Chris Dongjoo, Liu, Shuo, Clay, Donovan, Anderson, Taira, Han, Winson, Farhadi, Ali, Hariharan, Bharath, Ren, Zhongzheng, Krishna, Ranjay
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2604.08626

Similar Items

ObjectForesight: Predicting Future 3D Object Trajectories from Human Videos
by: Soraki, Rustin, et al.
Published: (2026)

Synthetic Object Compositions for Scalable and Accurate Learning in Detection, Segmentation, and Grounding
by: Huang, Weikai, et al.
Published: (2025)

Posterior Augmented Flow Matching
by: Stoica, George, et al.
Published: (2026)

CrossFusion: A Multi-Scale Cross-Attention Convolutional Fusion Model for Cancer Survival Prediction
by: Soraki, Rustin, et al.
Published: (2025)

Molmo2: Open Weights and Data for Vision-Language Models with Video Understanding and Grounding
by: Clark, Christopher, et al.
Published: (2026)

SAM 3D Animal: Promptable Animal 3D Reconstruction from Images in the Wild
by: Hu, Xuyi, et al.
Published: (2026)

Generate Any Scene: Scene Graph Driven Data Synthesis for Visual Generation Training
by: Gao, Ziqi, et al.
Published: (2024)

m&m's: A Benchmark to Evaluate Tool-Use for multi-step multi-modal Tasks
by: Ma, Zixian, et al.
Published: (2024)

PathFinder: A Multi-Modal Multi-Agent System for Medical Diagnostic Decision-Making Applied to Histopathology
by: Ghezloo, Fatemeh, et al.
Published: (2025)

MolmoAct2: Action Reasoning Models for Real-world Deployment
by: Fang, Haoquan, et al.
Published: (2026)

Selective Visual Representations Improve Convergence and Generalization for Embodied AI
by: Eftekhar, Ainaz, et al.
Published: (2023)

Task Me Anything
by: Zhang, Jieyu, et al.
Published: (2024)

WildGaussians: 3D Gaussian Splatting in the Wild
by: Kulhanek, Jonas, et al.
Published: (2024)

MolmoWeb: Open Visual Web Agent and Open Data for the Open Web
by: Gupta, Tanmay, et al.
Published: (2026)

WildFusion: Multimodal Implicit 3D Reconstructions in the Wild
by: Liu, Yanbaihui, et al.
Published: (2024)

MolmoPoint: Better Pointing for VLMs with Grounding Tokens
by: Clark, Christopher, et al.
Published: (2026)

MOD-UV: Learning Mobile Object Detectors from Unlabeled Videos
by: Sun, Yihong, et al.
Published: (2024)

WildSeg3D: Segment Any 3D Objects in the Wild from 2D Images
by: Guo, Yansong, et al.
Published: (2025)

WildCAT3D: Appearance-Aware Multi-View Diffusion in the Wild
by: Alper, Morris, et al.
Published: (2025)

Beyond the Frame: Generating 360 Panoramic Videos from Perspective Videos
by: Luo, Rundong, et al.
Published: (2025)

The case for nation states, particularly small ones
by: Michael Rustin
Published: (2025)

Labour: Welfare, Freedom, Virtue, OK but New Thinking Needed
by: Michael Rustin
Published: (2024)

Detect Anything 3D in the Wild
by: Zhang, Hanxue, et al.
Published: (2025)

Counter-Current Learning: A Biologically Plausible Dual Network Approach for Deep Learning
by: Kao, Chia-Hsiang, et al.
Published: (2024)

The One RING: a Robotic Indoor Navigation Generalist
by: Eftekhar, Ainaz, et al.
Published: (2024)

Synthetic Visual Genome 2: Extracting Large-scale Spatio-Temporal Scene Graphs from Videos
by: Gao, Ziqi, et al.
Published: (2026)

SmartWilds: Multimodal Wildlife Monitoring Dataset
by: Kline, Jenna, et al.
Published: (2025)

MolmoAct: Action Reasoning Models that can Reason in Space
by: Lee, Jason, et al.
Published: (2025)

GraspMolmo: Generalizable Task-Oriented Grasping via Large-Scale Synthetic Data Generation
by: Deshpande, Abhay, et al.
Published: (2025)

3D Annotation Of Arbitrary Objects In The Wild
by: Blomqvist, Kenneth, et al.
Published: (2021)

CoSMo3D: Open-World Promptable 3D Semantic Part Segmentation through LLM-Guided Canonical Spatial Modeling
by: Jin, Li, et al.
Published: (2026)

The comedies of Oscar Wilde / Oscar Wilde
by: Wilde, Oscar
Published: (1959)

WildIFEval: Instruction Following in the Wild
by: Lior, Gili, et al.
Published: (2025)

Superposed Decoding: Multiple Generations from a Single Autoregressive Inference Pass
by: Shen, Ethan, et al.
Published: (2024)

Potential for Polynomial Solution for NP-Complete Problems using Quantum Computation
by: Badihian, Neema Rustin
Published: (2025)

Increasing through Workshops the Amount of Time Kindergarten Parents Read to Their Children.
by: Rustin, Terry A.
Published: (1989)

Efficient Vector Search in the Wild: One Model for Multi-K Queries
by: Peng, Yifan, et al.
Published: (2026)

Wild
Published: (2026)

WildSmoke: Ready-to-Use Dynamic 3D Smoke Assets from a Single Video in the Wild
by: Liu, Yuqiu, et al.
Published: (2025)

Iterated Learning Improves Compositionality in Large Vision-Language Models
by: Zheng, Chenhao, et al.
Published: (2024)