| Main Authors: | Jeon, Byungwoo; Jeong, Yoonwoo; Lee, Hyunseok; Cho, Minsu; Shin, Jinwoo |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.04476 |
Similar Items
Cog3DMap: Multi-View Vision-Language Reasoning with 3D Cognitive Maps
by: Gwak, Chanyoung, et al.
Published: (2026)
NVS-Adapter: Plug-and-Play Novel View Synthesis from a Single Image
by: Jeong, Yoonwoo, et al.
Published: (2023)
SpatialBoost: Enhancing Visual Representation through Language-Guided Reasoning
by: Jeon, Byungwoo, et al.
Published: (2026)
RoDyGS: Robust Dynamic Gaussian Splatting for Casual Videos
by: Lee, Junmyeong, et al.
Published: (2024)
Quantile Rendering: Efficiently Embedding High-dimensional Feature on 3D Gaussian Splatting
by: Jeong, Yoonwoo, et al.
Published: (2025)
MV-SAM: Multi-view Promptable Segmentation using Pointmap Guidance
by: Jeong, Yoonwoo, et al.
Published: (2026)
Video Summarization with Large Language Models
by: Lee, Min Jung, et al.
Published: (2025)
uCLIP: Parameter-Efficient Multilingual Extension of Vision-Language Models with Unpaired Data
by: Chung, Dahyun, et al.
Published: (2025)
DextER: Language-driven Dexterous Grasp Generation with Embodied Reasoning
by: Lee, Junha, et al.
Published: (2026)
Dual-Stream Diffusion for World-Model Augmented Vision-Language-Action Model
by: Won, John, et al.
Published: (2025)
Phantom of Latent for Large Language and Vision Models
by: Lee, Byung-Kwan, et al.
Published: (2024)
Similarity-Aware Selective State-Space Modeling for Semantic Correspondence
by: Kim, Seungwook, et al.
Published: (2025)
Hierarchically Structured Neural Bones for Reconstructing Animatable Objects from Casual Videos
by: Jeon, Subin, et al.
Published: (2024)
Latent Space Super-Resolution for Higher-Resolution Image Generation with Diffusion Models
by: Jeong, Jinho, et al.
Published: (2025)
Part-Aware Bottom-Up Group Reasoning for Fine-Grained Social Interaction Detection
by: Kim, Dongkeun, et al.
Published: (2025)
Planning in 8 Tokens: A Compact Discrete Tokenizer for Latent World Model
by: Kim, Dongwon, et al.
Published: (2026)
Decoupled MeanFlow: Turning Flow Models into Flow Maps for Accelerated Sampling
by: Lee, Kyungmin, et al.
Published: (2025)
Self-Consistent Latent Reasoning: Long Latent Sequence Reasoning for Vision-Language Model
by: Wang, Chenfeng, et al.
Published: (2026)
SPARK: Multi-Vision Sensor Perception and Reasoning Benchmark for Large-scale Vision-Language Models
by: Yu, Youngjoon, et al.
Published: (2024)
HAMLET: Switch your Vision-Language-Action Model into a History-Aware Policy
by: Koo, Myungkyu, et al.
Published: (2025)
Affostruction: 3D Affordance Grounding with Generative Reconstruction
by: Park, Chunghyun, et al.
Published: (2026)
Harnessing the Power of Training-Free Techniques in Text-to-2D Generation for Text-to-3D Generation via Score Distillation Sampling
by: Lee, Junhong, et al.
Published: (2025)
MultihopSpatial: Multi-hop Compositional Spatial Reasoning Benchmark for Vision-Language Model
by: Lee, Youngwan, et al.
Published: (2026)
ExploreGS: Explorable 3D Scene Reconstruction with Virtual Camera Samplings and Diffusion Priors
by: Kim, Minsu, et al.
Published: (2025)
PanoGrounder: Bridging 2D and 3D with Panoramic Scene Representations for VLM-based 3D Visual Grounding
by: Jung, Seongmin, et al.
Published: (2025)
Pretraining Vision-Language Model for Difference Visual Question Answering in Longitudinal Chest X-rays
by: Cho, Yeongjae, et al.
Published: (2024)
Restoration-Aligned Generative Flow Models for Blind Motion Deblurring
by: Kim, Insoo, et al.
Published: (2026)
Multi-modal Attribute Prompting for Vision-Language Models
by: Liu, Xin, et al.
Published: (2024)
Learning Correlation Structures for Vision Transformers
by: Kim, Manjin, et al.
Published: (2024)
Learning Multi-frame and Monocular Prior for Estimating Geometry in Dynamic Scenes
by: Park, Seong Hyeon, et al.
Published: (2025)
Expanding the Boundaries of Vision Prior Knowledge in Multi-modal Large Language Models
by: Liang, Qiao, et al.
Published: (2025)
Adversarial Robustification via Text-to-Image Diffusion Models
by: Choi, Daewon, et al.
Published: (2024)
MORDA: A Synthetic Dataset to Facilitate Adaptation of Object Detectors to Unseen Real-target Domain While Preserving Performance on Real-source Domain
by: Lim, Hojun, et al.
Published: (2025)
Improving Multi-modal Large Language Model through Boosting Vision Capabilities
by: Sun, Yanpeng, et al.
Published: (2024)
Explaining Multi-modal Large Language Models by Analyzing their Vision Perception
by: Giulivi, Loris, et al.
Published: (2024)
StarFT: Robust Fine-tuning of Zero-shot Models via Spuriosity Alignment
by: Kim, Younghyun, et al.
Published: (2025)
In Defense of Lazy Visual Grounding for Open-Vocabulary Semantic Segmentation
by: Kang, Dahyun, et al.
Published: (2024)
Leveraging 3D Geometric Priors in 2D Rotation Symmetry Detection
by: Seo, Ahyun, et al.
Published: (2025)
Space-Time Forecasting of Dynamic Scenes with Motion-aware Gaussian Grouping
by: Lee, Junmyeong, et al.
Published: (2026)
FreeAction: Training-Free Techniques for Enhanced Fidelity of Trajectory-to-Video Generation
by: Kim, Seungwook, et al.
Published: (2025)