| Main Authors: | Kim, Youngmin; Choo, Kyobin; Park, Jiwoo; Kim, Minseo; Kim, Chanyoung; Kim, Junhyeok; Hwang, Seong Jae |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.14705 |
Similar Items
Interpreting Attention Heads for Image-to-Text Information Flow in Large Vision-Language Models
by: Kim, Jinyeong, et al.
Published: (2025)
Disentangling Disentangled Representations: Towards Improved Latent Units via Diffusion Models
by: Jun, Youngjun, et al.
Published: (2024)
Delaunay Canopy: Building Wireframe Reconstruction from Airborne LiDAR Point Clouds via Delaunay Graph
by: Kim, Donghyun, et al.
Published: (2026)
Your Large Vision-Language Model Only Needs A Few Attention Heads For Visual Grounding
by: Kang, Seil, et al.
Published: (2025)
CoBra: Complementary Branch Fusing Class and Semantic Knowledge for Robust Weakly Supervised Semantic Segmentation
by: Han, Woojung, et al.
Published: (2024)
Mono-Modalizing Extremely Heterogeneous Multi-Modal Medical Image Registration
by: Choo, Kyobin, et al.
Published: (2025)
See What You Are Told: Visual Attention Sink in Large Multimodal Models
by: Kang, Seil, et al.
Published: (2025)
Fourier Decomposition for Explicit Representation of 3D Point Cloud Attributes
by: Kim, Donghyun, et al.
Published: (2025)
FEAST: Fully Connected Expressive Attention for Spatial Transcriptomics
by: Jeong, Taejin, et al.
Published: (2026)
EAGLE: Eigen Aggregation Learning for Object-Centric Unsupervised Semantic Segmentation
by: Kim, Chanyoung, et al.
Published: (2024)
Spatial Transport Optimization by Repositioning Attention Map for Training-Free Text-to-Image Synthesis
by: Han, Woojung, et al.
Published: (2025)
Rethinking Graph Convolution for 2D-to-3D Hand Pose Lifting
by: Kim, Chanyoung, et al.
Published: (2026)
Real-Time Visual Attribution Streaming in Thinking Model
by: Kang, Seil, et al.
Published: (2026)
Interpreting vision transformers via residual replacement model
by: Kim, Jinyeong, et al.
Published: (2025)
Advancing Text-Driven Chest X-Ray Generation with Policy-Based Reinforcement Learning
by: Han, Woojung, et al.
Published: (2024)
Pathology-Aware Adaptive Watermarking for Text-Driven Medical Image Synthesis
by: Kim, Chanyoung, et al.
Published: (2025)
Anchoring and Rescaling Attention for Semantically Coherent Inbetweening
by: Choi, Tae Eun, et al.
Published: (2026)
ViKey: Enhancing Temporal Understanding in Videos via Visual Prompting
by: Lee, Yeonkyung, et al.
Published: (2026)
Slice-Consistent 3D Volumetric Brain CT-to-MRI Translation with 2D Brownian Bridge Diffusion Model
by: Choo, Kyobin, et al.
Published: (2024)
Distilling Spectral Graph for Object-Context Aware Open-Vocabulary Semantic Segmentation
by: Kim, Chanyoung, et al.
Published: (2024)
DiffSLT: Enhancing Diversity in Sign Language Translation via Diffusion Model
by: Moon, JiHwan, et al.
Published: (2024)
Leveraging the Power of MLLMs for Gloss-Free Sign Language Translation
by: Kim, Jungeun, et al.
Published: (2024)
PLATYPUS: Progressive Local Surface Estimator for Arbitrary-Scale Point Cloud Upsampling
by: Kim, Donghyun, et al.
Published: (2024)
MonoWAD: Weather-Adaptive Diffusion Model for Robust Monocular 3D Object Detection
by: Oh, Youngmin, et al.
Published: (2024)
Speaking Beyond Language: A Large-Scale Multimodal Dataset for Learning Nonverbal Cues from Video-Grounded Dialogues
by: Kim, Youngmin, et al.
Published: (2025)
FALCON: Frequency Adjoint Link with CONtinuous Density Mask for Fast Single Image Dehazing
by: Kim, Donghyun, et al.
Published: (2024)
WAVE: Warp-Based View Guidance for Consistent Novel View Synthesis Using a Single Image
by: Park, Jiwoo, et al.
Published: (2025)
KRETA: A Benchmark for Korean Reading and Reasoning in Text-Rich VQA Attuned to Diverse Visual Contexts
by: Hwang, Taebaek, et al.
Published: (2025)
LRSLAM: Low-rank Representation of Signed Distance Fields in Dense Visual SLAM System
by: Park, Hongbeen, et al.
Published: (2025)
Why and When Visual Token Pruning Fails? A Study on Relevant Visual Information Shift in MLLMs Decoding
by: Kim, Jiwan, et al.
Published: (2026)
CompoDistill: Attention Distillation for Compositional Reasoning in Multimodal LLMs
by: Kim, Jiwan, et al.
Published: (2025)
Test-Time Training for Visual Foresight Vision-Language-Action Models
by: Park, Sangwu, et al.
Published: (2026)
Weakly Supervised Video Scene Graph Generation via Natural Language Supervision
by: Kim, Kibum, et al.
Published: (2025)
Parameter Efficient Fine Tuning for Multi-scanner PET to PET Reconstruction
by: Kim, Yumin, et al.
Published: (2024)
OVS Meets Continual Learning: Towards Sustainable Open-Vocabulary Segmentation
by: Hwang, Dongjun, et al.
Published: (2024)
EgoSpeak: Learning When to Speak for Egocentric Conversational Agents in the Wild
by: Kim, Junhyeok, et al.
Published: (2025)
RA-SGG: Retrieval-Augmented Scene Graph Generation Framework via Multi-Prototype Learning
by: Yoon, Kanghoon, et al.
Published: (2024)
Cross-Frame Representation Alignment for Fine-Tuning Video Diffusion Models
by: Hwang, Sungwon, et al.
Published: (2025)
OpenFS: Multi-Hand-Capable Fingerspelling Recognition with Implicit Signing-Hand Detection and Frame-Wise Letter-Conditioned Synthesis
by: Cha, Junuk, et al.
Published: (2026)
Training Strategies for Isolated Sign Language Recognition
by: Kvanchiani, Karina, et al.
Published: (2024)