Library Catalog

Bibliographic Details
Main Authors:	Kim, Youngmin; Choo, Kyobin; Park, Jiwoo; Kim, Minseo; Kim, Chanyoung; Kim, Junhyeok; Hwang, Seong Jae
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2605.14705

Similar Items

Interpreting Attention Heads for Image-to-Text Information Flow in Large Vision-Language Models
by: Kim, Jinyeong, et al.
Published: (2025)

Disentangling Disentangled Representations: Towards Improved Latent Units via Diffusion Models
by: Jun, Youngjun, et al.
Published: (2024)

Delaunay Canopy: Building Wireframe Reconstruction from Airborne LiDAR Point Clouds via Delaunay Graph
by: Kim, Donghyun, et al.
Published: (2026)

Your Large Vision-Language Model Only Needs A Few Attention Heads For Visual Grounding
by: Kang, Seil, et al.
Published: (2025)

CoBra: Complementary Branch Fusing Class and Semantic Knowledge for Robust Weakly Supervised Semantic Segmentation
by: Han, Woojung, et al.
Published: (2024)

Mono-Modalizing Extremely Heterogeneous Multi-Modal Medical Image Registration
by: Choo, Kyobin, et al.
Published: (2025)

See What You Are Told: Visual Attention Sink in Large Multimodal Models
by: Kang, Seil, et al.
Published: (2025)

Fourier Decomposition for Explicit Representation of 3D Point Cloud Attributes
by: Kim, Donghyun, et al.
Published: (2025)

FEAST: Fully Connected Expressive Attention for Spatial Transcriptomics
by: Jeong, Taejin, et al.
Published: (2026)

EAGLE: Eigen Aggregation Learning for Object-Centric Unsupervised Semantic Segmentation
by: Kim, Chanyoung, et al.
Published: (2024)

Spatial Transport Optimization by Repositioning Attention Map for Training-Free Text-to-Image Synthesis
by: Han, Woojung, et al.
Published: (2025)

Rethinking Graph Convolution for 2D-to-3D Hand Pose Lifting
by: Kim, Chanyoung, et al.
Published: (2026)

Real-Time Visual Attribution Streaming in Thinking Model
by: Kang, Seil, et al.
Published: (2026)

Interpreting vision transformers via residual replacement model
by: Kim, Jinyeong, et al.
Published: (2025)

Advancing Text-Driven Chest X-Ray Generation with Policy-Based Reinforcement Learning
by: Han, Woojung, et al.
Published: (2024)

Pathology-Aware Adaptive Watermarking for Text-Driven Medical Image Synthesis
by: Kim, Chanyoung, et al.
Published: (2025)

Anchoring and Rescaling Attention for Semantically Coherent Inbetweening
by: Choi, Tae Eun, et al.
Published: (2026)

ViKey: Enhancing Temporal Understanding in Videos via Visual Prompting
by: Lee, Yeonkyung, et al.
Published: (2026)

Slice-Consistent 3D Volumetric Brain CT-to-MRI Translation with 2D Brownian Bridge Diffusion Model
by: Choo, Kyobin, et al.
Published: (2024)

Distilling Spectral Graph for Object-Context Aware Open-Vocabulary Semantic Segmentation
by: Kim, Chanyoung, et al.
Published: (2024)

DiffSLT: Enhancing Diversity in Sign Language Translation via Diffusion Model
by: Moon, JiHwan, et al.
Published: (2024)

Leveraging the Power of MLLMs for Gloss-Free Sign Language Translation
by: Kim, Jungeun, et al.
Published: (2024)

PLATYPUS: Progressive Local Surface Estimator for Arbitrary-Scale Point Cloud Upsampling
by: Kim, Donghyun, et al.
Published: (2024)

MonoWAD: Weather-Adaptive Diffusion Model for Robust Monocular 3D Object Detection
by: Oh, Youngmin, et al.
Published: (2024)

Speaking Beyond Language: A Large-Scale Multimodal Dataset for Learning Nonverbal Cues from Video-Grounded Dialogues
by: Kim, Youngmin, et al.
Published: (2025)

FALCON: Frequency Adjoint Link with CONtinuous Density Mask for Fast Single Image Dehazing
by: Kim, Donghyun, et al.
Published: (2024)

WAVE: Warp-Based View Guidance for Consistent Novel View Synthesis Using a Single Image
by: Park, Jiwoo, et al.
Published: (2025)

KRETA: A Benchmark for Korean Reading and Reasoning in Text-Rich VQA Attuned to Diverse Visual Contexts
by: Hwang, Taebaek, et al.
Published: (2025)

LRSLAM: Low-rank Representation of Signed Distance Fields in Dense Visual SLAM System
by: Park, Hongbeen, et al.
Published: (2025)

Why and When Visual Token Pruning Fails? A Study on Relevant Visual Information Shift in MLLMs Decoding
by: Kim, Jiwan, et al.
Published: (2026)

CompoDistill: Attention Distillation for Compositional Reasoning in Multimodal LLMs
by: Kim, Jiwan, et al.
Published: (2025)

Test-Time Training for Visual Foresight Vision-Language-Action Models
by: Park, Sangwu, et al.
Published: (2026)

Weakly Supervised Video Scene Graph Generation via Natural Language Supervision
by: Kim, Kibum, et al.
Published: (2025)

Parameter Efficient Fine Tuning for Multi-scanner PET to PET Reconstruction
by: Kim, Yumin, et al.
Published: (2024)

OVS Meets Continual Learning: Towards Sustainable Open-Vocabulary Segmentation
by: Hwang, Dongjun, et al.
Published: (2024)

EgoSpeak: Learning When to Speak for Egocentric Conversational Agents in the Wild
by: Kim, Junhyeok, et al.
Published: (2025)

RA-SGG: Retrieval-Augmented Scene Graph Generation Framework via Multi-Prototype Learning
by: Yoon, Kanghoon, et al.
Published: (2024)

Cross-Frame Representation Alignment for Fine-Tuning Video Diffusion Models
by: Hwang, Sungwon, et al.
Published: (2025)

OpenFS: Multi-Hand-Capable Fingerspelling Recognition with Implicit Signing-Hand Detection and Frame-Wise Letter-Conditioned Synthesis
by: Cha, Junuk, et al.
Published: (2026)

Training Strategies for Isolated Sign Language Recognition
by: Kvanchiani, Karina, et al.
Published: (2024)