:: Library Catalog

Bibliographic Details
Main Authors:	Jung, Daeun; Jang, Jaehyeok; Jang, Sooyoung; Park, Yu Rang
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2501.13277

Similar Items

Bidirectional Multimodal Prompt Learning with Scale-Aware Training for Few-Shot Multi-Class Anomaly Detection
by: Lee, Yujin, et al.
Published: (2024)

AVCD: Mitigating Hallucinations in Audio-Visual Large Language Models through Contrastive Decoding
by: Jung, Chaeyoung, et al.
Published: (2025)

ContactGen: Contact-Guided Interactive 3D Human Generation for Partners
by: Gu, Dongjun, et al.
Published: (2024)

I2AM: Interpreting Image-to-Image Latent Diffusion Models via Bi-Attribution Maps
by: Park, Junseo, et al.
Published: (2024)

Learning Relative Representations for Fine-Grained Multimodal Alignment with Limited Data
by: Kim, Shiwon, et al.
Published: (2026)

ZERO: Industry-ready Vision Foundation Model with Multi-modal Prompts
by: Choi, Sangbum, et al.
Published: (2025)

GMAR: Gradient-Driven Multi-Head Attention Rollout for Vision Transformer Interpretability
by: Jo, Sehyeong, et al.
Published: (2025)

MINT: Molecularly Informed Training with Spatial Transcriptomics Supervision for Pathology Foundation Models
by: Lee, Minsoo, et al.
Published: (2026)

Repurposing Geometric Foundation Models for Multi-view Diffusion
by: Jang, Wooseok, et al.
Published: (2026)

Learning Object-Centric Representations in SAR Images with Multi-Level Feature Fusion
by: Jang, Oh-Tae, et al.
Published: (2025)

Model Agnostic Preference Optimization for Medical Image Segmentation
by: Nam, Yunseong, et al.
Published: (2025)

MetaRanker: Human-in-the-loop Active Ranking for Metalens Image Quality
by: Park, Yujin, et al.
Published: (2026)

StyleForge: Enhancing Text-to-Image Synthesis for Any Artistic Styles with Dual Binding
by: Park, Junseo, et al.
Published: (2024)

DITTO: Dual and Integrated Latent Topologies for Implicit 3D Reconstruction
by: Shim, Jaehyeok, et al.
Published: (2024)

PCM: Picard Consistency Model for Fast Parallel Sampling of Diffusion Models
by: So, Junhyuk, et al.
Published: (2025)

A Unified Debiasing Approach for Vision-Language Models across Modalities and Tasks
by: Jung, Hoin, et al.
Published: (2024)

Why Far Looks Up: Probing Spatial Representation in Vision-Language Models
by: Min, Cheolhong, et al.
Published: (2026)

Visual Accommodation: Rethinking Image Scale as a Learnable Variable for Object Detection
by: Seo, Daeun, et al.
Published: (2024)

MedROI: Codec-Agnostic Region of Interest-Centric Compression for Medical Images
by: Kim, Jiwon, et al.
Published: (2026)

ART-VITON: Measurement-Guided Latent Diffusion for Artifact-Free Virtual Try-On
by: Park, Junseo, et al.
Published: (2025)

NViST: In the Wild New View Synthesis from a Single Image with Transformers
by: Jang, Wonbong, et al.
Published: (2023)

Hierarchical Mutual Distillation for Multi-View Fusion: Learning from All Possible View Combinations
by: Yang, Jiwoong, et al.
Published: (2024)

Identity Decoupling for Multi-Subject Personalization of Text-to-Image Models
by: Jang, Sangwon, et al.
Published: (2024)

Decision-Aware Attention Propagation for Vision Transformer Explainability
by: Jo, Sehyeong, et al.
Published: (2026)

Benchmarking DINOv3 for Multi-Task Stroke Analysis on Non-Contrast CT
by: Zhang, Donghao, et al.
Published: (2025)

Fork-Merge Decoding: Enhancing Multimodal Understanding in Audio-Visual Large Language Models
by: Jung, Chaeyoung, et al.
Published: (2025)

Keep it SymPL: Symbolic Projective Layout for Allocentric Spatial Reasoning in Vision-Language Models
by: Jang, Jaeyun, et al.
Published: (2026)

Test-Time Augmentation for Pose-invariant Face Recognition
by: Jung, Jaemin, et al.
Published: (2025)

Progress by Pieces: Test-Time Scaling for Autoregressive Image Generation
by: Park, Joonhyung, et al.
Published: (2025)

Patch-wise Graph Contrastive Learning for Image Translation
by: Jung, Chanyong, et al.
Published: (2023)

Enhancing Continual Learning of Vision-Language Models via Dynamic Prefix Weighting
by: Jang, Hyeonseo, et al.
Published: (2026)

Domain Generalization Using Large Pretrained Models with Mixture-of-Adapters
by: Lee, Gyuseong, et al.
Published: (2023)

TWLV-I: Analysis and Insights from Holistic Evaluation on Video Foundation Models
by: Lee, Hyeongmin, et al.
Published: (2024)

LIDIA: Precise Liver Tumor Diagnosis on Multi-Phase Contrast-Enhanced CT via Iterative Fusion and Asymmetric Contrastive Learning
by: Huang, Wei, et al.
Published: (2024)

Improving Cone-Beam CT Image Quality with Knowledge Distillation-Enhanced Diffusion Model in Imbalanced Data Settings
by: Hwang, Joonil, et al.
Published: (2024)

SplatFlow: Multi-View Rectified Flow Model for 3D Gaussian Splatting Synthesis
by: Go, Hyojun, et al.
Published: (2024)

ChatEXAONEPath: An Expert-level Multimodal Large Language Model for Histopathology Using Whole Slide Images
by: Kim, Sangwook, et al.
Published: (2025)

Finetuning Pre-trained Model with Limited Data for LiDAR-based 3D Object Detection by Bridging Domain Gaps
by: Jang, Jiyun, et al.
Published: (2024)

Descriptive Image-Text Matching with Graded Contextual Similarity
by: Jang, Jinhyun, et al.
Published: (2025)

Q-Drift: Quantization-Aware Drift Correction for Diffusion Model Sampling
by: Ryu, Sooyoung, et al.
Published: (2026)