| Main Authors: | Jung, Daeun, Jang, Jaehyeok, Jang, Sooyoung, Park, Yu Rang |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2501.13277 |
Similar Items
Bidirectional Multimodal Prompt Learning with Scale-Aware Training for Few-Shot Multi-Class Anomaly Detection
by: Lee, Yujin, et al.
Published: (2024)
AVCD: Mitigating Hallucinations in Audio-Visual Large Language Models through Contrastive Decoding
by: Jung, Chaeyoung, et al.
Published: (2025)
ContactGen: Contact-Guided Interactive 3D Human Generation for Partners
by: Gu, Dongjun, et al.
Published: (2024)
I2AM: Interpreting Image-to-Image Latent Diffusion Models via Bi-Attribution Maps
by: Park, Junseo, et al.
Published: (2024)
Learning Relative Representations for Fine-Grained Multimodal Alignment with Limited Data
by: Kim, Shiwon, et al.
Published: (2026)
ZERO: Industry-ready Vision Foundation Model with Multi-modal Prompts
by: Choi, Sangbum, et al.
Published: (2025)
GMAR: Gradient-Driven Multi-Head Attention Rollout for Vision Transformer Interpretability
by: Jo, Sehyeong, et al.
Published: (2025)
MINT: Molecularly Informed Training with Spatial Transcriptomics Supervision for Pathology Foundation Models
by: Lee, Minsoo, et al.
Published: (2026)
Repurposing Geometric Foundation Models for Multi-view Diffusion
by: Jang, Wooseok, et al.
Published: (2026)
Learning Object-Centric Representations in SAR Images with Multi-Level Feature Fusion
by: Jang, Oh-Tae, et al.
Published: (2025)
Model Agnostic Preference Optimization for Medical Image Segmentation
by: Nam, Yunseong, et al.
Published: (2025)
MetaRanker: Human-in-the-loop Active Ranking for Metalens Image Quality
by: Park, Yujin, et al.
Published: (2026)
StyleForge: Enhancing Text-to-Image Synthesis for Any Artistic Styles with Dual Binding
by: Park, Junseo, et al.
Published: (2024)
DITTO: Dual and Integrated Latent Topologies for Implicit 3D Reconstruction
by: Shim, Jaehyeok, et al.
Published: (2024)
PCM : Picard Consistency Model for Fast Parallel Sampling of Diffusion Models
by: So, Junhyuk, et al.
Published: (2025)
A Unified Debiasing Approach for Vision-Language Models across Modalities and Tasks
by: Jung, Hoin, et al.
Published: (2024)
Why Far Looks Up: Probing Spatial Representation in Vision-Language Models
by: Min, Cheolhong, et al.
Published: (2026)
Visual Accommodation: Rethinking Image Scale as a Learnable Variable for Object Detection
by: Seo, Daeun, et al.
Published: (2024)
MedROI: Codec-Agnostic Region of Interest-Centric Compression for Medical Images
by: Kim, Jiwon, et al.
Published: (2026)
ART-VITON: Measurement-Guided Latent Diffusion for Artifact-Free Virtual Try-On
by: Park, Junseo, et al.
Published: (2025)
NViST: In the Wild New View Synthesis from a Single Image with Transformers
by: Jang, Wonbong, et al.
Published: (2023)
Hierarchical Mutual Distillation for Multi-View Fusion: Learning from All Possible View Combinations
by: Yang, Jiwoong, et al.
Published: (2024)
Identity Decoupling for Multi-Subject Personalization of Text-to-Image Models
by: Jang, Sangwon, et al.
Published: (2024)
Decision-Aware Attention Propagation for Vision Transformer Explainability
by: Jo, Sehyeong, et al.
Published: (2026)
Benchmarking DINOv3 for Multi-Task Stroke Analysis on Non-Contrast CT
by: Zhang, Donghao, et al.
Published: (2025)
Fork-Merge Decoding: Enhancing Multimodal Understanding in Audio-Visual Large Language Models
by: Jung, Chaeyoung, et al.
Published: (2025)
Keep it SymPL: Symbolic Projective Layout for Allocentric Spatial Reasoning in Vision-Language Models
by: Jang, Jaeyun, et al.
Published: (2026)
Test-Time Augmentation for Pose-invariant Face Recognition
by: Jung, Jaemin, et al.
Published: (2025)
Progress by Pieces: Test-Time Scaling for Autoregressive Image Generation
by: Park, Joonhyung, et al.
Published: (2025)
Patch-wise Graph Contrastive Learning for Image Translation
by: Jung, Chanyong, et al.
Published: (2023)
Enhancing Continual Learning of Vision-Language Models via Dynamic Prefix Weighting
by: Jang, Hyeonseo, et al.
Published: (2026)
Domain Generalization Using Large Pretrained Models with Mixture-of-Adapters
by: Lee, Gyuseong, et al.
Published: (2023)
TWLV-I: Analysis and Insights from Holistic Evaluation on Video Foundation Models
by: Lee, Hyeongmin, et al.
Published: (2024)
LIDIA: Precise Liver Tumor Diagnosis on Multi-Phase Contrast-Enhanced CT via Iterative Fusion and Asymmetric Contrastive Learning
by: Huang, Wei, et al.
Published: (2024)
Improving Cone-Beam CT Image Quality with Knowledge Distillation-Enhanced Diffusion Model in Imbalanced Data Settings
by: Hwang, Joonil, et al.
Published: (2024)
SplatFlow: Multi-View Rectified Flow Model for 3D Gaussian Splatting Synthesis
by: Go, Hyojun, et al.
Published: (2024)
ChatEXAONEPath: An Expert-level Multimodal Large Language Model for Histopathology Using Whole Slide Images
by: Kim, Sangwook, et al.
Published: (2025)
Finetuning Pre-trained Model with Limited Data for LiDAR-based 3D Object Detection by Bridging Domain Gaps
by: Jang, Jiyun, et al.
Published: (2024)
Descriptive Image-Text Matching with Graded Contextual Similarity
by: Jang, Jinhyun, et al.
Published: (2025)
Q-Drift: Quantization-Aware Drift Correction for Diffusion Model Sampling
by: Ryu, Sooyoung, et al.
Published: (2026)