| Main Authors: | Park, Jaeyoo, Chun, Sanghyuk, Kim, Wonjae, Yun, Sangdoo, Han, Bohyung |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2506.19389 |
Similar Items
Probabilistic Language-Image Pre-Training
by: Chun, Sanghyuk, et al.
Published: (2024)
HYPE: Hyperbolic Entailment Filtering for Underspecified Images and Texts
by: Kim, Wonjae, et al.
Published: (2024)
LongProLIP: A Probabilistic Vision-Language Model with Long Context Text
by: Chun, Sanghyuk, et al.
Published: (2025)
Toward Interactive Regional Understanding in Vision-Large Language Models
by: Lee, Jungbeom, et al.
Published: (2024)
Language-only Efficient Training of Zero-shot Composed Image Retrieval
by: Gu, Geonmo, et al.
Published: (2023)
Cross-Class Feature Augmentation for Class Incremental Learning
by: Kim, Taehoon, et al.
Published: (2023)
Hierarchical Visual Feature Aggregation for OCR-Free Document Understanding
by: Park, Jaeyoo, et al.
Published: (2024)
CompoDiff: Versatile Composed Image Retrieval With Latent Diffusion
by: Gu, Geonmo, et al.
Published: (2023)
RoCOCO: Robustness Benchmark of MS-COCO to Stress-test Image-Text Matching Models
by: Park, Seulki, et al.
Published: (2023)
An Efficient Post-hoc Framework for Reducing Task Discrepancy of Text Encoders for Composed Image Retrieval
by: Byun, Jaeseok, et al.
Published: (2024)
ECCV Caption: Correcting False Negatives by Collecting Machine-and-Human-verified Image-Caption Associations for MS-COCO
by: Chun, Sanghyuk, et al.
Published: (2022)
Improved Probabilistic Image-Text Representations
by: Chun, Sanghyuk
Published: (2023)
PhysGaia: A Physics-Aware Benchmark with Multi-Body Interactions for Dynamic Novel View Synthesis
by: Kim, Mijeong, et al.
Published: (2025)
Learning with Unmasked Tokens Drives Stronger Vision Learners
by: Kim, Taekyung, et al.
Published: (2023)
Rotary Position Embedding for Vision Transformer
by: Heo, Byeongho, et al.
Published: (2024)
Token Bottleneck: One Token to Remember Dynamics
by: Kim, Taekyung, et al.
Published: (2025)
Direct Unlearning Optimization for Robust and Safe Text-to-Image Models
by: Park, Yong-Hyun, et al.
Published: (2024)
Multiplicity is an Inevitable and Inherent Challenge in Multimodal Learning
by: Chun, Sanghyuk
Published: (2025)
Map the Flow: Revealing Hidden Pathways of Information in VideoLLMs
by: Kim, Minji, et al.
Published: (2025)
GP-4DGS: Probabilistic 4D Gaussian Splatting from Monocular Video via Variational Gaussian Processes
by: Kim, Mijeong, et al.
Published: (2026)
DNNs May Determine Major Properties of Their Outputs Early, with Timing Possibly Driven by Bias
by: Park, Song, et al.
Published: (2025)
FIFO-Diffusion: Generating Infinite Videos from Text without Training
by: Kim, Jihwan, et al.
Published: (2024)
Towards Calibrated Robust Fine-Tuning of Vision-Language Models
by: Oh, Changdae, et al.
Published: (2023)
Leveraging Temporal Contextualization for Video Action Recognition
by: Kim, Minji, et al.
Published: (2024)
Masking meets Supervision: A Strong Learning Alliance
by: Heo, Byeongho, et al.
Published: (2023)
Model Stock: All we need is just a few fine-tuned models
by: Jang, Dong-Hwan, et al.
Published: (2024)
Learning to See What You Need: Gaze Attention for Multimodal Large Language Models
by: Song, Junha, et al.
Published: (2026)
Read, Watch and Scream! Sound Generation from Text and Video
by: Jeong, Yujin, et al.
Published: (2024)
Match me if you can: Semi-Supervised Semantic Correspondence Learning with Unpaired Images
by: Kim, Jiwon, et al.
Published: (2023)
4D Gaussian Splatting in the Wild with Uncertainty-Aware Regularization
by: Kim, Mijeong, et al.
Published: (2024)
CHURRO: Making History Readable with an Open-Weight Large Vision-Language Model for High-Accuracy, Low-Cost Historical Text Recognition
by: Semnani, Sina J., et al.
Published: (2025)
Mitigating Cross-Image Information Leakage in LVLMs for Multi-Image Tasks
by: Park, Yeji, et al.
Published: (2025)
TextGuider: Training-Free Guidance for Text Rendering via Attention Alignment
by: Baek, Kanghyun, et al.
Published: (2025)
Fine-Grained Captioning of Long Videos through Scene Graph Consolidation
by: Chu, Sanghyeok, et al.
Published: (2025)
Diffusion-Based Conditional Image Editing through Optimized Inference with Guidance
by: Lee, Hyunsoo, et al.
Published: (2024)
Diffusion-Based Image-to-Image Translation by Noise Correction via Prompt Interpolation
by: Lee, Junsung, et al.
Published: (2024)
Aligned Novel View Image and Geometry Synthesis via Cross-modal Attention Instillation
by: Kwak, Min-Seop, et al.
Published: (2025)
ChimeraLoRA: Multi-Head LoRA-Guided Synthetic Datasets
by: Kim, Hoyoung, et al.
Published: (2026)
ODGS: 3D Scene Reconstruction from Omnidirectional Images with 3D Gaussian Splattings
by: Lee, Suyoung, et al.
Published: (2024)
Merge and Bound: Direct Manipulations on Weights for Class Incremental Learning
by: Kim, Taehoon, et al.
Published: (2025)