:: Library Catalog

Bibliographic Details
Main Authors:	Chun, Sanghyuk; Kim, Wonjae; Park, Song; Yun, Sangdoo
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition; Machine Learning
Online Access:	https://arxiv.org/abs/2410.18857

Similar Items

LongProLIP: A Probabilistic Vision-Language Model with Long Context Text
by: Chun, Sanghyuk, et al.
Published: (2025)

Language-only Efficient Training of Zero-shot Composed Image Retrieval
by: Gu, Geonmo, et al.
Published: (2023)

Emergence of Text Readability in Vision Language Models
by: Park, Jaeyoo, et al.
Published: (2025)

HYPE: Hyperbolic Entailment Filtering for Underspecified Images and Texts
by: Kim, Wonjae, et al.
Published: (2024)

Improved Probabilistic Image-Text Representations
by: Chun, Sanghyuk
Published: (2023)

CompoDiff: Versatile Composed Image Retrieval With Latent Diffusion
by: Gu, Geonmo, et al.
Published: (2023)

Toward Interactive Regional Understanding in Vision-Large Language Models
by: Lee, Jungbeom, et al.
Published: (2024)

Multiplicity is an Inevitable and Inherent Challenge in Multimodal Learning
by: Chun, Sanghyuk
Published: (2025)

ECCV Caption: Correcting False Negatives by Collecting Machine-and-Human-verified Image-Caption Associations for MS-COCO
by: Chun, Sanghyuk, et al.
Published: (2022)

Rotary Position Embedding for Vision Transformer
by: Heo, Byeongho, et al.
Published: (2024)

DNNs May Determine Major Properties of Their Outputs Early, with Timing Possibly Driven by Bias
by: Park, Song, et al.
Published: (2025)

DaWin: Training-free Dynamic Weight Interpolation for Robust Adaptation
by: Oh, Changdae, et al.
Published: (2024)

Match me if you can: Semi-Supervised Semantic Correspondence Learning with Unpaired Images
by: Kim, Jiwon, et al.
Published: (2023)

Masking meets Supervision: A Strong Learning Alliance
by: Heo, Byeongho, et al.
Published: (2023)

STELLA: Continual Audio-Video Pre-training with Spatio-Temporal Localized Alignment
by: Lee, Jaewoo, et al.
Published: (2023)

Similarity of Neural Architectures using Adversarial Attack Transferability
by: Hwang, Jaehui, et al.
Published: (2022)

RoCOCO: Robustness Benchmark of MS-COCO to Stress-test Image-Text Matching Models
by: Park, Seulki, et al.
Published: (2023)

RL makes MLLMs see better than SFT
by: Song, Junha, et al.
Published: (2025)

Model Stock: All we need is just a few fine-tuned models
by: Jang, Dong-Hwan, et al.
Published: (2024)

An Efficient Post-hoc Framework for Reducing Task Discrepancy of Text Encoders for Composed Image Retrieval
by: Byun, Jaeseok, et al.
Published: (2024)

Contrastive Localized Language-Image Pre-Training
by: Chen, Hong-You, et al.
Published: (2024)

Centered Masking for Language-Image Pre-Training
by: Liang, Mingliang, et al.
Published: (2024)

Embedding Geometries of Contrastive Language-Image Pre-Training
by: Chou, Jason Chuan-Chih, et al.
Published: (2024)

Extract Free Dense Misalignment from CLIP
by: Nam, JeongYeon, et al.
Published: (2024)

Probabilistic Precision and Recall Towards Reliable Evaluation of Generative Models
by: Park, Dogyun, et al.
Published: (2023)

A Closer Look at the Robustness of Contrastive Language-Image Pre-Training (CLIP)
by: Tu, Weijie, et al.
Published: (2024)

Improving Generative Pre-Training: An In-depth Study of Masked Image Modeling and Denoising Models
by: Choi, Hyesong, et al.
Published: (2024)

Prioritizing Image-Related Tokens Enhances Vision-Language Pre-Training
by: Chen, Yangyi, et al.
Published: (2025)

Memory-Efficient Personalization of Text-to-Image Diffusion Models via Selective Optimization Strategies
by: Choi, Seokeon, et al.
Published: (2025)

Test-Time Training for Visual Foresight Vision-Language-Action Models
by: Park, Sangwu, et al.
Published: (2026)

PARIC: Probabilistic Attention Regularization for Language Guided Image Classification from Pre-trained Vison Language Models
by: Nautiyal, Mayank, et al.
Published: (2025)

Visual Pre-Training on Unlabeled Images using Reinforcement Learning
by: Ghosh, Dibya, et al.
Published: (2025)

Steering Guidance for Personalized Text-to-Image Diffusion Models
by: Park, Sunghyun, et al.
Published: (2025)

Stabilizing Consistency Training: A Flow Map Analysis and Self-Distillation
by: Kim, Youngjoong, et al.
Published: (2026)

High-Fidelity Text-to-Image Generation from Pre-Trained Vision-Language Models via Distribution-Conditioned Diffusion Decoding
by: Hong, Ji Woo, et al.
Published: (2026)

Source-Free Domain Adaptation Guided by Vision and Vision-Language Pre-Training
by: Zhang, Wenyu, et al.
Published: (2024)

Vision-Language Generative Model for View-Specific Chest X-ray Generation
by: Lee, Hyungyung, et al.
Published: (2023)

Zero-Shot Video Restoration and Enhancement Using Pre-Trained Image Diffusion Model
by: Cao, Cong, et al.
Published: (2024)

TimePre: Bridging Accuracy, Efficiency, and Stability in Probabilistic Time-Series Forecasting
by: Jiang, Lingyu, et al.
Published: (2025)

Open-Vocabulary Panoptic Segmentation Using BERT Pre-Training of Vision-Language Multiway Transformer Model
by: Chen, Yi-Chia, et al.
Published: (2024)