| Main authors: | Lee, Soeun; Kim, Si-Woo; Kim, Taewhan; Kim, Dong-Jin |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online access: | https://arxiv.org/abs/2409.18046 |
Similar items
ViPCap: Retrieval Text-Based Visual Prompts for Lightweight Image Captioning
by: Kim, Taewhan, et al.
Published: (2024)
SynC: Synthetic Image Caption Dataset Refinement with One-to-many Mapping for Zero-shot Image Captioning
by: Kim, Si-Woo, et al.
Published: (2025)
SIDA: Synthetic Image Driven Zero-shot Domain Adaptation
by: Kim, Ye-Chan, et al.
Published: (2025)
SAIL: Similarity-Aware Guidance and Inter-Caption Augmentation-based Learning for Weakly-Supervised Dense Video Captioning
by: Kim, Ye-Chan, et al.
Published: (2026)
Sali4Vid: Saliency-Aware Video Reweighting and Adaptive Caption Retrieval for Dense Video Captioning
by: Jeon, MinJu, et al.
Published: (2025)
CIC: A Framework for Culturally-Aware Image Captioning
by: Yun, Youngsik, et al.
Published: (2024)
EXPERT: An Explainable Image Captioning Evaluation Metric with Structured Explanations
by: Kim, Hyunjong, et al.
Published: (2025)
An Image Grid Can Be Worth a Video: Zero-shot Video Question Answering Using a VLM
by: Kim, Wonkyun, et al.
Published: (2024)
ChartCap: Mitigating Hallucination of Dense Chart Captioning
by: Lim, Junyoung, et al.
Published: (2025)
Modality-Aware Representation Learning for Zero-shot Sketch-based Image Retrieval
by: Lyou, Eunyi, et al.
Published: (2024)
Unifying Vision-Language Latents for Zero-label Image Caption Enhancement
by: Byun, Sanghyun, et al.
Published: (2025)
Zero-shot Text-guided Infinite Image Synthesis with LLM guidance
by: Kwon, Soyeong, et al.
Published: (2024)
See It All: Contextualized Late Aggregation for 3D Dense Captioning
by: Kim, Minjung, et al.
Published: (2024)
Completely Weakly Supervised Class-Incremental Learning for Semantic Segmentation
by: Kim, David Minkwan, et al.
Published: (2025)
Video Summarization: Towards Entity-Aware Captions
by: Ayyubi, Hammad A., et al.
Published: (2023)
Towards Retrieval-Augmented Architectures for Image Captioning
by: Sarto, Sara, et al.
Published: (2024)
Decoding fMRI Data into Captions using Prefix Language Modeling
by: Shen, Vyacheslav, et al.
Published: (2025)
Knowledge Generation for Zero-shot Knowledge-based VQA
by: Cao, Rui, et al.
Published: (2024)
Preserving Multi-Modal Capabilities of Pre-trained VLMs for Improving Vision-Linguistic Compositionality
by: Oh, Youngtaek, et al.
Published: (2024)
Understanding Retrieval Robustness for Retrieval-Augmented Image Captioning
by: Li, Wenyan, et al.
Published: (2024)
LITTA: Late-Interaction and Test-Time Alignment for Visually-Grounded Multimodal Retrieval
by: Kim, Seonok
Published: (2026)
Text-only Synthesis for Image Captioning
by: Zhou, Qing, et al.
Published: (2024)
The Role of Data Curation in Image Captioning
by: Li, Wenyan, et al.
Published: (2023)
Contrastive Language Prompting to Ease False Positives in Medical Anomaly Detection
by: Park, YeongHyeon, et al.
Published: (2024)
LaMP-Cap: Personalized Figure Caption Generation With Multimodal Figure Profiles
by: Ng, Ho Yin 'Sam', et al.
Published: (2025)
Speaking Beyond Language: A Large-Scale Multimodal Dataset for Learning Nonverbal Cues from Video-Grounded Dialogues
by: Kim, Youngmin, et al.
Published: (2025)
EAMA : Entity-Aware Multimodal Alignment Based Approach for News Image Captioning
by: Zhang, Junzhe, et al.
Published: (2024)
From Pixels to Posts: Retrieval-Augmented Fashion Captioning and Hashtag Generation
by: Gondal, Moazzam Umer, et al.
Published: (2025)
Personalized Scientific Figure Caption Generation: An Empirical Study on Author-Specific Writing Style Transfer
by: Kim, Jaeyoung, et al.
Published: (2025)
MATE: Meet At The Embedding -- Connecting Images with Long Texts
by: Jang, Young Kyun, et al.
Published: (2024)
InstructNav: Zero-shot System for Generic Instruction Navigation in Unexplored Environment
by: Long, Yuxing, et al.
Published: (2024)
Retrieval-Augmented Long-Context Translation for Cultural Image Captioning: Gators submission for AmericasNLP 2026 shared task
by: Dhawan, Aashish, et al.
Published: (2026)
CapRL: Stimulating Dense Image Caption Capabilities via Reinforcement Learning
by: Xing, Long, et al.
Published: (2025)
Generating Accurate and Detailed Captions for High-Resolution Images
by: Lee, Hankyeol, et al.
Published: (2025)
FLEUR: An Explainable Reference-Free Evaluation Metric for Image Captioning Using a Large Multimodal Model
by: Lee, Yebin, et al.
Published: (2024)
Multi-LLM Collaborative Caption Generation in Scientific Documents
by: Kim, Jaeyoung, et al.
Published: (2025)
CAPEEN: Image Captioning with Early Exits and Knowledge Distillation
by: Bajpai, Divya Jyoti, et al.
Published: (2024)
Temporal Image Caption Retrieval Competition -- Description and Results
by: Pokrywka, Jakub, et al.
Published: (2024)
Text Change Detection in Multilingual Documents Using Image Comparison
by: Park, Doyoung, et al.
Published: (2024)
Query-centric Audio-Visual Cognition Network for Moment Retrieval, Segmentation and Step-Captioning
by: Tu, Yunbin, et al.
Published: (2024)