:: Library Catalog

Bibliographic Details
Main Authors:	Karaca, Ali Can, Ozelbas, M. Enes, Berber, Saadettin, Karimli, Orkhan, Yildirim, Turabi, Amasyali, M. Fatih
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence; Machine Learning; Multimedia
Online Access:	https://arxiv.org/abs/2501.10075

Similar Items

Challenging Dataset and Multi-modal Gated Mixture of Experts Model for Remote Sensing Copy-Move Forgery Understanding
by: Zhang, Ze, et al.
Published: (2025)

MV-CC: Mask Enhanced Video Model for Remote Sensing Change Caption
by: Liu, Ruixun, et al.
Published: (2024)

Hallucination Localization in Video Captioning
by: Nakada, Shota, et al.
Published: (2025)

Cap2Sum: Learning to Summarize Videos by Generating Captions
by: Zhao, Cairong, et al.
Published: (2024)

NewsCaption: Named-Entity aware Captioning for Out-of-Context Media
by: Singh, Anurag, et al.
Published: (2024)

What Media Frames Reveal About Stance: A Dataset and Study about Memes in Climate Change Discourse
by: Zhou, Shijia, et al.
Published: (2025)

Towards Open-Vocabulary Remote Sensing Image Semantic Segmentation
by: Ye, Chengyang, et al.
Published: (2024)

Voices, Faces, and Feelings: Multi-modal Emotion-Cognition Captioning for Mental Health Understanding
by: Zhou, Zhiyuan, et al.
Published: (2026)

Multi Agents Semantic Emotion Aligned Music to Image Generation with Music Derived Captions
by: Shi, Junchang, et al.
Published: (2025)

Unsupervised Ego- and Exo-centric Dense Procedural Activity Captioning via Gaze Consensus Adaptation
by: Shi, Zhaofeng, et al.
Published: (2025)

Copy-Move Forgery Detection and Question Answering for Remote Sensing Image
by: Zhang, Ze, et al.
Published: (2024)

AeroLite: Tag-Guided Lightweight Generation of Aerial Image Captions
by: Zi, Xing, et al.
Published: (2025)

SVD: Spatial Video Dataset
by: Izadimehr, M. H., et al.
Published: (2025)

MSC: A Marine Wildlife Video Dataset with Grounded Segmentation and Clip-Level Captioning
by: Truong, Quang-Trung, et al.
Published: (2025)

Semantic-CC: Boosting Remote Sensing Image Change Captioning via Foundational Knowledge and Semantic Guidance
by: Zhu, Yongshuo, et al.
Published: (2024)

Continual Panoptic Perception: Towards Multi-modal Incremental Interpretation of Remote Sensing Images
by: Yuan, Bo, et al.
Published: (2024)

A Benchmark for Ultra-High-Resolution Remote Sensing MLLMs
by: Dang, Yunkai, et al.
Published: (2025)

SM3Det: A Unified Model for Multi-Modal Remote Sensing Object Detection
by: Li, Yuxuan, et al.
Published: (2024)

Transcending Fusion: A Multi-Scale Alignment Method for Remote Sensing Image-Text Retrieval
by: Yang, Rui, et al.
Published: (2024)

Representation Discrepancy Bridging Method for Remote Sensing Image-Text Retrieval
by: Ning, Hailong, et al.
Published: (2025)

Change Detection Between Optical Remote Sensing Imagery and Map Data via Segment Anything Model (SAM)
by: Chen, Hongruixuan, et al.
Published: (2024)

Visual and Text Prompt Segmentation: A Novel Multi-Model Framework for Remote Sensing
by: Zi, Xing, et al.
Published: (2025)

PolySmart @ TRECVid 2024 Video Captioning (VTT)
by: Wu, Jiaxin, et al.
Published: (2024)

Resource-Efficient Reference-Free Evaluation of Audio Captions
by: Mahfuz, Rehana, et al.
Published: (2024)

Mitigating Image Captioning Hallucinations in Vision-Language Models
by: Zhao, Fei, et al.
Published: (2025)

TPIFM: A Task-Aware Model for Evaluating Perceptual Interaction Fluency in Remote AR Collaboration
by: Song, Jiarun, et al.
Published: (2026)

PMPGuard: Catching Pseudo-Matched Pairs in Remote Sensing Image-Text Retrieval
by: Ouyang, Pengxiang, et al.
Published: (2025)

WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research
by: Mei, Xinhao, et al.
Published: (2023)

ProCap: Projection-Aware Captioning for Spatial Augmented Reality
by: Cao, Zimo, et al.
Published: (2026)

Sound-VECaps: Improving Audio Generation with Visual Enhanced Captions
by: Yuan, Yi, et al.
Published: (2024)

OneDiff: A Generalist Model for Image Difference Captioning
by: Hu, Erdong, et al.
Published: (2024)

Embedded Heterogeneous Attention Transformer for Cross-lingual Image Captioning
by: Song, Zijie, et al.
Published: (2023)

Less for More: Enhanced Feedback-aligned Mixed LLMs for Molecule Caption Generation and Fine-Grained NLI Evaluation
by: Gkoumas, Dimitris, et al.
Published: (2024)

RS5M and GeoRSCLIP: A Large Scale Vision-Language Dataset and A Large Vision-Language Model for Remote Sensing
by: Zhang, Zilun, et al.
Published: (2023)

IG Captioner: Information Gain Captioners are Strong Zero-shot Classifiers
by: Yang, Chenglin, et al.
Published: (2023)

Group-based Distinctive Image Captioning with Memory Difference Encoding and Attention
by: Wang, Jiuniu, et al.
Published: (2025)

Reply with Sticker: New Dataset and Model for Sticker Retrieval
by: Liang, Bin, et al.
Published: (2024)

EMID: An Emotional Aligned Dataset in Audio-Visual Modality
by: Zou, Jialing, et al.
Published: (2023)

Understanding What Is Not Said: Referring Remote Sensing Image Segmentation with Scarce Expressions
by: Ye, Kai, et al.
Published: (2025)

Edit As You Wish: Video Caption Editing with Multi-grained User Control
by: Yao, Linli, et al.
Published: (2023)