| Main Authors: | Karaca, Ali Can, Ozelbas, M. Enes, Berber, Saadettin, Karimli, Orkhan, Yildirim, Turabi, Amasyali, M. Fatih |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2501.10075 |
Similar Items
Challenging Dataset and Multi-modal Gated Mixture of Experts Model for Remote Sensing Copy-Move Forgery Understanding
by: Zhang, Ze, et al.
Published: (2025)
MV-CC: Mask Enhanced Video Model for Remote Sensing Change Caption
by: Liu, Ruixun, et al.
Published: (2024)
Hallucination Localization in Video Captioning
by: Nakada, Shota, et al.
Published: (2025)
Cap2Sum: Learning to Summarize Videos by Generating Captions
by: Zhao, Cairong, et al.
Published: (2024)
NewsCaption: Named-Entity aware Captioning for Out-of-Context Media
by: Singh, Anurag, et al.
Published: (2024)
What Media Frames Reveal About Stance: A Dataset and Study about Memes in Climate Change Discourse
by: Zhou, Shijia, et al.
Published: (2025)
Towards Open-Vocabulary Remote Sensing Image Semantic Segmentation
by: Ye, Chengyang, et al.
Published: (2024)
Voices, Faces, and Feelings: Multi-modal Emotion-Cognition Captioning for Mental Health Understanding
by: Zhou, Zhiyuan, et al.
Published: (2026)
Multi Agents Semantic Emotion Aligned Music to Image Generation with Music Derived Captions
by: Shi, Junchang, et al.
Published: (2025)
Unsupervised Ego- and Exo-centric Dense Procedural Activity Captioning via Gaze Consensus Adaptation
by: Shi, Zhaofeng, et al.
Published: (2025)
Copy-Move Forgery Detection and Question Answering for Remote Sensing Image
by: Zhang, Ze, et al.
Published: (2024)
AeroLite: Tag-Guided Lightweight Generation of Aerial Image Captions
by: Zi, Xing, et al.
Published: (2025)
SVD: Spatial Video Dataset
by: Izadimehr, M. H., et al.
Published: (2025)
MSC: A Marine Wildlife Video Dataset with Grounded Segmentation and Clip-Level Captioning
by: Truong, Quang-Trung, et al.
Published: (2025)
Semantic-CC: Boosting Remote Sensing Image Change Captioning via Foundational Knowledge and Semantic Guidance
by: Zhu, Yongshuo, et al.
Published: (2024)
Continual Panoptic Perception: Towards Multi-modal Incremental Interpretation of Remote Sensing Images
by: Yuan, Bo, et al.
Published: (2024)
A Benchmark for Ultra-High-Resolution Remote Sensing MLLMs
by: Dang, Yunkai, et al.
Published: (2025)
SM3Det: A Unified Model for Multi-Modal Remote Sensing Object Detection
by: Li, Yuxuan, et al.
Published: (2024)
Transcending Fusion: A Multi-Scale Alignment Method for Remote Sensing Image-Text Retrieval
by: Yang, Rui, et al.
Published: (2024)
Representation Discrepancy Bridging Method for Remote Sensing Image-Text Retrieval
by: Ning, Hailong, et al.
Published: (2025)
Change Detection Between Optical Remote Sensing Imagery and Map Data via Segment Anything Model (SAM)
by: Chen, Hongruixuan, et al.
Published: (2024)
Visual and Text Prompt Segmentation: A Novel Multi-Model Framework for Remote Sensing
by: Zi, Xing, et al.
Published: (2025)
PolySmart @ TRECVid 2024 Video Captioning (VTT)
by: Wu, Jiaxin, et al.
Published: (2024)
Resource-Efficient Reference-Free Evaluation of Audio Captions
by: Mahfuz, Rehana, et al.
Published: (2024)
Mitigating Image Captioning Hallucinations in Vision-Language Models
by: Zhao, Fei, et al.
Published: (2025)
TPIFM: A Task-Aware Model for Evaluating Perceptual Interaction Fluency in Remote AR Collaboration
by: Song, Jiarun, et al.
Published: (2026)
PMPGuard: Catching Pseudo-Matched Pairs in Remote Sensing Image-Text Retrieval
by: Ouyang, Pengxiang, et al.
Published: (2025)
WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research
by: Mei, Xinhao, et al.
Published: (2023)
ProCap: Projection-Aware Captioning for Spatial Augmented Reality
by: Cao, Zimo, et al.
Published: (2026)
Sound-VECaps: Improving Audio Generation with Visual Enhanced Captions
by: Yuan, Yi, et al.
Published: (2024)
OneDiff: A Generalist Model for Image Difference Captioning
by: Hu, Erdong, et al.
Published: (2024)
Embedded Heterogeneous Attention Transformer for Cross-lingual Image Captioning
by: Song, Zijie, et al.
Published: (2023)
Less for More: Enhanced Feedback-aligned Mixed LLMs for Molecule Caption Generation and Fine-Grained NLI Evaluation
by: Gkoumas, Dimitris, et al.
Published: (2024)
RS5M and GeoRSCLIP: A Large Scale Vision-Language Dataset and A Large Vision-Language Model for Remote Sensing
by: Zhang, Zilun, et al.
Published: (2023)
IG Captioner: Information Gain Captioners are Strong Zero-shot Classifiers
by: Yang, Chenglin, et al.
Published: (2023)
Group-based Distinctive Image Captioning with Memory Difference Encoding and Attention
by: Wang, Jiuniu, et al.
Published: (2025)
Reply with Sticker: New Dataset and Model for Sticker Retrieval
by: Liang, Bin, et al.
Published: (2024)
EMID: An Emotional Aligned Dataset in Audio-Visual Modality
by: Zou, Jialing, et al.
Published: (2023)
Understanding What Is Not Said: Referring Remote Sensing Image Segmentation with Scarce Expressions
by: Ye, Kai, et al.
Published: (2025)
Edit As You Wish: Video Caption Editing with Multi-grained User Control
by: Yao, Linli, et al.
Published: (2023)