:: Library Catalog

Bibliographic Details
Main Authors:	Hu, Jia Cheng; Cavicchioli, Roberto; Capotondi, Alessandro
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2408.13963

Similar Items

Exploiting Multiple Sequence Lengths in Fast End to End Training for Image Captioning
by: Hu, Jia Cheng, et al.
Published: (2022)

Diffusion Is Your Friend in Show, Suggest and Tell
by: Hu, Jia Cheng, et al.
Published: (2025)

Windowed-FourierMixer: Enhancing Clutter-Free Room Modeling with Fourier Transform
by: Henriques, Bruno, et al.
Published: (2024)

Bidirectional Awareness Induction in Autoregressive Seq2Seq Models
by: Hu, Jia Cheng, et al.
Published: (2024)

Heterogeneous Encoders Scaling In The Transformer For Neural Machine Translation
by: Hu, Jia Cheng, et al.
Published: (2023)

Dual-Stream Collaborative Transformer for Image Captioning
by: Wan, Jun, et al.
Published: (2026)

Embedded Heterogeneous Attention Transformer for Cross-lingual Image Captioning
by: Song, Zijie, et al.
Published: (2023)

DEVICE: Depth and Visual Concepts Aware Transformer for OCR-based Image Captioning
by: Xu, Dongsheng, et al.
Published: (2023)

Automated Image Captioning with CNNs and Transformers
by: Cahyono, Joshua Adrian, et al.
Published: (2024)

UIT-DarkCow team at ImageCLEFmedical Caption 2024: Diagnostic Captioning for Radiology Images Efficiency with Transformer Models
by: Van Nguyen, Quan, et al.
Published: (2024)

FlashI2V: Fourier-Guided Latent Shifting Prevents Conditional Image Leakage in Image-to-Video Generation
by: Ge, Yunyang, et al.
Published: (2025)

AgileIR: Memory-Efficient Group Shifted Windows Attention for Agile Image Restoration
by: Cai, Hongyi, et al.
Published: (2024)

Brazilian Portuguese Image Captioning with Transformers: A Study on Cross-Native-Translated Dataset
by: Bromonschenkel, Gabriel, et al.
Published: (2026)

CaptionQA: Is Your Caption as Useful as the Image Itself?
by: Yang, Shijia, et al.
Published: (2025)

Transformer based Multitask Learning for Image Captioning and Object Detection
by: Basak, Debolena, et al.
Published: (2024)

HEXST: Hexagonal Shifted-Window Transformer for Spatial Transcriptomics Gene Expression Prediction
by: Byeon, Keunho, et al.
Published: (2026)

CaptionSmiths: Flexibly Controlling Language Pattern in Image Captioning
by: Saito, Kuniaki, et al.
Published: (2025)

A Lightweight Sparse Focus Transformer for Remote Sensing Image Change Captioning
by: Sun, Dongwei, et al.
Published: (2024)

Analyzing Transformer Models and Knowledge Distillation Approaches for Image Captioning on Edge AI
by: Kwok, Wing Man Casca, et al.
Published: (2025)

Lightweight Vision Transformer with Window and Spatial Attention for Food Image Classification
by: Gao, Xinle, et al.
Published: (2025)

Precision or Recall? An Analysis of Image Captions for Training Text-to-Image Generation Model
by: Cheng, Sheng, et al.
Published: (2024)

A Fourier Transform Framework for Domain Adaptation
by: Luo, Le, et al.
Published: (2024)

SwinShadow: Shifted Window for Ambiguous Adjacent Shadow Detection
by: Wang, Yonghui, et al.
Published: (2024)

Image Captioning via Compact Bidirectional Architecture
by: Song, Zijie, et al.
Published: (2022)

SC-Captioner: Improving Image Captioning with Self-Correction by Reinforcement Learning
by: Zhang, Lin, et al.
Published: (2025)

RMT: Retentive Networks Meet Vision Transformers
by: Fan, Qihang, et al.
Published: (2023)

TransRAD: Retentive Vision Transformer for Enhanced Radar Object Detection
by: Cheng, Lei, et al.
Published: (2025)

A Novel Lightweight Transformer with Edge-Aware Fusion for Remote Sensing Image Captioning
by: Das, Swadhin, et al.
Published: (2025)

DreamVideo: High-Fidelity Image-to-Video Generation with Image Retention and Text Guidance
by: Wang, Cong, et al.
Published: (2023)

Enhancing DR Classification with Swin Transformer and Shifted Window Attention
by: Boulaabi, Meher, et al.
Published: (2025)

Towards Retrieval-Augmented Architectures for Image Captioning
by: Sarto, Sara, et al.
Published: (2024)

Live Video Captioning
by: Blanco-Fernández, Eduardo, et al.
Published: (2024)

Regional Attention-Enhanced Swin Transformer for Clinically Relevant Medical Image Captioning
by: Naz, Zubia, et al.
Published: (2025)

RetCompletion: High-Speed Inference Image Completion with Retentive Network
by: Cang, Yueyang, et al.
Published: (2024)

What Makes for Good Image Captions?
by: Chen, Delong, et al.
Published: (2024)

Benchmarking and Improving Detail Image Caption
by: Dong, Hongyuan, et al.
Published: (2024)

Scene Graph-guided SegCaptioning Transformer with Fine-grained Alignment for Controllable Video Segmentation and Captioning
by: Zhang, Xu, et al.
Published: (2026)

Fourier Transform Multiple Instance Learning for Whole Slide Image Classification
by: Bilic, Anthony, et al.
Published: (2025)

Image Generation from Image Captioning -- Invertible Approach
by: Menon, Nandakishore S, et al.
Published: (2024)

HiSem: Hierarchical Semantic Disentangling for Remote Sensing Image Change Captioning
by: Wang, Man, et al.
Published: (2026)