| Main Authors: | Hu, Jia Cheng, Cavicchioli, Roberto, Capotondi, Alessandro |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2408.13963 |
Similar Items
Exploiting Multiple Sequence Lengths in Fast End to End Training for Image Captioning
by: Hu, Jia Cheng, et al.
Published: (2022)
Diffusion Is Your Friend in Show, Suggest and Tell
by: Hu, Jia Cheng, et al.
Published: (2025)
Windowed-FourierMixer: Enhancing Clutter-Free Room Modeling with Fourier Transform
by: Henriques, Bruno, et al.
Published: (2024)
Bidirectional Awareness Induction in Autoregressive Seq2Seq Models
by: Hu, Jia Cheng, et al.
Published: (2024)
Heterogeneous Encoders Scaling In The Transformer For Neural Machine Translation
by: Hu, Jia Cheng, et al.
Published: (2023)
Dual-Stream Collaborative Transformer for Image Captioning
by: Wan, Jun, et al.
Published: (2026)
Embedded Heterogeneous Attention Transformer for Cross-lingual Image Captioning
by: Song, Zijie, et al.
Published: (2023)
DEVICE: Depth and Visual Concepts Aware Transformer for OCR-based Image Captioning
by: Xu, Dongsheng, et al.
Published: (2023)
Automated Image Captioning with CNNs and Transformers
by: Cahyono, Joshua Adrian, et al.
Published: (2024)
UIT-DarkCow team at ImageCLEFmedical Caption 2024: Diagnostic Captioning for Radiology Images Efficiency with Transformer Models
by: Van Nguyen, Quan, et al.
Published: (2024)
FlashI2V: Fourier-Guided Latent Shifting Prevents Conditional Image Leakage in Image-to-Video Generation
by: Ge, Yunyang, et al.
Published: (2025)
AgileIR: Memory-Efficient Group Shifted Windows Attention for Agile Image Restoration
by: Cai, Hongyi, et al.
Published: (2024)
Brazilian Portuguese Image Captioning with Transformers: A Study on Cross-Native-Translated Dataset
by: Bromonschenkel, Gabriel, et al.
Published: (2026)
CaptionQA: Is Your Caption as Useful as the Image Itself?
by: Yang, Shijia, et al.
Published: (2025)
Transformer based Multitask Learning for Image Captioning and Object Detection
by: Basak, Debolena, et al.
Published: (2024)
HEXST: Hexagonal Shifted-Window Transformer for Spatial Transcriptomics Gene Expression Prediction
by: Byeon, Keunho, et al.
Published: (2026)
CaptionSmiths: Flexibly Controlling Language Pattern in Image Captioning
by: Saito, Kuniaki, et al.
Published: (2025)
A Lightweight Sparse Focus Transformer for Remote Sensing Image Change Captioning
by: Sun, Dongwei, et al.
Published: (2024)
Analyzing Transformer Models and Knowledge Distillation Approaches for Image Captioning on Edge AI
by: Kwok, Wing Man Casca, et al.
Published: (2025)
Lightweight Vision Transformer with Window and Spatial Attention for Food Image Classification
by: Gao, Xinle, et al.
Published: (2025)
Precision or Recall? An Analysis of Image Captions for Training Text-to-Image Generation Model
by: Cheng, Sheng, et al.
Published: (2024)
A Fourier Transform Framework for Domain Adaptation
by: Luo, Le, et al.
Published: (2024)
SwinShadow: Shifted Window for Ambiguous Adjacent Shadow Detection
by: Wang, Yonghui, et al.
Published: (2024)
Image Captioning via Compact Bidirectional Architecture
by: Song, Zijie, et al.
Published: (2022)
SC-Captioner: Improving Image Captioning with Self-Correction by Reinforcement Learning
by: Zhang, Lin, et al.
Published: (2025)
RMT: Retentive Networks Meet Vision Transformers
by: Fan, Qihang, et al.
Published: (2023)
TransRAD: Retentive Vision Transformer for Enhanced Radar Object Detection
by: Cheng, Lei, et al.
Published: (2025)
A Novel Lightweight Transformer with Edge-Aware Fusion for Remote Sensing Image Captioning
by: Das, Swadhin, et al.
Published: (2025)
DreamVideo: High-Fidelity Image-to-Video Generation with Image Retention and Text Guidance
by: Wang, Cong, et al.
Published: (2023)
Enhancing DR Classification with Swin Transformer and Shifted Window Attention
by: Boulaabi, Meher, et al.
Published: (2025)
Towards Retrieval-Augmented Architectures for Image Captioning
by: Sarto, Sara, et al.
Published: (2024)
Live Video Captioning
by: Blanco-Fernández, Eduardo, et al.
Published: (2024)
Regional Attention-Enhanced Swin Transformer for Clinically Relevant Medical Image Captioning
by: Naz, Zubia, et al.
Published: (2025)
RetCompletion: High-Speed Inference Image Completion with Retentive Network
by: Cang, Yueyang, et al.
Published: (2024)
What Makes for Good Image Captions?
by: Chen, Delong, et al.
Published: (2024)
Benchmarking and Improving Detail Image Caption
by: Dong, Hongyuan, et al.
Published: (2024)
Scene Graph-guided SegCaptioning Transformer with Fine-grained Alignment for Controllable Video Segmentation and Captioning
by: Zhang, Xu, et al.
Published: (2026)
Fourier Transform Multiple Instance Learning for Whole Slide Image Classification
by: Bilic, Anthony, et al.
Published: (2025)
Image Generation from Image Captioning -- Invertible Approach
by: Menon, Nandakishore S, et al.
Published: (2024)
HiSem: Hierarchical Semantic Disentangling for Remote Sensing Image Change Captioning
by: Wang, Man, et al.
Published: (2026)