:: Library Catalog

Bibliographic Details
Main Authors:	Kwok, Wing Man Casca; Tung, Yip Chiu; Bhagchandani, Kunal
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2506.03607

Similar Items

CAPEEN: Image Captioning with Early Exits and Knowledge Distillation
by: Bajpai, Divya Jyoti, et al.
Published: (2024)

Edge-Efficient Image Restoration: Transformer Distillation into State-Space Models
by: Miriyala, Srinivas Soumitri, et al.
Published: (2026)

Compositional Oil Spill Detection Based on Object Detector and Adapted Segment Anything Model from SAR Images
by: Wu, Wenhui, et al.
Published: (2024)

An Edge AI System Based on FPGA Platform for Railway Fault Detection
by: Li, Jiale, et al.
Published: (2024)

A Novel Lightweight Transformer with Edge-Aware Fusion for Remote Sensing Image Captioning
by: Das, Swadhin, et al.
Published: (2025)

Efficient Knowledge Distillation of SAM for Medical Image Segmentation
by: Patil, Kunal Dasharath, et al.
Published: (2025)

A Transformer-in-Transformer Network Utilizing Knowledge Distillation for Image Recognition
by: Rahman, Dewan Tauhid, et al.
Published: (2025)

Token Compression Meets Compact Vision Transformers: A Survey and Comparative Evaluation for Edge AI
by: Nguyen, Phat, et al.
Published: (2025)

UIT-DarkCow team at ImageCLEFmedical Caption 2024: Diagnostic Captioning for Radiology Images Efficiency with Transformer Models
by: Van Nguyen, Quan, et al.
Published: (2024)

CapArena: Benchmarking and Analyzing Detailed Image Captioning in the LLM Era
by: Cheng, Kanzhi, et al.
Published: (2025)

Where Do Images Come From? Analyzing Captions to Geographically Profile Datasets
by: Basu, Abhipsa, et al.
Published: (2026)

Dual-Stream Collaborative Transformer for Image Captioning
by: Wan, Jun, et al.
Published: (2026)

Efficient Few-Shot Learning for Edge AI via Knowledge Distillation on MobileViT
by: Tsuyuki, Shuhei, et al.
Published: (2026)

Analyzing Image Beyond Visual Aspect: Image Emotion Classification via Multiple-Affective Captioning
by: Zhou, Zibo, et al.
Published: (2025)

Image Generation from Image Captioning -- Invertible Approach
by: Menon, Nandakishore S, et al.
Published: (2024)

Beam-Guided Knowledge Replay for Knowledge-Rich Image Captioning using Vision-Language Model
by: AlJunaid, Reem, et al.
Published: (2025)

Towards Optimal Trade-offs in Knowledge Distillation for CNNs and Vision Transformers at the Edge
by: Violos, John, et al.
Published: (2024)

Shifted Window Fourier Transform And Retention For Image Captioning
by: Hu, Jia Cheng, et al.
Published: (2024)

Automated Image Captioning with CNNs and Transformers
by: Cahyono, Joshua Adrian, et al.
Published: (2024)

EdgeGaussians -- 3D Edge Mapping via Gaussian Splatting
by: Chelani, Kunal, et al.
Published: (2024)

Knowledge Distillation via the Target-aware Transformer
by: Lin, Sihao, et al.
Published: (2022)

Context-aware Difference Distilling for Multi-change Captioning
by: Tu, Yunbin, et al.
Published: (2024)

m2mKD: Module-to-Module Knowledge Distillation for Modular Transformers
by: Lo, Ka Man, et al.
Published: (2024)

CaptionFool: Universal Image Captioning Model Attacks
by: Parekh, Swapnil
Published: (2026)

CustomKD: Customizing Large Vision Foundation for Edge Model Improvement via Knowledge Distillation
by: Lee, Jungsoo, et al.
Published: (2025)

CaptionQA: Is Your Caption as Useful as the Image Itself?
by: Yang, Shijia, et al.
Published: (2025)

Point Clouds Are Specialized Images: A Knowledge Transfer Approach for 3D Understanding
by: Kang, Jiachen, et al.
Published: (2023)

Transformer based Multitask Learning for Image Captioning and Object Detection
by: Basak, Debolena, et al.
Published: (2024)

KD-DETR: Knowledge Distillation for Detection Transformer with Consistent Distillation Points Sampling
by: Wang, Yu, et al.
Published: (2022)

HiSem: Hierarchical Semantic Disentangling for Remote Sensing Image Change Captioning
by: Wang, Man, et al.
Published: (2026)

Detecting and Understanding Hateful Contents in Memes Through Captioning and Visual Question-Answering
by: Anaissi, Ali, et al.
Published: (2025)

Embedded Heterogeneous Attention Transformer for Cross-lingual Image Captioning
by: Song, Zijie, et al.
Published: (2023)

Transformer Architecture for NetsDB
by: Kamble, Subodh, et al.
Published: (2024)

Knowledge Distillation in Vision Transformers: A Critical Review
by: Habib, Gousia, et al.
Published: (2023)

Knowledge Distillation via Query Selection for Detection Transformer
by: Liu, Yi, et al.
Published: (2024)

Improving Interpretable Embeddings for Ad-hoc Video Search with Generative Captions and Multi-word Concept Bank
by: Wu, Jiaxin, et al.
Published: (2024)

CaptionSmiths: Flexibly Controlling Language Pattern in Image Captioning
by: Saito, Kuniaki, et al.
Published: (2025)

Caption-Matching: A Multimodal Approach for Cross-Domain Image Retrieval
by: Iijima, Lucas, et al.
Published: (2024)

Adjust Your Focus: Defocus Deblurring From Dual-Pixel Images Using Explicit Multi-Scale Cross-Correlation
by: Swami, Kunal
Published: (2025)

A Lightweight Sparse Focus Transformer for Remote Sensing Image Change Captioning
by: Sun, Dongwei, et al.
Published: (2024)