Staff View: Siamese-Driven Optimization for Low-Resolution Image Latent Embedding in Image Captioning :: Library Catalog

Bibliographic Details
Main Authors:	Tan, Jing Jie; Mokraoui, Anissa; Kwan, Ban-Hoe; Ng, Danny Wee-Kiat; Hum, Yan-Chai
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence; Human-Computer Interaction
Online Access:	https://arxiv.org/abs/2512.08873

_version_	1866918239990185984
author	Tan, Jing Jie; Mokraoui, Anissa; Kwan, Ban-Hoe; Ng, Danny Wee-Kiat; Hum, Yan-Chai
author_facet	Tan, Jing Jie; Mokraoui, Anissa; Kwan, Ban-Hoe; Ng, Danny Wee-Kiat; Hum, Yan-Chai
contents	Image captioning is essential in many fields, including assisting visually impaired individuals, improving content management systems, and enhancing human-computer interaction. However, a recent challenge in this domain is dealing with low-resolution images (LRIs). While performance can be improved by using larger models such as transformers for encoding, these models are typically heavyweight and demand significant computational resources and memory, which makes retraining difficult. To address this, the proposed SOLI (Siamese-Driven Optimization for Low-Resolution Image Latent Embedding in Image Captioning) approach presents a lightweight solution designed specifically for low-resolution image captioning. It employs a Siamese network architecture to optimize latent embeddings, enhancing the efficiency and accuracy of the image-to-text translation process. By focusing on a dual-pathway neural network structure, SOLI minimizes computational overhead without sacrificing performance, making it well suited to training in resource-constrained scenarios.
format	Preprint
id	arxiv_https___arxiv_org_abs_2512_08873
institution	arXiv
publishDate	2025
record_format	arxiv
title	Siamese-Driven Optimization for Low-Resolution Image Latent Embedding in Image Captioning
topic	Computer Vision and Pattern Recognition; Artificial Intelligence; Human-Computer Interaction
url	https://arxiv.org/abs/2512.08873

Similar Items