Library Catalog

Bibliographic Details
Main Authors:	De, Anik; Penamakuri, Abhirama Subramanyam; Yadav, Rajeev; Rathore, Aditya; Shah, Harshiv; Sharma, Devesh; Agarwal, Sagar; Kumar, Pravin; Mishra, Anand
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence; Computation and Language
Online Access:	https://arxiv.org/abs/2511.23071

Similar Items

Visual Text Matters: Improving Text-KVQA with Visual Text Entity Knowledge-aware Large Multimodal Assistant
by: Penamakuri, Abhirama Subramanyam, et al.
Published: (2024)

When Big Models Train Small Ones: Label-Free Model Parity Alignment for Efficient Visual Question Answering using Small VLMs
by: Penamakuri, Abhirama Subramanyam, et al.
Published: (2025)

PatientVLM Meets DocVLM: Pre-Consultation Dialogue Between Vision-Language Models for Efficient Diagnosis
by: Lokesh, K, et al.
Published: (2026)

Mind the (Language) Gap: Towards Probing Numerical and Cross-Lingual Limits of LVLMs
by: Gautam, Somraj, et al.
Published: (2025)

Show Me the World in My Language: Establishing the First Baseline for Scene-Text to Scene-Text Translation
by: Vaidya, Shreyas, et al.
Published: (2023)

Audiopedia: Audio QA with Knowledge
by: Penamakuri, Abhirama Subramanyam, et al.
Published: (2024)

Image Synthesis with Graph Conditioning: CLIP-Guided Diffusion Models for Scene Graphs
by: Mishra, Rameshwar, et al.
Published: (2024)

ISETHDR: A Physics-based Synthetic Radiance Dataset for High Dynamic Range Driving Scenes
by: Liu, Zhenyi, et al.
Published: (2024)

SkyScenes: A Synthetic Dataset for Aerial Scene Understanding
by: Khose, Sahil, et al.
Published: (2023)

Text-Scene: A Scene-to-Language Parsing Framework for 3D Scene Understanding
by: Li, Haoyuan, et al.
Published: (2025)

SANPO: A Scene Understanding, Accessibility and Human Navigation Dataset
by: Waghmare, Sagar M., et al.
Published: (2023)

BharatBBQ: A Multilingual Bias Benchmark for Question Answering in the Indian Context
by: Tomar, Aditya, et al.
Published: (2025)

SEA-Vision: A Multilingual Benchmark for Comprehensive Document and Scene Text Understanding in Southeast Asia
by: Yue, Pengfei, et al.
Published: (2026)

LET-US: Long Event-Text Understanding of Scenes
by: Chen, Rui, et al.
Published: (2025)

TextVidBench: A Benchmark for Long Video Scene Text Understanding
by: Zhong, Yangyang, et al.
Published: (2025)

Aggregated Text Transformer for Scene Text Detection
by: Zhou, Zhao, et al.
Published: (2022)

The First Swahili Language Scene Text Detection and Recognition Dataset
by: Douamba, Fadila Wendigoundi, et al.
Published: (2024)

Learning Under Low Illumination: A Dataset and Algorithm for Traffic Sign Recognition
by: Mishra, Aditya, et al.
Published: (2025)

Partial Scene Text Retrieval
by: Wang, Hao, et al.
Published: (2024)

Inverse Scene Text Removal
by: Yoshimatsu, Takumi, et al.
Published: (2025)

Text‐Guided Interactive Scene Synthesis with Scene Prior Guidance
by: Fang, Shaoheng, et al.
Published: (2025)

StyleText: A Large-Scale Dataset and Benchmark for Stylized Scene Text Inpainting
by: Simonyan, Aleksandr, et al.
Published: (2026)

TextMamba: Scene Text Detector with Mamba
by: Zhao, Qiyan, et al.
Published: (2025)

Cyberbullying Detection in Hinglish Text Using MURIL and Explainable AI
by: Kumar, Devesh
Published: (2025)

TextSculptor: Training and Benchmarking Scene Text Editing
by: Lin, Yiheng, et al.
Published: (2026)

Text-Pass Filter: An Efficient Scene Text Detector
by: Yang, Chuang, et al.
Published: (2026)

IndicSTR12: A Dataset for Indic Scene Text Recognition
by: Lunia, Harsh, et al.
Published: (2024)

GatedLexiconNet: A Comprehensive End-to-End Handwritten Paragraph Text Recognition System
by: Kumari, Lalita, et al.
Published: (2024)

JaWildText: A Benchmark for Vision-Language Models on Japanese Scene Text Understanding
by: Maeda, Koki, et al.
Published: (2026)

DreamText: High Fidelity Scene Text Synthesis
by: Wang, Yibin, et al.
Published: (2024)

Recognition-Synergistic Scene Text Editing
by: Fang, Zhengyao, et al.
Published: (2025)

Instruction-Guided Scene Text Recognition
by: Du, Yongkun, et al.
Published: (2024)

Decoder Pre-Training with only Text for Scene Text Recognition
by: Zhao, Shuai, et al.
Published: (2024)

TEACH: Text Encoding as Curriculum Hints for Scene Text Recognition
by: Yang, Xiahan, et al.
Published: (2025)

JSTR: Judgment Improves Scene Text Recognition
by: Fujitake, Masato
Published: (2024)

TALC: Time-Aligned Captions for Multi-Scene Text-to-Video Generation
by: Bansal, Hritik, et al.
Published: (2024)

Scene-Text Grounding for Text-Based Video Question Answering
by: Zhou, Sheng, et al.
Published: (2024)

SceneTextStylizer: A Training-Free Scene Text Style Transfer Framework with Diffusion Model
by: Yuan, Honghui, et al.
Published: (2025)

Text-to-Scene with Large Reasoning Models
by: Berdoz, Frédéric, et al.
Published: (2025)

Lumos: Empowering Multimodal LLMs with Scene Text Recognition
by: Shenoy, Ashish, et al.
Published: (2024)