| Main Authors: | De, Anik, Penamakuri, Abhirama Subramanyam, Yadav, Rajeev, Rathore, Aditya, Shah, Harshiv, Sharma, Devesh, Agarwal, Sagar, Kumar, Pravin, Mishra, Anand |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2511.23071 |
Similar Items
Visual Text Matters: Improving Text-KVQA with Visual Text Entity Knowledge-aware Large Multimodal Assistant
by: Penamakuri, Abhirama Subramanyam, et al.
Published: (2024)
When Big Models Train Small Ones: Label-Free Model Parity Alignment for Efficient Visual Question Answering using Small VLMs
by: Penamakuri, Abhirama Subramanyam, et al.
Published: (2025)
PatientVLM Meets DocVLM: Pre-Consultation Dialogue Between Vision-Language Models for Efficient Diagnosis
by: Lokesh, K, et al.
Published: (2026)
Mind the (Language) Gap: Towards Probing Numerical and Cross-Lingual Limits of LVLMs
by: Gautam, Somraj, et al.
Published: (2025)
Show Me the World in My Language: Establishing the First Baseline for Scene-Text to Scene-Text Translation
by: Vaidya, Shreyas, et al.
Published: (2023)
Audiopedia: Audio QA with Knowledge
by: Penamakuri, Abhirama Subramanyam, et al.
Published: (2024)
Image Synthesis with Graph Conditioning: CLIP-Guided Diffusion Models for Scene Graphs
by: Mishra, Rameshwar, et al.
Published: (2024)
ISETHDR: A Physics-based Synthetic Radiance Dataset for High Dynamic Range Driving Scenes
by: Liu, Zhenyi, et al.
Published: (2024)
SkyScenes: A Synthetic Dataset for Aerial Scene Understanding
by: Khose, Sahil, et al.
Published: (2023)
Text-Scene: A Scene-to-Language Parsing Framework for 3D Scene Understanding
by: Li, Haoyuan, et al.
Published: (2025)
SANPO: A Scene Understanding, Accessibility and Human Navigation Dataset
by: Waghmare, Sagar M., et al.
Published: (2023)
BharatBBQ: A Multilingual Bias Benchmark for Question Answering in the Indian Context
by: Tomar, Aditya, et al.
Published: (2025)
SEA-Vision: A Multilingual Benchmark for Comprehensive Document and Scene Text Understanding in Southeast Asia
by: Yue, Pengfei, et al.
Published: (2026)
LET-US: Long Event-Text Understanding of Scenes
by: Chen, Rui, et al.
Published: (2025)
TextVidBench: A Benchmark for Long Video Scene Text Understanding
by: Zhong, Yangyang, et al.
Published: (2025)
Aggregated Text Transformer for Scene Text Detection
by: Zhou, Zhao, et al.
Published: (2022)
The First Swahili Language Scene Text Detection and Recognition Dataset
by: Douamba, Fadila Wendigoundi, et al.
Published: (2024)
Learning Under Low Illumination: A Dataset and Algorithm for Traffic Sign Recognition
by: Mishra, Aditya, et al.
Published: (2025)
Partial Scene Text Retrieval
by: Wang, Hao, et al.
Published: (2024)
Inverse Scene Text Removal
by: Yoshimatsu, Takumi, et al.
Published: (2025)
Text‐Guided Interactive Scene Synthesis with Scene Prior Guidance
by: Fang, Shaoheng, et al.
Published: (2025)
StyleText: A Large-Scale Dataset and Benchmark for Stylized Scene Text Inpainting
by: Simonyan, Aleksandr, et al.
Published: (2026)
TextMamba: Scene Text Detector with Mamba
by: Zhao, Qiyan, et al.
Published: (2025)
Cyberbullying Detection in Hinglish Text Using MURIL and Explainable AI
by: Kumar, Devesh
Published: (2025)
TextSculptor: Training and Benchmarking Scene Text Editing
by: Lin, Yiheng, et al.
Published: (2026)
Text-Pass Filter: An Efficient Scene Text Detector
by: Yang, Chuang, et al.
Published: (2026)
IndicSTR12: A Dataset for Indic Scene Text Recognition
by: Lunia, Harsh, et al.
Published: (2024)
GatedLexiconNet: A Comprehensive End-to-End Handwritten Paragraph Text Recognition System
by: Kumari, Lalita, et al.
Published: (2024)
JaWildText: A Benchmark for Vision-Language Models on Japanese Scene Text Understanding
by: Maeda, Koki, et al.
Published: (2026)
DreamText: High Fidelity Scene Text Synthesis
by: Wang, Yibin, et al.
Published: (2024)
Recognition-Synergistic Scene Text Editing
by: Fang, Zhengyao, et al.
Published: (2025)
Instruction-Guided Scene Text Recognition
by: Du, Yongkun, et al.
Published: (2024)
Decoder Pre-Training with only Text for Scene Text Recognition
by: Zhao, Shuai, et al.
Published: (2024)
TEACH: Text Encoding as Curriculum Hints for Scene Text Recognition
by: Yang, Xiahan, et al.
Published: (2025)
JSTR: Judgment Improves Scene Text Recognition
by: Fujitake, Masato
Published: (2024)
TALC: Time-Aligned Captions for Multi-Scene Text-to-Video Generation
by: Bansal, Hritik, et al.
Published: (2024)
Scene-Text Grounding for Text-Based Video Question Answering
by: Zhou, Sheng, et al.
Published: (2024)
SceneTextStylizer: A Training-Free Scene Text Style Transfer Framework with Diffusion Model
by: Yuan, Honghui, et al.
Published: (2025)
Text-to-Scene with Large Reasoning Models
by: Berdoz, Frédéric, et al.
Published: (2025)
Lumos : Empowering Multimodal LLMs with Scene Text Recognition
by: Shenoy, Ashish, et al.
Published: (2024)