:: Library Catalog

Bibliographic Details
Main Authors:	Sun, Lixu; Yolwas, Nurmemet; Silamu, Wushour
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2511.08133

Similar Items

A Lightweight Context-Driven Training-Free Network for Scene Text Segmentation and Recognition
by: Chakraborty, Ritabrata, et al.
Published: (2025)

EventSTR: A Benchmark Dataset and Baselines for Event Stream based Scene Text Recognition
by: Wang, Xiao, et al.
Published: (2025)

JSTR: Judgment Improves Scene Text Recognition
by: Fujitake, Masato
Published: (2024)

Accurate Scene Text Recognition with Efficient Model Scaling and Cloze Self-Distillation
by: Maracani, Andrea, et al.
Published: (2025)

Memory-Inspired Temporal Prompt Interaction for Text-Image Classification
by: Yu, Xinyao, et al.
Published: (2024)

TextMamba: Scene Text Detector with Mamba
by: Zhao, Qiyan, et al.
Published: (2025)

Policy Optimized Text-to-Image Pipeline Design
by: Gadot, Uri, et al.
Published: (2025)

TopoLogic: An Interpretable Pipeline for Lane Topology Reasoning on Driving Scenes
by: Fu, Yanping, et al.
Published: (2024)

AutoMR: A Universal Time Series Motion Recognition Pipeline
by: Zhang, Likun, et al.
Published: (2025)

DreamText: High Fidelity Scene Text Synthesis
by: Wang, Yibin, et al.
Published: (2024)

Fast Real-Time Pipeline for Robust Arm Gesture Recognition
by: Bagladi, Milán Zsolt, et al.
Published: (2025)

Think Before You Drive: World Model-Inspired Multimodal Grounding for Autonomous Vehicles
by: Liao, Haicheng, et al.
Published: (2025)

Text-Scene: A Scene-to-Language Parsing Framework for 3D Scene Understanding
by: Li, Haoyuan, et al.
Published: (2025)

Handwritten Text Recognition: A Survey
by: Garrido-Munoz, Carlos, et al.
Published: (2025)

TAG: Thinking with Action Unit Grounding for Facial Expression Recognition
by: Lin, Haobo, et al.
Published: (2026)

Bharat Scene Text: A Novel Comprehensive Dataset and Benchmark for Indian Language Scene Text Understanding
by: De, Anik, et al.
Published: (2025)

StyleText: A Large-Scale Dataset and Benchmark for Stylized Scene Text Inpainting
by: Simonyan, Aleksandr, et al.
Published: (2026)

Griffon: Spelling out All Object Locations at Any Granularity with Large Language Models
by: Zhan, Yufei, et al.
Published: (2023)

DualTSR: Unified Dual-Diffusion Transformer for Scene Text Image Super-Resolution
by: Niu, Axi, et al.
Published: (2026)

3D-SceneDreamer: Text-Driven 3D-Consistent Scene Generation
by: Zhang, Frank, et al.
Published: (2024)

BEV-TSR: Text-Scene Retrieval in BEV Space for Autonomous Driving
by: Tang, Tao, et al.
Published: (2024)

JaWildText: A Benchmark for Vision-Language Models on Japanese Scene Text Understanding
by: Maeda, Koki, et al.
Published: (2026)

ESTR-CoT: Towards Explainable and Accurate Event Stream based Scene Text Recognition with Chain-of-Thought Reasoning
by: Wang, Xiao, et al.
Published: (2025)

DNTextSpotter: Arbitrary-Shaped Scene Text Spotting via Improved Denoising Training
by: Xie, Yu, et al.
Published: (2024)

PaintScene4D: Consistent 4D Scene Generation from Text Prompts
by: Gupta, Vinayak, et al.
Published: (2024)

DreamScene360: Unconstrained Text-to-3D Scene Generation with Panoramic Gaussian Splatting
by: Zhou, Shijie, et al.
Published: (2024)

InstructOCR: Instruction Boosting Scene Text Spotting
by: Duan, Chen, et al.
Published: (2024)

Text-VQA Aug: Pipelined Harnessing of Large Multimodal Models for Automated Synthesis
by: Joshi, Soham, et al.
Published: (2025)

Integrating Prior Observations for Incremental 3D Scene Graph Prediction
by: Renz, Marian, et al.
Published: (2025)

Re-Thinking the Automatic Evaluation of Image-Text Alignment in Text-to-Image Models
by: Zhang, Huixuan, et al.
Published: (2025)

Seeing Beyond the Scene: Analyzing and Mitigating Background Bias in Action Recognition
by: Zhou, Ellie, et al.
Published: (2025)

CURVE: Learning Causality-Inspired Invariant Representations for Robust Scene Understanding via Uncertainty-Guided Regularization
by: Liang, Yue, et al.
Published: (2026)

Jailbreaking on Text-to-Video Models via Scene Splitting Strategy
by: Lee, Wonjun, et al.
Published: (2025)

TSTMotion: Training-free Scene-aware Text-to-motion Generation
by: Guo, Ziyan, et al.
Published: (2025)

Region Prompt Tuning: Fine-grained Scene Text Detection Utilizing Region Text Prompt
by: Lin, Xingtao, et al.
Published: (2024)

A Large-scale Dataset for Robust Complex Anime Scene Text Detection
by: Dong, Ziyi, et al.
Published: (2025)

TripleFDS: Triple Feature Disentanglement and Synthesis for Scene Text Editing
by: Bao, Yuchen, et al.
Published: (2025)

Hybrid CNN-ViT Framework for Motion-Blurred Scene Text Restoration
by: Rashid, Umar, et al.
Published: (2025)

LatentEditor: Text Driven Local Editing of 3D Scenes
by: Khalid, Umar, et al.
Published: (2023)

Show Me the World in My Language: Establishing the First Baseline for Scene-Text to Scene-Text Translation
by: Vaidya, Shreyas, et al.
Published: (2023)