:: Library Catalog

Bibliographic Details
Main Authors:	Li, Jinghong; Ota, Koichi; Gu, Wen; Hasegawa, Shinobu
Format:	Preprint
Published:	2023
Subjects:	Computer Vision and Pattern Recognition; Computation and Language; Machine Learning
Online Access:	https://arxiv.org/abs/2305.17401

Similar Items

Object Recognition from Scientific Document based on Compartment Refinement Framework
by: Li, Jinghong, et al.
Published: (2023)

Hierarchical Tree-structured Knowledge Graph For Academic Insight Survey
by: Li, Jinghong, et al.
Published: (2024)

A Survey Forest Diagram : Gain a Divergent Insight View on a Specific Research Topic
by: Li, Jinghong, et al.
Published: (2024)

HATFormer: Historic Handwritten Arabic Text Recognition with Transformers
by: Chan, Adrian, et al.
Published: (2024)

Lumos : Empowering Multimodal LLMs with Scene Text Recognition
by: Shenoy, Ashish, et al.
Published: (2024)

Text Role Classification in Scientific Charts Using Multimodal Transformers
by: Kim, Hye Jin, et al.
Published: (2024)

Lego: Learning to Disentangle and Invert Personalized Concepts Beyond Object Appearance in Text-to-Image Diffusion Models
by: Motamed, Saman, et al.
Published: (2023)

JSTR: Judgment Improves Scene Text Recognition
by: Fujitake, Masato
Published: (2024)

Robust Adaptation of Large Multimodal Models for Retrieval Augmented Hateful Meme Detection
by: Mei, Jingbiao, et al.
Published: (2025)

C-TPT: Calibrated Test-Time Prompt Tuning for Vision-Language Models via Text Feature Dispersion
by: Yoon, Hee Suk, et al.
Published: (2024)

Mostly Text, Smart Visuals: Asymmetric Text-Visual Pruning for Large Vision-Language Models
by: Li, Sijie, et al.
Published: (2026)

VPO: Aligning Text-to-Video Generation Models with Prompt Optimization
by: Cheng, Jiale, et al.
Published: (2025)

Text-guided Controllable Mesh Refinement for Interactive 3D Modeling
by: Chen, Yun-Chun, et al.
Published: (2024)

Glyph: Scaling Context Windows via Visual-Text Compression
by: Cheng, Jiale, et al.
Published: (2025)

Text-centric Alignment for Multi-Modality Learning
by: Tsai, Yun-Da, et al.
Published: (2024)

Efficient Scaling of Diffusion Transformers for Text-to-Image Generation
by: Li, Hao, et al.
Published: (2024)

ChartGaze: Enhancing Chart Understanding in LVLMs with Eye-Tracking Guided Attention Refinement
by: Salamatian, Ali, et al.
Published: (2025)

Zero-Shot Vehicle Model Recognition via Text-Based Retrieval-Augmented Generation
by: Chang, Wei-Chia, et al.
Published: (2025)

A Comparative Study of Machine Unlearning Techniques for Image and Text Classification Models
by: Safa, Omar M., et al.
Published: (2024)

Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation
by: Yuan, Huizhuo, et al.
Published: (2024)

Calibrating Multimodal Consensus for Emotion Recognition
by: Zhong, Guowei, et al.
Published: (2025)

Mitigating Object Hallucination in Large Vision-Language Models via Image-Grounded Guidance
by: Zhao, Linxi, et al.
Published: (2024)

IMAGINE-E: Image Generation Intelligence Evaluation of State-of-the-art Text-to-Image Models
by: Lei, Jiayi, et al.
Published: (2025)

DreamReward: Text-to-3D Generation with Human Preference
by: Ye, Junliang, et al.
Published: (2024)

Indian Sign Language Recognition Using Mediapipe Holistic
by: G, Velmathi, et al.
Published: (2023)

T$^3$Bench: Benchmarking Current Progress in Text-to-3D Generation
by: He, Yuze, et al.
Published: (2023)

SciGA: A Comprehensive Dataset for Designing Graphical Abstracts in Academic Papers
by: Kawada, Takuro, et al.
Published: (2025)

Analytical Softmax Temperature Setting from Feature Dimensions for Model- and Domain-Robust Classification
by: Hasegawa, Tatsuhito, et al.
Published: (2025)

Weak-to-Strong Compositional Learning from Generative Models for Language-based Object Detection
by: Park, Kwanyong, et al.
Published: (2024)

MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?
by: Chen, Zhaorun, et al.
Published: (2024)

Towards Visual Text Grounding of Multimodal Large Language Model
by: Li, Ming, et al.
Published: (2025)

Generative Technology for Human Emotion Recognition: A Scope Review
by: Ma, Fei, et al.
Published: (2024)

Low-Resource Heuristics for Bahnaric Optical Character Recognition Improvement
by: Tran, Phat, et al.
Published: (2026)

Video-Text Dataset Construction from Multi-AI Feedback: Promoting Weak-to-Strong Preference Learning for Video Large Language Models
by: Yi, Hao, et al.
Published: (2024)

What Shape Is Optimal for Masks in Text Removal?
by: Nakada, Hyakka, et al.
Published: (2025)

Evaluating Numerical Reasoning in Text-to-Image Models
by: Kajić, Ivana, et al.
Published: (2024)

Generating Fine Details of Entity Interactions
by: Gu, Xinyi, et al.
Published: (2025)

Fusing Domain-Specific Content from Large Language Models into Knowledge Graphs for Enhanced Zero Shot Object State Classification
by: Gouidis, Filippos, et al.
Published: (2024)

Analyzing and Mitigating Object Hallucination in Large Vision-Language Models
by: Zhou, Yiyang, et al.
Published: (2023)

Alt-Text with Context: Improving Accessibility for Images on Twitter
by: Srivatsan, Nikita, et al.
Published: (2023)