| Main Authors: | Li, Jinghong, Ota, Koichi, Gu, Wen, Hasegawa, Shinobu |
|---|---|
| Format: | Preprint |
| Published: | 2023 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2305.17401 |
Similar Items
Object Recognition from Scientific Document based on Compartment Refinement Framework
by: Li, Jinghong, et al.
Published: (2023)
Hierarchical Tree-structured Knowledge Graph For Academic Insight Survey
by: Li, Jinghong, et al.
Published: (2024)
A Survey Forest Diagram : Gain a Divergent Insight View on a Specific Research Topic
by: Li, Jinghong, et al.
Published: (2024)
HATFormer: Historic Handwritten Arabic Text Recognition with Transformers
by: Chan, Adrian, et al.
Published: (2024)
Lumos : Empowering Multimodal LLMs with Scene Text Recognition
by: Shenoy, Ashish, et al.
Published: (2024)
Text Role Classification in Scientific Charts Using Multimodal Transformers
by: Kim, Hye Jin, et al.
Published: (2024)
Lego: Learning to Disentangle and Invert Personalized Concepts Beyond Object Appearance in Text-to-Image Diffusion Models
by: Motamed, Saman, et al.
Published: (2023)
JSTR: Judgment Improves Scene Text Recognition
by: Fujitake, Masato
Published: (2024)
Robust Adaptation of Large Multimodal Models for Retrieval Augmented Hateful Meme Detection
by: Mei, Jingbiao, et al.
Published: (2025)
C-TPT: Calibrated Test-Time Prompt Tuning for Vision-Language Models via Text Feature Dispersion
by: Yoon, Hee Suk, et al.
Published: (2024)
Mostly Text, Smart Visuals: Asymmetric Text-Visual Pruning for Large Vision-Language Models
by: Li, Sijie, et al.
Published: (2026)
VPO: Aligning Text-to-Video Generation Models with Prompt Optimization
by: Cheng, Jiale, et al.
Published: (2025)
Text-guided Controllable Mesh Refinement for Interactive 3D Modeling
by: Chen, Yun-Chun, et al.
Published: (2024)
Glyph: Scaling Context Windows via Visual-Text Compression
by: Cheng, Jiale, et al.
Published: (2025)
Text-centric Alignment for Multi-Modality Learning
by: Tsai, Yun-Da, et al.
Published: (2024)
Efficient Scaling of Diffusion Transformers for Text-to-Image Generation
by: Li, Hao, et al.
Published: (2024)
ChartGaze: Enhancing Chart Understanding in LVLMs with Eye-Tracking Guided Attention Refinement
by: Salamatian, Ali, et al.
Published: (2025)
Zero-Shot Vehicle Model Recognition via Text-Based Retrieval-Augmented Generation
by: Chang, Wei-Chia, et al.
Published: (2025)
A Comparative Study of Machine Unlearning Techniques for Image and Text Classification Models
by: Safa, Omar M., et al.
Published: (2024)
Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation
by: Yuan, Huizhuo, et al.
Published: (2024)
Calibrating Multimodal Consensus for Emotion Recognition
by: Zhong, Guowei, et al.
Published: (2025)
Mitigating Object Hallucination in Large Vision-Language Models via Image-Grounded Guidance
by: Zhao, Linxi, et al.
Published: (2024)
IMAGINE-E: Image Generation Intelligence Evaluation of State-of-the-art Text-to-Image Models
by: Lei, Jiayi, et al.
Published: (2025)
DreamReward: Text-to-3D Generation with Human Preference
by: Ye, Junliang, et al.
Published: (2024)
Indian Sign Language Recognition Using Mediapipe Holistic
by: G, Velmathi, et al.
Published: (2023)
T$^3$Bench: Benchmarking Current Progress in Text-to-3D Generation
by: He, Yuze, et al.
Published: (2023)
SciGA: A Comprehensive Dataset for Designing Graphical Abstracts in Academic Papers
by: Kawada, Takuro, et al.
Published: (2025)
Analytical Softmax Temperature Setting from Feature Dimensions for Model- and Domain-Robust Classification
by: Hasegawa, Tatsuhito, et al.
Published: (2025)
Weak-to-Strong Compositional Learning from Generative Models for Language-based Object Detection
by: Park, Kwanyong, et al.
Published: (2024)
MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?
by: Chen, Zhaorun, et al.
Published: (2024)
Towards Visual Text Grounding of Multimodal Large Language Model
by: Li, Ming, et al.
Published: (2025)
Generative Technology for Human Emotion Recognition: A Scope Review
by: Ma, Fei, et al.
Published: (2024)
Low-Resource Heuristics for Bahnaric Optical Character Recognition Improvement
by: Tran, Phat, et al.
Published: (2026)
Video-Text Dataset Construction from Multi-AI Feedback: Promoting Weak-to-Strong Preference Learning for Video Large Language Models
by: Yi, Hao, et al.
Published: (2024)
What Shape Is Optimal for Masks in Text Removal?
by: Nakada, Hyakka, et al.
Published: (2025)
Evaluating Numerical Reasoning in Text-to-Image Models
by: Kajić, Ivana, et al.
Published: (2024)
Generating Fine Details of Entity Interactions
by: Gu, Xinyi, et al.
Published: (2025)
Fusing Domain-Specific Content from Large Language Models into Knowledge Graphs for Enhanced Zero Shot Object State Classification
by: Gouidis, Filippos, et al.
Published: (2024)
Analyzing and Mitigating Object Hallucination in Large Vision-Language Models
by: Zhou, Yiyang, et al.
Published: (2023)
Alt-Text with Context: Improving Accessibility for Images on Twitter
by: Srivatsan, Nikita, et al.
Published: (2023)