| Main Authors: | Nakada, Hyakka, Kubota, Marika |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2511.22499 |
Similar Items
Robustness of Structured Data Extraction from Perspectively Distorted Documents
by: Nakada, Hyakka, et al.
Published: (2025)
Centered Masking for Language-Image Pre-Training
by: Liang, Mingliang, et al.
Published: (2024)
BodyShapeGPT: SMPL Body Shape Manipulation with LLMs
by: Árbol, Baldomero R., et al.
Published: (2024)
Model Interpretability and Rationale Extraction by Input Mask Optimization
by: Brinner, Marc, et al.
Published: (2025)
FisherMask: Enhancing Neural Network Labeling Efficiency in Image Classification Using Fisher Information
by: Gul, Shreen, et al.
Published: (2024)
Mostly Text, Smart Visuals: Asymmetric Text-Visual Pruning for Large Vision-Language Models
by: Li, Sijie, et al.
Published: (2026)
Reverse Stable Diffusion: What prompt was used to generate this image?
by: Croitoru, Florinel-Alin, et al.
Published: (2023)
Text-centric Alignment for Multi-Modality Learning
by: Tsai, Yun-Da, et al.
Published: (2024)
Evaluating Numerical Reasoning in Text-to-Image Models
by: Kajić, Ivana, et al.
Published: (2024)
Highlighting What Matters: Promptable Embeddings for Attribute-Focused Image Retrieval
by: Li, Siting, et al.
Published: (2025)
MAGIC: Near-Optimal Data Attribution for Deep Learning
by: Ilyas, Andrew, et al.
Published: (2025)
HATFormer: Historic Handwritten Arabic Text Recognition with Transformers
by: Chan, Adrian, et al.
Published: (2024)
Alt-Text with Context: Improving Accessibility for Images on Twitter
by: Srivatsan, Nikita, et al.
Published: (2023)
Efficient Scaling of Diffusion Transformers for Text-to-Image Generation
by: Li, Hao, et al.
Published: (2024)
Lumos : Empowering Multimodal LLMs with Scene Text Recognition
by: Shenoy, Ashish, et al.
Published: (2024)
Whats in a Video: Factorized Autoregressive Decoding for Online Dense Video Captioning
by: Piergiovanni, AJ, et al.
Published: (2024)
VPO: Aligning Text-to-Video Generation Models with Prompt Optimization
by: Cheng, Jiale, et al.
Published: (2025)
BiasConnect: Investigating Bias Interactions in Text-to-Image Models
by: Shukla, Pushkar, et al.
Published: (2025)
Glyph: Scaling Context Windows via Visual-Text Compression
by: Cheng, Jiale, et al.
Published: (2025)
T-MARS: Improving Visual Representations by Circumventing Text Feature Learning
by: Maini, Pratyush, et al.
Published: (2023)
Text-to-Image Cross-Modal Generation: A Systematic Review
by: Żelaszczyk, Maciej, et al.
Published: (2024)
Semantic Token Reweighting for Interpretable and Controllable Text Embeddings in CLIP
by: Kim, Eunji, et al.
Published: (2024)
BAD: Bidirectional Auto-regressive Diffusion for Text-to-Motion Generation
by: Hosseyni, S. Rohollah, et al.
Published: (2024)
Sparrow: Data-Efficient Video-LLM with Text-to-Image Augmentation
by: Yin, Shukang, et al.
Published: (2024)
Advanced Multimodal Deep Learning Architecture for Image-Text Matching
by: Wang, Jinyin, et al.
Published: (2024)
Text Role Classification in Scientific Charts Using Multimodal Transformers
by: Kim, Hye Jin, et al.
Published: (2024)
DreamReward: Text-to-3D Generation with Human Preference
by: Ye, Junliang, et al.
Published: (2024)
What If the TV Was Off? Examining Counterfactual Reasoning Abilities of Multi-modal Language Models
by: Zhang, Letian, et al.
Published: (2023)
EVE: Efficient Vision-Language Pre-training with Masked Prediction and Modality-Aware MoE
by: Chen, Junyi, et al.
Published: (2023)
FairImagen: Post-Processing for Bias Mitigation in Text-to-Image Models
by: Fu, Zihao, et al.
Published: (2025)
Text-Enhanced Data-free Approach for Federated Class-Incremental Learning
by: Tran, Minh-Tuan, et al.
Published: (2024)
T$^3$Bench: Benchmarking Current Progress in Text-to-3D Generation
by: He, Yuze, et al.
Published: (2023)
Quilt-1M: One Million Image-Text Pairs for Histopathology
by: Ikezogwo, Wisdom Oluchi, et al.
Published: (2023)
MulTaBench: Benchmarking Multimodal Tabular Learning with Text and Image
by: Arazi, Alan, et al.
Published: (2026)
A Framework For Refining Text Classification and Object Recognition from Academic Articles
by: Li, Jinghong, et al.
Published: (2023)
MoTe: Learning Motion-Text Diffusion Model for Multiple Generation Tasks
by: Wu, Yiming, et al.
Published: (2024)
PromptTA: Prompt-driven Text Adapter for Source-free Domain Generalization
by: Zhang, Haoran, et al.
Published: (2024)
BiVLC: Extending Vision-Language Compositionality Evaluation with Text-to-Image Retrieval
by: Miranda, Imanol, et al.
Published: (2024)
Ethical-Lens: Curbing Malicious Usages of Open-Source Text-to-Image Models
by: Cai, Yuzhu, et al.
Published: (2024)
ConceptBed: Evaluating Concept Learning Abilities of Text-to-Image Diffusion Models
by: Patel, Maitreya, et al.
Published: (2023)