| Main Authors: | Nakada, Hyakka, Tanaka, Yoshiyasu |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2511.17607 |
Similar Items
What Shape Is Optimal for Masks in Text Removal?
by: Nakada, Hyakka, et al.
Published: (2025)
Understanding and Rectifying Safety Perception Distortion in VLMs
by: Zou, Xiaohan, et al.
Published: (2025)
Model Interpretability and Rationale Extraction by Input Mask Optimization
by: Brinner, Marc, et al.
Published: (2025)
Improving MLLM Historical Record Extraction with Test-Time Image
by: Archibald, Taylor, et al.
Published: (2025)
Distributionally Robust Alignment for Medical Federated Vision-Language Pre-training Under Data Heterogeneity
by: Shuai, Zitao, et al.
Published: (2024)
LDP: Generalizing to Multilingual Visual Information Extraction by Language Decoupled Pretraining
by: Shen, Huawen, et al.
Published: (2024)
RealKIE: Five Novel Datasets for Enterprise Key Information Extraction
by: Townsend, Benjamin, et al.
Published: (2024)
RA-RRG: Multimodal Retrieval-Augmented Radiology Report Generation with Key Phrase Extraction
by: Park, Jonggwon, et al.
Published: (2025)
What Makes CLIP More Robust to Long-Tailed Pre-Training Data? A Controlled Study for Transferable Insights
by: Wen, Xin, et al.
Published: (2024)
Rethinking Misalignment in Vision-Language Model Adaptation from a Causal Perspective
by: Zhang, Yanan, et al.
Published: (2024)
Rethinking Comprehensive Benchmark for Chart Understanding: A Perspective from Scientific Literature
by: Shen, Lingdong, et al.
Published: (2024)
LAPDoc: Layout-Aware Prompting for Documents
by: Lamott, Marcel, et al.
Published: (2024)
Seeing Through Their Eyes: Evaluating Visual Perspective Taking in Vision Language Models
by: Góral, Gracjan, et al.
Published: (2024)
Investigating VLM Hallucination from a Cognitive Psychology Perspective: A First Step Toward Interpretation with Intriguing Observations
by: Liu, Xiangrui, et al.
Published: (2025)
Data Redaction from Conditional Generative Models
by: Kong, Zhifeng, et al.
Published: (2023)
Learning from Synthetic Data for Visual Grounding
by: He, Ruozhen, et al.
Published: (2024)
DocAtlas: Multilingual Document Understanding Across 80+ Languages
by: Heakl, Ahmed, et al.
Published: (2026)
DialectGen: Benchmarking and Improving Dialect Robustness in Multimodal Generation
by: Zhou, Yu, et al.
Published: (2025)
Examining the Robustness of Homogeneity Bias to Hyperparameter Adjustments in GPT-4
by: Lee, Messi H. J.
Published: (2025)
Rethinking Weight Decay for Robust Fine-Tuning of Foundation Models
by: Tian, Junjiao, et al.
Published: (2024)
On Structured State-Space Duality
by: Hu, Jerry Yao-Chieh, et al.
Published: (2025)
Benchmarking Graph Neural Networks for Document Layout Analysis in Public Affairs
by: Lopez-Duran, Miguel, et al.
Published: (2025)
Multimodal Adaptive Inference for Document Image Classification with Anytime Early Exiting
by: Hamed, Omar, et al.
Published: (2024)
A Language Anchor-Guided Method for Robust Noisy Domain Generalization
by: Dai, Zilin, et al.
Published: (2025)
URRL-IMVC: Unified and Robust Representation Learning for Incomplete Multi-View Clustering
by: Teng, Ge, et al.
Published: (2024)
Improve Academic Query Resolution through BERT-based Question Extraction from Images
by: Kamal, Nidhi, et al.
Published: (2024)
Composition-Grounded Data Synthesis for Visual Reasoning
by: Gu, Xinyi, et al.
Published: (2025)
Controlled Training Data Generation with Diffusion Models
by: Yeo, Teresa, et al.
Published: (2024)
MAGIC: Near-Optimal Data Attribution for Deep Learning
by: Ilyas, Andrew, et al.
Published: (2025)
Bidirectional Long-Range Parser for Sequential Data Understanding
by: Leotescu, George, et al.
Published: (2024)
A Survey on Data Augmentation in Large Model Era
by: Zhou, Yue, et al.
Published: (2024)
ICONS: Influence Consensus for Vision-Language Data Selection
by: Wu, Xindi, et al.
Published: (2024)
Style Feature Extraction Using Contrastive Conditioned Variational Autoencoders with Mutual Information Constraints
by: Yasutomi, Suguru, et al.
Published: (2023)
SynthRL: Scaling Visual Reasoning with Verifiable Data Synthesis
by: Wu, Zijian, et al.
Published: (2025)
Data Alignment for Zero-Shot Concept Generation in Dermatology AI
by: Gadgil, Soham, et al.
Published: (2024)
Sparrow: Data-Efficient Video-LLM with Text-to-Image Augmentation
by: Yin, Shukang, et al.
Published: (2024)
Multi-modal Data Spectrum: Multi-modal Datasets are Multi-dimensional
by: Madaan, Divyam, et al.
Published: (2025)
Text-Enhanced Data-free Approach for Federated Class-Incremental Learning
by: Tran, Minh-Tuan, et al.
Published: (2024)
Harlequin: Color-driven Generation of Synthetic Data for Referring Expression Comprehension
by: Parolari, Luca, et al.
Published: (2024)
FedMLLM: Federated Fine-tuning MLLM on Multimodal Heterogeneity Data
by: Xu, Binqian, et al.
Published: (2024)