Library Catalog

Bibliographic Details
Main Authors:	Nakada, Hyakka; Tanaka, Yoshiyasu
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Computation and Language; Machine Learning
Online Access:	https://arxiv.org/abs/2511.17607

Similar Items

What Shape Is Optimal for Masks in Text Removal?
by: Nakada, Hyakka, et al.
Published: (2025)

Understanding and Rectifying Safety Perception Distortion in VLMs
by: Zou, Xiaohan, et al.
Published: (2025)

Model Interpretability and Rationale Extraction by Input Mask Optimization
by: Brinner, Marc, et al.
Published: (2025)

Improving MLLM Historical Record Extraction with Test-Time Image
by: Archibald, Taylor, et al.
Published: (2025)

Distributionally Robust Alignment for Medical Federated Vision-Language Pre-training Under Data Heterogeneity
by: Shuai, Zitao, et al.
Published: (2024)

LDP: Generalizing to Multilingual Visual Information Extraction by Language Decoupled Pretraining
by: Shen, Huawen, et al.
Published: (2024)

RealKIE: Five Novel Datasets for Enterprise Key Information Extraction
by: Townsend, Benjamin, et al.
Published: (2024)

RA-RRG: Multimodal Retrieval-Augmented Radiology Report Generation with Key Phrase Extraction
by: Park, Jonggwon, et al.
Published: (2025)

What Makes CLIP More Robust to Long-Tailed Pre-Training Data? A Controlled Study for Transferable Insights
by: Wen, Xin, et al.
Published: (2024)

Rethinking Misalignment in Vision-Language Model Adaptation from a Causal Perspective
by: Zhang, Yanan, et al.
Published: (2024)

Rethinking Comprehensive Benchmark for Chart Understanding: A Perspective from Scientific Literature
by: Shen, Lingdong, et al.
Published: (2024)

LAPDoc: Layout-Aware Prompting for Documents
by: Lamott, Marcel, et al.
Published: (2024)

Seeing Through Their Eyes: Evaluating Visual Perspective Taking in Vision Language Models
by: Góral, Gracjan, et al.
Published: (2024)

Investigating VLM Hallucination from a Cognitive Psychology Perspective: A First Step Toward Interpretation with Intriguing Observations
by: Liu, Xiangrui, et al.
Published: (2025)

Data Redaction from Conditional Generative Models
by: Kong, Zhifeng, et al.
Published: (2023)

Learning from Synthetic Data for Visual Grounding
by: He, Ruozhen, et al.
Published: (2024)

DocAtlas: Multilingual Document Understanding Across 80+ Languages
by: Heakl, Ahmed, et al.
Published: (2026)

DialectGen: Benchmarking and Improving Dialect Robustness in Multimodal Generation
by: Zhou, Yu, et al.
Published: (2025)

Examining the Robustness of Homogeneity Bias to Hyperparameter Adjustments in GPT-4
by: Lee, Messi H. J.
Published: (2025)

Rethinking Weight Decay for Robust Fine-Tuning of Foundation Models
by: Tian, Junjiao, et al.
Published: (2024)

On Structured State-Space Duality
by: Hu, Jerry Yao-Chieh, et al.
Published: (2025)

Benchmarking Graph Neural Networks for Document Layout Analysis in Public Affairs
by: Lopez-Duran, Miguel, et al.
Published: (2025)

Multimodal Adaptive Inference for Document Image Classification with Anytime Early Exiting
by: Hamed, Omar, et al.
Published: (2024)

A Language Anchor-Guided Method for Robust Noisy Domain Generalization
by: Dai, Zilin, et al.
Published: (2025)

URRL-IMVC: Unified and Robust Representation Learning for Incomplete Multi-View Clustering
by: Teng, Ge, et al.
Published: (2024)

Improve Academic Query Resolution through BERT-based Question Extraction from Images
by: Kamal, Nidhi, et al.
Published: (2024)

Composition-Grounded Data Synthesis for Visual Reasoning
by: Gu, Xinyi, et al.
Published: (2025)

Controlled Training Data Generation with Diffusion Models
by: Yeo, Teresa, et al.
Published: (2024)

MAGIC: Near-Optimal Data Attribution for Deep Learning
by: Ilyas, Andrew, et al.
Published: (2025)

Bidirectional Long-Range Parser for Sequential Data Understanding
by: Leotescu, George, et al.
Published: (2024)

A Survey on Data Augmentation in Large Model Era
by: Zhou, Yue, et al.
Published: (2024)

ICONS: Influence Consensus for Vision-Language Data Selection
by: Wu, Xindi, et al.
Published: (2024)

Style Feature Extraction Using Contrastive Conditioned Variational Autoencoders with Mutual Information Constraints
by: Yasutomi, Suguru, et al.
Published: (2023)

SynthRL: Scaling Visual Reasoning with Verifiable Data Synthesis
by: Wu, Zijian, et al.
Published: (2025)

Data Alignment for Zero-Shot Concept Generation in Dermatology AI
by: Gadgil, Soham, et al.
Published: (2024)

Sparrow: Data-Efficient Video-LLM with Text-to-Image Augmentation
by: Yin, Shukang, et al.
Published: (2024)

Multi-modal Data Spectrum: Multi-modal Datasets are Multi-dimensional
by: Madaan, Divyam, et al.
Published: (2025)

Text-Enhanced Data-free Approach for Federated Class-Incremental Learning
by: Tran, Minh-Tuan, et al.
Published: (2024)

Harlequin: Color-driven Generation of Synthetic Data for Referring Expression Comprehension
by: Parolari, Luca, et al.
Published: (2024)

FedMLLM: Federated Fine-tuning MLLM on Multimodal Heterogeneity Data
by: Xu, Binqian, et al.
Published: (2024)