| Main Authors: | Feng, Siyuan, Yoshinaga, Teruya, Hayashi, Katsuhiko, Washio, Koki, Kamigaito, Hidetaka |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2412.19141 |
Similar Items
Towards Artwork Explanation in Large-scale Vision Language Models
by: Hayashi, Kazuki, et al.
Published: (2024)
Can Impressions of Music be Extracted from Thumbnail Images?
by: Harada, Takashi, et al.
Published: (2025)
TextTIGER: Text-based Intelligent Generation with Entity Prompt Refinement for Text-to-Image Generation
by: Ozaki, Shintaro, et al.
Published: (2025)
Diagnosing Vision Language Models' Perception by Leveraging Human Methods for Color Vision Deficiencies
by: Hayashi, Kazuki, et al.
Published: (2025)
Manga Generation via Layout-controllable Diffusion
by: Chen, Siyu, et al.
Published: (2024)
IRR: Image Review Ranking Framework for Evaluating Vision-Language Models
by: Hayashi, Kazuki, et al.
Published: (2024)
Constructing Multilingual Visual-Text Datasets Revealing Visual Multilingual Ability of Vision Language Models
by: Atuhurra, Jesse, et al.
Published: (2024)
The Role of Background Information in Reducing Object Hallucination in Vision-Language Models: Insights from Cutoff API Prompting
by: Tomita, Masayo, et al.
Published: (2025)
Multi-Frame Vision-Language Model for Long-form Reasoning in Driver Behavior Analysis
by: Takato, Hiroshi, et al.
Published: (2024)
Towards Temporal Change Explanations from Bi-Temporal Satellite Images
by: Tsujimoto, Ryo, et al.
Published: (2024)
VLURes: Benchmarking VLM Visual and Linguistic Understanding in Low-Resource Languages
by: Atuhurra, Jesse, et al.
Published: (2025)
From Formal Language Theory to Statistical Learning: Finite Observability of Subregular Languages
by: Hayashi, Katsuhiko, et al.
Published: (2025)
MMCIG: Multimodal Cover Image Generation for Text-only Documents and Its Dataset Construction via Pseudo-labeling
by: Kim, Hyeyeon, et al.
Published: (2025)
J-ORA: A Framework and Multimodal Dataset for Japanese Object Identification, Reference, Action Prediction in Robot Perception
by: Atuhurra, Jesse, et al.
Published: (2025)
MangaFlow: An End-to-End Agentic Framework for Controllable Story to Manga Generation
by: Wang, Muyao, et al.
Published: (2026)
MangaVQA and MangaLMM: A Benchmark and Specialized Model for Multimodal Manga Understanding
by: Baek, Jeonghun, et al.
Published: (2025)
Manga109-v2026: Revisiting Manga109 Annotations for Modern Manga Understanding
by: Baek, Jeonghun, et al.
Published: (2026)
MangaUB: A Manga Understanding Benchmark for Large Multimodal Models
by: Ikuta, Hikaru, et al.
Published: (2024)
Toward Automatic Safe Driving Instruction: A Large-Scale Vision Language Model Approach
by: Sakajo, Haruki, et al.
Published: (2025)
The Manga Whisperer: Automatically Generating Transcriptions for Comics
by: Sachdeva, Ragav, et al.
Published: (2024)
Number it: Temporal Grounding Videos like Flipping Manga
by: Wu, Yongliang, et al.
Published: (2024)
Inference-time Trajectory Optimization for Manga Image Editing
by: Furuta, Ryosuke
Published: (2026)
Unified Interpretation of Smoothing Methods for Negative Sampling Loss Functions in Knowledge Graph Embedding
by: Feng, Xincan, et al.
Published: (2024)
Model-based Subsampling for Knowledge Graph Completion
by: Feng, Xincan, et al.
Published: (2023)
MangaNinja: Line Art Colorization with Precise Reference Following
by: Liu, Zhiheng, et al.
Published: (2025)
Tails Tell Tales: Chapter-Wide Manga Transcriptions with Character Names
by: Sachdeva, Ragav, et al.
Published: (2024)
Region-Wise Correspondence Prediction between Manga Line Art Images
by: Li, Yingxuan, et al.
Published: (2025)
Re:Verse -- Can Your VLM Read a Manga?
by: Baranwal, Aaditya, et al.
Published: (2025)
DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation
by: Wu, Jianzong, et al.
Published: (2024)
Manga109Dialog: A Large-scale Dialogue Dataset for Comics Speaker Detection
by: Li, Yingxuan, et al.
Published: (2023)
MangaDiT: Reference-Guided Line Art Colorization with Hierarchical Attention in Diffusion Transformers
by: Qiu, Qianru, et al.
Published: (2025)
LayoutFlow: Flow Matching for Layout Generation
by: Guerreiro, Julian Jorge Andrade, et al.
Published: (2024)
SciPostLayout: A Dataset for Layout Analysis and Layout Generation of Scientific Posters
by: Tanaka, Shohei, et al.
Published: (2024)
Co-Layout: LLM-driven Co-optimization for Interior Layout
by: Xiang, Chucheng, et al.
Published: (2025)
GANime: Generating Anime and Manga Character Drawings from Sketches with Deep Learning
by: Vu, Tai, et al.
Published: (2025)
Defining and Evaluating Visual Language Models' Basic Spatial Abilities: A Perspective from Psychometrics
by: Xu, Wenrui, et al.
Published: (2025)
VASCAR: Content-Aware Layout Generation via Visual-Aware Self-Correction
by: Zhang, Jiahao, et al.
Published: (2024)
Retrieval-Augmented Layout Transformer for Content-Aware Layout Generation
by: Horita, Daichi, et al.
Published: (2023)
Layout Anything: One Transformer for Universal Room Layout Estimation
by: Mia, Md Sohag, et al.
Published: (2025)
LayoutDiffusion: Controllable Diffusion Model for Layout-to-image Generation
by: Zheng, Guangcong, et al.
Published: (2023)