| Main Authors: | Taniguchi, Takara, Shimoda, Wataru, Yamaguchi, Kota, Nakayama, Hideki |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2509.24331 |
Similar Items
Learning Gaussian Data Augmentation in Feature Space for One-shot Object Detection in Manga
by: Taniguchi, Takara, et al.
Published: (2024)
EDGE-Shield: Efficient Denoising-staGE Shield for Violative Content Filtering via Scalable Reference-Based Matching
by: Taniguchi, Takara, et al.
Published: (2026)
MangaUB: A Manga Understanding Benchmark for Large Multimodal Models
by: Ikuta, Hikaru, et al.
Published: (2024)
Harnessing the Latent Diffusion Model for Training-Free Image Style Transfer
by: Masui, Kento, et al.
Published: (2024)
On Parallelism in Music and Language: A Perspective from Symbol Emergence Systems based on Probabilistic Generative Models
by: Taniguchi, Tadahiro
Published: (2025)
OTR: Synthesizing Overlay Text Dataset for Text Removal
by: Zdenek, Jan, et al.
Published: (2025)
Multimodal Markup Document Models for Graphic Design Completion
by: Kikuchi, Kotaro, et al.
Published: (2024)
OpenCOLE: Towards Reproducible Automatic Graphic Design Generation
by: Inoue, Naoto, et al.
Published: (2024)
FashionReGen: LLM-Empowered Fashion Report Generation
by: Ding, Yujuan, et al.
Published: (2024)
AdaptaGen: Domain-Specific Image Generation through Hierarchical Semantic Optimization Framework
by: Zhang, Suoxiang, et al.
Published: (2025)
LipGen: Viseme-Guided Lip Video Generation for Enhancing Visual Speech Recognition
by: Hao, Bowen, et al.
Published: (2025)
CAFE: Channel-Autoregressive Factorized Encoding for Robust Biosignal Spatial Super-Resolution
by: Liu, Hongjun, et al.
Published: (2026)
MindFuse: Towards GenAI Explainability in Marketing Strategy Co-Creation
by: Farseev, Aleksandr, et al.
Published: (2025)
A 3D Framework for Improving Low-Latency Multi-Channel Live Streaming
by: Aiersilan, Aizierjiang, et al.
Published: (2024)
Immersive Fantasy Based on Digital Nostalgia: Environmental Narratives for the Korean Millennials and Gen Z
by: Doh, Yerin, et al.
Published: (2025)
Face Consistency Benchmark for GenAI Video
by: Podstawski, Michal, et al.
Published: (2025)
Total Disentanglement of Font Images into Style and Character Class Features
by: Haraguchi, Daichi, et al.
Published: (2024)
PanoGen++: Domain-Adapted Text-Guided Panoramic Environment Generation for Vision-and-Language Navigation
by: Wang, Sen, et al.
Published: (2025)
A Low-Latency 3D Live Remote Visualization System for Tourist Sites Integrating Dynamic and Pre-captured Static Point Clouds
by: Matsumoto, Takahiro, et al.
Published: (2025)
GenState-AI: State-Aware Dataset for Text-to-Video Retrieval on AI-Generated Videos
by: Li, Minghan, et al.
Published: (2026)
Generative AI-enabled Mobile Tactical Multimedia Networks: Distribution, Generation, and Perception
by: Xu, Minrui, et al.
Published: (2024)
Hyperbolic Multimodal Generative Representation Learning for Generalized Zero-Shot Multimodal Information Extraction
by: Zhou, Baohang, et al.
Published: (2026)
MangaFlow: An End-to-End Agentic Framework for Controllable Story to Manga Generation
by: Wang, Muyao, et al.
Published: (2026)
Editing on the Generative Manifold: A Theoretical and Empirical Study of General Diffusion-Based Image Editing Trade-offs
by: Hu, Yi, et al.
Published: (2026)
DeepStream: Prototyping Deep Joint Source-Channel Coding for Real-Time Multimedia Transmissions
by: Chi, Kaiyi, et al.
Published: (2025)
PiGW: A Plug-in Generative Watermarking Framework
by: Ma, Rui, et al.
Published: (2024)
Cap2Sum: Learning to Summarize Videos by Generating Captions
by: Zhao, Cairong, et al.
Published: (2024)
Virbo: Multimodal Multilingual Avatar Video Generation in Digital Marketing
by: Zhang, Juan, et al.
Published: (2024)
Synchronized Video Storytelling: Generating Video Narrations with Structured Storyline
by: Yang, Dingyi, et al.
Published: (2024)
Multi-Reference Generative Face Video Compression with Contrastive Learning
by: Konuko, Goluck, et al.
Published: (2024)
Analyzing Recursiveness in Multimodal Generative Artificial Intelligence: Stability or Divergence?
by: Conde, Javier, et al.
Published: (2024)
Multimodal Semantic Communication for Generative Audio-Driven Video Conferencing
by: Tong, Haonan, et al.
Published: (2024)
MultiSoundGen: Video-to-Audio Generation for Multi-Event Scenarios via SlowFast Contrastive Audio-Visual Pretraining and Direct Preference Optimization
by: Yang, Jianxuan, et al.
Published: (2025)
Latent Feature-Guided Conditional Diffusion for Generative Image Semantic Communication
by: Chen, Zehao, et al.
Published: (2025)
Automatically Generating High-Precision Simulated Road Networking in Traffic Scenario
by: Xie, Liang, et al.
Published: (2025)
Automated Radiology Report Generation Based on Topic-Keyword Semantic Guidance
by: Xiao, Jing, et al.
Published: (2025)
Augmenting Intra-Modal Understanding in MLLMs for Robust Multimodal Keyphrase Generation
by: Cao, Jiajun, et al.
Published: (2025)
UniPath: Adaptive Coordination of Understanding and Generation for Unified Multimodal Reasoning
by: Bai, Hayes, et al.
Published: (2026)
LLM2Manim: Pedagogy-Aware AI Generation of STEM Animations
by: Joshi, Aastha, et al.
Published: (2026)
REAL: Realism Evaluation of Text-to-Image Generation Models for Effective Data Augmentation
by: Li, Ran, et al.
Published: (2025)