| Main Authors: | Zheng, Tong; Sone, Shusaku; Ushiku, Yoshitaka; Oba, Yuki; Ma, Jiaxin |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2403.01802 |
Similar Items
TriFusion-SR: Joint Tri-Modal Medical Image Fusion and SR
by: Dharejo, Fayaz Ali, et al.
Published: (2026)
SciPostLayout: A Dataset for Layout Analysis and Layout Generation of Scientific Posters
by: Tanaka, Shohei, et al.
Published: (2024)
SciPostLayoutTree: A Dataset for Structural Analysis of Scientific Posters
by: Tanaka, Shohei, et al.
Published: (2025)
Learning Contrastive Multimodal Fusion with Improved Modality Dropout for Disease Detection and Prediction
by: Gu, Yi, et al.
Published: (2025)
SBS Figures: Pre-training Figure QA from Stage-by-Stage Synthesized Images
by: Shinoda, Risa, et al.
Published: (2024)
AgroBench: Vision-Language Model Benchmark in Agriculture
by: Shinoda, Risa, et al.
Published: (2025)
CaptionSmiths: Flexibly Controlling Language Pattern in Image Captioning
by: Saito, Kuniaki, et al.
Published: (2025)
Spatial-Spectral Binarized Neural Network for Panchromatic and Multi-spectral Images Fusion
by: Jiang, Yizhen, et al.
Published: (2025)
XFMamba: Cross-Fusion Mamba for Multi-View Medical Image Classification
by: Zheng, Xiaoyu, et al.
Published: (2025)
HalDec-Bench: Benchmarking Hallucination Detector in Image Captioning
by: Saito, Kuniaki, et al.
Published: (2026)
HalDec-Bench: Benchmarking Hallucination Detector in Image Captioning
by: Saito, Kuniaki, et al.
Published: (2025)
Recipe Generation from Unsegmented Cooking Videos
by: Nishimura, Taichi, et al.
Published: (2022)
Balanced Diffusion-Guided Fusion for Multimodal Remote Sensing Classification
by: Liu, Hao, et al.
Published: (2025)
AdaCoder: Adaptive Prompt Compression for Programmatic Visual Question Answering
by: Ukai, Mahiro, et al.
Published: (2024)
BiSegMamba: Efficient Bidirectional Tri-Oriented Mamba for 3D Medical Image Segmentation
by: Zada, Bakht, et al.
Published: (2026)
Multimodal Fusion Learning with Dual Attention for Medical Imaging
by: Dhar, Joy, et al.
Published: (2024)
SciPostGen: Bridging the Gap between Scientific Papers and Poster Layouts
by: Inadumi, Shun, et al.
Published: (2025)
Exo2EgoDVC: Dense Video Captioning of Egocentric Procedural Activities Using Web Instructional Videos
by: Ohkawa, Takehiko, et al.
Published: (2023)
GraphMMP: A Graph Neural Network Model with Mutual Information and Global Fusion for Multimodal Medical Prognosis
by: Shan, Xuhao, et al.
Published: (2025)
DepMamba: Progressive Fusion Mamba for Multimodal Depression Detection
by: Ye, Jiaxin, et al.
Published: (2024)
MedTri: A Platform for Structured Medical Report Normalization to Enhance Vision-Language Pretraining
by: Chu, Yuetan, et al.
Published: (2026)
Tri-Modal Fusion Transformers for UAV-based Object Detection
by: Iaboni, Craig, et al.
Published: (2026)
HyPCA-Net: Advancing Multimodal Fusion in Medical Image Analysis
by: Dhar, J., et al.
Published: (2026)
Neural Processing of Tri-Plane Hybrid Neural Fields
by: Cardace, Adriano, et al.
Published: (2023)
Reference-Free Image Quality Assessment for Virtual Try-On via Human Feedback
by: Hirakawa, Yuki, et al.
Published: (2026)
COM Kitchens: An Unedited Overhead-view Video Dataset as a Vision-Language Benchmark
by: Maeda, Koki, et al.
Published: (2024)
Dynamic Try-On: Taming Video Virtual Try-on with Dynamic Attention Mechanism
by: Zheng, Jun, et al.
Published: (2024)
iTryOn: Mastering Interactive Video Virtual Try-On with Spatial-Semantic Guidance
by: Zheng, Jun, et al.
Published: (2026)
USCNet: Transformer-Based Multimodal Fusion with Segmentation Guidance for Urolithiasis Classification
by: Wang, Changmiao, et al.
Published: (2026)
Mobile-VTON: High-Fidelity On-Device Virtual Try-On
by: Wan, Zhenchen, et al.
Published: (2026)
Simultaneous Tri-Modal Medical Image Fusion and Super-Resolution using Conditional Diffusion Model
by: Xu, Yushen, et al.
Published: (2024)
Interactive Multimodal Fusion with Temporal Modeling
by: Yu, Jun, et al.
Published: (2025)
FusionMamba: Dynamic Feature Enhancement for Multimodal Image Fusion with Mamba
by: Xie, Xinyu, et al.
Published: (2024)
Evaluating the Fairness of Neural Collapse in Medical Image Classification
by: Mouheb, Kaouther, et al.
Published: (2024)
OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on
by: Xu, Yuhao, et al.
Published: (2024)
SMFusion: Semantic-Preserving Fusion of Multimodal Medical Images for Enhanced Clinical Diagnosis
by: Xiang, Haozhe, et al.
Published: (2025)
SNN-Driven Multimodal Human Action Recognition via Sparse Spatial-Temporal Data Fusion
by: Zheng, Naichuan, et al.
Published: (2025)
TriFusion-AE: Language-Guided Depth and LiDAR Fusion for Robust Point Cloud Processing
by: Neogi, Susmit
Published: (2025)
TIP: Tabular-Image Pre-training for Multimodal Classification with Incomplete Data
by: Du, Siyi, et al.
Published: (2024)
Multi-Contrast Fusion Module: An attention mechanism integrating multi-contrast features for fetal torso plane classification
by: Zhu, Shengjun, et al.
Published: (2025)