| Main Authors: | Kim, Haven, Jung, Jongmin, Jeong, Dasaem, Nam, Juhan |
|---|---|
| Format: | Preprint |
| Published: | 2023 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2309.11093 |
Similar Items
On the de-duplication of the Lakh MIDI dataset
by: Choi, Eunjin, et al.
Published: (2025)
LAV: Audio-Driven Dynamic Visual Generation with Neural Compression and StyleGAN2
by: Jung, Jongmin, et al.
Published: (2025)
Automatic Time Signature Determination for New Scores Using Lyrics for Latent Rhythmic Structure
by: Liao, Callie C., et al.
Published: (2023)
Self-Comparison for Dataset-Level Membership Inference in Large (Vision-)Language Models
by: Ren, Jie, et al.
Published: (2024)
Enriching Music Descriptions with a Finetuned-LLM and Metadata for Text-to-Music Retrieval
by: Doh, SeungHeon, et al.
Published: (2024)
Towards Efficient and Real-Time Piano Transcription Using Neural Autoregressive Models
by: Kwon, Taegyun, et al.
Published: (2024)
Video-Foley: Two-Stage Video-To-Sound Generation via Temporal Event Condition For Foley Sound
by: Lee, Junwon, et al.
Published: (2024)
Hear What Matters! Text-conditioned Selective Video-to-Audio Generation
by: Lee, Junwon, et al.
Published: (2025)
KMMMU: Evaluation of Massive Multi-discipline Multimodal Understanding in Korean Language and Context
by: Lee, Nahyun, et al.
Published: (2026)
Training Data Efficiency in Multimodal Process Reward Models
by: Li, Jinyuan, et al.
Published: (2026)
ChemDFM-X: Towards Large Multimodal Model for Chemistry
by: Zhao, Zihan, et al.
Published: (2024)
Multimodal Large Language Models for Medicine: A Comprehensive Survey
by: Ye, Jiarui, et al.
Published: (2025)
MUDI: A Multimodal Biomedical Dataset for Understanding Pharmacodynamic Drug-Drug Interactions
by: Ngo, Tung-Lam, et al.
Published: (2025)
Large Language Models for Computer-Aided Design: A Survey
by: Zhang, Licheng, et al.
Published: (2025)
Emotion Collider: Dual Hyperbolic Mirror Manifolds for Sentiment Recovery via Anti Emotion Reflection
by: Fu, Rong, et al.
Published: (2026)
MemeCLIP: Leveraging CLIP Representations for Multimodal Meme Classification
by: Shah, Siddhant Bikram, et al.
Published: (2024)
Where Do We Go from Here? Multi-scale Allocentric Relational Inference from Natural Spatial Descriptions
by: Paz-Argaman, Tzuf, et al.
Published: (2024)
Dental-TriageBench: Benchmarking Multimodal Reasoning for Hierarchical Dental Triage
by: He, Ziyi, et al.
Published: (2026)
Proactive Conversational Assistant for a Procedural Manual Task based on Audio and IMU
by: Mahfuz, Rehana, et al.
Published: (2026)
StreaMulT: Streaming Multimodal Transformer for Heterogeneous and Arbitrary Long Sequential Data
by: Pellegrain, Victor, et al.
Published: (2021)
Mixture of LoRA Experts
by: Wu, Xun, et al.
Published: (2024)
ModalImmune: Immunity Driven Unlearning via Self Destructive Training
by: Fu, Rong, et al.
Published: (2026)
Multimodal Multi-loss Fusion Network for Sentiment Analysis
by: Wu, Zehui, et al.
Published: (2023)
DLF: Disentangled-Language-Focused Multimodal Sentiment Analysis
by: Wang, Pan, et al.
Published: (2024)
OLKAVS: An Open Large-Scale Korean Audio-Visual Speech Dataset
by: Park, Jeongkyun, et al.
Published: (2023)
Doctor Sun: A Bilingual Multimodal Large Language Model for Biomedical AI
by: Xue, Dong, et al.
Published: (2025)
Sample then Identify: A General Framework for Risk Control and Assessment in Multimodal Large Language Models
by: Wang, Qingni, et al.
Published: (2024)
EEG2TEXT-CN: An Exploratory Study of Open-Vocabulary Chinese Text-EEG Alignment via Large Language Model and Contrastive Learning on ChineseEEG
by: Lu, Jacky Tai-Yu, et al.
Published: (2025)
VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset
by: Liu, Jing, et al.
Published: (2023)
Language Models as Black-Box Optimizers for Vision-Language Models
by: Liu, Shihong, et al.
Published: (2023)
Let Multimodal Embedders Learn When to Augment Query via Adaptive Query Augmentation
by: Kim, Wongyu, et al.
Published: (2025)
ExDDV: A New Dataset for Explainable Deepfake Detection in Video
by: Hondru, Vlad, et al.
Published: (2025)
Music for All: Representational Bias and Cross-Cultural Adaptability of Music Generation Models
by: Mehta, Atharva, et al.
Published: (2025)
ADS-Edit: A Multimodal Knowledge Editing Dataset for Autonomous Driving Systems
by: Wang, Chenxi, et al.
Published: (2025)
LLaSO: A Foundational Framework for Reproducible Research in Large Language and Speech Model
by: Sun, Yirong, et al.
Published: (2025)
Unleashing the Power of Imbalanced Modality Information for Multi-modal Knowledge Graph Completion
by: Zhang, Yichi, et al.
Published: (2024)
'No' Matters: Out-of-Distribution Detection in Multimodality Long Dialogue
by: Gao, Rena, et al.
Published: (2024)
MultiScript30k: Leveraging Multilingual Embeddings to Extend Cross Script Parallel Data
by: Driggers-Ellis, Christopher, et al.
Published: (2025)
MultiMedEdit: A Scenario-Aware Benchmark for Evaluating Knowledge Editing in Medical VQA
by: Wen, Shengtao, et al.
Published: (2025)
MuPHI: Learning Implicit Multimodal Harm Reasoning via Semantically Grounded Reward Optimization
by: Saha, Anisha, et al.
Published: (2026)