| Main Authors: | Vilaça, Luís; Yu, Yi; Viana, Paula |
|---|---|
| Format: | Preprint |
| Published: | 2022 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2202.13673 |
Similar Items
A Survey of Recent Advances and Challenges in Deep Audio-Visual Correlation Learning
by: Vilaca, Luis, et al.
Published: (2024)
Metric Learning with Progressive Self-Distillation for Audio-Visual Embedding Learning
by: Zeng, Donghuo, et al.
Published: (2025)
Anchor-aware Deep Metric Learning for Audio-visual Retrieval
by: Zeng, Donghuo, et al.
Published: (2024)
Audio Does Matter: Importance-Aware Multi-Granularity Fusion for Video Moment Retrieval
by: Lin, Junan, et al.
Published: (2025)
Quality Over Quantity? LLM-Based Curation for a Data-Efficient Audio-Video Foundation Model
by: Vosoughi, Ali, et al.
Published: (2025)
On the Effect of Data-Augmentation on Local Embedding Properties in the Contrastive Learning of Music Audio Representations
by: McCallum, Matthew C., et al.
Published: (2024)
Diff4Steer: Steerable Diffusion Prior for Generative Music Retrieval with Semantic Guidance
by: Bao, Xuchan, et al.
Published: (2024)
ASK: Adaptive Self-improving Knowledge Framework for Audio Text Retrieval
by: Fu, Siyuan, et al.
Published: (2025)
Towards Multilingual Audio-Visual Question Answering
by: Phukan, Orchid Chetia, et al.
Published: (2024)
MERGE -- A Bimodal Audio-Lyrics Dataset for Static Music Emotion Recognition
by: Louro, Pedro Lima, et al.
Published: (2024)
TalkPlay-Tools: Conversational Music Recommendation with LLM Tool Calling
by: Doh, Seungheon, et al.
Published: (2025)
Enriching Music Descriptions with a Finetuned-LLM and Metadata for Text-to-Music Retrieval
by: Doh, SeungHeon, et al.
Published: (2024)
JEPOO: Highly Accurate Joint Estimation of Pitch, Onset and Offset for Music Information Retrieval
by: Wei, Haojie, et al.
Published: (2023)
Streaming Piano Transcription Based on Consistent Onset and Offset Decoding with Sustain Pedal Detection
by: Wei, Weixing, et al.
Published: (2025)
SynthTab: Leveraging Synthesized Data for Guitar Tablature Transcription
by: Zang, Yongyi, et al.
Published: (2023)
Learning Normal Patterns in Musical Loops
by: Dadman, Shayan, et al.
Published: (2025)
Music Genre Classification: Ensemble Learning with Subcomponents-level Attention
by: Liu, Yichen, et al.
Published: (2024)
Discrepancy-Aware Attention Network for Enhanced Audio-Visual Zero-Shot Learning
by: Yu, RunLin, et al.
Published: (2024)
Aligning Sight and Sound: Advanced Sound Source Localization Through Audio-Visual Alignment
by: Senocak, Arda, et al.
Published: (2024)
Classifying Shelf Life Quality of Pineapples by Combining Audio and Visual Features
by: Jiang, Yi-Lu, et al.
Published: (2025)
Mitigating Audiovisual Mismatch in Visual-Guide Audio Captioning
by: Xu, Le, et al.
Published: (2025)
Learning Self-Supervised Audio-Visual Representations for Sound Recommendations
by: Krishnamurthy, Sudha
Published: (2024)
DETECLAP: Enhancing Audio-Visual Representation Learning with Object Information
by: Nakada, Shota, et al.
Published: (2024)
Audio-Vision Contrastive Learning for Phonological Class Recognition
by: Liu, Daiqi, et al.
Published: (2025)
Benchmarking Cross-Domain Audio-Visual Deception Detection
by: Guo, Xiaobao, et al.
Published: (2024)
A Dataset and Baselines for Measuring and Predicting the Music Piece Memorability
by: Tseng, Li-Yang, et al.
Published: (2024)
TalkPlayData 2: An Agentic Synthetic Data Pipeline for Multimodal Conversational Music Recommendation
by: Choi, Keunwoo, et al.
Published: (2025)
Spiking Tucker Fusion Transformer for Audio-Visual Zero-Shot Learning
by: Li, Wenrui, et al.
Published: (2024)
AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models
by: Tseng, Yuan, et al.
Published: (2023)
PEAVS: Perceptual Evaluation of Audio-Visual Synchrony Grounded in Viewers' Opinion Scores
by: Goncalves, Lucas, et al.
Published: (2024)
Multimodal Transformer Distillation for Audio-Visual Synchronization
by: Chen, Xuanjun, et al.
Published: (2022)
3D Audio-Visual Segmentation
by: Sokolov, Artem, et al.
Published: (2024)
M2D-CLAP: Masked Modeling Duo Meets CLAP for Learning General-purpose Audio-Language Representation
by: Niizumi, Daisuke, et al.
Published: (2024)
Pilot-guided Multimodal Semantic Communication for Audio-Visual Event Localization
by: Yu, Fei, et al.
Published: (2024)
Multifaceted Evaluation of Audio-Visual Capability for MLLMs: Effectiveness, Efficiency, Generalizability and Robustness
by: Zhao, Yusheng, et al.
Published: (2025)
AlignVSR: Audio-Visual Cross-Modal Alignment for Visual Speech Recognition
by: Liu, Zehua, et al.
Published: (2024)
AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation
by: Choi, Jeongsoo, et al.
Published: (2023)
AD-AVSR: Asymmetric Dual-stream Enhancement for Robust Audio-Visual Speech Recognition
by: Xue, Junxiao, et al.
Published: (2025)
Detecting Audio-Visual Deepfakes with Fine-Grained Inconsistencies
by: Astrid, Marcella, et al.
Published: (2024)
SNIFR : Boosting Fine-Grained Child Harmful Content Detection Through Audio-Visual Alignment with Cascaded Cross-Transformer
by: Phukan, Orchid Chetia, et al.
Published: (2025)