| Main Authors: | Nguyen, Khanh-Binh; Park, Chae Jung |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2407.02004 |
Similar Items
Audio-Visual Segmentation via Unlabeled Frame Exploitation
by: Liu, Jinxiang, et al.
Published: (2024)
Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-Visual Segmentation
by: Yang, Qi, et al.
Published: (2023)
Bootstrapping Audio-Visual Segmentation by Strengthening Audio Cues
by: Chen, Tianxiang, et al.
Published: (2024)
Audit After Segmentation: Reference-Free Mask Quality Assessment for Language-Referred Audio-Visual Segmentation
by: Zhou, Jinxing, et al.
Published: (2026)
Dynamic Derivation and Elimination: Audio Visual Segmentation with Enhanced Audio Semantics
by: Liu, Chen, et al.
Published: (2025)
Segment Beyond View: Handling Partially Missing Modality for Audio-Visual Semantic Segmentation
by: Wu, Renjie, et al.
Published: (2023)
DDAVS: Disentangled Audio Semantics and Delayed Bidirectional Alignment for Audio-Visual Segmentation
by: Tian, Jingqi, et al.
Published: (2025)
VGGSounder: Audio-Visual Evaluations for Foundation Models
by: Zverev, Daniil, et al.
Published: (2025)
3D Audio-Visual Segmentation
by: Sokolov, Artem, et al.
Published: (2024)
Seeing Soundscapes: Audio-Visual Generation and Separation from Soundscapes Using Audio-Visual Separator
by: Kang, Minjae, et al.
Published: (2025)
Object-AVEdit: An Object-level Audio-Visual Editing Model
by: Fu, Youquan, et al.
Published: (2025)
Enriching Multimodal Sentiment Analysis through Textual Emotional Descriptions of Visual-Audio Content
by: Wu, Sheng, et al.
Published: (2024)
Unified Cross-modal Translation of Score Images, Symbolic Music, and Performance Audio
by: Jung, Jongmin, et al.
Published: (2025)
Automated Classification of Phonetic Segments in Child Speech Using Raw Ultrasound Imaging
by: Ani, Saja Al, et al.
Published: (2024)
GIRAFE: Glottal Imaging Dataset for Advanced Segmentation, Analysis, and Facilitative Playbacks Evaluation
by: Andrade-Miranda, G., et al.
Published: (2024)
Sounding Highlights: Dual-Pathway Audio Encoders for Audio-Visual Video Highlight Detection
by: Joo, Seohyun, et al.
Published: (2026)
Enhancing Lie Detection Accuracy: A Comparative Study of Classic ML, CNN, and GCN Models using Audio-Visual Features
by: Abdelwahab, Abdelrahman, et al.
Published: (2024)
SAV-SE: Scene-aware Audio-Visual Speech Enhancement with Selective State Space Model
by: Qian, Xinyuan, et al.
Published: (2024)
MAVERIX: Multimodal Audio-Visual Evaluation and Recognition IndeX
by: Xie, Liuyue, et al.
Published: (2025)
Contextual Cross-Modal Attention for Audio-Visual Deepfake Detection and Localization
by: Katamneni, Vinaya Sree, et al.
Published: (2024)
ZeroSep: Separate Anything in Audio with Zero Training
by: Huang, Chao, et al.
Published: (2025)
What's Making That Sound Right Now? Video-centric Audio-Visual Localization
by: Choi, Hahyeon, et al.
Published: (2025)
Multilingual Audio-Visual Speech Recognition with Hybrid CTC/RNN-T Fast Conformer
by: Burchi, Maxime, et al.
Published: (2024)
ManiWAV: Learning Robot Manipulation from In-the-Wild Audio-Visual Data
by: Liu, Zeyi, et al.
Published: (2024)
A Survey of Recent Advances and Challenges in Deep Audio-Visual Correlation Learning
by: Vilaca, Luis, et al.
Published: (2024)
Audio-Visual Instance Segmentation
by: Guo, Ruohao, et al.
Published: (2023)
Bridging Audio and Vision: Zero-Shot Audiovisual Segmentation by Connecting Pretrained Models
by: Lee, Seung-jae, et al.
Published: (2025)
Masked Generative Video-to-Audio Transformers with Enhanced Synchronicity
by: Pascual, Santiago, et al.
Published: (2024)
Towards Unconstrained Audio Splicing Detection and Localization with Neural Networks
by: Moussa, Denise, et al.
Published: (2022)
From Vision to Sound: Advancing Audio Anomaly Detection with Vision-Based Algorithms
by: Barusco, Manuel, et al.
Published: (2025)
Foley-Flow: Coordinated Video-to-Audio Generation with Masked Audio-Visual Alignment and Dynamic Conditional Flows
by: Mo, Shentong, et al.
Published: (2026)
JavisDiT: Joint Audio-Video Diffusion Transformer with Hierarchical Spatio-Temporal Prior Synchronization
by: Liu, Kai, et al.
Published: (2025)
LD-LAudio-V1: Video-to-Long-Form-Audio Generation Extension with Dual Lightweight Adapters
by: Zhang, Haomin, et al.
Published: (2025)
Synthesizing Audio from Silent Video using Sequence to Sequence Modeling
by: Belinchon, Hugo Garrido-Lestache, et al.
Published: (2024)
FaceTalk: Audio-Driven Motion Diffusion for Neural Parametric Head Models
by: Aneja, Shivangi, et al.
Published: (2023)
JAM-Flow: Joint Audio-Motion Synthesis with Flow Matching
by: Kwon, Mingi, et al.
Published: (2025)
Contrastive Conditional Latent Diffusion for Audio-visual Segmentation
by: Mao, Yuxin, et al.
Published: (2023)
Conditional GAN for Enhancing Diffusion Models in Efficient and Authentic Global Gesture Generation from Audios
by: Cheng, Yongkang, et al.
Published: (2024)
Audio-Enhanced Vision-Language Modeling with Latent Space Broadening for High Quality Data Expansion
by: Sun, Yu, et al.
Published: (2025)
GaussianSpeech: Audio-Driven Gaussian Avatars
by: Aneja, Shivangi, et al.
Published: (2024)