| Main Authors: | Wang, Zili, Yang, Qi, Shi, Linsu, Yu, Jiazhong, Liang, Qinghua, Li, Fei, Xiang, Shiming |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2408.01708 |
Similar Items
Taming Modality Entanglement in Continual Audio-Visual Segmentation
by: Hong, Yuyang, et al.
Published: (2025)
Continuous Speculative Decoding for Autoregressive Image Generation
by: Wang, Zili, et al.
Published: (2024)
Efficient Redundancy Reduction for Open-Vocabulary Semantic Segmentation
by: Chen, Lin, et al.
Published: (2025)
Draw an Audio: Leveraging Multi-Instruction for Video-to-Audio Synthesis
by: Yang, Qi, et al.
Published: (2024)
SeaVIS: Sound-Enhanced Association for Online Audio-Visual Instance Segmentation
by: Zhu, Yingjian, et al.
Published: (2026)
Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-Visual Segmentation
by: Yang, Qi, et al.
Published: (2023)
Robust Audio-Visual Segmentation via Audio-Guided Visual Convergent Alignment
by: Liu, Chen, et al.
Published: (2025)
Revisiting Audio-Visual Segmentation with Vision-Centric Transformer
by: Huang, Shaofei, et al.
Published: (2025)
Improving Visual Quality and Transferability of Adversarial Attacks on Face Recognition Simultaneously with Adversarial Restoration
by: Zhou, Fengfan, et al.
Published: (2023)
AV-DiT: Efficient Audio-Visual Diffusion Transformer for Joint Audio and Video Generation
by: Wang, Kai, et al.
Published: (2024)
Segment Any 3D Gaussians
by: Cen, Jiazhong, et al.
Published: (2023)
Beyond Sequential Distance: Inter-Modal Distance Invariant Position Encoding
by: Chen, Lin, et al.
Published: (2026)
Le-DETR: Revisiting Real-Time Detection Transformer with Efficient Encoder Design
by: Huang, Jiannan, et al.
Published: (2026)
BEVANet: Bilateral Efficient Visual Attention Network for Real-Time Semantic Segmentation
by: Huang, Ping-Mao, et al.
Published: (2025)
Implicit Counterfactual Learning for Audio-Visual Segmentation
by: Zha, Mingfeng, et al.
Published: (2025)
SAM-MI: A Mask-Injected Framework for Enhancing Open-Vocabulary Semantic Segmentation with SAM
by: Chen, Lin, et al.
Published: (2025)
Segment Anything in 3D with Radiance Fields
by: Cen, Jiazhong, et al.
Published: (2023)
Audio-Visual Instance Segmentation
by: Guo, Ruohao, et al.
Published: (2023)
Bootstrapping Audio-Visual Segmentation by Strengthening Audio Cues
by: Chen, Tianxiang, et al.
Published: (2024)
Dynamic Derivation and Elimination: Audio Visual Segmentation with Enhanced Audio Semantics
by: Liu, Chen, et al.
Published: (2025)
Segment Any 4D Gaussians
by: Ji, Shengxiang, et al.
Published: (2024)
RMP-SAM: Towards Real-Time Multi-Purpose Segment Anything
by: Xu, Shilin, et al.
Published: (2024)
Golden Cudgel Network for Real-Time Semantic Segmentation
by: Yang, Guoyu, et al.
Published: (2025)
YOLOv8-SMOT: An Efficient and Robust Framework for Real-Time Small Object Tracking via Slice-Assisted Training and Adaptive Association
by: Yu, Xiang, et al.
Published: (2025)
Generating Attribute-Aware Human Motions from Textual Prompt
by: Wang, Xinghan, et al.
Published: (2025)
SpeechForensics: Audio-Visual Speech Representation Learning for Face Forgery Detection
by: Liang, Yachao, et al.
Published: (2025)
Research on Audio-Visual Quality Assessment Dataset and Method for User-Generated Omnidirectional Video
by: Zhao, Fei, et al.
Published: (2025)
Towards Omnimodal Expressions and Reasoning in Referring Audio-Visual Segmentation
by: Ying, Kaining, et al.
Published: (2025)
Real-Time Idling Vehicles Detection using Combined Audio-Visual Deep Learning
by: Li, Xiwen, et al.
Published: (2023)
HyCTAS: Multi-Objective Hybrid Convolution-Transformer Architecture Search for Real-Time Image Segmentation
by: Yu, Hongyuan, et al.
Published: (2024)
From Waveforms to Pixels: A Survey on Audio-Visual Segmentation
by: Li, Jia, et al.
Published: (2025)
SCTNet: Single-Branch CNN with Transformer Semantic Information for Real-Time Segmentation
by: Xu, Zhengze, et al.
Published: (2023)
Noise-Tolerant Learning for Audio-Visual Action Recognition
by: Han, Haochen, et al.
Published: (2022)
Unsupervised Audio-Visual Segmentation with Modality Alignment
by: Bhosale, Swapnil, et al.
Published: (2024)
LightAVSeg: Lightweight Audio-Visual Segmentation
by: Zhong, Qing, et al.
Published: (2026)
Complementary and Contrastive Learning for Audio-Visual Segmentation
by: Gong, Sitong, et al.
Published: (2025)
Unveiling and Mitigating Bias in Audio Visual Segmentation
by: Sun, Peiwen, et al.
Published: (2024)
A Lightweight Multi-Scale Attention Framework for Real-Time Spinal Endoscopic Instance Segmentation
by: Lai, Qi, et al.
Published: (2025)
DiT as Real-Time Rerenderer: Streaming Video Stylization with Autoregressive Diffusion Transformer
by: Lyu, Hengye, et al.
Published: (2026)
TAViS: Text-bridged Audio-Visual Segmentation with Foundation Models
by: Luo, Ziyang, et al.
Published: (2025)