Library Catalog

Bibliographic Details
Main Authors:	Li, Xiang; Qiu, Kai; Wang, Jinglu; Xu, Xiaohao; Singh, Rita; Yamazaki, Kashu; Chen, Hao; Huang, Xiaonan; Raj, Bhiksha
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2403.04924

Similar Items

QDFormer: Towards Robust Audiovisual Segmentation in Complex Environments with Quantization-based Semantic Decomposition
by: Li, Xiang, et al.
Published: (2023)

Customizable Perturbation Synthesis for Robust SLAM Benchmarking
by: Xu, Xiaohao, et al.
Published: (2024)

From Perfect to Noisy World Simulation: Customizable Embodied Multi-modal Perturbations for SLAM Robustness Benchmarking
by: Xu, Xiaohao, et al.
Published: (2024)

SVeritas: Benchmark for Robust Speaker Verification under Diverse Conditions
by: Baali, Massa, et al.
Published: (2025)

Efficient Autoregressive Audio Modeling via Next-Scale Prediction
by: Qiu, Kai, et al.
Published: (2024)

Human Voice is Unique
by: Singh, Rita, et al.
Published: (2025)

ControlVAR: Exploring Controllable Visual Autoregressive Modeling
by: Li, Xiang, et al.
Published: (2024)

Scalable Benchmarking and Robust Learning for Noise-Free Ego-Motion and 3D Reconstruction from Noisy Video
by: Xu, Xiaohao, et al.
Published: (2025)

Domain Adaptation for Contrastive Audio-Language Models
by: Deshmukh, Soham, et al.
Published: (2024)

On the Robust Approximation of ASR Metrics
by: Waheed, Abdul, et al.
Published: (2025)

DELULU: Discriminative Embedding Learning Using Latent Units for Speaker-Aware Self-Trained Speech Foundational Model
by: Baali, Massa, et al.
Published: (2025)

Robust Latent Matters: Boosting Image Generation with Sampling Error Synthesis
by: Qiu, Kai, et al.
Published: (2025)

Speech Robust Bench: A Robustness Benchmark For Speech Recognition
by: Shah, Muhammad A., et al.
Published: (2024)

Tessellated Linear Model for Age Prediction from Voice
by: Alharthi, Dareen, et al.
Published: (2025)

What Do Speech Foundation Models Not Learn About Speech?
by: Waheed, Abdul, et al.
Published: (2024)

Image Tokenizer Needs Post-Training
by: Qiu, Kai, et al.
Published: (2025)

Perturbation Ontology based Graph Attention Networks
by: Wang, Yichen, et al.
Published: (2024)

Latent Geometry Beyond Search: Amortizing Planning in World Models
by: Nguyen, Hoang, et al.
Published: (2026)

Probing Collision Grounding in Vision-Language Models for Safe Human-Robot Collaboration
by: Wang, Jun, et al.
Published: (2026)

What and When to Learn: CURriculum Ranking Loss for Large-Scale Speaker Verification
by: Baali, Massa, et al.
Published: (2026)

ADIFF: Explaining audio difference using natural language
by: Deshmukh, Soham, et al.
Published: (2025)

Deciphering GunType Hierarchy through Acoustic Analysis of Gunshot Recordings
by: Shah, Ankit, et al.
Published: (2025)

Improving Speaker Representations Using Contrastive Losses on Multi-scale Features
by: Dixit, Satvik, et al.
Published: (2024)

Mellow: a small audio language model for reasoning
by: Deshmukh, Soham, et al.
Published: (2025)

CAARMA: Class Augmentation with Adversarial Mixup Regularization
by: Baali, Massa, et al.
Published: (2025)

Lost in Transcription, Found in Distribution Shift: Demystifying Hallucination in Speech Foundation Models
by: Atwany, Hanin, et al.
Published: (2025)

Photorealistic Phantom Roads in Real Scenes: Disentangling 3D Hallucinations from Physical Geometry
by: Nguyen, Hoang, et al.
Published: (2025)

Revisiting Acoustic Features for Robust ASR
by: Shah, Muhammad A., et al.
Published: (2024)

Natural Selection via Foundation Models for Soft Robot Evolution
by: Chen, Changhe, et al.
Published: (2025)

Token Prediction as Implicit Classification to Identify LLM-Generated Text
by: Chen, Yutian, et al.
Published: (2023)

PDAF: A Phonetic Debiasing Attention Framework For Speaker Verification
by: Baali, Massa, et al.
Published: (2024)

SELM: Enhancing Speech Emotion Recognition for Out-of-Domain Scenarios
by: Bukhari, Hazim, et al.
Published: (2024)

Did You Hear That? Introducing AADG: A Framework for Generating Benchmark Data in Audio Anomaly Detection
by: Raghavan, Ksheeraja, et al.
Published: (2024)

ImageFolder: Autoregressive Image Generation with Folded Tokens
by: Li, Xiang, et al.
Published: (2024)

Completing Visual Objects via Bridging Generation and Segmentation
by: Li, Xiang, et al.
Published: (2023)

When Search Becomes Memory: Turning Robot Design Trials into Transferable Skills
by: Wang, Yunfei, et al.
Published: (2026)

Improving Membership Inference in ASR Model Auditing with Perturbed Loss Features
by: Teixeira, Francisco, et al.
Published: (2024)

Customizing Visual-Language Foundation Models for Multi-modal Anomaly Detection and Reasoning
by: Xu, Xiaohao, et al.
Published: (2024)

On Catastrophic Inheritance of Large Foundation Models
by: Chen, Hao, et al.
Published: (2024)