| Main Authors: | Chaudhuri, Yashwardhan, Kumar, Ankit, Phukan, Orchid Chetia, Buduru, Arun Balaji |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2401.05968 |
Similar Items
FGA: Fourier-Guided Attention Network for Crowd Count Estimation
by: Chaudhuri, Yashwardhan, et al.
Published: (2024)
VoxMed: One-Step Respiratory Disease Classifier using Digital Stethoscope Sounds
by: Mundra, Paridhi, et al.
Published: (2024)
ASGIR: Audio Spectrogram Transformer Guided Classification And Information Retrieval For Birds
by: Chaudhuri, Yashwardhan, et al.
Published: (2024)
Towards Multilingual Audio-Visual Question Answering
by: Phukan, Orchid Chetia, et al.
Published: (2024)
Are Paralinguistic Representations all that is needed for Speech Emotion Recognition?
by: Phukan, Orchid Chetia, et al.
Published: (2024)
BB-Patch: BlackBox Adversarial Patch-Attack using Zeroth-Order Optimization
by: Kumar, Satyadwyoom, et al.
Published: (2024)
SNIFR : Boosting Fine-Grained Child Harmful Content Detection Through Audio-Visual Alignment with Cascaded Cross-Transformer
by: Phukan, Orchid Chetia, et al.
Published: (2025)
CoLLAB: A Collaborative Approach for Multilingual Abuse Detection
by: Phukan, Orchid Chetia, et al.
Published: (2024)
SONIC: Synergizing VisiON Foundation Models for Stress RecogNItion from ECG signals
by: Phukan, Orchid Chetia, et al.
Published: (2024)
The Reasonable Effectiveness of Speaker Embeddings for Violence Detection
by: Jain, Sarthak, et al.
Published: (2024)
Enhancing In-Domain and Out-Domain EmoFake Detection via Cooperative Multilingual Speech Foundation Models
by: Phukan, Orchid Chetia, et al.
Published: (2025)
Indic-CodecFake meets SATYAM: Towards Detecting Neural Audio Codec Synthesized Speech Deepfakes in Indic Languages
by: Girish, et al.
Published: (2026)
Real-Time Crowd Counting for Embedded Systems with Lightweight Architecture
by: Zhao, Zhiyuan, et al.
Published: (2025)
Heterogeneity over Homogeneity: Investigating Multilingual Speech Pre-Trained Models for Detecting Audio Deepfake
by: Phukan, Orchid Chetia, et al.
Published: (2024)
Learning Discriminative Features for Crowd Counting
by: Chen, Yuehai, et al.
Published: (2023)
ComFeAT: Combination of Neural and Spectral Features for Improved Depression Detection
by: Phukan, Orchid Chetia, et al.
Published: (2024)
PERSONA: An Application for Emotion Recognition, Gender Recognition and Age Estimation
by: Koshal, Devyani, et al.
Published: (2024)
AVR: Synergizing Foundation Models for Audio-Visual Humor Detection
by: Sharma, Sarthak, et al.
Published: (2024)
Towards Neural Audio Codec Source Parsing
by: Phukan, Orchid Chetia, et al.
Published: (2025)
FOCA: Multimodal Malware Classification via Hyperbolic Cross-Attention
by: Choudhury, Nitin, et al.
Published: (2026)
RepSFNet : A Single Fusion Network with Structural Reparameterization for Crowd Counting
by: Achmadiah, Mas Nurul, et al.
Published: (2026)
Embodied Crowd Counting
by: Long, Runling, et al.
Published: (2025)
Investigating Prosodic Signatures via Speech Pre-Trained Models for Audio Deepfake Source Attribution
by: Phukan, Orchid Chetia, et al.
Published: (2024)
Multi-View Multi-Task Modeling with Speech Foundation Models for Speech Forensic Tasks
by: Phukan, Orchid Chetia, et al.
Published: (2024)
SeQuiFi: Mitigating Catastrophic Forgetting in Speech Emotion Recognition with Sequential Class-Finetuning
by: Jain, Sarthak, et al.
Published: (2024)
Density Estimation and Crowd Counting
by: Sunil, Balachandra Devarangadi, et al.
Published: (2025)
3D Crowd Counting via Geometric Attention-guided Multi-View Fusion
by: Zhang, Qi, et al.
Published: (2020)
Semi-Supervised Multi-View Crowd Counting by Ranking Multi-View Fusion Models
by: Zhang, Qi, et al.
Published: (2025)
CountFormer: Multi-View Crowd Counting Transformer
by: Mo, Hong, et al.
Published: (2024)
Phys-3D: Physics-Constrained Real-Time Crowd Tracking and Counting on Railway Platforms
by: Zeng, Bin, et al.
Published: (2026)
RACANet: Reliability-Aware Crowd Anchor Network for RGB-T Crowd Counting
by: Shi, Jinghao, et al.
Published: (2026)
Single Domain Generalization for Crowd Counting
by: Peng, Zhuoxuan, et al.
Published: (2024)
Rethinking Global Context in Crowd Counting
by: Sun, Guolei, et al.
Published: (2021)
A Dual-Modulation Framework for RGB-T Crowd Counting via Spatially Modulated Attention and Adaptive Fusion
by: Feng, Yuhong, et al.
Published: (2025)
Transformer-Based Dual-Optical Attention Fusion Crowd Head Point Counting and Localization Network
by: Zhou, Fei, et al.
Published: (2025)
Local Information Matters: A Rethink of Crowd Counting
by: Pan, Tianhang, et al.
Published: (2025)
Curriculum for Crowd Counting -- Is it Worthy?
by: Khan, Muhammad Asif, et al.
Published: (2024)
Semi-Supervised Crowd Counting with Contextual Modeling: Facilitating Holistic Understanding of Crowd Scenes
by: Qian, Yifei, et al.
Published: (2023)
DSGC-Net: A Dual-Stream Graph Convolutional Network for Crowd Counting via Feature Correlation Mining
by: Wu, Yihong, et al.
Published: (2025)
The Effectiveness of a Simplified Model Structure for Crowd Counting
by: Chen, Lei, et al.
Published: (2024)