| Main Authors: | Jung, Chaeyoung, Jang, Youngjoon, Lee, Seungwoo, Chung, Joon Son |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2601.13143 |
Similar Items
AVCD: Mitigating Hallucinations in Audio-Visual Large Language Models through Contrastive Decoding
by: Jung, Chaeyoung, et al.
Published: (2025)
Fork-Merge Decoding: Enhancing Multimodal Understanding in Audio-Visual Large Language Models
by: Jung, Chaeyoung, et al.
Published: (2025)
Keep What Audio Cannot Say: Context-Preserving Token Pruning for Omni-LLMs
by: Jung, Chaeyoung, et al.
Published: (2026)
EquiAV: Leveraging Equivariance for Audio-Visual Contrastive Learning
by: Kim, Jongsuk, et al.
Published: (2024)
FlowAVSE: Efficient Audio-Visual Speech Enhancement with Conditional Flow Matching
by: Jung, Chaeyoung, et al.
Published: (2024)
Probing Cross-modal Information Hubs in Audio-Visual LLMs
by: Jung, Jihoo, et al.
Published: (2026)
VoiceDiT: Dual-Condition Diffusion Transformer for Environment-Aware Speech Synthesis
by: Jung, Jaemin, et al.
Published: (2024)
Fast-Slow Efficient Training for Multimodal Large Language Models via Visual Token Pruning
by: Zhang, Dingkun, et al.
Published: (2026)
Test-Time Augmentation for Pose-invariant Face Recognition
by: Jung, Jaemin, et al.
Published: (2025)
Two Heads Are Better Than One: Audio-Visual Speech Error Correction with Dual Hypotheses
by: Kim, Sungnyun, et al.
Published: (2025)
From Coarse to Fine: Efficient Training for Audio Spectrogram Transformers
by: Feng, Jiu, et al.
Published: (2024)
AsymVLM: Asymmetric Token Pruning for Efficient Vision-Language Model Inference
by: Feng, Yilin, et al.
Published: (2026)
Prefixing Attention Sinks can Mitigate Activation Outliers for Large Language Model Quantization
by: Son, Seungwoo, et al.
Published: (2024)
InfiniteAudio: Infinite-Length Audio Generation with Consistency
by: Jung, Chaeyoung, et al.
Published: (2025)
ResPrune: Text-Conditioned Subspace Reconstruction for Visual Token Pruning in Large Vision-Language Models
by: Li, Xu, et al.
Published: (2026)
LP-CFM: Perceptual Invariance-Aware Conditional Flow Matching for Speech Modeling
by: Kwak, Doyeop, et al.
Published: (2025)
The Role of Masking for Efficient Supervised Knowledge Distillation of Vision Transformers
by: Son, Seungwoo, et al.
Published: (2023)
Mixture of Scales: Memory-Efficient Token-Adaptive Binarization for Large Language Models
by: Jo, Dongwon, et al.
Published: (2024)
Hierarchical Attention-based Graph Neural Network with Relevance-driven Pruning
by: Kum, Seungwoo
Published: (2026)
Diffusion-Link: Diffusion Probabilistic Model for Bridging the Audio-Text Modality Gap
by: Nam, KiHyun, et al.
Published: (2025)
COPAL: Continual Pruning in Large Language Generative Models
by: Malla, Srikanth, et al.
Published: (2024)
On the Nature of Attention Sink that Shapes Decoding Strategy in Omni-LLMs
by: Yoo, Suho, et al.
Published: (2026)
Token Sparse Attention: Efficient Long-Context Inference with Interleaved Token Selection
by: Jo, Dongwon, et al.
Published: (2026)
MoLT: Mixture of Layer-Wise Tokens for Efficient Audio-Visual Learning
by: Rho, Kyeongha, et al.
Published: (2025)
DivPrune: Diversity-based Visual Token Pruning for Large Multimodal Models
by: Alvar, Saeed Ranjbar, et al.
Published: (2025)
Deep Understanding of Sign Language for Sign to Subtitle Alignment
by: Jang, Youngjoon, et al.
Published: (2025)
Window-Diffusion: Accelerating Diffusion Language Model Inference with Windowed Token Pruning and Caching
by: Zuo, Fengrui, et al.
Published: (2026)
Segmentwise Pruning in Audio-Language Models
by: Gibier, Marcel, et al.
Published: (2025)
FASP: Fast and Accurate Structured Pruning of Large Language Models
by: Hu, Hanyu, et al.
Published: (2025)
On the Importance of a Multi-Scale Calibration for Quantization
by: Son, Seungwoo, et al.
Published: (2026)
Smart-Infinity: Fast Large Language Model Training using Near-Storage Processing on a Real System
by: Jang, Hongsun, et al.
Published: (2024)
Fast and Effective Weight Update for Pruned Large Language Models
by: Boža, Vladimír
Published: (2024)
PagedEviction: Structured Block-wise KV Cache Pruning for Efficient Large Language Model Inference
by: Chitty-Venkata, Krishna Teja, et al.
Published: (2025)
CoreMatching: A Co-adaptive Sparse Inference Framework with Token and Neuron Pruning for Comprehensive Acceleration of Vision-Language Models
by: Wang, Qinsi, et al.
Published: (2025)
DRIFT: Drift-Resilient Invariant-Feature Transformer for DGA Detection
by: Lee, Chaeyoung, et al.
Published: (2026)
Localizing and Editing Knowledge in Large Audio-Language Models
by: Chung, Sung Kyun, et al.
Published: (2026)
AgilePruner: An Empirical Study of Attention and Diversity for Adaptive Visual Token Pruning in Large Vision-Language Models
by: Baek, Changwoo, et al.
Published: (2026)
Towards Efficient Automatic Self-Pruning of Large Language Models
by: Huang, Weizhong, et al.
Published: (2025)
LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference
by: Fu, Qichen, et al.
Published: (2024)
Fast Inference for Augmented Large Language Models
by: Shahout, Rana, et al.
Published: (2024)