| Main Authors: | Feng, Liqian, Wang, Lintao, Hu, Kun, Kong, Dehui, Wang, Zhiyong |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2509.10845 |
Similar Items
Improving Gloss-free Sign Language Translation by Reducing Representation Density
by: Ye, Jinhui, et al.
Published: (2024)
KAN Text to Vision? The Exploration of Kolmogorov-Arnold Networks for Multi-Scale Sequence-Based Pose Animation from Sign Language Notation
by: Du, Guanyi, et al.
Published: (2026)
Teach Me Sign: Stepwise Prompting LLM for Sign Language Production
by: An, Zhaoyi, et al.
Published: (2025)
Sign-IDD: Iconicity Disentangled Diffusion for Sign Language Production
by: Tang, Shengeng, et al.
Published: (2024)
MultimodalHugs: Enabling Sign Language Processing in Hugging Face
by: Sant, Gerard, et al.
Published: (2025)
Linguistics-Vision Monotonic Consistent Network for Sign Language Production
by: Wang, Xu, et al.
Published: (2024)
Towards Better Text-to-Image Generation Alignment via Attention Modulation
by: Wu, Yihang, et al.
Published: (2024)
IsoSignVid2Aud: Sign Language Video to Audio Conversion without Text Intermediaries
by: Kavediya, Harsh, et al.
Published: (2025)
SpeechCraft: A Fine-grained Expressive Speech Dataset with Natural Language Description
by: Jin, Zeyu, et al.
Published: (2024)
StarVC: A Unified Auto-Regressive Framework for Joint Text and Speech Generation in Voice Conversion
by: Li, Fengjin, et al.
Published: (2025)
TCAN: Text-oriented Cross Attention Network for Multimodal Sentiment Analysis
by: Quan, Weize, et al.
Published: (2024)
Diverse Sign Language Translation
by: Shen, Xin, et al.
Published: (2024)
Radio Frequency Signal based Human Silhouette Segmentation: A Sequential Diffusion Approach
by: Wen, Penghui, et al.
Published: (2024)
MIDI-LLM: Adapting Large Language Models for Text-to-MIDI Music Generation
by: Wu, Shih-Lun, et al.
Published: (2025)
ProtT3: Protein-to-Text Generation for Text-based Protein Understanding
by: Liu, Zhiyuan, et al.
Published: (2024)
A Bounding Box is Worth One Token: Interleaving Layout and Text in a Large Language Model for Document Understanding
by: Lu, Jinghui, et al.
Published: (2024)
Selective Contrastive Learning For Gloss Free Sign Language Translation
by: Lai, Changhao, et al.
Published: (2026)
Scaling up Multimodal Pre-training for Sign Language Understanding
by: Zhou, Wengang, et al.
Published: (2024)
Terrain Diffusion Network: Climatic-Aware Terrain Generation with Geological Sketch Guidance
by: Hu, Zexin, et al.
Published: (2023)
Distilling Implicit Multimodal Knowledge into Large Language Models for Zero-Resource Dialogue Generation
by: Zhang, Bo, et al.
Published: (2024)
Leveraging the Power of MLLMs for Gloss-Free Sign Language Translation
by: Kim, Jungeun, et al.
Published: (2024)
SoMeLVLM: A Large Vision Language Model for Social Media Processing
by: Zhang, Xinnong, et al.
Published: (2024)
TeleAntiFraud-28k: An Audio-Text Slow-Thinking Dataset for Telecom Fraud Detection
by: Ma, Zhiming, et al.
Published: (2025)
Sample then Identify: A General Framework for Risk Control and Assessment in Multimodal Large Language Models
by: Wang, Qingni, et al.
Published: (2024)
Sentiment-enhanced Graph-based Sarcasm Explanation in Dialogue
by: Ouyang, Kun, et al.
Published: (2024)
Gloss2Text: Sign Language Gloss translation using LLMs and Semantically Aware Label Smoothing
by: Fayyazsanavi, Pooya, et al.
Published: (2024)
Listening to the Unspoken: Exploring "365" Aspects of Multimodal Interview Performance Assessment
by: Li, Jia, et al.
Published: (2025)
Gloss-Free Sign Language Translation: An Unbiased Evaluation of Progress in the Field
by: Sincan, Ozge Mercanoglu, et al.
Published: (2026)
UMETTS: A Unified Framework for Emotional Text-to-Speech Synthesis with Multimodal Prompts
by: Cheng, Zhi-Qi, et al.
Published: (2024)
Hierarchical Sub-action Tree for Continuous Sign Language Recognition
by: Yang, Dejie, et al.
Published: (2025)
LatentSpeech: Latent Diffusion for Text-To-Speech Generation
by: Lou, Haowei, et al.
Published: (2024)
LaF-GRPO: In-Situ Navigation Instruction Generation for the Visually Impaired via GRPO with LLM-as-Follower Reward
by: Zhao, Yi, et al.
Published: (2025)
RiverEcho: Real-Time Interactive Digital System for Ancient Yellow River Culture
by: Wang, Haofeng, et al.
Published: (2025)
AIM: Let Any Multi-modal Large Language Models Embrace Efficient In-Context Learning
by: Gao, Jun, et al.
Published: (2024)
Unleashing Text-to-Image Diffusion Prior for Zero-Shot Image Captioning
by: Luo, Jianjie, et al.
Published: (2024)
Multimodal Dialog Systems with Dual Knowledge-enhanced Generative Pretrained Language Model
by: Chen, Xiaolin, et al.
Published: (2022)
mPLUG-PaperOwl: Scientific Diagram Analysis with the Multimodal Large Language Model
by: Hu, Anwen, et al.
Published: (2023)
MolCA: Molecular Graph-Language Modeling with Cross-Modal Projector and Uni-Modal Adapter
by: Liu, Zhiyuan, et al.
Published: (2023)
Traits Run Deep: Enhancing Personality Assessment via Psychology-Guided LLM Representations and Multimodal Apparent Behaviors
by: Li, Jia, et al.
Published: (2025)
MAC-SLU: Multi-Intent Automotive Cabin Spoken Language Understanding Benchmark
by: Peng, Yuezhang, et al.
Published: (2025)