:: Library Catalog

Bibliographic Details
Main Authors:	Feng, Liqian; Wang, Lintao; Hu, Kun; Kong, Dehui; Wang, Zhiyong
Format:	Preprint
Published:	2025
Subjects:	Computation and Language; Multimedia
Online Access:	https://arxiv.org/abs/2509.10845

Similar Items

Improving Gloss-free Sign Language Translation by Reducing Representation Density
by: Ye, Jinhui, et al.
Published: (2024)

KAN Text to Vision? The Exploration of Kolmogorov-Arnold Networks for Multi-Scale Sequence-Based Pose Animation from Sign Language Notation
by: Du, Guanyi, et al.
Published: (2026)

Teach Me Sign: Stepwise Prompting LLM for Sign Language Production
by: An, Zhaoyi, et al.
Published: (2025)

Sign-IDD: Iconicity Disentangled Diffusion for Sign Language Production
by: Tang, Shengeng, et al.
Published: (2024)

MultimodalHugs: Enabling Sign Language Processing in Hugging Face
by: Sant, Gerard, et al.
Published: (2025)

Linguistics-Vision Monotonic Consistent Network for Sign Language Production
by: Wang, Xu, et al.
Published: (2024)

Towards Better Text-to-Image Generation Alignment via Attention Modulation
by: Wu, Yihang, et al.
Published: (2024)

IsoSignVid2Aud: Sign Language Video to Audio Conversion without Text Intermediaries
by: Kavediya, Harsh, et al.
Published: (2025)

SpeechCraft: A Fine-grained Expressive Speech Dataset with Natural Language Description
by: Jin, Zeyu, et al.
Published: (2024)

StarVC: A Unified Auto-Regressive Framework for Joint Text and Speech Generation in Voice Conversion
by: Li, Fengjin, et al.
Published: (2025)

TCAN: Text-oriented Cross Attention Network for Multimodal Sentiment Analysis
by: Quan, Weize, et al.
Published: (2024)

Diverse Sign Language Translation
by: Shen, Xin, et al.
Published: (2024)

Radio Frequency Signal based Human Silhouette Segmentation: A Sequential Diffusion Approach
by: Wen, Penghui, et al.
Published: (2024)

MIDI-LLM: Adapting Large Language Models for Text-to-MIDI Music Generation
by: Wu, Shih-Lun, et al.
Published: (2025)

ProtT3: Protein-to-Text Generation for Text-based Protein Understanding
by: Liu, Zhiyuan, et al.
Published: (2024)

A Bounding Box is Worth One Token: Interleaving Layout and Text in a Large Language Model for Document Understanding
by: Lu, Jinghui, et al.
Published: (2024)

Selective Contrastive Learning For Gloss Free Sign Language Translation
by: Lai, Changhao, et al.
Published: (2026)

Scaling up Multimodal Pre-training for Sign Language Understanding
by: Zhou, Wengang, et al.
Published: (2024)

Terrain Diffusion Network: Climatic-Aware Terrain Generation with Geological Sketch Guidance
by: Hu, Zexin, et al.
Published: (2023)

Distilling Implicit Multimodal Knowledge into Large Language Models for Zero-Resource Dialogue Generation
by: Zhang, Bo, et al.
Published: (2024)

Leveraging the Power of MLLMs for Gloss-Free Sign Language Translation
by: Kim, Jungeun, et al.
Published: (2024)

SoMeLVLM: A Large Vision Language Model for Social Media Processing
by: Zhang, Xinnong, et al.
Published: (2024)

TeleAntiFraud-28k: An Audio-Text Slow-Thinking Dataset for Telecom Fraud Detection
by: Ma, Zhiming, et al.
Published: (2025)

Sample then Identify: A General Framework for Risk Control and Assessment in Multimodal Large Language Models
by: Wang, Qingni, et al.
Published: (2024)

Sentiment-enhanced Graph-based Sarcasm Explanation in Dialogue
by: Ouyang, Kun, et al.
Published: (2024)

Gloss2Text: Sign Language Gloss translation using LLMs and Semantically Aware Label Smoothing
by: Fayyazsanavi, Pooya, et al.
Published: (2024)

Listening to the Unspoken: Exploring "365" Aspects of Multimodal Interview Performance Assessment
by: Li, Jia, et al.
Published: (2025)

Gloss-Free Sign Language Translation: An Unbiased Evaluation of Progress in the Field
by: Sincan, Ozge Mercanoglu, et al.
Published: (2026)

UMETTS: A Unified Framework for Emotional Text-to-Speech Synthesis with Multimodal Prompts
by: Cheng, Zhi-Qi, et al.
Published: (2024)

Hierarchical Sub-action Tree for Continuous Sign Language Recognition
by: Yang, Dejie, et al.
Published: (2025)

LatentSpeech: Latent Diffusion for Text-To-Speech Generation
by: Lou, Haowei, et al.
Published: (2024)

LaF-GRPO: In-Situ Navigation Instruction Generation for the Visually Impaired via GRPO with LLM-as-Follower Reward
by: Zhao, Yi, et al.
Published: (2025)

RiverEcho: Real-Time Interactive Digital System for Ancient Yellow River Culture
by: Wang, Haofeng, et al.
Published: (2025)

AIM: Let Any Multi-modal Large Language Models Embrace Efficient In-Context Learning
by: Gao, Jun, et al.
Published: (2024)

Unleashing Text-to-Image Diffusion Prior for Zero-Shot Image Captioning
by: Luo, Jianjie, et al.
Published: (2024)

Multimodal Dialog Systems with Dual Knowledge-enhanced Generative Pretrained Language Model
by: Chen, Xiaolin, et al.
Published: (2022)

mPLUG-PaperOwl: Scientific Diagram Analysis with the Multimodal Large Language Model
by: Hu, Anwen, et al.
Published: (2023)

MolCA: Molecular Graph-Language Modeling with Cross-Modal Projector and Uni-Modal Adapter
by: Liu, Zhiyuan, et al.
Published: (2023)

Traits Run Deep: Enhancing Personality Assessment via Psychology-Guided LLM Representations and Multimodal Apparent Behaviors
by: Li, Jia, et al.
Published: (2025)

MAC-SLU: Multi-Intent Automotive Cabin Spoken Language Understanding Benchmark
by: Peng, Yuezhang, et al.
Published: (2025)