| Main Authors: | Zhang, Ziheng, Hou, Yunzhong, Liu, Naijing, Zheng, Liang |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.13846 |
Similar Items
Harnessing the Power of Artificial Intelligence to Vitalize Endangered Indigenous Languages: Technologies and Experiences
by: Pinhanez, Claudio, et al.
Published: (2024)
MERIT: Maximum-normalized Element-wise Ratio for Language Model Large-batch Training
by: Luo, Yang, et al.
Published: (2025)
In-context Language Learning for Endangered Languages in Speech Recognition
by: Li, Zhaolin, et al.
Published: (2025)
From Flat Language Labels to Typological Priors: Structured Language Conditioning for Multilingual Speech-to-Speech Translation
by: Pan, Yu, et al.
Published: (2026)
Translation of Multifaceted Data without Re-Training of Machine Translation Systems
by: Moon, Hyeonseok, et al.
Published: (2024)
Chimera: Diagnosing Shortcut Learning in Visual-Language Understanding
by: Chi, Ziheng, et al.
Published: (2025)
Extracting Training Dialogue Data from Large Language Model based Task Bots
by: Zhang, Shuo, et al.
Published: (2026)
High-Quality Data Augmentation for Low-Resource NMT: Combining a Translation Memory, a GAN Generator, and Filtering
by: Liu, Hengjie, et al.
Published: (2024)
RePPL: Recalibrating Perplexity by Uncertainty in Semantic Propagation and Language Generation for Explainable QA Hallucination Detection
by: Huang, Yiming, et al.
Published: (2025)
Automatic Speech Recognition for Documenting Endangered Languages: Case Study of Ikema Miyakoan
by: Taguchi, Chihiro, et al.
Published: (2026)
An Application of Large Language Models to Coding Negotiation Transcripts
by: Friedman, Ray, et al.
Published: (2024)
Untie the Knots: An Efficient Data Augmentation Strategy for Long-Context Pre-Training in Language Models
by: Tian, Junfeng, et al.
Published: (2024)
Reasoning Transfer for an Extremely Low-Resource and Endangered Language: Bridging Languages Through Sample-Efficient Language Understanding
by: Tran, Khanh-Tung, et al.
Published: (2025)
MemFactory: Unified Inference & Training Framework for Agent Memory
by: Guo, Ziliang, et al.
Published: (2026)
Building Accurate Translation-Tailored LLMs with Language Aware Instruction Tuning
by: Zan, Changtong, et al.
Published: (2024)
Refining Transcripts With TV Subtitles by Prompt-Based Weakly Supervised Training of ASR
by: Zhao, Xinnian, et al.
Published: (2025)
Integrating Linguistics and AI: Morphological Analysis and Corpus development of Endangered Toto Language of West Bengal
by: Guha, Ambalika, et al.
Published: (2025)
A Large Language Model-Empowered Agent for Reliable and Robust Structural Analysis
by: Liu, Jiachen, et al.
Published: (2025)
Biomedical Entity Linking as Multiple Choice Question Answering
by: Lin, Zhenxi, et al.
Published: (2024)
Data Management For Training Large Language Models: A Survey
by: Wang, Zige, et al.
Published: (2023)
Quantization-Aware and Tensor-Compressed Training of Transformers for Natural Language Understanding
by: Yang, Zi, et al.
Published: (2023)
SmolKalam: Ensemble Quality-Filtered Translation at Scale for High Quality Arabic Post-Training Data
by: Alrashed, Sultan, et al.
Published: (2025)
Advancing Mathematical Reasoning in Language Models: The Impact of Problem-Solving Data, Data Synthesis Methods, and Training Stages
by: Chen, Zui, et al.
Published: (2025)
Editing Factual Knowledge and Explanatory Ability of Medical Large Language Models
by: Xu, Derong, et al.
Published: (2024)
Transcending Language Boundaries: Harnessing LLMs for Low-Resource Language Translation
by: Shu, Peng, et al.
Published: (2024)
NoLoR: An ASR-Based Framework for Expedited Endangered Language Documentation with Neo-Aramaic as a Case Study
by: Nazari, Matthew
Published: (2024)
Translate-and-Revise: Boosting Large Language Models for Constrained Translation
by: Huang, Pengcheng, et al.
Published: (2024)
Regurgitative Training: The Value of Real Data in Training Large Language Models
by: Zhang, Jinghui, et al.
Published: (2024)
InsurAgent: A Large Language Model-Empowered Agent for Simulating Individual Behavior in Purchasing Flood Insurance
by: Geng, Ziheng, et al.
Published: (2025)
Unlearning Traces the Influential Training Data of Language Models
by: Isonuma, Masaru, et al.
Published: (2024)
Balanced Data Sampling for Language Model Training with Clustering
by: Shao, Yunfan, et al.
Published: (2024)
MooER: LLM-based Speech Recognition and Translation Models from Moore Threads
by: Xu, Junhao, et al.
Published: (2024)
Smooth Operators: LLMs Translating Imperfect Hints into Disfluency-Rich Transcripts
by: Altinok, Duygu
Published: (2025)
You Are What You Train: Effects of Data Composition on Training Context-aware Machine Translation Models
by: Mąka, Paweł, et al.
Published: (2025)
Online Training of Large Language Models: Learn while chatting
by: Liang, Juhao, et al.
Published: (2024)
Training-Free Test-Time Contrastive Learning for Large Language Models
by: Zheng, Kaiwen, et al.
Published: (2026)
TasTe: Teaching Large Language Models to Translate through Self-Reflection
by: Wang, Yutong, et al.
Published: (2024)
AdaComp: Extractive Context Compression with Adaptive Predictor for Retrieval-Augmented Large Language Models
by: Zhang, Qianchi, et al.
Published: (2024)
Towards Understanding and Improving Knowledge Distillation for Neural Machine Translation
by: Zhang, Songming, et al.
Published: (2023)
Neural Machine Translation of Clinical Text: An Empirical Investigation into Multilingual Pre-Trained Language Models and Transfer-Learning
by: Han, Lifeng, et al.
Published: (2023)