| Main Authors: | Wang, Weixuan, Wu, Minghao, Haddow, Barry, Birch, Alexandra |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2505.12313 |
Similar Items
Demystifying Multilingual Chain-of-Thought in Process Reward Modeling
by: Wang, Weixuan, et al.
Published: (2025)
HBO: Hierarchical Balancing Optimization for Fine-Tuning Large Language Models
by: Wang, Weixuan, et al.
Published: (2025)
Learning to Summarize by Learning to Quiz: Adversarial Agentic Collaboration for Long Document Summarization
by: Wang, Weixuan, et al.
Published: (2025)
Bridging the Language Gaps in Large Language Models with Inference-Time Cross-Lingual Intervention
by: Wang, Weixuan, et al.
Published: (2024)
Sharing Matters: Analysing Neurons Across Languages and Tasks in LLMs
by: Wang, Weixuan, et al.
Published: (2024)
Multilingual Retrieval-Augmented Generation for Knowledge-Intensive Task
by: Ranaldi, Leonardo, et al.
Published: (2025)
MGen: Millions of Naturally Occurring Generics in Context
by: Cilleruelo, Gustavo, et al.
Published: (2025)
When Does Monolingual Data Help Multilingual Translation: The Role of Domain and Model Scale
by: Baziotis, Christos, et al.
Published: (2023)
The Ups and Downs of Large Language Model Inference with Vocabulary Trimming by Language Heuristics
by: Bogoychev, Nikolay, et al.
Published: (2023)
Compact Speech Translation Models via Discrete Speech Units Pretraining
by: Lam, Tsz Kin, et al.
Published: (2024)
Improving Multilingual Retrieval-Augmented Language Models through Dialectic Reasoning Argumentations
by: Ranaldi, Leonardo, et al.
Published: (2025)
The Prosody of Emojis
by: Zhou, Giulio, et al.
Published: (2025)
Prosody in Cascade and Direct Speech-to-Text Translation: a case study on Korean Wh-Phrases
by: Zhou, Giulio, et al.
Published: (2024)
Liaozhai through the Looking-Glass: On Paratextual Explicitation of Culture-Bound Terms in Machine Translation
by: Shen, Sherrie, et al.
Published: (2025)
Generics are puzzling. Can language models find the missing piece?
by: Calderón, Gustavo Cilleruelo, et al.
Published: (2024)
Quality or Quantity? On Data Scale and Diversity in Adapting Large Language Models for Low-Resource Translation
by: Iyer, Vivek, et al.
Published: (2024)
Understanding Multilingualism in Mixture-of-Experts LLMs: Routing Mechanism, Expert Specialization, and Layerwise Steering
by: Chen, Yuxin, et al.
Published: (2026)
Understanding and Leveraging the Expert Specialization of Context Faithfulness in Mixture-of-Experts LLMs
by: Bai, Jun, et al.
Published: (2025)
Semantics-Adaptive Activation Intervention for LLMs via Dynamic Steering Vectors
by: Wang, Weixuan, et al.
Published: (2024)
Steering MoE LLMs via Expert (De)Activation
by: Fayyaz, Mohsen, et al.
Published: (2025)
Is Modularity Transferable? A Case Study through the Lens of Knowledge Distillation
by: Klimaszewski, Mateusz, et al.
Published: (2024)
From Beginner to Expert: Modeling Medical Knowledge into General LLMs
by: Li, Qiang, et al.
Published: (2023)
An Expert is Worth One Token: Synergizing Multiple Expert LLMs as Generalist via Expert Token Routing
by: Chai, Ziwei, et al.
Published: (2024)
Context and System Fusion in Post-ASR Emotion Recognition with Large Language Models
by: Stepachev, Pavel, et al.
Published: (2024)
Dropping Experts, Recombining Neurons: Retraining-Free Pruning for Sparse Mixture-of-Experts LLMs
by: Zhou, Yixiao, et al.
Published: (2025)
Integrating Expert Knowledge into Logical Programs via LLMs
by: Górski, Franciszek, et al.
Published: (2025)
LF-Steering: Latent Feature Activation Steering for Enhancing Semantic Consistency in Large Language Models
by: Yang, Jingyuan, et al.
Published: (2025)
Is It Good Data for Multilingual Instruction Tuning or Just Bad Multilingual Evaluation for Large Language Models?
by: Chen, Pinzhen, et al.
Published: (2024)
Iterative Translation Refinement with Large Language Models
by: Chen, Pinzhen, et al.
Published: (2023)
Mixture of Experts for Low-Resource LLMs
by: Joseph, Ori Bar, et al.
Published: (2026)
Diversifying the Expert Knowledge for Task-Agnostic Pruning in Sparse Mixture-of-Experts
by: Zhang, Zeliang, et al.
Published: (2024)
SEUF: Is Unlearning One Expert Enough for Mixture-of-Experts LLMs?
by: Zhuang, Haomin, et al.
Published: (2024)
DiEP: Adaptive Mixture-of-Experts Compression through Differentiable Expert Pruning
by: Bai, Sikai, et al.
Published: (2025)
Continuously Steering LLMs Sensitivity to Contextual Knowledge with Proxy Models
by: Wang, Yilin, et al.
Published: (2025)
Test-Time Steering for Lossless Text Compression via Weighted Product of Experts
by: Zhang, Qihang, et al.
Published: (2025)
Knowledge Localization in Mixture-of-Experts LLMs Using Cross-Lingual Inconsistency
by: Bandarkar, Lucas, et al.
Published: (2026)
MatheMagic: Generating Dynamic Mathematics Benchmarks Robust to Memorization
by: O'Brien, Dayyán, et al.
Published: (2025)
Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM
by: Sukhbaatar, Sainbayar, et al.
Published: (2024)
dMoE: dLLMs with Learnable Block Experts
by: Feng, Sicheng, et al.
Published: (2026)
CartesianMoE: Boosting Knowledge Sharing among Experts via Cartesian Product Routing in Mixture-of-Experts
by: Su, Zhenpeng, et al.
Published: (2024)