| Main Authors: | Li, Xiao; Kreuzwieser, Joel; Peters, Alan |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2506.10095 |
Similar Items
When Agents Look the Same: Quantifying Distillation-Induced Similarity in Tool-Use Behaviors
by: Yang, Chenghao, et al.
Published: (2026)
Stay Hungry, Stay Foolish: On the Extended Reading Articles Generation with LLMs
by: Liou, Yow-Fu, et al.
Published: (2025)
Same Meaning, Different Scores: Lexical and Syntactic Sensitivity in LLM Evaluation
by: Kostić, Bogdan, et al.
Published: (2026)
Distilling Multilingual Vision-Language Models: When Smaller Models Stay Multilingual
by: Sriratanawilai, Sukrit, et al.
Published: (2025)
Is my model perplexed for the right reason? Contrasting LLMs' Benchmark Behavior with Token-Level Perplexity
by: Prins, Zoë, et al.
Published: (2026)
Stay Focused: Problem Drift in Multi-Agent Debate
by: Becker, Jonas, et al.
Published: (2025)
When the Same Coefficients Reach Different Places: Asymmetric Realizability in Transplanting Tokenizers across Large Language Models
by: Liu, Xiaoze, et al.
Published: (2025)
Behavior-Equivalent Token: Single-Token Replacement for Long Prompts in LLMs
by: Dong, Jiancheng, et al.
Published: (2025)
Distributional Semantics, Holism, and the Instability of Meaning
by: Grindrod, Jumbly, et al.
Published: (2024)
Say Anything but This: When Tokenizer Betrays Reasoning in LLMs
by: Ayoobi, Navid, et al.
Published: (2026)
Beyond Tokens: Concept-Level Training Objectives for LLMs
by: Iyer, Laya, et al.
Published: (2026)
When to Trust LLMs: Aligning Confidence with Response Quality
by: Tao, Shuchang, et al.
Published: (2024)
Beyond Idealized Patients: Evaluating LLMs under Challenging Patient Behaviors in Medical Consultations
by: Li, Yahan, et al.
Published: (2026)
Multi-Level Optimal Transport for Universal Cross-Tokenizer Knowledge Distillation on Language Models
by: Cui, Xiao, et al.
Published: (2024)
Same Patient, Different Words, Different Diagnosis? Evaluating Semantic Stability in Clinical LLMs
by: Alkaeed, Mahdi, et al.
Published: (2026)
When Meanings Meet: Investigating the Emergence and Quality of Shared Concept Spaces during Multilingual Language Model Training
by: Körner, Felicia, et al.
Published: (2026)
LLMs as Narcissistic Evaluators: When Ego Inflates Evaluation Scores
by: Liu, Yiqi, et al.
Published: (2023)
When to Ensemble: Identifying Token-Level Points for Stable and Fast LLM Ensembling
by: Yun, Heecheol, et al.
Published: (2025)
Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models
by: Levy, Mosh, et al.
Published: (2024)
TokenShapley: Token Level Context Attribution with Shapley Value
by: Xiao, Yingtai, et al.
Published: (2025)
Evaluating Alignment of Behavioral Dispositions in LLMs
by: Taubenfeld, Amir, et al.
Published: (2026)
Enhancing Character-Level Understanding in LLMs through Token Internal Structure Learning
by: Xu, Zhu, et al.
Published: (2024)
From Tokens to Thoughts: How LLMs and Humans Trade Compression for Meaning
by: Shani, Chen, et al.
Published: (2025)
CharBench: Evaluating the Role of Tokenization in Character-Level Tasks
by: Uzan, Omri, et al.
Published: (2025)
Evaluating Document Simplification: On the Importance of Separately Assessing Simplicity and Meaning Preservation
by: Cripwell, Liam, et al.
Published: (2024)
BPO: Staying Close to the Behavior LLM Creates Better Online LLM Alignment
by: Xu, Wenda, et al.
Published: (2024)
How Far Are LLMs from Believable AI? A Benchmark for Evaluating the Believability of Human Behavior Simulation
by: Xiao, Yang, et al.
Published: (2023)
Distilling Token-Trained Models into Byte-Level Models
by: Bao, Zishuo, et al.
Published: (2026)
On the Same Wavelength? Evaluating Pragmatic Reasoning in Language Models across Broad Concepts
by: Qiu, Linlu, et al.
Published: (2025)
Informed Routing in LLMs: Smarter Token-Level Computation for Faster Inference
by: Han, Chao, et al.
Published: (2025)
When Correct Beliefs Collapse: Epistemic Resilience of LLMs under Clinical Pressure
by: Xiao, Boyu, et al.
Published: (2026)
Towards Multidisciplinary Summarization of Hospital Stays: Efficient Sentence-Level Clinical Provenance Categorization
by: Karacan, Baris, et al.
Published: (2026)
TokDrift: When LLM Speaks in Subwords but Code Speaks in Grammar
by: Li, Yinxi, et al.
Published: (2025)
Token-Level Policy Optimization: Linking Group-Level Rewards to Token-Level Aggregation via Sequence-Level Likelihood
by: Lin, Xingyu, et al.
Published: (2026)
How LLMs Comprehend Temporal Meaning in Narratives: A Case Study in Cognitive Evaluation of LLMs
by: de Langis, Karin, et al.
Published: (2025)
Is Safety Standard Same for Everyone? User-Specific Safety Evaluation of Large Language Models
by: In, Yeonjun, et al.
Published: (2025)
Discriminative Policy Optimization for Token-Level Reward Models
by: Chen, Hongzhan, et al.
Published: (2025)
Rethinking Personalization in Large Language Models at the Token Level
by: Zhang, Chenheng, et al.
Published: (2026)
Token-Level Policy Optimization: Linking Group-Level Rewards to Token-Level Aggregation via Markov Likelihood
by: Lin, Xingyu, et al.
Published: (2025)
StableToken: A Noise-Robust Semantic Speech Tokenizer for Resilient SpeechLLMs
by: Song, Yuhan, et al.
Published: (2025)