| Main Authors: | Edman, Lukas, Schmid, Helmut, Fraser, Alexander |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2409.15452 |
Similar Items
EXECUTE: A Multilingual Benchmark for LLM Token Understanding
by: Edman, Lukas, et al.
Published: (2025)
Mask and You Shall Receive: Optimizing Masked Language Modeling For Pretraining BabyLMs
by: Edman, Lukas, et al.
Published: (2025)
Are BabyLMs Second Language Learners?
by: Edman, Lukas, et al.
Published: (2024)
Beyond Literal Token Overlap: Token Alignability for Multilinguality
by: Hämmerl, Katharina, et al.
Published: (2025)
Mechanistic Understanding and Mitigation of Language Confusion in English-Centric Large Language Models
by: Nie, Ercong, et al.
Published: (2025)
On the Sensitivity of Instruction-tuned LLMs to Harmful Sentences in Long Inputs
by: Ghorbanpour, Faeze, et al.
Published: (2025)
CUTE: A Multilingual Dataset for Enhancing Cross-Lingual Knowledge Transfer in Low-Resource Languages
by: Zhuang, Wenhao, et al.
Published: (2025)
LLMs Beyond English: Scaling the Multilingual Capability of LLMs with Cross-Lingual Feedback
by: Lai, Wen, et al.
Published: (2024)
Are Character-level Translations Worth the Wait? Comparing ByT5 and mT5 for Machine Translation
by: Edman, Lukas, et al.
Published: (2023)
Understanding Cross-Lingual Alignment -- A Survey
by: Hämmerl, Katharina, et al.
Published: (2024)
Style-Specific Neurons for Steering LLMs in Text Style Transfer
by: Lai, Wen, et al.
Published: (2024)
PersLitEval: Fine-grained Benchmark and Evaluation of LLMs on Persian Literature Questions
by: Niazi, Ruhallah, et al.
Published: (2026)
ToPro: Token-Level Prompt Decomposition for Cross-Lingual Sequence Labeling Tasks
by: Ma, Bolei, et al.
Published: (2024)
Can Prompting LLMs Unlock Hate Speech Detection across Languages? A Zero-shot and Few-shot Study
by: Ghorbanpour, Faeze, et al.
Published: (2025)
Hate Personified: Investigating the role of LLMs in content moderation
by: Masud, Sarah, et al.
Published: (2024)
Enhancing Character-Level Understanding in LLMs through Token Internal Structure Learning
by: Xu, Zhu, et al.
Published: (2024)
How to Solve Few-Shot Abusive Content Detection Using the Data We Actually Have
by: Hangya, Viktor, et al.
Published: (2023)
LLM in the Loop: Creating the ParaDeHate Dataset for Hate Speech Detoxification
by: Yuan, Shuzhou, et al.
Published: (2025)
Spelling-out is not Straightforward: LLMs' Capability of Tokenization from Token to Characters
by: Hiraoka, Tatsuya, et al.
Published: (2025)
Behavior-Equivalent Token: Single-Token Replacement for Long Prompts in LLMs
by: Dong, Jiancheng, et al.
Published: (2025)
Speech Discrete Tokens or Continuous Features? A Comparative Analysis for Spoken Language Understanding in SpeechLLMs
by: Wang, Dingdong, et al.
Published: (2025)
Language Model Re-rankers are Fooled by Lexical Similarities
by: Hagström, Lovisa, et al.
Published: (2025)
WikiBigEdit: Understanding the Limits of Lifelong Knowledge Editing in LLMs
by: Thede, Lukas, et al.
Published: (2025)
Why Are We Lonely? Leveraging LLMs to Measure and Understand Loneliness in Caregivers and Non-caregivers
by: Kim, Michelle Damin, et al.
Published: (2026)
CrossNews-UA: A Cross-lingual News Semantic Similarity Benchmark for Ukrainian, Polish, Russian, and English
by: Dementieva, Daryna, et al.
Published: (2025)
EmoBench-UA: A Benchmark Dataset for Emotion Detection in Ukrainian
by: Dementieva, Daryna, et al.
Published: (2025)
Measuring Scalar Constructs in Social Science with LLMs
by: Licht, Hauke, et al.
Published: (2025)
Data-Efficient Hate Speech Detection via Cross-Lingual Nearest Neighbor Retrieval with Limited Labeled Data
by: Ghorbanpour, Faeze, et al.
Published: (2025)
From Unaligned to Aligned: Scaling Multilingual LLMs with Multi-Way Parallel Corpora
by: Shen, Yingli, et al.
Published: (2025)
Toward a Theory of Tokenization in LLMs
by: Rajaraman, Nived, et al.
Published: (2024)
LLMs are Not Just Next Token Predictors
by: Downes, Stephen M., et al.
Published: (2024)
Accelerating Production LLMs with Combined Token/Embedding Speculators
by: Wertheimer, Davis, et al.
Published: (2024)
Optimizing Korean-Centric LLMs via Token Pruning
by: Kim, Hoyeol, et al.
Published: (2026)
Beyond Tokens: Concept-Level Training Objectives for LLMs
by: Iyer, Laya, et al.
Published: (2026)
Explainable Semantic Textual Similarity via Dissimilar Span Detection
by: Lozano, Diego Miguel, et al.
Published: (2026)
StableToken: A Noise-Robust Semantic Speech Tokenizer for Resilient SpeechLLMs
by: Song, Yuhan, et al.
Published: (2025)
Granuscore: A Reference-Free Measure of Granularity for Text Analysis and Question Answering
by: Ellinger, Lukas, et al.
Published: (2026)
Measuring Intrinsic Dimension of Token Embeddings
by: Kataiwa, Takuya, et al.
Published: (2025)
Breaking Bad Tokens: Detoxification of LLMs Using Sparse Autoencoders
by: Goyal, Agam, et al.
Published: (2025)
How Language Directions Align with Token Geometry in Multilingual LLMs
by: Kim, JaeSeong, et al.
Published: (2025)