| Main Authors: | Thompson, T. Ben, Straznickas, Zygimantas, Sklar, Michael |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2402.01702 |
Similar Items
FLRT: Fluent Student-Teacher Redteaming
by: Thompson, T. Ben, et al.
Published: (2024)
Fluent but Unfeeling: The Emotional Blind Spots of Language Models
by: Shu, Bangzhao, et al.
Published: (2025)
Fluent Alignment with Disfluent Judges: Post-training for Lower-resource Languages
by: Samuel, David, et al.
Published: (2025)
Algorithmic progress in language models
by: Ho, Anson, et al.
Published: (2024)
LIME-LLM: Probing Models with Fluent Counterfactuals, Not Broken Text
by: Mihaila, George, et al.
Published: (2026)
Fluent but Foreign: Even Regional LLMs Lack Cultural Alignment
by: Agarwal, Dhruv, et al.
Published: (2025)
HyperSteer: Activation Steering at Scale with Hypernetworks
by: Sun, Jiuding, et al.
Published: (2025)
Dissociating language and thought in large language models
by: Mahowald, Kyle, et al.
Published: (2023)
Auxiliary task demands mask the capabilities of smaller language models
by: Hu, Jennifer, et al.
Published: (2024)
HyperDAS: Towards Automating Mechanistic Interpretability with Hypernetworks
by: Sun, Jiuding, et al.
Published: (2025)
On the attribution of confidence to large language models
by: Keeling, Geoff, et al.
Published: (2024)
Large language models and linguistic intentionality
by: Grindrod, Jumbly
Published: (2024)
Do language models practice what they preach? Examining language ideologies about gendered language reform encoded in LLMs
by: Watson, Julia, et al.
Published: (2024)
A survey of textual cyber abuse detection using cutting-edge language models and large language models
by: Diaz-Garcia, Jose A., et al.
Published: (2025)
Bringing legal knowledge to the public by constructing a legal question bank using large-scale pre-trained language model
by: Yuan, Mingruo, et al.
Published: (2025)
Humans overrely on overconfident language models, across languages
by: Rathi, Neil, et al.
Published: (2025)
Linguistic traces of stochastic empathy in language models
by: Kleinberg, Bennett, et al.
Published: (2024)
Infusing clinical knowledge into tokenisers for language models
by: Hasan, Abul, et al.
Published: (2024)
CONFLARE: CONFormal LArge language model REtrieval
by: Rouzrokh, Pouria, et al.
Published: (2024)
What do language models model? Transformers, automata, and the format of thought
by: Klein, Colin
Published: (2025)
The language of time: a language model perspective on time-series foundation models
by: Xie, Yi, et al.
Published: (2025)
Uncovering inequalities in new knowledge learning by large language models across different languages
by: Wang, Chenglong, et al.
Published: (2025)
Multi-round jailbreak attack on large language models
by: Zhou, Yihua, et al.
Published: (2024)
The 20 questions game to distinguish large language models
by: Richardeau, Gurvan, et al.
Published: (2024)
Large language model empowered participatory urban planning
by: Zhou, Zhilun, et al.
Published: (2024)
Evidence of a log scaling law for political persuasion with large language models
by: Hackenburg, Kobi, et al.
Published: (2024)
Slm-mux: Orchestrating small language models for reasoning
by: Wang, Chenyu, et al.
Published: (2025)
Quantifying non deterministic drift in large language models
by: Nicholson, Claire
Published: (2026)
Can large language models build causal graphs?
by: Long, Stephanie, et al.
Published: (2023)
Response: Emergent analogical reasoning in large language models
by: Hodel, Damian, et al.
Published: (2023)
Large language models struggle with ethnographic text annotation
by: Goodall, Leonardo S., et al.
Published: (2026)
Scaling laws for language encoding models in fMRI
by: Antonello, Richard, et al.
Published: (2023)
Just-in-time and distributed task representations in language models
by: Li, Yuxuan, et al.
Published: (2025)
Alignment faking in large language models
by: Greenblatt, Ryan, et al.
Published: (2024)
Cognitive models can reveal interpretable value trade-offs in language models
by: Murthy, Sonia K., et al.
Published: (2025)
Code-enabled language models can outperform reasoning models on diverse tasks
by: Zhang, Cedegao E., et al.
Published: (2025)
Retrieval-augmented reasoning with lean language models
by: Chan, Ryan Sze-Yin, et al.
Published: (2025)
Do Chinese models speak Chinese languages?
by: Wen-Yi, Andrea W, et al.
Published: (2025)
Failure of contextual invariance in large language models
by: Kumar, Sagar, et al.
Published: (2026)
Large language models in medicine: the potentials and pitfalls
by: Omiye, Jesutofunmi A., et al.
Published: (2023)