| Main Authors: | Meade, Nicholas, Patel, Arkil, Reddy, Siva |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2404.16020 |
Similar Items
How to Get Your LLM to Generate Challenging Problems for Evaluation
by: Patel, Arkil, et al.
Published: (2025)
Evaluating In-Context Learning of Libraries for Code Generation
by: Patel, Arkil, et al.
Published: (2023)
Forecasting Downstream Performance of LLMs With Proxy Metrics
by: Patel, Arkil, et al.
Published: (2026)
SafeArena: Evaluating the Safety of Autonomous Web Agents
by: Tur, Ada Defne, et al.
Published: (2025)
Exploiting Instruction-Following Retrievers for Malicious Information Retrieval
by: BehnamGhader, Parishad, et al.
Published: (2025)
AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories
by: Lù, Xing Han, et al.
Published: (2025)
Evaluating Correctness and Faithfulness of Instruction-Following Models for Question Answering
by: Adlakha, Vaibhav, et al.
Published: (2023)
Are self-explanations from Large Language Models faithful?
by: Madsen, Andreas, et al.
Published: (2024)
Scope Ambiguities in Large Language Models
by: Kamath, Gaurav, et al.
Published: (2024)
DeepSeek-R1 Thoughtology: Let's think about LLM Reasoning
by: Marjanović, Sara Vera, et al.
Published: (2025)
Language Models Largely Exhibit Human-like Constituent Ordering Preferences
by: Tur, Ada Defne, et al.
Published: (2025)
Faithfulness Measurable Masked Language Models
by: Madsen, Andreas, et al.
Published: (2023)
AutoAdv: Automated Adversarial Prompting for Multi-Turn Jailbreaking of Large Language Models
by: Reddy, Aashray, et al.
Published: (2025)
LLM2Vec-Gen: Generative Embeddings from Large Language Models
by: BehnamGhader, Parishad, et al.
Published: (2026)
LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders
by: BehnamGhader, Parishad, et al.
Published: (2024)
TF-Attack: Transferable and Fast Adversarial Attacks on Large Language Models
by: Li, Zelin, et al.
Published: (2024)
BELL: Benchmarking the Explainability of Large Language Models
by: Ahmed, Syed Quiser, et al.
Published: (2025)
Toward Understanding the Transferability of Adversarial Suffixes in Large Language Models
by: Ball, Sarah, et al.
Published: (2025)
Not All Data Are Unlearned Equally
by: Krishnan, Aravind, et al.
Published: (2025)
When does word order matter and when doesn't it?
by: Chen, Xuanda, et al.
Published: (2024)
Advancing Adversarial Suffix Transfer Learning on Aligned Large Language Models
by: Liu, Hongfu, et al.
Published: (2024)
Universal Adversarial Triggers
by: Arockiaraj, Benedict Florance, et al.
Published: (2026)
Measuring Non-Adversarial Reproduction of Training Data in Large Language Models
by: Aerni, Michael, et al.
Published: (2024)
A Compositional Typed Semantics for Universal Dependencies
by: Bradford, Laurestine, et al.
Published: (2024)
The StatCan Dialogue Dataset: Retrieving Data Tables through Conversations with Genuine Intents
by: Lu, Xing Han, et al.
Published: (2023)
If there's a Trigger Warning, then where's the Trigger? Investigating Trigger Warnings at the Passage Level
by: Wiegmann, Matti, et al.
Published: (2024)
Investigating Political and Demographic Associations in Large Language Models Through Moral Foundations Theory
by: Smith-Vaniz, Nicole, et al.
Published: (2025)
Analyzing the Safety of Japanese Large Language Models in Stereotype-Triggering Prompts
by: Nakanishi, Akito, et al.
Published: (2025)
WebLINX: Real-World Website Navigation with Multi-Turn Dialogue
by: Lù, Xing Han, et al.
Published: (2024)
Are Large Language Models Truly Smarter Than Humans?
by: M, Eshwar Reddy, et al.
Published: (2026)
AdvSumm: Adversarial Training for Bias Mitigation in Text Summarization
by: Gupta, Mukur, et al.
Published: (2025)
Low-Resource Authorship Style Transfer: Can Non-Famous Authors Be Imitated?
by: Patel, Ajay, et al.
Published: (2022)
Robustness of Large Language Models Against Adversarial Attacks
by: Tao, Yiyi, et al.
Published: (2024)
On Adversarial Robustness of Language Models in Transfer Learning
by: Turbal, Bohdan, et al.
Published: (2024)
Triggers Hijack Language Circuits: A Mechanistic Analysis of Backdoor Behaviors in Large Language Models
by: Lasnier, Théo, et al.
Published: (2026)
Cats Confuse Reasoning LLM: Query Agnostic Adversarial Triggers for Reasoning Models
by: Rajeev, Meghana, et al.
Published: (2025)
Investigating Cultural Alignment of Large Language Models
by: AlKhamissi, Badr, et al.
Published: (2024)
Interpretability Needs a New Paradigm
by: Madsen, Andreas, et al.
Published: (2024)
Benchmarking Vision Language Models for Cultural Understanding
by: Nayak, Shravan, et al.
Published: (2024)
Investigating Numerical Translation with Large Language Models
by: Tang, Wei, et al.
Published: (2025)