:: Library Catalog

Bibliographic Details
Main Authors:	Meade, Nicholas; Patel, Arkil; Reddy, Siva
Format:	Preprint
Published:	2024
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2404.16020

Similar Items

How to Get Your LLM to Generate Challenging Problems for Evaluation
by: Patel, Arkil, et al.
Published: (2025)

Evaluating In-Context Learning of Libraries for Code Generation
by: Patel, Arkil, et al.
Published: (2023)

Forecasting Downstream Performance of LLMs With Proxy Metrics
by: Patel, Arkil, et al.
Published: (2026)

SafeArena: Evaluating the Safety of Autonomous Web Agents
by: Tur, Ada Defne, et al.
Published: (2025)

Exploiting Instruction-Following Retrievers for Malicious Information Retrieval
by: BehnamGhader, Parishad, et al.
Published: (2025)

AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories
by: Lù, Xing Han, et al.
Published: (2025)

Evaluating Correctness and Faithfulness of Instruction-Following Models for Question Answering
by: Adlakha, Vaibhav, et al.
Published: (2023)

Are self-explanations from Large Language Models faithful?
by: Madsen, Andreas, et al.
Published: (2024)

Scope Ambiguities in Large Language Models
by: Kamath, Gaurav, et al.
Published: (2024)

DeepSeek-R1 Thoughtology: Let's think about LLM Reasoning
by: Marjanović, Sara Vera, et al.
Published: (2025)

Language Models Largely Exhibit Human-like Constituent Ordering Preferences
by: Tur, Ada Defne, et al.
Published: (2025)

Faithfulness Measurable Masked Language Models
by: Madsen, Andreas, et al.
Published: (2023)

AutoAdv: Automated Adversarial Prompting for Multi-Turn Jailbreaking of Large Language Models
by: Reddy, Aashray, et al.
Published: (2025)

LLM2Vec-Gen: Generative Embeddings from Large Language Models
by: BehnamGhader, Parishad, et al.
Published: (2026)

LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders
by: BehnamGhader, Parishad, et al.
Published: (2024)

TF-Attack: Transferable and Fast Adversarial Attacks on Large Language Models
by: Li, Zelin, et al.
Published: (2024)

BELL: Benchmarking the Explainability of Large Language Models
by: Ahmed, Syed Quiser, et al.
Published: (2025)

Toward Understanding the Transferability of Adversarial Suffixes in Large Language Models
by: Ball, Sarah, et al.
Published: (2025)

Not All Data Are Unlearned Equally
by: Krishnan, Aravind, et al.
Published: (2025)

When does word order matter and when doesn't it?
by: Chen, Xuanda, et al.
Published: (2024)

Advancing Adversarial Suffix Transfer Learning on Aligned Large Language Models
by: Liu, Hongfu, et al.
Published: (2024)

Universal Adversarial Triggers
by: Arockiaraj, Benedict Florance, et al.
Published: (2026)

Measuring Non-Adversarial Reproduction of Training Data in Large Language Models
by: Aerni, Michael, et al.
Published: (2024)

A Compositional Typed Semantics for Universal Dependencies
by: Bradford, Laurestine, et al.
Published: (2024)

The StatCan Dialogue Dataset: Retrieving Data Tables through Conversations with Genuine Intents
by: Lu, Xing Han, et al.
Published: (2023)

If there's a Trigger Warning, then where's the Trigger? Investigating Trigger Warnings at the Passage Level
by: Wiegmann, Matti, et al.
Published: (2024)

Investigating Political and Demographic Associations in Large Language Models Through Moral Foundations Theory
by: Smith-Vaniz, Nicole, et al.
Published: (2025)

Analyzing the Safety of Japanese Large Language Models in Stereotype-Triggering Prompts
by: Nakanishi, Akito, et al.
Published: (2025)

WebLINX: Real-World Website Navigation with Multi-Turn Dialogue
by: Lù, Xing Han, et al.
Published: (2024)

Are Large Language Models Truly Smarter Than Humans?
by: M, Eshwar Reddy, et al.
Published: (2026)

AdvSumm: Adversarial Training for Bias Mitigation in Text Summarization
by: Gupta, Mukur, et al.
Published: (2025)

Low-Resource Authorship Style Transfer: Can Non-Famous Authors Be Imitated?
by: Patel, Ajay, et al.
Published: (2022)

Robustness of Large Language Models Against Adversarial Attacks
by: Tao, Yiyi, et al.
Published: (2024)

On Adversarial Robustness of Language Models in Transfer Learning
by: Turbal, Bohdan, et al.
Published: (2024)

Triggers Hijack Language Circuits: A Mechanistic Analysis of Backdoor Behaviors in Large Language Models
by: Lasnier, Théo, et al.
Published: (2026)

Cats Confuse Reasoning LLM: Query Agnostic Adversarial Triggers for Reasoning Models
by: Rajeev, Meghana, et al.
Published: (2025)

Investigating Cultural Alignment of Large Language Models
by: AlKhamissi, Badr, et al.
Published: (2024)

Interpretability Needs a New Paradigm
by: Madsen, Andreas, et al.
Published: (2024)

Benchmarking Vision Language Models for Cultural Understanding
by: Nayak, Shravan, et al.
Published: (2024)

Investigating Numerical Translation with Large Language Models
by: Tang, Wei, et al.
Published: (2025)