| Main Authors: | Baldini, Ioana, Yadav, Chhavi, Nagireddy, Manish, Das, Payel, Varshney, Kush R. |
|---|---|
| Format: | Preprint |
| Published: | 2023 |
| Online Access: | https://arxiv.org/abs/2305.12620 |
Similar Items
Value Alignment from Unstructured Text
by: Padhi, Inkit, et al.
Published: (2024)
An Annotated Reading of 'The Singer of Tales' in the LLM Era
by: Varshney, Kush R.
Published: (2025)
Alignment Studio: Aligning Large Language Models to Particular Contextual Regulations
by: Achintalwar, Swapnaja, et al.
Published: (2024)
Contextual Moral Value Alignment Through Context-Based Aggregation
by: Dognin, Pierre, et al.
Published: (2024)
Evaluating Deep Unlearning in Large Language Models
by: Wu, Ruihan, et al.
Published: (2024)
When in Doubt, Cascade: Towards Building Efficient and Capable Guardrails
by: Nagireddy, Manish, et al.
Published: (2024)
Can Large Language Models Keep Up? Benchmarking Online Adaptation to Continual Knowledge Streams
by: Kim, Jiyeon, et al.
Published: (2026)
Language Models in Dialogue: Conversational Maxims for Human-AI Interactions
by: Miehling, Erik, et al.
Published: (2024)
Empathy and the Right to Be an Exception: What LLMs Can and Cannot Do
by: Kidder, William, et al.
Published: (2024)
Automated Concept Discovery for LLM-as-a-Judge Preference Analysis
by: Wedgwood, James, et al.
Published: (2026)
Can We Infer Confidential Properties of Training Data from LLMs?
by: Huang, Pengrun, et al.
Published: (2025)
Fundamental Safety-Capability Trade-offs in Fine-tuning Large Language Models
by: Chen, Pin-Yu, et al.
Published: (2025)
Needle in the Haystack for Memory Based Large Language Models
by: Nelson, Elliot, et al.
Published: (2024)
Scopes of Alignment
by: Varshney, Kush R., et al.
Published: (2025)
Down the Toxicity Rabbit Hole: A Novel Framework to Bias Audit Large Language Models
by: Dutta, Arka, et al.
Published: (2023)
Decolonial AI Alignment: Openness, Viśeṣa-Dharma, and Including Excluded Knowledges
by: Varshney, Kush R.
Published: (2023)
Automated Benchmark Auditing for AI Agents and Large Language Models
by: Wang, Junlin, et al.
Published: (2026)
Why Don't Prompt-Based Fairness Metrics Correlate?
by: Zayed, Abdelrahman, et al.
Published: (2024)
Segmentation Beyond Defaults: Asymmetrical Byte Pair Encoding for Optimal Machine Translation Performance
by: Yadav, Saumitra, et al.
Published: (2025)
Get away with less: Need of source side data curation to build parallel corpus for low resource Machine Translation
by: Yadav, Saumitra, et al.
Published: (2026)
Transformer-based Causal Language Models Perform Clustering
by: Wu, Xinbo, et al.
Published: (2024)
Can Memory-Augmented Language Models Generalize on Reasoning-in-a-Haystack Tasks?
by: Das, Payel, et al.
Published: (2025)
Multi-Level Explanations for Generative Language Models
by: Paes, Lucas Monteiro, et al.
Published: (2024)
Generation Constraint Scaling Can Mitigate Hallucination
by: Kollias, Georgios, et al.
Published: (2024)
Don't Change My View: Ideological Bias Auditing in Large Language Models
by: Kröger, Paul, et al.
Published: (2025)
Signal or Noise? Evaluating Large Language Models in Resume Screening Across Contextual Variations and Human Expert Benchmarks
by: Varshney, Aryan, et al.
Published: (2025)
Systematic Offensive Stereotyping (SOS) Bias in Language Models
by: Elsafoury, Fatma
Published: (2023)
What's in a Name? Auditing Large Language Models for Race and Gender Bias
by: Salinas, Alejandro, et al.
Published: (2024)
Social Bias Probing: Fairness Benchmarking for Language Models
by: Manerba, Marta Marchiori, et al.
Published: (2023)
BioMamba: Domain-Adaptive Biomedical Language Models
by: Yue, Ling, et al.
Published: (2024)
A Meta-Learning Perspective on Transformers for Causal Language Modeling
by: Wu, Xinbo, et al.
Published: (2023)
NeuroPrune: A Neuro-inspired Topological Sparse Training Algorithm for Large Language Models
by: Dhurandhar, Amit, et al.
Published: (2024)
Hey GPT, Can You be More Racist? Analysis from Crowdsourced Attempts to Elicit Biased Content from Generative AI
by: Guo, Hangzhi, et al.
Published: (2024)
TuneVLSeg: Prompt Tuning Benchmark for Vision-Language Segmentation Models
by: Adhikari, Rabin, et al.
Published: (2024)
Political Alignment in Large Language Models: A Multidimensional Audit of Psychometric Identity and Behavioral Bias
by: Sakhawat, Adib, et al.
Published: (2026)
Are Bias Evaluation Methods Biased ?
by: Berrayana, Lina, et al.
Published: (2025)
CLIMB: A Benchmark of Clinical Bias in Large Language Models
by: Zhang, Yubo, et al.
Published: (2024)
Zero-Shot Grammar Competency Estimation Using Large Language Model Generated Pseudo Labels
by: Das, Sourya Dipta, et al.
Published: (2025)
Evaluating the fairness of task-adaptive pretraining on unlabeled test data before few-shot text classification
by: Dubey, Kush
Published: (2024)
Granite Guardian
by: Padhi, Inkit, et al.
Published: (2024)