:: Library Catalog

Bibliographic Details
Main Authors:	Stańczak, Karolina; Meade, Nicholas; Bhatia, Mehar; Zhou, Hattie; Böttinger, Konstantin; Barnes, Jeremy; Stanley, Jason; Montgomery, Jessica; Zemel, Richard; Papernot, Nicolas; Chapados, Nicolas; Therien, Denis; Lillicrap, Timothy P.; Marasović, Ana; Delacroix, Sylvie; Hadfield, Gillian K.; Reddy, Siva
Format:	Preprint
Published:	2025
Subjects:	Computers and Society; Artificial Intelligence; Computation and Language
Online Access:	https://arxiv.org/abs/2503.00069

Similar Items

Value Drifts: Tracing Value Alignment During LLM Post-Training
by: Bhatia, Mehar, et al.
Published: (2025)

CulturalFrames: Assessing Cultural Expectation Alignment in Text-to-Image Models and Evaluation Metrics
by: Nayak, Shravan, et al.
Published: (2025)

Habitual Ethics?
by: Delacroix, Sylvie
Published: (2022)

A Multilingual Perspective on Probing Gender Bias
by: Stańczak, Karolina
Published: (2024)

What Does It Take to Build a Performant Selective Classifier?
by: Rabanser, Stephan, et al.
Published: (2025)

Harder or Different? Understanding Generalization of Audio Deepfake Detection
by: Müller, Nicolas M., et al.
Published: (2024)

Does Audio Deepfake Detection Generalize?
by: Müller, Nicolas M., et al.
Published: (2022)

DeepSeek-R1 Thoughtology: Let's think about LLM Reasoning
by: Marjanović, Sara Vera, et al.
Published: (2025)

SafeArena: Evaluating the Safety of Autonomous Web Agents
by: Tur, Ada Defne, et al.
Published: (2025)

Legal Infrastructure for Transformative AI Governance
by: Hadfield, Gillian K.
Published: (2026)

LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders
by: BehnamGhader, Parishad, et al.
Published: (2024)

AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories
by: Lù, Xing Han, et al.
Published: (2025)

LLM2Vec-Gen: Generative Embeddings from Large Language Models
by: BehnamGhader, Parishad, et al.
Published: (2026)

Tighter Privacy Auditing of DP-SGD in the Hidden State Threat Model
by: Cebere, Tudor, et al.
Published: (2024)

Fairness Feedback Loops: Training on Synthetic Data Amplifies Bias
by: Wyllie, Sierra, et al.
Published: (2024)

Binnendifferenzierung im Alphabetisierungskurs im Bereich Deutsch als Zweitsprache
by: Böttinger, Anja
Published: (2023)

Investigating Adversarial Trigger Transfer in Large Language Models
by: Meade, Nicholas, et al.
Published: (2024)

On Optimizing Multimodal Jailbreaks for Spoken Language Models
by: Krishnan, Aravind, et al.
Published: (2026)

Quantifying Gender Biases Towards Politicians on Reddit
by: Marjanovic, Sara, et al.
Published: (2021)

Security-by-Design for LLM-Based Code Generation: Leveraging Internal Representations for Concept-Driven Steering Mechanisms
by: Wendlinger, Maximilian, et al.
Published: (2026)

Diverse Preference Learning for Capabilities and Alignment
by: Slocum, Stewart, et al.
Published: (2025)

A New Approach to Voice Authenticity
by: Müller, Nicolas M., et al.
Published: (2024)

Randomness, Not Representation: The Unreliability of Evaluating Cultural Alignment in LLMs
by: Khan, Ariba, et al.
Published: (2025)

Distribution-Free Statistical Dispersion Control for Societal Applications
by: Deng, Zhun, et al.
Published: (2023)

Regulatory Markets for AI Safety
by: Clark, Jack, et al.
Published: (2019)

Regulatory Markets: The Future of AI Governance
by: Hadfield, Gillian K., et al.
Published: (2023)

Rational Silence and False Polarization: How Viewpoint Organizations and Recommender Systems Distort the Expression of Public Opinion
by: Sarkar, Atrisha, et al.
Published: (2024)

An Economy of AI Agents
by: Hadfield, Gillian K., et al.
Published: (2025)

Exploiting Instruction-Following Retrievers for Malicious Information Retrieval
by: BehnamGhader, Parishad, et al.
Published: (2025)

In-House Information Management for Government Contractors.
by: Anderson, Hattie T.
Published: (1981)

Suitability Filter: A Statistical Framework for Classifier Evaluation in Real-World Deployment Settings
by: Pouget, Angéline, et al.
Published: (2025)

LLM Dataset Inference: Did you train on my dataset?
by: Maini, Pratyush, et al.
Published: (2024)

Regulation Games for Trustworthy Machine Learning
by: Yaghini, Mohammad, et al.
Published: (2024)

CALMA: A Process for Deriving Context-aligned Axes for Language Model Alignment
by: Soni, Prajna, et al.
Published: (2025)

Servicio de alimentos y bebidas / D. R. Lillicrap ; Traductora, Guadalupe Meza Staines
by: Lillicrap, D. R
Published: (1994)

Servicio de alimentos y bebidas / D.R. Lillicrap ; traductor, Guadalupe García de León del Paso
by: Lillicrap, D.R

La presencia de la ausencia. Hacia una antropología de la vida póstuma de los desaparecidos en el Perú
by: Dorothée Delacroix
Published: (2020)

L’histoire du temps présent, une histoire (vraiment) comme les autres ?
by: Christian Delacroix
Published: (2018)

«Somos peruanos y limpios». Discursos y prácticas en torno al monumento «El Ojo que Llora» de Llinque, Apurímac
by: Dorothée Delacroix
Published: (2014)

COLOQUIO INTERNACIONAL «IMPACTOS DE LAS REPARACIONES A LAS VÍCTIMAS EN LAS SOCIEDADES POSCONFLICTO. MEMORIA DE LOS CUERPOS, CONMEMORACIÓN Y PATRIMONIALIZACIÓN»
by: Dorothée Delacroix
Published: (2015)