:: Library Catalog

Bibliographic Details
Main Authors:	Härle, Ruben; Friedrich, Felix; Brack, Manuel; Wäldchen, Stephan; Deiseroth, Björn; Schramowski, Patrick; Kersting, Kristian
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2506.19382

Similar Items

SCAR: Sparse Conditioned Autoencoders for Concept Detection and Steering in LLMs
by: Härle, Ruben, et al.
Published: (2024)

T-FREE: Subword Tokenizer-Free Generative LLMs via Sparse Representations for Memory-Efficient Embeddings
by: Deiseroth, Björn, et al.
Published: (2024)

LIME: Making LLM Data More Efficient with Linguistic Metadata Embeddings
by: Sztwiertnia, Sebastian, et al.
Published: (2025)

Beyond Overcorrection: Evaluating Diversity in T2I Models with DivBench
by: Friedrich, Felix, et al.
Published: (2025)

Divergent Token Metrics: Measuring degradation to prune away LLM components -- and optimize quantization
by: Deiseroth, Björn, et al.
Published: (2023)

AtMan: Understanding Transformer Predictions Through Memory Efficient Attention Manipulation
by: Deiseroth, Björn, et al.
Published: (2023)

Multilingual Text-to-Image Generation Magnifies Gender Stereotypes and Prompt Engineering May Not Help You
by: Friedrich, Felix, et al.
Published: (2024)

LLMs Lost in Translation: M-ALERT uncovers Cross-Linguistic Safety Inconsistencies
by: Friedrich, Felix, et al.
Published: (2024)

LlavaGuard: An Open VLM-based Framework for Safeguarding Vision Datasets and Models
by: Helff, Lukas, et al.
Published: (2024)

Exploiting Cultural Biases via Homoglyphs in Text-to-Image Synthesis
by: Struppek, Lukas, et al.
Published: (2022)

Does CLIP Know My Face?
by: Hintersdorf, Dominik, et al.
Published: (2022)

CHRONOBERG: Capturing Language Evolution and Temporal Awareness in Foundation Models
by: Hegde, Niharika, et al.
Published: (2025)

ActivationReasoning: Logical Reasoning in Latent Activation Spaces
by: Helff, Lukas, et al.
Published: (2025)

AtManRL: Towards Faithful Reasoning via Differentiable Attention Saliency
by: Höth, Max Henning, et al.
Published: (2026)

Core Tokensets for Data-efficient Sequential Training of Transformers
by: Paul, Subarnaduti, et al.
Published: (2024)

LEDITS++: Limitless Image Editing using Text-to-Image Models
by: Brack, Manuel, et al.
Published: (2023)

Bounding Hallucinations: Information-Theoretic Guarantees for RAG Systems via Merlin-Arthur Protocols
by: Deiseroth, Björn, et al.
Published: (2025)

How to Train your Text-to-Image Model: Evaluating Design Choices for Synthetic Training Captions
by: Brack, Manuel, et al.
Published: (2025)

ALERT: A Comprehensive Benchmark for Assessing Large Language Models' Safety through Red Teaming
by: Tedeschi, Simone, et al.
Published: (2024)

A Typology for Exploring the Mitigation of Shortcut Behavior
by: Friedrich, Felix, et al.
Published: (2022)

Aleph-Alpha-GermanWeb: Improving German-language LLM pre-training with model-based data curation and synthetic data generation
by: Burns, Thomas F, et al.
Published: (2025)

DeiSAM: Segment Anything with Deictic Prompting
by: Shindo, Hikaru, et al.
Published: (2024)

No Safe Dose: How Training Data Drives Unsafe Image Generation
by: Friedrich, Felix, et al.
Published: (2026)

LLMs Gaming Verifiers: RLVR can Lead to Reward Hacking
by: Helff, Lukas, et al.
Published: (2026)

SLR: Automated Synthesis for Scalable Logical Reasoning
by: Helff, Lukas, et al.
Published: (2025)

Encourage or Inhibit Monosemanticity? Revisit Monosemanticity from a Feature Decorrelation Perspective
by: Yan, Hanqi, et al.
Published: (2024)

Hierarchical Autoregressive Transformers: Combining Byte- and Word-Level Processing for Robust, Adaptable Language Models
by: Neitemeier, Pit, et al.
Published: (2025)

Judging Quality Across Languages: A Multilingual Approach to Pretraining Data Filtering with Language Models
by: Ali, Mehdi, et al.
Published: (2025)

The Cake that is Intelligence and Who Gets to Bake it: An AI Analogy and its Implications for Participation
by: Mundt, Martin, et al.
Published: (2025)

Learning by Self-Explaining
by: Stammer, Wolfgang, et al.
Published: (2023)

ART: Adaptive Relation Tuning for Generalized Relation Prediction
by: Sudhakaran, Gopika, et al.
Published: (2025)

Polysemantic Experts, Monosemantic Paths: Routing as Control in MoEs
by: Ye, Charles, et al.
Published: (2026)

EmoNet-Voice: A Fine-Grained, Expert-Verified Benchmark for Speech Emotion Detection
by: Schuhmann, Christoph, et al.
Published: (2025)

Defending Our Privacy With Backdoors
by: Hintersdorf, Dominik, et al.
Published: (2023)

SocialGrid: A Benchmark for Planning and Social Reasoning in Embodied Multi-Agent Systems
by: Shindo, Hikaru, et al.
Published: (2026)

Adaptive Rational Activations to Boost Deep Reinforcement Learning
by: Delfosse, Quentin, et al.
Published: (2021)

Focused Chain-of-Thought: Efficient LLM Reasoning via Structured Input Information
by: Struppek, Lukas, et al.
Published: (2025)

STRICTA: Structured Reasoning in Critical Text Assessment for Peer Review and Beyond
by: Dycke, Nils, et al.
Published: (2024)

Soft Begging: Modular and Efficient Shielding of LLMs against Prompt Injection and Jailbreaking based on Prompt Tuning
by: Ostermann, Simon, et al.
Published: (2024)

A Monosemantic Attribution Framework for Stable Interpretability in Clinical Neuroscience Transformer-Based Language Models
by: Mamalakis, Michail, et al.
Published: (2026)