| Main Authors: | Härle, Ruben, Friedrich, Felix, Brack, Manuel, Wäldchen, Stephan, Deiseroth, Björn, Schramowski, Patrick, Kersting, Kristian |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2506.19382 |
Similar Items
SCAR: Sparse Conditioned Autoencoders for Concept Detection and Steering in LLMs
by: Härle, Ruben, et al.
Published: (2024)
T-FREE: Subword Tokenizer-Free Generative LLMs via Sparse Representations for Memory-Efficient Embeddings
by: Deiseroth, Björn, et al.
Published: (2024)
LIME: Making LLM Data More Efficient with Linguistic Metadata Embeddings
by: Sztwiertnia, Sebastian, et al.
Published: (2025)
Beyond Overcorrection: Evaluating Diversity in T2I Models with DivBench
by: Friedrich, Felix, et al.
Published: (2025)
Divergent Token Metrics: Measuring degradation to prune away LLM components -- and optimize quantization
by: Deiseroth, Björn, et al.
Published: (2023)
AtMan: Understanding Transformer Predictions Through Memory Efficient Attention Manipulation
by: Deiseroth, Björn, et al.
Published: (2023)
Multilingual Text-to-Image Generation Magnifies Gender Stereotypes and Prompt Engineering May Not Help You
by: Friedrich, Felix, et al.
Published: (2024)
LLMs Lost in Translation: M-ALERT uncovers Cross-Linguistic Safety Inconsistencies
by: Friedrich, Felix, et al.
Published: (2024)
LlavaGuard: An Open VLM-based Framework for Safeguarding Vision Datasets and Models
by: Helff, Lukas, et al.
Published: (2024)
Exploiting Cultural Biases via Homoglyphs in Text-to-Image Synthesis
by: Struppek, Lukas, et al.
Published: (2022)
Does CLIP Know My Face?
by: Hintersdorf, Dominik, et al.
Published: (2022)
CHRONOBERG: Capturing Language Evolution and Temporal Awareness in Foundation Models
by: Hegde, Niharika, et al.
Published: (2025)
ActivationReasoning: Logical Reasoning in Latent Activation Spaces
by: Helff, Lukas, et al.
Published: (2025)
AtManRL: Towards Faithful Reasoning via Differentiable Attention Saliency
by: Höth, Max Henning, et al.
Published: (2026)
Core Tokensets for Data-efficient Sequential Training of Transformers
by: Paul, Subarnaduti, et al.
Published: (2024)
LEDITS++: Limitless Image Editing using Text-to-Image Models
by: Brack, Manuel, et al.
Published: (2023)
Bounding Hallucinations: Information-Theoretic Guarantees for RAG Systems via Merlin-Arthur Protocols
by: Deiseroth, Björn, et al.
Published: (2025)
How to Train your Text-to-Image Model: Evaluating Design Choices for Synthetic Training Captions
by: Brack, Manuel, et al.
Published: (2025)
ALERT: A Comprehensive Benchmark for Assessing Large Language Models' Safety through Red Teaming
by: Tedeschi, Simone, et al.
Published: (2024)
A Typology for Exploring the Mitigation of Shortcut Behavior
by: Friedrich, Felix, et al.
Published: (2022)
Aleph-Alpha-GermanWeb: Improving German-language LLM pre-training with model-based data curation and synthetic data generation
by: Burns, Thomas F, et al.
Published: (2025)
DeiSAM: Segment Anything with Deictic Prompting
by: Shindo, Hikaru, et al.
Published: (2024)
No Safe Dose: How Training Data Drives Unsafe Image Generation
by: Friedrich, Felix, et al.
Published: (2026)
LLMs Gaming Verifiers: RLVR can Lead to Reward Hacking
by: Helff, Lukas, et al.
Published: (2026)
SLR: Automated Synthesis for Scalable Logical Reasoning
by: Helff, Lukas, et al.
Published: (2025)
Encourage or Inhibit Monosemanticity? Revisit Monosemanticity from a Feature Decorrelation Perspective
by: Yan, Hanqi, et al.
Published: (2024)
Hierarchical Autoregressive Transformers: Combining Byte- and Word-Level Processing for Robust, Adaptable Language Models
by: Neitemeier, Pit, et al.
Published: (2025)
Judging Quality Across Languages: A Multilingual Approach to Pretraining Data Filtering with Language Models
by: Ali, Mehdi, et al.
Published: (2025)
The Cake that is Intelligence and Who Gets to Bake it: An AI Analogy and its Implications for Participation
by: Mundt, Martin, et al.
Published: (2025)
Learning by Self-Explaining
by: Stammer, Wolfgang, et al.
Published: (2023)
ART: Adaptive Relation Tuning for Generalized Relation Prediction
by: Sudhakaran, Gopika, et al.
Published: (2025)
Polysemantic Experts, Monosemantic Paths: Routing as Control in MoEs
by: Ye, Charles, et al.
Published: (2026)
EmoNet-Voice: A Fine-Grained, Expert-Verified Benchmark for Speech Emotion Detection
by: Schuhmann, Christoph, et al.
Published: (2025)
Defending Our Privacy With Backdoors
by: Hintersdorf, Dominik, et al.
Published: (2023)
SocialGrid: A Benchmark for Planning and Social Reasoning in Embodied Multi-Agent Systems
by: Shindo, Hikaru, et al.
Published: (2026)
Adaptive Rational Activations to Boost Deep Reinforcement Learning
by: Delfosse, Quentin, et al.
Published: (2021)
Focused Chain-of-Thought: Efficient LLM Reasoning via Structured Input Information
by: Struppek, Lukas, et al.
Published: (2025)
STRICTA: Structured Reasoning in Critical Text Assessment for Peer Review and Beyond
by: Dycke, Nils, et al.
Published: (2024)
Soft Begging: Modular and Efficient Shielding of LLMs against Prompt Injection and Jailbreaking based on Prompt Tuning
by: Ostermann, Simon, et al.
Published: (2024)
A Monosemantic Attribution Framework for Stable Interpretability in Clinical Neuroscience Transformer-Based Language Models
by: Mamalakis, Michail, et al.
Published: (2026)