| Main Authors: | Dekoninck, Jasper, Petrov, Ivo, Minchev, Kristian, Balunovic, Mislav, Vechev, Martin, Marinov, Miroslav, Drencheva, Maria, Konova, Lyuba, Shumanov, Milen, Tsvetkov, Kaloyan, Drenchev, Nikolay, Todorov, Lazar, Nikolova, Kalina, Georgiev, Nikolay, Kalinkova, Vanesa, Ismoldayev, Margulan |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2506.21621 |
Similar Items
Proof or Bluff? Evaluating LLMs on 2025 USA Math Olympiad
by: Petrov, Ivo, et al.
Published: (2025)
MathConstruct: Challenging LLM Reasoning with Constructive Proofs
by: Balunović, Mislav, et al.
Published: (2025)
MathArena: Evaluating LLMs on Uncontaminated Math Competitions
by: Balunović, Mislav, et al.
Published: (2025)
CuTS: Customizable Tabular Synthetic Data Generation
by: Vero, Mark, et al.
Published: (2023)
Large Language Models are Advanced Anonymizers
by: Staab, Robin, et al.
Published: (2024)
Beyond Memorization: Violating Privacy Via Inference with Large Language Models
by: Staab, Robin, et al.
Published: (2023)
ToolFuzz -- Automated Agent Tool Testing
by: Milev, Ivan, et al.
Published: (2025)
Not All Proofs Are Equal: Evaluating LLM Proof Quality Beyond Correctness
by: Petrov, Ivo, et al.
Published: (2026)
BV photometry of the ultracompact binary star GP Com
by: Zamanov, Radoslav, et al.
Published: (2026)
Proof of the Complete Presence of a Modulo 4 Bias for the Semiprimes
by: Gyulev, Nikola, et al.
Published: (2024)
A Unified Approach to Routing and Cascading for LLMs
by: Dekoninck, Jasper, et al.
Published: (2024)
Polyrating: A Cost-Effective and Bias-Aware Rating System for LLM Evaluation
by: Dekoninck, Jasper, et al.
Published: (2024)
Constrained Decoding of Diffusion LLMs with Context-Free Grammars
by: Mündler, Niels, et al.
Published: (2025)
BrokenMath: A Benchmark for Sycophancy in Theorem Proving with LLMs
by: Petrov, Ivo, et al.
Published: (2025)
Learning from Saturated Data: Signals Beyond Correctness for LLM Training
by: Hiss, Hanno, et al.
Published: (2026)
AlphaIntegrator: Transformer Action Search for Symbolic Integration Proofs
by: Ünsal, Mert, et al.
Published: (2024)
GRAIN: Exact Graph Reconstruction from Gradients
by: Drencheva, Maria, et al.
Published: (2025)
COMPL-AI Framework: A Technical Interpretation and LLM Benchmarking Suite for the EU Artificial Intelligence Act
by: Guldimann, Philipp, et al.
Published: (2024)
ConStat: Performance-Based Contamination Detection in Large Language Models
by: Dekoninck, Jasper, et al.
Published: (2024)
Adaptive Generation of Bias-Eliciting Questions for LLMs
by: Staab, Robin, et al.
Published: (2025)
Passive uplift of montane biotas: recent advances
by: Michael Heads, et al.
Published: (2025)
AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents
by: Debenedetti, Edoardo, et al.
Published: (2024)
Specific features of the $π$-electron spectrum of narrow achiral $(2m,m)$ nanoribbons
by: Malysheva, Lyuba
Published: (2025)
Controlled Text Generation via Language Model Arithmetic
by: Dekoninck, Jasper, et al.
Published: (2023)
Working apart together: How outcome control supports operations when working from home
by: Henri C. Dekker, et al.
Published: (2025)
Proof Flow: Preliminary Study on Generative Flow Network Language Model Tuning for Formal Reasoning
by: Ho, Matthew, et al.
Published: (2024)
Evading Data Contamination Detection for Language Models is (too) Easy
by: Dekoninck, Jasper, et al.
Published: (2024)
Proofs that Modify Proofs
by: Towsner, Henry
Published: (2024)
Beyond the lone hero: How interpersonal feedback seeking helps entrepreneurs to engage with their social environment
by: Andreana Drencheva, et al.
Published: (2024)
New mineralogical and geochemical data on a potential critical raw materials occurrence at Polski Gradets (Southeastern Bulgaria)
by: Stavrev, Milen, et al.
Published: (2024)
Punctually Standard and Nonstandard Models of Natural Numbers
by: Bazhenov, Nikolay, et al.
Published: (2026)
Delay, Plateau, or Collapse: Evaluating the Impact of Systematic Verification Error on RLVR
by: Egashira, Kazuki, et al.
Published: (2026)
All Proof of Work But No Proof of Play
by: Tirmazi, Hayder
Published: (2025)
Proofs that Modify Proofs, 1/2
by: Towsner, Henry
Published: (2025)
Symmetric Proofs in the Ideal Proof System
by: Dawar, Anuj, et al.
Published: (2025)
Dog Meat in Late Iron Age Bulgaria: Necessity, Delicacy, or Part of a Wider Intercultural Tradition?
by: Stella Nikolova
Published: (2025)
Evaluation of Phenological and Morphological Traits in Pea Accessions (Pisum sativum L.) and Their Resistance to Pea Weevil (Bruchus pisorum L.)
by: Ivelina Nikolova
Published: (2024)
Unravelling Abstract Cyclic Proofs into Proofs by Induction
by: Grotenhuis, Lide, et al.
Published: (2026)
ProofCloud: A Proof Retrieval Engine for Verified Proofs in Higher Order Logic
by: Wang, Shuai
Published: (2024)
Beyond Benchmarks: MathArena as an Evaluation Platform for Mathematics with LLMs
by: Dekoninck, Jasper, et al.
Published: (2026)