:: Library Catalog

Bibliographic Details
Main Authors:	Dekoninck, Jasper; Petrov, Ivo; Minchev, Kristian; Balunovic, Mislav; Vechev, Martin; Marinov, Miroslav; Drencheva, Maria; Konova, Lyuba; Shumanov, Milen; Tsvetkov, Kaloyan; Drenchev, Nikolay; Todorov, Lazar; Nikolova, Kalina; Georgiev, Nikolay; Kalinkova, Vanesa; Ismoldayev, Margulan
Format:	Preprint
Published:	2025
Subjects:	Computation and Language; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2506.21621

Similar Items

Proof or Bluff? Evaluating LLMs on 2025 USA Math Olympiad
by: Petrov, Ivo, et al.
Published: (2025)

MathConstruct: Challenging LLM Reasoning with Constructive Proofs
by: Balunović, Mislav, et al.
Published: (2025)

MathArena: Evaluating LLMs on Uncontaminated Math Competitions
by: Balunović, Mislav, et al.
Published: (2025)

CuTS: Customizable Tabular Synthetic Data Generation
by: Vero, Mark, et al.
Published: (2023)

Large Language Models are Advanced Anonymizers
by: Staab, Robin, et al.
Published: (2024)

Beyond Memorization: Violating Privacy Via Inference with Large Language Models
by: Staab, Robin, et al.
Published: (2023)

ToolFuzz -- Automated Agent Tool Testing
by: Milev, Ivan, et al.
Published: (2025)

Not All Proofs Are Equal: Evaluating LLM Proof Quality Beyond Correctness
by: Petrov, Ivo, et al.
Published: (2026)

BV photometry of the ultracompact binary star GP Com
by: Zamanov, Radoslav, et al.
Published: (2026)

Proof of the Complete Presence of a Modulo 4 Bias for the Semiprimes
by: Gyulev, Nikola, et al.
Published: (2024)

A Unified Approach to Routing and Cascading for LLMs
by: Dekoninck, Jasper, et al.
Published: (2024)

Polyrating: A Cost-Effective and Bias-Aware Rating System for LLM Evaluation
by: Dekoninck, Jasper, et al.
Published: (2024)

Constrained Decoding of Diffusion LLMs with Context-Free Grammars
by: Mündler, Niels, et al.
Published: (2025)

BrokenMath: A Benchmark for Sycophancy in Theorem Proving with LLMs
by: Petrov, Ivo, et al.
Published: (2025)

Learning from Saturated Data: Signals Beyond Correctness for LLM Training
by: Hiss, Hanno, et al.
Published: (2026)

AlphaIntegrator: Transformer Action Search for Symbolic Integration Proofs
by: Ünsal, Mert, et al.
Published: (2024)

GRAIN: Exact Graph Reconstruction from Gradients
by: Drencheva, Maria, et al.
Published: (2025)

COMPL-AI Framework: A Technical Interpretation and LLM Benchmarking Suite for the EU Artificial Intelligence Act
by: Guldimann, Philipp, et al.
Published: (2024)

ConStat: Performance-Based Contamination Detection in Large Language Models
by: Dekoninck, Jasper, et al.
Published: (2024)

Adaptive Generation of Bias-Eliciting Questions for LLMs
by: Staab, Robin, et al.
Published: (2025)

Passive uplift of montane biotas: recent advances
by: Michael Heads, et al.
Published: (2025)

AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents
by: Debenedetti, Edoardo, et al.
Published: (2024)

Specific features of the $π$-electron spectrum of narrow achiral $(2m,m)$ nanoribbons
by: Malysheva, Lyuba
Published: (2025)

Controlled Text Generation via Language Model Arithmetic
by: Dekoninck, Jasper, et al.
Published: (2023)

Working apart together: How outcome control supports operations when working from home
by: Henri C. Dekker, et al.
Published: (2025)

Proof Flow: Preliminary Study on Generative Flow Network Language Model Tuning for Formal Reasoning
by: Ho, Matthew, et al.
Published: (2024)

Evading Data Contamination Detection for Language Models is (too) Easy
by: Dekoninck, Jasper, et al.
Published: (2024)

Proofs that Modify Proofs
by: Towsner, Henry
Published: (2024)

Beyond the lone hero: How interpersonal feedback seeking helps entrepreneurs to engage with their social environment
by: Andreana Drencheva, et al.
Published: (2024)

New mineralogical and geochemical data on a potential critical raw materials occurrence at Polski Gradets (Southeastern Bulgaria)
by: Stavrev, Milen, et al.
Published: (2024)

Punctually Standard and Nonstandard Models of Natural Numbers
by: Bazhenov, Nikolay, et al.
Published: (2026)

Delay, Plateau, or Collapse: Evaluating the Impact of Systematic Verification Error on RLVR
by: Egashira, Kazuki, et al.
Published: (2026)

All Proof of Work But No Proof of Play
by: Tirmazi, Hayder
Published: (2025)

Proofs that Modify Proofs, 1/2
by: Towsner, Henry
Published: (2025)

Symmetric Proofs in the Ideal Proof System
by: Dawar, Anuj, et al.
Published: (2025)

Dog Meat in Late Iron Age Bulgaria: Necessity, Delicacy, or Part of a Wider Intercultural Tradition?
by: Stella Nikolova
Published: (2025)

Evaluation of Phenological and Morphological Traits in Pea Accessions (Pisum sativum L.) and Their Resistance to Pea Weevil (Bruchus pisorum L.)
by: Ivelina Nikolova
Published: (2024)

Unravelling Abstract Cyclic Proofs into Proofs by Induction
by: Grotenhuis, Lide, et al.
Published: (2026)

ProofCloud: A Proof Retrieval Engine for Verified Proofs in Higher Order Logic
by: Wang, Shuai
Published: (2024)

Beyond Benchmarks: MathArena as an Evaluation Platform for Mathematics with LLMs
by: Dekoninck, Jasper, et al.
Published: (2026)