:: Library Catalog

Bibliographic Details
Main Author:	Chojecki, Przemyslaw
Type:	Preprint
Published:	2025
Subjects:	Artificial Intelligence; Machine Learning
Online Access:	https://arxiv.org/abs/2512.13764

Similar Items

Self-Improving AI Agents through Self-Play
by: Chojecki, Przemyslaw
Published: (2025)

The Geometry of Benchmarks: A New Path Toward AGI
by: Chojecki, Przemyslaw
Published: (2025)

Psychometric Tests for AI Agents and Their Moduli Space
by: Chojecki, Przemyslaw
Published: (2025)

An Operational Kardashev-Style Scale for Autonomous AI - Towards AGI and Superintelligence
by: Chojecki, Przemyslaw
Published: (2025)

Model Science: getting serious about verification, explanation and control of AI systems
by: Biecek, Przemyslaw, et al.
Published: (2025)

On the Mathematical Impossibility of Safe Universal Approximators
by: Yao, Jasper
Published: (2025)

HARDMath: A Benchmark Dataset for Challenging Problems in Applied Mathematics
by: Fan, Jingxuan, et al.
Published: (2024)

An Investigation of Robustness of LLMs in Mathematical Reasoning: Benchmarking with Mathematically-Equivalent Transformation of Advanced Mathematical Problems
by: Hao, Yuren, et al.
Published: (2025)

Beyond Backpropagation: Exploring Innovative Algorithms for Energy-Efficient Deep Neural Network Training
by: Spyra, Przemysław
Published: (2025)

FormalMATH: Benchmarking Formal Mathematical Reasoning of Large Language Models
by: Yu, Zhouliang, et al.
Published: (2025)

FormInv: A Measurement Protocol for Semantic Invariance in Mathematical Reasoning Benchmarks
by: Thomas, Nishal, et al.
Published: (2026)

The Little Book of Generative AI Foundations: An Intuitive Mathematical Primer
by: Chen, Tianhua
Published: (2026)

The Nature of Mathematical Modeling and Probabilistic Optimization Engineering in Generative AI
by: Li, Fulu
Published: (2024)

Exploring Local Explanations of Nonlinear Models Using Animated Linear Projections
by: Spyrison, Nicholas, et al.
Published: (2022)

Attributions All the Way Down? The Metagame of Interpretability
by: Baniecki, Hubert, et al.
Published: (2026)

Parity, Sensitivity, and Transformers
by: Kozachinskiy, Alexander, et al.
Published: (2026)

HARDMath2: A Benchmark for Applied Mathematics Built by Students as Part of a Graduate Class
by: Roggeveen, James V., et al.
Published: (2025)

VAR-MATH: Probing True Mathematical Reasoning in LLMS via Symbolic Multi-Instance Benchmarks
by: Yao, Jian, et al.
Published: (2025)

SBSC: Step-By-Step Coding for Improving Mathematical Olympiad Performance
by: Singh, Kunal, et al.
Published: (2025)

Towards AI Transparency and Accountability: A Global Framework for Exchanging Information on AI Systems
by: Buckley, Warren, et al.
Published: (2023)

Position: Explain to Question not to Justify
by: Biecek, Przemyslaw, et al.
Published: (2024)

I-RAVEN-X: Benchmarking Generalization and Robustness of Analogical and Mathematical Reasoning in Large Language and Reasoning Models
by: Camposampiero, Giacomo, et al.
Published: (2025)

AIDE: AI-Driven Exploration in the Space of Code
by: Jiang, Zhengyao, et al.
Published: (2025)

The Agentic Researcher: A Practical Guide to AI-Assisted Research in Mathematics and Machine Learning
by: Zimmer, Max, et al.
Published: (2026)

REBEL: Hidden Knowledge Recovery via Evolutionary-Based Evaluation Loop
by: Rybak, Patryk, et al.
Published: (2026)

AI Agents as Universal Task Solvers
by: Achille, Alessandro, et al.
Published: (2025)

Universal AI maximizes Variational Empowerment
by: Hayashi, Yusuke, et al.
Published: (2025)

Monocular 3D Object Position Estimation with VLMs for Human-Robot Interaction
by: Wahl, Ari, et al.
Published: (2026)

DotaMath: Decomposition of Thought with Code Assistance and Self-correction for Mathematical Reasoning
by: Li, Chengpeng, et al.
Published: (2024)

miniCodeProps: a Minimal Benchmark for Proving Code Properties
by: Lohn, Evan, et al.
Published: (2024)

MathlibLemma: Folklore Lemma Generation and Benchmark for Formal Mathematics
by: Liu, Xinyu, et al.
Published: (2026)

BAID: A Benchmark for Bias Assessment of AI Detectors
by: Basu, Priyam, et al.
Published: (2025)

KnowCoder: Coding Structured Knowledge into LLMs for Universal Information Extraction
by: Li, Zixuan, et al.
Published: (2024)

Solving a Research Problem in Mathematical Statistics with AI Assistance
by: Dobriban, Edgar
Published: (2025)

Democratizing AI scientists using ToolUniverse
by: Gao, Shanghua, et al.
Published: (2025)

Memory Self-Regeneration: Uncovering Hidden Knowledge in Unlearned Models
by: Polowczyk, Agnieszka, et al.
Published: (2025)

FreSh: Frequency Shifting for Accelerated Neural Representation Learning
by: Kania, Adam, et al.
Published: (2024)

Exploration of the Rashomon Set Assists Trustworthy Explanations for Medical Data
by: Kobylińska, Katarzyna, et al.
Published: (2023)

Make Interval Bound Propagation great again
by: Krukowski, Patryk, et al.
Published: (2024)

Bounding Evidence and Estimating Log-Likelihood in VAE
by: Struski, Łukasz, et al.
Published: (2022)