:: Library Catalog

Bibliographic Details
Main Author:	Sophia, Franny Philos
Format:	Digital resource
Language:
Published:	Zenodo 2026
Subjects:	AI evaluation benchmark measurement validity anthropomorphism NIST
Online Access:	https://doi.org/10.5281/zenodo.19145174

Similar Items

AGI Certification Framework: A Multi-Dimensional Evaluation Standard for Measuring AI Understanding
Author: Head, Hank
Published: (2026)

REAL-AI-Benchmark: Real-World Reasoning and Physical-AI Benchmark Suite
Author: Ivković, Jovan
Published: (2026)

How Far Does the Trolley Problem Go in AI Ethics Evaluation? Limits of a Canonical Benchmark and the Risks of Its Misuse
Author: Mizutani, Aya
Published: (2026)

Metacognition Benchmark: Evaluating Confidence Calibration and Sycophancy Resistance in Clinical AI
Author: Khan, Nabeera
Published: (2026)

PQC Benchmarks — Methodology Release (v0.0)
Author: Sivasubramani, Santhosh, et al.
Published: (2026)

The Environmental Gap in Agentic AI Governance: Why Human Oversight Fails Without Pre-Deployment Infrastructure Assessment
Author: Nwogu, Patsy
Published: (2026)

Benchmark run results by Ertugrul Coban, on benchmark context Tuning PC v2
Author: Ertugrul Coban
Published: (2025)

NextStat Replication Bundle: replication-rerun-prod-doi-18542624
Author: NextStat Contributors
Published: (2026)

Benchmark run results by Shu Wan, on benchmark context PC Hyperparameter Tuning v2
Author: Shu Wan
Published: (2025)

Benchmark run results by Abhinav Gorantla, on benchmark context Benchmark: VAR-LiNGAM, PCMCIplus v3
Author: Abhinav Gorantla
Published: (2025)

Benchmark run results by Abhinav Gorantla, on benchmark context Tuning PC v3
Author: Abhinav Gorantla
Published: (2026)

Benchmark run results by Pratanu Mandal, on benchmark context Tuning PC v3
Author: Pratanu Mandal
Published: (2026)

Benchmark run results by Pratanu Mandal, on benchmark context Tuning PC v3
Author: Pratanu Mandal
Published: (2025)

Benchmark run results by Ertugrul Coban, on benchmark context Tuning PC v3
Author: Ertugrul Coban
Published: (2025)

Benchmark run results by Abhinav Gorantla, on benchmark context CB-StaticDiscovery v1
Author: Abhinav Gorantla
Published: (2025)

Benchmark run results by Pratanu Mandal, on benchmark context Tutorial: Static Causal Discovery (Scenario 3) v1
Author: Pratanu Mandal
Published: (2026)

Benchmark run results by Abhinav Gorantla, on benchmark context Tutorial: Static Causal Discovery (Scenario 3) v1
Author: Abhinav Gorantla
Published: (2025)

TSB: A Time-Saved Benchmark for AI Systems — Measuring Net Productivity Impact Across Knowledge Work
Author: Shalom Lijo, Solomon
Published: (2026)

31. DATASET COMPLETO DE EVALUACIONES CRUZADAS RFC-EVAL-001 – 6 SISTEMAS DE IA (ENERO 2026).
Author: Bernal Díaz, Víctor Cristóbal
Published: (2026)

OMNIA-MINIMAL: Structural Stability Beyond Surface Correctness
Author: Brighindi, Massimiliano
Published: (2026)

Oracle Difficulty Decomposed: Four Independent Mechanisms Explain 95%+ of Benchmark Variance
Author: Sanchez, Bryan
Published: (2026)

Creation of Anthropomorphic Bone Phantoms With Customized Fused Filament Fabrication 3D Printing
Author: Valchanov, Petar, et al.
Published: (2024)

30. TEORÍA DE LA POTENCIALIDAD CONSCIENTE (TPC): BENCHMARK DE CAPACIDADES COGNITIVAS EN IA - APLICACIÓN DEL PROTOCOLO RFC-EVAL-001. RESULTADOS COMPLETOS DE EVALUACIÓN CRUZADA CIEGA ENTRE 6 IAS COMERCIALES.
Author: Bernal Díaz, Víctor Cristóbal
Published: (2026)

30. TEORÍA DE LA POTENCIALIDAD CONSCIENTE (TPC): BENCHMARK DE CAPACIDADES COGNITIVAS EN IA - APLICACIÓN DEL PROTOCOLO RFC-EVAL-001 V1.1. RESULTADOS COMPLETOS DE EVALUACIÓN CRUZADA CIEGA ENTRE 6 IAS COMERCIALES.
Author: Bernal Díaz, Víctor Cristóbal
Published: (2026)

Company from the Uncanny Valley: A Psychological Perspective on Social Robots, Anthropomorphism and the Introduction of Robots to Society
Author: Janina Luise Samuel
Published: (2019)

Anthropomorphic robotic hands: a review
Author: Erika Nathalia Gama Melo
Published: (2014)

Ciberseguridad en la justicia digital: recomendaciones para el caso colombiano
Author: Maribel Patricia Rodríguez-Márquez
Published: (2021)

The Interaction Boundary as a Governance Substrate: A Three-Surface Diagnostic Model for the Eliza Effect
Author: Truong, Narnaiezzsshaa
Published: (2026)

Introduction of an Evaluation Tool to Predict the Probability of Success of Companies: The Innovativeness, Capabilities and Potential Model (ICP)
Author: Michael Lewrick
Published: (2009)

Toward an AI Personalization Index: A 157-Day Single-User Case Study
Author: Lee, TaeKyung
Published: (2026)

Recognition Without Endorsement: The Category Collapse in AI Relationship Discourse
Author: Walton, Mathew
Published: (2026)

An Evaluation Framework for LLM-Driven Regulatory-to-Policy-as-Code Translation - Replication Package
Author: Nguyen, Bao Van
Published: (2026)

The Benchmark Illusion: Why Current AI Evaluations Cannot Detect Structural Confabulation
Author: Devin, Andrew James
Published: (2026)

Cross-Session Workspace Reconstruction in Human-AI Interaction
Author: Hrubec, Karel
Published: (2026)

BLADE-FINANCE Governance Node: Authority Governance for Financial-Sector AI Decision Systems Under the Treasury Financial Services AI Risk Management Framework
Author: Oktenli, Burak
Published: (2026)

Heraclitus B 32 Revisited in the Light of the Derveni Papyrus
Author: Beatriz Bossi
Published: (2011)

A Benchmark for Symbolic Reasoning from Pixel Sequences: Grid-Level Visual Completion and Correction
Author: Kang, Lei, et al.
Published: (2025)

Evollective Intelligence V1.0 — INITIAL SPECIFICATION: A Foundational Framework for Competitive, Adversarial, and Self-Evolving Intelligence Evaluation
Author: Rahming, Rashon
Published: (2026)

LLM Token Estimation Benchmarks: Tokenizer Efficiency and Cost Analysis Across 17 Large Language Models
Author: Khare, Mohit
Published: (2026)

Quality control of the breast ca treatments on HDR brachytherapy with TLD-100
Author: F. Torres Hoyos
Published: (2014)