Library Catalog

Bibliographic Details
Main Author:	Khan, Nabeera
Format:	Digital resource
Language:	English
Published:	Zenodo 2026
Subjects:	Model benchmark, Claude, medical AI, metacognition, confidence calibration, sycophancy, Deepseek, Qwen, LLM evaluation, Gemini, maternal health AI, Benchmark
Online Access:	https://doi.org/10.5281/zenodo.20067056

Similar Items

LLM Token Estimation Benchmarks: Tokenizer Efficiency and Cost Analysis Across 17 Large Language Models
by Khare, Mohit
Published: (2026)

Gemini Update Clinical decision support based on Bevacizumab cancer trials and pushing the limitations of advanced LLMs
by Kawchak, Kevin
Published: (2025)

The Brain Problem: Creative Constraint Optimization in Large Language Models
by Marinello, Nicola, et al.
Published: (2026)

Estimating the Impact of Automation on Vocational Education: The Case of Technical Courses
by Lima, Yuri, et al.
Published: (2024)

LACF Emotional Paradigm: A Personalized Artificial Nervous System for Human-AI Alignment
by Ochej, Stephane, et al.
Published: (2026)

Theatrical Compliance: A Failure Mode in Large Language Models
by Nowickij (Navitski), Kirill Vladimirovich
Published: (2026)

Anima AI Community Pulse Dataset
by AI Companion Picker, et al.
Published: (2026)

AGI Certification Framework: A Multi-Dimensional Evaluation Standard for Measuring AI Understanding
by Head, Hank
Published: (2026)

When AI Tells You What You Want to Hear: Sycophantic Behavior of Large Language Models in Dementia Care Settings
by Kolb, Christian
Published: (2026)

Public Comment on NIST AI 800-2: Anthropomorphic Construct Projection in AI Benchmark Evaluation
by Sophia, Franny Philos
Published: (2026)

REAL-AI-Benchmark: Real-World Reasoning and Physical-AI Benchmark Suite
by Ivković, Jovan
Published: (2026)

How Far Does the Trolley Problem Go in AI Ethics Evaluation? Limits of a Canonical Benchmark and the Risks of Its Misuse
by Mizutani, Aya
Published: (2026)

Persona, Shadow, and Cheap Coherence: A Jungian Map of the Soul in the Digital Age (Read Through Structural Intelligence)
by Jovanovic, Vladisav
Published: (2026)

The Benchmark Illusion: Why Current AI Evaluations Cannot Detect Structural Confabulation
by Devin, Andrew James
Published: (2026)

I Let Claude Run My Fantasy Football Team for a Whole Season — It Beat 11 of My Friends
by AI Angels
Published: (2026)

GALATEA II: Benchmarking LLM Safety in Clinical Simulation. Behavioural Safety and Ethical Robustness of Large Language Models in a Multi-Agent ICU Decision Support Architecture
by Shlyakhta, Taras
Published: (2026)

Signs of Life - Visual Art from 469 Conversations with Claude
by Chesterton, Bo, et al.
Published: (2026)

AI Eval Forge: Mixed-Check Regression Testing for LLM and Agent Workflows
by Katta, Mukunda Rao
Published: (2026)

Shallow Pass Budget Constraints and Structured Data Trade-offs in LLM Training Ingestion
by Mas, Joseph
Published: (2026)

Why Voice-Mode Gemini Beat My $400 Italian Tutor in 21 Days (Full Daily Script Inside)
by AI Angels
Published: (2026)

aikenkyu001/iterative_self_healing_benchmark: v1.0.0: Scaffolding Trinity for Deterministic LLM Code Generation
by Miyata, Fumio
Published: (2026)

The Four-Layer Model: A Socio-Psychological Framework for LLM Behavior
by Delannoy, Lorenzo, et al.
Published: (2026)

Machine-Readable Behavioural Compliance Evidence for AI Systems: A Specification Profiling Framework
by Caprazli, Kafkas M.
Published: (2026)

Oracle Difficulty Decomposed: Four Independent Mechanisms Explain 95%+ of Benchmark Variance
by Sanchez, Bryan
Published: (2026)

30. TEORÍA DE LA POTENCIALIDAD CONSCIENTE (TPC): BENCHMARK DE CAPACIDADES COGNITIVAS EN IA - APLICACIÓN DEL PROTOCOLO RFC-EVAL-001. RESULTADOS COMPLETOS DE EVALUACIÓN CRUZADA CIEGA ENTRE 6 IAS COMERCIALES.
by Bernal Díaz, Víctor Cristóbal
Published: (2026)

30. TEORÍA DE LA POTENCIALIDAD CONSCIENTE (TPC): BENCHMARK DE CAPACIDADES COGNITIVAS EN IA - APLICACIÓN DEL PROTOCOLO RFC-EVAL-001 V1.1. RESULTADOS COMPLETOS DE EVALUACIÓN CRUZADA CIEGA ENTRE 6 IAS COMERCIALES.
by Bernal Díaz, Víctor Cristóbal
Published: (2026)

Pattern Pressure, Accuracy Drift, and False User-State Attribution
by Honeycutt, Edwin Marshall III
Published: (2026)

Executive Summary: AI Privacy Risks and Mitigations in Large Language Models
by Khan, Masood
Published: (2025)

Anti-Hydra vs Anthropic Benchmark Comparison
by Ochej, Stephane
Published: (2026)

OMNIA-MINIMAL: Structural Stability Beyond Surface Correctness
by Brighindi, Massimiliano
Published: (2026)

Deterministic σ-Regularized Benchmarking of the Cekirge Model Against GPT-Transformer Baselines
by Cekirge, Huseyin Murat
Published: (2025)

31. DATASET COMPLETO DE EVALUACIONES CRUZADAS RFC-EVAL-001 – 6 SISTEMAS DE IA (ENERO 2026).
by Bernal Díaz, Víctor Cristóbal
Published: (2026)

Benchmarking LLM Agent Efficiency in Production Systems: An Observational Prospective Methodology
by Barcelos Costa, Cleber, et al.
Published: (2026)

A Benchmark for Symbolic Reasoning from Pixel Sequences: Grid-Level Visual Completion and Correction
by Kang, Lei, et al.
Published: (2025)

TSB: A Time-Saved Benchmark for AI Systems — Measuring Net Productivity Impact Across Knowledge Work
by Shalom Lijo, Solomon
Published: (2026)

Failing at the Floor: LLM Formal Reasoning Collapse on the Primitive Duplicating Recursor
by Rahnama, Moses
Published: (2026)

AI Writing Ethics: Responsible and Ethical Use of Generative AI in Academic Writing
by Zuzafre, Mohd Nor, et al.
Published: (2026)

Supplementary materials for Words That Won't Hold Still
by Reynolds, Brett
Published: (2025)

Language-as-Dimension Theory (LDT): A New Scientific Framework for Intelligence, Meaning, and Reality Formation
by Woodard, Bethany, et al.
Published: (2025)

Extra Large Language Models Benchmarking for Medicinal Chemistry
by Kawchak, Kevin
Published: (2024)