Bibliographic Details
Main Author: COYAUD, Denis
Format: Digital resource
Language: English
Published: Zenodo 2026
Online Access: https://doi.org/10.5281/zenodo.19358657
Table of Contents:
  • This paper evaluates whether scaling local model size from 8B to 14B crosses the instruction-following threshold required to produce meaningful biomimetic prompt effects. Using the same enriched protocol as v2 (v2.4.1), three matched model families are tested: Qwen3, Ministral 3, and DeepSeek-R1, each in 8B and 14B variants. The evaluation applies 35 configurations, 200 questions, and the same four ground truth dimensions as v2. Main result: 14B models are systematically inferior to their 8B counterparts on the QUAL composite (8B top QUAL = 0.84 vs 14B top QUAL = 0.77 for poisson_pierre_solo). The instruction-following threshold is not crossed at 14B. 14B models are 3× more concise (vanilla: 86 vs 247 tokens), but this compression does not translate into quality gains: GT Hallucination drops from 0.873 (8B) to 0.667 (14B) on poisson_pierre_solo. The bidirectional signal remains robust in both size conditions (Cohen's d defects: 1.47 for 8B, 1.14 for 14B). For local RAG deployment, the 8B variant remains the rational choice in the tested range: lower hardware cost, lower latency, superior task performance. 5 key numbers: 24,000 tests (8B) + 122,800 tests (14B); 3 model families; 2 size conditions; QUAL delta = −0.07 to −0.14 (14B vs 8B); threshold not crossed at 14B.
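
The abstract reports effect sizes as Cohen's d without stating which variant was used. As a minimal sketch, assuming the common pooled-standard-deviation form for two independent groups, the statistic could be computed as follows. The function name and the sample scores are hypothetical illustrations, not data or code from the paper.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d for two independent samples, using the pooled standard deviation.

    Assumption: the pooled-SD variant; the paper may use a different one.
    """
    n_a, n_b = len(group_a), len(group_b)
    # statistics.variance is the sample variance (n - 1 denominator).
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Hypothetical scores, for illustration only.
with_prompt = [2, 4, 6]
without_prompt = [1, 3, 5]
effect = cohens_d(with_prompt, without_prompt)
```

By the usual rule of thumb, values of d above roughly 0.8 are read as large effects, which is the range of both figures quoted in the abstract (1.47 and 1.14).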