Internal Format :: Library Catalog


Bibliographic Details
Main Authors:	Ognibene, Dimitri; Donabauer, Gregor; Theophilou, Emily; Koyuturk, Cansu; Yavari, Mona; Bursic, Sathya; Telari, Alessia; Testa, Alessia; Boiano, Raffaele; Taibi, Davide; Hernandez-Leo, Davinia; Kruschwitz, Udo; Ruskov, Martin
Format:	Preprint
Published:	2025
Subjects:	Computers and Society; Computation and Language
Online Access:	https://arxiv.org/abs/2503.02532

_version_	1866915182282801152
author	Ognibene, Dimitri; Donabauer, Gregor; Theophilou, Emily; Koyuturk, Cansu; Yavari, Mona; Bursic, Sathya; Telari, Alessia; Testa, Alessia; Boiano, Raffaele; Taibi, Davide; Hernandez-Leo, Davinia; Kruschwitz, Udo; Ruskov, Martin
contents	The use of large language model (LLM)-powered chatbots, such as ChatGPT, has become popular across various domains, supporting a range of tasks and processes. However, due to the intrinsic complexity of LLMs, effective prompting is more challenging than it may seem. This highlights the need for innovative educational and support strategies that are both widely accessible and seamlessly integrated into task workflows. Yet, LLM prompting is highly task- and domain-dependent, limiting the effectiveness of generic approaches. In this study, we explore whether LLM-based methods can facilitate learning assessments by using ad-hoc guidelines and a minimal number of annotated prompt samples. Our framework transforms these guidelines into features that can be identified within learners' prompts. Using these feature descriptions and annotated examples, we create few-shot learning detectors. We then evaluate different configurations of these detectors, testing three state-of-the-art LLMs and ensembles. We run experiments with cross-validation on a sample of original prompts, as well as tests on prompts collected from task-naive learners. Our results show how LLMs perform on feature detection. Notably, GPT-4 demonstrates strong performance on most features, while closely related models, such as GPT-3 and GPT-3.5 Turbo (Instruct), show inconsistent behaviors in feature classification. These differences highlight the need for further research into how design choices impact feature selection and prompt detection. Our findings contribute to the fields of generative AI literacy and computer-supported learning assessment, offering valuable insights for both researchers and practitioners.
format	Preprint
id	arxiv_https___arxiv_org_abs_2503_02532
institution	arXiv
publishDate	2025
record_format	arxiv
title	Use Me Wisely: AI-Driven Assessment for LLM Prompting Skills Development
topic	Computers and Society; Computation and Language
url	https://arxiv.org/abs/2503.02532
