Table of Contents: :: Library Catalog

Saved in:

Bibliographic Details
Main Authors:	Karaca, Kemal Sami, Eravcı, Bahaeddin
Format:	Preprint
Published:	2025
Subjects:	Computation and Language Artificial Intelligence
Online Access:	https://arxiv.org/abs/2509.21907
Tags:	Add Tag No Tags, Be the first to tag this record!

Table of Contents:

Understanding the qualitative intent of citations is essential for a comprehensive assessment of academic research, a task that poses unique challenges for agglutinative languages like Turkish. This paper introduces a systematic methodology and a foundational dataset to address this problem. We first present a new, publicly available dataset of Turkish citation intents, created with a purpose-built annotation tool. We then evaluate the performance of standard In-Context Learning (ICL) with Large Language Models (LLMs), demonstrating that its effectiveness is limited by inconsistent results caused by manually designed prompts. To address this core limitation, we introduce a programmable classification pipeline built on the DSPy framework, which automates prompt optimization systematically. For final classification, we employ a stacked generalization ensemble to aggregate outputs from multiple optimized models, ensuring stable and reliable predictions. This ensemble, with an XGBoost meta-model, achieves a state-of-the-art accuracy of 91.3\%. Ultimately, this study provides the Turkish NLP community and the broader academic circles with a foundational dataset and a robust classification framework paving the way for future qualitative citation studies.

Similar Items