Enregistré dans:
Détails bibliographiques
Auteurs principaux: Anđelić, Marija, Šipek, Dominik, Majer, Laura, Šnajder, Jan
Format: Preprint
Publié: 2025
Sujets:
Accès en ligne:https://arxiv.org/abs/2507.14314
Tags: Ajouter un tag
Pas de tags, Soyez le premier à ajouter un tag!
Table des matières:
  • Online news outlets operate predominantly on an advertising-based revenue model, compelling journalists to create headlines that are often scandalous, intriguing, and provocative -- commonly referred to as clickbait. Automatic detection of clickbait headlines is essential for preserving information quality and reader trust in digital media and requires both contextual understanding and world knowledge. For this task, particularly in less-resourced languages, it remains unclear whether fine-tuned methods or in-context learning (ICL) yield better results. In this paper, we compile CLIC, a novel dataset for clickbait detection of Croatian news headlines spanning a 20-year period and encompassing mainstream and fringe outlets. We fine-tune the BERTić model on this task and compare its performance to LLM-based ICL methods with prompts both in Croatian and English. Finally, we analyze the linguistic properties of clickbait. We find that nearly half of the analyzed headlines contain clickbait, and that finetuned models deliver better results than general LLMs.