Bibliographic Details
Main Authors: Valeros, Veronica, Širokova, Anna, Catania, Carlos, Garcia, Sebastian
Format: Preprint
Published: 2024
Online Access: https://arxiv.org/abs/2404.01940
Abstract:
  • Understanding cybercrime communications is paramount for cybersecurity defence. This often involves translating communications into English for processing, interpreting, and generating timely intelligence. The problem is that translation is hard. Human translation is slow, expensive, and scarce. Machine translation is inaccurate and biased. We propose using fine-tuned Large Language Models (LLMs) to generate translations that can accurately capture the nuances of cybercrime language. We apply our technique to public chats from the NoName057(16) Russian-speaking hacktivist group. Our results show that our fine-tuned LLM is better, faster, more accurate, and able to capture nuances of the language. Our method shows it is possible to achieve high-fidelity translations and to reduce costs by a factor ranging from 430 to 23,000 compared to a human translator.