Staff View: A Word is Worth 4-bit: Efficient Log Parsing with Binary Coded Decimal Recognition :: Library Catalog

Bibliographic Details
Main Authors:	Srivastava, Prerak; Corallo, Giulio; Rybalko, Sergey
Format:	Preprint
Published:	2025
Subjects:	Computation and Language; Machine Learning
Online Access:	https://arxiv.org/abs/2506.01147

_version_	1866915317871017984
author	Srivastava, Prerak; Corallo, Giulio; Rybalko, Sergey
author_facet	Srivastava, Prerak; Corallo, Giulio; Rybalko, Sergey
contents	System-generated logs are typically converted into categorical log templates through parsing. These templates are crucial for generating actionable insights in various downstream tasks. However, existing parsers often fail to capture fine-grained template details, leading to suboptimal accuracy and reduced utility in downstream tasks that require precise pattern identification. We propose a character-level log parser utilizing a novel neural architecture that aggregates character embeddings. Our approach estimates a sequence of binary-coded decimals to achieve highly granular log template extraction. Our low-resource character-level parser, tested on the revised Loghub-2k and a manually annotated industrial dataset, matches LLM-based parsers in accuracy while outperforming semantic parsers in efficiency.
format	Preprint
id	arxiv_https___arxiv_org_abs_2506_01147
institution	arXiv
publishDate	2025
record_format	arxiv
title	A Word is Worth 4-bit: Efficient Log Parsing with Binary Coded Decimal Recognition
topic	Computation and Language; Machine Learning
url	https://arxiv.org/abs/2506.01147
