Bibliographic Details
Main Authors: Petersen, Jonas; Mazzoleni, Camilla; Lombardi, Gian-Alessandro; Martelli, Federico; Maggioni, Riccardo
Format: Preprint
Published: 2026
Subjects: Machine Learning; Artificial Intelligence
Online Access: https://arxiv.org/abs/2602.02834
author Petersen, Jonas
Mazzoleni, Camilla
Lombardi, Gian-Alessandro
Martelli, Federico
Maggioni, Riccardo
contents What structural inductive bias helps transformers reason over knowledge graphs? Through controlled ablations of a minimal transformer modification with four independently removable components (sparse adjacency masking, edge-type biases, query scaling, value gating), we isolate which structural signals drive multi-hop reasoning. Our finding is sharp: sparse adjacency masking alone accounts for the dominant share of improvement over unmasked transformers (+72.5pp on 3-hop MetaQA, +45.5pp on WebQSP, +53.9pp on CWQ), while learned relation parameters add only modest refinement and can actively hurt without structural guidance. A zero-shot experiment provides architecturally independent corroboration: masking-based attention degrades 4.0x less than relation-specific weights when edge types are held out. The useful inductive bias for multi-hop KGQA is predominantly topological, not relational.
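The abstract's central component, sparse adjacency masking, can be illustrated with a small sketch. The code below is not the authors' Tabula RASA implementation; it is a minimal single-head attention in pure Python, written under the assumption that "masking" means each node attends only to positions marked in an adjacency matrix. The function name `masked_attention`, its parameters, and the optional `edge_bias` term (standing in loosely for the abstract's edge-type biases) are all hypothetical and chosen for illustration only.

```python
import math


def masked_attention(queries, keys, values, adjacency, edge_bias=None):
    """Single-head attention restricted to graph neighbours (illustrative sketch).

    queries, keys: lists of n vectors of equal length d.
    values: list of n vectors.
    adjacency: n x n matrix; a truthy adjacency[i][j] lets node i attend to node j.
    edge_bias: optional n x n matrix of additive score biases (hypothetical
        stand-in for a learned per-edge-type bias).
    """
    n = len(queries)
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)
    out_dim = len(values[0])
    outputs = []
    for i in range(n):
        # Score only the permitted neighbours; every other position is masked out.
        scores = {}
        for j in range(n):
            if adjacency[i][j]:
                s = scale * sum(q * k for q, k in zip(queries[i], keys[j]))
                if edge_bias is not None:
                    s += edge_bias[i][j]
                scores[j] = s
        if not scores:
            # A node with no permitted neighbours aggregates nothing.
            outputs.append([0.0] * out_dim)
            continue
        # Numerically stable softmax over the unmasked scores.
        top = max(scores.values())
        exps = {j: math.exp(s - top) for j, s in scores.items()}
        total = sum(exps.values())
        out = [0.0] * out_dim
        for j, e in exps.items():
            weight = e / total
            for c, v in enumerate(values[j]):
                out[c] += weight * v
        outputs.append(out)
    return outputs


if __name__ == "__main__":
    # Toy chain graph 0 -> 1 -> 2 with self-loops (assumed example data).
    adjacency = [[1, 1, 0], [0, 1, 1], [0, 0, 1]]
    zeros = [[0.0, 0.0] for _ in range(3)]
    one_hot = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    for row in masked_attention(zeros, zeros, one_hot, adjacency):
        print(row)
```

In this toy setup, node 0 can never place weight on node 2 in a single layer, so reaching a 2-hop neighbour requires stacking two such layers. That is one plausible reading of why a topological mask could help multi-hop reasoning, offered here as an interpretation rather than a claim taken from the record.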
format Preprint
id arxiv_https___arxiv_org_abs_2602_02834
institution arXiv
publishDate 2026
record_format arxiv
title What Structural Inductive Bias Helps Transformers Reason Over Knowledge Graphs? A Study with Tabula RASA
topic Machine Learning
Artificial Intelligence
url https://arxiv.org/abs/2602.02834