| Main Authors: | Petersen, Jonas; Mazzoleni, Camilla; Lombardi, Gian-Alessandro; Martelli, Federico; Maggioni, Riccardo |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | Machine Learning; Artificial Intelligence |
| Online Access: | https://arxiv.org/abs/2602.02834 |
| _version_ | 1866911668568588288 |
|---|---|
| author | Petersen, Jonas; Mazzoleni, Camilla; Lombardi, Gian-Alessandro; Martelli, Federico; Maggioni, Riccardo |
| author_facet | Petersen, Jonas; Mazzoleni, Camilla; Lombardi, Gian-Alessandro; Martelli, Federico; Maggioni, Riccardo |
| contents | What structural inductive bias helps transformers reason over knowledge graphs? Through controlled ablations of a minimal transformer modification with four independently removable components (sparse adjacency masking, edge-type biases, query scaling, value gating), we isolate which structural signals drive multi-hop reasoning. Our finding is sharp: sparse adjacency masking alone accounts for the dominant share of improvement over unmasked transformers (+72.5pp on 3-hop MetaQA, +45.5pp on WebQSP, +53.9pp on CWQ), while learned relation parameters add only modest refinement and can actively hurt without structural guidance. A zero-shot experiment provides architecturally independent corroboration: masking-based attention degrades 4.0x less than relation-specific weights when edge types are held out. The useful inductive bias for multi-hop KGQA is predominantly topological, not relational. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2602_02834 |
| institution | arXiv |
| publishDate | 2026 |
| record_format | arxiv |
| title | What Structural Inductive Bias Helps Transformers Reason Over Knowledge Graphs? A Study with Tabula RASA |
| topic | Machine Learning; Artificial Intelligence |
| url | https://arxiv.org/abs/2602.02834 |
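
The abstract names sparse adjacency masking and edge-type biases as two of the four ablated components. As a rough illustration only, the sketch below shows one common way such a mask and bias could enter single-head attention. It is not the authors' implementation: the function name `masked_attention`, the identity query/key/value projections, the added self-loops, and the scalar per-relation bias table `edge_bias` are all assumptions made for this example.

```python
import numpy as np

def masked_attention(x, adj, edge_type=None, edge_bias=None):
    """Single-head self-attention restricted to graph neighbours (illustrative).

    x         : (n, d) float array of node features.
    adj       : (n, n) boolean adjacency matrix; True where an edge exists.
    edge_type : optional (n, n) integer array of relation ids (hypothetical).
    edge_bias : optional (R,) float array, one scalar bias per relation id
                (hypothetical stand-in for learned edge-type biases).
    """
    n, d = x.shape
    # Assumption: identity projections, so queries, keys and values are all x.
    scores = x @ x.T / np.sqrt(d)
    if edge_type is not None and edge_bias is not None:
        # Additive bias looked up per node pair from its relation id.
        scores = scores + edge_bias[edge_type]
    # Assumption: self-loops are allowed so every row keeps at least one entry.
    allowed = adj | np.eye(n, dtype=bool)
    # Sparse adjacency masking: non-neighbours get -inf, hence zero weight.
    scores = np.where(allowed, scores, -np.inf)
    # Numerically stable row-wise softmax.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ x, weights
```

On a three-node chain (0 to 1 to 2), node 0 gives zero attention weight to node 2 in a single layer, so information from node 2 can only reach node 0 by stacking layers, one hop per layer.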