Bibliographic Details
Main Authors: Roy, Amartya; N, Devharish; Ganguly, Shreya; Ghosh, Kripabandhu
Format: Preprint
Published: 2025
Subjects: Machine Learning; Artificial Intelligence
Online Access: https://arxiv.org/abs/2509.23992
author Roy, Amartya
N, Devharish
Ganguly, Shreya
Ghosh, Kripabandhu
contents Modern causal discovery methods face critical limitations in scalability, computational efficiency, and adaptability to mixed data types, as evidenced by benchmarks on node scalability (30, $\le 50$, $\ge 70$ nodes), computational energy demands, and continuous/non-continuous data handling. Traditional algorithms such as PC, GES, and ICA-LiNGAM struggle with these challenges, exhibiting prohibitive energy costs at higher node counts and poor scalability beyond 70 nodes. We propose GUIDE, a framework that integrates Large Language Model (LLM)-generated adjacency matrices with observational data through a dual-encoder architecture. GUIDE reduces runtime by $\approx 42\%$ on average compared to RL-BIC and KCRL, while achieving an average $\approx 117\%$ improvement in accuracy over each of NOTEARS and GraN-DAG. During training, GUIDE's reinforcement learning agent dynamically balances reward maximization (accuracy) and penalty avoidance (DAG constraints), enabling robust performance across mixed data types and scalability to $\ge 70$ nodes, a setting where baseline methods fail.
format Preprint
id arxiv_https___arxiv_org_abs_2509_23992
institution arXiv
publishDate 2025
record_format arxiv
title Guide: Generalized-Prior and Data Encoders for DAG Estimation
topic Machine Learning
Artificial Intelligence
url https://arxiv.org/abs/2509.23992