Causal discovery based on heterogeneous non-Euclidean data

WANG Xiaokang, LI Shuaige, WANG Yihui, WAN Yan

Systems Engineering - Theory & Practice ›› 2024, Vol. 44 ›› Issue (6) : 1987-2002.

DOI: 10.12011/SETP2023-1202

Abstract

Causal relationships are essential for revealing the mechanisms behind phenomena and for guiding interventions. However, because existing frameworks are limited in both model representation and learning algorithms, only a few studies have explored causal discovery on non-Euclidean data. This paper addresses the problem by proposing a causal mapping process based on coordinate representations for heterogeneous non-Euclidean data. We specify a data generation mechanism between parent and child nodes and build a causal mechanism based on multi-dimensional tensor regression. Within this theoretical framework, we then propose a two-stage causal discovery approach based on regularized generalized canonical correlation analysis. Using the discrete representation in the shared projection direction, causal relationships between heterogeneous non-Euclidean variables can be discovered more accurately. Finally, an empirical study on real-world industrial sensor data demonstrates the effectiveness of the proposed method for discovering causal relationships in heterogeneous non-Euclidean data.
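
The abstract does not spell out which coordinate representation is used. As a rough illustration only, the sketch below assumes the centered log-ratio (clr) transform, a common way to map compositional data from the simplex to real-valued coordinates. The function names `closure` and `clr` and the sample composition are hypothetical and are not taken from the paper.

```python
import math


def closure(parts):
    """Rescale strictly positive parts so they sum to 1 (a point on the simplex)."""
    if any(p <= 0 for p in parts):
        raise ValueError("all parts must be strictly positive")
    total = sum(parts)
    return [p / total for p in parts]


def clr(parts):
    """Centered log-ratio coordinates: log of each part minus the mean of the logs.

    Assumption: clr stands in for the paper's coordinate representation,
    which is not described in the abstract.
    """
    logs = [math.log(p) for p in closure(parts)]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


# Hypothetical three-part composition (for example, shares of a mixture).
composition = [0.2, 0.3, 0.5]
coords = clr(composition)
```

The resulting coordinates sum to zero and do not change when the composition is rescaled, so standard Euclidean tools such as regression or canonical correlation analysis can be applied to them.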

Key words

causal discovery / functional data / compositional data / canonical correlation analysis / industrial fault diagnosis

Cite this article

WANG Xiaokang, LI Shuaige, WANG Yihui, WAN Yan. Causal discovery based on heterogeneous non-Euclidean data. Systems Engineering - Theory & Practice, 2024, 44(6): 1987-2002. https://doi.org/10.12011/SETP2023-1202

Funding

Youth Project of Humanities and Social Sciences Foundation of Ministry of Education of China (23YJCZH223); National Natural Science Foundation of China (72374031); Fundamental Research Funds for the Central Universities (2023RC11); Research Innovation Fund for College Students of Beijing University of Posts and Telecommunications (202308001)