Research on k-Fermat algorithm for optimal partitioning clustering of high-dimensional data

LI Tong, ZHAI Yongnan, HUA Yingfan

Systems Engineering - Theory & Practice ›› 2024, Vol. 44 ›› Issue (2): 752-772. DOI: 10.12011/SETP2023-0173


Abstract

In this paper, the optimal rally point of two-dimensional data in geometry (the generalized Fermat point) is extended to high-dimensional space, and a partitioning clustering algorithm, the k-Fermat clustering algorithm, is proposed that takes the high-dimensional Fermat point as the optimal cluster center. First, the optimality and uniqueness of Fermat points as cluster centers are analyzed theoretically; second, an algorithmic framework is established in which the plant growth simulated algorithm (PGSA) solves for the Fermat point of high-dimensional data and guides the clustering process; finally, the complexity of the algorithm is rigorously proved. To verify the performance of the algorithm, k-Fermat is compared, on classical publicly available data sets, with dozens of mainstream clustering algorithms published in recent years in major international journals and top conferences, and the accuracy and stability of the proposed algorithm are confirmed. This paper remedies a theoretical deficiency of current partitioning clustering algorithms, namely the absence of an optimal cluster center, and explores a new approach to unsupervised learning.
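To make the two computational ingredients of the abstract concrete, the sketch below shows, in Python, how a Fermat-point-centered partitioning loop can be organized: each cluster center is the generalized Fermat point (geometric median) of its members, i.e., the point minimizing the sum of Euclidean distances to them, and center updates alternate with nearest-center reassignment. This is a minimal illustration only: it substitutes Weiszfeld's classical fixed-point iteration for the paper's PGSA-based solver, and all names and parameters (fermat_point, k_fermat, tol, n_iter) are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def fermat_point(X, tol=1e-6, max_iter=200):
    """Approximate the generalized Fermat point (geometric median) of the
    rows of X via Weiszfeld's fixed-point iteration. NOTE: the paper solves
    this subproblem with PGSA; Weiszfeld is only a stand-in here."""
    y = X.mean(axis=0)                                        # start from the centroid
    for _ in range(max_iter):
        d = np.maximum(np.linalg.norm(X - y, axis=1), 1e-12)  # avoid division by zero
        w = 1.0 / d                                           # Weiszfeld weights
        y_new = (w[:, None] * X).sum(axis=0) / w.sum()
        if np.linalg.norm(y_new - y) < tol:
            return y_new
        y = y_new
    return y

def k_fermat(X, k, n_iter=50, seed=0):
    """Lloyd-style alternation with Fermat points as cluster centers:
    assign each point to its nearest center, then recompute each center
    as the Fermat point of its cluster."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # (n, k) matrix of point-to-center distances, then nearest center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            fermat_point(X[labels == j]) if np.any(labels == j) else centers[j]
            for j in range(k)])
        if np.allclose(new_centers, centers):                 # converged
            break
        centers = new_centers
    return labels, centers

# Example: three Gaussian blobs in 10-dimensional space
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(loc=m, scale=0.5, size=(100, 10))
                   for m in (0.0, 3.0, 6.0)])
    labels, centers = k_fermat(X, k=3)
    print(labels[:10], centers.shape)
```

Because the Fermat point minimizes the sum of unsquared Euclidean distances, it is less sensitive to outliers than the k-means centroid, which minimizes the sum of squared distances; this is the sense in which the Fermat point can serve as an optimal cluster center.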

Key words

partitioning clustering / optimal clustering center / generalized Fermat points / plant growth simulated algorithm (PGSA) / k-Fermat algorithm

Cite this article

LI Tong, ZHAI Yongnan, HUA Yingfan. Research on k-Fermat algorithm for optimal partitioning clustering of high-dimensional data. Systems Engineering - Theory & Practice, 2024, 44(2): 752-772. https://doi.org/10.12011/SETP2023-0173


Funding

The Key Special Program of National Natural Science Foundation of China (72342013); Key Projects of Liaoning Provincial Social Science Fund (L15AGL019)