Research of large scale manifold learning based on MapReduce

XUE Yong-jian, NI Zhi-wei

Systems Engineering - Theory & Practice ›› 2014, Vol. 34 ›› Issue (s1) : 151-157.

PDF(930 KB)
PDF(930 KB)
Systems Engineering - Theory & Practice ›› 2014, Vol. 34 ›› Issue (s1) : 151-157. DOI: 10.12011/1000-6788(2014)s1-151

Research of large scale manifold learning based on MapReduce

  • XUE Yong-jian1,2, NI Zhi-wei1,2
Author information +
History +

Abstract

With the rapid development of the information technology, it is challenging for the traditional machine learning and data mining algorithms to deal with large scale explosive growth data. Manifold learning is a dimensionality reduction algorithm which can overcome some shortages of traditional linear dimensionality reduction methods. However, it is not useful for large scale data because of high complexity. In order to deal with the dimensionality reduction of large scale data, a distributed manifold learning algorithm is proposed based on MapReduce. Local sensitive hash functions are used to map the similarity points to the same bucket, then the geodesic distance between points in the same bucket can be computed by Euclidean norm according to the local homeomorphisms of Euclidean spaces of manifold and the geodesic distance among points between buckets can be computed by the modified geodesic distance formula which takes use of central points and edge points. Experiments on large scale of manmade dataset and real dataset show that this distributed manifold learning algorithm can approximate the geodesic distance between points effectively and it is useful for large scale dimensionality reduction.

Key words

MapReduce / manifold leaning / large scale dimensionality reduction / local sensitive hashing

Cite this article

Download Citations
XUE Yong-jian , NI Zhi-wei. Research of large scale manifold learning based on MapReduce. Systems Engineering - Theory & Practice, 2014, 34(s1): 151-157 https://doi.org/10.12011/1000-6788(2014)s1-151

References

[1] Roweis S T, Saul L K. Nonlinear dimensionality reduction by locally linear embedding[J]. Science, 2000, 290(5500): 2323-2326.
[2] Tenenbaum J B, De Silva V, Langford J C. A global geometric framework for nonlinear dimensionality reduction[J]. Science, 2000, 290(5500): 2319-2323.
[3] Belkin M, Niyogi P. Laplacian eigenmaps for dimensionality reduction and data representation[J]. Neural Computation, 2003, 15(6): 1373-1396.
[4] Donoho D L, Grimes C. Hessian eigenmaps: Locally linear embedding techniques for high-dimensional data[J]. Proceedings of the National Academy of Sciences, 2003, 100(10): 5591-5596.
[5] Zhang Z, Zha H. Principal manifolds and nonlinear dimension reduction via local tangent space alignment[J]. SIAM Journal on Scientific Computing, 2005, 26(1): 313-338.
[6] Dean J, Ghemawat S. MapReduce: Simplified data processing on large clusters[C]//Brewer E, Chen P. Proceedings of the OSDI. California: USENIX Association, 2004: 137-150.
[7] Balasubramanian M, Schwartz E L. The Isomap algorithm and topological stability[J]. Science, 2002, 295(5552): 7-7.
[8] Samko O, Marshall A D, Rosin P L. Selection of the optimal parameter value for the Isomap algorithm[J]. Pattern Recognition Letters, 2006, 27(9): 968-979.
[9] Wang J, Zhang Z, Zha H. Adaptive manifold learning[C]//Advances in Neural Information Processing Systems, 2004: 1473-1480.
[10] Li Y. Building k-edge-connected neighborhood graph for distance-based data projection[J]. Pattern Recognition Letters, 2005, 26(13): 2015-2021.
[11] Yang L. Building k edge-disjoint spanning trees of minimum total length for isometric data embedding[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2005, 27(10): 1680-1683.
[12] Wu Y, Chan K L. An extended Isomap algorithm for learning multi-class manifold[C]//Proceedings of 2004 IEEE International Conference on Machine Learning and Cybernetics, 2004, 6: 3429-3433.
[13] Souvenir R, Pless R. Manifold clustering[C]//Proceedings of Tenth IEEE International Conference on Computer Vision, 2005, 1: 648-653.
[14] Wankou Y, Changyin S, Lei Z. A multi-manifold discriminant analysis method for image feature extraction[J]. Pattern Recognition, 2011, 44(8): 1649-1657.
[15] Gong D, Zhao X, Medioni G. Robust multiple manifolds structure learning[C]//Proceedings of the 29th International Conference on Machine Learning, 2012: 321-328.
[16] Law M H C, Zhang N, Jain A K. Nonlinear manifold learning for data stream[C]//Proceedings of the Fourth SIAM International Conference on Data Mining, 2004: 33-44.
[17] Law M H C, Jain A K. Incremental nonlinear dimensionality reduction by manifold learning[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2006, 28(3): 377-391.
[18] Kouropteva O, Okun O, Pietikainen M. Incremental locally linear embedding[J]. Pattern Recognition, 2005, 38(10): 1764-1767.
[19] Gionis A, Indyk P, Motwani R. Similarity search in high dimensions via hashing[C]//Proceedings of the International Conference on Very Large Data Bases, 1999: 518-529.
[20] Charikar M S. Similarity estimation techniques from rounding algorithms[C]//Proceedings of the Thirty-fourth Annual ACM Symposium on Theory of Computing. ACM, 2002: 380-388.
[21] Datar M, Immorlica N, Indyk P, et al. Locality-sensitive hashing scheme based on p-stable distributions[C]//Proceedings of the Twentieth Annual ACM Symposium on Computational Geometry. ACM, 2004: 253-262.
[22] LeCun Y, Bottou L, Bengio Y, et al. Gradient-based learning applied to document recognition[J]. Proceedings of the IEEE, 1998, 86(11): 2278-2324.
PDF(930 KB)

294

Accesses

0

Citation

Detail

Sections
Recommended

/