1. Introduction
In recent years, the rapid development of the Internet has led many industries to accumulate huge amounts of data, and these volumes are growing explosively. At present, the speed at which database technology can analyze and utilize data lags far behind the speed at which data are produced and stored. In addition, the results of database analysis cannot be used directly and still require manual analysis. In this context, data mining technology has emerged to meet the need. Data mining technology can effectively process large amounts of historical and current data and dig out valuable information from them, thereby providing guidance for actual production, operation and development (Wang & Ding, 2009; Agrawal & Srikant, 1995).
Data mining technology can not only process tremendous amounts of historical and current data but also discover valuable information in them, providing guidance for practical production, operation and development (Gong, Liu & Jia, 2011; Dong & Wen, 2009). Sequence pattern mining, a significant research topic in the data mining field, is a knowledge discovery process in which frequent sub-sequences are found in a sequence database and used as patterns (Hu, 2009; Wang & Fan, 2009; Li, Wang & Chen, 2013). Because sequence pattern mining is practical and easy to understand, it has attracted wide attention and in-depth investigation. Research in recent years has produced a number of typical sequence pattern mining algorithms. These algorithms have improved the efficiency of data exploration to a certain degree. However, with the arrival of the age of big data, data sets are becoming larger and larger and their structure more and more complicated (Lv & Zhang, 2006; Lin, 2013; Zhang, Hu & Chen, 2007). When a traditional sequence pattern mining algorithm is applied to such data, the enormous amount of irrelevant and redundant data increases time and space costs and sometimes causes memory overflow, greatly weakening the performance of traditional sequence pattern mining techniques. Moreover, the sequence patterns mined by traditional algorithms are of low quality: they fail to meet customers’ actual requirements and yield no useful information. In the era of big data, the most common issue facing many enterprises is “vast information but poor information”. How to dig out the most valuable information from massive data collections has become a current research focus and an issue that urgently needs to be solved (Jiang, Wang & Huang, 2015; Fu & Yang, 2015). This paper proposes the SPM of Map-Reduce algorithm, which is based on Map-Reduce, and applies it to a real big dataset. Valuable information is discovered from the big dataset and used to guide practical business activities.
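To make the notion of a frequent sub-sequence concrete, the following is a minimal sketch of support counting arranged as a map step and a reduce step. It is not the SPM of Map-Reduce algorithm proposed in this paper. The toy database, the function names (`map_phase`, `reduce_phase`, `frequent_subsequences`), the `max_len` limit and the `min_support` threshold are all illustrative assumptions, and the code runs in a single process rather than on a Map-Reduce cluster.

```python
from collections import Counter
from itertools import combinations


def map_phase(sequence, max_len=2):
    """Emit (sub-sequence, 1) once for each distinct sub-sequence of one sequence.

    A sub-sequence keeps the original item order but need not be contiguous.
    Only sub-sequences of length 1..max_len are emitted (assumed limit).
    """
    seen = set()
    for k in range(1, max_len + 1):
        for idx in combinations(range(len(sequence)), k):
            seen.add(tuple(sequence[i] for i in idx))
    return [(sub, 1) for sub in seen]


def reduce_phase(pairs):
    """Sum the emitted counts per sub-sequence to obtain its support."""
    counts = Counter()
    for sub, n in pairs:
        counts[sub] += n
    return counts


def frequent_subsequences(database, min_support, max_len=2):
    """Return the sub-sequences contained in at least min_support sequences."""
    pairs = [pair for seq in database for pair in map_phase(seq, max_len)]
    counts = reduce_phase(pairs)
    return {sub: c for sub, c in counts.items() if c >= min_support}


# Hypothetical toy sequence database: each inner list is one ordered sequence.
toy_db = [
    ["a", "b", "c"],
    ["a", "c"],
    ["b", "a", "c"],
]

result = frequent_subsequences(toy_db, min_support=3)
print(result)
```

In this toy database, only `a`, `c` and the ordered pair `a` followed by `c` occur in all three sequences, so they are the only patterns that reach a support of 3. Enumerating every sub-sequence in this way grows quickly with sequence length, which illustrates the time and space cost that the paragraph above attributes to mining large data sets.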