Journal Network of the Academy of Mathematics and Systems Science, Chinese Academy of Sciences

Collections

Big Data Analysis and Forecasting Techniques
  • SHI Huiting, CHAI Jian, LU Quanying, WANG Shouyang
    Systems Engineering - Theory & Practice. 2021, 41(12): 3366-3377. https://doi.org/10.12011/SETP2020-2828
    As the cleanest fossil fuel, natural gas is used increasingly widely. However, price fluctuations affect investment and demand in the natural gas industry, complicate production cost management, and influence energy policy making and economic growth. It is therefore important to fully understand the price formation mechanism of natural gas and its future fluctuation trend. In this paper, a dynamic Bayesian network (DBN) model is used to study the volatility mechanism of the Henry Hub natural gas spot price and to forecast its volatility. We establish a dynamic causal network diagram of the spot price formation mechanism, which shows both the direct and indirect factors driving price formation. The forecasts give the range and probability of spot price changes over the next 24 months. For example, in January, June, July and October, the spot price growth rate is forecast to stay within [-10%, 0%] with a probability of 0.2072. Our research provides a comprehensive framework for analyzing the driving factors of natural gas prices and supplies richer forecast information for investors and policy makers.
  • TANG Zhenpeng, WU Junchuan, ZHANG Tingting, DU Xiaoxu, CHEN Kaijie
    Systems Engineering - Theory & Practice. 2021, 41(11): 2837-2849. https://doi.org/10.12011/SETP2020-0672

    Based on the ideas of secondary decomposition and ensemble learning, we build the VMD-EEMD-DE-ELM-DE-ELM model, select the soybean, wheat and rice futures listed on the CBOT as representatives of international grain futures, and forecast their future price trends. Since existing studies simply discard the residual term left after VMD decomposition, we introduce secondary decomposition and, for the first time, apply EEMD decomposition and ensemble prediction to this residual term. This captures the rich information contained in the residual and thereby helps improve the model's prediction of the original series. In addition, to address the shortcoming of existing models that reconstruct the component forecasts with equal weights, we draw on ensemble learning and introduce a DE-ELM meta-learner that optimizes the reconstruction weights to obtain the best overall prediction. The empirical results show that the proposed model has a significant predictive advantage over existing models.

  • ZHANG Dabin, LI Qian, CHEN Shanying, LING Liwen
    Systems Engineering - Theory & Practice. 2021, 41(11): 3020-3030. https://doi.org/10.12011/SETP2020-0996

    To improve the accuracy of interval forecasting, we propose a VECM-CoinSVR hybrid model for interval-valued forecasting that accounts for the cointegration between the upper and lower bounds. First, a vector error correction model (VECM) is fitted to the original time series to obtain the VECM predictions and the residual error series. Second, the cointegration vector between the upper and lower bounds of the residual series is obtained by a cointegration test; this vector and the historical residual data are then used as inputs to a support vector regression that accounts for cointegration (Coin-SVR), which predicts the residual series. Finally, the VECM-CoinSVR forecast is obtained by combining the VECM predictions with the predicted residuals. To verify its effectiveness, the hybrid interval model is applied to forecasting beef, mutton and live chicken prices in the national market. Compared with three single models (VECM, SVR, Coin-SVR) under the MAPE, MSEI and UI criteria, VECM-CoinSVR achieves significantly higher prediction accuracy. A comparison with point forecasts of the interval-center series further shows that interval forecasting can outperform point forecasting.

  • ZHOU Hao, ZHANG Yifei, WANG Zhen, WANG Jue, WANG Shouyang
    Systems Engineering - Theory & Practice. 2021, 41(10): 2660-2668. https://doi.org/10.12011/SETP2019-1965
    Crude oil price forecasting has received much attention because of its importance and the nonlinearity and complexity of crude oil price series. Forecast combination aims to improve accuracy by pooling the useful information provided by individual sub-models into a comprehensive prediction, so a key question is how to efficiently generate many diverse sub-models and a good weighting vector. In this paper, we first apply various feature selection techniques, including filter, wrapper and embedded methods, to determine the key factors affecting crude oil prices. Individual models are then constructed by combining the feature selection methods with multiple linear regression, artificial neural network and support vector regression models. Finally, we propose a dynamic particle swarm optimization algorithm that searches for the optimal weighting vector and captures the dynamic changes in the weight series. Experimental results show that the proposed dynamic forecast combination model reduces computational complexity and improves forecasting performance.
  • LIU Yi, QU Jianwen, DONG Xugao, ZHANG Lei
    Systems Engineering - Theory & Practice. 2021, 41(9): 2256-2270. https://doi.org/10.12011/SETP2020-1181
    Because the range performs well in improving volatility forecasts and information based on the sign of returns is widely used in capital markets, this paper constructs a signed range by combining the range with the sign of the return and introduces it into four mainstream HAR models. Empirical results based on 5-minute high-frequency trading data of the Shanghai Composite Index indicate that the signed range has a significant asymmetric effect on future volatility in the short term: a negative (positive) signed range leads to significantly higher (lower) future volatility. Out-of-sample results show that introducing the signed range significantly improves the models' predictive ability, and the results are robust. Finally, the HAR-RSV-SR and HAR-Q-SR models outperform the other models discussed in this paper at short as well as medium and long horizons. These conclusions have important reference value for the use of volatility in asset pricing and risk management.
  • CAI Guanghui, XU Jun, YING Xuehai
    Systems Engineering - Theory & Practice. 2021, 41(8): 2030-2044. https://doi.org/10.12011/SETP2020-2738
    The correlation between financial assets is time-varying and exhibits long memory. Although the MIDAS Copula model, which incorporates mixed data sampling, can capture both features, its parameter evolution process is relatively simple. We therefore introduce the generalized autoregressive score (GAS) model into the MIDAS Copula model as its parameter evolution process, constructing the GAS MIDAS Copula model. Empirical analysis shows that the model improves on the in-sample fit of the MIDAS Copula model. We further select three sets of CSI 300 industry indexes with different degrees of correlation and analyze the model's ability to capture the long memory of time-varying inter-industry correlation coefficients, as well as the accuracy of its portfolio risk forecasts. The results show that: 1) the GAS MIDAS Copula model best describes the long memory of correlation coefficients between highly and moderately correlated industries; 2) VaR and ES backtests on simple portfolios built from the three data sets show that the GAS MIDAS Copula model has the highest prediction accuracy. Finally, risk forecasts under different confidence levels, weight ratios, rolling window lengths and assets confirm the robustness of the GAS MIDAS Copula model.
  • LIU Yezheng, WU Feng, SUN Jianshan, YANG Lu
    Systems Engineering - Theory & Practice. 2021, 41(3): 537-553. https://doi.org/10.12011/SETP2020-1301
    Group recommender systems have become an important tool for social platforms to provide personalized and satisfying products or services to groups. However, existing group recommendation methods mainly focus on improving personalized recommendation methods, ignoring both the interaction between users and groups and the dynamics of user and group preferences, which are essential to group recommendation. This paper therefore proposes a dynamic group recommendation method based on the co-evolution of user preferences and group preferences to model the dynamic interaction between users and groups. Specifically, we model user preferences as a weighted aggregation of users' historical preferences and group influence, and group preferences as a weighted combination of the group's historical preferences and new members' preferences. On this basis, we predict users' group-joining behavior and group consumption behavior. Extensive experiments on real data show that the proposed model achieves better performance in predicting both joining and consumption behaviors and is also robust.
  • QIAN Yu, CAO Enye, DENG Wenjun, YUAN Hua
    Systems Engineering - Theory & Practice. 2021, 41(3): 554-564. https://doi.org/10.12011/SETP2019-1136
    User reviews published in app markets contain useful information for app R&D teams. To study how user reviews influence app software update design, we propose a sentence vector similarity model based on word vector representations, which measures the similarity between sentences from update log texts and user review texts. We then propose a "log-comment" matching algorithm that assigns the different semantic matching results to different data sets. Using a large number of app update logs and user reviews collected from an open app market, we find that app development teams adopted less than 20% of user reviews, and the adopted content mainly concerned app software functions. Many user reviews pointed to marketing activities; however, these were rarely taken into account or corrected in new app versions, partly because of the limited role of the R&D team in the company's daily operations.
  • CHEN Jian, XIAO Yongbo, ZHU Bin
    Systems Engineering - Theory & Practice. 2021, 41(3): 596-612. https://doi.org/10.12011/SETP2019-1219
    Under globalization and specialization, enterprises face increasing risks in all aspects of supply chain management. By enhancing data "visibility", cross-boundary big data and the associated analytical techniques provide new means of risk evaluation and risk management. This paper focuses on the procurement function of enterprise operations management and investigates several issues in procurement risk evaluation from a big data perspective. Based on a survey of the procurement process of a typical purchasing service company, we propose a "5+X" framework for classifying procurement risks, covering environment risk, competition risk, moral risk, financial risk, fulfillment risk and internal-control risk. For each risk category, we identify potential data sources and the corresponding processing techniques. Through an illustrative case study, we demonstrate the steps for implementing procurement risk assessment based on big data analytics.
  • TANG Xia, KUANG Haibo, GUO Yuanyuan, DIAO Shujie, ZHANG Pengfei
    Systems Engineering - Theory & Practice. 2021, 41(1): 176-187. https://doi.org/10.12011/SETP2019-0226
    Following the decomposition-reconstruction-subsequence forecasting-ensemble approach, a combined forecasting model based on variational mode decomposition (VMD) is proposed. The model is constructed by selecting a suitable decomposition method, optimizing the reconstruction method, and choosing appropriate subsequence forecasting and ensemble methods. It is used to forecast the China Containerized Freight Index (CCFI) and to analyze the volatility characteristics and economic meaning of the CCFI. First, the CCFI series is decomposed into multiple modal components by VMD. Second, the modal components are reconstructed into high-frequency, medium-frequency, low-frequency and trend subsequences, which represent short-term market imbalance factors, seasonal factors, major events and market supply and demand, respectively. The fuzzy C-clustering algorithm is used for this reconstruction, with its parameter C optimized by component time-scale analysis, and the economic meaning of each subsequence is explored by analyzing its volatility characteristics. Third, a method based on data feature analysis is proposed to select the appropriate forecasting model for each reconstructed subsequence. Finally, the subsequence forecasts are summed to obtain the final output, and this ensemble forecast is compared with the forecasts of other models. The empirical results show that the proposed combined model outperforms single models such as BPNN, SVM and ARIMA, the EMD combination model, and other VMD-based multi-scale combined forecasting models. The analysis also reveals the external fluctuation characteristics and intrinsic economic meaning of the CCFI.
  • LIU Weiyi, JIANG Hanyu, ZHANG Tianwei, CHEN Wei
    Systems Engineering - Theory & Practice. 2020, 40(12): 3095-3111. https://doi.org/10.12011/SETP2019-2691
    This paper systematically studies volatility modeling and forecasting with three types of high-frequency extreme-value data: high-frequency closing prices, high-frequency high-low prices and high-frequency OHLC data. On this basis, the theoretical properties of the corresponding estimators are discussed and refined under both continuous price and price jump assumptions, and these estimators are uniformly extended to the corresponding dynamic forecasting models. Empirical analysis based on high-frequency data of the Shanghai Stock Index and other major indexes shows that making full use of high-frequency extreme-value information can significantly improve both the fit and the dynamic forecasting ability of volatility models.
  • YUAN Ying, ZHANG Tonghui, ZHUANG Xintian
    Systems Engineering - Theory & Practice. 2020, 40(9): 2269-2281. https://doi.org/10.12011/1000-6788-2019-1379-13
    In this paper, we propose a modified multifractal volatility measure and construct multifractal volatility models based on HAR-type models that include jumps and the leverage effect. We apply the Diebold-Mariano test and the model confidence set test to compare the empirical performance of these models. The empirical results show that: 1) within the same model framework, weighted adjusted realized volatility outperforms realized volatility, and our new multifractal volatility measure outperforms the other measures; 2) for the same volatility measure, models that include jumps and the leverage effect perform better; 3) across all models, the LHAR-MVWA-CJ and LHAR-MVWA models outperform the others.
  • WU Junjie, LIU Guannan, WANG Jingyuan, ZUO Yuan, BU Hui, LIN Hao
    Systems Engineering - Theory & Practice. 2020, 40(8): 2116-2149. https://doi.org/10.12011/1000-6788-2020-0027-34
    With the unprecedented development of big data and artificial intelligence, data intelligence has become a focal point in both academia and industry. It encompasses a set of big-data-driven, application-oriented predictive analytics methods, including data mining, machine learning and deep learning. It aims to extract valuable patterns from big data generated inside and outside target application scenarios in order to improve real-world management and decision making. This paper introduces recent advances in data intelligence, formulated as a cyclic system of three naturally integrated and mutually reinforcing dimensions: data, algorithms and scenarios. We discuss hot topics, emerging trends and research challenges in data intelligence, offering our own comments and opinions to guide newcomers to the area and to stimulate discussion among peers in this exciting field.
  • YANG Yang, LIU Sheng, LI Yiwei, JIA Jianmin
    Systems Engineering - Theory & Practice. 2020, 40(8): 2150-2158. https://doi.org/10.12011/1000-6788-2020-1187-09
    As the ability to track consumer footprints improves, marketing science is undergoing a big data revolution. To understand the changes in consumer behavior and marketing strategy in the big data era, this paper collects the literature on big data marketing from the past decade, sorts out the related concepts, types and analytical methods, and extracts the top 50 popular topics of big data marketing, such as search, mobile, word of mouth, digitization, apps and social media. Based on these findings, we review research progress in big data marketing across four stages: the Internet, social networks, the mobile Internet, and big data and artificial intelligence. Finally, future research directions in big data marketing are discussed from three aspects: the customer journey, quantitative evaluation of marketing activities, and the development of marketing analytics technology.
  • ZHU Pingfang, DONG Chaohua, LIU Yali, LIAO Hui
    Systems Engineering - Theory & Practice. 2020, 40(6): 1495-1508. https://doi.org/10.12011/1000-6788-2020-0465-14
    Forecasting exchange rates is very difficult because their fluctuations exhibit statistical characteristics such as time variability, randomness and ambiguity. The forecasting performance of the methods and models in the existing literature is affected by many factors, and their predictive power is no better than that of random walk models. This is the so-called Meese-Rogoff puzzle in exchange rate forecasting. We study exchange rate fluctuations and forecasting models with a nonparametric method and find that it is more flexible than parametric or semi-parametric methods. To overcome the curse of dimensionality, we propose an additive nonparametric model for exchange rate forecasting. Compared with existing models, our model has better out-of-sample predictive ability over the same observation period, which provides strong evidence that the Meese-Rogoff puzzle is not impossible to crack. In addition, applying the additive nonparametric model to the RMB/USD exchange rate again shows good fit and predictive ability. This study provides new ideas and methods for exchange rate forecasting.
  • ZHANG Jian, SUN Yuying, ZHANG Xinyu, WANG Shouyang
    Systems Engineering - Theory & Practice. 2020, 40(6): 1509-1519. https://doi.org/10.12011/1000-6788-2020-0443-11

    Air passenger traffic often undergoes structural changes due to external factors such as airport expansion, policy orientation and economic development, and model uncertainty is a common, long-standing issue in forecasting. To address these issues, a novel time-varying Jackknife model averaging method (TVJMA) (Sun et al., 2019, 2020) is employed to forecast air passenger traffic at the top 5 airports in China. Based on nonparametric estimation, the optimal time-varying weights for candidate models with time-varying parameters are obtained by minimizing a local Jackknife criterion at every time point t, so both the weights and the parameters can change over time. Empirical results show that TVJMA is significantly superior to the benchmark models, including Hansen and Racine's (2012) Jackknife model averaging (JMA), the autoregressive model (AR), the autoregressive integrated moving average model (ARIMA), the seasonal ARIMA model (SARIMA) and the time-varying parameter model (TVP). Furthermore, the predictive performance of TVJMA is robust to different test sets and forecast horizons. Overall, TVJMA effectively reduces the forecast risk caused by structural changes and model uncertainty, and thus produces accurate and stable forecasts of air passenger traffic.

  • LI Bing, LIN Anqi, GUO Dongmei
    Systems Engineering - Theory & Practice. 2020, 40(6): 1578-1595. https://doi.org/10.12011/1000-6788-2020-0439-18
    This paper follows the method of Baker et al. (2016) to construct China's economic policy uncertainty index (EPU), as well as an index based on keyword searches of Chinese newspaper text (CEPU). Using monthly import data for about 5000 products in China from January 2010 to April 2016, we estimate the effects of EPU and CEPU on these products' imports and find that economic policy uncertainty affects imports of different products differently, with both negative and positive effects. We further characterize product heterogeneity by the BEC classification, differentiated versus homogeneous products, the number and concentration of countries in global trade, and the elasticity of import demand, and use it to explain the differences in the impact of economic policy uncertainty across products. We find that when China's economic policy uncertainty increases, imports of capital goods and transport equipment are more negatively affected than imports of intermediate inputs, and imports of differentiated products more than those of homogeneous products. The negative response of product imports is larger when the elasticity of demand substitution is smaller, the number of importing countries worldwide is larger, the concentration of global importing countries is lower, and the concentration of global exporting countries is higher. In addition, comparing the two indices, we find the CEPU index more reasonable than the EPU index.
  • GONG Xu, CAO Jie, WEN Fenghua, YANG Xiaoguang
    Systems Engineering - Theory & Practice. 2020, 40(5): 1113-1133. https://doi.org/10.12011/1000-6788-2019-0561-21
    Recently, HAR-type models based on high-frequency transaction data have shown good forecasting performance for financial market volatility. Building on 4 existing HAR-type models, we add leverage and structural breaks to develop 4 new HAR-type models. Using 5-minute high-frequency transaction data of the Shanghai Composite Index and the Shenzhen Component Index as the sample, we estimate all the HAR-type models for each index. The results indicate that realized volatility, continuous volatility, upside volatility, downside volatility, leverage and structural breaks have clear in-sample predictive power for volatility in the Chinese stock market, while jump volatility and signed jump variation show weak in-sample predictive ability. We also find that, compared with HAR-type models without leverage and structural breaks, the new models have better in-sample fit and out-of-sample predictive power for volatility. In most cases, the LHAR-CJ-SB model performs best both in-sample and out-of-sample. Our results suggest that adding leverage and structural breaks improves the predictive performance of HAR-type models, so these two factors should not be ignored when building new HAR-type models.
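The ZHOU Hao et al. abstract combines sub-model forecasts with a weighting vector found by dynamic particle swarm optimization. The Python sketch below swaps in a much simpler rule, rolling inverse-MSE weights, only to illustrate what a time-varying weighting vector over sub-model forecasts looks like; it is not the authors' PSO algorithm, and the function names, window length and toy forecasts are hypothetical.

```python
def inverse_mse_weights(actual, forecasts):
    """Weights proportional to 1/MSE of each sub-model over the given sample.

    Assumes every sub-model has a non-zero MSE on the sample.
    """
    n = len(actual)
    mses = [sum((a - f) ** 2 for a, f in zip(actual, fc)) / n for fc in forecasts]
    inv = [1.0 / m for m in mses]
    total = sum(inv)
    return [w / total for w in inv]


def combine(forecasts, weights):
    """Weighted sum of the sub-model forecasts at each time step."""
    return [sum(w * fc[t] for w, fc in zip(weights, forecasts))
            for t in range(len(forecasts[0]))]


def rolling_combination(actual, forecasts, window):
    """Re-estimate the weights from the previous `window` observations at each
    step, so the weighting vector can drift over time (a stand-in for PSO)."""
    combined = []
    for t in range(window, len(actual)):
        w = inverse_mse_weights(actual[t - window:t],
                                [fc[t - window:t] for fc in forecasts])
        combined.append(sum(wi * fc[t] for wi, fc in zip(w, forecasts)))
    return combined


actual = [1.0, 2.0, 3.0, 4.0, 5.0]
f_lr = [1.1, 2.1, 3.1, 4.1, 5.1]  # hypothetical regression forecasts
f_nn = [1.2, 2.2, 3.2, 4.2, 5.2]  # hypothetical neural-network forecasts
print(inverse_mse_weights(actual, [f_lr, f_nn]))  # roughly [0.8, 0.2]
print(rolling_combination(actual, [f_lr, f_nn], window=3))
```

Inverse-MSE weighting has a closed form and needs no search, which is why it serves here as a readable placeholder; an optimizer such as PSO can instead target any loss and need not follow this proportional rule.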
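The LIU Yi et al. and GONG Xu et al. abstracts both build on the HAR regression of next-day realized volatility on its daily value and its weekly (5-day) and monthly (22-day) trailing means. Below is a standard-library Python sketch of that regression with an optional extra regressor. The signed-range definition used here (the log high-low range carrying the sign of the open-to-close return) is an assumption for illustration only; the paper's exact construction, and the leverage and structural-break terms of the GONG Xu et al. models, are not reproduced.

```python
import math


def signed_range(high, low, open_, close):
    """Log range ln(H/L) carrying the sign of the open-to-close log return.

    Assumed definition for illustration; returns 0.0 for a flat bar.
    """
    rng = math.log(high / low)
    ret = math.log(close / open_)
    if ret == 0.0:
        return 0.0
    return math.copysign(rng, ret)


def har_design(rv, extra=None):
    """Rows [1, RV_d, RV_w, RV_m(, extra_t)] with target RV_{t+1}.

    RV_w and RV_m are the 5-day and 22-day trailing means ending at day t.
    """
    X, y = [], []
    for t in range(21, len(rv) - 1):
        row = [1.0, rv[t], sum(rv[t - 4:t + 1]) / 5.0, sum(rv[t - 21:t + 1]) / 22.0]
        if extra is not None:
            row.append(extra[t])
        X.append(row)
        y.append(rv[t + 1])
    return X, y


def ols(X, y):
    """Least squares via the normal equations (X'X) b = X'y, solved by
    Gauss-Jordan elimination with partial pivoting."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]
```

Given a realized-volatility list `rv` and an aligned signed-range list `sr`, `ols(*har_design(rv, extra=sr))` fits the augmented regression, and the last coefficient plays the role of the signed-range effect. Solving the normal equations directly is fine for a handful of regressors; a QR-based solver is numerically safer for larger designs.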
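The CAI Guanghui et al. abstract evaluates portfolio VaR forecasts by backtesting. One standard VaR backtest is Kupiec's proportion-of-failures likelihood-ratio test, sketched below in Python; the abstract does not say which backtests were used, so treat this as an illustrative assumption rather than the authors' procedure. ES backtesting is not covered.

```python
import math


def kupiec_pof(returns, var_forecasts, p):
    """Kupiec (1995) proportion-of-failures test for VaR at tail probability p.

    var_forecasts are positive loss thresholds; day t is an exception when
    returns[t] < -var_forecasts[t]. Returns (exceptions, LR, p-value).
    """
    T = len(returns)
    x = sum(1 for r, v in zip(returns, var_forecasts) if r < -v)

    def loglik(q):
        # Bernoulli log-likelihood of x exceptions in T days at rate q,
        # skipping 0 * log(0) terms.
        ll = 0.0
        if x > 0:
            ll += x * math.log(q)
        if T - x > 0:
            ll += (T - x) * math.log(1.0 - q)
        return ll

    # Guard against tiny negative values from floating-point rounding.
    lr = max(0.0, -2.0 * (loglik(p) - loglik(x / T)))
    # Survival function of chi-square(1): P(chi2 > lr) = erfc(sqrt(lr / 2)).
    p_value = math.erfc(math.sqrt(lr / 2.0))
    return x, lr, p_value
```

A small p-value rejects the hypothesis that the observed exception rate equals `p`; for example, 20 exceptions in 100 days against a 5% VaR is strongly rejected, while 5 exceptions is not.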
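The LIU Yezheng et al. abstract models user preferences as a weighted aggregation of historical preference and group influence, and group preferences as a weighted combination of historical group preference and new members' preferences. A minimal Python sketch of one update step follows, assuming fixed scalar weights `alpha` and `beta`, simple averaging over new members, and a dot-product joining score; all of these are hypothetical simplifications, and the paper's weights are presumably learned from data.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def update_user(user_hist, group_pref, alpha):
    """User preference = alpha * own history + (1 - alpha) * group influence."""
    return [alpha * h + (1.0 - alpha) * g for h, g in zip(user_hist, group_pref)]


def update_group(group_hist, new_member_prefs, beta):
    """Group preference = beta * group history + (1 - beta) * mean of new members.

    With no new members the group preference is left unchanged.
    """
    if not new_member_prefs:
        return list(group_hist)
    k = len(new_member_prefs)
    mean_new = [sum(p[i] for p in new_member_prefs) / k for i in range(len(group_hist))]
    return [beta * h + (1.0 - beta) * m for h, m in zip(group_hist, mean_new)]


def join_score(user_pref, group_pref):
    """Hypothetical affinity used to rank the groups a user might join."""
    return dot(user_pref, group_pref)
```

Alternating `update_user` and `update_group` over time steps is what makes the two preference vectors co-evolve: each group update feeds the next user update and vice versa.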
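The QIAN Yu et al. abstract measures sentence similarity from word vectors and then matches update-log sentences to reviews. The Python sketch below assumes mean pooling of word vectors, cosine similarity, an already-segmented token list per sentence, a hypothetical threshold of 0.8 and toy two-dimensional embeddings; the authors' actual similarity model and matching rules may differ.

```python
import math


def sentence_vector(tokens, embeddings):
    """Mean of the word vectors of in-vocabulary tokens (assumed pooling)."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def match_logs_to_reviews(logs, reviews, embeddings, threshold=0.8):
    """Hypothetical log-comment matching: return (log_idx, review_idx, sim)
    for every pair whose similarity reaches the threshold."""
    review_vecs = [sentence_vector(r, embeddings) for r in reviews]
    pairs = []
    for i, log in enumerate(logs):
        lv = sentence_vector(log, embeddings)
        if lv is None:
            continue
        for j, rv in enumerate(review_vecs):
            if rv is not None:
                s = cosine(lv, rv)
                if s >= threshold:
                    pairs.append((i, j, s))
    return pairs


emb = {"crash": [1.0, 0.0], "fix": [0.9, 0.1], "sale": [0.0, 1.0]}  # toy vectors
print(match_logs_to_reviews([["fix", "crash"]], [["crash"], ["sale"]], emb))
```

Chinese review text would first need word segmentation before lookup; the threshold controls how strict a "match" is and would have to be tuned on labeled pairs.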
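The LIU Weiyi et al. abstract refines volatility estimators built from high-low and OHLC data. For orientation, the classical per-period Parkinson (high-low) and Garman-Klass (OHLC) variance estimators are sketched below in Python, together with a realized-range style sum over intraday bars; these are the textbook estimators under continuous prices, not the refined or jump-robust versions the paper derives.

```python
import math

LN2 = math.log(2.0)


def parkinson_var(high, low):
    """Parkinson (1980) variance estimate from one bar's high and low."""
    return math.log(high / low) ** 2 / (4.0 * LN2)


def garman_klass_var(open_, high, low, close):
    """Garman-Klass (1980) variance estimate from one bar's OHLC prices."""
    hl = math.log(high / low)
    co = math.log(close / open_)
    return 0.5 * hl ** 2 - (2.0 * LN2 - 1.0) * co ** 2


def daily_from_bars(bars, estimator):
    """Sum a per-bar estimator over intraday bars (realized-range style).

    Each bar is a tuple matching the estimator's arguments, e.g. (high, low)
    for parkinson_var or (open, high, low, close) for garman_klass_var.
    """
    return sum(estimator(*bar) for bar in bars)
```

Both estimators assume zero drift and continuous trading within each bar; discrete sampling biases the observed range downward and jumps inflate it, which is the kind of issue the paper's refinements address.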
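The YUAN Ying et al. abstract compares models with the Diebold-Mariano test. A minimal Python version for one-step-ahead forecasts under squared-error loss is sketched below; it uses the plain sample variance of the loss differential, which is only appropriate at horizon 1, and the model confidence set procedure is not covered.

```python
import math


def dm_test(actual, f1, f2):
    """Diebold-Mariano test of equal accuracy for one-step-ahead forecasts.

    Loss is squared error and d_t = e1_t^2 - e2_t^2, so a positive statistic
    means forecast 2 has the smaller average loss. The variance of the mean
    of d_t uses the plain sample variance (valid for horizon h = 1 only;
    longer horizons need an autocorrelation-consistent estimate).
    Returns (statistic, two-sided p-value from the standard normal).
    Raises ZeroDivisionError if the loss differential is constant.
    """
    d = [(a - x) ** 2 - (a - z) ** 2 for a, x, z in zip(actual, f1, f2)]
    n = len(d)
    mean_d = sum(d) / n
    var_d = sum((v - mean_d) ** 2 for v in d) / n
    stat = mean_d / math.sqrt(var_d / n)
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))  # 2 * (1 - Phi(|stat|))
    return stat, p_value
```

Swapping `f1` and `f2` flips the sign of the statistic and leaves the p-value unchanged.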
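The ZHANG Jian et al. abstract extends Hansen and Racine's (2012) Jackknife model averaging. The Python sketch below shows only the static JMA idea for two candidate models: compute leave-one-out predictions for each, then choose the weight on the simplex (here a grid over [0, 1]) that minimizes the mean squared leave-one-out error. The two candidate models (a sample mean and a simple linear regression) and the grid search are simplifying assumptions; the time-varying, kernel-localized TVJMA criterion is not implemented.

```python
def loo_predictions(x, y, fit, predict):
    """Leave-one-out prediction for each observation by refitting without it."""
    preds = []
    for i in range(len(y)):
        model = fit(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        preds.append(predict(model, x[i]))
    return preds


def fit_mean(xs, ys):
    return sum(ys) / len(ys)


def predict_mean(m, x):
    return m


def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((a - mx) ** 2 for a in xs)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    slope = sxy / sxx
    return (my - slope * mx, slope)


def predict_line(coef, x):
    return coef[0] + coef[1] * x


def jma_two_models(x, y, grid=101):
    """Return (weight on the mean model, minimized Jackknife criterion)."""
    p1 = loo_predictions(x, y, fit_mean, predict_mean)
    p2 = loo_predictions(x, y, fit_line, predict_line)
    best = None
    for k in range(grid):
        w = k / (grid - 1)
        cv = sum((yi - (w * a + (1.0 - w) * b)) ** 2
                 for yi, a, b in zip(y, p1, p2)) / len(y)
        if best is None or cv < best[1]:
            best = (w, cv)
    return best
```

With more than two candidates the weight search becomes a quadratic program over the simplex rather than a one-dimensional grid.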
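The LI Bing et al. abstract builds uncertainty indices in the manner of Baker et al. (2016), whose core step counts articles containing at least one economy term, one policy term and one uncertainty term, scaled by the total number of articles. A single-newspaper Python sketch follows; the term lists are hypothetical placeholders, and with only one newspaper, Baker et al.'s unit-standard-deviation scaling (assuming it is a pure rescaling) cancels once the series is normalized to a mean of 100.

```python
def epu_index(articles_by_month, econ_terms, policy_terms, unc_terms):
    """Single-newspaper EPU-style index normalized to a sample mean of 100.

    articles_by_month maps a month label to that month's article texts.
    An article counts when it contains at least one term from each list
    (plain substring matching, which also works for unsegmented Chinese).
    Assumes every month has articles and at least one month has a hit.
    """
    def has_any(text, terms):
        return any(term in text for term in terms)

    shares = {}
    for month, articles in articles_by_month.items():
        hits = sum(1 for a in articles
                   if has_any(a, econ_terms) and has_any(a, policy_terms)
                   and has_any(a, unc_terms))
        shares[month] = hits / len(articles)  # scale by total article count
    mean_share = sum(shares.values()) / len(shares)
    return {m: 100.0 * s / mean_share for m, s in shares.items()}
```

Scaling by the monthly article count keeps the index from rising merely because a newspaper published more articles in a given month.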