Time-Series Link Prediction Using Support Vector Machines
Jan Miles Co* and Proceso Fernandez
Department of Information Systems and Computer Science
Ateneo de Manila University
The prominence of social networks motivates developments in network analysis, such as link prediction, which deals with predicting the existence or emergence of links in a given network. The Vector Auto Regression (VAR) technique has been shown to be one of the best for time-series based link prediction. One VAR implementation uses an unweighted adjacency matrix and five additional matrices based on the similarity metrics of Common Neighbor, Adamic-Adar, Jaccard’s Coefficient, Preferential Attachment and Resource Allocation Index. In our previous work, we proposed the use of Support Vector Machines (SVM) for this prediction task and, using the same set of matrices, obtained better results. A dataset from DBLP was used to test the performance of the VAR and SVM link prediction models for two lags. In this study, we extended the VAR and SVM models to three, four, and five lags, and found that both models improved with the additional data from the lags. The VAR and SVM models achieved their highest ROC-AUC values of 84.96% and 86.32%, respectively, using five lags, compared to 84.26% and 84.98% using two lags. Moreover, we identified that improving the predictive ability of both models is constrained by the difficulty of predicting new links, which we define as links that do not exist in any of the corresponding lags. Hence, we created separate VAR and SVM models for the prediction of new links. The highest ROC-AUC was still achieved by SVM with five lags, although at a lower value of 73.85%. The significant drop in the performance of the VAR and SVM predictors on new links indicates the need for more research in this problem space. Overall, the results show that SVM can be used as an alternative method for time-series based link prediction.
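To make the five similarity metrics named above concrete, the following is a minimal sketch that computes them for one node pair on a toy undirected graph. It assumes the standard textbook definitions of each metric, which may differ in detail from the variants used in the experiments; the function name `similarity_scores`, the helper `build_adjacency`, and the toy edge list are hypothetical and chosen only for illustration.

```python
import math

def build_adjacency(edges):
    """Build an undirected, unweighted adjacency map: node -> set of neighbours."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def similarity_scores(adj, u, v):
    """Return the five neighbourhood-based similarity scores for the pair (u, v).

    Assumes the standard definitions (illustrative, not taken from the paper):
      common neighbors        |N(u) & N(v)|
      Jaccard's coefficient   |N(u) & N(v)| / |N(u) | N(v)|
      Adamic-Adar             sum over common z of 1 / log(deg(z))
      preferential attachment deg(u) * deg(v)
      resource allocation     sum over common z of 1 / deg(z)
    """
    nu, nv = adj[u], adj[v]
    common = nu & nv
    union = nu | nv
    return {
        "common_neighbors": len(common),
        "jaccard": len(common) / len(union) if union else 0.0,
        # A common neighbour of two distinct nodes has degree >= 2,
        # so log(deg) > 0; the guard only protects against degenerate input.
        "adamic_adar": sum(
            1.0 / math.log(len(adj[z])) for z in common if len(adj[z]) > 1
        ),
        "preferential_attachment": len(nu) * len(nv),
        "resource_allocation": sum(1.0 / len(adj[z]) for z in common),
    }

# Toy co-authorship snapshot (hypothetical data).
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d")]
adj = build_adjacency(edges)
scores = similarity_scores(adj, "a", "d")
```

Evaluating such scores for every node pair in each time snapshot would yield one matrix per metric per lag, which is the kind of input the VAR and SVM models described above operate on.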
As of August 2015, there were 3.175 billion active Internet users, including 2.206 billion active social media users. Over 2014 alone, the number of social media users rose by 176 million (Regan 2015). This rapid increase implies that either existing networks are growing or new networks are being created. The development of social networks serves as the main motivation for our study in network analysis, specifically link prediction.
Link prediction is an area in network analysis that deals with determining the existence or emergence of links given a network. Link prediction can be classified into two types: static link prediction and dynamic link prediction. In static link prediction, the detection of hidden links is based on a known partial snapshot, and the objective is to predict currently hidden but existing links.