MENU

Time-Series Link Prediction Using Support Vector Machines

Jan Miles Co* and Proceso Fernandez

Department of Information Systems and Computer Science
Ateneo de Manila University

*Corresponding author: This email address is being protected from spambots. You need JavaScript enabled to view it.

 

ABSTRACT
The prominence of social networks motivates developments in network analysis, such as link prediction, which deals with predicting the existence or emergence of links on a given network. The Vector Auto Regression (VAR) technique has been shown to be one of the best for time-series based link prediction. One VAR technique implementation uses an unweighted adjacency matrix and five additional matrices based on the similarity metrics of Common Neighbor, Adamic-Adar, Jaccard’s Coefficient, Preferential Attachment and Research Allocation Index. In our previous work, we proposed the use of the Support Vector Machines (SVM) for such prediction task, and, using the same set of matrices, we gained better results. A dataset from DBLP was used to test the performance of the VAR and SVM link prediction models for two lags. In this study, we extended the VAR and SVM models by using three, four, and five lags, and these showed that both VAR and SVM improved with more data from the lags. The VAR and SVM models achieved their highest ROC-AUC values of 84.96% and 86.32% respectively using five lags compared to lower AUC values of 84.26% and 84.98% using two lags. Moreover, we identified that improving the predictive abilities of both models is constrained by the difficulty in the prediction of new links, which we define as links that do not exist in any of the corresponding lags. Hence, we created separate VAR and SVM models for the prediction of new links. The highest ROC-AUC was still achieved by using SVM with five lags, although at a lower value of 73.85%. The significant drop in the performance of VAR and SVM predictors for the prediction of new links indicate the need for more research in this problem space. Moreover, results showed that SVM can be used as an alternative method for time-series based link prediction.

INTRODUCTION
As of August 2015, there are 3.175 billion active Internet users, with 2.206 billion active social media users. Over the year 2014, social media users have risen by 176 million in just a single year (Regan 2015). The rapid increase in social media users implies that either existing networks are growing or new networks are being created. The development in social networks serves as the main motivation for our study in network analysis, specifically link prediction.

Link prediction is an area in network analysis that deals with determining the existence or emergence of links given a network. Link prediction can be classified into two types: static link prediction and dynamic link prediction. In static link prediction, the detection of hidden links is based on a known partial snapshot, and the objective is to predict currently hidden but existing links . . . . . read more

REFERENCES
AKBANI R, KWEK S, JAPKOWICZ N. 2004. Applying support vector machine to imbalanced datasets. In: Boulicaut JF, Esposito F, Giannotti F, Pedreschi D, ed. Machine learning: ECML 2004. 15th European Conference on Machine Learning; 2004 September 20-24; Pisa, Italy. p. 39-50
BARROS RC, BASGALUPP MP, CARVALHO ACPLF, FREITAS AA. 2012. A survey of evolutionary algorithms for decision-tree induction. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 42(3): 291-312.
CO JM, FERNANDEZ P. 2016. Improving the vector auto regression technique for time-series link prediction by using support vector machine. MATEC Web of Conferences 56(01008): 1-5.
CO JM. 2016. Filtered DBLP for time-series based link prediction. Retrieved from https://www.researchgate.net/publication/300928276_Filtered_DBLP_for_Time-Series_Based_Link_Prediction.
DEMPSEY K, DURAISAMY K, ALI H, BHOWMICK A. 2011. A parallel graph sampling algorithm for analyzing gene correlation networks. Procedia Computer Science 4(2011): 136-145.
DHOTE Y, MISHRA N, SHARMA S. 2013. Survey and analysis of temporal link prediction in online social networks. In: 2013 International Conference on Advances in Computing, Communications and Informatics; 2013 August 22-25; Mysore, India. p. 1178-1183.
DUNLAVY D, KOLDA T, ACAR E. 2011. Temporal link prediction using matrix and tensor factorizations. ACM Transactions on Knowledge Discovery from Data 5(2): 10:1-7.
 GJOKA M, KURANT M, BUTTS CT, MARKOPOULOU A. 2010. Walking in Facebook: A case study of unbiased sampling of OSNs. In: IEEE International Conference on Computer Communications; 2010 March 15-19; San Diego, CA, United States. p. 1-9.
GUPTA N, SINGH A. 2014. A novel strategy for link prediction in social Networks. In: 2014 CoNEXT on Student Workshop; 2014 December 2; Sydney, Australia. p. 12-14.
HASAN MA, CHAOJI V, SALEM S, ZAKI M. 2006. Link prediction using supervised learning. In: Link Analysis, Counterterrorism and Security 2016; 2006 April 22; Bethesda, Maryland.
HUANG Z, LIN DKJ. 2009. The time-series link prediction problem with applications in communication surveillance. INFORMS Journal on Computing 21(2): 286-303.
KURANT M, MARKOPOULOU A, THIRAN P. 2010. On the bias of BFS. In: 22nd International Teletraffic Congress; 2010 September 7-9; Netherlands. p. 1-8.
LEE JB, ADORNA H. 2012. Link prediction in a modified heterogeneous bibliographic network. In: 2012 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining; 2012 August 26-29; Instanbul. p. 442-449.
LOTTE F, LECUYER A, ARNALDI B. 2007. FuRIA: A novel feature extraction algorithm for brain-computer interfaces using inverse models and fuzzy regions of interest. In: 2007 3rd international IEEE/EMBS Conference on Neural Engineering; 2007 May 2-5; Kohala Coast, Hawaii. p. 175-178.
MENGSHOEL OJ, DESAI R, CHEN A, TRAN B. 2013. Will we connect again? Machine learning for link prediction in mobile social networks. In: 11th Workshop on Mining and Learning with Graphs; 2013 August 11; Illinois, USA.
MOHAISEN A, LUO P, LI Y, KIM Y, ZHANG Z. 2012. Measuring bias in the mixing time of social graphs due to graph sampling. In: IEEE Military Communications Conference; 2012 October 29 – November 1; Orlando, Florida, United States. p. 1-6.
NAGPAL R, GARG R. 2014. Algorithms for reducing the size of network. International Journal of Innovative Research in Computer and Communication Engineering 2(11): 6512-6518.
NGUYEN-THI AT, NGUYEN PQ, NGO TD, NGUYEN-HOANG TA. 2015. Transfer AdaBoost SVM for link prediction in newly signed social networks using explicit and PNR features. Procedia Computer Science 60: 332-341.
OZACAN A, OGUDUCU SG. 2015. Multivariate temporal link prediction in evolving social networks. In: 2015 International Conference on Information Systems; 2015 June 28- July 1; Las Vegas, Nevada, United States. p.185-190.
RAHMAN MM, DAVIS DN. 2013. Cluster based under-sampling for imbalanced cardiovascular data. In: World Congress on Engineering; 2013 July 3-5; London, UK. p. 1480-1485.
REGAN K. 10 Amazing social media growth stats from 2015. Retrieved from http://www.socialmediatoday.com/social-networks/kadie-regan/2015-08-10/10-amazing-social-media-growth-stats-2015.
SOARES PR, PRUDENCIO RB. 2012. Time series based link prediction. In: The 2012 International Joint Conference on Neural Networks; 2012 June 10-15; Brisbane, Australia. p. 1-7.
TANG J, CHANG S, AGGARWAL C, LIU H. 2015. Negative link prediction in social media. In: WSDM ’15 Proceedings of the Eight ACM International Conference on Web Search and Data Mining; 2015 January 31 - February 6; Shanghai, China. p. 87-96.
WANG T, LIAO G. 2014. A review of link prediction in social networks. In: Proceedings of 2014 International Conference on Management of E-Commerce and E-Government; 2014 October 31 - November 2; Shanghai, China. p.147-150.
YEN SJ, LEE YS. 2009. Cluster-based under-sampling approaches for imbalanced data distributions. Journal Expert Systems with Applications: An International Journal 36(3): 5718-5727.
ZHOU R, HOLDER LB. 2010. Frequent subgraph mining on a single large graph using sampling techniques. In: Proceedings of the Eight Workshop on Mining and Learning with Graphs; 2010 July 25-28; Washington, DC, United States. p. 171-178.