Strong Spanned Patterns Generation Using Subsequence Cover Problem Reduction and the Term-Product Operation



Jasmine A. Malinao, Richelle Ann B. Juayong, 

Nestine Hope S. Hernandez, and Henry N. Adorna

Department of Computer Science (Algorithms and Complexity Laboratory)
University of the Philippines, Diliman, Quezon City
Rm. 317, UP Alumni Engineers Centennial Hall, Velasquez St.,
University of the Philippines, Diliman, Quezon City

corresponding author: This email address is being protected from spambots. You need JavaScript enabled to view it.



The Strong Spanned Patterns-Trie Construction (SSP-TC) algorithm is introduced to efficiently generate a set of strong spanned patterns of a given conflict-free binary k-tagged data set obtained by the use of the Approximate Crisp Theory Set Formation (ACTSF) methodology that we proposed in our previous work. In our previous work, we have shown that such a set with this characteristic can be obtained using the SSP-trie data structure in O(mn2). In this paper, we present and prove the correctness of the SSP-TC algorithm that generates this set through parallel computations in O(mn) implemented in this trie structure. We were also able to reduce the problem of generating a set of strong spanned patterns into a problem known as the Subsequence Cover Problem (SubCP). We obtain a solution to this reduct through the use of the SSP-TC algorithm and the SSP-Trie data structure whose input is from the components of the Term-Product Matrix introduced in this paper. To illustrate the classification performance of the generated set of patterns using the proposed concepts and methods, we use two data sets publicly-made-available in University of California Irvine (UCI) Machine Learning Repository and show that we achieve better rates of classification on the test sets of the two data sets compared with the results in literature.



One of the key concepts driving the need for pattern extraction is the quest to predict and understand the occurrence of a phenomenon for a given space of data. Patterns, unlike other data models, provide users more intuitive representation of what needs to be understood of a very complex, oftentimes, huge number of data points. There had been many algorithms known in the area of Knowledge Discovery in Databases developed for this purpose. . . . . . . . . . . . . . . . .





ALEXE S, HAMMER PL. 2006. Accelerated algorithm for pattern detection in logical analysis of data. Discrete Applied Mathematics, Special Issue: Discrete mathematics and Data Mining II: p. 1050-63. ISSN:0166-218X.

BARANDELA R, GASCA E. 2000. Decontamination of Training Samples for Supervised Pattern Recognition Methods. In: Ferri FJ, Inesta Querada JM, Amin A, Paudil P. (eds.). Lecture Notes in Computer Science, Vol. 1876. London, UK: Springer-Verlag. p. 621-630.

BOROS E, HAMMER P, IBARAKI T, KOGAN A, MAYORAZ E, MUCHNIK I. 1996. An Implementation of Logical Analysis of Data. IEEE Transactions on Knowledge and Data Engineering, Volume 12, No. 2. NJ, USA: IEEE Educational Activities Department, Piscataway. p. 292-306.

BREIMAN L. 1996. Bagging Predictors. Machine Learning 24: 123-140.

DUCH W, ADAMCZAK R, GRABZEWSKI K. 1998. Extraction of logical rules from backpropagation networks. Neural Processing Letters 7: 1-9.

DUCH W, GRABCZEWSKI K. 1999. Searching for Optimal MLP (Multilayer Perceptron). In: Fourth Conference on Neural Networks and their Applications: May 1999; Zakopane, Poland: Source: http://www.

FISHER R, MARSHALL M. 1998. UCI Repository of Machine Learning Databases: Machine readable data repository, Department of Computer Science, University of California, Irvine. 

FREITAS A. 1999. On rule interestingness measures. Knowledge Based Systems 12(5-6): 309-315.

HAMMER et al. 2004. Pareto-optimal patterns in logical analysis of data. Elseview Science Publishers B.V., Amsterdam, The Netherlands: The Netherlands. J Discrete Appl Mathematics. 144(1-2): 79-102.

JIANG Y, ZHOU ZH. 2004. Editing Training Data for kNN classifiers with Neural Network Ensemble. In: Lecture Notes in Computer Science presented at the International Symposium on Neural Networks; 19-21 August 2004; 3173 Dalian, China: Springer-Verlag. 356-361.

KONTKANEN P, LAHTINEN J, MYLLYMAKI P, TIRRI H. 2000. Unsupervised Bayesian visualization of high-dimensional data. In: Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining, 20-23 Agust 2000; Boston, Ma., USA: ACM. p. 325-329.

KORNIENKO L, ALBRECHT D, DOWE D. 2005. A Preliminary MML Linear Classifier using Principal Components for Multiple Classes. Lecture Notes in Computer Science. Volume 3809/2005: 922-926. DOI: 10.1007/11589990 111.

LI J. 2007. Rough Set Based Rule Evaluations and Their Applications. [PhD Thesis]. Canada: University of Waterloo. 200p. Retrieved from: http://uwspace. pdf.

MALINAO J, ADORNA H. 2009. An Algorithm to Efficiently Generate an Approximation of a Theory Set. In: Proceedings of 9th Philippine Computing Science Congress; 2-3 March 2009; Philippines: Computing Society of the Philippines. p. 185-208.

MALINAO J, HERNANDEZ NH, ADORNA H. 2009. The SSP-Trie Algorithm for Efficient Strong Spanned Pattern Set Generation of ACTSF. In: Proceedings of the 3rd ERDT Conference, 11 September 2009. University of the Philippines Diliman, Philippines: College of Engineering. p. 162- 68.

MALY K. 1976. Compressed Tries. Communications of the ACM 19(7): 409-415.

SEO D, SONG C, LEE W. 2007. A Classifier Capable of Handling New Attributes. In: Proceedings of the 2007 IEEE Symposium on Computational Intelligence and Data Mining; 1-5 April 2007; Honolulu, Hawaii, USA: Institute of Electrical and Electronics Engineers. p. 323-327.