Reoptimization of the Consensus Pattern Problem under Pattern Length Modification
Jhoirene B. Clemente1*, Proceso L. Fernandez Jr.2, Richelle Ann B. Juayong1,
Jasmine A. Malinao1, Ivy D. Ordanel1, and Henry N. Adorna1
1Department of Computer Science, University of the Philippines Diliman,
Quezon City, NCR 1101 Philippines
2Department of Information Systems and Computer Science, Ateneo de Manila University,
Quezon City, NCR 1108 Philippines
*Corresponding author: This email address is being protected from spambots. You need JavaScript enabled to view it.
ABSTRACT
In Bioinformatics, finding conserved regions in genomic sequences remains to be a challenge not just because of the increasing size of genomic data collected but because of the hardness of the combinatorial model of the problem. One problem formulation is called the Consensus Pattern Problem (CPP). Given a set of t n-length strings S = {S1,..., St} defined over some constant size alphabet Σ and an integer l, where l ≤ n, the objective of CPP is to find an l-length string v and a set of l-length substrings si of each Si in S such that the total sum of d(si, v) is minimized for all 1 ≤ i ≤ t. Here d(x, y) denotes the Hamming distance between the two strings x and y. It is known that CPP is NP-hard i.e., unless P = NP, there is no polynomial-time algorithm that produces an optimal solution for CPP. In this study, we investigate a combinatorial setting called reoptimization in finding an approximate solution for this problem. We seek to identify whether a specific additional information can help in solving CPP. Specifically, we deal with the following reoptimization scenario. Suppose we have an optimal l-length consensus substring of a given set of sequences S. How can this information be beneficial in obtaining an (l + k)-length and (l – k)-length consensus for S? In this paper, we show that the reoptimization variant of the problem is still computationally hard even with k = 1. In response, we present four algorithms that make use of the given optimal solution – we prove that the first three algorithms produce solutions with quality that is bounded from above by an additive error that grows as the parameter k increases, while the fourth algorithm achieves a guaranteed approximation ratio. It has been shown that there is no efficient polynomial-time approximation scheme for CPP (Boucher 2015). In this paper, we show that we can save 𝒕(𝒏−(𝒍+𝒌)+𝟏)(𝒕(𝒍+𝟐𝒌)𝒓) steps in computation from the original running time of the known polynomial-time approximation scheme for CPP.
INTRODUCTION
Transcription factor binding sites in genomic sequences are conserved segments in the DNA that are known to regulate the expression of one or more genes. Identifying these conserved segments is modelled as a substring selection problem in Computer Science. One formulation of the problem is called the Consensus Pattern Problem, abbreviated as CPP. Formally – given a set of t n-length strings S = {S1,..., St} defined over some constant size alphabet Σ and an integer l, where l ≤ n – the objective of CPP is to find an l-length string v and a set of l-length substrings si of each Si in S such that the total sum of d(si, v) is minimized for all 1 ≤ i ≤ t. Here d(x, y) denotes the Hamming distance between the two strings x and y. Algorithms for CPP have been applied to a variety of pattern identification – ranging from biological sequences to text mining. The general formulation of the problem can be applied to other discrete structures such as graphs and time-series datasets. . . . . read more
REFERENCES
BILÒ D, ZYCH A. 2012. New Advances in Reoptimizing the Minimum Steiner Tree Problem. In Proc. of the Mathematical Foundations of Computer Science. LNCS. 7464: 184–197.
BILÒ D, BÖCKENHAUER HJ, KOMM D, KRALOVIC R, MÖMKE T, SEIBERT S, ZYCH A. 2011. Reoptimization of the Shortest Common Superstring Problem. Algorithmica (New York) 61(2): 227–251. ISSN 01784617. doi: 10.1007/s00453-010-9419-8.
BILÒ D. 2018. New algorithms for Steiner tree reoptimization. ICALP 19: 1 – 19: 14.
BÖCKENHAUER HJ, HROMKOVIČ J, MÖMKE T, WIDMAYER P. 2008. On the hardness of reoptimization. In: SOFSEM 2008. Lecture Notes in Computer Science, Vol. 4910. Geffert V, Karhumäki J, Bertoni A, Preneel B, Návrat P, Bieliková M eds. Springer, Berlin, Heidelberg. p. 50–65.
BÖCKENHAUER HJ, FREIRMUTH K, HROMKOVIČ J, MÖMKE T, SPROCK A, STEFFEN B. 2012. Steiner Tree Reoptimization in Graphs with Sharpened Triangle Inequality. Journal of Discrete Algorithms 11(1): 73–86. ISSN 15708667.
BORIA N, PASCHOS VT. 2010. Fast Reoptimization for the Minimum Spanning Tree Problem. Journal of Discrete Algorithms 8(3): 296–310. https://doi.org/10.1016/j.jda.2009.07.002
BORIA N, MONNOT J, PASCHOS VT. 2012a. Reoptimization of Maximum Weight Induced Hereditary Subgraph Problems. Theoretical Computer Science p. 1–12.
BORIA N, MONNOT J, PASCHOS VT, BILÒ D. 2012b. Reoptimization of the Maximum Weighted Pk-Free Subgraph Problem under Vertex Insertion. In: International Workshop on Algorithms and Computation (WALCOM 2012). Rahman MS, Nakano S eds. Springer-Verlag Berlin Heidelberg. p. 76–87. ISSN 0302-9743. doi: 10.1007/978-3-540-93980-1.
BOUCHER C, LO C, LOKSHANTOV D. 2015. Consensus Patterns (Probably) Has No EPTAS. In: Algorithms – ESA 2015: Lecture Notes in Computer Science, Vol. 9294. Bansal N, Finocchi I eds. Springer-Verlag Berlin Heidelberg.
CATTANEO G, POMPEO F, PETRILLO UF, ITALIANO G. 2010. Maintaining Dynamic Minimum Spanning Trees: An Experimental Study. Discrete Applied Mathematics 158(5): 404–425.
CLEMENTE JB, ABOROT JA, ADORNA HN. 2014. Reoptimization of Motif Finding Problem. In: Proc. of the International Multi-Conference of Engineers and Computer Scientists I: 106–111. ISBN 9789881925251.
CLEMENTE JB, ABOROT JA, ADORNA HN. 2016. On Self-reducibility and Reoptimization of the Closest Substring Problem. Philippine Computing Journal 10(2): 1–7.
GAREY MR, JOHNSON DS. 1979. Computers and Intractability: A Guide to the Theory of NP-Completeness. New York: W.H. Freeman & Co. ISBN 0716710447.
GOYAL K, MÖMKE T. 2015. Robust Reoptimization of Steiner Trees. In: Proceedings of the 5th IARCS Annual Conf. Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2015). Harsha P, Ramalingam G eds. p. 10–24.
HROMKOVIČ J. 2009. Algorithmic Adventures: From Knowledge to Magic, 1st edition. Springer Publishing Company Incorporated. ISBN 3540859853, 9783540859857.
LI M, MA B, WANG L. 1999. Finding Similar Regions in Many Strings. Proceedings of the thirty-first annual ACM symposium on theory of computing 65(1): 473–482. ISSN 00220000. doi: 10.1145/301250.301376.
LI M, MA B, WANG L. 2002. On the Closest String and Substring Problems. Journal of the ACM 49(2):157–171. ISSN 00045411. doi: 10.1145/506147.506150.
NARDELLI E, PROIETTI G, WIDMAYER P. 2003. Swapping a Failing Edge of a Single Source Shortest Paths Tree Is Good and Fast. Algorithmica p. 56–74.
RIBEIRO C, TOSO R. 2007. Experimental Analysis of Algorithms for Updating Minimum Spanning Trees on Graphs Subject to Changes on Edge Weights. International Workshop on Experimental and Efficient Algorithms (WEA 2007, LCNS 4525). Demetrescu C ed. Springer-Verlag Berlin Heidelberg. p. 339–405.
SCHÄFFTER MW. 1997. Scheduling with Forbidden Sets. Discrete Applied Mathematics 72(1): 155–166. ISSN 0166-218X. doi: http://dx.doi.org/10.1016/S0166-218X(96)00042-X.
SECOMANDI N, MARGOT F. 2009. Reoptimization Approaches for the Vehicle-Routing Problem with Stochastic Demands. Operations Research 57(1): 214–230. ISSN 0030-364X.
SHACHNAI H, TAMIR G, TAMIR T. 2012. A Theory and Algorithms for Combinatorial Reoptimization. Lecture Notes in Computer Science 7256(1574): 618–630.
THORUP M. 2000. Dynamic Graph Algorithms with Applications. In: Proc. of the 7th Scandinavian Workshop on Algorithm Theory. Springer-Verlag Berlin Heidelberg. p. 1–9.
ZYCH A. 2012. Reoptimization of NP-hard Problems [Ph.D. thesis]. ETH Zürich.