On designing DNA codes and their applications (Record no. 30240)

000 -LEADER
fixed length control field nam a22 4500
008 - FIXED-LENGTH DATA ELEMENTS--GENERAL INFORMATION
fixed length control field 210204b xxu||||| |||| 00| 0 eng d
082 ## - DEWEY DECIMAL CLASSIFICATION NUMBER
Classification number 621.391
Item number LIM
100 ## - MAIN ENTRY--PERSONAL NAME
Personal name Limbachiya, Dixita
245 ## - TITLE STATEMENT
Title On designing DNA codes and their applications
260 ## - PUBLICATION, DISTRIBUTION, ETC. (IMPRINT)
Place of publication, distribution, etc Gandhinagar
Name of publisher, distributor, etc Dhirubhai Ambani Institute of Information and Communication Technology
Date of publication, distribution, etc 2019
300 ## - PHYSICAL DESCRIPTION
Extent xvii, 117 p.
500 ## - GENERAL NOTE
General note Gupta, Manish K., Thesis supervisor
Student ID No. 201221014
Thesis (Ph.D.) -Dhirubhai Ambani Institute of Information and Communication Technology, Gandhinagar, 2019
520 ## - SUMMARY, ETC.
Summary, etc Bio-computing uses the complexes of biomolecules such as DNA (Deoxyribonucleic acid), RNA (Ribonucleic acid) and proteins to perform the computational processes for encoding and processing the data. In 1994, L. Adleman introduced the field of DNA computing by solving an instance of the Hamiltonian path problem using the bunch of DNA sequences and biotechnology lab methods. An idea of DNA hybridization was used to perform this experiment. DNA hybridization is a backbone for any computation using the DNA sequences. However, it is also cause of errors. To use the DNA for computing, a specific set of the DNA sequences (DNA codes) which satisfies particular properties (DNA codes constraints) that avoid cross-hybridization are designed to perform a particular task. Contributions of this dissertation can be broadly divided into two parts as 1) Designing the DNA codes by using algebraic coding theory. 2) Codes for DNA data storage systems to encode the data in the DNA. The main research objective in designing the DNA codes over the quaternary alphabets {A, C, G, T}, is to find the largest possible set of M codewords each of length n such that they are at least at the distance d and satisfies the desired constraints which are feasible with respect to practical implementation. In the literature, various computational and theoretical approaches have been used to design a set of DNA codes which are sufficiently dissimilar. Furthermore, DNA codes are constructed using coding theoretic approaches using fields and rings. In this dissertation, one such approach is used to generate the DNA codes from the ring R = Z4 + wZ4, where w2 = 2 + 2w. Some of the algebraic properties of the ring R are explored. In order to define an isometry from the elements of the ring R to DNA, a new distance called Gau distance is defined. The Gau distance motivates the distance preserving map called Gau map f. Linear and closure properties of the Gau map are obtained. General conditions on the generator matrix over the ring R to satisfy reverse and reverse complement constraints on the DNA code are derived. Using this map, several new classes of the DNA codes which satisfies the Hamming distance, reverse and reverse complement constraints are given. The families of the DNA codes via the Simplex type codes, first order and rth order Reed-Muller type codes and Octa type codes are developed. Some of the general results on the generator matrix to satisfy the reverse and reverse complement constraints are given. Some of the constructed DNA codes are optimal with respect to the bounds on M, the size of the code. These DNA codes can be used for a myriad of applications, one of which is data storage. DNA is stable, robust and reliable. Theoretically, it is estimated that one gram of DNA can store 455 EB (1 Exabyte = 1018 bytes). These properties make the DNA a potential candidate for data storage. However, there are various practical constraints for the DNA data storage system. In this work, we construct DNA codes with some of the DNA constraints to design efficient codes to store data in DNA. One of the practical constraints in designing DNA codes for storage is the repeated bases (runlengths) of the same DNA nucleotides. Hence, it is essential that each DNA codeword should avoid long runlengths. In this thesis, codes are proposed for data storage that will dis-allow runlengths of any base to develop DNA data storage error-free codes. A fixed GC-weight u (the occurrence of G and C nucleotides in a DNA codeword) is another requirement for DNA codewords used in DNA storage. DNA codewords with large GC-weight lead to insertion and deletion (indel) errors in DNA reading and amplification process thus, it is crucial to consider a fixed GCweight for DNA code. In this work, we propose methods that generate families of codes for the DNA data storage systems that satisfy no-runlength and fixed GC-weight constraints for the DNA codewords used for data storage. The first is the constrained codes which use the quaternary code and the second is DNA Golay subcodes that use the ternary encoding. The constrained quaternary coding is presented to generate DNA codes for the data storage. We give a construction algorithm for finding families of DNA codes with the no-runlength and fixed GC-weight constraints. The number of DNA codewords of fixed GC-weight with the no-runlength constraint is enumerated. We note that the prior work only gave bounds on the number of such codewords while in this work we count the number of these DNA codewords exactly. We observe that the bound mentioned in the previous work does not take into account the distance of the code which is essential for data reliability. Thus, we consider distance to obtain a lower bound on the number of codewords along with the fixed GC-weight and no-runlength constraints. In the second method, we demonstrate the Golay subcode method to encode the data in a variable chunk architecture of the DNA using ternary encoding. N.Goldman et al. introduced the first proof of concept of the DNA data storage in 2013 by encoding the data without using error correction in the DNA which motivated us to implement this method. While implementing this method, a bottleneck of this approach was identified which limited the amount of data that can be encoded due to fix length chunk architecture used for data encoding. In this work, we propose a modified scheme using a non-linear family of ternary codes based on the Golay subcode that includes flexible length chunk architecture for data encoding in DNA. By using the Golay ternary subcode, two substitution errors can be corrected. In a nutshell, the significant contributions of this thesis are designing DNA codes with specific constraints. First, DNA codes from the ring using algebraic coding by defining a new type of distance (Gau distance) and map (Gau map) are proposed. These DNA codes satisfy reverse, reverse complement and complement with the minimum Hamming distance constraints. Several families of these DNA codes and their properties are studied. Second, DNA codes using constrained coding and Golay subcode method are developed that satisfy norunlength and GC-weight constraints for a DNA data storage system.
650 ## - SUBJECT ADDED ENTRY--TOPICAL TERM
Topical term or geographic name as entry element Computer network architectures
Topical term or geographic name as entry element Logic design
Topical term or geographic name as entry element Computer vision
Topical term or geographic name as entry element Artificial intelligence
Topical term or geographic name as entry element Coding theory and cryptology
Topical term or geographic name as entry element Computer programming
Topical term or geographic name as entry element Software development
Topical term or geographic name as entry element Image processing and Computer vision
Topical term or geographic name as entry element Computer graphics
Topical term or geographic name as entry element Molecular computers
700 ## - ADDED ENTRY--PERSONAL NAME
Personal name Gupta, Manish K.
856 ## - ELECTRONIC LOCATION AND ACCESS
Uniform Resource Identifier http://drsr.daiict.ac.in/handle/123456789/886
942 ## - ADDED ENTRY ELEMENTS (KOHA)
Koha item type Thesis and Dissertations
Holdings
Withdrawn status Lost status Source of classification or shelving scheme Damaged status Not for loan Permanent Location Current Location Date acquired Full call number Barcode Date last seen Koha item type
          DAIICT DAIICT 2019-03-27 621.391 LIM T00759 2021-02-04 Thesis and Dissertations

Powered by Koha