The link to my team's video presentation is here
The modern DNA revolution was brought about in 2001 by the completion of the human reference genome, which consisted of piecing together small fragments of DNA sequence into 24 full chromosomes and cost about $1 billion. Since 2001, computer scientists have made considerable headway on new algorithms for assembling genomes, and we will need to continue to improve our methods as we strive to complete reference genomes for all extant and extinct (i.e., Neanderthals) organisms.
In this project, my team designed and programmed a computing algorithm (using Python3) to sequence the target genome and produce an assembly via contigs and scaffolds based on paired-read data of DNA sequence. We also illustrated the algorithm and next steps for further improvement in a video presentation.
Genome sequencing is the process of decoding the sequence of nucleotides from some sample’s genome. Genome sequencing produces sequences or reads. Due to technical limitations, these reads are short. The most widely used sequencing platform produces reads that are 200 nucleotides long. Newer technologies are producing longer reads, but they are currently expensive and error-prone.
Genome assembly is the process of taking a large number of sequences and putting them back together to create a representation of the original genome.