• The coding part of human DNA supposedly accounts for less than 10% (perhaps less than 5%) of the entire human genome.
• The HGP strategy, i.e. sequencing everything, could be considered a lavish use of resources, since most of the information lacks relevance.
• The strategy of Venter and coworkers is to focus the investigation on messenger ribonucleic acid (mRNA) instead of DNA. The point of using mRNA is that it does not include any noncoding DNA. The mRNA molecule can be isolated and used as a template to synthesize a complementary DNA (cDNA) strand, which can then be used to locate the corresponding genes on a chromosome map.
• Venter's method yields an incomplete copy of the gene, called an expressed sequence tag (EST). ESTs are useful for localizing and orienting the mapping and sequence data reported from many different laboratories, and they serve as landmarks on the developing physical map of the human genome.

Whole Genome Shotgun Sequencing
• The success of Venter and coworkers in sequencing the bacterium H. influenzae in 1995 introduced a method called whole genome shotgun sequencing. The shotgun method involves randomly sequencing tiny cloned sections of the genome, with no foreknowledge of where on a chromosome each section originally came from.
• The partial sequences obtained are then reassembled into a complete sequence by computer. The advantage of this method is that it eliminates the need for time-consuming mapping.
• By competing (and cooperating), the government-financed Human Genome Project (HGP) and the private biotechnology company Celera have completed a reference DNA sequence of the human genome. Both parties made their information available simultaneously in February 2001, publishing it on the Internet and in the scientific journals Nature and Science.
• After the genome is shotgunned, the 10 million fragments of the genomic jigsaw puzzle need to be reassembled into readable base pairs in the proper order.
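The shotgun idea in the bullets above (random fragments, then reassembly by computer) can be illustrated with a toy sketch. It is not the Celera Assembler: it uses a simple greedy merge of the longest suffix-to-prefix overlaps on a short synthetic sequence. The function names, genome length, read length, coverage and minimum overlap are all assumptions chosen for illustration.

```python
import random


def shotgun_reads(genome, read_len, coverage, rng):
    """Sample reads at random start positions; their origin is not recorded."""
    n_reads = coverage * len(genome) // read_len
    starts = [rng.randrange(len(genome) - read_len + 1) for _ in range(n_reads)]
    return [genome[s:s + read_len] for s in starts]


def overlap(a, b, min_len):
    """Longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def drop_contained(seqs):
    """Remove duplicates and any sequence lying entirely inside a longer one."""
    kept = []
    for s in sorted(set(seqs), key=lambda x: (-len(x), x)):
        if not any(s in longer for longer in kept):
            kept.append(s)
    return kept


def greedy_assemble(reads, min_overlap):
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = drop_contained(reads)
    while True:
        best_len, best_pair = 0, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_overlap)
                    if k > best_len:
                        best_len, best_pair = k, (i, j)
        if best_pair is None:
            return contigs  # no remaining overlap of at least min_overlap
        i, j = best_pair
        merged = contigs[i] + contigs[j][best_len:]
        rest = [c for n, c in enumerate(contigs) if n not in (i, j)]
        contigs = drop_contained(rest + [merged])


rng = random.Random(0)  # fixed seed so the toy run is repeatable
genome = "".join(rng.choice("ACGT") for _ in range(120))  # synthetic, repeat-poor
reads = shotgun_reads(genome, read_len=25, coverage=8, rng=rng)
contigs = greedy_assemble(reads, min_overlap=12)
print(len(reads), "reads ->", len(contigs), "contig(s)")
```

On a random sequence like this one, spurious overlaps of 12 or more bases are very unlikely, so the greedy merge is adequate. Real genomes contain repeats, which is why a production assembler needs additional steps to tell true overlaps from repeat-induced ones.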
• Sequencing is carried out at 10x redundancy to eliminate errors and reduce the chance of missing any targeted region.
• The Celera Assembler is one of Celera's core competencies and makes this Herculean task possible.
• In the first pass through the data, the shotgunned fragments are compared against each other, and equivalent sequences longer than 40 base pairs are identified.
• Matches of 40 base pairs are statistically extremely unlikely to occur by chance. Each match is then classified as true or repeat-induced. True matches are overlapping sections and are the desired fragments; repeat-induced matches occur in multiple locations of the genome and do not belong together.
• The assembler then searches for overlapping fragments that have a common sequence and are not contested elsewhere in the dataset.
• The uncontested data is assembled into unitigs containing approximately 30 fragments.
• These assembled unitigs are 99% accurate, and repeats are filtered out using the Discriminator algorithm.
• Unitigs passing this filter are renamed U-unitigs and are ready for ordering.
• The scaffolding stage then begins: the order is found by looking at the mate pairs and organizing these into contigs. By repeatedly examining these contigs and their orientation, the scaffold becomes complete except for some sequencing gaps.
• This strategy is repeated until the gaps are filled, using the Discriminator algorithm and a method based on sequence "rocks" and "pebbles".
• As the HGP had been making its incremental sequence public, the shotgun approach used this data to help eliminate errors and speed up the scaffolding process.

[Figure: Sequence gaps. Source: Brown, Genomes 2]

Advances
• The following advances in robotics and automation, combined with the microbiological advances, reduced the labor by 80%:
– Development of the PerkinElmer (ABI PRISM 3700) gene sequencer
– 1000 samples per day
– 15 minutes instead of 8 hours for the first automated sequencers
– A parallel system of 300 sequencers ($300,000 each)
– Use of supercomputers to assemble fragments
• Development of process support instrumentation to process 100 K template preps and 200 K sequence reactions per day.
• 24-hour-per-day unattended operation of sequencers

[Figure: Map of Chromosome 16]

• In addition to the above advances, the field of computational biology (bioinformatics) became increasingly important, as the software and processors required to assemble a puzzle of this size still needed to be developed.
• The solution came from advances in processor speed, which has doubled every 18 months, and from the development of better overlap detection algorithms. It is expected that, using 100 networked workstations, the entire genome could be assembled in 30 to 60 computational days.

Costs of Human Genomic Sequencing
• Clone by clone
– $ per finished base
– $130 million per year for 7 years
– Total $900 million spent by end of 2023
• Shotgun
– $ per raw base
– $130 million for 3 years would provide 10x coverage/redundancy, plus an additional $90 million for informatics

Human Genome Projects Progress

Algorithms
• Human Genome Project: GigAssembler
• Celera: Celera Assembler
• Euler Algorithm

GigAssembler Assembly Process Overview:
• The assembly proceeds according to the following major steps:
• Decontaminating and repeat masking the sequence.
• Alignment of mRNA, EST, BAC end, and paired plasmid reads against genomic fragments. On a cluster of one hundred 800 MHz Pentium III CPUs running Linux this takes about three days.
• Creating an input directory structure using the Washington University map and other data. This step takes about an hour on