The U.S. Department of Energy Joint Genome Institute (DOE JGI) is among the world leaders in sequencing the genomes of microbes, focusing on their potential applications in the fields of bioenergy and environment. As a national user facility, the DOE JGI is also focused on developing tools that more cost-effectively enable the assembly and analysis of the sequence that it, as well as other genome centers, generates.
Despite tremendous advances in cost reduction and throughput of DNA sequencing, significant challenges remain in the process of efficiently reconstructing genomes. Existing technologies are good at cranking out short fragments (reads) of DNA letters that are computationally stitched back together (assembled) into longer pieces, so that the order of those letters can be determined and the function of the target sequence discerned. However, genome assembly, the equivalent of trying to put together a multimillion-piece jigsaw puzzle without knowing what the picture on the cover of the box is, remains challenging because current approaches must work with a very large number of very small pieces.
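To make the idea of "stitching" reads together concrete, here is a minimal sketch of a greedy overlap-based assembler in Python. It is a toy illustration only, not HGAP or any production assembler: the function names `overlap` and `greedy_assemble`, the `min_len` threshold, and the example reads are all assumptions made for this sketch, and it assumes error-free reads taken from a single strand.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Return the length of the longest suffix of `a` that is also a prefix of `b`.

    Overlaps shorter than `min_len` are ignored and reported as 0.
    """
    max_len = min(len(a), len(b))
    for k in range(max_len, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two reads with the largest suffix/prefix overlap.

    Stops when no pair overlaps by at least `min_len` letters and returns
    the remaining pieces (contigs).
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            break  # nothing left to join
        # Join the pair, writing the shared letters only once.
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Three hypothetical overlapping reads from one short sequence.
reads = ["ATGGCGTA", "GCGTACGT", "ACGTTAGC"]
print(greedy_assemble(reads))
```

Even in this toy form, the jigsaw problem is visible: when the pieces are short, a repeated stretch of DNA longer than a read gives several equally good ways to join the pieces, which is one reason longer reads make assembly easier.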
As reported May 5 online in the journal Nature Methods, a collaboration between the DOE JGI, Pacific Biosciences (PacBio) and the University of Washington has resulted in an improved workflow for genome assembly that the team describes as “a fully automated process from DNA sample preparation to the determination of the finished genome.”
The technique, known as HGAP (Hierarchical Genome Assembly Process), uses PacBio’s single-molecule, real-time (SMRT) DNA sequencing platform, which generates reads that can be up to tens of thousands of nucleotides long. That is far longer than the reads of about 700 nucleotides produced by Sanger sequencing, the workhorse technology of the Human Genome Project era. The Sanger process involved creating multiple DNA libraries, conducting multiple runs, and combining the data, so that gaps in the sequence were covered and the accuracy of each DNA base assignment was very high. Post-Sanger methods still typically require multiple libraries and often a mix of technologies to produce optimal results. With HGAP, by contrast, “only a single, long-insert shotgun DNA library is prepared and subjected to automated continuous long-read SMRT sequencing, and the assembly is performed without the need for circular consensus sequencing,” the team reported.