
Paulieme

01/29/14 6:08 PM

#304 RE: Paulieme #301

Tuesday, January 28, 2014 (PacBio Blog) At Plant & Animal Genome Workshop, Users Showcase Projects Enabled by SMRT Sequencing
Earlier this month, we hosted a workshop at the International Plant & Animal Genome (PAG) conference in San Diego entitled “A SMRT® Sequencing Approach to Reference Genomes, Annotation, and Haplotyping.” PacBio users presented data on various projects that have benefited from long-read sequence data, including several that had previously been attempted with short-read technologies without success. We were delighted to see reports on newer features of SMRT Sequencing, including full-length isoforms, automated haplotyping, and more. Here’s a recap, as well as links to video recordings of the presentations:

Chongyuan Luo, a scientist from Joe Ecker’s lab at the Salk Institute for Biological Studies, offered a presentation on genomic and epigenetic variations across model organism Arabidopsis thaliana. He used SMRT Sequencing to resolve three strains of the plant, sequencing each to more than 50x coverage. Compared to short-read sequence data, PacBio® data correctly identified more than 200,000 SNPs previously missed in each strain; most were enriched in the peri-centromere region. Because of that, Luo recommends using only PacBio data for a genome assembly. His team also achieved their goal of detecting structural variants that have been underrepresented by genome assemblies from short-read data.
Watch recording: Resolving the Complexity of Genomic and Epigenomic Variations in Arabidopsis

Shane Brubaker, bioinformatics director at Solazyme, Inc., talked about the need for a high-quality reference genome for a strain of algae that his company uses to produce renewable oil. The company first tried short-read sequence data, but couldn’t get through the GC-rich genome. Using PacBio sequencing, the team not only fully sequenced the genome — assembling it into just a few contigs per chromosome that even included centromere sequence — but also built a tool to perform automated haplotyping and later conducted allele-specific expression analysis. The final assembly accurately represented the diploid genome, Brubaker said, noting that CCS reads alone exceeded Sanger quality at far lower cost. “You can now get a reference assembly that is essentially finished quality without doing all those gap-closing steps,” he said. Watch recording: Assembly, Haplotyping, and Annotation of a High GC Algal Genome

Allen Van Deynze, director of research at the University of California, Davis, Seed Biotechnology Center, spoke about a spinach genome sequencing project. The plant is important in its own right, but sequencing became more urgent in an effort to find genes that confer resistance against a downy mildew that is destructive to the crop. Van Deynze reported a draft genome sequence using SMRT Sequencing (Quiver polishing was still underway at the time of the workshop) that already showed a marked improvement in N50 contig length compared to a previous short-read assembly of the genome. Watch recording: A De Novo Draft Assembly of Spinach Using Pacific Biosciences Technology

From USDA’s Agricultural Research Service, molecular biologist Sean Gordon discussed the need for long-read sequencing to map an organism’s transcriptome. His team analyzed the wood-decaying fungus Plicaturopsis crispa first with short reads and found that they were missing exons and other important information. “There is no path from short reads to accurate isoforms,” he said. They switched to SMRT Sequencing so they could observe, rather than infer, full-length transcripts. Gordon showed one particular gene to illustrate the success of the approach: with short-read sequencing, this gene was predicted to have six isoforms; with PacBio, the team observed and confirmed 118 isoforms instead. He also noted that generating a transcriptome from PacBio data does not require a reference genome. His team did have a reference for P. crispa, however, which they used to double-check the PacBio results and found them to be highly accurate. Gordon said that the long reads also enabled unexpected findings, such as abundant read-through transcription, in which multiple ORFs occurred in a transcript. (The recording is not available at this time.)

Finally, our own Edwin Hauw spoke about the PacBio technology roadmap (link: http://blog.pacificbiosciences.com/2014/01/looking-ahead-2014-pacbio-technology.html) for the coming year. Sample prep improvements are expected to reduce input DNA requirements (down to 10-100 ng), improve preps for longer insert sizes, and streamline kits. A new C4 chemistry is expected to extend average read lengths to 10-15 Kb this year, with the long-term goal of generating about 1.6 Gb per SMRT Cell. PacBio is also planning to focus on data analysis improvements, including an easy-to-use GUI for isoform sequencing and tools for viral minor-variant detection and long-amplicon haplotype analysis. In addition, Hauw told users that PacBio is working to provide better assemblers for diploid de novo genomes or low-coverage genomes, as well as a faster version of Quiver and regional methylation detection, including 5mC without bisulfite conversion, with an expected release date later in the year. Watch recording: SMRT Sequencing Road Map

http://blog.pacificbiosciences.com/2014/01/at-plant-animal-genome-workshop-users.html

Paulieme

01/31/14 6:04 PM

#311 RE: Paulieme #301

Whole story from incomplete link on 1/28/14:
PacBio Demos First De Novo Animal Genome as it Plans Longer Reads, Increased Throughput
January 28, 2014
By Monica Heger
Researchers have sequenced and de novo assembled the Drosophila melanogaster genome on Pacific
Biosciences' RS II — the first time an animal genome has been sequenced and assembled solely with
PacBio technology — and have produced a genome with fewer gaps and longer contigs than the current
reference.
Sergey Koren, a bioinformaticist at the National Biodefense Analysis and Countermeasures Center and
University of Maryland, developed software for error correction of PacBio reads dubbed PBcR, and
presented on the Drosophila assembly at the International Plant and Animal Genome meeting in San
Diego earlier this month.
Additionally, the company is planning this year to increase its throughput four-fold to achieve 1 gigabase
of data per SMRT cell and average read lengths greater than 10-15 kilobases, as well as improvements to
sample prep and new methods for assembly of diploid genomes.
The Drosophila genome, estimated to be around 140 megabases, but potentially as large as 220
megabases, was sequenced in six days using 42 SMRT cells to 90-fold coverage and produced average
read lengths of 10 kilobases. Using the Celera assembler, the researchers constructed a haploid assembly
in 128 contigs with an N50 length of 15 megabases and a maximum contig length of 24.6 megabases.
Total turnaround time from sample to final assembly was six weeks.
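The figures in this paragraph allow a rough sanity check on per-cell yield. The sketch below is only a back-of-the-envelope calculation: it assumes the 90-fold coverage refers to the lower 140-megabase genome-size estimate (the article does not say which estimate was used), and the function names are made up for illustration.

```python
# Back-of-the-envelope yield check using only figures quoted in the article.
# Assumption: the 90-fold coverage is relative to the 140 Mb genome estimate.
GENOME_SIZE_BP = 140e6   # lower genome-size estimate, in bases
COVERAGE = 90            # fold coverage reported for the run
SMRT_CELLS = 42          # number of SMRT cells used


def total_bases(genome_size_bp: float, coverage: float) -> float:
    """Total sequenced bases implied by a genome size and a fold coverage."""
    return genome_size_bp * coverage


def yield_per_cell(genome_size_bp: float, coverage: float, cells: int) -> float:
    """Average bases per SMRT cell implied by the run totals."""
    return total_bases(genome_size_bp, coverage) / cells


per_cell = yield_per_cell(GENOME_SIZE_BP, COVERAGE, SMRT_CELLS)
print(f"Total: {total_bases(GENOME_SIZE_BP, COVERAGE) / 1e9:.1f} Gb")
print(f"Per SMRT cell: {per_cell / 1e6:.0f} Mb")
```

Under that assumption the run works out to roughly 300 megabases per SMRT cell, which is in the same range as the roughly 250 megabases implied by the planned four-fold increase to 1 gigabase per cell.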
PacBio scientists collaborated on the project with researchers involved in the
Berkeley Drosophila Genome Project, and researchers from the University of Maryland and the
University of Manchester.
According to Sue Celniker, co-director of the Berkeley Drosophila Genome Project, the PacBio-only
assembly is a huge improvement over the reference genome, which is currently in its fifth iteration.
Researchers involved in the Berkeley Drosophila Genome Project have spent over 10 years working on
the reference genome using a combination of Sanger sequencing, BAC clones, and other manual and
labor-intensive approaches. Yet, using just one next-gen sequencing technology, and over just six weeks,
the PacBio technology was able to piece together regions that have proved particularly troublesome, like
heterochromatin and the Y chromosome, she said.
"There's been some persistent repeats that we couldn't get through, that [PacBio] did," she told In
Sequence. "Having those very long reads allows you to get through large arrays of repeats."
Researchers are still evaluating and comparing the PacBio assembly to the reference, so Celniker said she could not precisely say how many of the remaining gaps the PacBio assembly was able to close. However, it is already clear that in some cases the long reads were able to generate a more contiguous
sequence than the reference. For instance, chromosome 2R was reduced to two pieces in the PacBio
haploid assembly from 27 pieces in the reference. Chromosome 2L was reduced to between 4 and 6
pieces from 6 pieces, and chromosomes 3L and 3R were reduced to 1 and 3 pieces in the PacBio
assembly from 22 and 15 pieces, respectively.
Additionally, in the most recent release of the Drosophila reference genome, only around 1 percent of
chromosome Y is represented. While the BDGP researchers have since assembled around 7.5 percent of
the Y chromosome, the team anticipates that more than half of the Y chromosome will be assembled with
the PacBio data.
Part of the reason for less Y representation in the reference genome is that the fly DNA was taken from
embryos, so there is no way to know whether male or female DNA was being used, Casey Bergman, a
senior lecturer in computational and evolutionary biology at the University of Manchester, told IS. But in
the PacBio collaboration, only male flies were used, he said.
Bergman's lab became involved with the project last summer after it released a dataset of whole-genome
shotgun sequences of the Drosophila reference strain generated with PacBio technology, as well as
Illumina sequences that it used to error-correct the PacBio reads. The company contacted Bergman to
collaborate on generating data and doing de novo assembly using its newer sequencing chemistry.
Bergman said that this Drosophila genome validates PacBio's technology for use in de novo assembly, and
shows the value of long reads. Genomes that have been assembled using short-read sequencing
technology, like the panda genome, are put together in contigs that are tens of kilobases, he said. But,
the Drosophila has an N50 of 12 megabases. "That is chromosome-sized segments. It is what was
declared finished for many genomes 10 years ago, and is of much higher contiguity and sequence
quality," he said.
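For readers unfamiliar with the N50 metric quoted here, the sketch below shows the usual definition: the contig length at which contigs of that size or larger account for at least half of the assembled bases. It is a minimal illustration, not the Celera assembler's own code, and the contig lengths are made-up toy values rather than data from the Drosophila assembly.

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths.

    N50 is the length L such that contigs of length >= L together
    contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)  # longest contig first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy contig lengths (hypothetical, arbitrary units).
toy_contigs = [10, 8, 6, 4, 2]
print(n50(toy_contigs))
```

In the toy case the total is 30, so the two longest contigs (10 + 8 = 18) are the first to reach half of it, and the N50 is 8. Fewer, longer contigs push the value up, which is why the metric is used as a shorthand for contiguity.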
Short-read sequencing technology is valuable for applications like identifying genes or fragments of
genes, and enables many genomes to be sequenced cost-effectively, but it doesn't give you the long-range
architecture, Bergman said.
The PacBio-only assembly also has some advantages over the hybrid PacBio/Illumina assembly,
Bergman said.
One problem with error correction, he said, is that Illumina technology does not sequence well through
repetitive regions, so the Illumina-corrected reads in those repetitive regions are not as good. "You don't
really get the gain in the regions of the genome where you need them for the long-range assemblies," he
said.
Adam Phillippy with the National Biodefense Analysis and Countermeasures Center, who worked on the
assembly, agreed. In theory, a hybrid assembly approach is beneficial because it combines two orthogonal
technologies and can take advantage of the strengths of both, he said. And indeed, in many genomic regions, a hybrid assembly works well. But, since short reads do not align well to certain regions, like
repeats, it is difficult to use short reads for error correction in those regions.
"Short reads are notoriously hard to map against a repetitive genome," Phillippy said. "It's much easier to
align long reads to long reads, so you assemble the repeats much more effectively."
Phillippy and Koren last year published a study in Genome Biology, estimating a cost of about $1,000
for de novo sequencing and assembly of microbes with PacBio technology. Additionally, the researchers
compared self-correction to hybrid correction and found that self-correction was often better in terms of
accuracy and contiguity.
Phillippy said that he expects these conclusions for microbial genomes to carry over to larger genomes,
especially as throughput and read lengths continue to increase, and the Drosophila genome is the first
evidence of that.
Further improvements
Looking ahead, Jonas Korlach, PacBio's CSO, said that the company is planning further improvements to
its read lengths and throughput this year.
The company plans to increase throughput to 1 gigabase per SMRT cell and average read lengths to
greater than 10-15 kilobases. An increase in read length will be achieved by several factors, Korlach said.
The company continues to study different polymerases and is working out ways to optimize the signal
from the nucleotide.
For instance, in its latest sequencing chemistry, P5-C3, the company incorporated a protective scaffolding
strategy, which reduces photo damage to the polymerase and enables longer reads. Korlach said that the
company continues to improve upon this strategy. Additionally, the company has found that "nicks or
damage to the DNA template can stall the polymerase and thereby reduce read length," so researchers are
looking at ways to do more "efficient DNA damage repair during sample preparation."
Korlach added that the company is also looking at ways to improve loading efficiency, which would also
increase throughput. Each SMRT cell contains 150,000 zero-mode waveguides, each of which has the
potential to be occupied by a polymerase and template complex. However, the current method of loading
is limited by Poisson statistics, meaning that only about one-third of the ZMWs will be occupied with one
polymerase-template complex with the remainder occupied by either none or more than one complex,
Korlach said. "However, we believe through improvements in loading, we can at least double the amount
that we are currently loading per SMRT cell."
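The "about one-third" figure follows from the Poisson distribution. If complexes land in wells independently with a mean of lambda per ZMW, the fraction of wells holding exactly one complex is lambda * exp(-lambda), which peaks at lambda = 1 with a value of 1/e, or about 37 percent. The sketch below works this through. It assumes an idealized Poisson loading model, and only the 150,000 ZMW count comes from the article.

```python
import math

ZMWS_PER_CELL = 150_000  # figure quoted in the article


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k complexes in a well at mean loading lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def occupancy(lam: float):
    """Return (empty, single, multiple) well fractions at mean loading lam."""
    empty = poisson_pmf(0, lam)
    single = poisson_pmf(1, lam)
    multiple = 1.0 - empty - single
    return empty, single, multiple


# Single occupancy is highest at a mean of one complex per well.
for lam in (0.5, 1.0, 2.0):
    empty, single, multiple = occupancy(lam)
    print(f"lambda={lam}: empty={empty:.3f} single={single:.3f} "
          f"multiple={multiple:.3f} "
          f"usable ZMWs ~ {single * ZMWS_PER_CELL:,.0f}")
```

Under this model no choice of loading concentration gets past roughly 55,000 singly occupied wells per cell, so doubling the usable fraction, as Korlach describes, would require a loading method that is not purely random.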
These improvements, which will be delivered over the year, will come in the form of software upgrades
and a new sequencing kit, Korlach said. None will require a hardware upgrade or installation.
Monica Heger tracks trends in next-generation sequencing for research and clinical applications for
GenomeWeb's In Sequence and Clinical Sequencing News. E-mail Monica Heger or follow her GenomeWeb
Twitter accounts at @InSequence and @ClinSeqNews. -- http://files.pacb.com/pdf/MediaCoverage_Demos_FirstDeNovoAnimalGenome.pdf