LT.Swing trade!
Finishing PacBio assembly, and a presentation by our local PacBio representative
Posted on November 4, 2014 by Marc Robinson-Rechavi
In this first session of the course “Sequence a genome” since we sent DNA to the sequencing facility, we have had the visit of Gerrit Kuhn, who supports PacBio in our area, and who gave a nice presentation of how PacBio works, the workflow, and of the specific requirements during sample preparation and how it influences data quality. While the sleek animations are expected from a corporate presentation, kudos to Gerrit for being open and giving insight into the advantages and specificities of the PacBio workflow, and answering all questions straightforwardly. (Updated paragraph.)
( A new adventure: PacBio sequencing and RNA-seq in the classroom)
Gerrit Kuhn of PacBio Switzerland talking some more to our students.
After our first pass of PacBio, we have 9 contigs, with the longest at 5’815’706 bp. We have asked the students to use Mauve to look at our assembly, and compare it to a version of the Pseudomonas veronii genome in NCBI, which has 63 contigs.
Screenshot of Mauve comparison of our contigs (top) with the NCBI genome (bottom). Red lines separate contigs, colored blocks are recognized as similar between the two genomes.
We have two very large contigs, unitig-3 and -5, with strong similarity to the NCBI genome, and some other contigs with almost none. Notice the spaghetti of relations between contig blocks, due to the fragmented assembly in the NCBI genome. We also compared our contigs between themselves. We are thus able to eliminate 3 contigs which are small and entirely redundant with larger contigs within our assembly. We can also find that the two largest contigs have enough overlap to join them, which provides us a main chromosome of 6.8 Mb. We are not able to circularize it, though. Two other groups of contigs can be joined into additional molecules, plasmids or secondary chromosomes, to be determined at annotation (in a few weeks). A group of 3 contigs groups into an apparent mega-plasmid, which we can circularize thanks to similarity at the ends of the contigs; this also has high similarity to NCBI contigs. The other potential plasmid, formed of two contigs which cannot circularize, has no similarity to the NCBI sequence, and ends with potential transposons (sequences found elsewhere in the genome).
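For readers who want to see what "enough overlap to join them" means in practice, here is a toy sketch. It assumes an exact suffix/prefix match between two contigs, which is a simplification: the course used Mauve alignments, and real contig ends rarely match base for base. The function names and the example sequences are made up for illustration.

```python
def longest_overlap(left, right, min_len=3):
    """Length of the longest suffix of `left` that exactly equals a prefix of `right`.

    Returns 0 if no overlap of at least `min_len` bases exists.
    """
    min_len = max(min_len, 1)  # an overlap of zero bases is not an overlap
    max_len = min(len(left), len(right))
    for k in range(max_len, min_len - 1, -1):
        if left[-k:] == right[:k]:
            return k
    return 0


def join_contigs(left, right, min_len=3):
    """Merge two contigs on their exact overlap, or return None if they do not overlap."""
    k = longest_overlap(left, right, min_len)
    if k == 0:
        return None
    return left + right[k:]


# Hypothetical toy contigs sharing the four bases "TGCA"
merged = join_contigs("ACGTTGCA", "TGCAGGTT")
```

Applying the same test to the two ends of a single contig is the idea behind checking whether a molecule can be circularized.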
Thus overall, the PacBio experiment seems to have worked: the DNA extracted by our students and sequenced in our facility has produced a usable assembly. In the next weeks we will annotate this chromosome and these two plasmids.
http://www3.unil.ch/wpmu/sequenceagenome/2014/11/04/finishing-pacbio-assembly-and-a-presentation-by-our-local-pacbio-representative/
US National Library of Medicine, National Institutes of Health
BMC Genomics. 2014 Oct 20;15:914. The genomic landscape of the verrucomicrobial methanotroph Methylacidiphilum fumariolicum SolV.
Anvar SY, Frank J, Pol A, Schmitz A, Kraaijeveld K, den Dunnen JT, Op den Camp HJ.
Author information: Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands. s.y.anvar@lumc.nl.
Abstract
BACKGROUND: Aerobic methanotrophs can grow in hostile volcanic environments and use methane as their sole source of energy. The discovery of three verrucomicrobial Methylacidiphilum strains has revealed diverse metabolic pathways used by these methanotrophs, including mechanisms through which methane is oxidized. The basis of a complete understanding of these processes and of how these bacteria evolved and are able to thrive in such extreme environments partially resides in the complete characterization of their genome and its architecture.
RESULTS: In this study, we present the complete genome sequence of Methylacidiphilum fumariolicum SolV, obtained using Pacific Biosciences single-molecule real-time (SMRT) sequencing technology. The genome assembles to a single 2.5 Mbp chromosome with an average GC content of 41.5%. The genome contains 2,741 annotated genes and 314 functional subsystems including all key metabolic pathways that are associated with Methylacidiphilum strains, including the CBB pathway for CO2 fixation. However, it does not encode the serine cycle and ribulose monophosphate pathways for carbon fixation. Phylogenetic analysis of the particulate methane mono-oxygenase operon separates the Methylacidiphilum strains from other verrucomicrobial methanotrophs. RNA-Seq analysis of cell cultures growing in three different conditions revealed the deregulation of two out of three pmoCAB operons. In addition, genes involved in nitrogen fixation were upregulated in cell cultures growing in nitrogen fixing conditions, indicating the presence of active nitrogenase. Characterization of the global methylation state of M. fumariolicum SolV revealed methylation of adenines and cytosines mainly in the coding regions of the genome. Methylation of adenines was predominantly associated with 5'-m6ACN4GT-3' and 5'-CCm6AN5CTC-3' methyltransferase recognition motifs whereas methylated cytosines were not associated with any specific motif.
CONCLUSIONS: Our findings provide novel insights into the global methylation state of verrucomicrobial methanotroph M. fumariolicum SolV. However, partial conservation of methyltransferases between M. fumariolicum SolV and M. infernorum V4 indicates potential differences in the global methylation state of Methylacidiphilum strains. Unravelling the M. fumariolicum SolV genome and its epigenetic regulation allow for robust characterization of biological processes that are involved in oxidizing methane. In turn, they offer a better understanding of the evolution, the underlying physiological and ecological properties of SolV and other Methylacidiphilum strains.
---http://www.ncbi.nlm.nih.gov/pubmed/25331649
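The two adenine recognition motifs quoted in the abstract (ACN4GT and CCAN5CTC, where N is any base) can be located in a sequence with a simple pattern search. The sketch below only finds where the motifs occur in a plain sequence string. It does not detect methylation, which comes from the SMRT kinetic data. The helper names and the test sequences are hypothetical.

```python
import re


def motif_to_regex(motif):
    """Turn a motif written with A, C, G, T and N into a regular expression."""
    return "".join("[ACGT]" if base == "N" else base for base in motif.upper())


def find_motif_sites(sequence, motif):
    """Return 0-based start positions of every (possibly overlapping) motif match."""
    # A lookahead lets overlapping occurrences be reported as well.
    pattern = re.compile("(?=" + motif_to_regex(motif) + ")")
    return [m.start() for m in pattern.finditer(sequence.upper())]


# Motifs from the abstract, with N4 / N5 written out
ACN4GT = "ACNNNNGT"
CCAN5CTC = "CCANNNNNCTC"

sites = find_motif_sites("TTACAAAAGTTT", ACN4GT)
```

Only the forward strand is searched here; a real analysis would also scan the reverse complement.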
Check this out! Reversible Positioning of Single Molecules inside Zero-Mode Waveguides
(Pacific Biosciences, Stephen W. Turner , Jonas Korlach) ----------Abstract
We have developed a hybrid nanopore/zero-mode waveguide device for single-molecule fluorescence and DNA sequencing applications. The device is a freestanding solid-state membrane with sub-5 nm nanopores that reversibly delivers individual biomolecules to the base of 70 nm diameter waveguides for interrogation. Rapid and reversible molecular loading is achieved by controlling the voltage across the device. Using this device we demonstrate protein and DNA loading with efficiency that is orders of magnitude higher than diffusion-based molecular loading.
[Available on 2015/9/11]
http://www.ncbi.nlm.nih.gov/m/pubmed/25209321/ http://pubs.acs.org/doi/abs/10.1021/nl503134x
Nanopore sequencing using charge blockade labels
(US Patent #8652779 B2) Publication number US8652779 B2
Publication type Grant
Application number US 13/893,891
Publication date Feb 18, 2014
Filing date May 14, 2013
Priority date Apr 9, 2010
Also published as US20130240359
Inventors Stephen Turner, Jeffrey Wegener
Original Assignee Pacific Biosciences Of California, Inc.
Abstract
The invention relates to devices and methods for nanopore sequencing. The invention includes compositions and methods of nucleic acid sequencing using a single polymerase enzyme complex comprising a polymerase enzyme and a template nucleic acid attached proximal to a nanopore, and nucleotide analogs in solution comprising charge blockade label that are attached to the polyphosphate portion of the nucleotide analog such that the charge blockade labels are cleaved when the nucleotide analog is incorporated into a growing nucleic acid and the charge blockade label is detected by the nanopore to determine the presence and identity of the incorporated nucleotide and thereby determine the sequence of a template nucleic acid.
(US20100331194 * Apr 9, 2010 Dec 30, 2010 Pacific Biosciences Of California, Inc. Nanopore sequencing devices and methods)! http://www.google.com/patents/US8652779
Wednesday, October 29, 2014--- ‘Revolutionizing HLA Typing': Uppsala’s Ulf Gyllensten on How Long Reads Give Access to New Areas of the Human Genome
In a recent interview with Theral Timpson — part of Mendelspod’s series on long-read sequencing — Ulf Gyllensten, a scientist at Uppsala University, spoke about using PacBio® technology for HLA typing, human genome studies, transcriptomics, and more.
Based in the medical genetics and genomics department, Gyllensten focuses on two areas: using systems biology to study biological variation in human physiology and studying the epidemiology of human papilloma virus and its genetic link to cervical cancer. He also works with the National Genomics Infrastructure, a national core facility in Sweden for genotyping and DNA sequencing, where he has access to all commercially available sequencing platforms.
In the podcast, Gyllensten spoke about advances in screening for HPV, his predictions for the widespread use of genome sequencing in the clinic, and applications using Single Molecule, Real-Time (SMRT®) Sequencing for human genome studies.
Unambiguous HLA typing
“PacBio is really revolutionizing HLA typing,” Gyllensten said, noting that long-read sequencing addresses the ongoing challenge of linking polymorphisms in distant parts of the HLA genes and distinguishing alleles. “I have been in that field for quite a while. … Finally, we have a technology that will resolve all the ambiguities in HLA typing, which will have a huge impact.”
Gyllensten said the major advantage of SMRT Sequencing for the HLA region is its ability to completely sequence all HLA genes (both class 1 and class 2), getting all the introns and exons for each in a single long read. He believes PacBio sequencing, with its rapid turnaround time, will ultimately become “the key technology” for matching donors and recipients in organ transplantation.
Asked by Timpson whether it’s really possible to achieve 100 percent accuracy for these complicated regions using SMRT Sequencing, Gyllensten replied that it was. “The fact that you can sequence a single allele — that is, a single chromosome by itself and then the other chromosome in the individual — and separate them down to the single base is really the most accurate way you can ever do HLA typing,” he said.
Applications in human genomics
Gyllensten told Timpson that his team expected the primary use of PacBio sequencing to be for smaller genomes, such as getting complete de novo assemblies for pathogens. While they do routinely handle those projects, he was surprised to find robust demand for using the sequencer to analyze larger genomes — including human — as well. “Before having the PacBio instrument installed and running we hadn’t thought about some of these things,” he said. “But it’s opening a lot of opportunities.”
He noted that clinical research, in particular, is a good fit for SMRT sequencing. “Focusing in on particular regions actually suits clinical genetics and clinical immunology because they don’t want the whole genome. They have their favorite genes, favorite targets,” Gyllensten said. “Those can then be accessed through the PacBio [system], and the information that is coming out is really information that could not come out of any other sequencing technology at this point.”
Researching treatment resistance and cancer biology in individuals with leukemia is one example of where the PacBio platform can make a difference. SMRT Sequencing can more accurately cover the fusion gene that is responsible for the nature of the leukemia and its development, Gyllensten said. In addition, he believes PacBio’s technology offers the potential for early detection of new mutations linked to treatment resistance. Reliable early detection could one day make a difference in clinicians’ ability to change a patient’s therapy at the earliest sign of resistance, he noted.
“It all has to do with the long read because you need to sequence maybe 2 or 3 kb around the particular breakpoint in this patient to figure out whether they have a resistance mutation or not,” Gyllensten said, “and there is no other technology that can do that.”
A view into genomic dark matter
Gyllensten told Timpson that as people begin to figure out how much important information is being missed in genome sequences, they will move to a platform that offers more complete views of biology.
Transcriptomics is one place where SMRT Sequencing makes a real difference. “Very few studies have been done on complete transcriptome data,” Gyllensten said. “I think when people start to see that, they will eventually move into … long-read [sequencing].”
A comprehensive view of the human genome will also motivate people to move away from short-read sequencing. At some point, he said, scientists will look at all the short-read data that has been amassed for human genome studies and “realize that a lot of the questions will still not have been answered. They will ask, is the answer hidden in that 15-20 percent of the genome we still haven’t covered with the present technology?” Gyllensten said. “Then there will be a rush to understand the remaining [portion] of the genome.”
Validating other technologies
According to Gyllensten, whose core facility still runs a number of Sanger sequencers, PacBio sequencing has been gaining ground as the preferred technology for validating results found by short-read platforms. “We are seeing more and more requests to do it not with the Sanger, but with the PacBio [sequencer],” he said. “You need to validate [with] different technology and PacBio is really well suited for that.” http://blog.pacificbiosciences.com/2014/10/revolutionizing-hla-typing-uppsalas-ulf.html
Going to Great Read Lengths
10/27/2013 Janelle Weaver, Ph.D.
In a revival of the era of finished genomes, scientists are using the long reads offered by third-generation sequencing technologies to close gaps in genome assemblies. Can next-generation sequencing catch up? Janelle Weaver reports.
Next-generation sequencing has made it possible for scientists to sequence genomes faster and at a much lower cost than with Sanger sequencing, paving the way for the $1000 genome. But this approach sacrifices read length for speed, reducing average reads to about 100 base pairs instead of 800–900 base pairs using Sanger sequencing (1). Short read lengths make genome assembly more difficult because additional coverage (i.e., more overlapping sequence reads) is required to produce a comparable assembly (2).
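To put numbers on the read-length difference, here is a back-of-the-envelope sketch using the usual coverage relation (coverage = number of reads x read length / genome size). The 6.8 Mb genome size and the 30x target are only illustrative values chosen for the example, and the function name is made up. The sketch shows read-count arithmetic only and says nothing about how well repeats get resolved.

```python
import math


def reads_needed(genome_size_bp, read_length_bp, coverage):
    """Number of reads required to reach a target average coverage."""
    return math.ceil(coverage * genome_size_bp / read_length_bp)


# Illustrative values: a 6.8 Mb bacterial genome at 30x average coverage
short_reads = reads_needed(6_800_000, 100, 30)   # ~100 bp short reads
sanger_reads = reads_needed(6_800_000, 850, 30)  # ~850 bp Sanger-length reads
```

With these inputs the short-read run needs 2,040,000 reads against 240,000 Sanger-length reads for the same average depth.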
But deeper coverage does not compensate for certain problems. For de novo assembly, repetitive sequences longer than the read length produce gaps, resulting in more fragmented assemblies in recent years than in the past. As a result, it’s more difficult to detect variation in repetitive regions, which may be important for understanding certain diseases.
“The frustrating thing about short-read data is that there’s not a lot of information content in a 100 base pair read,” says Kim Worley, a geneticist at the Human Genome Sequencing Center at Baylor College of Medicine. She pointed out that the current draft genome of the rhesus macaque—an important biomedical animal model—contains sequence gaps in up to 20% of its gene models.
“We have finished the human genome and the mouse genome,” she says. “But even those finished genomes have regions that are not completely contiguous and correct, and users of those data are always dissatisfied with those regions.”
To address this issue, Worley and her colleagues turned to the Pacific Biosciences (PacBio) RS platform, a third-generation sequencing technology that can perform single-molecule sequencing reactions in real time. The system produces average read lengths that span several thousand bases and maximum read lengths of up to 30,000 bases in some cases.
Those long sequence reads simplify genome assembly because they can span repeat regions, and, because no amplification of source DNA is required, there's also a reduction in certain sequencing artifacts and genome coverage biases. Because the PacBio RS platform produces long reads without GC-bias or systematic errors, it is uniquely suited for upgrading genome assemblies.
As reported previously in PLoS ONE (3), Worley and her colleagues developed an automated software tool called PBJelly, which aligns long PacBio reads to draft assemblies to close or improve gaps while preserving annotations. Applying this approach to four genomes—a simulated Drosophila melanogaster genome, the version 2 draft for Drosophila pseudoobscura, an assembly of the Assemblathon 2.0 budgerigar dataset, and a preliminary assembly of the sooty mangabey genome— the researchers addressed 63%–99% of gaps and were able to close 32%–69% and improve 12%–63%.
“We’re experiencing a renaissance and a revival of the era of finished genomes,” says Jonas Korlach, chief scientific officer at PacBio. “That was really the norm back in the days of Sanger sequencing, but when next-generation technologies came around, it was really almost abandoned because it was not possible or it was so cumbersome to close those genomes with Sanger sequencing.”
Playing Catch Up
In principle, PBJelly can be applied to long sequence reads produced by any platform. This feature may be important in the future when next-generation sequencing companies catch up with PacBio’s read lengths.
One move in this direction is the acquisition of the San Francisco-based startup Moleculo by Illumina. Technology developed by Moleculo allows large DNA fragments to be sequenced on standard next-generation sequencing Illumina systems for subsequent assembly into synthetic long reads. The short sequence reads originating from each molecule are assembled separately, and the end result is a full sequence of all the fragments. Essentially, short read data is reconstructed into long reads.
At the International Plant and Animal Genome Conference, a team of scientists reported that Moleculo technology could produce long, accurate DNA sequencing reads spanning 1.5–15 kilobases using the Illumina HiSeq2000 platform.
Another example of long-read technology is the 454 GS FLX+ system, which can deliver reads of up to 1000 base pairs. Right now, a research consortium is using this sequencing technology to analyze and assemble the RP11 human reference genome as part of an effort to close gaps and uncover novel genes in the genome sequence.
“One of the things that 454 has been known for is the highest-quality, longest-read sequencing on the market today,” says Todd Arnold, vice president for research and development at 454 Life Sciences, a Roche company. And the read length and throughput are only going to get better, he says. “What we strive for is to preserve our quality score as we increase the read length, because it’s very important to our customers.”
But according to Korlach, other existing technologies will never be able to catch up with PacBio. “There are fundamental technological differences and limitations that prevent other commercially available technologies from providing contiguous single reads of the lengths that we can provide,” he says.
Even so, one downside of the PacBio long-read technology is its high error rate. Although highly accurate sequencing results can be achieved through building consensus sequences, the PacBio RS instrument generates single-pass reads that average only 87–89% nucleotide accuracy.
“We’re working on improving that, but the accuracy will probably be lower than other existing technologies for a significant amount of time because our technology is fundamentally based on single-molecule, real-time detection,” says Edwin Hauw, the company's senior director of product management.
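A rough way to see why a noisy single pass can still give an accurate consensus: treat each pass over a base as an independent call that is right with probability 0.88 (inside the 87-89% range quoted above) and take a majority vote. This is a deliberately simplified model and an assumption on my part. It is not PacBio's actual consensus algorithm, and real errors are mostly insertions and deletions rather than independent substitutions.

```python
from math import comb


def consensus_accuracy(single_pass_accuracy, passes):
    """Probability that a strict majority of independent passes calls a base correctly.

    Ties (possible when `passes` is even) are counted as wrong, so the
    estimate is on the conservative side.
    """
    p = single_pass_accuracy
    needed = passes // 2 + 1
    return sum(
        comb(passes, k) * p**k * (1 - p) ** (passes - k)
        for k in range(needed, passes + 1)
    )


one_pass = consensus_accuracy(0.88, 1)
fifteen_passes = consensus_accuracy(0.88, 15)
```

Under this model one pass stays at 0.88 while fifteen passes come out above 0.999, which is the intuition behind building consensus sequences from random errors.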
Putting Long Reads to the Test
At the University of Tokyo, computational biologist Michiaki Hamada isn’t too concerned about those error rates. “In my opinion, these high error rates do not raise serious issues, because most of the errors can be corrected by using short reads with low error rates, such as those produced by Illumina sequencers,” he says.
In a study, Hamada and his team developed a read simulator, called PBSIM, which captures the key features of PacBio reads. “Our long-term research goal is to develop a de novo assembler for long reads produced by, for example, PacBio sequencers,” says Hamada. “But there was no available simulator that targeted the specific generation of PacBio libraries.”
As reported last year in Bioinformatics (4), Hamada and his team used PBSIM to analyze 13 PacBio datasets. After conducting hybrid error correction and assembly tests for PacBio reads, they found that extensive assembly results can be obtained with a continuous long-read coverage depth of at least 15, in combination with a circular consensus sequencing coverage depth of at least 30. “PBSIM can be used not only in evaluating assemblers for PacBio sequencers, but also in experimental design for sequencing,” says Hamada.
In the end, because these gaps in reference genomes could contain genes involved with disease, capitalizing on long-read technology can make a big impact in the clinical realm. For example, in their study, Arnold and colleagues identified a region that might be involved in cancer development. “There was evidence for that gene that came out of earlier RNA sequence data, but this didn’t appear in the reference genome, so anyone who was doing resequencing studies wouldn’t see it,” says Arnold. “The more complete the reference library is, the better your ability to use this data in a positive fashion.”
http://www.biotechniques.com/news/Going-to-Great-Read-Lengths/biotechniques-341722.html#.VE-RehZoCmM
October 27, 2014 17:02 ET |New DNA Sequencing Facility Established in China Based on PacBio's SMRT Sequencing Platform
MENLO PARK, Calif. and TIANJIN, China, Oct. 27, 2014 (GLOBE NEWSWIRE) -- Pacific Biosciences of California, Inc., (Nasdaq:PACB) provider of the PacBio® RS II DNA Sequencing System, and Suqian Lakeside Pangu Gene Company (SLPC) announced that SLPC has established a new genomics facility with 1,500 square meters of laboratory space in the Tianjin Dong Li Lake Technology Park, dedicated to using PacBio Single Molecule, Real-Time (SMRT®) Sequencing as the major sequencing platform for its translational medicine research projects.
"Our goal is to become an innovative high-tech enterprise," stated founder and CEO of SLPC, Dr. Zhang Yaozhou. "I believe that PacBio's platform provides the most complete and accurate sequencing data, and we are committed to applying this technology toward the improvement of human health."
SLPC purchased four PacBio RS II Sequencing Systems earlier this year and recently installed them at their newly dedicated facility. The company ultimately intends to focus on applications such as molecular-based approaches for early diagnosis of cancers, disease recurrence control and drug efficacy evaluation.
Ram Laxman, Ph.D., President & General Manager, Asia Pacific for Pacific Biosciences commented: "SLPC's investment in the PacBio RS II systems for human and other large genome sequencing follows a growing trend of researchers adopting the SMRT Sequencing technology to generate highly accurate complete genomes."
http://globenewswire.com/news-release/2014/10/27/676936/10104682/en/New-DNA-Sequencing-Facility-Established-in-China-Based-on-PacBio-s-SMRT-Sequencing-Platform.html?f=22&fvtc=7
Comparative genome analysis of Wolbachia strain wAu
Results and discussion
Genomic DNA purity assessment
Approximate calculations based on quantitative PCR (qPCR) C(t) values for wAu and host
genes were performed to estimate the degree of contamination with host gDNA in wAu
gDNA samples extracted from cultured cells and whole adult flies. The estimated purity of
wAu gDNA was ~60% for the extract from cultured cells, and >90% for the extract from
whole adult flies. The latter is comparable to the figure of up to 97% reported previously [27]
using the same extraction method. There is no previous data on Wolbachia gDNA extraction
from cultured cells. One explanation for the lower purity could be that Wolbachia densities
may be lower within cultured cells than in vivo.
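The excerpt does not give the formula behind the purity estimate, so the following is only one plausible way such an approximation could be made from qPCR C(t) values. It assumes perfect doubling per cycle, one target gene copy per genome for both organisms, and genome sizes supplied by the user. The function name and all numbers in the example call are hypothetical.

```python
def wolbachia_mass_fraction(ct_wolbachia, ct_host, wolbachia_genome_bp, host_genome_bp):
    """Approximate fraction of total DNA mass that is Wolbachia.

    Assumes 100% amplification efficiency, so each cycle of difference in
    C(t) corresponds to a two-fold difference in starting genome copies.
    """
    # Wolbachia genome copies per host genome copy
    copies_ratio = 2.0 ** (ct_host - ct_wolbachia)
    wolbachia_mass = copies_ratio * wolbachia_genome_bp
    return wolbachia_mass / (wolbachia_mass + host_genome_bp)


# Hypothetical inputs: equal C(t) values and equal genome sizes give a 50/50 split
even_split = wolbachia_mass_fraction(20.0, 20.0, 1_000_000, 1_000_000)
```

Because the host genome is far larger than the Wolbachia genome, the symbiont needs many genome copies per host copy before it dominates the extract by mass.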
Genome sequencing and assembly
wAu genome sequencing was initially performed using the Illumina platform on gDNA
extracted from whole adult flies. However, the resulting assembly was fragmented in the
regions of most interest, with scaffold positions uncertain. A second round of sequencing was
therefore performed using the PacBio RS II system to obtain longer reads in an attempt to
improve the assembly, using gDNA extracted from cultured cells rather than whole adult flies.
The Illumina data was used to correct errors in the PacBio reads, which assembled into a
single contig.
The achievement of a single contig assembly shows that PacBio represents an extremely
useful new sequencing platform for rapid generation of finished bacterial genome assemblies.
Furthermore, the generation of this single contig from a very small amount of DNA
(approximately 2 ng), containing a substantial amount of host DNA contamination (~40%),
suggests that PacBio is well suited to use in cases where it is hard to obtain a large amount of
gDNA, including obligate endosymbionts, like Wolbachia, that cannot be cultured outside of
host cells. The sequence generated was largely consistent with data produced using the
Illumina platform, with only one single nucleotide polymorphism (SNP) between the two
datasets. There were 88 indels relative to Illumina data; these were mostly single nucleotide,
Conclusions
In this study, a methodology for conveniently extracting Wolbachia gDNA for genome
sequencing using an infected cell line has been successfully employed, and the PacBio RS II
sequencing platform has proved a very useful tool for achieving a complete bacterial
assembly, particularly when combined with Illumina sequencing. Using this approach, a
single contig assembly has been generated for the genome of the wAu strain, which does not
induce CI. Comparison of this genome to that of wMel, which does induce CI, revealed
significant structural differences in the prophage regions and loss or potential inactivation of
a number of genes. (click on link for lots more, 28 pages) http://www.biomedcentral.com/content/pdf/1471-2164-15-928.pdf
9:27 AM - 24 Oct 2014 "the US Centers for Disease Control and Prevention ordered two PacBio RS II systems, bringing its total to three." http://www.genomeweb.com/sequencing/pacbio-continues-push-human-sequencing-sees-new-market-hiseq-x-ten-customers …
Posted October 23, 2014-- Long-read, whole genome shotgun sequence data for five model organisms
Abstract
Single molecule, real-time (SMRT) sequencing from Pacific Biosciences is increasingly used in many areas of biological research including de novo genome assembly, structural-variant identification, haplotype phasing, mRNA isoform discovery, and base-modification analyses. High-quality, public datasets of SMRT sequences can spur development of analytic tools that can accommodate unique characteristics of SMRT data (long read lengths, lack of GC or amplification bias, and a random error profile leading to high consensus accuracy). In this paper, we describe eight high-coverage SMRT sequence datasets from five organisms (Escherichia coli, Saccharomyces cerevisiae, Neurospora crassa, Arabidopsis thaliana, and Drosophila melanogaster) that have been publicly released to the general scientific community (NCBI Sequence Read Archive ID SRP040522). Data were generated using two sequencing chemistries (P4-C2 and P5-C3) on the PacBio RS II instrument. The datasets reported here can be used without restriction by the research community to generate whole-genome assemblies, test new algorithms, investigate genome structure and evolution, and identify base modifications in some of the most widely-studied model systems in biological research.
http://biorxiv.org/content/early/2014/10/23/008037
Fosmid-Based Resequencing
The resequencing of some genomic regions, including the MHC and KIR regions, has been time consuming and expensive due to their size and repetitive structure. Our method combines a fosmid-based approach with next generation sequencing to provide you with complete, high fidelity, and phased sequence.
Fosmid libraries are constructed from fragmented genomic DNA and are diluted into 96-well plates to reduce the complexity of each well. The use of fosmids provides the advantage of a higher scale order on the highly repetitive structures of complex genomic regions.
Next generation sequencing, using the Illumina MiSeq Sequencing System allows us to identify target fosmids within each well in a single step. Target fosmids are isolated from wells via recombineering and sequenced together in a single sequencing run using either the MiSeq or PacBio’s Single Molecule Real-Time (SMRT®) long-read sequencing technology. Fosmid sequences are assembled using overlaps, producing phased, high quality data on the megabase scale.
(click on link for charts) http://www.sciscogenetics.com/technology/fosmid-based-resequencing/
("If the long reads are high quality and cheap, you wouldn't need the short reads. . . [long reads] would take over the market.")!!! Short Read Sequencing Not Up to the Task of Characterizing Transcriptome Says Mike Snyder of Stanford
published by Ayanna Monteverdi on Wed, 09/10/2014 ---- Guest:
Mike Snyder, Director, Center for Genomics & Personalized Medicine, Stanford
Listen (5:44) Current method for figuring out transcriptomes is crazy
Listen (4:18) Long reads necessary to find paternal or maternal alleles
Listen (4:31) Practical applications of the transcriptome
Listen (4:41) Has the race to the $1,000 Genome been at the expense of quality?
Listen (6:33) If price drops for long reads is there a future for short reads?
Today we launch the much anticipated series on The Rise of Long Read Sequencing with Mike Snyder, Chair of Genetics at Stanford. Mike has been working four years on what has become known as the “Snyderome” (or "Narcissome" as his colleagues affectionately call it), looking at hundreds of thousands of his own molecular biomarkers regularly over time. Lately Mike has been focused particularly on his transcriptome, or RNA molecules.
The transcriptome is studied by looking at individual isoforms. On average, every gene has five or six isoforms or transcripts. Recently Mike has co-authored a couple papers showing that it is difficult to identify full-length transcript isoforms using the current short read sequencing technology.
“The way we figure out transcriptomes now is kind of crazy if you think about it," he says. "We take RNA. We blow it up into little fragments, and then we try to assemble them back together to understand what the transcription looked like in the first place. That’s a horrible way to do this.”
Mike explains how PacBio's long read technology is opening up new possibilities for characterizing the transcriptome and identifies some of the practical applications that might come from his research.
So what does this mean about the future of NGS? If PacBio or one of the emerging nanopore sequencing companies can offer long reads at high throughput, is there any reason why a researcher would use short read technology?
"If the long reads are high quality and cheap, you wouldn't need the short reads. . . [long reads] would take over the market," Mike says.
Reflecting on the rapid changes we've seen in the NGS space from year to year, he says, "next year we'll probably have a whole different conversation."
The Progress of Clinical Genomics in Sweden with Ulf Gyllensten
published by Ayanna Monteverdi on Thu, 10/16/2014 - 11:30
Guest:
Ulf Gyllensten, Professor, Department of Immunology, Genetics, and Pathology, Uppsala University, Sweden
Listen (4:24) What are your goals at the National Genomics Infrastructure?
Listen (4:42) PacBio revolutionizing HLA typing
Listen (4:01) Getting the word about long reads out to clinicians
Listen (3:17) What would you like to see from sequencing companies in the future?
Listen (8:03) An update on clinical genomics in Sweden
Listen (5:02) The Road Show
For our final show in the series on long read sequencing, we move to Sweden and talk to Ulf Gyllensten, Co-Director of the National Genomics Infrastructure.
Ulf and his team use all the major sequencing platforms, and one of their jobs at the NGI is to compare the platforms. In today’s interview, he tells of the goals at the NGI and how new long read technology from PacBio is opening up new applications.
Some of these applications are clinical, and Ulf gives an update on clinical genomics in Sweden where regulation and privacy concerns are much more straightforward than they are here in the U.S.
Podcast brought to you by: Pacific Biosciences - providers of long read sequencing solutions based on their Single Molecule Real Time technology.
http://mendelspod.com/podcast/progress-clinical-genomics-sweden-ulf-gyllensten
"this report undermines PacBio"?
NOT!!!
It is actually good news for PacBio (and any other noisy long read technology like nanopore). Availability of better algorithms means researchers will need less computing power in assembling the reads. So, that makes the sequencing technology more usable, whereas the processing time used to be a big concern in the past.
One caution, however, is that the new paper used both PacBio and Illumina reads, whereas the reported 405,000 CPU hours were based on PacBio reads only. Therefore, 6 hours vs 405K hours is not an apples-to-apples comparison. A better comparison would be based on what Gene Myers reported in his DALIGN paper.
http://www.homolog.us/blogs/blog/2014/07/28/in-dalign-paper-gene-myers-delivers-a-major-blow-to-his-biggest-competitor/
Drug Discovery Tutorials, Oct 15, 2014
The New World of Isoform Sequencing
Long-Read Sequencing Can Offer the Most Comprehensive View Yet of Gene Activity
Jonas Korlach, Ph.D.
Not too long ago, the life sciences community was still debating whether sequencers would ever overtake microarrays as the preferred means of measuring gene expression. Today, not only have sequencers become the standard workhorse for gene expression studies, but newer sequencing technology has delivered the ability to generate novel expression data even in the most well-characterized cells or organisms. Truly, it is a remarkable time for comprehensive studies of which genes are being transcribed, with the goal of providing functional insight into various biological processes.
The key advantage sequencing holds over microarrays is its ability to deeply survey an entire transcriptome, while microarrays are limited to interrogating known genes using probes designed from a reference genome assembly. As next-generation sequencing became more affordable, scientists were eager to switch to this approach, which became known as RNA sequencing or simply RNA-seq.
Recently, scientists have begun applying long-read sequencing to further advance the field of gene expression, finding that this method can directly sequence full-length transcripts and provide additional insights into gene isoforms. In doing so, this technique has generated a more comprehensive view of full-length, protein-coding gene transcription than other sequencing technologies for the clearest view yet of a transcriptome.
From Short Reads to Long Reads
Bcl-x is a classic example of how two gene isoforms can have opposite biological effects, depending on whether a particular exon is retained or spliced out.
As a method, RNA-seq has flourished since it was first introduced some seven years ago. Scientists performing RNA-seq studies convert their RNA of interest into cDNA and then sequence it on massively parallel, next-gen sequencers. This generates millions of reads—far more data than it was possible to collect in a single microarray study.
The challenge with most next-gen sequencers, however, is that the reads they produce are quite short (generally maxing out at just a few hundred bases). For gene expression studies, these snippets of information become problematic during assembly, when algorithms have a hard time correctly mapping these reads. This can make isoforms difficult to see, often conflating alternatively spliced isoforms into a smaller number of transcripts than they really represent. In some studies, researchers have found that gene isoforms are significantly underrepresented in short-read assemblies.
In a recent review of RNA-seq in Nature Methods, Stanford genetics professor Michael Snyder was quoted as saying, “The way we do RNA-seq now is … you take the transcriptome, you blow it up into pieces and then you try to figure out how they all go back together again. … If you think about it, it’s kind of a crazy way to do things.”
A separate publication in Nature Methods from members of the RNA-seq Genome Annotation Assessment Project (RGASP) presented an evaluation of two dozen protocols in various organisms for inferring transcript information and gene expression level from RNA-seq data. (Such inference is necessary, the authors note, because technical limitations of RNA-seq result in “partial sequence reads of fragmented gene products,” which mandates a shotgun approach to these sequences.)
The transcriptome is extraordinarily diverse and complex. A myriad of variation can form from individual genes, such as alternative splicing of exons (the portions of the gene that encode for protein), alternative first and last exons, cassette exons where only one or another exon, but not both, are present in a particular mRNA, and many others.
One challenge identified through this large benchmarking effort was that transcript identification is quite limited when reads miss exons. “For a significant fraction of transcripts not all exons are identified, ranging from 30% in C. elegans to greater than 60% in H. sapiens,” they write.
Another significant challenge found in the study was reconstruction of isoforms. “These results underscore the difficulty of transcript assembly, which relies on two outcomes: all exons comprising a given transcript must be identified, then connected to form the correct isoform structure,” according to the paper. The researchers determined that automated methods for transcript assembly failed to identify every exon for most transcripts, and that even when all exons were identified, the methods were often unable to pull them together into complete isoforms. “Assembly of complete isoform structures poses a major challenge even when all constituent elements are identified,” the authors report.
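To see why connecting exons is hard even when every exon and every junction has been found, here is a toy sketch in Python. It assumes that each short read spans at most one exon junction. The exon numbering and the two sample isoform sets are invented for illustration and do not come from the RGASP paper.

```python
# Toy illustration (hypothetical gene with exons 1..5; exons 2 and 4 are optional).
# Assumption: a short read can confirm a single exon-exon junction, nothing more.

def junctions(isoform):
    """Return the set of exon-exon junctions used by one isoform."""
    return {(a, b) for a, b in zip(isoform, isoform[1:])}

def junction_set(isoforms):
    """Return every junction a short-read experiment could observe."""
    seen = set()
    for isoform in isoforms:
        seen |= junctions(isoform)
    return seen

# Two different mixtures of full-length transcripts.
sample_a = [(1, 2, 3, 4, 5), (1, 3, 5)]
sample_b = [(1, 2, 3, 5), (1, 3, 4, 5)]

# Short reads see identical evidence for both mixtures...
same_junctions = junction_set(sample_a) == junction_set(sample_b)
# ...while full-length reads (one read per transcript) tell them apart.
same_isoforms = set(sample_a) == set(sample_b)

print(same_junctions, same_isoforms)
```

Under these assumptions both mixtures produce the same junction evidence, so an assembler working from junctions alone has no basis for choosing between them. A read that covers the transcript end to end removes the ambiguity.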
The RGASP publication concluded that “unannotated transcript isoforms assembled from RNA-seq data should be interpreted with care, and those critical to an experimental study subjected to independent validation.” Looking to the future, they write, “Ultimately, the evolution of RNA-seq will move toward single-pass determination of intact transcripts. Third-generation instruments will realize that potential and inspire new computing approaches to meet the next wave of innovation in transcriptome analysis.”
One such third-generation technology is Single Molecule, Real-Time (SMRT®) Sequencing from Pacific Biosciences, which generates long reads that have been used by scientists to capture full transcripts from the 5´ end all the way to the 3´ end. In DNA sequencing, this technology generates reads averaging more than 8,000 bases.
Full Transcripts
In addition to having a high-quality genome assembly, it is important to understand all of the gene products, both in terms of their frequency and with regard to the diversity of the different forms that can be created from a particular gene.
The ability to perform RNA sequencing with long-read technology is still relatively new, but several pioneering publications describe studies that have utilized the approach. They often report finding critical elements that were previously missed by short-read sequencing. One such study is presented here.
A paper in Proceedings of the National Academy of Sciences from lead author Kin Fai Au and senior author Wing Wong at Stanford University, with several collaborators, used transcriptome sequencing to analyze human embryonic stem cells. The addition of SMRT Sequencing to an existing RNA-seq data set from short-read sequencing helped the scientists characterize more than 13,000 full-length transcript isoforms. More than a third of the isoforms seen in this well-characterized cell line were novel, the scientists reported, noting that long, noncoding RNAs were more likely to be missed by short-read sequence data. The SMRT Sequencing data did more than identify isoforms more comprehensively: the scientists also found 273 new genes in the data. The authors concluded that “gene identification, even in well-characterized human cell lines and tissues, is likely far from complete.”
Going Forward
With extraordinarily long sequence reads, scientists will be able to learn more about transcriptomes than has ever been possible. Already, studies are turning up new discoveries that simply could never have been detected with short-read sequence data. The ability to accurately identify all transcripts, correctly sort alternatively spliced regions, and find important elements such as long, noncoding RNAs will be essential to accelerating our understanding of how gene expression functions.
Jonas Korlach, Ph.D. (jkorlach@pacificbiosciences.com), is CSO at Pacific Biosciences.
http://www.genengnews.com/gen-articles/the-new-world-of-isoform-sequencing/5335/
Wednesday, October 15, 2014----- New Chemistry Boosts Average Read Length to 10 Kb – 15 Kb for PacBio® RS II
We are pleased to announce the launch of our new reagent kit, P6-C4, which represents the next generation of our polymerase as well as our chemistry. This kit replaces the P5-C3 chemistry and is recommended for all SMRT® Sequencing applications, including de novo assembly, targeted sequencing, isoform sequencing, minor variant detection, scaffolding, long-repeat spanning, SNP phasing, and structural variant analysis.
P6-C4 continues the steady read length improvement our users have seen since the instrument first launched. With this new chemistry, average read lengths increase to 10 Kb - 15 Kb, with half of all data in reads 14 Kb or longer. The throughput is expected to be between 500 million and 1 billion bases per SMRT Cell, depending on the sample being sequenced. By providing more throughput per instrument run, the chemistry enables users to sequence larger genomes and observe previously undetected structural variants, highly repetitive regions, and distant genetic elements.
This new release also includes more robust analysis software, SMRT Analysis 2.3, providing improvements for Long Amplicon Analysis and the Iso-Seq™ method. Together with performance enhancements, these advances boost accuracy, speed up analysis, and provide more options for analyzing amplicons of mixed sizes such as full-length HLA Class I and II genes.
Here are the new part numbers:
DNA Sequencing Reagent 4.0 — P/N 100-356-200
DNA Sequencing Bundle 4.0 (10 Pack) — P/N 100-356-400
DNA/Polymerase Binding Kit P6 — P/N 100-356-300
DNA Internal Control Complex (P6) — P/N 100-356-500
We are also happy to release an additional model organism data set to the public, Caenorhabditis elegans. C. elegans sequence was first published in 1998 and has been updated and improved over the years. Our data was generated using 11 SMRT Cells with the new P6-C4 chemistry. The average read length of the raw data set is >14 Kb, with half of the bases in reads > 21 Kb and the maximum read length of 64,500 bases.
Some basic assembly stats:
Genome size: 103.02 Mb
Raw data: 4.57 Gb
Assembly Coverage: 39.45x
Polished Contigs: 245
Max Contig Length: 3.17 Mb
N50 Contig Length: 1.61 Mb
Sum of Contig Lengths: 104.2 Mb
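For readers unfamiliar with the statistics in this list, here is a minimal Python sketch of how an N50 and a raw coverage figure are usually computed. The function names and the small list of contig lengths are made up for illustration; only the genome size and raw data volume come from the post. Note that raw data divided by genome size is a raw figure and is not expected to match the reported assembly coverage of 39.45x, which the post does not define.

```python
# Minimal sketch; function names and example lengths are hypothetical.

def n50(lengths):
    """Smallest length L such that pieces of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must not be empty")

def coverage(total_bases, genome_size):
    """Average number of times each genome position is covered by sequenced bases."""
    return total_bases / genome_size

# Figures taken from the post: 4.57 Gb of raw data, 103.02 Mb genome.
raw_coverage = coverage(4.57e9, 103.02e6)

# Invented contig lengths (in bases), only to show the N50 calculation.
example_contigs = [5, 4, 3, 2, 1]
example_n50 = n50(example_contigs)

print(round(raw_coverage, 2), example_n50)
```

The same N50 calculation applies to reads as well as contigs: "half of the bases in reads > 21 Kb" is a read N50 statement.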
Mapped Subread Length Distribution and Mapped Subread Concordance: (see link for the figures)
http://blog.pacificbiosciences.com/2014/10/new-chemistry-boosts-average-read.html
(From Cornell University Library) DBG2OLC: Efficient Assembly of Large Genomes Using the Compressed Overlap Graph
Authors: Chengxi Ye, Chris Hill, Sergey Koren, Jue Ruan, Zhanshan (Sam)Ma, James A. Yorke, Aleksey Zimin
(Submitted on 10 Oct 2014)
Abstract: The genome assembly computational problem is preventing the transition from the prevalent second generation to third generation sequencing technology. The problem emerged because the erroneous long reads made the assembly pipeline prohibitive expensive in terms of computational time and memory space consumption.
In this paper, we propose and demonstrate a novel algorithm that allows efficient assembly of long erroneous reads of mammalian size genomes on a desktop PC. Our algorithm converts the de novo genome assembly problem from the de Bruijn graph to the overlap layout consensus framework. We only need to focus on the overlaps composed of reads that are non-contained within any contigs built with de Bruijn graph algorithm, rather than on all the overlaps in the genome data sets. For each read spanning through several contigs, we compress the regions that lie inside each de Bruijn graph contigs, which greatly lowers the length of the reads and therefore the complexity of the assembly problem. The new algorithm transforms previously prohibitive tasks such as pair-wise alignment into jobs that can be completed within small amount of time. A compressed overlap graph that preserves all necessary information is constructed with the compressed reads to enable the final-stage assembly.
We implement the new algorithm in a proof-of-concept software package DBG2OLC. Experiments with the sequencing data from the third generation technologies show that our method is able to assemble large genomes much more efficiently than existing methods. On a large PacBio human genome dataset we calculated the pair-wise alignment of 54x erroneous long reads of human genome in 6 hours on a desktop computer compared to the 405,000 CPU hours using a clusters, previously reported by Pacific Biosciences. The final assembly results were in comparably high quality.
Subjects: Genomics (q-bio.GN)
Cite as: arXiv:1410.2801 [q-bio.GN]
(or arXiv:1410.2801v1 [q-bio.GN] for this version)
Submission history
From: Chengxi Ye
[v1] Fri, 10 Oct 2014 14:58:15 GMT (492kb)
http://arxiv.org/abs/1410.2801 (another link)
http://www.homolog.us/blogs/blog/2014/10/13/very-efficient-hybrid-assembler-for-pacbio-data/
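The read-compression idea in the DBG2OLC abstract above can be illustrated with a toy Python sketch. This is not the DBG2OLC implementation: the function names, the input format (a list of read-position and contig-ID pairs) and the example reads are all assumptions made for illustration. The point is only that once a long read is reduced to the ordered list of de Bruijn graph contigs it passes through, comparing two reads becomes a comparison of short ID lists instead of a base-by-base alignment of noisy sequence.

```python
# Toy sketch, not the DBG2OLC code. All names and data below are hypothetical.
from typing import List, Tuple

def compress_read(hits: List[Tuple[int, str]]) -> Tuple[str, ...]:
    """Turn (start position on read, contig id) hits into an ordered contig-id tuple."""
    compressed: List[str] = []
    for _, contig in sorted(hits):
        # Collapse consecutive hits to the same contig into one entry.
        if not compressed or compressed[-1] != contig:
            compressed.append(contig)
    return tuple(compressed)

def suffix_prefix_overlap(a: Tuple[str, ...], b: Tuple[str, ...]) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if none)."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a[-k:] == b[:k]:
            return k
    return 0

# Two invented long reads, each described by where contigs align along the read.
read1 = compress_read([(0, "c1"), (5000, "c2"), (9000, "c3")])
read2 = compress_read([(7000, "c4"), (100, "c2"), (4000, "c3")])

print(read1, read2, suffix_prefix_overlap(read1, read2))
```

A real implementation would also have to handle strand orientation, spurious contig hits and reads contained in a single contig, none of which this sketch attempts.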
(Just searching for PACBIO sequencing Ebola.)
cdnwww.genomeweb.com/sequencing/after-release-20-new-genomes-100k- pathogen-project-now-kicking-pacbio-sequencing
Jul 30, 2013 ... He said PacBio sequencing has been "fantastic" for the team so far, yielding "nice
.... reports on the sequencing of 99 Ebola virus genomes from infected patients in
https://www.google.com/search?hl=en&source=hp&q=Has+PACBIO++Sequence+analysis+of+the+Ebola+virus+genome&gbv=2&oq=Has+PACBIO++Sequence+analysis+of+the+Ebola+virus+genome&gs_l=heirloom-hp.12...7551.147233.0.156235.24.23.0.1.0.0.219.3947.0j19j4.23.0....0...1ac.1.34.heirloom-hp..15.9.1719.rYbt5irYtvw /// ("The team sequenced 99 Ebola virus genomes from 78 people in Sierra Leone, who were diagnosed with Ebola in late May and mid-June. Sadly, five of over 50 co-authors of the paper lost their lives to Ebola virus before the paper was published.") http://nextgenseek.com/2014/08/ebola-genome-sequence-link-roundup/
Friday, October 10, 2014-- ASHG 2014: A New Look at the Human Genome with Long-Read Sequencing
Scientists around the world are getting ready for the annual meeting of the American Society of Human Genetics taking place October 18-22 at the San Diego Convention Center. We’re looking forward to a number of excellent presentations and posters, and are delighted to see that many of them will focus on applying Single Molecule, Real-Time (SMRT®) Sequencing to human studies.
If you’ll be among those attending ASHG, be sure to attend our workshop, A New Look at the Human Genome – Novel Insights with Long-Read PacBio Sequencing, taking place 12:30 – 2:00 p.m. on Tuesday, October 21. Register in advance to reserve your seat or to receive the recording following the event. Our CSO, Jonas Korlach, will host the workshop, which includes:
* Increased Complexity of the Human Genome Revealed by Single-Molecule Sequencing
Evan Eichler, University of Washington
* Defining a Personal, Allele-Specific, and Single-Molecule Long-Read Transcriptome
Hagen Tilgner, Stanford University
* Long-Read Multiplexed Amplicon Sequencing: Applications for Epigenetics and Pharmacogenetics
Stuart Scott, Icahn School of Medicine at Mount Sinai
In addition, here are some of the program presentations and posters we’re excited to see at the event:
Sunday, October 19
11:30 a.m.
Discovery and impact of balanced inversion polymorphisms
Jan Korbel, European Molecular Biology Laboratory
4:00 – 5:00 p.m.
1626S: Third generation sequencing and analysis of complete mitochondrial genomes
Gabriel Hoffman, Icahn School of Medicine at Mount Sinai
1665S: Completing CpG methylation statuses in human and vertebrate genomes by integrating SMRT sequencing kinetic data
Shinichi Morishita, University of Tokyo
5:00 – 6:00 p.m.
1656S: Genome in a Bottle: So you’ve sequenced a genome, how well did you do?
Justin Zook, NIST
Monday, October 20
2:00 – 3:00 p.m.
1627M: Full-length, single molecule whole transcriptome sequencing reveals alternative 5’- start sites, splicoforms, and poly(A) addition signal sequences
David Munroe, NCI
1687M: Unique Haplotype structure determination in human genome using Single Molecule, Real-Time (SMRT®) Sequencing of targeted full-length fosmids
Kevin Eng, Pacific Biosciences
3375M: Highly sensitive, non-invasive detection of colorectal cancer mutations using single molecule, third generation sequencing
Giancarlo Russo, UZH/ETH Zurich
Tuesday, October 21
8:00 a.m.
Completion of The 1000 Genomes Project: Results, Lessons Learned and Open Questions
Goncalo R. Abecasis, University of Michigan School of Public Health
11:45 a.m.
High-throughput Determination of Long INterspersed Element-1 Integration Preferences in the Human Genome
Diane A. Flasch, University of Michigan Medical School
2:00 – 3:00 p.m.
1583T: Multiplexing strategies for HLA genotyping using DNA barcoding methods for SMRT® sequencing
Swati Ranade, Pacific Biosciences
3:00 – 4:00 p.m.
472T: Multiplexed and Quantitative DNA Methylation Analysis Using Long-Read Single-Molecule Real-Time (SMRT) Bisulfite Sequencing
Yao Yang, Icahn School of Medicine at Mt. Sinai
552T: Complex alternative splicing patterns in human hematopoietic cell subpopulations revealed by third-generation long reads
Anne Deslattes Mays, Georgetown University
1586T: Assessing novel centromeric repeat sequence variation within individuals by long read sequencing
Karen Miga, UC Santa Cruz
1622T: Resolving the 'Dark Matter' in Human Genomes through Long-Read Sequencing
Jonas Korlach, Pacific Biosciences
5:45 p.m.
Increased complexity of the human genome revealed by single-molecule sequencing
Mark Chaisson, University of Washington
Stop by booth #931 to contribute your ‘variation’ and in return, we’ll contribute to a charity on your behalf! If you can’t make it to the event, you can follow the science at #ASHG14 or register to receive the recording of our workshop.
http://blog.pacificbiosciences.com/2014/10/ashg-2014-new-look-at-human-genome-with.html
10/08/2014 | 02:50pm US/Eastern--Pacific Biosciences of California : Patent Issued for Modular Nucleotide Compositions and Uses Therefor
By a News Reporter-Staff News Editor at Biotech Business Week -- According to news reporting originating from Alexandria, Virginia, by NewsRx journalists, a patent by the inventors Korlach, Jonas (Newark, CA); Wegener, Jeffrey (Cupertino, CA), filed on July 19, 2012, was published online on September 30, 2014 (see also Biotechnology Companies).
The assignee for this patent, patent number 8846881, is Pacific Biosciences of California, Inc. (Menlo Park, CA).
Reporters obtained the following quote from the background information supplied by the inventors: "A wide variety of nucleotide compositions and nucleotide analog compositions have been provided for use in a variety of different applications. In some cases, these compositions function as analytical reagents for the analysis of biological processes, e.g., in nucleic acid sequencing reactions. In other cases, these compositions function as pharmaceutically active substances for the treatment of disease. In still other aspects, these compositions form building blocks for other commercial applications. In a number of situations, a basic nucleotide, e.g., a nucleoside triphosphate, is coupled to an additional functional group in order to provide an additional or a different function to that compound. For example, in one of the more ubiquitous embodiments, detectable label groups, such as fluorescent dyes, radiolabels, semiconductor nanocrystals, or the like, are coupled to the nucleotide to render the nucleotide more easily detectable, e.g., through a fluorescent microscope. These labels may be coupled to persistent components of the nucleotide, i.e., the nucleobase, that remains even following polymerization with other nucleotides, or they may be coupled through the transient portions, e.g., a gamma phosphate group that may be removed upon polymerization. In other cases, functional groups may be coupled to nucleotides or nucleotide analogs in order to provide therapeutic activity, e.g., in interrupting viral replication, or the like.
"Despite the widespread use of functionally tagged nucleotides, it would be desirable to provide for a modular nucleotide composition that allows simple and flexible functionalization of nucleotides for use in a variety of different applications. The present invention meets these and other needs."
In addition to obtaining background information on this patent, NewsRx editors also obtained the inventors' summary information for this patent: "The invention generally provides modular nucleotide compositions and methods of making and using such compositions that employ a cassette approach to addition of functional groups to nucleotide analogs.
"In certain aspects, the present invention provides a composition having a nucleoside polyphosphate coupled to a functional group through a phosphate group by a non-covalent linkage. In certain embodiments, the functional group is coupled to the nucleoside polyphosphate through a phosphate group other than the alpha phosphate group, e.g., the beta, gamma, or other terminal phosphate group. In preferred embodiments, the functional group comprises a detectable label, such as a fluorescent label. In certain embodiments, the functional group is a 'payload' delivered by the composition, e.g., a pharmaceutical compound or diagnostic agent. In certain embodiments, the functional group comprises a particle, e.g., a magnetic particle, a fluorescent semiconductor particle, a metal particle, and/or a polymeric particle.
"The non-covalent linkage preferably comprises one or more of an affinity linkage, biotin, avidin (or biotin-binding subunit thereof), streptavidin (or biotin-binding subunit thereof), neutravidin (or biotin-binding subunit thereof), an antibody or fraction thereof, a polynucleotide, a nucleic acid binding protein, or a combination thereof. In certain embodiments, the non-covalent linkage is a polyvalent non-covalent linkage. For example, a polyvalent non-covalent linkage may couple multiple functional groups to a single nucleoside polyphosphate, or may couple multiple nucleoside polyphosphates to a single functional group, or may couple multiple nucleoside polyphosphates to multiple functional groups. The multiple nucleotide polyphosphates and/or multiple functional groups can be the same or different from one another. For example, multiple functional groups can comprise spectrally distinguishable fluorescent labels or moieties with different charges.
"In certain embodiments, the invention provides compositions having multiple non-covalent linkages. For example, in some compositions of the invention multiple non-covalent linkages couple a single nucleoside polyphosphate to multiple functional groups, and in other compositions of the invention multiple non-covalent linkages couple multiple single nucleoside polyphosphates to a single functional group.
"In other aspects, the invention provides compositions having the structure BSPLF, where B comprises a nucleobases moiety, S comprises a sugar, acyclic, or carbocyclic moiety, P comprises a polyphosphate group, L comprises a non-covalent linkage component, and F comprises a desired functional group. In certain preferred embodiments, L comprises an affinity binding pair.
"In further aspects, the invention provides methods for preparing nucleotide compositions that include providing a nucleoside polyphosphate having a first non-covalent linking group coupled to a phosphate group; providing a functional group having a second non-covalent linking group coupled thereto, the second non-covalent linking group being capable of non-covalently binding to the first non-covalent linking group; and linking the nucleoside polyphosphate to the functional moiety through the first and second non-covalent linking groups. Preferably, the phosphate group through which the non-covalent linking group is coupled to the nucleoside polyphosphate is not the alpha phosphate group of the nucleoside polyphosphate. In preferred embodiments, the first and second non-covalent linking groups form an affinity binding pair, e.g., an epitope pair, GST/glutathione pair, RNA/aptamer pair, or an associative protein or polypeptide pair. For example, in some embodiments the first non-covalent linking group is complementary to the second non-covalent linking group. In other embodiments, one of the non-covalent linking groups is an antibody and the other is an antigen. In yet further embodiments, one of the non-covalent linking groups is a nucleic acid and the other is a nucleic acid binding protein.
"In yet further aspects, the invention provides systems for providing functionalized nucleotide compositions comprising: a first source of nucleoside polyphosphates having at least a first linkage component attached to a phosphate group thereon; a second source of functional groups having a second linkage component coupled thereto, wherein the first and second linkage components comprise an affinity binding pair. The system can further include a reagent mixing system for transferring nucleoside polyphosphate from the first source and functional groups from the second source to a mixing chamber to combine the nucleoside polyphosphate and the functional groups under conditions whereby the first and second linking components form a non-covalent linkage, thereby providing functionalized nucleotide compositions. The system can further include a dispensing the functionalized nucleotide compositions into a reaction mixture. In preferred embodiments, the first source comprises at least two different nucleoside polyphosphates having the first linkage component attached to a phosphate group thereon; and/or the second source comprises at least two different functional groups having the second linkage component coupled thereto. The system is capable of providing multiple different functionalized nucleotide compositions, each of which comprises the first and second linkage component and a different combination of nucleoside polyphosphate and functional group. For example, each may comprise the same avidin/biotin pair, but a different combination of nucleoside polyphosphate and detectable label."
For more information, see this patent: Korlach, Jonas; Wegener, Jeffrey. Modular Nucleotide Compositions and Uses Therefor. U.S. Patent Number 8846881, filed July 19, 2012, and published online on September 30, 2014. Patent URL: http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PALL&p=1&u=%2Fnetahtml%2FPTO%2Fsrchnum.htm&r=1&f=G&l=50&s1=8846881.PN.&OS=PN/8846881RS=PN/8846881
Keywords for this news article include: Anions, Chemistry, Electronics, Electrolytes, Semiconductor, Polyphosphates, Phosphoric Acids, Inorganic Chemicals, Phosphorus Compounds, Biotechnology Companies, Pacific Biosciences of California Inc.
Our reports deliver fact-based news of research and discoveries from around the world. Copyright 2014, NewsRx LLC
http://www.4-traders.com/PACIFIC-BIOSCIENCES-OF-CA-6797675/news/Pacific-Biosciences-of-California--Patent-Issued-for-Modular-Nucleotide-Compositions-and-Uses-There-19159888/
Monday, October 6, 2014---'The Quality of PacBio Data Is Beyond Compare': Eric Schadt on Applications of SMRT Sequencing to Human Genetics
As part of its continuing series on long-read sequencing, last week Mendelspod aired an engaging interview with Eric Schadt, Professor & Chair of Genetics and Genomic Sciences, and Director of the Icahn Institute for Genomics and Multiscale Biology at Mount Sinai.
Having now spent three years in his role at the groundbreaking institute, he reports that they are making great progress in the quest to build better data-driven health profiles around individuals that may better guide healthcare choices.
On short-read versus long-read sequencing
Short-read sequencing technologies still maintain the advantage in terms of throughput, says Schadt, but there are a variety of important genomic features that cannot be characterized without long-read sequencing, such as long tandem repeats, bigger structural variations, and focal variants important in cancer.
“I definitely think [short-read] technologies were tuned for certain problems and had certain advantages that enabled this big advance, but they are absolutely not hitting the entire problem like we need it hit,” he told Mendelspod.
Cancer is a main area of study for which Schadt believes long-read sequencing is needed, in order to understand the complicated genomic features driving the tumor cells. And outside of human applications he called out plant genomics. “Plant genomes are so complicated and so flooded with repeat sequences, their only hope is to have long-read data,” he said.
In general, Schadt believes that the scientific community is recognizing the need for long-read data to provide complete characterizations for genomes. “The aim in all of these is to unambiguously resolve all the structural features of a genome, to de novo assemble those genomes and get away from reference-based assemblies, you are just not going to be able to do that with short-read technologies.”
The quality of PacBio sequencing is 'beyond compare'
Schadt noted that early misconceptions about the error profile of single-molecule data led people to believe the data was of lower quality. He explained that the errors are random and can easily be washed out with a modest amount of coverage, whereas other next-generation sequencing technologies have systematic errors that cannot be removed.
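To make the coverage argument concrete, here is a minimal sketch of why random errors average out under a simple majority vote. It is an illustration under stated assumptions, not how any PacBio consensus tool works: errors are assumed independent between reads, the 15% per-read error rate is only an illustrative figure, and the function name `majority_error` is hypothetical. The model is deliberately pessimistic in that it treats every erroneous read as voting for the same wrong base.

```python
from math import comb

def majority_error(per_read_error: float, coverage: int) -> float:
    """Probability that more than half of `coverage` independent reads
    are wrong at a single position (simple binomial tail)."""
    need = coverage // 2 + 1  # smallest number of wrong reads that forms a majority
    return sum(
        comb(coverage, k)
        * per_read_error ** k
        * (1 - per_read_error) ** (coverage - k)
        for k in range(need, coverage + 1)
    )

# Illustrative per-read error rate (an assumption, not a measured value).
ERROR_RATE = 0.15

# Odd coverage values avoid ties in the vote.
for depth in (1, 3, 5, 15, 31):
    print(f"coverage {depth:>2}x: P(wrong consensus) = {majority_error(ERROR_RATE, depth):.2e}")
```

A systematic error behaves differently in this picture: if every read makes the same mistake at the same site, the vote is unanimous and wrong, so adding coverage does not help.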
At Mount Sinai they used PacBio® technology to sequence the human genome and saw “very dramatic improvements in the quality of the de novo assemblies, revealing features that have never been seen before.” He said he believes this type of sequencing will become the standard. “The quality of that PacBio data is just beyond compare.”
Schadt also noted that because it is the only single molecule sequencer, there are certain applications that cannot currently be done with any other product. Examples he discussed include assembling bacterial genomes de novo, calling cancer variants in heterogeneous samples, dealing with viral mixtures, mitochondrial DNA sequencing, and looking at methylation as part of the sequencing. “It’s just amazing what the instrument is capable of doing,” he said.
Next up in the series
Mendelspod will interview Gene Myers from the Max Planck Institute. Stay tuned for programming information. http://blog.pacificbiosciences.com/2014/10/the-quality-of-pacbio-data-is-beyond.html
Thursday, October 2, 2014--‘We’re Going to Find the Keys’: Dan Geraghty Discusses an Approach to Understanding Causal Genetic Variation
Dan Geraghty, a researcher at Fred Hutchinson Cancer Research Center and CEO of Scisco Genetics, has spent much of his career focused on the genetics of immune response. Recently he talked to Mendelspod host Theral Timpson as part of a continuing series of podcasts on the rise of long-read sequencing.
Geraghty explained that while there have been decades’ worth of studies associating the genetics of the major histocompatibility complex (MHC), and the highly polymorphic HLA class 1 and 2 genes, we still haven’t found the key mutations for a variety of different autoimmune diseases such as type 1 diabetes, rheumatoid arthritis, multiple sclerosis, and others.
Enormous amounts of linkage disequilibrium in these regions are one factor, as is getting information in phase, so larger stretches of sequence are needed. Recently Geraghty has begun using Single Molecule, Real-Time (SMRT®) Technology with hopes of drilling down to the causal genetics.
The challenge with short reads
Geraghty explained that sequencing fosmids with short-read technology is cumbersome when it comes to stitching together the reads. Data analysis and finishing “became a roadblock that the Illumina short-read technology wouldn’t let us get beyond,” he said, noting that the finishing process takes 30 minutes to an hour per fosmid, prohibitive for any modest-scale effort. Geraghty marveled that he has received 40 kb reads from PacBio – meaning a whole fosmid can be sequenced in one piece.
PacBio is ready to handle the challenge
Geraghty said that with recent technology improvements, PacBio data is “really high quality” and “as good or better than Illumina and Sanger,” noting that his group has compared all three technologies with the same sequences. “It opens up a whole new possibility,” he said, because previously “you simply weren’t getting all of the data. People were using statistics to impute missing data and so on, and it simply doesn’t work.”
Should PacBio be used for all major sequencing projects?
Geraghty thinks so, noting that a resource such as the 1000 Genomes Project would be upgraded significantly with PacBio data for complex regions such as MHC and KIR. He said that if you look at these regions in the 1000 Genomes data you will find “a mass of confusion” because those regions are highly repetitive and contain a large amount of copy number and allelic variation, making it difficult or impossible to assemble the data correctly with short reads.
“Any large human genome sequencing projects just using short-read technology are not going to acquire usable data for these complex regions, it’s as simple as that,” he said. For complex regions, “you’ll need long-read data,” he said, “The long-read data will give you really what everybody has been after all along without realizing it. It will give you the phase and the detail on the polymorphism in these highly polymorphic regions.”
The future is bright
Geraghty expressed his excitement about the future using long-read sequencing this way: “We’re hot on the trail. We basically see the entire picture; we are not looking under a lamp post for the keys. It’s daylight and we can see the whole neighborhood. So we’re going to find the keys.” http://blog.pacificbiosciences.com/2014/10/were-going-to-find-keys-dan-geraghty.html
Breakthrough study discovers six changing faces of ‘global killer’ bacteria
Issued by University of Leicester Press Office on 30 September 2014
"Every ten seconds a human being dies from pneumococcus infection making it the leading cause of serious illness across the globe." The University of Adelaide and scientists from Pacific Biosciences, and has for the first time shown a genetic switch that allows this bacterium to randomly change its characteristics into six alternative states.
http://www2.le.ac.uk/offices/press/press-releases/2014/september/breakthrough-study-discovers-six-changing-faces-of-2018global-killer2019-bacteria
"Now that scientists have determined the methylation profiles with the PacBio® platform, it should be possible for other scientists to accurately assign the pathogen to its specific phase. “Future studies must recognize the potential for switching between these heretofore undetectable, differentiated pneumococcal subpopulations in vitro and in vivo,” the authors note. “We believe these findings represent a new paradigm in gene regulation in bacteria and therefore are of great significance to the infectious disease field.”"
http://blog.pacificbiosciences.com/2014/09/new-papers-detail-complexity-of.html?m=1
DNA sequencing has been a critical tool for the field of microbiology since the technology was first invented. Here, we look at how advances in sequencing provide microbiologists the most comprehensive and cost-effective view of their research subjects. (A must read!) http://www.pacificbiosciences.com/pdf/microbial_primer.pdf?utm_content=buffera6595&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer
New: PacBio office hours at the DNA Technologies Core
Posted on September 22, 2014 by Keith Bradnam
We are pleased to announce monthly PacBio office hours at the Core Labs. Pacific Biosciences Field Applications Scientist Nicole Rapicavoli will be answering your questions and assisting with the design of experiments every third Tuesday of the month (starting Tuesday, August 19th) from 1 to 4 pm. Please stop by at the DNA Technologies Core (room 1410, Genome Center – GBSF) to meet with Nicole.
http://genomecenter.ucdavis.edu/2014/09/22/new-pacbio-office-hours-at-the-dna-technologies-core/?utm_content=bufferb4c87&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer
Monday, September 22, 2014-- Maryland Scientists Produce High-Quality, Cost-Effective Genome Assembly of Loa loa Roundworm Using SMRT Sequencing
A paper just released in BMC Genomics details what authors call “the most complete filarial nematode assembly published thus far at a fraction of the cost of previous efforts.” The project was performed using the PacBio® RS II DNA Sequencing System by scientists at the University of Maryland School of Medicine’s Institute for Genome Sciences and the Laboratory of Parasitic Diseases at the National Institute of Allergy and Infectious Diseases.
In this genome sequencing effort, scientists generated a de novo assembly of Loa loa, a roundworm that infects humans. L. loa, transmitted to humans by deer flies, causes loiasis. The parasite lives under the skin and can grow to several centimeters without being detected.
Like other filarial nematodes, the roundworm has proven challenging to grow in labs for in-depth study. Lead author Luke Tallon and his collaborators note that genome sequencing is even more important in such cases since it is a rare opportunity to elucidate the biology of these parasites. The genome they generated with Single Molecule, Real-Time (SMRT®) Sequencing may allow for the development of advanced molecular diagnostics to improve outcomes for patients with nematode infections.
Previous attempts to sequence L. loa were challenged by highly repetitive DNA (it was estimated that 9% of the genome was in repeats) and its AT-rich nature. Tallon et al. tackled a clinical specimen of the organism collected from a patient in the Central African Republic to produce a better assembly. A comparison of short-read sequence data, short- and long-read hybrid data, and long-read-only data found that PacBio data used on its own outperformed other assemblies that included short-read sequence. The final assembly was produced with HGAP2 and polished with Quiver. It includes 96.4 Mbp in 2,250 contigs and covers about 9% more of the genome than a previous draft assembly — in 85% fewer contigs and starting with 80% less DNA, the authors note.
“Recent improvements in long-read, single-molecule sequencing have enabled more economical sequencing and improved genome assembly for previously difficult to sequence clinical samples,” Tallon said in a press release issued by the University of Maryland School of Medicine. “To our knowledge, this study represents the largest and most complete genome of an uncultured clinical specimen successfully sequenced and assembled using this technology.”
Review the full paper, “Single molecule sequencing and genome assembly of a clinical specimen of Loa loa, the causative agent of loiasis,” at BMC Genomics.
http://blog.pacificbiosciences.com/2014/09/maryland-scientists-produce-high.html?utm_content=buffer68f2b&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer
(I like this older article) 6/11/2014--Roche doesn’t need to choose between the Genia instrument and PacBio’s, he says.
“It’s very possible the two platforms will complement each other,” Zabrowski says. PacBio is developing a new sequencer for Roche’s exclusive use in human genetic analysis, he says. “What we’ll see going forward is what are the best applications to be used with these technologies.”
PacBio remains free to sell its sequencing system already on the market, the PacBio RS II, which has developed a following among researchers seeking highly accurate sequences, or specializing in microbial genomes. PacBio’s equipment identifies the letters in a DNA sequence with light sensors, like traditional sequencers.
But both the Genia and PacBio systems can read the sequence of a single piece of DNA—a method that can increase the accuracy of results compared with most sequencers now on the market"---------- "Illumina, the market leader, said early this year that it had reached that $1,000 milestone. But Illumina’s low cost per genome can only be realized by organizations equipped with a $10 million bank of 10 Illumina sequencers working full tilt to scan thousands of genomes—thus benefitting from economies of scale." http://www.xconomy.com/san-francisco/2014/06/11/genia-sold-roche-to-incubate-another-novel-dna-sequencing-system/
Posted on September 18, 2014 by Marc Robinson-Rechavi
A new adventure: PacBio sequencing and RNA-seq in the classroom
This class has been running since 2010-11, on the following principle:
• autumn semester: isolate bacterial DNA, sequence using Illumina, assemble;
• spring semester: close assembly gaps, annotate genome.
As anyone following genomics knows, the times they are a’changing again and again, so this is less and less state-of-the-art. So we have decided to try a new course plan this year, taking advantage of the progress in bacterial genome sequencing with long PacBio reads.
Our new principle is, hopefully:
• autumn semester: isolate bacterial DNA, sequence using PacBio, assembly trivial, annotate genome;
• spring semester: RNA-seq under 2 growth conditions, experiments, Illumina sequencing and bioinformatics analysis.
“Hopefully” because PacBio on bacteria is not yet routine, depending on the genus and the growth conditions. We are thus trying two different bacteria, a Pseudomonas which has a cool story for the RNA-seq part, and a Caulobacter which has been shown to work with PacBio. Preliminary studies on the Pseudomonas are somewhat discouraging for the PacBio sequencing, but we will still try, with adaptations of the protocol. We will also keep the possibility of reverting to Illumina sequencing plus assembly, but we would like to avoid that (if Caulobacter is plan B, this is plan C).
And of course, we have never done RNA-seq with master students, so this year will be a new adventure, comparable to our first course in 2010. Stressful and exciting.
One Response to A new adventure: PacBio sequencing and RNA-seq in the classroom
Winship Herr says:
September 18, 2014 at 14:31
This is a super cool course. My first Master First-step project was to sequence 100 nucleotides of the lac operon
http://www3.unil.ch/wpmu/sequenceagenome/2014/09/18/a-new-adventure-pacbio-sequencing-and-rna-seq-in-the-classroom/#comments
Senior Software Engineer
Pacific Biosciences Menlo Park, CA.--- Job Description
Do you want to use your C++ software engineering talent to solve problems of real scientific and medical importance? Are you bored with the idea of simply contributing to yet another social networking site or fixing bugs in some large company’s search engine? Do you have a desire to be part of a company creating cutting edge technology that is uncovering the mysteries of life itself? Do you want to work at a company that uses robots and lasers and nanotechnology to see a single DNA molecule sequence in real-time? Pacific Biosciences is seeking a talented C++ software engineer to build automated test systems, infrastructure code and optimized algorithm implementations, including well designed APIs for a complex system that produces SMRT (Single Molecule, Real Time) sequencing data which addresses a myriad of diverse scientific application areas. The candidate will develop robust, reliable and performant software infrastructure components, documentation and tests that enable our team to rapidly create a diversity of software solutions.
Responsibilities:
• Work in an exciting multi-disciplinary organization of software and hardware engineers, bioinformaticians, chemists, and molecular biologists developing state-of-the-art, single-molecule, genomic analysis systems.
• Design and write automated functional and unit tests and support developers to troubleshoot system issues or assist in system-level integration.
• Create a variety of infrastructure components that will be the foundational software layers used to build advanced analysis software.
• Quickly identify solutions to complex data processing and automation problems for use in production instrument software and internal tools; provide methods or prototypes for concept evaluation.
• Develop, integrate and test analysis pipeline components for deployment in production software; develop optimized implementations to maximize performance and throughput on available hardware.
• Write design and functional specifications as well as test plans for peer review; maintain software development practices adhering to company standards for coding and unit/functional test coverage.
Skills & Requirements
• M.S. in computer science, electrical engineering or in physical science disciplines; candidates with a similar B.S. degree and a high level of relevant experience will also be considered.
• An expert in C++ with at least 5 years of implementation experience on Linux.
• At least 5 years of experience developing reusable components, automated tests, utilizing formal software development processes, employing best practices.
• Experience maintaining software projects under source control with p4, git, svn, or similar.
• Extensive experience with multithreading programming and optimization techniques including SSE/SIMD vector processing instructions is required.
• Experience with the Intel Xeon Phi computing platform is a plus.
• Understanding of FDA compliance regulations a plus.
• Preference is shown to candidates with strong analytical and development skills who demonstrate the capability to bring solutions beyond the prototype stage for deployment in performance-critical production software.
http://careers.stackoverflow.com/jobs/68100/senior-software-engineer-pacific-biosciences
September 11, 2014
Reversible Positioning of Single Molecules inside Zero-Mode Waveguides
Chemistry/Chemical Biology, Northeastern University, 110 Forsyth Street, Boston, Massachusetts 02115, United States
§ Pacific Biosciences, 1380 Willow Road, Menlo Park, California 94025, United States
Nano Lett., Article ASAP
DOI: 10.1021/nl503134x
Publication Date (Web): September 11, 2014
Copyright © 2014 American Chemical Society
*E-mail: wanunu@neu.edu. Fax: (617) 373 2943.
ACS AuthorChoice + 12, Open Access on 09/11/2015
We have developed a hybrid nanopore/zero-mode waveguide device for single-molecule fluorescence and DNA sequencing applications. The device is a freestanding solid-state membrane with sub-5 nm nanopores that reversibly delivers individual biomolecules to the base of 70 nm diameter waveguides for interrogation. Rapid and reversible molecular loading is achieved by controlling the voltage across the device. Using this device we demonstrate protein and DNA loading with efficiency that is orders of magnitude higher than diffusion-based molecular loading.
Keywords: SMRT-sequencing; DNA sequencing; single molecule; nanophotonics; zeptoliter--- http://pubs.acs.org/doi/abs/10.1021/nl503134x?journalCode=nalefd
Single molecule sequencing and genome assembly of a clinical specimen of Loa loa, the causative agent of loiasis
Luke J Tallon, Xinyue Liu, Sasisekhar Bennuru, Marcus C Chibucos, Alvaro Godinez, Sandra Ott, Xuechu Zhao, Lisa Sadzewicz, Claire M Fraser, Thomas B Nutman and Julie C Dunning Hotopp
BMC Genomics 2014, 15:788 doi:10.1186/1471-2164-15-788
Published: 12 September 2014
Abstract (provisional)
Background
More than 20% of the world's population is at risk for infection by filarial nematodes and >180 million people worldwide are already infected. Along with infection comes significant morbidity that has a socioeconomic impact. The eight filarial nematodes that infect humans are Wuchereria bancrofti, Brugia malayi, Brugia timori, Onchocerca volvulus, Loa loa, Mansonella perstans, Mansonella streptocerca, and Mansonella ozzardi, of which three have published draft genome sequences. Since all have humans as the definitive host, standard avenues of research that rely on culturing and genetics have often not been possible. Therefore, genome sequencing provides an important window into understanding the biology of these parasites. The need for large amounts of high quality genomic DNA from homozygous, inbred lines; the availability of only short sequence reads from next-generation sequencing platforms at a reasonable expense; and the lack of random large insert libraries has limited our ability to generate high quality genome sequences for these parasites. However, the Pacific Biosciences single molecule, real-time sequencing platform holds great promise in reducing input amounts and generating sufficiently long sequences that bypass the need for large insert paired libraries.
Results
Here, we report on efforts to generate a more complete genome assembly for L. loa using genetically heterogeneous DNA isolated from a single clinical sample and sequenced on the Pacific Biosciences platform. To obtain the best assembly, numerous assemblers and sequencing datasets were analyzed, combined, and compared. Quiver-informed trimming of an assembly of only Pacific Biosciences reads by HGAP2 was selected as the final assembly of 96.4 Mbp in 2,250 contigs. This results in ~9% more of the genome in ~85% fewer contigs from ~80% less starting material at a fraction of the cost of previous Roche 454-based sequencing efforts.
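As a rough sanity check on these figures, the arithmetic can be sketched as below. The variable names are hypothetical, and the implied contig count of the earlier assembly is only a back-calculation from the approximate "~85% fewer contigs" statement, not a number reported in the abstract.

```python
# Figures quoted in the abstract.
assembly_bp = 96.4e6        # 96.4 Mbp final assembly
contig_count = 2250         # 2,250 contigs

# Average contig length of the new assembly.
mean_contig_bp = assembly_bp / contig_count

# "~85% fewer contigs" means the new count is about 15% of the old one,
# so the earlier assembly would have had roughly contig_count / 0.15 contigs.
fewer_fraction = 0.85
implied_previous_contigs = contig_count / (1 - fewer_fraction)

print(f"mean contig length: about {mean_contig_bp:,.0f} bp")
print(f"implied contigs in earlier assembly: about {implied_previous_contigs:,.0f}")
```

Because the percentages in the abstract are rounded, the back-calculated count should be read as an order-of-magnitude estimate only.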
Conclusions
The result is the most complete filarial nematode assembly produced thus far and demonstrates the utility of single molecule sequencing on the Pacific Biosciences platform for genetically heterogeneous metazoan genomes.
The complete article is available as a provisional PDF. The fully formatted PDF and HTML versions are in production.
http://www.biomedcentral.com/1471-2164/15/788/abstract
Single Molecule Sequencing and Genome Assembly of a Clinical Specimen of Loa loa
Posted on September 12, 2014 by nsengamalay
Scientists Apply Successful Single Molecule Sequencing and de novo Genome Assembly to a Parasitic Worm that Infects Human Eyes and Skin
Investigators at the Institute for Genome Sciences (IGS) at the University of Maryland School of Medicine and the Laboratory of Parasitic Diseases at the National Institute of Allergy and Infectious Diseases (NIAID) at the National Institutes of Health (NIH) used the long-read, single-molecule Pacific Biosciences platform for the successful genome sequencing and de novo assembly of Loa loa round worms from a clinical sample. Their research, which generated the most complete genome sequence of a filarial nematode produced to date, provides a more comprehensive reference genome for this parasite in the hopes of developing better molecular diagnostics to decrease morbidity from filarial nematodes. Their findings appear in today’s issue of BMC Genomics.
http://www.igs.umaryland.edu/labs/grc/2014/09/12/single-molecule-sequencing-and-genome-assembly-of-a-clinical-specimen-of-loa-loa/?utm_content=bufferbf733&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer
Thursday, September 11, 2014-- The Rise of Long Reads: Mendelspod Podcast Series
Mendelspod host Theral Timpson kicked off a new podcast series this week on long-read sequencing that will include interviews with luminaries in the genomics field. Check out this introductory article from Timpson for an explanation of why scientists are demanding longer reads to meet their research goals.
The first interview is with Mike Snyder at Stanford, who has published recent papers in Nature Biotechnology and PNAS using Single Molecule, Real-Time (SMRT®) Sequencing for transcriptome analysis and demonstrated that long reads enable full coverage of RNA molecules. He discusses that work and his views on long-read sequencing and transcriptomics on the show. Here are some highlights:
On the state of transcriptomics
Without using long-read sequencing, the way transcriptomes are figured out is “crazy,” Snyder explains. “We take RNA, we blow it up into little fragments, and then we try and assemble them back together to see what the transcriptome looked like in the first place. And that’s a horrible way to do this because what we’re really trying to do is understand all of the different isoforms of a transcript….So when you blow them up and try to reassemble them back together you can’t always figure out which parts of the puzzle belong together.” (This reminds us of a clever cartoon in Nature Methods last year, subscription required.)
The power of long-read sequencing
By nature, long-read sequencing can avoid ‘blowing up’ the transcripts, because as Snyder has demonstrated in his studies, it is possible to generate full-length transcripts using SMRT Sequencing.
“The power of long-read sequencing is really to be able to capture all of the information in its intact form without trying to solve a jigsaw puzzle that you may have put together wrong.” Snyder explains that misassembling transcripts can make it impossible to understand what is going on. For example, different isoforms of the same tumor gene have very different functions and may be either healthy or oncogenic.
On the $1000 genome
“Has the race to the $1000 genome been at the expense of quality?” Timpson asked Snyder. “Yes,” he replied. “I think people’s eyes are opening to that.” Based on what is currently considered the “$1,000 Genome” (which Snyder points out cannot actually be commercially purchased today for $1,000), he says, “the quality is still not there, there’s still significant gaps.”
“People don’t realize this but there are still several hundred gaps in the human genome that have never been closed,” Snyder explains. And, ironically, in the process of attempting to fix these gaps in the reference genome, researchers end up uncovering more errors, “so the number of gaps in the human genome has stayed fairly constant over the last 10 years or so,” he adds.
Snyder believes that PacBio’s SMRT Sequencing provides one solution to this problem, by spanning gaps, resolving structural variation and providing the ‘gold standard’ in quality in sequencing today.
Stay tuned
This series will also include interviews with Eric Schadt, Gene Myers, and Dan Geraghty. We’ll be tuning in to hear what these great scientists have to say, and we hope you do, as well.
http://blog.pacificbiosciences.com/2014/09/the-rise-of-long-reads-mendelspod.html?utm_content=buffer1bda3&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer
A fault-tolerant method for HLA typing with PacBio data
Chia-Jung Chang, Pei-Lung Chen, Wei-Shiung Yang and Kun-Mao Chao
Published: 3 September 2014
Abstract (provisional)
Background
Human leukocyte antigen (HLA) genes are critical genes involved in important biomedical aspects, including organ transplantation, autoimmune diseases and infectious diseases. The gene family contains the most polymorphic genes in humans and the difference between two alleles is only a single base pair substitution in many cases. The next generation sequencing (NGS) technologies could be used for high throughput HLA typing but in silico methods are still needed to correctly assign the alleles of a sample. Computer scientists have developed such methods for various NGS platforms, such as Illumina, Roche 454 and Ion Torrent, based on the characteristics of the reads they generate. However, the method for PacBio reads was less addressed, probably owing to its high error rates. The PacBio system has the longest read length among available NGS platforms, and therefore is the only platform capable of having exon 2 and exon 3 of HLA genes on the same read to unequivocally solve the ambiguity problem caused by the "phasing" issue.
Results
We proposed a new method BayesTyping1 to assign HLA alleles for PacBio circular consensus sequencing reads using Bayes' theorem. The method was applied to simulated data of the three loci HLA-A, HLA-B and HLA-DRB1. The experimental results showed its capability to tolerate the disturbance of sequencing errors and external noise reads.
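For readers unfamiliar with the idea, the following is a toy sketch of how Bayes' theorem can rank candidate alleles given noisy reads. It is not the BayesTyping1 algorithm, whose details are not given in this abstract. Everything here is an assumption made for illustration: the two 10-base "alleles", the reads, the 2% per-base error rate, the uniform prior, the single-allele (haploid) simplification, and the function names `log_likelihood` and `posterior`.

```python
from math import exp, log

def log_likelihood(read: str, allele: str, error_rate: float) -> float:
    """Log-probability of observing `read` if it came from `allele`,
    assuming independent substitution errors spread over 3 wrong bases."""
    match = log(1 - error_rate)
    mismatch = log(error_rate / 3)
    return sum(match if r == a else mismatch for r, a in zip(read, allele))

def posterior(reads, alleles, error_rate=0.02):
    """Posterior probability of each allele under a uniform prior."""
    log_post = {
        name: sum(log_likelihood(r, seq, error_rate) for r in reads)
        for name, seq in alleles.items()
    }
    top = max(log_post.values())  # subtract the maximum for numerical stability
    weights = {name: exp(v - top) for name, v in log_post.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

# Hypothetical alleles that differ by a single base (index 7).
alleles = {"allele_1": "ACGTACGTAC", "allele_2": "ACGTACGAAC"}

# Three hypothetical reads; the last carries a sequencing error at index 3.
reads = ["ACGTACGTAC", "ACGTACGTAC", "ACGAACGTAC"]

result = posterior(reads, alleles)
best = max(result, key=result.get)
print(best, result)
```

The point of the sketch is that a single erroneous read lowers the likelihood of every candidate about equally, so the allele supported by most reads still dominates the posterior.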
Conclusions
The BayesTyping1 method could overcome the problems of HLA typing using PacBio reads, which mostly arise from sequencing errors of PacBio reads and the divergence of HLA genes, to some extent.
The complete article is available as a provisional PDF. The fully formatted PDF and HTML versions are in production.
http://www.biomedcentral.com/1471-2105/15/296/abstract?utm_content=bufferdf099&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer
Has the Race to the $1,000 Genome Proceeded at the Expense of Quality? New Series on The Rise of Long Read Sequencing
published by Theral Timpson on Wed, 09/03/2014 - 21:21
According to a 2010 article in Bio-IT World, the term $1,000 Genome has been around since 2001. The University of Wisconsin’s David Schwartz claims to have coined the term at an NHGRI retreat during a breakout session. Whatever its origin, the $1,000 Genome soon became the target for the rapid development of next-gen sequencing (NGS).
With Illumina, the dominant player in the NGS market, claiming this year that they’ve reached that target with their HiSeq X Ten system, it’s fair to stop and ask just what has been achieved. What do you get for that $1,000? And furthermore, where does NGS go from here?
Beginning next week, we're launching a new series, The Rise of Long Read Sequencing.
I first heard “long read” sequencing differentiated from “short read” in an interview with Mike Hunkapiller, CEO of Pacific Biosciences, last year. I had asked him the obvious question about how he expects to compete with Illumina, and he responded saying that “short read technologies” had serious drawbacks.
“Wait a minute,” I remember thinking at the time, “did Mike just dismiss Illumina’s technology outright? And what are these long reads he’s talking about?”
There’s no doubt that Illumina is a major success story. In the current edition of Forbes, Matthew Herper crowns Illumina with a glowing article, naming the rapid decrease in the price of sequencing after their CEO, “Flatley’s Law.” This is no small praise for Illumina’s Jay Flatley, who has led the company from a startup that used to offer oligos for $0.15/base to the dominant player in the sequencing space, now strongly poised as an upcoming contender in the clinical diagnostics industry.
But this is the story you’ll hear everywhere.
Less well known is the story of the turnabout of Pacific Biosciences and the rise of long read sequencing. PacBio had a much touted beginning, raising north of $600 million. But they disappointed the industry by not delivering on some early hype that they could compete with Illumina on throughput by sequencing a human genome in fifteen minutes. In fact, PacBio not only failed to improve on Illumina’s high throughput, but their technology also had an unattractively high error rate of 15%. And to top that, their machine was more expensive.
However, for over a year now, we’ve been following an emerging trend among researchers toward the use of PacBio’s long reads to do not only de novo sequencing, but to probe areas of the human genome that have defied short read technologies. From better characterization of RNA isoforms to raising the quality of the human reference genome, more and more papers are published touting the new possibilities of PacBio's long reads.
There’s also now some data coming from Oxford Nanopore’s new MinION that is exciting the first round of users. This is long read data. In addition, I recently toured Genia Technologies’ facility in Mountain View and was shown their new sequencer now in alpha testing. Genia’s CEO, Stefan Roever, says their new chip will read over a million long reads per run.
Once you have long reads and high throughput, is there any use for short read technology? I asked Stefan. “Not really,” he confirmed.
To chronicle the rise of long reads, we went to PacBio and asked them if they’d introduce us to some of their users and sponsor a series on the topic. They did.
Take the story of Gene Myers, for instance. Gene helped develop the BLAST algorithm for sequence alignment back in the 90s and later worked on sequencing the human genome at Celera. Then he got out of sequencing to pursue “more interesting science.” He thought that the future of sequencing was pretty straightforward and not that provocative for a scientist.
“Everything basically went short because that’s where you could get the reduction in cost,” says Myers in our upcoming interview. “Today everyone does it routinely but I don’t think they should be. . . . They’re using 100 bp reads, and the assemblies are crappy,” Gene says.
Gene is now back into sequencing, working at the Max Planck Institute in Germany. And he’s very excited about long reads. He says that for the first time ever it is theoretically possible to get to 100% accuracy with PacBio’s technology.
Wait a minute. What about PacBio’s terrible accuracy rate?
It turns out that even though the error rate of the PacBio SMRT system was quite high, the errors were random. So if you stacked the sequences deep enough, you could greatly improve the accuracy.
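A rough way to see why random errors wash out with depth is a simple majority-vote model, sketched below. This is an illustration only, not PacBio's actual consensus algorithm. It assumes errors are independent between reads and, pessimistically, that every erroneous read agrees on the same wrong base, so the result is an upper bound on the consensus error at one position. The function name is hypothetical.

```python
from math import comb


def consensus_error_upper_bound(per_read_error, depth):
    """Probability that more than half of `depth` independent reads are wrong.

    Assumes an odd depth (no ties) and independent, random errors.
    """
    if depth < 1 or depth % 2 == 0:
        raise ValueError("depth must be a positive odd number")
    majority = depth // 2 + 1
    return sum(
        comb(depth, k)
        * per_read_error ** k
        * (1.0 - per_read_error) ** (depth - k)
        for k in range(majority, depth + 1)
    )


if __name__ == "__main__":
    # 15% is the per-read error rate quoted in the article.
    for depth in (1, 3, 15, 31):
        bound = consensus_error_upper_bound(0.15, depth)
        print(f"depth {depth:>2}: consensus error <= {bound:.3g}")
```

With a 15% per-read error rate, three reads already bring the bound down to about 6%, and it keeps shrinking rapidly as depth grows. Systematic (non-random) errors would not cancel this way, which is why the randomness of the errors matters.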
We asked Gene how it is that the industry has bought into short read technology for so long.
“I think it’s because they weren’t offered anything else. It’s what you got,” says Myers.
We start off the series with Mike Snyder from Stanford who explains how PacBio’s long read technology has opened up his research into the transcriptome. Often there are various RNA isoforms that are hard to analyze with Illumina’s short read technology, Mike says. He’s recently published a couple papers showing that with PacBio’s long reads he is able to completely cover the full-length RNA molecules, thereby characterizing areas that previously have not been annotated.
After that we’ll be talking with the former CSO of PacBio, Eric Schadt, now at the Icahn Institute at Mt. Sinai in New York. In his current job he’s working to bring sequencing to the clinic and says that the PacBio long reads are very important for getting a better picture of the genome. From Eric's interview:
“In order to drive the throughput super high, we’ve been ignoring a lot of the structural features in the genome that are as important as some of the single nucleotide hits, whether it’s long tandem repeats that vary, or bigger structural variations, or focal variants that are important in cancer--those things are difficult to characterize unambiguously with the current short read technology. [Short reads] were attuned to certain problems and had certain advantages that enabled this big advance, but they are absolutely not hitting the entire problem like we need [to] hit.”
In addition to improving our understanding of the transcriptome and structural variation of the genome, the long read technology is helping us nail down that troublesome area of the genome known as the HLA region. This is a region that holds much promise for biomedical research because not only has it defied easy characterization, it just happens to be connected to many of the common diseases we have.
Dan Geraghty has been sequencing the HLA region for many years. Some of his work was used in the original Human Genome Project. Dan says that long read sequencing is a game changer.
“Long reads is the NGS story of the year,” he told me in our pre-interview chat.
For now this long read story is pretty much owned by PacBio. But all of these researchers say they are platform agnostic and are happy to see new technologies on the horizon that are promising long reads. There’s Oxford Nanopore and Genia and others, including Nabsys who we’ve profiled here as well. Illumina offers their Moleculo technology which assembles long reads from shorter reads, but not many have seen the datasets or other details about this technology.
So what does this mean for the future of NGS? Do long reads open up vast new territories in genomics that have yet to be discovered or are they just a nice bonus? We’ll be pursuing these questions with other guests as well, including upcoming chats with Shawn Baker, CSO of the sequencing marketplace, Allseq, and with George Church of Harvard.
http://mendelspod.com/blog/has-race-1000-genome-proceeded-expense-quality-new-series-rise-long-read-sequencing
Mon, 08/25/2014 - 23:32hrs--- Workshop Summary and Slides from August 2014. -- Thank you to all of you who participated in our GIAB workshop August 14-15. It was great to have discussions with many of you. Below is a summary of talks and discussions from the workshop. Most of the slides are available on slideshare at http://www.slideshare.net/GenomeInABottle, and some additional slides may be posted in the next couple weeks. We plan to hold the next GIAB Consortium workshop Jan 29-30, 2015 at Stanford University, and we hope to see many of you there then!
NIST gave an update on the Consortium’s progress since the last workshop as well as its plans for releasing RMs and data over the next year (http://www.slideshare.net/GenomeInABottle/aug2014-giab-intro-slides; http://www.slideshare.net/GenomeInABottle/aug2014-giab-status-update-and...). A rough timeline is available in the slides at http://www.slideshare.net/GenomeInABottle/aug2014-nist-rm-development-plans. The most immediate plans are to release the pilot NIST RM8398 (based on NA12878) by the end of 2014, and to release preliminary SNP and indel calls and ~100x PacBio sequencing for the PGP Ashkenazim trio in Q1 2015.
Don Baldwin gave an update on the proposed ABRF interlaboratory NGS sequencing study. This study will use the Ashkenazim trio, as well as several mixed samples to test somatic mutation detection (http://www.slideshare.net/GenomeInABottle/aug2014-abrf-interlaboratory-s...).
Several groups gave examples of how they are already using the NIST high-confidence SNP, indel, and homozygous reference calls for benchmarking sequencing and bioinformatics methods (http://www.slideshare.net/GenomeInABottle/aug2014-use-cases-combined).
RM Selection & Design Working Group (http://www.slideshare.net/GenomeInABottle/aug2014-working-group-report-r...)
Horizon Diagnostics presented about their engineered cell lines with ~40 cancer-relevant mutations, including one translocation that is present in DNA and RNA. They also embed mixed cells in FFPE.
Acrometrix presented about their control material containing synthetic DNA spiked into DNA from the cell line of the Ashkenazim son, which will be NIST RM 8391. The Acrometrix control contains 504 SNVs, 2 MNVs, and 49 indels of lengths ranging from 1 to 41 nucleotides (29 deletions, 19 insertions, and one complex indel). Data from a multi-site study demonstrated differences in performance between sites using the same control material.
Translocations were discussed as a challenging problem for performance assessment, but we decided to postpone until we better understand the utility of synthetic spike-ins with SNVs and indels.
To use existing RMs for cancer applications, they may need to be characterized for lower frequency mutations, which would require very high read depths at least in some locations. This may need to be done on the actual RM batch of DNA since new somatic mutations can accumulate in the cell lines.
For additional families to be made into RMs, it was stated that diverse ancestries are desired, especially an admixed Hispanic family and African ancestry. This may help avoid over-fitting pipelines and will be useful for mixing experiments.
This group also discussed a potential interlaboratory study to assess whether synthetic controls are good surrogates for DNA isolated from a clinical sample. Several members plan to develop a strawman proposal to distribute to the consortium for comments.
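The point above about lower-frequency mutations requiring very high read depths can be illustrated with a simple binomial model. This is a sketch under stated assumptions, not a method from the workshop. It ignores sequencing errors and mapping bias, treats each read as an independent draw, and uses a hypothetical calling rule of "at least `min_alt_reads` reads support the variant". The function names and the default thresholds are illustrative.

```python
from math import comb


def detection_probability(depth, allele_fraction, min_alt_reads):
    """P(at least min_alt_reads variant reads) among `depth` independent reads."""
    p_too_few = sum(
        comb(depth, i)
        * allele_fraction ** i
        * (1.0 - allele_fraction) ** (depth - i)
        for i in range(min_alt_reads)
    )
    return 1.0 - p_too_few


def min_depth_for_detection(allele_fraction, min_alt_reads=5, target=0.95,
                            max_depth=100_000):
    """Smallest depth at which the detection probability reaches `target`."""
    for depth in range(min_alt_reads, max_depth + 1):
        if detection_probability(depth, allele_fraction, min_alt_reads) >= target:
            return depth
    raise ValueError("target not reachable within max_depth")


if __name__ == "__main__":
    for fraction in (0.5, 0.1, 0.01):
        print(fraction, min_depth_for_detection(fraction))
```

Under this model the required depth grows roughly in inverse proportion to the allele fraction, so a mutation present in 1% of the DNA needs on the order of ten times the depth of one present in 10%.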
Characterization/Bioinformatics Working Groups (http://www.slideshare.net/GenomeInABottle/aug2014-working-group-report-c...)
In the last workshop, it was recommended that GIAB data be submitted to the SRA first and then NCBI will add them to the GIAB ftp site. However, since most GIAB members have not submitted to the SRA, barriers to submitting data to SRA were discussed. Generally, experienced submitters expressed that they find the process straightforward, but the SRA could probably have better instructions for first-time submitters. Any specific suggestions for improving submission instructions are welcome.
In April 2014, NIST released v.0.2 of its calls that integrated the phased pedigree calls from Real Time Genomics and Illumina Platinum Genomes. NIST has generally received positive feedback about these calls, but some suggestions were made at the workshop that may refine them in certain cases.
NIST is reimplementing a new version of its multi-dataset integration methods that will be on the cloud and more easily run by other interested groups. While the concepts behind this new version are similar to the Nature Biotechnology paper from Feb 2014, it is being redesigned to be more flexible to better take advantage of knowledge of strengths and biases of each sequencing method. More details about the proposed methods are available at http://www.slideshare.net/GenomeInABottle/aug2014-nist-integration-plans.
The PGP trios will be characterized by a number of long-read technologies to help characterize more difficult variants and parts of the genome. Specifically, NIST and Mt Sinai will be performing a total of ~100x PacBio sequencing of the Ashkenazim trio (~60x of the son and ~20x of each parent). Mt Sinai has developed methods to perform SV calling and de novo assembly with PacBio reads alone and in combination with BioNano Genomics, which will be submitted for publication soon. BioNano Genomics has already mapped the Asian son and plans to measure the Ashkenazim trio as well. Illumina is preparing Moleculo-assembled long-read libraries for the Ashkenazim and Asian sons, which will be sequenced at NIST and will result in ~10x coverage of the genome by long reads. Complete Genomics is also preparing LFR libraries for the PGP trios. Short read sequencing of the PGP trios by Complete Genomics, Illumina, SOLiD, and Ion is already finished or will be completed soon and will be uploaded to the GIAB ftp site. Everyone is welcome to help analyze these data, particularly those from the new long-read technologies, to help detect variants in more difficult parts of the genome and SVs.
Several groups are working on developing SV calls for GIAB genomes. Personalis has used the NA12878 pedigree to find large deletions that are breakpoint resolved and in multiple members of the pedigree, and NIST has developed methods to look for evidence of candidate SVs in bam files (http://www.slideshare.net/GenomeInABottle/aug2014-nist-structural-varian...). In addition, Spiral Genetics has developed calls for NA12878 using its unique anchored assembly approach (http://www.slideshare.net/GenomeInABottle/aug2014-spiral-genetics-anchor...). Harvard School of Public Health and others have collaborated to develop SV integration and comparison methods as well, including using a set of high-confidence SVs developed by LUMPY (http://bcbio.wordpress.com/2014/08/12/validated-whole-genome-structural-...).
Transitioning to GRCh38 was also discussed, since having high-confidence calls for this new reference assembly will help labs transition to using it. Taking full advantage of GRCh38 requires tools that can use the alternate haplotypes, and Deanna Church will update the group on development of these tools after a meeting this fall. In the meantime, it was suggested that we remap all of the existing data to GRCh38, including the alternate haplotypes, but then exclude as uncertain any regions in the reference that have corresponding sequence in an alternate haplotype that does not exactly match the reference.
Performance Metrics Working Group (http://www.slideshare.net/GenomeInABottle/aug2014-working-group-report-p...)
The Performance Metrics Specification draft posted on the blog earlier this year was discussed, and edits were suggested at https://docs.google.com/document/d/1g8Q-6aunFmyyeS_L3xmRMqUT8PmWdIiCrzdE.... It was suggested that use cases, inputs, and outputs need to be clarified further. It was emphasized that a web-based interface for variant comparison will be very helpful for many labs. It would be useful to incorporate the comparison tool into the GeT-RM browser and the GCAT website and both websites have expressed interest in doing this when a comparison tool is developed.
The Global Alliance for Genomics and Health (GA4GH) has very recently formed a Benchmarking working group to develop standardized methods and metrics for benchmarking variant calls. Julie Newcomb from UC Berkeley discussed the SMaSH benchmarking tool developed by their group, which is likely going to be adopted and further developed by GA4GH. The Performance Metrics group decided to continue working with GA4GH and SMaSH and ensure that the performance assessment performed by SMaSH will be useful to clinical labs.
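As a rough illustration of what benchmarking variant calls against a high-confidence set involves, the sketch below compares two call sets by exact match and reports precision and recall. It is not SMaSH or any GIAB tool. It assumes variants are already normalized to the same representation and restricted to high-confidence regions, both of which are hard problems in practice. The function name and the toy variants are hypothetical.

```python
def benchmark_calls(truth_calls, query_calls):
    """Compare two sets of (chrom, pos, ref, alt) tuples by exact match."""
    truth = set(truth_calls)
    query = set(query_calls)
    true_pos = len(truth & query)    # called and in the truth set
    false_pos = len(query - truth)   # called but not in the truth set
    false_neg = len(truth - query)   # in the truth set but missed
    called = true_pos + false_pos
    expected = true_pos + false_neg
    return {
        "tp": true_pos,
        "fp": false_pos,
        "fn": false_neg,
        "precision": true_pos / called if called else 0.0,
        "recall": true_pos / expected if expected else 0.0,
    }


# Hypothetical toy data: three truth variants, three calls, two in common.
toy_truth = [("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"),
             ("chr2", 40, "G", "GA")]
toy_query = [("chr1", 100, "A", "G"), ("chr2", 40, "G", "GA"),
             ("chr3", 7, "T", "C")]
toy_metrics = benchmark_calls(toy_truth, toy_query)
```

Exact matching undercounts agreement when the same variant is written in two different ways, which is one reason standardized comparison methods are being developed.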
http://www.genomeinabottle.org/blog-entry/workshop-summary-and-slides-august-2014 /// http://www.bio-itworld.com/2014/8/25/updates-genome-bottle-consortium.html
Published: 21 August 2014
Performance comparison of second- and third-generation sequencers using a bacterial genome with two chromosomes
Abstract (provisional)
Background
The availability of diverse second- and third-generation sequencing technologies enables the rapid determination of the sequences of bacterial genomes. However, identifying the sequencing technology most suitable for producing a finished genome with multiple chromosomes remains a challenge. We evaluated the abilities of the following three second-generation sequencers: Roche 454 GS Junior (GS Jr), Life Technologies Ion PGM (Ion PGM), and Illumina MiSeq (MiSeq) and a third-generation sequencer, the Pacific Biosciences RS sequencer (PacBio), by sequencing and assembling the genome of Vibrio parahaemolyticus, a 5-Mb genome comprising two circular chromosomes.
Results
We sequenced the genome of V. parahaemolyticus with GS Jr, Ion PGM, MiSeq, and PacBio and performed de novo assembly with several genome assemblers. Although GS Jr generated the longest mean read length of 418 bp among the second-generation sequencers, the maximum contig length of the best assembly from GS Jr was 165 kbp, and the number of contigs was 309. Single runs of Ion PGM and MiSeq produced data of considerably greater sequencing coverage, 279x and 1927x, respectively. The optimized result for Ion PGM contained 61 contigs assembled from reads of 77x coverage, and the longest contig was 895 kbp in size. Those for MiSeq were 34 contigs, 58x coverage, and 733 kbp, respectively. These results suggest that higher coverage depth is unnecessary for a better assembly result. We observed that multiple rRNA coding regions were fragmented in the assemblies from the second-generation sequencers, whereas PacBio generated two exceptionally long contigs of 3,288,561 and 1,875,537 bp, each of which was from a single chromosome, with 73x coverage and mean read length 3,119 bp, allowing us to determine the absolute positions of all rRNA operons.
Conclusions
PacBio outperformed the other sequencers in terms of the length of contigs and reconstructed the greatest portion of the genome, achieving a genome assembly of "finished grade" because of its long reads. It showed the potential to assemble more complex genomes with multiple chromosomes containing more repetitive sequences.
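The coverage figures in the abstract follow the usual definition, coverage = total sequenced bases / genome size. The sketch below applies that definition to the PacBio numbers quoted above. It assumes the genome size equals the sum of the two PacBio contigs, and the read count it derives is a back-of-the-envelope estimate, not a figure reported in the paper. The function names are hypothetical.

```python
def coverage(num_reads, mean_read_length, genome_size):
    """Average sequencing depth: total bases divided by genome size."""
    return num_reads * mean_read_length / genome_size


def reads_for_coverage(target_coverage, mean_read_length, genome_size):
    """Number of reads needed to reach a target average depth."""
    return target_coverage * genome_size / mean_read_length


# Contig lengths, coverage and mean read length as quoted in the abstract.
chromosome_lengths = (3_288_561, 1_875_537)
genome_size = sum(chromosome_lengths)  # assumption: contigs span the genome
estimated_reads = reads_for_coverage(73, 3_119, genome_size)

if __name__ == "__main__":
    print(f"genome size: {genome_size} bp")
    print(f"estimated reads at 73x: {estimated_reads:.0f}")
```

The same relation explains why the short-read platforms needed many more reads per unit of coverage: at a mean read length of a few hundred bp, each read contributes far fewer bases.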
The complete article is available as a provisional PDF. The fully formatted PDF and HTML versions are in production.
http://www.biomedcentral.com/1471-2164/15/699/abstract