LT.Swing trade!
PacBio CEO Says Roche Deal Validates Technology, Will Lead to Improvements for Research Customers
September 30, 2013
http://www.genomeweb.com/sequencing/pacbio-ceo-says-roche-deal-validates-technology-will-lead-improvements-research?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+genomeweb%2Finsequence+%28In+Sequence%29
("Pacific Biosciences’ SMRT technology is currently the only technique with a read length of 5,000 bases on average at an accuracy above 98%") http://www.eurobiotechnews.eu/news/news/2013-04/roche-and-pacbio-in-us75m-deal.html
GenomeWeb-September 24, 2013.NEB Employs PacBio to Study Bacterial Methylomes; Working on Optimizing Reagents for 5-mC Detection
By Monica Heger
(Monica Heger tracks trends in next-generation sequencing for research and clinical applications for GenomeWeb's In Sequence and Clinical Sequencing News. Sorry, full-text access is for premium subscribers only.)
http://www.genomeweb.com/sequencing/neb-employs-pacbio-study-bacterial-methylomes-working-optimizing-reagents-5-mc-d
September 21st, 2013 End of Short-Read Era? – (Part I)
I will go out on a limb and make a bold call. The world of genomics is on the verge of seeing another set of major transformations, and many algorithms, tools, pipelines and methodologies developed for short reads over the last 3-4 years will be useless. In my opinion, the era of short-read sequencing is reaching a peak, or to be kind to its users, short read technologies are shining like the full moon. Related to the peaking of the short read era, we will see two other changes – (i) end of “genome sequencing and genome paper” era and (ii) end of big data bioinformatics. Both are explained in detail in the later part of the commentary.
I spent the last two days in California attending a user group meeting of Pacific Biosciences and am absolutely fascinated by the talks and discussions. You may say that the views expressed here are colored by many PacBio-only talks, and that is true to some extent. However, I went into the meeting with a healthy dose of skepticism, even though I had heard many positive opinions on PacBio technology for months. I have known Dr. Jason Chin, the author of the HGAP paper and a leading bioinformatician at PacBio, for many years. He is a very smart ex-physicist and was my collaborator on two important papers. Over the last eighteen months, he had been telling me that I should abandon trying to assemble genomes from short reads. Despite knowing him for a long time, I assumed that he was biased and was trying to oversell his company’s technology. Now I am embarrassed, because his description of the technology appears to be quite accurate, if not a modest description of the range of possibilities.
Cost per Base is a Superficial Measure. Cost per Information Content is More ‘Informative’
A few weeks back, Sergey Koren posted a paper titled “Reducing assembly complexity of microbial genomes with single-molecule sequencing” on arXiv. The paper discussed how PacBio helped in finishing a number of bacterial genomes.
BioMickWatson responded with “Bacterial genomes – 2nd and 3rd generation costs” and came to the conclusion:
My conservative estimate is that PacBio is about 10 times more expensive per sample for bacterial genomes than Illumina, and in reality it is probably higher.
His comment about expensive versus cheap boiled down to how many nucleotides one can get for a fixed amount of money, which I believe is a superficial way of seeing things. Think about it this way. We know that PacBio reads are 85% accurate. Therefore, only 85 out of 100 nucleotides of PacBio reads are informative. To zeroth order, that should make the PacBio reads 15% less useful than a simple nucleotide cost comparison would suggest.
But why stop at the zeroth order? To first order, we know that PacBio reads are about two orders of magnitude longer than the Illumina reads. Positional data provided by long reads is another form of information. In this comparison PacBio wins, especially after RS II and BluePippin size selection. So, we get two orders of magnitude of information gain with PacBio for one order of magnitude of extra cost as estimated by BioMickWatson.
Then again, others would argue that Illumina mate pair reads provide positional information too, and that point is not lost on BGI or the authors of the SPAdes assembler. BGI, for example, recently published the tiger genome paper, where they pushed the mate pair insert size to 20 kb.
Libraries for the Amur tiger genome were constructed at BGI, Shenzhen, and the insert sizes of the libraries were 170 bp, 500 bp, 800 bp, 2 kb, 5 kb, 10 kb and 20 kb.
We also need to factor in the bioinformatician’s time needed to scaffold longer mate pairs. The assembly process is not fully automated and can add errors here and there (Assemblathon 2 anyone?). Sorting out those errors increases the cost of assembly, which a pure sequencing cost does not include.
Taking all those points into account, it is not at all easy to estimate the cost per information content, but that does not mean we need to resort to superficial measures like cost per nucleotide.
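The back-of-the-envelope reasoning above can be sketched in a few lines of Python. The inputs are the rough figures quoted in the text (85% accuracy, reads about 100 times longer, about 10 times the cost per sample). The function name and the choice to simply multiply the factors together are illustrative assumptions, not an established measure of information content.

```python
def relative_information_per_dollar(accuracy, length_gain, cost_ratio):
    """Rough 'information per dollar' of long reads relative to short reads.

    accuracy    -- fraction of bases assumed informative (0.85 in the text)
    length_gain -- how many times longer the long reads are (1 ignores length)
    cost_ratio  -- how many times more the long reads cost per sample
    Multiplying these factors together is an illustrative assumption.
    """
    return accuracy * length_gain / cost_ratio


# Zeroth order: only accuracy and price matter, read length is ignored.
zeroth_order = relative_information_per_dollar(0.85, 1, 10)

# First order: also credit the ~100x longer reads as positional information.
first_order = relative_information_per_dollar(0.85, 100, 10)

print(f"zeroth order: {zeroth_order:.3f}x the information per dollar")
print(f"first order:  {first_order:.1f}x the information per dollar")
```

Under these assumptions the zeroth-order view puts PacBio well below parity and the first-order view puts it well above. Mate-pair libraries and the bioinformatician's time, both discussed above, are not modeled here.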
Genome Assembly
Is there any reasonable way to compare cost per information content for two technologies? I would argue that complete genome assembly is one such problem. A complete genome is a type of information that biologists consider important. Both Illumina and PacBio technologies are attempting to provide that information through two different routes.
You can see that even though short reads are clean themselves, the read size itself is a form of noise or loss of information. That loss of information manifests itself in the assembly quality of the genome.
How about relative costs? Alex Copeland of JGI gave a fantastic talk, where he discussed how the assembly costs and qualities of microbial genomes changed over the years.
2002-2006 – $50K/genome, 49 contigs (Sanger)
2006-2008 – $35K/genome, 22 contigs (Sanger)
2008-2011 – $10K/genome, 44 contigs (Sanger)
2011-2013 – $1.5K-$3K/genome, 69 contigs (switch to Illumina)
2011-2013 – $5K/genome, 6 contigs (PacBio)
Based on his numbers, the switch to PacBio increased his costs by about a factor of 2, but led to a big improvement in the quality of the assembly. In the microbial world, PacBio has become a winner.
Then I heard the same about the fungal genomes, where JGI managed to assemble complete chromosomes with PacBio. Chongyuan Luo of Salk Institute presented on high quality assembly of various Arabidopsis strains (Col-0, Ler-0, cvi) from PacBio only. HGAP was quite successful there as well (access data here). Given that the Arabidopsis genome is 120 Mb long, I do not see why assembling smaller vertebrate genomes from PacBio only would pose any difficulty. The HGAP paper was published only this year, and nobody has attempted the method on larger genomes only due to lack of time.
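As a quick check on that "factor of 2", the 2011-2013 figures listed above can be compared directly. This is a minimal Python sketch; the variable and function names are my own, and the ratios are only as good as the rounded numbers quoted from the talk.

```python
# 2011-2013 figures as quoted from the JGI talk (US dollars per genome).
ILLUMINA = {"cost_low": 1500, "cost_high": 3000, "contigs": 69}
PACBIO = {"cost": 5000, "contigs": 6}


def cost_ratio_range(pacbio, illumina):
    """Return (lowest, highest) PacBio/Illumina cost ratio."""
    lowest = pacbio["cost"] / illumina["cost_high"]
    highest = pacbio["cost"] / illumina["cost_low"]
    return lowest, highest


def contig_reduction(pacbio, illumina):
    """Return how many times fewer contigs the PacBio assemblies have."""
    return illumina["contigs"] / pacbio["contigs"]


low, high = cost_ratio_range(PACBIO, ILLUMINA)
fewer = contig_reduction(PACBIO, ILLUMINA)

print(f"cost ratio: {low:.1f}x to {high:.1f}x")
print(f"contigs reduced by: {fewer:.1f}x")
```

The cost ratio comes out between roughly 1.7 and 3.3, depending on which end of the Illumina price range is used, while the contig count drops by a factor of 11.5.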
There will be no Assemblathon 3
As I explained earlier, short read sequences are clean themselves, but suffer a severe loss of information through read sizes. That loss of information manifests itself in the genome assembly. Assemblathon 1 and 2 were the genomics community's attempts to estimate that ‘noise’ in genome assembly.
Given that long PacBio reads have already removed the assembly noise from genomes as large as Arabidopsis, I do not see what the purpose of Assemblathon 3 would be. Will Assemblathon 3 be restricted to short reads only to quantify the relative success rates of short read-related algorithms in removing noise? Will it include very difficult assembly problems that only PacBio reads can handle? I suspect Assemblathon 3 will face operational difficulties, because it will not only have to evaluate assembly algorithms but will also have to compare different sequencing technologies.
However, when it comes to PacBio only, their Arabidopsis work already made extensive comparisons between the existing Arabidopsis assembly and the new assembly to show that PacBio produced high quality assembly. Not only that, the comparisons led to understanding of why the assemblies were incorrect in some regions and further refinement of the assembly algorithm. After those corrections are accounted for, there is little need to worry about assembly error. Hence no Assemblathon 3.
End of Genome Sequencing Era
PacBio makes assembly so easy that there will be no glory in genome assembly. The genome sequencing era started in the mid-90s with the publication of genome papers of various model organisms, but reached a massive media frenzy in 2000-2001, when the human genome papers were published. Biologists are generally rewarded for publishing in Science and Nature, and genome papers had been the surest way to get there.
In the first generation, genome papers used to be 100 pages long and journals devoted entire issues to the supporting papers. Then the journals started to shrink the space devoted to genome papers and pushed much of the material to supplements. Supporting papers were asked to go elsewhere.
In the late stage, genome papers turned into a very predictable meme:
(i) X is a cool organism,
(ii) We assembled a high-quality genome of X, which will help in figuring out why X is cool.
(iii) Title: “Genome of X reveals why X is cool”, where ‘reveal’ refers to a few genes identified through genome comparison and some straightforward bioinformatics.
Essentially the genome papers got into Nature or Science for picking a cool organism and completing (ii). With long reads from PacBio making step (ii) easy, I do not see why genome papers should get any importance for merely completing the engineering task of assembling high-quality genomes. We also noticed that a paper comparing the genomes of white Bengal tiger, African lion, white African lion and snow leopard did not make it into a glam journal. So, possibly BGI saturated the field even prior to the arrival of PacBio.
Assembly of Difficult Regions – Major Histocompatibility Complex
What is biologically relevant information? So far we talked about assembly of entire genomes and argued that higher quality genome assembly is more informative than cleaner short reads. However, many scientists do not benefit from the entire genome unless the genomic regions they are working on are assembled properly. Given that NIH spends a large part of its budget to cure human diseases, one can say that immune genes in mammals are an informative piece of the puzzle contributing to the research of many biologists.
Stanford immunologist Lisbeth Guethlein presented on “Genomic Architecture of the KIR and MHC-B and -C Regions in Orangutan” and explained how PacBio helped them immensely in reconstructing those relevant genes.
If you are unfamiliar with MHC, here is a short introduction from Wikipedia:
The major histocompatibility complex (MHC) is a set of cell surface molecules encoded by a large gene family in all vertebrates. MHC molecules mediate interactions of leukocytes, also called white blood cells (WBCs), which are immune cells, with other leukocytes or body cells. MHC determines compatibility of donors for organ transplant as well as one’s susceptibility to an autoimmune disease via crossreacting immunization. In humans, MHC is also called human leukocyte antigen (HLA).
….
The MHC gene family is divided into three subgroups: class I, class II and class III. Diversity of antigen presentation, mediated by MHC classes I and II, is attained in multiple ways: (1) the MHC’s genetic encoding is polygenic, (2) MHC genes are highly polymorphic and have many variants, (3) several MHC genes are expressed from both inherited alleles.
The Stanford group tried to assemble the MHC region of Orangutan for many years, first using Sanger and then from 454 sequences. Then came PacBio and they got their results within days without spending any time on bioinformatics. That was quite fascinating, because I am stuck with very similar assembly difficulties for smell and odor receptors in another organism. Receptor genes sit in clusters and are very similar to each other, making their assembly from short reads extremely difficult. Before going to the meeting, I had been working on an algorithm to sort out tandem duplicated genes from de Bruijn graphs. Now I wonder whether that is at all necessary and whether I should spend my time elsewhere.
Assembly of Diploid and Polymorphic Genomes
Now that the problem of getting high-quality genomes from next-generation sequencing is solved, researchers can move on to other difficult problems such as phasing. Separating diploid genomes from short reads is very difficult, especially when the chromosomes differ to a large extent (>5%). BGI had a hell of a problem with the oyster genome and they came up with an innovative method that required extensive resequencing of shorter segments of the genome. Long PacBio reads would make the process quite easy. Chongyuan Luo from Salk Institute mentioned that they were working on mixing reads from two separate strains of Arabidopsis to see whether the strains can be computationally separated out.
Transcriptome Assembly
PacBio developed a terrific transcriptome assembler – ‘CRAZY’. It assembles the transcripts and isoforms easily and to a high degree of accuracy. Bye bye Trinity !!
If you are wondering about it, ‘CRAZY’ is not an acronym. One has to be crazy to think about assembling transcripts from PacBio reads, given that the reads themselves are longer than typical genes. You sequence and you get your genes. No assembly needed.
Gene Expression Measurement
Transcriptome experiments have two components – (i) finding the genes, (ii) determining relative gene expressions. Illumina is a clear winner on the counting aspect.
We have worked on transcriptomes for over a decade and expect the introduction of PacBio to lead to another major shift, as the following progression shows.
i) First generation: Genes were determined through gene prediction programs and some ESTs. Expressions were measured through spotted arrays.
ii) Second generation: Genes were determined through ESTs, comparative analysis and later tiling arrays. Expressions were measured using oligonucleotide arrays.
iii) Third generation: Short read technologies helped in both determining the genes and estimating relative expressions.
iv) Fourth generation (future): Genes will be determined through PacBio experiment, and relative expression will be measured through short read sequencing.
Finding Methylation Patterns and other Genome Modification
Matthew Blow from JGI gave an excellent presentation on bacterial functional genomics. PacBio is the only sequencing technology capable of finding genome modifications directly from the same signals used for identifying nucleotides. The talk was quite informative on the biological aspects and I will see whether I can find the slides to post here.
Part II of this commentary will cover metagenomics, pricing aspects, what the changes mean to biologists and core facilities, and whether PacBio can survive as a business to make the above changes happen.
http://www.homolog.us/blogs/blog/2013/09/21/end-illumina-era/
(More on Twitter @ #PacBioUGM)
Icahn Institute @IcahnInstitute: Up now Bobby Sebra @IcahnMountSinai re: increased subread lengths with PacBio RS II and Blue Pippin #PacBioUGM #TurboChargeYourPacBio (4:17 PM - 18 Sep 13)
Pacific Biosciences @PacBio: Bobby Sebra @IcahnInstitute: Optimizing Blue Pippin Size Selection for Increased Subread Lengths on the PacBio RS II #PacBioUGM @SageSci (4:16 PM - 18 Sep 13)
Bobby Sebra @IcahnInstitute uses PacBio for "Human Sequence "Infectious Disease" (4:19 PM - 18 Sep 13)
Bobby Sebra @IcahnMountSinai: Best read = 34,500 bases! #PacBioUGM #TurboChargedPacBio
Bobby Sebra: Rapid & cost effective Infectious Disease Surveillance - complete microbial genomes with only 1-4 SMRT Cells #PacBioUGM (4:32 PM - 18 Sep 13)
Bobby Sebra: whole human genome on PacBio indicates that the reference likely under represents TR Spans #PacBioUGM
Bioscribe
@bioscribe
Public Relations and Marcom Consultancy for Life Science Technology Companies
Read Opinion by @SageScience CEO @TheScientistLLC http://shar.es/iBPTe
Bioscribe @bioscribe: GlandStone Labs using PacBio to sequence chicken genes to understand why human children are born with congenital heart problems. (twitter @ #PacBioUGM)
Pacific Biosciences @PacBio: AH: 1.5 million sequencing reads of full-length transcripts mapped uniquely (~72%) to chicken transcriptome (galGal4 assembly) #PacBioUGM (2:53 PM - 18 Sep 13)
PacBio Sequencing Providers
Welcome! Find a PacBio sequencing provider for your project needs - download a complete list of our sequencing providers.
Institution, Location, Regions Served
+ Cold Spring Harbor Laboratory (CSHL) Woodbury, NY North America Name: Cold Spring Harbor Laboratory (CSHL)
Website: http://cshl.edu/pacbio
Email: eantonio@cshl.edu
Description: We provide PacBio sequencing services to all HHMI investigators located at nearby non-profit research institutions.
We offer full service sequencing on the Pacific Biosciences RS instrument. We will QC the DNA, make the PacBio sequencing library and sequence the library on the appropriate number of SMRT cells.
Furthermore, we will help you design your sequencing experiment (insert size, type of run, number of SMRT cells, etc.) to help ensure that you will receive the data you need for your research.
Phone: none
Address:
CSHL Genome Center
500 Sunnyside Blvd.
Woodbury, NY 11797
+ DNA Link, Inc. South Korea Global Name: DNA Link, Inc.
Website: http://www.dnalink.com
Email: office@dnalink.com
Description: Established in March 2000, DNA Link, Inc. is a leading contract research service provider of genomics studies. Its services consist of next generation sequencing(NGS), microarray, genotyping and personalized genome services.
Phone: +82 2-3153-1500
Address:
12Fl, Asan Institute for Life Sciences 1, Olympic-Ro 43-GIL rd.88
Songpa-gu, Seoul (138-736),
S.Korea
+ Duke University Durham, NC Global Name: IGSP Genome Sequencing & Analysis Core Resource
Website: http://www.genome.duke.edu/cores/sequencing/
Email: sequencing@duke.edu
Description: The mission of the Genome Sequencing & Analysis Core Resource is to promote advances in biology by providing researchers complete genomic solutions with the latest and most complementary technologies readily available. We provide services for traditional capillary sequencing, high-throughput next-generation sequencing, and various data analyses, both across the Duke University campus and across the world. We offer services on several types of sequencers (Illumina, Ion Torrent, PacBio, ABI). We have experience in a wide range of applications, such as de novo genome sequencing, targeted re-sequencing, metagenomics, transcriptome profiling, and epigenetic marker sequencing. We help researchers with their project plan, providing them with the most cost effective and appropriate solutions with our available platforms to reach their goal. Our Core Resource provides a start-to-finish partnership with investigators that includes consultation, sample preparation, library construction, DNA sequencing, post-run quality control, and in-depth bioinformatics analyses.
Phone: +1 919-684-3359
Address:
Department of Biology
Box 90338
Duke University
Durham, NC 27708
+ Expression Analysis Durham, NC Global Name: Pacific Biosciences Service Laboratory
Website: http://www.expressionanalysis.com/platforms/category/pacific_biosciences/
Email: info@expressionanalysis.com
Description: Expression Analysis (EA) provides whole genome to focused set gene expression profiling and genotyping assays along with DNA and RNA sequencing services, sequence enrichment technologies and bioinformatics support. EA offers solutions for challenging specimens such as whole blood and FFPE tissues, as well as nucleic acid isolation and data analysis services. Our quality system follows CLSI guidelines and our CLIA-certified laboratory supports GLP compliance.
Phone: +1 866-293-6094
Address:
4324 S. Alston Avenue
Durham, NC 27713
+ GATC Biotech™ Germany Global Name: GATC Biotech
Website: www.gatc-biotech.com
Email: customerservice@gatc-biotech.com
Description: GATC has all the leading Next Gen sequencing technologies available in its lab, providing multiplatform sequencing strategies. The laboratories are currently equipped with the following Next Gen systems: 2x Roche GS FLX+ System, 4x Illumina HiSeq 2000, 1x Pacific Biosciences PacBio RS.
The Next Generation Sequencing Laboratories of GATC Biotech's Genome and Diagnostics Centre were the first in Europe to be accredited by the German Accreditation Body (DAkkS) according to the international norm ISO 17025. We fully meet the essential standards for the competence of testing and calibration laboratories. These are DIN EN ISO/IEC 17025 as well as DIN EN ISO 9001.
Phone: www.gatc-biotech.com
Address:
Jakob-Stadler-Platz 7
78467 Konstanz
Germany
+ Genomics Resource Center, Institute for Genome Sciences, University of Maryland Baltimore, MD Global Name: Genomics Resource Center
Website: http://www.igs.umaryland.edu/resources/grc/index.php
Email: grc-info@som.umaryland.edu
Description: The Genomics Resource Center (GRC) is a high-throughput laboratory and data analysis group supporting the scientific programs of IGS and its collaborators, both across the University of Maryland Baltimore campus and across the globe. The GRC team has been an early-adopter and pioneer in genomic technology development for the past 20 years. Genomic technologies and applications now permeate both basic and clinical research. Using multiple sequencing and analysis platforms, the GRC generates high-quality genomic data in a cost-effective manner.
Phone: none
Address:
University of Maryland
Genomics Resource Center
801 W. Baltimore Street
Suite 638
Baltimore, MD 21201
+ Johns Hopkins University School of Medicine Baltimore, MD Global Name: JHMI Deep Sequencing and Microarray Core
Website: http://www.microarray.jhmi.edu/
Email: w3w@hit.jhmi.edu
Description: We have offered PacBio RS sequencing since 2011 and recently upgraded to RS II. We offer PacBio RS II sequencing for all applications.
Phone: +1 443-287-9056
Address:
733 North Broadway
MRB 360
Baltimore, MD 21205
+ KeyGene - KeyGene N.V. Netherlands Global Name: Keygene NV
Website: http://www.keygene.com
Email: info@keygene.com
Description: KeyGene provides its partners and customers in the plant breeding industry contract research, partnerships and molecular genetic services.
KeyGene delivers novel enabling technologies and applications to support companies and institutes that carry out fundamental or applied genetic/genomic research or that seek to enhance the quality of their products through the improvement of their genetic material.
Phone: +31 (0)317-466866
Address:
Keygene N.V.
Agro Business Park 90
6708 PW Wageningen
The Netherlands
P.O. Box 216
6700 AE Wageningen
The Netherlands
+ KeyGene - Plant Research International Netherlands Global Name: Plant Research International
Website: http://www.wageningenur.nl/en/Expertise-Services/Facilities/CATAgroFood-3/Our-facilities.htm
Email: info.cat-agrofood@wur.nl
Description: Wageningen University, CAT-AgroFood offers state-of-the-art Next Generation Sequencing technology and expertise for a wide range of research applications.
Phone: none
Address:
Wageningen University
Plant Research International,
PO Box 6700AA Wageningen,
The Netherlands
+ ServiceXS Netherlands Global Name: ServiceXS
Website: http://www.servicexs.com
Email: info@servicexs.com
Description: ServiceXS is an ISO/IEC 17025 accredited service provider in the area of genomics.
Our goal is to provide our customers with reliable, high-quality data that paves the way to new insights and scientific breakthroughs.
Our core values are Quality, Innovation & Knowledge.
Phone: +31 (0) 71-568-1050
Address:
Plesmanlaan 1d
2333 BZ
Leiden
The Netherlands
+ McGill University Canada North America Name: McGill University and Genome Quebec Innovation Center
Website: http://gqinnovationcenter.com/index.aspx
Email: infoservices@genomequebec.com
Description: The McGill University and Genome Quebec Innovation Centre is a world class research facility for genomics and proteomics. Founded in 2002, the Centre has developed a world-renowned expertise in complex genetic disorders such as cardiac disease, asthma and Type 2 diabetes, and has become a resource and a networking site for various research initiatives in human health, forestry, infectious diseases, agriculture and environment.
Ambitious projects in recent years are a testimony of the ability of Genome Quebec to provide data of exceptional quality in the pursuit of various genomics studies. The Innovation Centre provides complete DNA and RNA analysis services, from a few samples to several tens of thousands per week. Large-scale genomics and proteomics services at the Innovation Centre are articulated around sequencing (including massively parallel sequencing), genotyping, functional genomics and proteomics supported by a solid infrastructure, tools (Nanuq) and unique expertise in bioinformatics.
All services work in parallel to provide comprehensive, reliable services to the Quebec, Canadian and international scientific community. Located on the campus of McGill University in the heart of Montreal, the Innovation Centre acts as a vast resource of knowledge and technology to the academic and industrial sectors.
Phone: none
Address:
740 Dr. Penfield ave
Montreal, Qc
H2P2K2
Canada
+ National Center for Genome Resources (NCGR) Santa Fe, NM North America Name: National Center for Genome Resources - Sequencing Center
Website: http://www.ncgr.org
Email: seq@ncgr.org
Description: The National Center for Genome Resources (NCGR) in Santa Fe, New Mexico, is a non-profit research institute dedicated to improving human health and nutrition through the application and development of the most advanced genomic and innovative bioinformatics resources available. The NCGR Sequencing Center provides sequencing services using Pacific Biosciences and Illumina instrumentation, comprehensive experimental design assistance, as well as novel informatics solutions to achieve the optimal results for practically any genomic experiment. At NCGR we are committed to remaining at the vanguard of our field by providing the best sequencing and analysis results for all our collaborators and clients.
Phone: +1 505-995-4444
Address:
National Center for Genome Resources
2935 Rodeo Park Drive East,
Santa Fe, NM 87505
+ University of California San Diego La Jolla, CA Global Name: Biomedical Genomics Facility (BIOGEM)
Website: http://biogem.ucsd.edu
Email: biogem@ucsd.edu
Description: The Biomedical Genomics laboratory (BIOGEM) is a genomics facility located in the Department of Medicine at UCSD. (BIOGEM) was established in 2000 to provide spotted cDNA microarray technology to the UCSD research community. Since then the scope has been expanded to include a variety of related services and resources, including several commercial microarray platforms and second (next-gen) and third generation sequencing technology. Sequencers in the facility include a Pacific Biosciences RS, Illumina HiSeqs and an Ion PGM Sequencer.
Phone: +1 858-822-3792
Address:
BIOGEM (BioMedical Genomics Microarray Facility)
9500 Gilman Drive, UCSD, Dept. 0724
Leichtag Rm 172
La Jolla, CA 92093-0724
+ University of California Irvine Irvine, CA Global Name: UCI Genomics High Throughput Facility (GHTF)
Website: http://ghtf.biochem.uci.edu/
Email: ucightf@gmail.com
Description: The mission of the UCI Genomics High-Throughput Facility (GHTF) is to provide genome-wide analysis for clients interested in gene expression, regulation of gene expression, and genome sequence and variation.
Phone: +1 949-824-5327
Address:
University of California, Irvine
Sprague Hall Room 340
Irvine, CA 92697
+ University of Delaware (UD) Newark, DE Global Name: University of Delaware Sequencing & Genotyping Center
Website: http://www.udel.edu/dnasequence
Email: brucek@UDel.Edu
Description: The DNA Sequencing & Genotyping Center (SGC) supports genomic research through our established expertise with state-of-the-art genomics technologies. The Center is located in the Delaware Biotechnology Institute (DBI).
Phone: +1 302-831-0823
Address:
Delaware Biotechnology Institute
15 Innovation Way
Newark, DE 19711
+ University of Florida (UF) Gainesville, FL Global Name: ICBR-NextGen DNA Sequencing
Website: http://www.biotech.ufl.edu
Email: moraga@biotech.ufl.edu
Description: The ICBR-NextGen DNA Sequencing Core facility is part of a comprehensive set of resources available within our Genomics Division. Our dedicated scientists are highly experienced and knowledgeable. Sequencing project requests are accepted from both academic and commercial entities. In-depth data analysis is provided upon request through the ICBR-Bioinformatics group.
Phone: +1 352-273-8050
Address:
2033 Mowry Road
CGRC, Room 178
Gainesville, FL 32610-3622
+ University of Helsinki Finland Global Name: DNA Sequencing and Genomics Laboratory
Website: http://www.biocenter.helsinki.fi/bi/dnagen/index.htm
Email: bio-alf@Helsinki.fi
Description: Core facility for DNA sequencing with Sanger, 454, Illumina, SOLiD and PacBio.
Phone: none
Address:
DNA Sequencing and Genomics Lab
Institute of Biotechnology
University of Helsinki
P.O. Box 56 (Viikinkaari 4)
00790 Helsinki
Finland
FIN-00014 Helsinki
+ University of Lausanne (UNIL) Switzerland Europe Name: Genomic Technologies Facility
Website: http://www.unil.ch/gtf
Email: keith.harshman@unil.ch
Description: The Lausanne Genomic Technologies Facility provides access to state-of-the-art technologies used to detect and measure quantitative and qualitative variations in nucleic acids. The principal technology platforms supported by the facility are:
Illumina, Pacific Biosciences RS and Ion Torrent high throughput DNA sequencing instruments
Affymetrix GeneChip oligonucleotide arrays for the analysis of mRNA and DNA
Agilent oligonucleotide arrays for the analysis of small non-coding RNA
Applied Biosystems 7900HT Sequence Detection System for quantitative real-time PCR analyses
Liquid handling robots for the preparation of high throughput sequencing libraries and qPCR reaction plates
Phone: +41 (0)21-692-3906
Address:
Lausanne Genomics Technologies Facility
Center for Integrative Genomics
University of Lausanne
Genopode Building
1015 Lausanne
Switzerland
+ University of Massachusetts Shrewsbury, MA Global Name: Deep Sequencing Core Labs at UMass
Website: http://www.umassmed.edu/nemo
Email: DeepSequencingCoreLabs@umassmed.edu
Description: Library Preparation and Sequencing for genomic DNA, Exome, ChIPseq, RNAseq, siRNA profiling, transcriptome. Illumina HiSeq and GAIIx as well as PacBio RS available.
Phone: +1 508-856-3265
Address:
UMass Medical School
222 Maple Avenue
Rose Gordon 143
Shrewsbury, MA 01545
+ University of Michigan Ann Arbor, MI Global Name: University of Michigan DNA Sequencing Core
Website: http://seqcore.brcf.med.umich.edu/
Email: seqcore@umich.edu
Description: Full Service DNA sequencing core. Includes PacBio RS instrument, and most other major systems. Services available to external clients, see website.
Phone: +1 734-647-5623
Address:
2800 Plymouth Rd.
NCRC Bldg 14, Rm 148
Ann Arbor, MI 48109-2800
+ University of Oslo Norway Global Name: Norwegian Sequencing Centre
Website: http://www.sequencing.uio.no/
Email: post@sequencing.uio.no
Description: The Norwegian Sequencing Centre is a national technology core facility offering sequencing services on the GS FLX from Roche/454, and HiSeq 2000/2500 & MiSeq instruments from Illumina and the PacBio RS from Pacific Biosciences. In addition, the NSC is in the process of installing the PGM and Proton from IonTorrent/Life Technologies.
Phone: +47 2285-4400
Address:
NSC/CEES, Dept. of Biology
University of Oslo
P.O. Box 1066 Blindern
NO-0316 Oslo
Norway
+ University of South Carolina Columbia, SC Global Name: Selah Clinical Genomics Center Innovista
Website: http://www.engencore.sc.edu
Email: john.busch@selahgenomics.com
Description: DNA sequencing. PacBio RS, MiSeq, Ion Torrent PGM, Roche 454, and Ion Proton. CLIA
Phone: +1 803-777-4338
Address:
541 Main Street
Room 126 Horizon 1
Columbia, SC 29201
+ University of Washington Seattle, WA North America Name: UW PacBio Sequencing Services
Website: https://pacbio.gs.washington.edu/
Email: uwpacbio@u.washington.edu
Description: Using the PacBio RS instrument we provide library preparations and sequencing for large-insert gDNA (3-10 Kb), BAC-based DNA, and purified PCR products. Upcoming services include MagBead runs for 10-20 Kb libraries or other new protocols as they become available. Services available to external clients, see website.
Address:
University of Washington
UW Dept of Genome Sciences
3720 15th Avenue NE, S413A
Seattle, WA 98195
+ Washington State University (WSU) Pullman, WA Global Name: Laboratory for Biotechnology and Bioanalysis - DNA Sequencing Core
Website: none
Email: dnaguy@mail.wsu.edu
Description: The Laboratory for Biotechnology and Bioanalysis is the Genomics Core Lab for Washington State University. The Core lab currently provides services for three different Next Generation sequencing platforms including the Pacific Biosciences RS sequencer. Services include library preparation, data collection as well as most primary level analysis.
Phone: +1 509-335-1174
Address:
Biotechnology Life Sciences - Rm 227
Washington State University
Pullman, WA
99164-7520
+ Weill Cornell Medical College New York, NY Global Name: MasonLab
Website: http://smrt.med.cornell.edu/
Email: chm2042@med.cornell.edu
Description: The SMRT Sequencing Lab offers sequencing services using the Pacific Biosciences RS sequencer. Library Preparation, Sequencing and Bioinformatics services are offered at competitive rates for researchers here at Weill Cornell as well as the general public.
Address:
515 East 71st Street, Room S-222
New York, NY 10065
+ Yale University West Haven, CT Global Name: Yale Center for Genome Analysis
Website: http://medicine.yale.edu/keck/ycga/index.aspx
Email: microarrays@yale.edu
Description: The Yale Center for Genome Analysis is a full service facility dedicated to providing RNA expression profiling, DNA genotyping, and high-throughput sequencing using state of the art technologies. The resource is open to both Yale and other non-profit organizations.
Phone: +1 203-737-3662
Address:
300 Heffernan Drive, B36
West Haven, CT 06516
If you are a researcher and are having trouble contacting one of the providers listed above, please click here to contact us for assistance.
If you are an existing service provider listed above and would like to update your listing, please click here to submit your updated information.
If you are a new sequencing provider interested in joining our program please click here to submit an enrollment form.
... http://www.pacificbiosciences.com/support/sequencing_provider/
More Press from Homologus Frontier in Bioinformatics;
Personalized Genomics from Pacbio, if You Want to be Reborn as Bacteria
(September 13th, 2013)-- Click on ‘microbe’, when you die and are asked about what you want to be in next life. For only $1000 or less, you will get your genome sequenced !!
Reducing Assembly Complexity of Microbial Genomes with Single-molecule Sequencing
Background
The short reads output by first- and second-generation DNA sequencing instruments cannot completely reconstruct microbial chromosomes. Therefore, most genomes have been left unfinished due to the significant resources required to manually close gaps in draft assemblies. Third-generation, single-molecule sequencing addresses this problem by greatly increasing sequencing read length, which simplifies the assembly problem.
Results
To measure the benefit of single-molecule sequencing on microbial genome assembly, we sequenced and assembled the genomes of six bacteria and analyzed the repeat complexity of 2,267 complete bacteria and archaea. Our results indicate that the majority of known bacterial and archaeal genomes can be assembled without gaps, at finished-grade quality, using a single PacBio RS sequencing library. These single-library assemblies are also more accurate than typical short-read assemblies and hybrid assemblies of short and long reads.
Conclusions
Automated assembly of long, single-molecule sequencing data reduces the cost of microbial finishing to $1,000 for most genomes, and future advances in this technology are expected to drive the cost lower. This is expected to increase the number of completed genomes, improve the quality of microbial genome databases, and enable high-fidelity, population-scale studies of pan-genomes and chromosomal organization.
http://www.homolog.us/blogs/blog/2013/09/13/personalized-genomics-pacbio-want-reborn-bacteria/
Adam Phillippy , Now official in @GenomeBiology #OA: Most microbial genomes can be finished for <$1,000 with a single @PacBio library http://genomebiology.com/2013/14/9/R101/abstract (8:46 AM - 13 Sep 13) --- Reducing assembly complexity of microbial genomes with single-molecule sequencing.(Genome Biology Published: 13 September 2013)---
The complete article is available as a provisional PDF. The fully formatted PDF and HTML versions are in production.
(David Baltrus: I just spent about 2 hours scaffolding pacbio reads vs. contigs by hand through blast…my OCD is soooooooo satisfied right now! 4:44 PM - 12 Sep 13)
(David Baltrus: Most importantly...that 2 hours of scaffolding gave me an answer (chromosome, not plasmid). @pacbio 4:52 PM - 12 Sep 13)
(David Baltrus: @PacBio it's like a puzzle that actually accomplishes something 4:54 PM - 12 Sep 13)
(David Baltrus, Assistant Prof at the U of Arizona interested in genomes, microbes, adaptation) Sept. 12, 2013 - Went for pacbio over longer illumina with last money cause regions are crazy repetitive (non ribosomal peptides)
Jonas - 90% of viral DNA diversity from humans was masked by traditional sequencing, but identified by @PacBio #ICGAmericas (1:59 PM - 12 Sep 13)
Jonas Korlach (PacBio) showing the power of long accurate reads. And has a new automated PacBio pipeline #ICGAmericas (1:54 PM - 12 Sep 13)
(The International Conference on Genomics in the Americas (ICG), organized by BGI and UC Davis, is taking place NOW in Sacramento, CA.)
(Tweet 12:18 PM - 10 Sep 13) By Pam R Yoder, MD, PhD @DrPam4Health: @mike_schatz on future of sequencing: For starters, @PacBio nearly doubles read length every 6 months. #genomics (Retweeted by Pacific Biosciences)
Next Week’s ICG Meeting to Include Several SMRT Sequencing Talks
(Thursday, September 5, 2013) The International Conference on Genomics in the Americas (ICG), organized by BGI and UC Davis, is taking place on Sept 12-13 in Sacramento, CA.
One of the keynote presentations in the opening session comes from Nobel Laureate Richard Roberts, Chief Scientific Officer of New England Biolabs. Entitled “Bacterial Methylomes,” the talk will cover some recent work Roberts has done using SMRT® Sequencing to characterize the full, genome-wide methylation marks in various strains of microbes. This kind of work “has yielded a plethora of new and interesting results,” Roberts told BioMed Central recently.
Later that day, Ken Dewar from McGill University will talk about his experience with SMRT Sequencing in a talk entitled “Tomorrow’s Genome: Complete Bacterial Genomes in <24 h for Outbreak Response.” Dewar, who heads up a highly regarded core facility, has been making tremendous inroads on generating high-quality, rapid, affordable genome sequences of microbes. The PacBio sequencing platform is a cornerstone of that pipeline.
For anyone who wants to learn more about our latest technology innovations and their applications in microbiology, our CSO Jonas Korlach will be speaking as well. His presentation, “Finished Microbial Genomes and Epigenomes on a Large Scale,” will showcase recent work extending long reads, automating the assembly and finishing process, and sequencing to provide closed assemblies for several different microbes.
If you’re attending in person, we also recommend a poster from our own Susana Wang, et al., entitled “Greater than 10 kb Read Lengths Routine when Sequencing with the PacBio RS II.”
We hope to see you next week in Sacramento!
http://blog.pacificbiosciences.com/2013/09/next-weeks-icg-meeting-to-include.html?utm_source=buffer&utm_campaign=Buffer&utm_content=buffer75edc&utm_medium=twitter
Introducing the PacBio RS II, which still delivers the longest reads and 99.999% consensus accuracy, plus double the throughput of the original system.
(older news,but good idea where PACBIO is heading)! PacBio DevNet
The SMRT developers' network. Open source software, data, documentation and tips for PacBio sequencing.
News and Features
SMRT Analysis 2.0.0 is a pre-requisite for analyzing data from the PacBio RS II and SMRT Analysis 2.0.1 is a pre-requisite for analyzing data from the P4 enzyme. Please upgrade to take advantage of the following features:
Microbial assembly using fully integrated HGAP in SMRT Portal
Analyze bacterial methylomes with the P4/C2 polymerase/chemistry combination
Streamlined reference sequence upload within SMRT Portal
Modification identification of 6-methyladenosine (6-mA) and 4-methylcytosine (4mC)
Watch the SMRT Analysis 2.0 webinar, P4 polymerase webinar, and HGAP genome assembly webinar (Best on Chrome) for an overview.
Data Release Announcement:
Dataset for Arabidopsis genome sequenced using only PacBio® sequence data and assembled with SMRT® Portal is available for download. For more information, please see the PacBio Blog entry.
http://pacbiodevnet.com/
New PacBio RS II throughput record at Duke (9:36 AM - 30 Aug 13): 1 SMRT cell, >64k reads, >658 Mb, >10 kb av. read length (reads >50 bp & QV >0.75) https://twitter.com/PacBio/status/373484332268412928/photo/1
Multiple groups started sequencing - BGI, Life Technologies, UK HPA, Gottingen Genomics Lab & PacBio #QRW13
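As a rough sanity check on the Duke run figures quoted above, the average read length is just total bases divided by read count. The sketch below uses the tweet's lower bounds (658 Mb, 64k reads) as if they were exact values, which is an assumption, so the result is only approximate.

```python
def mean_read_length(total_bases: int, read_count: int) -> float:
    """Average read length in bases: total yield divided by number of reads."""
    if read_count <= 0:
        raise ValueError("read_count must be positive")
    return total_bases / read_count


# Lower bounds quoted in the tweet, treated here as exact values (assumption).
total_bases = 658_000_000  # >658 Mb
read_count = 64_000        # >64k reads

print(f"approx. mean read length: {mean_read_length(total_bases, read_count):.0f} bp")
```

The result is a little over 10 kb, which agrees with the ">10kb av. read length" claim.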
Oklahoma Scientists Use SMRT Sequencing to Rescue Fungal Genome Assembly
(Tuesday, August 27, 2013) Scientists from Oklahoma State University and the University of Oklahoma teamed up with a sequencing service provider to study the genome of an anaerobic fungus found in the rumen of cows that may have implications for effective plant biomass degradation. What made this particular species so tricky to sequence were its extreme GC content — just 17 percent — and unusually high number of repeats.
The study was reported in “The Genome of the Anaerobic Fungus Orpinomyces sp. Strain C1A Reveals the Unique Evolutionary History of a Remarkable Plant Biomass Degrader,” a paper published in this month’s edition of the ASM journal Applied and Environmental Microbiology. The C1A isolate had a large genome: just over 100 Mb, with more than 16,000 genes.
Senior author Mostafa Elshahed and his team sequenced the fungus, Orpinomyces sp. strain C1A, using both Illumina® and PacBio® technologies. They report that the organism’s extremely low GC content of just 17 percent is the lowest seen of any free-living microbe sequenced to date. Other unusual traits of the genome were its “relatively large proportion of noncoding intergenic regions,” comprising some 73 percent of the sequence, and high number of simple sequence repeats, which saw “massive proliferation” in the noncoding regions. These repeats, mostly homopolymer As or Ts, made up nearly 5 percent of the genome; the authors point out that this is at least an order of magnitude higher than repeat numbers reported in other fungal genomes.
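For readers unfamiliar with the two statistics quoted above, here is a minimal sketch of how GC content and the share of a sequence sitting in homopolymer A/T runs can be computed. The function names and the minimum run length of 5 are illustrative assumptions, not the definitions used in the paper.

```python
import re


def gc_fraction(seq: str) -> float:
    """Fraction of unambiguous bases (A, C, G, T) that are G or C."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        return 0.0
    return sum(1 for b in bases if b in "GC") / len(bases)


def homopolymer_at_fraction(seq: str, min_run: int = 5) -> float:
    """Fraction of the sequence inside runs of A or T at least min_run long.

    min_run=5 is an arbitrary illustrative cutoff (assumption).
    """
    if not seq:
        return 0.0
    pattern = re.compile(rf"A{{{min_run},}}|T{{{min_run},}}")
    in_runs = sum(len(m.group()) for m in pattern.finditer(seq.upper()))
    return in_runs / len(seq)


toy = "AAAAAGCGCGTTTTTT"  # made-up sequence, not real C1A data
print(gc_fraction(toy), homopolymer_at_fraction(toy))
```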
These remarkable insights were attained by a two-part attempt to sequence the organism’s genome. As described in the paper, the team initially used paired-end sequencing on the Illumina platform to generate an assembly with 290-fold coverage that was “highly fragmented … with an extremely large number of contigs in the final assembly (82,325 contigs), a large proportion of the final assembly (32.4%) harbored in extremely short contigs (300 to 900 bp), and a low N50 (1,666 bp).”
So Elshahed et al. turned to SMRT® Sequencing, generating about 10-fold coverage of the C1A isolate. PacBioToCA was used to produce a hybrid assembly of the fungus that had an average quality score of 59.7. “The final assembly was a marked improvement compared to the Illumina-only assembly, as evident from the improved N50/N90 values” and other characteristics, the authors write. They note that the long PacBio reads “allowed for the identification of a large number of introns previously undetected in the Illumina assembly.”
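The N50 and N90 values mentioned above are standard assembly contiguity statistics: N50 is the contig length at which the contigs, sorted from longest to shortest, first add up to at least half of the total assembly size, and N90 is the same at 90 percent. A minimal sketch follows; the contig lengths are made up for illustration and are not from the C1A assembly.

```python
from typing import Iterable


def nxx(contig_lengths: Iterable[int], fraction: float) -> int:
    """Return the Nxx statistic (fraction=0.5 gives N50, 0.9 gives N90)."""
    ordered = sorted(contig_lengths, reverse=True)
    threshold = fraction * sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # First contig at which the cumulative length reaches the threshold.
        if running >= threshold:
            return length
    raise ValueError("no contigs supplied")


toy_contigs = [100, 80, 60, 40, 20]  # hypothetical lengths in bp
print("N50:", nxx(toy_contigs, 0.5))
print("N90:", nxx(toy_contigs, 0.9))
```

A fragmented assembly with tens of thousands of short contigs, like the Illumina-only one described above, pushes both values down.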
Armed with this sequence data, Elshahed and his team performed a number of follow-up and functional studies on C1A. They found the organism to be a “remarkable biomass degrader,” and in tests of several different plant materials (including switchgrass, corn stover, alfalfa, and more) C1A proved quite versatile, “able to metabolize all different types of examined plant biomass.” This trait makes the organism a particularly promising candidate for use in plant bioprocessing required for the production of many biofuels, they add.
http://blog.pacificbiosciences.com/2013/08/oklahoma-scientists-use-smrt-sequencing.html?utm_source=buffer&utm_campaign=Buffer&utm_content=buffera9e51&utm_medium=twitter
New DNA Polymerase P4 Delivers Higher-Quality Assemblies Using Fewer SMRT Cells (Wednesday, August 21, 2013) Pacific Biosciences is pleased to announce the introduction of DNA/Polymerase Binding Kit P4. This P4 enzyme has average read lengths of >4,300 bp when paired with the C2 sequencing chemistry and >5,000 bp when paired with the XL chemistry. The enzyme’s accuracy is similar to C2, reaching QV50 between 30X and 40X coverage. The resulting P4 attributes will provide you with higher-quality assemblies using fewer SMRT® Cells, and with improved variant calling.
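For anyone wondering what "QV50" means in practice: quality values are on the standard Phred scale, where QV = -10 * log10(error probability). The short sketch below converts between QV and accuracy; it shows that QV50 corresponds to one error in 100,000 bases, i.e. the 99.999% consensus accuracy figure quoted elsewhere in this post.

```python
import math


def qv_to_accuracy(qv: float) -> float:
    """Phred-scaled quality value -> accuracy (1 - error probability)."""
    return 1.0 - 10 ** (-qv / 10.0)


def accuracy_to_qv(accuracy: float) -> float:
    """Accuracy (strictly between 0 and 1) -> Phred-scaled quality value."""
    if not 0.0 <= accuracy < 1.0:
        raise ValueError("accuracy must be in [0, 1)")
    return -10.0 * math.log10(1.0 - accuracy)


for qv in (20, 30, 50):
    print(f"QV{qv}: accuracy {qv_to_accuracy(qv):.5%}")
```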
The P4 binding kit is compatible with PacBio® RS and PacBio RS II Systems with the latest version of Instrument Control Software (v1.3.3.1 and v2.0.1 respectively), and with SMRT Analysis v2.0.1 and higher. For more information, please contact your local Field Application Scientist.
In addition to the P4 binding kit, four new controls are being introduced. Two spike-in (or internal) sequencing controls and two whole SMRT Cell (or external) controls are available pre-bound with the P4 polymerase. The controls support 24 reactions per kit and help to distinguish between template preparation and sequencing issues, which simplifies troubleshooting. The spike-in controls are also useful for day-to-day run monitoring and quality control. The new parts are:
Reagent / Part Number
DNA/Polymerase Binding Kit P4 / 100-236-500
DNA Control Complex P4 (250 bp to <3 kb) / 100-245-100
DNA Control Complex P4 (3 kb to 10 kb) / 100-245-200
Plasmidbell Complex P4 (11 kb) / 100-245-300
Lambda Lib Complex P4 (2 kb) / 100-245-000
http://blog.pacificbiosciences.com/2013/08/new-dna-polymerase-p4-delivers-higher.html
binks; I`m in big time!!
(published in Genome Biology) Monday, August 19, 2013 Scientists Assess Error Modes in Sequencing Platforms and Find SMRT Sequencing ‘Least Biased’
A paper from scientists at the Broad Institute reports a rigorous study of bias across all major sequencing platforms. In “Characterizing and measuring bias in sequence data,” published in Genome Biology, lead author Michael Ross and his colleagues report that SMRT® Sequencing on the PacBio® sequencer is the “least biased” in coverage of all the technologies studied.
The authors assessed sequences for coverage bias, or uniformity of read distribution, and error bias, or incorrect call at a given position. For coverage bias, they report that PacBio performed best in extreme GC content (both GC-rich and GC-poor) and suggest this may be related to the lack of an amplification step in the sequencing process. Regarding error bias, the scientists describe shifting error rates based on genome sequence; GC-rich or homopolymer regions, for example, tended to change the rate of errors for each platform. “In general, the sequence context dependence of error rates varied considerably from technology to technology,” they write.
Ross et al. note that each platform’s bias rate changes with technology development, but note that at the time of their work, “single-molecule data from Pacific Biosciences” had “the clear edge.”
(With LSC, Illumina correction no longer needed)! (From Bergman Lab) PacBio Whole Genome Shotgun Sequences for the D. melanogaster reference strain
Posted 31 Jul 2013 As part of a collaboration with Danny Miller and Scott Hawley from the Stowers Institute, we have generated whole genome shotgun sequences using PacBio RS technology for the Drosophila melanogaster y; cn, bw, sp strain (Bloomington 2057), the same strain that was also used to assemble the D. melanogaster reference genome sequence. We’ve been meaning to release these data to the community since we got the data in April, but have been waylaid by teaching commitments and a spate of recent server problems. Prompted by Danny’s visit to the Bergman Lab over the last two weeks and the generous release by Dmitri Petrov’s lab of a data set of Illumina long reads using the Moleculo technology for the same strain of D. melanogaster, we’ve finally gotten around to bundling these D. melanogaster PacBio sequences for general release. We’re hoping that the availability of both PacBio and Moleculo long-read data for the same strain that has one of the highest quality reference genomes for any species will allow the genomics community to investigate directly the pros/cons of each of these new exciting technologies.
These PacBio sequence data were generated by the University of Michigan DNA Sequencing Core facility on a PacBio RS using DNA from 10 adult males. Flies were starved for 4 hours to reduce microbial DNA contaminants before freezing at -80°C and prepped using the Qiagen DNeasy Blood & Tissue Kit (catalog number 69504). Six SMRT cells were used for sequencing 5 Kb fragment libraries on two Long Read mode runs: 1 cell on the first run (Run53) and 5 cells on the second run (Run55). Excluding the circular consensus (CCS) reads, and combining data from the S1 and S2 files for each of the six cells, we obtained 1,357,183,439 bp of raw DNA sequence from this experiment or roughly 7.5x coverage of the 180 Mb male D. melanogaster genome.
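The "roughly 7.5x" figure above is simply total sequenced bases divided by genome size. A minimal check using the two numbers given in the post:

```python
def fold_coverage(total_bases: int, genome_size: int) -> float:
    """Average depth of coverage: sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


raw_bases = 1_357_183_439   # bp of raw sequence reported by the Bergman Lab
genome_size = 180_000_000   # 180 Mb male D. melanogaster genome, as stated

print(f"{fold_coverage(raw_bases, genome_size):.2f}x")
```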
A ~63 Gb tar.gz of the entire PacBio long read dataset including .fasta, .fastq, and .h5 files can be found here. To help newcomers to PacBio data (like us) get over the hurdle of installing the PacBio long-read aligner blasr, we are also distributing CentOS 6/Scientific Linux 64-bit and source RPMs for blasr, made by Peter Briggs in the University of Manchester Bioinformatics Core Facility.
We have also generated 100bp paired-end Illumina data from the same stock, to aid with error-correction of the PacBio long reads. These data were generated by the Stowers Institute Molecular Biology Core facility. As above, genomic DNA was prepared from 10 starved, adult males using the Qiagen DNeasy Blood & Tissue Kit. 1ug of DNA from each was fragmented using a Covaris S220 sonicator (Covaris Inc.) to 250 base pair (bp) fragments by adjusting the treatment time to 85 seconds. Following manufacturer’s directions, short fragment libraries were made using the KAPA Library Preparation Kits (KAPA Biosystems, Cat. No. KK8201) and Bioo Scientific NEXTflex™ DNA Barcodes (Bioo Scientific, Cat. No. 514104). The resulting libraries were purified using Agencourt AMPure XP system (Beckman Coulter, Cat. No. A63880), then quantified using a Bioanalyzer (Agilent Technologies) and a Qubit Fluorometer (Life Technologies). The library was pooled with several other strains, re-quantified and run as high output mode on six 100 bp paired-end lanes on an Illumina HiSeq 2000 instrument, using HiSeq Control Software 1.5.15.1 and Real-Time Analysis (RTA) version 1.13.48.0. Secondary Analysis version CASAVA-1.8.2 was run to demultiplex reads and generate FASTQ files.
A ~17 Gb tar.gz of the Illumina 100 bp paired-end dataset in .fastq can be found here. Additional Illumina data for the same reference strain have been generated previously by Chuck Langley’s lab, either from mixed adult males and females (SRX040484) or haploid embryos (SRX040485, SRX040486, SRX040491) that could be used to supplement the PacBio error correction.
As with previous unpublished data we have released from the Bergman Lab, we have chosen to release these genomic data under a Creative Commons CC-BY license, which requires only that you credit the originators of the work as specified below. However, we hope that users of these data respect the established model of genomic data release under the Ft. Lauderdale agreement that is traditionally honored for major sequencing centers. Until the paper describing these genomic data is published, please cite:
•Miller, D.E., C.B. Smith, R.S. Hawley and C.M. Bergman (2013) PacBio Whole Genome Shotgun Sequences for the D. melanogaster Reference Strain. http://bergmanlab.smith.man.ac.uk/?p=1971
http://bergmanlab.smith.man.ac.uk/?p=1971
LSC is a long read error correction tool. It offers fast correction with high sensitivity and good accuracy.
Latest News: LSC 0.3 -- Support BWA, Bowtie2, RazerS3, much faster and more accurate ... read more
Getting Started
These simple steps will help you integrate LSC into your transcriptomics analysis pipeline.
•Read the LSC_requirements for running LSC.
•Download and set-up the LSC package
•Follow the tutorial to see how LSC works on some example data.
•Check the manual if anything is unclear.
•You're ready, Happy LSCing!
Latest publication
Kin Fai Au, Jason Underwood, Lawrence Lee and Wing Hung Wong
Improving PacBio Long Read Accuracy by Short Read Alignment [preprint]
PLoS ONE 2012. 7(10): e46679. doi:10.1371/journal.pone.0046679
Latest News
08-07-2013: Big changes in LSC 0.3
In LSC 0.3, we have a few updates. They are very IMPORTANT updates, new features and small fixes
Very IMPORTANT updates:
•Support for Bowtie2 and RazerS3 as initial aligners. Now, BWA, Bowtie2, RazerS3 and Novoalign work in LSC. Please see the comparison details of aligners in the "Short read - Long read aligner#manual".
•Added SR length coverage percentage on LR (SR-covered length/full length of corrected LR) to corrected_LR output file. Here is an example, where the last number 0.82 is the SR length coverage percentage on LR:
>m111006_202713_42141_c100202382555500000315044810141104_s1_p0/18941/365_1361|0.82
•Added support for three modes for step-wise runs:
mode 0: end-to-end
mode 1: generating LR_SR.map file
mode 2: correction step
•Generating FASTQ output format based on correction probability given short read coverage. Please refer to the LSC paper and manual page for more details. You can select well-corrected reads for downstream analyses by using the quality in FASTQ output or the SR length coverage percentage above. Please see the filtering in the "Output#manual".
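The SR length coverage filtering described above can be sketched in a few lines, assuming the corrected_LR file is plain FASTA with the coverage value appended after the last "|" in the header, as in the example header shown. This is not part of LSC itself; the function names, the 0.8 cutoff and the second record below are hypothetical.

```python
import io
from typing import Iterable, Iterator, Tuple


def read_fasta(handle: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def sr_coverage(header: str) -> float:
    """Read the value after the last '|' (raises ValueError if it is absent)."""
    _, _, value = header.rpartition("|")
    return float(value)


def filter_by_coverage(records, min_cov: float):
    """Keep only records whose SR length coverage is at least min_cov."""
    return [(h, s) for h, s in records if sr_coverage(h) >= min_cov]


# First header is the example from the LSC notes; the rest is made up.
example = io.StringIO(
    ">m111006_202713_42141_c100202382555500000315044810141104_s1_p0/18941/365_1361|0.82\n"
    "ACGTACGT\n"
    ">hypothetical_read/1/0_100|0.40\n"
    "TTTTACGT\n"
)
kept = filter_by_coverage(read_fasta(example), min_cov=0.8)
print(len(kept), "read(s) kept")
```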
New features
•Used the python path in the cfg file instead of default user/bin path
•Added option (-clean_up) to remove intermediate files or not (Note: important/useful ones will still be there in temp folder)
•Support for input fastq format for LR (long reads) and/or SR (short reads)
•Updated default BWA and novoalign commands options
•Printing out original LR names in the output file
•Support for printing out version number using -v/-version option
Small bugs fixed
•Fixed removal of the XZ pattern printed out at the end of some uncorrected_LR sequences
•Fixed samParser bug (which was ignoring some valid alignments in BWA output)
http://www.stanford.edu/~kinfai/LSC/LSC.html
SMRTPortal trying HGAP assembly in the cloud. https://twitter.com/BioMickWatson/status/365109262021709824/photo/1
Three Takeaways from the 3rd Next-Generation Sequencing Conference
Posted on July 29, 2013 by marketing@trilinkbiotech.com
•Exciting potential of direct sequencing of modified DNA
•Small holes with big promise but bigger challenges
•Paleogenomics: sequencing ancient DNA, how old can you go?
Sometimes small scientific meetings have big impacts on one’s impressions, which was certainly my experience at the 3rd Next-Generation Sequencing (NGS) conference in San Francisco on June 19-21, 2013. Of the many interesting presentations (click here for all speakers and abstracts), three completely different topics struck me the most: Pacific Biosciences’ uniquely powerful single-molecule real-time (SMRT) sequencing of modified DNA, Sequencing-pioneer Prof. David Deamer’s update on Nanopore’s advances and challenges, and the new field of Paleogenomics involving sequencing old DNA. With apologies to all of the other speakers, and admitting personally biased selection, here are my comments about these three topics.
Pacific Biosciences: direct sequencing of modified DNA
—Dr. Jonas Korlach co-invented SMRT technology with Stephen Turner, Ph.D., PacBio Founder and Chief Technology Officer, when the two were graduate students at Cornell University. Dr. Korlach joined PacBio as the company’s eighth employee in 2004. Dr. Korlach was appointed Chief Scientific Officer at PacBio in July, 2012.
Pacific Biosciences (PacBio) deserves a lot of credit for being able to overcome numerous technical challenges facing commercialization of its SMRT sequencing system, which offers some uniquely powerful capabilities. (I’ll save a bit of time and space by refraining from describing how this complex system works, but I encourage you to take advantage of various videos and other technical information available at PacBio’s website.) In addition to providing amazingly long read lengths (up to 20kb) to facilitate genome assembly, SMRT sequencing gives data related to kinetics of nucleotide incorporation. Algorithms for differentiating rate of incorporation of A, G, C or T opposite a cognate nucleotide position in the template strand for various sequence contexts within the “footprint” of a DNA polymerase can also differentiate modified template positions. In other words, the average rate of incorporation of G opposite C is different from that opposite 5-methylcytosine (5-mC). This difference in kinetics allows direct determination of epigenetic methylation patterns in DNA, which was the focus of an excellent presentation by PacBio CSO Jonas Korlach. Direct epigenetic sequencing of 5-mC is completely novel and offers a significant advantage by obviating the need to carry out so-called ‘bisulfite conversion chemistry’ prior to sequencing. Commercial kits are available for bisulfite conversion but require extra time, can be very tricky, and utilize more sample than may be available, especially for limited amounts of clinical biopsies.
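To make the kinetics idea above concrete, here is a toy sketch: for each template position, compare the average time the polymerase takes to incorporate a base in the native sample against an unmodified control, and flag positions where the two differ strongly. This is only an illustration of the principle, not PacBio's actual algorithm; the data layout, the function names and the 2-fold threshold are all assumptions, and the numbers are invented.

```python
from statistics import mean
from typing import Dict, List


def kinetic_ratios(native: Dict[int, List[float]],
                   control: Dict[int, List[float]]) -> Dict[int, float]:
    """Ratio of mean incorporation time (native / control) per shared position."""
    return {
        pos: mean(native[pos]) / mean(control[pos])
        for pos in native.keys() & control.keys()
    }


def flag_positions(ratios: Dict[int, float], fold: float = 2.0) -> List[int]:
    """Positions whose kinetics differ from control by at least `fold` either way.

    fold=2.0 is an arbitrary illustrative threshold (assumption).
    """
    return sorted(p for p, r in ratios.items() if r >= fold or r <= 1.0 / fold)


# Invented incorporation times in seconds, keyed by template position.
native = {10: [0.20, 0.30, 0.25], 11: [1.0, 1.2, 0.8]}
control = {10: [0.20, 0.30, 0.25], 11: [0.3, 0.2, 0.4]}

print(flag_positions(kinetic_ratios(native, control)))
```

In this made-up data only position 11 is flagged, since its native incorporation time is more than three times that of the control.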
(open link,long story)! http://zon.trilinkbiotech.com/2013/07/29/three-takeaways-from-the-3rd-next-generation-sequencing-conference/
We made fun of PacBio's stock price on the way down. Therefore, we have the responsibility to point out when it goes up.
Homologus Frontier in Bioinformatics
August 5th, 2013
Large Jump in PacBio Stock Price
Why is it going up?
Mainstream opinion – PacBio’s new release (RS II) is working very well in giving researchers long reads they need. For details, check the paper and discussion linked in this thread.
Contrarian view (proposed by only one analyst) -
Homolog.us blog finally figured out late on Friday what went wrong with its data!!
Trials and Tribulations with PacBio Data
A Snapshot of How PacBio Reads Look Like
For those who are laughing, it is called ‘butterfly effect’.
In chaos theory, the butterfly effect is the sensitive dependence on initial conditions in which a small change at one place in a deterministic nonlinear system can result in large differences in a later state. The name of the effect, coined by Edward Lorenz, is derived from the theoretical example of a hurricane’s formation being contingent on whether or not a distant butterfly had flapped its wings several weeks earlier.
Although the butterfly effect may appear to be an esoteric and unlikely behavior, it is exhibited by very simple systems. For example, a ball placed at the crest of a hill may roll into any surrounding valley depending on, among other things, slight differences in initial position.
……
The butterfly effect is most familiar in terms of weather; it can easily be demonstrated in standard weather prediction models, for example.[5]
The potential for sensitive dependence on initial conditions (the butterfly effect) has been studied in a number of cases in semiclassical and quantum physics including atoms in strong fields and the anisotropic Kepler problem.[6][7] Some authors have argued that extreme (exponential) dependence on initial conditions is not expected in pure quantum treatments;[8][9] however, the sensitive dependence on initial conditions demonstrated in classical motion is included in the semiclassical treatments developed by Martin Gutzwiller[10] and Delos and co-workers.[11]
(for charts and full story, click on link) http://www.homolog.us/blogs/blog/2013/08/05/large-jump-in-pacbio-stock-price/
Trials and Tribulations with PacBio Data
http://www.homolog.us/blogs/blog/2013/08/02/trials-and-tribulations-with-pacbio-data/ A Snapshot of How PacBio Reads Look Like
http://www.homolog.us/blogs/blog/2013/08/02/a-snapshot-of-how-pacbio-reads-look-like/
PacBio Whole Genome Shotgun Sequences for the D. melanogaster reference strain from Bergman Lab 02 Aug 2013 — 03:08
Posted 31 Jul 2013 As part of a collaboration with Danny Miller and Scott Hawley from the Stowers Institute, we have generated whole genome shotgun sequences using PacBio RS technology for the Drosophila melanogaster y; cn, bw, sp strain (Bloomington 2057), the same strain that was also used to assemble the D. melanogaster reference genome sequence. We’ve been meaning to release these data to the community since we got the data in April, but have been waylaid by teaching commitments and a spate of recent server problems. Prompted by Danny’s visit to the Bergman Lab over the last two weeks and the generous release by Dmitri Petrov’s lab of a data set of Illumina long reads using the Moleculo technology for the same strain of D. melanogaster, we’ve finally gotten around to bundling these D. melanogaster PacBio sequences for general release. We’re hoping that the availability of both PacBio and Moleculo long-read data for the same strain that has one of the highest quality reference genomes for any species will allow the genomics community to investigate directly the pros/cons of each of these new exciting technologies.
These PacBio sequence data were generated by the University of Michigan DNA Sequencing Core facility on a PacBio RS using DNA from 10 adult males. Flies were starved for 4 hours to reduce microbial DNA contaminants before freezing at -80°C and prepped using the Qiagen DNeasy Blood & Tissue Kit (catalog number 69504). Six SMRT cells were used for sequencing 5 Kb fragment libraries on two Long Read mode runs: 1 cell on the first run (Run53) and 5 cells on the second run (Run55). Excluding the circular consensus (CCS) reads, and combining data from the S1 and S2 files for each of the six cells, we obtained 1,357,183,439 bp of raw DNA sequence from this experiment or roughly 7.5x coverage of the 180 Mb male D. melanogaster genome.
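The coverage figure quoted above can be checked with a quick calculation. This is only a minimal sketch: the two numbers come from the post, while the function and variable names are made up for illustration.

```python
# Figures quoted in the post; names are illustrative only.
raw_bases = 1_357_183_439    # bp of raw (non-CCS) PacBio sequence
genome_size = 180_000_000    # bp, male D. melanogaster genome (180 Mb)


def fold_coverage(total_bases: int, genome_bp: int) -> float:
    """Average sequencing depth: total sequenced bases divided by genome length."""
    return total_bases / genome_bp


coverage = fold_coverage(raw_bases, genome_size)
print(f"Approximate coverage: {coverage:.1f}x")
```

The result rounds to the "roughly 7.5x" stated in the post.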
A ~63 Gb tar.gz of the entire PacBio long read dataset including .fasta, .fastq, and .h5 files can be found here. To help newcomers to PacBio data (like us) get over the hurdle of installing the PacBio long-read aligner blasr, we are also distributing CentOS 6/Scientific Linux 64-bit and source RPMs for blasr, made by Peter Briggs in the University of Manchester Bioinformatics Core Facility.
We have also generated 100bp paired-end Illumina data from the same stock, to aid with error-correction of the PacBio long reads. These data were generated by the Stowers Institute Molecular Biology Core facility. As above, genomic DNA was prepared from 10 starved, adult males using the Qiagen DNeasy Blood & Tissue Kit. 1 µg of DNA from each was fragmented using a Covaris S220 sonicator (Covaris Inc.) to 250 base pair (bp) fragments by adjusting the treatment time to 85 seconds. Following manufacturer’s directions, short fragment libraries were made using the KAPA Library Preparation Kits (KAPA Biosystems, Cat. No. KK8201) and Bioo Scientific NEXTflex™ DNA Barcodes (Bioo Scientific, Cat. No. 514104). The resulting libraries were purified using Agencourt AMPure XP system (Beckman Coulter, Cat. No. A63880), then quantified using a Bioanalyzer (Agilent Technologies) and a Qubit Fluorometer (Life Technologies). The library was pooled with several other strains, re-quantified and run as high output mode on six 100 bp paired-end lanes on an Illumina HiSeq 2000 instrument, using HiSeq Control Software 1.5.15.1 and Real-Time Analysis (RTA) version 1.13.48.0. Secondary Analysis version CASAVA-1.8.2 was run to demultiplex reads and generate FASTQ files.
A ~17 Gb tar.gz of the Illumina 100 bp paired-end dataset in .fastq can be found here. Additional Illumina data for the same reference strain have been generated previously by Chuck Langley’s lab, either from mixed adult males and females (SRX040484) or haploid embryos (SRX040485, SRX040486, SRX040491) that could be used to supplement the PacBio error correction.
As with previous unpublished data we have released from the Bergman Lab, we have chosen to release these genomic data under a Creative Commons CC-BY license, which requires only that you credit the originators of the work as specified below. However, we hope that users of these data respect the established model of genomic data release under the Ft. Lauderdale agreement that is traditionally honored for major sequencing centers. Until the paper describing these genomic data is published, please cite:
•Miller, D.E., C.B. Smith, R.S. Hawley and C.M. Bergman (2013) PacBio Whole Genome Shotgun Sequences for the D. melanogaster Reference Strain. http://bergmanlab.smith.man.ac.uk/?p=1971
last revised 30 Jul 2013 Reducing assembly complexity of microbial genomes with single-molecule sequencing
Authors: Sergey Koren, Gregory P Harhay, Timothy PL Smith, James L Bono, Dayna M Harhay, D. Scott Mcvey, Diana Radune, Nicholas H Bergman, Adam M Phillippy
Background: The short reads output by first- and second-generation DNA sequencing instruments cannot completely reconstruct microbial chromosomes. Therefore, most genomes have been left unfinished due to the significant resources required to manually close gaps in draft assemblies. Third-generation, single-molecule sequencing addresses this problem by greatly increasing sequencing read length, which simplifies the assembly problem.
Results: To measure the benefit of single-molecule sequencing on microbial genome assembly, we sequenced and assembled the genomes of six bacteria and analyzed the repeat complexity of 2,267 complete bacteria and archaea. Our results indicate that the majority of known bacterial and archaeal genomes can be assembled without gaps, at finished-grade quality, using a single PacBio RS sequencing library. These single-library assemblies are also more accurate than typical short-read assemblies and hybrid assemblies of short and long reads.
Conclusions: Automated assembly of long, single-molecule sequencing data reduces the cost of microbial finishing to $1,000 for most genomes, and future advances in this technology are expected to drive the cost lower. This is expected to increase the number of completed genomes, improve the quality of microbial genome databases, and enable high-fidelity, population-scale studies of pan-genomes and chromosomal organization.
Comments: Parallel submission with Genome Biology. 26 pages, 6 figures, 3 tables. Supplementary materials available from: this http URL
Subjects: Genomics (q-bio.GN)
Cite as: arXiv:1304.3752 [q-bio.GN]
(or arXiv:1304.3752v3 [q-bio.GN] for this version)
Submission history
From: Sergey Koren
[v1] Sat, 13 Apr 2013 00:25:43 GMT (1126kb)
[v2] Thu, 2 May 2013 17:10:52 GMT (1140kb)
[v3] Tue, 30 Jul 2013 20:50:29 GMT (1180kb)
http://arxiv.org/abs/1304.3752
Interesting read!! opiniomics bioinformatics, genomes, biology etc. "I don't mean to sound angry and cynical, but I am, so that's how it comes across"
Bacterial genomes – 2nd and 3rd generation costs
There are some really cool developments coming out from PacBio, not least of which is the tantalising ability to be able to sequence and assemble single-contig bacterial genomes (link, link, link).
Adam Phillippy and others have published a really cool paper on this at arXiv, and I have to say I am really, really incredibly impressed by PacBio and all the advances bioinformaticians are making in this area. It’s really cool.
However, in the hype, it’s possible to lose sight of the advantages of the Illumina system, and there are certainly some uncertainties around cost – in the arXiv paper, we see the phrase:
“While the cost of multiplexed Illumina can be as low $300 per genome, the resulting assemblies are typically in hundreds of contigs”
Whilst I don’t have issue with the latter part of that sentence, the first part is perhaps worth questioning!
Some statistics
The rapid-run mode of the HiSeq 2500 is perfectly capable of producing 150 million 150bp paired-end reads. This equates to 45Gb of sequence data.
If we are sequencing 5Mb genomes, at 40X, we need 200Mb of sequence. 96 of those will therefore need ~20Gb of sequence, so as you see, a single lane of HiSeq 2500 easily copes.
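The throughput argument above can be written out as a short calculation. This is a sketch under one stated assumption: "150 million 150bp paired-end reads" is read as 150 million read pairs, which is the only reading consistent with the 45Gb figure. Variable names are illustrative.

```python
# Lane yield, assuming 150 million read PAIRS (two 150 bp reads each).
read_pairs = 150_000_000
read_length = 150                                  # bp per read
lane_yield = read_pairs * 2 * read_length          # bp per rapid-run lane

# Sequence needed for the multiplexed bacterial project described above.
genome_size = 5_000_000                            # bp (5 Mb genome)
depth = 40                                         # target fold coverage
n_genomes = 96

per_genome_bases = genome_size * depth             # 200 Mb per genome
total_needed = per_genome_bases * n_genomes        # about 19.2 Gb for 96 genomes

print(f"Lane yield:   {lane_yield / 1e9:.1f} Gb")
print(f"Total needed: {total_needed / 1e9:.1f} Gb")
print(f"Fits in one lane: {total_needed <= lane_yield}")
```

Under these assumptions the 96 genomes use a little under half of the lane, which is the headroom behind "easily copes".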
ARK-Genomics runs a non-profit full cost recovery business model, which means we charge for reagents, staff time and equipment. So for that lane of sequencing, we would probably charge in the region of £2500.
We need to factor in the cost of libraries. In reality, we could make this cheaper via automation, but for the sake of ease, let’s say the library prep is £100 per sample. That’s £9600 on library prep.
That’s a total of £12100, or £126 per genome.
In reality, I think we could get library prep down to £50 per sample. This would bring the cost down to £76 per genome.
At present exchange rates, $300 is about £200, so you can see our costs are significantly cheaper than the costs in Adam’s paper.
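The per-genome cost figures above follow from a simple formula: one lane charge shared across all samples, plus a library prep charge per sample. A minimal sketch using the prices quoted in the post (the function name is made up for illustration):

```python
def cost_per_genome(lane_cost: float, library_cost: float, n_samples: int) -> float:
    """Total project cost (one lane plus one library per sample) divided by sample count."""
    total = lane_cost + library_cost * n_samples
    return total / n_samples


# Prices in GBP as quoted in the post.
current = cost_per_genome(lane_cost=2500, library_cost=100, n_samples=96)
cheaper = cost_per_genome(lane_cost=2500, library_cost=50, n_samples=96)

print(f"Library prep at 100 per sample: about {current:.0f} per genome")
print(f"Library prep at 50 per sample:  about {cheaper:.0f} per genome")
```

Note that library prep dominates: at 96 samples the lane itself contributes only about £26 to each genome.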
PacBio costs
I have less of an idea about PacBio costs – Adam’s paper suggests between $900 and $1200, but admits a different recipe is as high as $2200.
We have commissioned some PacBio work and the cost was about £1100 for a single sample.
Perhaps others can comment on this?
Comparison
My conservative estimate is that PacBio is about 10 times more expensive per sample for bacterial genomes than Illumina, and in reality it is probably higher. Even taking my conservative estimate, the figure of “10 times” is significantly higher than the comparison implied in Adam’s paper. My worry is that Adam’s paper compares an expensive Illumina quote with a cheap PacBio quote.
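The "about 10 times" figure can be sanity-checked against the numbers given earlier in the post. This sketch uses only the single PacBio quote mentioned above (about £1100 for one sample) and the two Illumina per-genome estimates; a different PacBio quote would shift the ratio.

```python
# All prices in GBP, taken from the post.
pacbio_per_sample = 1100       # the one commissioned PacBio sample
illumina_current = 126         # per genome with library prep at 100 per sample
illumina_cheaper = 76          # per genome with library prep at 50 per sample

ratio_current = pacbio_per_sample / illumina_current
ratio_cheaper = pacbio_per_sample / illumina_cheaper

print(f"PacBio vs Illumina (current prep cost): {ratio_current:.1f}x")
print(f"PacBio vs Illumina (cheaper prep cost): {ratio_cheaper:.1f}x")
```

The two ratios bracket the "about 10 times" estimate, with the cheaper library prep pushing it higher.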
Pros and Cons
Pros of PacBio are that you get a finished genome.
Pros of Illumina are
1. Cost per sample is far cheaper
2. Population level statistics – I’m not sure of the fold coverage one achieves with PacBio, but 40x Illumina coverage certainly lets you begin to see low-level variants in the population of cells being sequenced
3. Scale – if you want to sequence 96 genomes, the only real option is Illumina – more people have consumables budgets of around £10k than have budgets around £100k
Horses for courses
I love what PacBio are doing, and I love what Adam and others are doing on the Informatics side. At the end of the day, we must choose the right technology for the right question. PacBio is great if you want complete genomes; Illumina is still the only viable alternative if you want to sequence hundreds of bacterial genomes at once.
http://biomickwatson.wordpress.com/2013/08/01/bacterial-genomes-2nd-and-3rd-generation/
#SMRTseq Twitter chat
Genome Biology recently published a Correspondence article that argued for a bigger uptake of PacBio's SMRT sequencing platform. We debated the issues arising in a Twitter chat, showcased here.
Genome Biology recently published a Correspondence article by Rich Roberts, Mauricio Carneiro and Mike Schatz that argued for a bigger uptake of PacBio's SMRT sequencing platform. Roberts followed this up with a Q&A in Biome, the online magazine recently launched by Genome Biology's publisher, BioMed Central. We decided to follow the Correspondence with a Twitter chat to debate the matters arising, the highlights of which we include below. For more of a preamble to the Twitter chat, please see our previous two blog posts here and here.
Mike Schatz (@mike_schatz; Cold Spring Harbor Laboratory) - article co-author
Mauricio Carneiro (@mauricinho; Broad Institute) - article co-author
Mario Caccamo (@mcaccamo; TGAC) - Beyond The Genome organizing committee
Jason Merkin (@jjmerkin; MIT, Burge lab) - Redditor interested in SMRT
Eric Johnson (@SNPsaurus; SNPsaurus) - Redditor interested in SMRT
Introductions
Genome Biology (@GenomeBiology): Welcome to the #SMRTseq Tweet chat! Please introduce yourself & what your interest in PacBio is. If applicable, how did you start using it?
Genome Biology (@GenomeBiology): We are of course the world's favorite #OA genomics journal, Genome Biology. Our interest in PacBio is as editors of genomics papers #SMRTseq
Mauricio Carneiro (@mauricinho): Started using Pacbio in 2010 when the Broad got it's first machine, writing algorithms for human research and using the GATK #SMRTSeq
jason merkin (@jjmerkin): Somewhat different direction: we are analyzing full-length isoform structures. #SMRTseq
(A must see at this link) http://storify.com/GenomeBiology/smrtseq-twitter-chat
Reducing Assembly Complexity of Microbial Genomes with Single-molecule Sequencing July 31st, 2013 | Category: Pacbio
Many PacBio experts and one novice group (which is us) joined a Twitter chat at #SMRTseq this morning. If you are interested, please click on the hashtag #SMRTseq to find what was discussed. Many thanks to @GenomeBiology for arranging it. It is very exciting to see the editors of Genome Biology think outside the box and use the latest social media tools to bring so many PacBio enthusiasts together. Another good example of using social media was shown by Carl Zimmer of National Geographic, who arranged a Google Hangout meeting to discuss the Coelacanth genome paper in April.
In the PacBio chat, Adam Phillippy gave an update on his PacBio-related paper, available from arXiv.
Background: The short reads output by first- and second-generation DNA sequencing instruments cannot completely reconstruct microbial chromosomes. Therefore, most genomes have been left unfinished due to the significant resources required to manually close gaps in draft assemblies. Single-molecule sequencing addresses this problem by greatly increasing sequencing read length, which simplifies the assembly problem.
Results: To measure the benefit of single-molecule sequencing on microbial genome assembly, we sequenced and assembled the genomes of six bacteria and analyzed the repeat complexity of 2,267 complete bacteria and archaea. Our results indicate that the majority of known bacterial and archaeal genomes can be assembled without gaps, at finished-grade quality, using a single PacBio RS sequencing library. These assemblies are also comparable in accuracy to hybrid assemblies including second-generation data.
Conclusions: Automated assembly of long, single-molecule sequencing data reduces the cost of microbial finishing to below $2,000 for most genomes, and future advances in this technology are expected to drive the cost lower. This is expected to increase the number of complete genomes, improve the quality of microbial genome databases, and enable high-fidelity, population-scale studies of pan-genomes and chromosomal organization.
http://www.homolog.us/blogs/blog/2013/07/31/reducing-assembly-complexity-of-microbial-genomes-with-single-molecule-sequencing/
Very interesting discussion on SMRT sequencing technology, started about 3 1/2 hours ago! https://twitter.com/search/realtime?q=%23SMRTseq&src=typd
After Release of 20 New Genomes, 100K Pathogen Project Now Kicking PacBio Sequencing into Higher Gear
July 30, 2013
By Molika Ashford
http://www.genomeweb.com/sequencing/after-release-20-new-genomes-100k-pathogen-project-now-kicking-pacbio-sequencing?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+genomeweb%2Finsequence+%28In+Sequence%29
Perspectives on PacBio: Rich Roberts Q&A leads up to tomorrow’s #SMRTseq Tweet chat
Naomi Attar on July 30, 2013 at 3:16 pm
Tomorrow (July 31), Genome Biology will host the #SMRTseq Tweet chat on PacBio's SMRT sequencing platform, as discussed in detail in this earlier blog post.
To give budding Tweet chattees food for thought before letting rip on their keyboards, Biome (BioMed Central’s new online magazine) has posted a Q & A with Rich Roberts, lead author on the Genome Biology Correspondence article upon which the Tweet chat is based.
In the Q&A, Roberts explains why he decided to evangelize on behalf of the SMRT technology (he has no personal interest in PacBio) and how he sees the sequencing wars playing out in the future. He also explains why the work for which he received a Nobel prize, the discovery of introns, plays second fiddle to the true highlight of his career.
As a reminder, the Tweet chat will take place July 31 at 11 am EST/4 pm BST and will last one hour. All you need to do to take part is to include the hashtag #SMRTseq in your tweets – you will be able to read the thoughts of other contributors on the #SMRTseq Twitter page.
The 'official' panel is composed of Roberts' co-authors Mike Schatz and Mauricio Carneiro, as well as Mario Caccamo, Jason Merkin and Eric Johnson. Please see our previous blog post for a list of the questions that will be discussed during the chat.
http://blogs.biomedcentral.com/bmcblog/2013/07/30/perspectives-on-pacbio-rich-roberts-qa-leads-up-to-tomorrows-smrtseq-tweet-chat/?utm_source=twitterfeed&utm_medium=twitter
Rich Roberts discusses single-molecule sequencing technology
Posted by Biome on 29th July 2013
By enabling the parallel sequencing of DNA, the introduction of next-generation sequencing technologies has been instrumental in driving down the costs involved in genomic studies. The technologies were genuinely game-changing, but the switch to high throughput assays introduced several limitations: biased error patterns and short read lengths.
In theory, single-molecule sequencing, in which DNA molecules are sequenced without amplification steps, was to offer a route to achieve high-throughput sequencing without these limitations; however, the first wave of single-molecule sequencing instruments floundered: Helicos Biosciences went bankrupt, ‘Project Starlight’ from Life Technologies was put on hiatus, and Pacific Biosciences’ single-molecule real-time (SMRT) platform quickly earned itself a reputation for unreliable, error-prone performance.
As the previously rapid climb in cost efficiency brought about by next-generation sequencing plateaus, the failure of single-molecule sequencing to deliver might leave some genomics aficionados despondent about the prospects for their field. But a recent Correspondence article in Genome Biology saw Nobel laureate Richard Roberts, together with Cold Spring Harbor’s Mike Schatz and Mauricio Carneiro of the Broad Institute, argue that the latest iteration of Pacific Biosciences’ SMRT platform is a powerful tool, whose value should be reassessed by a skeptical community.
In this Q&A, Roberts tells us why he thinks there’s a need for re-evaluation, and what sparked his interest in genomics in the first place.
How does SMRT sequencing differ from other existing next-generation sequencing technologies, and what benefits does it bring?
SMRT sequencing is a single molecule technique that can generate long reads (10-15Kb), is highly accurate and can distinguish methylated bases from the normal A,C,G,T.
This latter property is unique as no other method can do that for N6-methyladenine or N4-methylcytosine without additional chemistry being involved. This methylation information is both useful and intriguing. It can be used to determine methyltransferase recognition sequences and hence often the companion restriction enzyme specificity and contains important functional information that the methyltransferase is active. In addition it offers the possibility of looking at the epigenetic potential of bacteria. The significance of the long reads is also very important because it means that for small genomes the complete sequence can be obtained without the need for expensive and time-consuming gap closing methods that other Next-Gen technologies require. Instead of trying to do a 100,000 piece jigsaw puzzle the problem of sequence assembly is reduced to a 1,000 piece jigsaw puzzle – a considerable improvement.
What instigated you to write a commentary specifically on Pacific Biosciences’ SMRT sequencing technology?
There has been a misconception in the scientific research community that the method is very inaccurate. In fact it is the most highly accurate of all of the Next-Gen sequencing technologies available. This is because the errors, while high on a single read, are completely random and disappear statistically as more reads are made. A recent paper has shown that human polymorphisms can be found with greater accuracy using this technology.
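The statistical point in this answer, that random single-read errors wash out as more reads cover the same position, can be illustrated with a simple binomial model. This is only a sketch under loud assumptions that are not from the interview: the 15% per-read error rate is a hypothetical value, errors are treated as independent between reads, and the consensus is counted as wrong whenever more than half of the reads are wrong (pessimistic, since random errors rarely agree on the same wrong base).

```python
from math import comb


def majority_error(p: float, n: int) -> float:
    """Probability that more than half of n independent reads are wrong at one position.

    p is the per-read error probability; n should be odd so there are no ties.
    """
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )


HYPOTHETICAL_ERROR_RATE = 0.15   # assumed for illustration, not a measured value

for depth in (1, 3, 5, 15, 31):
    err = majority_error(HYPOTHETICAL_ERROR_RATE, depth)
    print(f"depth {depth:>2}: consensus error probability {err:.2e}")
```

In this toy model the consensus error falls steeply with depth. The argument depends on the errors being random: a systematic error that recurs in every read at the same position would not be removed by adding coverage.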
Given that Pacific Biosciences’ SMRT sequencing has been subject to negative rumors, how did you come to realise that this technology is actually a valuable and accurate tool?
My original principal interest was in the methylation patterns as it seemed to offer the possibility of determining the recognition specificities of restriction modification systems in an extremely facile way. This turned out to be true and has yielded a plethora of new and interesting results. Along the way it became clear that this technology had much greater promise than the early scurrilous rumours suggested. A major reason that the community has not appreciated this is that very few of them have tried it. The original rumors put them off from buying the machines.
Of all the different benefits of SMRT sequencing, which do you think will be the most persuasive in getting people to adopt it?
I suspect that the accuracy of the sequence and the ability to easily close small genomes will be an important selling point. At present GenBank is littered with shotgun sequences that for the most part are close to worthless because they tell you very little about the organism from which they came. This is because you never know what is missing – it could be the gene you are most interested in!
In contrast a complete genome sequence is invaluable as it tells you the full genetic potential of the organism. All we need to do now is to improve our bioinformatics so that we can properly interpret that DNA sequence. Unfortunately, we are not spending enough money doing the functional analysis of the sequences we are obtaining and our biological research agenda is suffering because of it. Just at the moment we should be greatly increasing our efforts to gain functional insights into the millions of genes we are discovering by sequencing and for which we either have no idea of what they do, or many of our predictions are simply wrong. But the only way we will know if they are wrong is by critically testing selected subsets of them. I don’t see anything like enough funding to do this. It is very short-sighted of NIH and the biological community not to demand more functional annotation of the genomes we are sequencing.
Should nanopore sequencing become a commercially viable reality, do you think SMRT sequencing will become redundant or can these two technologies co-exist?
It depends what you mean by nanopore sequencing. I haven’t heard of anything that I believe in so far. Where is the data showing that it works? Despite the claim by Oxford Nanopore that they can read methylated bases, they never answered my emails offering to test those claims critically.
Current next-generation sequencing software is designed for short reads. With the longer sequence reads of SMRT sequencing, are we going to have to revisit old software solutions that were developed for long reads generated by Sanger sequencing?
It is always a good idea to revisit software. In the case of 10 Kb reads the earlier software should be up to the task as it has become easier. However, with methylated base data also available some other improved approaches should be possible. I helped write the original assembly programs back in the 1970s, but have not given much thought to the problem since then as we were just interested in what would now be considered short sequences (Adenovirus-2 was just 36 Kb long). The sequences needing assembly today are megabases or gigabases long and more challenging. I am having too much fun exploring bacterial epigenetics!
You have forged an extensive career in biochemistry and molecular biology. What led to your interest in genomics?
As an organic chemist in the late 1960s I became fascinated by the chemical problems posed by molecular biology. It was clear that DNA sequencing was going to become of crucial importance. After doing a post-doc spent sequencing some tRNAs I moved to Cold Spring Harbor Laboratory with the idea of developing new methods to sequence DNA. I thought the newly-discovered restriction enzymes would be key in generating small DNA molecules (not available naturally) with which to develop methods. Instead I got seduced by the restriction enzymes and their companion methyltransferases and these have now been the main focus of my research for 40 years. They are fascinating and have led me into areas I would never have suspected. They are a paradigm of biology and exhibit most of the traits that make biology such a fascinating subject. I can’t imagine leaving them behind just yet. For one thing they led me into bioinformatics, which is now also a great love of my life.
During the course of your career you have made several notable contributions that have significantly furthered biological research. Which contribution are you most proud of or consider the most important?
Obviously the discovery of split genes and RNA splicing was an amazing outcome of research into Adenovirus transcription. But I feel that the role I played in discovering so many of the early restriction enzymes and pushing their commercialization has had a profound impact on biological research and enabled the whole biotechnology industry to take off. Because we were very generous in giving away samples of the first restriction enzymes to anyone who wanted them I made a lot of friends who have remained so throughout my scientific life. That has been extremely rewarding!
Readers interested in the application of SMRT sequencing to human genomics might be interested in this presentation by Mount Sinai’s Eric Schadt, previously of Pacific Biosciences.
To join the debate about the virtues (and vices) of this technology, please look out for Genome Biology’s Twitter chat – more information from the BioMed Central blog.
More about the author(s)
Richard Roberts, Chief Scientific Officer, New England Biolabs
Sir Richard J. Roberts FRS, otherwise known as Rich, is the 1993 Nobel Laureate in Physiology or Medicine and currently serves as Chief Scientific Officer for New England Biolabs. The focus of Roberts’ research throughout his award-winning career has been restriction enzymes and their associated methyltransferases, whose biotech potential he has helped develop at New England Biolabs since the company’s 1970s beginnings. Roberts is best known for his discovery of introns, for which he was awarded the Nobel prize, and for his development of restriction mapping; both achievements were selected by Genome Biology’s Editorial Board as key moments in 60 years of genome biology.
http://www.biomedcentral.com/biome/rich-roberts-discusses-single-molecule-sequencing-technology/
Robbins Geller Rudman & Dowd LLP and Scott+Scott, Attorneys at Law, LLP Announce Proposed Settlement of Class Action in the Pacific Biosciences Securities Litigation
SAN DIEGO, July 25, 2013 /PRNewswire/ -- The following statement is being issued by Robbins Geller Rudman & Dowd LLP and Scott+Scott, Attorneys at Law, LLP regarding the Pacific Biosciences Securities Litigation
SUPERIOR COURT OF THE STATE OF CALIFORNIA
COUNTY OF SAN MATEO
IN RE PACIFIC BIOSCIENCES OF CALIFORNIA, INC. SECURITIES LITIGATION
This Document Relates To: ALL ACTIONS.
Case No. CIV509210
SUMMARY NOTICE OF PROPOSED SETTLEMENT OF CLASS ACTION
Hon. Marie S. Weiner
TO: ALL PERSONS OR ENTITIES ("PERSONS") THAT PURCHASED PACIFIC BIOSCIENCES OF CALIFORNIA, INC. ("PACB" OR THE "COMPANY") COMMON STOCK BETWEEN OCTOBER 27, 2010 AND SEPTEMBER 20, 2011, INCLUSIVE (THE "CLASS PERIOD"), INCLUDING THOSE PERSONS THAT PURCHASED THE COMPANY'S STOCK PURSUANT OR TRACEABLE TO THE COMPANY'S REGISTRATION STATEMENT AND PROSPECTUS FOR THE COMPANY'S OCTOBER 27, 2010 INITIAL PUBLIC OFFERING (THE "CLASS").
THIS NOTICE WAS AUTHORIZED BY THE COURT. IT IS NOT A LAWYER SOLICITATION. PLEASE READ THIS NOTICE CAREFULLY AND IN ITS ENTIRETY.
YOU ARE HEREBY NOTIFIED that a hearing will be held on October 25, 2013 at 9:00 a.m., before the Honorable Marie S. Weiner at the Superior Court of California, County of San Mateo, Department 2, Court Room 7A, 400 County Center, Redwood City, CA 94063, to determine whether: (1) the proposed settlement (the "Settlement") of the above-captioned action ("Action") for at least $7,686,494.82 in cash should be approved by the Court as fair, reasonable and adequate; (2) the Final Judgment of Dismissal as provided under the Stipulation and Agreement of Settlement ("Stipulation") should be entered, dismissing the First Amended Consolidated Class Action Complaint filed in the Action on the merits and with prejudice; (3) the release by the Class of the Settled Claims, as set forth in the Stipulation, should be provided to the Released Parties; (4) this Action satisfies the applicable prerequisites for class action treatment under California Code of Civil Procedure §382; (5) to award Plaintiffs' Counsel attorneys' fees and expenses out of the Settlement Fund (as defined in the Notice of Proposed Settlement of Class Action ("Notice"), which is discussed below); (6) to reimburse Plaintiffs the costs and expenses (including lost wages) they incurred in prosecuting this Action on behalf of the Class out of the Settlement Fund; and (7) the Plan of Allocation should be approved by the Court.
This Action is a securities fraud class action brought on behalf of those Persons who purchased the common stock of PACB during the Class Period, against PACB, ten of its current and/or former key executives and directors, and four Underwriters (collectively, "Defendants") for allegedly misstating and omitting material facts from the Registration Statement and Prospectus filed with the SEC in connection with the October 27, 2010 Initial Public Offering ("IPO"), including: (1) facts relating to the status, development, and effectiveness of a technology to study the synthesis and regulation of DNA (the "RS System"); (2) the existence of serious "bugs" that were plaguing the RS System and causing it to be unstable and unreliable; (3) the effectiveness and accuracy of PACB's technologies for sequencing genomes as compared to their competitors' technologies; (4) the RS System's relatively low raw-read accuracy, throughput and yields; (5) that purchasers of the RS System would have to compromise read length for accuracy; (6) demand for the RS System; and (7) the strength of the competition to the RS System. Plaintiffs allege that these purportedly false and misleading statements inflated the price of the Company's stock, resulting in damage to Class Members when the truth was revealed. Defendants deny all of Plaintiffs' allegations.
IF YOU PURCHASED PACB COMMON STOCK ON OR ABOUT OCTOBER 27, 2010 THROUGH AND INCLUDING SEPTEMBER 20, 2011, INCLUDING IF YOU PURCHASED THE COMPANY'S STOCK PURSUANT AND/OR TRACEABLE TO THE COMPANY'S REGISTRATION STATEMENT AND PROSPECTUS FILED WITH THE SEC IN CONNECTION WITH THE COMPANY'S OCTOBER 27, 2010 IPO, YOUR RIGHTS MAY BE AFFECTED BY THE SETTLEMENT OF THIS ACTION.
To share in the distribution of the Settlement Fund, you must establish your rights by filing a Proof of Claim on or before October 16, 2013. Your failure to submit your Proof of Claim by October 16, 2013 will subject your claim to rejection and preclude your receiving any of the recovery in connection with the Settlement of this Action. If you are a member of the Class and do not request exclusion therefrom, you will be bound by the Settlement and any judgment and Release entered in the Action, including, but not limited to, the Final Order, whether or not you submit a Proof of Claim.
If you have not received a copy of the Notice, which more completely describes the Settlement and your rights thereunder (including your right to object to the Settlement), and a Proof of Claim form, you may obtain these documents, as well as a copy of the Stipulation (which among other things contains definitions for the defined terms used in this Summary Notice) and other settlement documents, online at www.PacificBiosciencesSecuritiesLitigation.com, or by writing to:
Pacific Biosciences Securities Litigation Settlement
c/o GCG
P.O. Box 35072
Seattle, WA 98124-3508
Phone: (866) 297-1225
Inquiries should NOT be directed to Defendants, the Court, or the Clerk of the Court.
Inquiries, other than requests for the Notice or for a Proof of Claim form, may be made to Plaintiffs' Counsel:
ROBBINS GELLER RUDMAN & DOWD LLP
James I. Jaconette, Esq.
655 West Broadway, Suite 1900
San Diego, CA 92101
Phone: (800) 449-4900
Fax: (619) 231-7423
SCOTT+SCOTT, ATTORNEYS AT LAW, LLP
Anne L. Box, Esq.
707 Broadway, Suite 1000
San Diego, CA 92101
Phone: (619) 233-4565
Fax: (619) 233-0508
IF YOU DESIRE TO BE EXCLUDED FROM THE CLASS, YOU MUST SUBMIT A REQUEST FOR EXCLUSION BY SEPTEMBER 25, 2013, IN THE MANNER AND FORM EXPLAINED IN THE NOTICE. ALL MEMBERS OF THE CLASS WHO HAVE NOT REQUESTED EXCLUSION FROM THE CLASS WILL BE BOUND BY THE SETTLEMENT ENTERED IN THE ACTION EVEN IF THEY DO NOT FILE A TIMELY PROOF OF CLAIM.
Dated: June 3, 2013
HON. MARIE S. WEINER
SUPERIOR COURT JUDGE FOR THE STATE OF CALIFORNIA, COUNTY OF SAN MATEO
SOURCE Robbins Geller Rudman & Dowd LLP; Scott+Scott, Attorneys at Law, LLP
RELATED LINKS
http://www.pacificbiosciencessecuritieslitigation.com
http://www.prnewswire.com/news-releases/robbins-geller-rudman--dowd-llp-and-scottscott-attorneys-at-law-llp-announce-proposed-settlement-of-class-action-in-the-pacific-biosciences-securities-litigation-216911841.html
Pacbio: Why We Stopped Using PacBioToCA and Lived Happily Thereafter
This article came out yesterday (7/24/13). A new edited version with the updated PacBio program:
When we started working on PacBio data a year ago, everyone recommended PacBioToCA. Pause for a moment to imagine what the summer of 2012 was like. Everyone was talking about Illumina, 454, de Bruijn graphs, the Velvet assembler and so on, and these ‘weird’ reads showed up from nowhere. By analogy, everyone is talking about pizza, and BioMickWatson shows five other foods that are like genome assembly, namely Eton mess, spaghetti Bolognese, Marmite, ‘macaroni’ cheese and anchovite. The initial impulse is to turn all of those into pizza toppings to make them attractive. That is what PacBioToCA does. It turns PacBios into Illuminas and then lets you forget about them. In detail, PacBioToCA painstakingly aligns all Illumina reads onto the PacBio reads and then locally assembles the Illumina reads. From that point onward, you are back in the Illumina world. However, the alignment was incredibly time-consuming. LSC is the same story: it aligns all Illumina reads onto the PacBio reads using Novoalign, an incredibly slow PacBio-unaware aligner. We realized that it made more sense to assemble the Illumina reads first and then align the assembled sequences onto the PacBio reads.
Over time we learned that any PacBio pipeline not using BLASR is not doing the analysis right. Mark Chaisson spent a lot of time turning BLASR into an incredibly powerful tool. It includes a read-filtering program. It even has a PacBio read simulator, which, according to Mark, matches experimental data better than the published simulator PBSIM.
The main advantage of BLASR is that it knows indels are the primary mode of error in PacBio reads. It is therefore very PacBio-aware, which other aligners are not.
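To illustrate why the error model matters for alignment, here is a minimal sketch of a textbook global alignment score with adjustable penalties. It is not BLASR's implementation; the function name `align_score` and all penalty values are assumptions chosen only for illustration. It shows how the same read scores very differently depending on whether indels or substitutions are treated as the cheap error.

```python
def align_score(read, ref, match=1, mismatch=-4, indel=-1):
    """Global alignment score (Needleman-Wunsch, linear gap cost).

    The defaults are hypothetical: they make indels cheap and
    substitutions expensive, mimicking an indel-dominated error model.
    """
    # prev[j] holds the best score of aligning the processed read prefix
    # against ref[:j]; start with an empty read prefix (all gaps).
    prev = [j * indel for j in range(len(ref) + 1)]
    for i in range(1, len(read) + 1):
        cur = [i * indel]  # read[:i] against an empty reference prefix
        for j in range(1, len(ref) + 1):
            pair = match if read[i - 1] == ref[j - 1] else mismatch
            cur.append(max(prev[j - 1] + pair,   # align the two bases
                           prev[j] + indel,      # extra base in the read
                           cur[j - 1] + indel))  # base missing from the read
        prev = cur
    return prev[-1]


# A read carrying one inserted base relative to the reference.
read, ref = "ACGTACGT", "ACGACGT"
indel_aware = align_score(read, ref)                         # indels cheap
substitution_tuned = align_score(read, ref, mismatch=-1, indel=-4)
print(indel_aware, substitution_tuned)
```

With the indel-friendly penalties the single inserted base costs little, so the true placement keeps a high score. With substitution-tuned penalties the same placement scores much lower, which is one way a PacBio-unaware aligner can lose or truncate correct alignments.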
Edit.
PacBioToCA also got upgraded, which we have not kept pace with. Here are the latest updates from Michael Schatz and others -
Also, Jason Chin mentioned a better approach -
The linked presentations are available here and most possibly Jason is suggesting the following pipeline -
If you are using Mike Schatz’s method, the following twitter discussion may be of help to you.
July 24th, 2013 | Category: Pacbio
5 comments to Pacbio: Why We Stopped Using PacBioToCA and Lived Happily Thereafter
Mark Chaisson
July 24, 2013 at 9:13 pm
BLASR is very conservative: it tends to end alignments early rather than pushing to the very end of a read. The reason for this is that the end of a read is sometimes hard to nail down. Because of this, BLASR is not very good at mapping Illumina reads, since an early termination of an alignment cuts off a large part of the read.
If one really insists on correcting PacBio reads, it always seemed better to do a *very* conservative Illumina assembly, and do some weighted mapping of the resulting contigs to the PacBio reads. No repeats should be resolved, and graph-based error correction would be kept to a minimum.
samanta
July 25, 2013 at 1:30 am
Mark, What do you mean by ‘weighted’ mapping? How does one set up the weights?
Mark
July 25, 2013 at 8:49 am
Good question – it’s a somewhat open question which is why I didn’t take the time to fill in what that meant. I’ll first define what I mean by a conservative assembly, which then relates to the weighting.
A typical de Bruijn assembly has the following steps:
1. Count k-mer frequency. The multiplicities are sampled from a mixture model where correct k-mers follow a Gaussian centered about the coverage, and incorrect k-mers follow an exponential distribution.
2. Pick a threshold that includes as little of the exponential distribution and as much of the Gaussian distribution as possible, and build a de Bruijn graph using k-mers with multiplicity at least this cutoff.
3. Perform graph-based error correction: removing “bubbles” and “tips”, which are errors in the middle of reads and errors at the ends of reads, respectively.
4. Use paired-read information to resolve repeats.
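Steps 1 and 2 above can be sketched in a few lines. This is only a toy illustration under stated assumptions: the function names are hypothetical, and the cutoff is picked with a crude histogram-valley heuristic rather than by fitting the Gaussian/exponential mixture described in step 1.

```python
from collections import Counter, defaultdict


def count_kmers(reads, k):
    """Step 1: count how often each k-mer occurs across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def pick_cutoff(counts):
    """Step 2 (crude stand-in for a mixture fit, an assumption).

    Builds the histogram of k-mer multiplicities and returns the last
    multiplicity of the valley that precedes the first rise, i.e. the
    point where the low-multiplicity error tail gives way to the
    coverage peak. Falls back to 1 if no such valley exists.
    """
    hist = Counter(counts.values())
    for m in range(2, max(hist)):
        if hist.get(m, 0) <= hist.get(m - 1, 0) and hist.get(m, 0) < hist.get(m + 1, 0):
            return m
    return 1


def build_de_bruijn(counts, cutoff):
    """Step 2: keep k-mers at or above the cutoff; each kept k-mer is an
    edge from its (k-1)-mer prefix to its (k-1)-mer suffix."""
    graph = defaultdict(list)
    for kmer, n in counts.items():
        if n >= cutoff:
            graph[kmer[:-1]].append(kmer[1:])
    return graph


# Five clean copies of a read plus one copy with a sequencing error.
reads = ["ACGTAC"] * 5 + ["ACGTTC"]
counts = count_kmers(reads, k=4)
cutoff = pick_cutoff(counts)
graph = build_de_bruijn(counts, cutoff)
print(cutoff, dict(graph))
```

In this toy input the k-mers created by the error occur once and fall below the cutoff, so they never enter the graph. That is exactly the data reduction that can also discard true low-coverage sequence.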
The sequence information from steps 2-4 may be used to map to pacbio reads to error correct them. Because data is being reduced, at any of these steps it is possible to remove true-positive sequences that will then leave gaps in the corrected pacbio reads, and possibly create gaps in the assembly, or to mis-assemble a contig and again create a gap in the pacbio assembly. While the number of gaps may be small, the overall effect may be large on the *mighty* N50.
A conservative assembly would remove as little data as possible, and not perform any repeat resolution with mate-pairs. This would result in a more complicated graph that contains more short edges, as well as spurious edges caused by sequencing error. These would be mapped back to the pacbio reads, and one would weight by: 1. length of the match, 2. confidence in the assembled contig, such as average coverage, and 3. number of alternate contigs that map to the same position that support a different consensus sequence.
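How to combine those three factors is, as noted, an open question. The sketch below is one hypothetical way to do it, not a published method: the `ContigHit` fields, the overlap test, and the formula (match length times average coverage, divided by one plus the number of conflicting hits) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ContigHit:
    """One contig mapped onto a PacBio read (hypothetical record layout)."""
    contig_id: str
    read_start: int      # start on the read, 0-based inclusive
    read_end: int        # end on the read, exclusive
    avg_coverage: float  # proxy for confidence in the assembled contig
    consensus: str       # contig sequence proposed for this interval


def overlaps(a, b):
    """True if two hits cover at least one common read position."""
    return a.read_start < b.read_end and b.read_start < a.read_end


def hit_weight(hit, all_hits):
    """Assumed weighting: longer and better-covered matches score higher;
    overlapping hits that propose a different sequence reduce the score.
    Comparing whole consensus strings is a crude proxy for disagreement."""
    match_len = hit.read_end - hit.read_start
    conflicts = sum(
        1 for other in all_hits
        if other is not hit and overlaps(hit, other)
        and other.consensus != hit.consensus
    )
    return match_len * hit.avg_coverage / (1 + conflicts)


hits = [
    ContigHit("c1", 0, 100, 30.0, "A" * 100),
    ContigHit("c2", 50, 150, 10.0, "C" * 100),  # disagrees with c1
    ContigHit("c3", 200, 260, 20.0, "G" * 60),  # no competing hit
]
weights = {h.contig_id: hit_weight(h, hits) for h in hits}
print(weights)
```

A corrector could then prefer, at each read position, the hit with the highest weight. Any real scheme would need to compare the competing sequences base by base rather than as whole strings.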
samanta
July 25, 2013 at 12:22 pm
Thanks Mark. It is roughly the same prescription I have been following, as guided by Jason Chin. (for twitter discussions, click on this link)
http://www.homolog.us/blogs/blog/2013/07/24/pacbio-why-we-stopped-using-pacbiotoca-and-lived-happily-thereafter/
Debating PacBio: the #SMRTseq Tweet chat
Naomi Attar on July 23, 2013 at 2:59 pm
In a recent Correspondence article published in Genome Biology, Rich Roberts, Mike Schatz and Mauricio Carneiro extolled the virtues of Pacific Biosciences’ SMRT sequencing platform. The article sought to address an impression held by many in the field that single-molecule sequencing is not yet a viable option, a view Roberts et al. believe to be wide of the mark. In particular, the long reads and non-biased error patterns produced by SMRT sequencing are praised as useful for a range of genomics applications, alongside the unique feature of direct base-modification readout.
We thought a discussion on the merits of SMRT sequencing and what its long term prospects may be would be timely and of interest to our Twitter audience, and so Genome Biology, together with Beyond The Genome, has decided to hold a Tweet chat.
Some points to consider
Availability: the number of PacBio machines in circulation is currently pretty limited – is there a need for more PacBio services to be provided by those institutes with machines to those without?
Cost: the initial outlay with PacBio is high if you are only going to use it for limited purposes, even though base-for-base it is competitively priced – its limitations mean that many users will use PacBio together with Illumina and so will have to shell out for two platforms. (A hybrid approach shows how SMRT and Illumina reads can complement each other very well).
Improving technology: SMRT sequencing is much improved since its early days and can now be used for many applications, as documented on PacBio’s blog. See, for example, this talk by Eric Schadt on sequencing whole human genomes.
Competitors: two competitors to think about might be Illumina’s Moleculo, which is now on the market, and Oxford Nanopore Technologies’s MinION and GridION, which aren’t.