LT.Swing trade!
ASHG 2015: Highlights from the Platinum Genome Session and More
Thursday, October 15, 2015
During the final days of the ASHG meeting last week in Baltimore, a number of scientists offered great presentations based on data generated with SMRT Sequencing, including an entire session on building platinum genomes. We’ve rounded up the highlights here:
Karyn Meltz Steinberg from Washington University’s McDonnell Genome Institute spoke about building a platinum human assembly from single-haplotype genomes. Her team defines “platinum” as covering at least 98% of the sequence with every contig associated with a chromosome. They use long-read PacBio sequencing for de novo sequencing and assembly, followed by scaffolding with BioNano Genomics or Dovetail Genomics technology. When necessary, they then perform PacBio sequencing of BACs for targeted regions, such as gap-filling. Using CHM13 as an example, she shared several examples of specific genomic regions and assembly challenges, both for short- and long-read data. By combining BioNano mapping with PacBio sequence data, they produced a hybrid assembly with 254 contigs, compared to 1,590 contigs for the initial PacBio assembly lacking BioNano mapping.
Bobby Sebra from the Icahn School of Medicine at Mount Sinai talked about an effort to resolve regions in the human genome — such as complex structural variants — that have not been addressed by NGS or Sanger sequencing. Working with the NA12878 genome, Sebra and his colleagues combined PacBio and Illumina sequence data with BioNano mapping. The resulting assembly filled 28 gaps in the latest human reference genome and featured a multi-megabase contig N50 length. The comparison to GRCh38 confirmed previous studies suggesting that tandem repeats and other structural variants are underrepresented in the reference genome; long-read sequencing can effectively characterize these regions. Sebra noted that many challenging regions in the human genome have implications for pharmacogenomics or disease associations, and that detailing these regions carefully will be important for clinical utility of genomics.
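For readers unfamiliar with the "contig N50" statistic mentioned above, here is a minimal sketch of how it is usually computed: sort contigs from longest to shortest and report the length of the contig at which the running total first reaches half of the assembly. The contig lengths below are a hypothetical toy example, not data from Sebra's talk.

```python
def n50(contig_lengths):
    """Return the contig N50: the length L such that contigs of length >= L
    together cover at least half of the total assembly length."""
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    total = sum(contig_lengths)
    running = 0
    # Walk contigs from longest to shortest until half the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy assembly (lengths in bases), for illustration only.
toy_lengths = [4_000_000, 3_000_000, 2_000_000, 1_000_000]
print(n50(toy_lengths))  # 3000000
```

A "multi-megabase contig N50" therefore means that at least half of the assembled sequence sits in contigs several megabases long or longer.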
In that same session, Justin Zook from the National Institute of Standards and Technology presented on progress at the Genome in a Bottle consortium, including some upcoming reference genomes from Han Chinese and Ashkenazi Jewish family trios. These new genomes have been generated with a number of sequencing technologies, including ones from PacBio, BioNano, 10X, Complete Genomics, Oxford Nanopore, and others. GIAB has already released some reference materials, which scientists are using to help validate variant calls for their own genome assemblies. Zook mentioned tools produced by the CDC and underway at the Global Alliance to allow scientists to compare sequencing data to what other projects have reported. They’re also working on analysis tools to show confidence scores for structural variant calls.
In a separate session, Kiana Mohajeri from the University of Washington reported on a region of chromosome 8 that features the largest known inversion variant in the human genome; it spans several megabases and includes several segmental duplications. Seeking to determine the evolutionary history of this region and to better understand the variation found in human genomes, the team sequenced more than 70 BAC clones with SMRT Sequencing. They produced a gap-free 6.2 Mb tiling path with 99.999% accuracy — a far more complete and contiguous sequence than the human reference genome has for this region. The tiling path shows four inversion-associated repeats with 98% sequence identity flanking the internal inversions. By comparing the region to other primate genomes, they theorize that it was formed between 200,000 and 800,000 years ago, but note that the oldest of the repeats appears to be 19 million years old.
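To put the 99.999% figure in perspective, the sketch below converts it to a Phred-style quality value and to an expected number of errors over the 6.2 Mb tiling path. It assumes the figure is a per-base accuracy and that errors are spread evenly, which the summary above does not state; the function names are made up for this illustration.

```python
import math


def phred_quality(accuracy):
    """Convert a per-base accuracy (e.g. 0.99999) to a Phred-scaled Q value."""
    error_rate = 1.0 - accuracy
    if error_rate <= 0:
        raise ValueError("accuracy must be below 1.0")
    return -10.0 * math.log10(error_rate)


def expected_errors(length_bases, accuracy):
    """Expected number of erroneous bases, assuming a uniform error rate."""
    return length_bases * (1.0 - accuracy)


tiling_path_bases = 6_200_000  # 6.2 Mb, from the talk summary
accuracy = 0.99999             # 99.999%, assumed to be per-base accuracy

print(round(phred_quality(accuracy)))                       # 50
print(round(expected_errors(tiling_path_bases, accuracy)))  # 62
```

Under those assumptions the tiling path is roughly Q50, or on the order of 60 wrong bases across the whole region.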
http://www.pacb.com/blog/ashg-2015-highlights-from-the-platinum-genome-session-and-more/
Who is saying what about the new PacBio Sequel system?
The big news from the world of DNA sequencing this week was that Pacific Biosciences has launched a new sequencing platform. The successor to their RS II platform has been named the Sequel System, and it will be on display at the upcoming American Society of Human Genetics meeting. The new system promises to sequence a human genome (at 10x coverage) for $3,000.
The early buzz already seems pretty positive, and hopefully this sequel will turn out to be more like The Empire Strikes Back than, say, Highlander II. What follows is a fairly comprehensive roundup of what people have been saying about this new platform. Note that this story has been updated several times since I first wrote it (details of these updates are included at the end of this post).
From PacBio
•The Official Sequel System webpage, which includes this Apple-esque video (with CSO Jonas Korlach taking on the Jony Ive role).
•Details of PacBio's presentation and workshop at the ASHG 2015 meeting are available, with information about how people can live stream the workshop.
•Listen to the webcast (a conference call with questions that took place on the morning of October 1st). See below for details of some of the questions that were asked.
From science news websites
•Bio-IT World's take on the news: A Worthy Sequel: PacBio's New Sequencing System.
•GenomeWeb have a page up: PacBio Launches Higher-Throughput, Lower-Cost Single-Molecule Sequencing System (free membership required to read).
'Traditional' news outlets
•NBC Bay Area TV news ran a short piece which strangely omits PacBio's name from the title: Menlo Park Company Aims for 'Precision Medicine'
From blogs
•I think CoreGenomics may have been the first blog to write something about the Sequel: The new Pacific Biosciences sequencer
•The incomparable Mick Watson presents his thoughts in a blog post: What does the PacBio Sequel mean for the future of sequencing?.
•Keith Robison has also weighed in with many detailed thoughts on his blog regarding the news: PacBio Sequel: Smaller Box, Bigger Bang.
•The Biomusings blog has entered into the discussion: What does SEQUEL mean for human genetics?
•From Paul Krzyzanowski's 'The Checkmate Scientist' blog: PacBio's gain would be Illumina's loss in a simple world…
From discussion forums
•There is a discussion unfolding on the SEQanswers forums.
•And as always there are discussions happening on reddit, see r/bioinformatics and r/biology
From the world of finance
•The Motley Fool take a financial perspective on the news: Why Shares of Pacific Biosciences of California Inc. Soared Today.
•More financial insights at Zacks, 24/7 Wall St, and MarketWatch (among many others).
I guess the question that everyone is asking now concerns the possibility of someone making a genome assembly from sequence data using this platform, and then using this tool to produce a better version of the assembly. In this case, would it be a sequel Sequel SEQuel genome assembly?
Questions from the conference call
There were a lot of questions asked in the hour long conference call. I've transcribed some of them and indicated the time point where you can jump to if you are interested in hearing PacBio's answers to specific questions:
•7:40:"Can you give us some thoughts on turnaround time and cost per genome?"
•11:20:"Can you talk about the use case beyond your current customer base? How this expands the number of applications?"
•15:17:"Can you help us think about some of the major changes that went into the system? Is there still a manifold that moves in three dimensions?"
•19:20:"From a user standpoint, are there any changes to site preparations that you would have to make from Sequel vs RS II; any limitations on things like putting it on 2nd/3rd/4th floor?"
•22:25:"You've introduced a number of kits with various applications for the RS II, will the Sequel be able to run all of the applications from the beginning, or will it take time to introduce certain applications to the system?"
•24:34:"Are there specific customer types that you think are positioned to be more on the earlier side of adoption, such as human sequencing, or microbiology, plant, animal etc.?"
•33:20:"Can you give a perspective on what the scalability of this platform looks like comparatively (to the RS II)?"
•35:08:"In terms of the metrics you gave around price per human genome, can you help us think about that relative to Illumina? If you take a 30x coverage genome on Illumina, what is the equivalent coverage you would need on the Sequel to get something similar…and how long would that take you to do?"
•38:29:"Recognising a lot has been achieved with this launch: different computer architecture, different form factor, new optical systems, higher density, with a smaller footprint. I just want to make sure, there's no compromise in raw accuracy expected relative to the RS II?"
•47:46:"Could you describe in layman's terms the benefits of methylation detection for your system?"
•50:50:"With your technology relative to other platforms, can you help us understand — if you have these larger pieces of the puzzle if you will — how advantageous that could be after you're done generating data, when you get down to assembling the genome?"
•53:16:"I'm curious what percentage of potential customers that looked at the RS II passed given the high price tag? What is the incremental buyer opportunity at the price point of $350,000?"
•57:35:"Still trying to understand what percentage of competitive platforms you think you can swap out with the Sequel?"
Updates
2015-10-01 13.46: Added some more sources of news, including questions asked in conference call
2015-10-01 20.04: Added in more conference call details, with time points of different questions.
2015-10-01 20.39: Added Keith Robison's blog post
2015-10-02 06:34: Changed link for Bio-IT World's piece
2015-10-02 09.08: Added more links about PacBio's presentation at ASHG 2015
2015-10-02 09.41: Added link to CoreGenomics post and added disclaimer
2015-10-02 11.54: Added links to Sequel-related discussions on SEQanswers and reddit
2015-10-02 13.28: Added Biomusings and Checkmate Scientist blog posts, and split main part of article into different sections
2015-10-12 09.52: Addition of NBC Bay Area News piece
http://www.acgt.me/blog/2015/10/1/who-is-saying-what-about-the-new-pacbio-sequel-system
ASSEMBLING HIGH QUALITY HUMAN GENOMES: GOING BEYOND THE ‘$1,000 GENOME’
http://www.pacb.com/wp-content/uploads/2015/09/Assembling-High-Quality-Human-Genomes.pdf?utm_source=mssocial&utm_medium=social&utm_campaign=hs
Highly Sensitive and Cost-Effective Detection of BRCA1 and BRCA2 Cancer Variants in FFPE Samples Using Multiplicom’s MASTR Technology & Single Molecule, Real-Time (SMRT®) Sequencing
https://s3.amazonaws.com/files.pacb.com/pdf/Highly-Sensitive-and-Cost-Effective-Detection-of-BRCA1-and-BRCA2-Cancer-Variants-in-FFPE-Samples-Using-Multiplicoms-MASTR-Technology-and-Single-Molecule-Real-Time-Sequencing.pdf
Full-length cDNA Sequencing of Alternatively Spliced Isoforms Provides Insight into Human Cancer
https://s3.amazonaws.com/files.pacb.com/pdf/Full-length-cDNA-Sequencing-of-Alternatively-Spliced-Isoforms-Provides-Insight-into-Human-Cancer.pdf
Comprehensive genome and transcriptome structural analysis of a breast cancer cell line using PacBio long read sequencing.
Genomic instability is one of the hallmarks of cancer, leading to widespread copy number variations, chromosomal fusions, and other structural variations in many cancers. The breast cancer cell line SK-BR-3 is an important model for HER2+ breast cancers, which are among the most aggressive forms of the disease and affect one in five cases. Through short read sequencing, copy number arrays, and other technologies, the genome of SK-BR-3 is known to be highly rearranged with many copy number variations, including an approximately twenty-fold amplification of the HER2 oncogene, along with numerous other amplifications and deletions. However, these technologies cannot precisely characterize the nature and context of the identified genomic events and other important mutations may be missed altogether because of repeats, multi-mapping reads, and the failure to reliably anchor alignments to both sides of a variation. To address these challenges, we have sequenced SK-BR-3 using PacBio long read technology. Using the new P6-C4 chemistry, we generated more than 70X coverage of the genome with average read lengths of 9-13kb (max: 71kb). Using Lumpy as well as our novel assembly-based algorithms for analyzing split-read alignments, we have developed a detailed map of structural variations in this cell line. Taking advantage of the newly identified breakpoints and combining these with copy number assignments, we have developed an algorithm to reconstruct the mutational history of this cancer genome. From this we have characterized the amplifications of the HER2 region, discovering a complex series of nested duplications and translocations between chr17 and chr8, two of the most frequent translocation partners in primary breast cancers.
We have also carried out full-length transcriptome sequencing using PacBio’s Iso-Seq technology, which has revealed a number of previously unrecognized gene fusions and isoforms. Combining long-read genome and transcriptome sequencing technologies enables an in-depth analysis of how changes in the genome affect the transcriptome, including how gene fusions are created across multiple chromosomes. This analysis has established the most complete cancer reference genome available to date, and is already opening the door to applying long-read sequencing to patient samples with complex genome structures. https://ep70.eventpilotadmin.com/web/page.php?page=IntHtml&project=ASHG15&id=150121871
Tue, 10/06/2015 - Long Read Sequencing Dramatically Improves Blood Matching: Steven Marsh, Anthony Nolan Institute
("One of the popular questions on the program this past year is how those doing sequencing decide between the quality of Pacific Biosciences' long reads and the cheaper short read technology, such as that of Illumina or Thermo Fisher. Today’s guest provided the most clear and dramatic answer yet: use the PacBio system exclusively.")
Podcast chapters: 0:00 Anthony Nolan and better registries for blood matching (3:53); 3:53 A new world for HLA typing (5:26); 9:19 Long reads and sequence based typing (3:13); 12:33 The future of blood matching (5:26); 17:59 Exclusively using PacBio now (5:30)
One of the popular questions on the program this past year is how those doing sequencing decide between the quality of Pacific Biosciences' long reads and the cheaper short read technology, such as that of Illumina or Thermo Fisher. Today’s guest provided the most clear and dramatic answer yet: use the PacBio system exclusively.
We heard this from Steve Marsh, the director of bioinformatics at the Anthony Nolan Research Institute in London. Established in 1974 by the mother of a boy with a rare blood disease, the Anthony Nolan Institute is a world leader in blood crossmatching and donor/patient registries. Steve and his team at the Institute have dramatically improved the resolution of HLA typing, one of the methods for matching a donor’s blood tissue with that of the transplant recipient.
Steve says that thirty years ago when he entered the field, HLA typing was performed with serology and there were just 119 HLA antigens that were known. “We thought 119 was a lot of diversity,” says Steve. With the advent of genomic tools in the 90’s, HLA typing moved to the level of the genetic allele, done first with PCR and then with sequencing.
“We knew that the HLA molecules were polymorphic, but now we know they are hyper-polymorphic... For example, 'A2' is a specificity, and serologically we recognized the specificity 'A2,' and that was it. We now recognize that there are over 500 different variants of A2,” Steve explains.
Knowing more about the incredible diversity of blood types can make achieving a donor/patient match seem all the more prohibitive. More variables mean fewer candidates. But research at the Anthony Nolan is now paying off and is robust enough to make a difference in the clinic. Ideally blood registries will provide precise matches and do so immediately. All this explains why Steve is so keen on the PacBio system. In a field where the quality of the sequencing makes the difference between the right match and not, the increased price of the PacBio is worth it, Steve says.
Finally, it should be said that we recorded this interview with Steve just before PacBio announced their new higher throughput Sequel Sequencer. This new smaller footprint instrument is promised out in 2016 at half the price, seven times the throughput and with the same high quality long reads. The decision between quality and price in the world of sequencing will soon be easier.
- See more at: http://mendelspod.com/podcasts/long-read-sequencing-dramatically-improves-blood-matching-steven-marsh-anthony-nolan/#sthash.GPnnEKoc.vdYUIZd0.dpuf
Illumina Downgraded by Leerink, Shares Plummet Over 10%
1:40 pm ET October 2, 2015 (Zacks) -- (According to Investor’s Business Daily, this downgrade was partly induced by the earlier-than-expected launch of the new Sequel high-throughput gene-sequencing system of genomic services provider Pacific Biosciences of California, Inc. (PACB). The Sequel system is a direct competitor of Illumina’s NextSeq system.)
Shares of Illumina Inc. ILMN plunged more than 10% to reach $157.21 yesterday, after an analyst at investment bank Leerink Partners downgraded the company from Outperform to Market Perform. Consequently, the target price for the company also suffered a major fall from $225 to $185.
Notably, the analyst particularly expressed his concern over Illumina’s ability to successfully penetrate various diagnostics and consumer applications, citing uncertainty regarding the growth potential of the company’s next-generation sequencing (NGS) products in the research market.
OH AND
Pacific Biosciences of California, Inc. (PACB)
6.02 Up +0.57(+10.48%) NASDAQ - As of 03:35pm EDT
Thursday, October 1, 2015- by Paul Krzyzanowski-
("Regular readers here know that I think Illumina is a great company and is here to stay, but the quality of long read data trumps the cost in many situations. PacBio has just validated that concept, which is why Illumina's absolute dominance suddenly doesn't feel that guaranteed anymore.")
PacBio's gain would be Illumina's loss in a simple world...
...but the DNA sequencing market is anything but simple. The past several years have seen several generations of machines from the incumbent companies, principally Illumina and ThermoFisher, with PacBio a distant third, at least until yesterday's reveal of the Sequel System as a successor to the RS II.
Naturally, shares of PacBio stock popped today, rising 58% by the late afternoon of the trading day, in stark contrast to the 10% drop in Illumina. I'll get to why it's pretty clear that PacBio is in a very good position compared to Illumina, but these swings are even more impressive if you consider what happened with their respective market caps. PacBio's value increased by about $121 million, while Illumina went down by about $2.7 billion.
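A quick back-of-the-envelope check of those numbers: dividing each dollar move by its percentage move gives the market cap each company started the day with. This is only a sketch using the rounded figures quoted above (58%, $121 million, 10%, $2.7 billion), so the results are approximate, and the function name is made up for this illustration.

```python
def implied_prior_value(change_in_value, fractional_change):
    """Value before a move, given the absolute change and the fractional change."""
    return change_in_value / fractional_change


# Rounded figures quoted in the post; the results are therefore approximate.
pacbio_before = implied_prior_value(121e6, 0.58)      # +$121M on a 58% rise
illumina_before = implied_prior_value(-2.7e9, -0.10)  # -$2.7B on a 10% drop

print(f"PacBio before:   ${pacbio_before / 1e6:,.0f}M")
print(f"Illumina before: ${illumina_before / 1e9:,.1f}B")
print(f"Size ratio:      {illumina_before / pacbio_before:,.0f}x")
```

On these figures PacBio was worth roughly $209 million before the move and Illumina roughly $27 billion, so Illumina was more than a hundred times larger, and its 10% slide removed far more value than PacBio gained.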
Why the discrepancy?
Running through possible scenarios like overestimation of Illumina's future cash flows (likely) or other non-PacBio sequencing platforms being viewed as the main beneficiaries today (not so much) sheds some light on what's happening. It's clear that PacBio insiders like Michael Hunkapiller had huge confidence in their long read technology for years now, and here's why, according to Fool contributor George Budwell:
The problem is that the PacBio RS II appears to be too costly to use on the huge scale required by Big Pharma to hopefully usher in the era of personalized medicine, mainly by improving in vitro diagnostics. The Sequel System, by contrast, was specifically designed to be used as part of Roche's efforts in the area of human in vitro diagnostics by reducing sequencing costs and the size of the sequencer, meaning that this machine could be a major breakthrough in terms of producing medicines that are tailor made for the individual.
Yes, cost was a big problem for adoption, but it's only part of the story.
Medical research centers, where personalized medicine is being developed today, largely rely on Illumina sequencers because the technology is entrenched. It's understood by techs and genomics facilities. It's (relatively) easy to analyze. In contrast, PacBio machines were less pervasive and people didn't get a lot of experience with using them or the data coming off them, making them "too much of a research tool even for researchers".
However, the longer reads that come off the PacBio (5+ contiguous kilobases vs 100-300 unpaired bases off Illumina machines) are much more useful for a variety of research and medical applications, with two principal ones standing out:
1.Long reads let you easily map out how cancer genomes have been messed up in the disease, which is a very important realization in the past year,
2.PacBio technology lets you find cancer related transcripts (i.e. gene fusions, like BCR-ABL in leukemias)
Medical researchers know that these things are pretty hard to do, even with the ridiculous throughput of Illumina machines today (I'm sure someone, somewhere, once thought that '640,000 reads ought to be enough for anyone'), and in fact, Illumina has been trying to get long-read sequencing off the ground since at least 2013.
However, PacBio has the potential to produce data that's much, much better for these two purposes, which is a huge win for the clinical sequencing market. Cost is definitely a factor; the reduced capital cost of the PacBio ($350k) makes it more likely that the adoption barrier I've just described can be overcome in the near future.
Regular readers here know that I think Illumina is a great company and is here to stay, but the quality of long read data trumps the cost in many situations. PacBio has just validated that concept, which is why Illumina's absolute dominance suddenly doesn't feel that guaranteed anymore.
http://www.checkmatescientist.net/2015/10/pacbios-gain-would-be-illuminas-loss-in.html?utm_source=feedburner&utm_medium=twitter&utm_campaign=Feed%3A+TheCheckmateScientist+%28The+Checkmate+Scientist%29
("So how does the new PacBio Sequel change the market? A lot of initial reactions I have had are that the Sequel is a real threat to Oxford Nanopore. It certainly ramps up the competition in the long read space, which is a really good thing. But actually, high-throughput long read machines like the Sequel and the PromethION don’t spell the end for one another – they actually spell the beginning of the end for Illumina – as a sequencing platform.")--
What does the PacBio Sequel mean for the future of sequencing?
PacBio have responded to intense competition in the sequencing market by releasing the Sequel, a machine that promises 7 times the throughput of their last machine (the RSII) at a price of $350k (the RSII cost more in the region of $750k). So they have a cheaper, higher throughput machine (though I haven’t seen cost-per-Gb figures). That’s around 7 Gigabases of reads averaging around 15Kb in length. Without doubt this is a very interesting platform and they will sell as many of them as they can produce. A sweet spot for assembly is still 50-60X for PacBio, so think 12 SMRT cells to get a human genome: let’s say £12k per genome. Edit 01/10/2015 my maths at 7am was not that good! 50X human genome is 150Gb, so that’s 21 SMRT cells and £21K per human genome. Much more expensive than Illumina’s $1000 genome, but far, far better.
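The corrected coverage arithmetic in the paragraph above can be reproduced with a few lines. This is a sketch using the post's own round numbers (a 3 Gb human genome, 50X coverage, about 7 Gb per SMRT cell); the cost of roughly £1k per cell is only implied by the post's "21 cells, £21K" figure and is treated here as an assumption.

```python
import math


def smrt_cells_needed(genome_gb, coverage, yield_gb_per_cell):
    """Return (total Gb required, exact cells, whole cells rounded up)."""
    total_gb = genome_gb * coverage
    exact_cells = total_gb / yield_gb_per_cell
    return total_gb, exact_cells, math.ceil(exact_cells)


# Round numbers from the post; cost_per_cell_gbp is an assumption implied by it.
total_gb, exact_cells, whole_cells = smrt_cells_needed(
    genome_gb=3.0, coverage=50, yield_gb_per_cell=7.0
)
cost_per_cell_gbp = 1_000

print(f"Sequence needed: {total_gb:.0f} Gb")
print(f"Cells (exact):   {exact_cells:.1f}")
print(f"Cells (whole):   {whole_cells}")
print(f"Approx. cost:    £{whole_cells * cost_per_cell_gbp:,}")
```

The exact answer is about 21.4 cells, which the post rounds to 21; rounding up to whole cells gives 22. Lowering the `coverage` argument shows how quickly the cell count falls if less than 50X turns out to be enough.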
I just want to say that at times I have been accused of being an Illumina- and a nanopore- fanboy; I am neither and both. I am just a fan of cool technology, from microarray in the 90s to sequencing now.
In long reads, let’s be clear, we are talking about the promise of Oxford Nanopore vs the proven technology of PacBio. And the Sequel changes the dynamics. However, the MinION fast mode is capable of throughput in the region of 7Gb (like the Sequel) and the PromethION is capable of throughput on Illumina-scale. Therefore, Oxford Nanopore are far from dead – though they need to respond.
So how does the new PacBio Sequel change the market? A lot of initial reactions I have had are that the Sequel is a real threat to Oxford Nanopore. It certainly ramps up the competition in the long read space, which is a really good thing. But actually, high-throughput long read machines like the Sequel and the PromethION don’t spell the end for one another – they actually spell the beginning of the end for Illumina – as a sequencing platform.
As soon as you have high-throughput, cheap long reads, it is in fact Illumina who face a problem. I love Illumina. When I first arrived at Roslin, I walked into our lab and (honestly!) stroked our Illumina GAIIx. Illumina have revolutionised biology. However, short reads have limitations – they are bad for genome assembly, they are bad at complex genomes, they’re actually quite bad at RNA-Seq, they are pretty bad for structural variation, they are bad at haplotypes and SNP phasing, and they are not that great at metagenomics. What has made Illumina the platform of choice for those applications is scale – but as soon as long read technologies reach a similar scale, Illumina looks like a poor choice.
The Sequel (and the PromethION) actually challenge Illumina – because in an era of cheap, long read sequencing, Illumina becomes a genotyping platform, not a sequencing platform.
7 thoughts on “What does the PacBio Sequel mean for the future of sequencing?”
binay panda
October 1, 2015 at 7:27 am
this is good news for the community. pacbio, along with oxford nanopore, represent proven long read technology that will help generate near complete genome and accurate transcriptome assembly. however, the criteria like read length, ability to assemble genomes accurately etc etc are not, imo, going to matter for wider market penetration. cost of the box, the easiness/availability of assay/chemistry kit and access to a wider distributor/support network, are going to play imp role. this is true assuming that the company can raise in excess of $200 million before either going public or getting sold to another company in order to sustain internal research and development. this is an insane amount of money.
no matter which company wins, none are interested in working towards democratizing global science, i am afraid. they are not making access to the technology early, and making it widely & globally available. companies are restricting access to a certain geography at an early stage (perhaps there are support issues), therefore giving a head start to the researchers in that region. there is far more interesting scientific and societal applications globally that can be resolved using sequencers but early access and access to expert technical help are current obstacles (of course, along w access to money).
Kimmo Palin
October 1, 2015 at 8:11 am
Few issues: You say “..the MinION fast mode is capable of throughput in the region of 7Gb (like the Sequel) and the PromethION is capable of throughput on Illumina-scale.” Is that statement of fact that you can back up with data or reiteration of what ONT has said? (if you can call it a fact, I’ll turn red with jealousy and start throwing things ;)
The application where Illumina still has an edge is counting. Granted that ChIP-exo and Hi-C might be fine with PromethION but stuff like SELEX, where you need to count the ‘correct’ sequences without help of the reference, seems out of reach for single molecule methods (at least for now).
K
biomickwatson Post author
October 1, 2015 at 8:25 am
As I said in the post, it is the promise of Nanopore vs the proven tech of PacBio. But with ONT, the nanopore is the sequencer, and therefore it is possible to massively parallelise the technology in a way that isn’t possible with Illumina and PacBio.
I am not sure what SELEX is, but certainly for “counting” RNA (transcriptomics), long reads would beat Illumina every time – providing there is necessary throughput.
Ian Sudbery
October 1, 2015 at 1:34 pm
Will ONT really ever have the throughput to accurately quantify transcripts across many samples/conditions? Will there come a time in the near future when I can assemble sufficient nanopores in parallel to run 30 RNAseq samples on an ONT, with sufficient throughput on each sample to detect 1.5x differences in expression for the same cost/time as illumina? Or assays like HiC, where you need read counts in the 100s of millions.
I suppose what I’m asking is how long is it going to take for the PromethION to reach illumina-scale throughput (in terms of read counts rather than Mb of sequence) at the same costs?
Boyan
October 1, 2015 at 2:02 pm
“Massive parallelism” and “nanopores” in the same sentence are yet to be proven compatible. The Promethion is 48 minions in a single box. Scalability is an issue that ONT has yet to explain how they will tackle. Not saying it is impossible, just that it is highly non obvious mostly for hardware reasons. Fast mode is the answer they have provided so far, but when/if it becomes a reality it is a one time bump.
A question on your PacBio human genome calculation. 3G at 50X coverage at 7G per cell works out to 21 cells not 12.
Edwin
October 1, 2015 at 4:01 pm
A few issues here. 50-70x is no longer the sweet spot. cited here:
http://wgs-assembler.sourceforge.net/wiki/index.php/Version_8.3_Release_Notes
Algorithms for better assemblies using lower coverage are improving. Also remember this is on 1 smrt cell now not multiple so the biases that existed by using multiple smrt cells are also gone.
Also the new insert libraries are at 30kb not 15kb.
https://biomickwatson.wordpress.com/2015/10/01/what-does-the-pacbio-sequel-mean-for-the-future-of-sequencing/
NOT TO BASH A STOCK,(IHUB POLICY)!!!!!!!!!!!!!!!!!!!!!!!!!
DO SOME DD!! If you are just a shorter,you don`t belong here!!
Wednesday, September 30, 2015 -- Introducing the Sequel System: The Scalable Platform for SMRT Sequencing -- ("lower sequencing project costs compared to the PacBio® RS II System")
We are excited to announce our newest Single Molecule, Real-Time sequencer, the Sequel™ System. Watch this short video to learn about this exciting evolution in SMRT® Sequencing.
The Sequel System provides higher throughput, more scalability, a reduced footprint and lower sequencing project costs compared to the PacBio® RS II System, while maintaining the benefits of SMRT technology. The core of the Sequel System is the capacity of its redesigned SMRT Cells, which contain one million zero-mode waveguides (ZMWs) at launch, compared to 150,000 ZMWs in the PacBio RS II. Active individual polymerases are immobilized within the ZMWs, providing windows to observe and record DNA sequencing in real time.
With about seven times as many reads per SMRT Cell as the PacBio RS II, customers should be able to realize lower costs and shorter timelines for sequencing projects, with approximately half the up-front capital investment compared to previous technology. The Sequel System occupies a smaller footprint — less than one-third the size and weight — compared to the PacBio RS II. Since the new system is built on the company’s established SMRT Technology, most aspects of the sequencing workflow are unchanged.
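The "about seven times" figure follows from the ZMW counts quoted in the announcement. A quick check, assuming (my assumption, not the press release's) that reads per SMRT Cell scale in proportion to the number of ZMWs:

```python
# ZMW counts as quoted in the announcement above.
SEQUEL_ZMWS = 1_000_000
RS2_ZMWS = 150_000

# Assumption: the same fraction of ZMWs yields a usable read on both systems,
# so the read-count ratio equals the ZMW-count ratio.
ratio = SEQUEL_ZMWS / RS2_ZMWS
print(f"Sequel / RS II ZMW ratio: {ratio:.2f}")
```

The ratio comes out a bit under seven, so "about seven times" is a fair rounding under that assumption.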
The Sequel System is ideal for projects such as rapidly and cost-effectively generating high-quality, whole-genome de novo assemblies for larger genomes, such as human, plants, and animals. It can provide characterization of a wide variety of genomic variation types, including those in complex regions not accessible with short-read or synthetic long-range sequencing technologies, while simultaneously revealing epigenetic information. The system can also be used to generate data for full-length transcriptomes and targeted transcripts using the company’s IsoSeq™ protocol. The Sequel System’s increased throughput should also facilitate applications of SMRT Technology in metagenomics and targeted gene applications for which interrogation of larger numbers of individual DNA molecules is important.
In today’s press release, Michael Hunkapiller, Ph.D., CEO of Pacific Biosciences, commented: “We are extremely proud to introduce the Sequel System, which provides access to the existing benefits of SMRT Sequencing, including long reads, high consensus accuracy, uniform coverage, and integrated methylation information — a set of core attributes first pioneered with the PacBio RS. The system’s lower cost and smaller footprint represent our continued commitment to leveraging the scalability of our technology and the unique characteristics of SMRT Sequencing.”
“We will continue to support our PacBio RS II customers, and we expect to introduce improvements in sample prep, sequencing chemistry, and software that will extend the performance of that system, as we have done each year since the initial commercialization of the PacBio RS in 2011 and the PacBio RS II in 2013. We expect to make similar, substantial performance improvements each year for the Sequel System,” added Dr. Hunkapiller. “In addition, the Sequel architecture provides the ability to scale throughput by substantially varying the number of ZMWs on future SMRT Cells, thereby optimizing throughput and operating costs for specific applications.”
On Display at the ASHG Annual Meeting
We will showcase the Sequel System in our booth (#907) at the American Society of Human Genetics annual meeting taking place in Baltimore, Maryland, beginning October 6, 2015.
Whether or not you are attending the meeting, you can still attend our workshop, “Addressing Hidden Heritability through Long Read Single Molecule, Real-Time (SMRT) Sequencing,” on Wednesday, October 7, from 1:00-2:30 p.m. The event will be hosted by Michael Hunkapiller and Jonas Korlach from Pacific Biosciences, and include talks by Richard Gibbs from Baylor College of Medicine and Richard Wilson from Washington University in St. Louis. Those attending the conference in Baltimore can register here. We will also offer live streaming and access to the recording.
For more information about the Sequel System, visit our new website.
http://www.pacb.com/introducing-the-sequel-system-the-scalable-platform-for-smrt-sequencing/
5:54 pm Pacific Biosciences announces launch of a new nucleic acid sequencing platform, the Sequel System (PACB):
The Sequel System provides higher throughput, more scalability, a reduced footprint and lower sequencing project costs compared to the PacBio RS II System, while maintaining the existing benefits of the company's SMRT Technology. Pacific Biosciences will showcase the new product at the American Society of Human Genetics annual meeting taking place in Baltimore, Maryland beginning October 6, 2015.
•The Sequel System has been developed as part of the company's collaboration with F. Hoffmann-La Roche Ltd (Roche) to ultimately provide a nucleic acid sequencing system for use in human in vitro diagnostics. Under that agreement, Roche agreed to pay Pacific Biosciences a total of $40 million in milestone payments related to the development of the Sequel System. The company previously reported that it has earned $20 million to date, and now expects to earn the remaining $20 million during the fourth quarter of 2015.
Creating the Foundation of Genomics: Marc Salit, NIST
Thu, 09/10/2015 - 10:15
Marc Salit, Leader of Genome Scale Measurement Group at the National Institute of Standards and Technology, or NIST
Chapters:
0:00 Why should folks in genomics know about NIST? (6:08)
6:08 How close are you to a complete human reference genome? (6:08)
12:17 Using all sequencing technologies (8:04)
20:21 Which NIST standards is the FDA using? (5:28)
25:49 What will success look like? (4:29)
What is a human genome? Well, it’s the three billion letters of our DNA. But how is it measured? How do we know when we have it accurately represented? These are questions that will have to be answered as precision medicine takes hold, for we must have defined standards that will be the basis for regulatory policy, commerce, and better research. These are also the questions that are foremost on the mind of today’s guest. Marc Salit is the leader of the Genome Scale Measurement Group at the National Institute of Standards and Technology, or NIST. In today’s show, he explains how NIST played a pivotal, foundational role in enabling the “Century of Physics.” Now Marc and NIST are looking for the right set of standards to enable the already-upon-us “Century of Biology.” The human reference genome is an example of a standard that Marc and his team are developing. Currently they are piloting what they call “Genome in a Bottle,” a physical reference standard against which all other human genomes can be measured. How far is the team from having a complete reference genome, and what is an example of the way they are working with the FDA to ensure safe and meaningful genomic tests? Join us as we peer in at the foundation of genomics.
http://mendelspod.com/podcasts/creating-foundation-genomics-marc-salit-nist/#sthash.f1b3af1U.dpuf
September 10th, 2015 Pacbio Bioinformatics Workshop – Slides from All Presentations--- A few days back, we posted the summary and the slides of Gene Myers’ talk at the recent bioinformatics workshop arranged by Pacbio. Readers interested in the remaining talks will find the following slides useful. While reading the slides, you may go to #SMRTBFX hashtag in twitter to find out snippets from the corresponding talks.
We are building an online teaching module (same model as this) on algorithms and programs related to analysis of long noisy reads, and Jason Chin has been gracious enough to help us with it. If you are interested in taking the class, please email ‘pandora at homolog.us’ and we will let you know, when the module is ready.
SMRT Informatics Developers Conference – Kevin Corcoran, Senior Vice President, Market Development, Pacific Biosciences
Making the Most of Long Reads – Gene Myers, Ph. D., Founding Director, Systems Biology Center, Max Planck Institute
PacBio SMRT Analysis 3.0 Preview – David Alexander, Ph.D.,Pacific Biosciences
Shotgun Presentations
MinHash for Overlapping and Assembly – Sergey Koren, Ph.D., National Biodefense Analysis and Countermeasures Center
The “Art” of Shotgun Sequencing – Jason Chin, Ph.D., Pacific Biosciences
PBHoney: Detecting SVs with Long-Read Sequencing – Adam English, Ph.D., Baylor College of Medicine
Structural Variation with PacBio Data – Ali Bashir, Ph.D., Mount Sinai School of Medicine
The Iso-Seq™ Method: Transcriptome Sequencing Using Long Reads – Elizabeth Tseng, Ph.D., Pacific Biosciences
CONVEX: De novo Transcriptome Error Correction by Convexification, David Tse, Ph.D., Stanford University
Transcriptome Analysis using Hybrid-Seq – Kin Fai Au, Ph.D., University of Iowa
Understanding Methylome, Metagenome, Structural Variants using SMRT Sequencing – Shinichi Morishita, Ph.D., University of Tokyo
http://www.homolog.us/blogs/blog/2015/09/10/pacbio-bioinformatics-workshop-slides-from-all-presentations/
Wednesday, August 26, 2015
The Road to Hell is Paved with Bioinformatics Formats
If you really want to raise a bioinformaticist's blood pressure, loudly declare your new tool generates output in brand new data formats. This leads to the frequent observation that a large fraction of bioinformatics work is simply converting formats. It is probably consensus that the field is awash in too many formats, though it is equally clear that we can't agree on which should survive. Between some recent news and a Twitter thread on the subject that erupted last night, there was a bunch of fodder for me to collect in a Storify -- and to lay out my own idiosyncratic views.
For example, today Pacific Biosciences announced at their bioinformatics conference that they are moving off HDF5 for read data and will go to unaligned BAM. From the point of view of existing bioinformatics tools, that's a win -- many tools can consume BAM. Except, of course, most tools that take in unaligned data. And while BAM has its merits, metadata is stored in a very simple tag-value format. HDF5 had the problem that a lot of its tools aren't well developed; one reason I tried out Julia last year was to deal with the HDF5 files from Oxford Nanopore's MinION; Perl's HDF5 library choked on them.
One sentiment you'll find in the Storify is a hope that Oxford will abandon HDF5 - but it is certainly not my wish. HDF5 is a sophisticated structured (and compressed) format, enabling a rich representation of metadata. For Nanopore and probably many future sequencing technologies, converting to FASTQ or BAM would lose a lot of information. Indeed, tools which use signal-level data from MinION are already appearing, such as nanopolish.
Sadly, the history of bioinformatics seems to be littered with an aversion to richly-structured data formats. NCBI tried to push ASN.1 as a format for Genbank back when I was a graduate student, but as far as I can tell it never caught on outside NCBI. XML, which is similar, seems to have made limited headway, showing up in schemes such as Systems Biology Markup Language (SBML), but seemingly avoided as often as used. Actually, there is a phylogenetic tree file I was playing with the other day in NeXML format, which is great -- except when I handed it off to a colleague whose tree viewer couldn't use it.
The big advantage of XML, ASN.1, HDF5, YAML is that the nightmare of parsing bad formats is eliminated; there are real standards here, and format validators exist for them, meaning that it is straightforward to prove that a given file is legal. Compare this with all the ways a Genbank flatfile can be botched (I see this complaint routinely on Twitter), in part because Genbank files (and I think PDB; haven't munged one in a while) retain the mainframe-era penchant for the lateral position of text being meaningful. Now, whether a legal file is gibberish is a different question; NCBI had to invest a lot of time early on cleaning up bad ranges and such in the Genbank data they inherited.
In contrast, take FASTQ format -- please. It is very simple, which is a little nice, but it is mostly written and read by computers, which should be able to deal with complexity. The cost of the simplicity is an inability (inherited from FASTA format) to carry any sort of standardized structured metadata. For example, just storing what quality encoding scheme is in use would be worth a mint, let alone which platform generated the data. FASTQ doesn't seem to break many programs -- in contrast with FASTA, in which all sorts of arguments erupt over what is legal and illegal metadata encoding in the header (hint: there is no standard, so anything goes!)
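To make the quality-encoding complaint concrete: because a FASTQ file does not record its own encoding, readers are left guessing from the characters they see. Below is a rough sketch of that guesswork, not any particular tool's method. The function names are made up, the reader assumes strict four-line records, and the thresholds rest on the commonly described ASCII ranges for offset-33 and offset-64 encodings.

```python
import io

def read_fastq(handle):
    """Yield (name, sequence, quality) from a strict 4-line-per-record FASTQ."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:                      # end of file
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()                   # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield header[1:], seq, qual         # drop the leading '@'

def guess_phred_offset(quality_strings):
    """Guess the ASCII offset (33 or 64), or None when it cannot be told.

    Heuristic only: a character below ';' (ASCII 59) should not appear in
    the older offset-64 schemes, so it implies offset 33. A file whose
    lowest character is '@' (64) or above is *probably* offset 64, but a
    very high-quality offset-33 file would look the same.
    """
    lowest = min(ord(ch) for q in quality_strings for ch in q)
    if lowest < 59:
        return 33
    if lowest >= 64:
        return 64
    return None

example = io.StringIO("@read1\nACGT\n+\nII5#\n")
records = list(read_fastq(example))
offset = guess_phred_offset(q for _, _, q in records)
```

A single metadata field in the file would replace all of this, which is the point being made above.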
Also consider this vision: a lot of high throughput sequencing consists of reading FASTQ files, making minimal changes to them (primarily trimming) and writing new FASTQ files. Imagine if instead of keeping a trail of FASTQ files, the reads were all stored in a simple relational database (SQLite is a gem for this sort of thing) and all those transformations stored as edits or replacements of the original sequence, with the metadata trackable throughout. Sure, those indexes and structures and metadata would consume some space, but far more would be saved. It's (at least to me) a beautiful idea -- except no tool out there could use it.
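Here is one way that vision might look, as a minimal sketch using Python's built-in sqlite3 module. The table and column names are invented for illustration, and trimming is modelled as a stored coordinate range on the untouched original read rather than as a rewritten copy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE reads (
    read_id      TEXT PRIMARY KEY,
    sequence     TEXT NOT NULL,
    quality      TEXT NOT NULL,
    platform     TEXT,              -- metadata FASTQ has no place for
    phred_offset INTEGER
);
CREATE TABLE trims (
    read_id TEXT NOT NULL REFERENCES reads(read_id),
    step    TEXT NOT NULL,          -- which tool or stage made the edit
    start   INTEGER NOT NULL,       -- 0-based, inclusive, on the original read
    stop    INTEGER NOT NULL        -- 0-based, exclusive, on the original read
);
""")

# Hypothetical read and trimming step, made up for this example.
conn.execute("INSERT INTO reads VALUES (?, ?, ?, ?, ?)",
             ("read1", "NNACGTACGTNN", "##IIIIIIII##", "hypothetical", 33))
conn.execute("INSERT INTO trims VALUES (?, ?, ?, ?)",
             ("read1", "quality_trim", 2, 10))

def current_read(conn, read_id):
    """Return (sequence, quality) after applying the most recent trim, if any."""
    seq, qual = conn.execute(
        "SELECT sequence, quality FROM reads WHERE read_id = ?",
        (read_id,)).fetchone()
    row = conn.execute(
        "SELECT start, stop FROM trims WHERE read_id = ? "
        "ORDER BY rowid DESC LIMIT 1", (read_id,)).fetchone()
    if row is None:
        return seq, qual
    start, stop = row
    return seq[start:stop], qual[start:stop]

trimmed = current_read(conn, "read1")
```

The original bases are never overwritten, so every trimming step stays auditable and reversible, at the cost of two integers per edit instead of a whole new FASTQ file.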
On the bright side, some formats have died. When I was an undergraduate, every database and sequence handling program seemed to have its own format. One of my first graduate school projects was a multi-format parser so I could read files in FASTA, Genbank, EMBL, SwissProt (very similar to EMBL) and GCG formats, and write FASTA, Genbank and GCG -- but that was hardly the whole spectrum of formats in use those days (but it was the set I needed to deal with). I wish I could say that nobody is reinventing that wheel, but at Starbase I'm frequently asking for files sent to me to be converted out of Geneious or DNA*STAR formats. You'd think these folks would quit inventing proprietary binary formats, but noooo. Reminiscent of ABI, which for a long time kept the binary format for Sanger tracefiles a secret, with regular changes to screw up anyone who had hacked the format. Yet another advantage to formats such as XML that come with a format definition -- it is often possible to write parsers that simply ignore the parts of the file they don't understand, so long as the XML (or ASN.1 or similar) is valid.
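That last point, skipping what you don't understand, is easy to show with Python's standard XML parser. The document and element names below are made up for illustration and do not follow any real schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: two elements this reader knows about, plus a vendor
# extension it has never seen before.
doc = """<record version="2">
  <name>seq1</name>
  <sequence>ACGT</sequence>
  <vendor_extension><blob>opaque</blob></vendor_extension>
</record>"""

KNOWN_TAGS = ("name", "sequence")

root = ET.fromstring(doc)   # raises ParseError if the XML is not well-formed
known = {child.tag: child.text for child in root if child.tag in KNOWN_TAGS}
```

The unknown element costs nothing: the parser still walks the file correctly because the structure is defined by the format, not by the reader's expectations.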
While I'm ranting, a related curse is that every programmer feels a need to come up with a different way of naming the parameters and a different way of collecting them in a file. I feel a little churlish complaining, as I love these three tools, but wouldn't it be nice if MIRA, Celera Assembler and SPAdes had parameter file formats that were at least closely related to each other? I really admire the effort that went into Nucleotid.es; just contemplating putting together all those config files would probably cause me to nix the project.
So in summary, I'd vote for rich formats which are precisely defined so that the headache of parsing, mis-parsing & crashing parsers can be behind us. Stop coming up with yet another tab-delimited mess (though I'm afraid this is a bit of a do-as-I-say-not-as-I-do -- I'm terrible about imposing such on myself). If you start a project, try to look around for something existing to at least steal generously from, instead of inventing yet another idiosyncratic format for bioinformaticians to curse out.
http://omicsomics.blogspot.com/2015/08/the-road-to-hell-is-paved-with.html
Posted by Keith Robison at 11:52 PM
Complete Genome Sequence of Achromobacter xylosoxidans MN001, a
Cystic Fibrosis Airway Isolate
Achromobacter xylosoxidans is an aerobic Gram-negative bacterium
that is widely distributed throughout freshwater and soil
environments. This bacterium is also an opportunistic human
pathogen of immunocompromised hosts and is commonly associated
with a range of respiratory infections. In individuals with
cystic fibrosis (CF), A. xylosoxidans has seen an increase in prevalence,
with some treatment centers reporting positive culture rates
as high as 17.6% (1). Culture-independent 16S rRNA gene studies
suggest that this frequency might be even higher. Despite its association
with chronic lung infections and poor pulmonary function
scores (2), the impact of A. xylosoxidans on CF disease progression
is not entirely clear. This uncertainty, in addition to the bacterium’s
multidrug resistance (3), robust biofilm formation (4), and
transmissibility (5), warrants further study of the molecular basis for
its phenotypes. Here, we report the complete genome sequence of
A. xylosoxidans MN001—a multidrug-resistant isolate recovered
from multiple patients at the University of Minnesota Cystic Fibrosis
Center (institutional review board approval 1401M47262).
Genomic DNA (gDNA) was isolated from MN001 using the
Wizard purification kit (Promega) and sequenced using single-molecule
real-time (SMRT) and Illumina technologies. SMRT libraries
were constructed according to Pacific Biosciences protocols
with a 20-kb insert size. Following ligation of SMRTbell
adapters, sheared gDNA was size selected with a 4-kb cutoff using
Blue Pippin electrophoresis (Sage Science) to generate a greater
fraction of long reads capable of resolving repeat sequences in the
A. xylosoxidans genome. Sequencing was performed using the
PacBio RS II platform. Subread filtering from 2 SMRT Cells captured
with a 240-min movie and P6-C4 chemistry yielded 592 Mbp of sequence
reads with an average read length of 11,520 bp (N50, 16,106
bp). Assembly was performed using the Hierarchical Genome Assembly
Process (HGAP) version 3 (6) in SMRT Analysis version 2.2
hosted on the University of Minnesota supercomputer. Remaining
indels were removed with three successive passes through Quiver to
achieve a final consensus accuracy of 99.9997% (QV 56) at 100×
coverage. This assembly consisted of one circular contig representing
a 5,876,039-bp chromosome. Illumina libraries were analyzed using
MiSeq with 250-bp paired-end sequences, yielding ~2.5 million
reads. Reads were mapped onto the SMRT-derived contig using
breseq version 0.24rc6 (7), and the 10 indels remaining after polishing
were corrected using Pilon version 1.10 (8), yielding essentially perfect
final per-base accuracy. The MN001 genome was annotated with
Prokka version 1.11 (9) using A. xylosoxidans NH44784-1996 (4) as
the reference genome.
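The coverage and quality figures in this paragraph can be sanity-checked from the numbers it reports. This is a small sketch, assuming coverage is simply total filtered bases divided by chromosome length and that QV follows the usual Phred definition, QV = -10 * log10(error rate); the function names are mine.

```python
import math

def fold_coverage(total_bases, genome_size):
    """Average depth: sequenced bases divided by genome length."""
    return total_bases / genome_size

def phred_qv(accuracy):
    """Phred-scaled quality for a per-base accuracy between 0 and 1."""
    return -10 * math.log10(1 - accuracy)

# Inputs taken from the text: 592 Mbp of filtered subreads, a 5,876,039-bp
# chromosome, and a reported consensus accuracy of 99.9997%.
cov = fold_coverage(592e6, 5_876_039)
qv = phred_qv(0.999997)
print(f"coverage ~{cov:.1f}-fold, QV ~{qv:.1f}")
```

The coverage comes out at roughly 100-fold, in line with the text. The QV computed from the rounded 99.9997% lands slightly below the reported 56, which is what one would expect if the published accuracy was rounded for print; that explanation is my guess, not something the paper states.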
At 5.8 Mbp, the MN001 genome is remarkably smaller than the
seven A. xylosoxidans genomes published to date. The GC content
is 67.72%, which is consistent with previously sequenced
strains. A total of 5,328 genes, including 10 rRNA (3 16S, 3 23S, 4
5S) and 67 tRNA genes, were annotated. Function was assigned
for 4,333 of 5,251 predicted coding sequences (82%). Detailed
analyses of the MN001 genome, including comparative studies
with other A. xylosoxidans strains are in progress.
Nucleotide sequence accession numbers. The assembly and
annotation have been deposited in GenBank under the accession
number CP012046. PacBio and Illumina reads have been deposited
to the NCBI Sequence Read Archive under BioProject number
PRJNA288995.
ACKNOWLEDGMENTS
This work was supported by the University of Minnesota Medical School.
We thank Karl Oles (Mayo Clinic Bioinformatics Core) for performing
PacBio library preparation and sequencing, and Kevin Silverstein,
John Vestrum, and the staff at the UMN MSI for their efforts in installing
and supporting PacBio SMRT Analysis. We also thank the patients and
caregivers at the UMN Adult CF Treatment Center for their support of
this research.
REFERENCES
1. Lambiase A, Catania MR, del Pezzo M, Rossano F, Terlizzi V, Sepe A,
Raia V. 2011. Achromobacter xylosoxidans respiratory tract infection in
cystic fibrosis patients. Eur J Clin Microbiol Infect Dis 30:973–980. http://
dx.doi.org/10.1007/s10096-011-1182-5.
2. Hansen CR, Pressler T, Høiby N, Gormsen M. 2006. Chronic infection
with Achromobacter xylosoxidans in cystic fibrosis patients; a retrospective
case control study. J Cyst Fibros 5:245–251. http://dx.doi.org/10.1016/
j.jcf.2006.04.002.
July/August 2015 Volume 3 Issue 4 e00947-15 Genome Announcements genomea.asm.org 1
3. Saiman L, Chen Y, Tabibi S, San Gabriel P, Zhou J, Liu Z, Lai L, Whittier
S. 2001. Identification and antimicrobial susceptibility of Alcaligenes xylosoxidans
isolated from patients with cystic fibrosis. J Clin Microbiol 39:
3942–3945. http://dx.doi.org/10.1128/JCM.39.11.3942-3945.2001.
4. Jakobsen TH, Hansen MA, Jensen PØ, Hansen L, Riber L, Cockburn A,
Kolpen M, Hansen CR, Ridderberg W, Eickhardt S, Hansen M, Kerpedjiev
P, Alhede M, Qvortrup K, Burmølle M, Moser C, Kühl M, Ciofu O,
Givskov M, Sørensen SJ, Hoiby N, Bjarnsholt T. 2013. Complete genome
sequence of the cystic fibrosis pathogen Achromobacter xylosoxidans
NH44784-1996 complies with important pathogenic phenotypes. PLoS
One 8:e68484. http://dx.doi.org/10.1371/journal.pone.0068484.
5. Van Daele S, Verhelst R, Claeys G, Verschraegen G, Franckx H, Van
Simaey L, de Ganck C, De Baets F, Vaneechoutte M. 2005. Shared
genotypes of Achromobacter xylosoxidans strains isolated from patients at a
cystic fibrosis rehabilitation center. J Clin Microbiol 43:2998–3002. http://
dx.doi.org/10.1128/JCM.43.6.2998-3002.2005.
6. Chin C-S, Alexander DH, Marks P, Klammer AA, Drake J, Heiner C,
Clum A, Copeland A, Huddleston J, Eichler EE, Turner SW, Korlach J.
2013. Nonhybrid, finished microbial genome assemblies from long-read
SMRT sequencing data. Nat
http://genomea.asm.org/content/3/4/e00947-15.full.pdf+html
Published 18 August 2015
Single-Molecule Real-Time Sequencing Combined with Optical Mapping Yields Completely Finished Fungal Genome---
ABSTRACT
Next-generation sequencing (NGS) technologies have increased the scalability, speed, and resolution of genomic sequencing and, thus, have revolutionized genomic studies. However, eukaryotic genome sequencing initiatives typically yield considerably fragmented genome assemblies. Here, we assessed various state-of-the-art sequencing and assembly strategies in order to produce a contiguous and complete eukaryotic genome assembly, focusing on the filamentous fungus Verticillium dahliae. Compared with Illumina-based assemblies of the V. dahliae genome, hybrid assemblies that also include PacBio-generated long reads establish superior contiguity. Intriguingly, provided that sufficient sequence depth is reached, assemblies solely based on PacBio reads outperform hybrid assemblies and even result in fully assembled chromosomes. Furthermore, the addition of optical map data allowed us to produce a gapless and complete V. dahliae genome assembly of the expected eight chromosomes from telomere to telomere. Consequently, we can now study genomic regions that were previously not assembled or poorly assembled, including regions that are populated by repetitive sequences, such as transposons, allowing us to fully appreciate an organism’s biological complexity. Our data show that a combination of PacBio-generated long reads and optical mapping can be used to generate complete and gapless assemblies of fungal genomes.
IMPORTANCE Studying whole-genome sequences has become an important aspect of biological research. The advent of next-generation sequencing (NGS) technologies has nowadays brought genomic science within reach of most research laboratories, including those that study nonmodel organisms. However, most genome sequencing initiatives typically yield (highly) fragmented genome assemblies. Nevertheless, considerable relevant information related to genome structure and evolution is likely hidden in those nonassembled regions. Here, we investigated a diverse set of strategies to obtain gapless genome assemblies, using the genome of a typical ascomycete fungus as the template. Eventually, we were able to show that a combination of PacBio-generated long reads and optical mapping yields a gapless telomere-to-telomere genome assembly, allowing in-depth genome analyses to facilitate functional studies into an organism’s biology.
Footnotes
Citation Faino L, Seidl MF, Datema E, van den Berg GCM, Janssen A, Wittenberg AHJ, Thomma BPHJ. 2015. Single-molecule real-time sequencing combined with optical mapping yields completely finished fungal genome. mBio 6(4):e00936-15. doi:10.1128/mBio.00936-15.
This is an open-access article distributed under the terms of the Creative Commons Attribution-Noncommercial-ShareAlike 3.0 Unported license, which permits unrestricted noncommercial use, distribution, and reproduction in any medium, provided the original author and source are credited. http://mbio.asm.org/content/6/4/e00936-15
Thursday, August 20, 2015--The Gapless Assembly: Scientists Describe Workflow for Producing Complete Eukaryote Genome ------- In a new mBio publication, scientists from Wageningen University and KeyGene in The Netherlands report results from several strategies used to assemble the genome of a filamentous fungus, and describe the specific pipeline they recommend for sequencing and assembling eukaryotic genomes.
“Single-Molecule Real-Time Sequencing Combined with Optical Mapping Yields Completely Finished Fungal Genome” comes from lead authors Luigi Faino and Michael Seidl, senior author Bart Thomma, and collaborators. Using Verticillium dahliae as a model, which is a plant pathogen responsible for the damaging verticillium wilt disease in many crop species, they compared short-read and long-read sequencing approaches and incorporated optical mapping data to develop the method that generated the highest-quality assembly for the 36 Mb genome. This particular fungus was an ideal fit for the project, the authors note, due to its extensive genomic rearrangements and enrichment for repetitive elements.
Starting with an exploration of hybrid strategies for assembly, they used a previously generated short-read assembly and employed optical mapping data, resulting in more than 4,500 contigs. This was followed by filling gaps in the assembly with SMRT® Sequencing data, which brought the total contig count down to about 500.
The researchers also tested single-step and two-step hybrid assemblies using both long and short reads, adding optical mapping data and using assemblers such as SPAdes. All of these approaches left a number of gaps in the assembly.
Next, they moved on to assemblies produced solely from PacBio® data, testing various levels of genome coverage and both the MHAP and HGAP assemblers. “All assemblies based on six or more SMRT Cells generated comparable assembly outputs, with a total assembly size of ~36.5 Mb composed of up to 49 contigs, an N50 that exceeded 2.9 Mb, and a largest contig exceeding 5.5 Mb in all cases,” the team reports. “All assemblies based on PacBio sequencing outperformed the hybrid assemblies as long as the sequencing depth exceeded 72x.” The authors noted that HGAP delivered a more accurate genome assembly due to the extra genome polishing step in the assembly protocol, whereas MHAP delivered a more contiguous genome assembly in instances of lower genome coverage.
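For readers unfamiliar with the N50 statistic quoted here, a minimal sketch of how it is usually computed follows. The contig lengths in the example are made up and are not from the paper.

```python
def n50(lengths):
    """Return N50: the length L such that contigs of length >= L together
    contain at least half of all assembled bases."""
    ordered = sorted(lengths, reverse=True)   # longest contigs first
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no contigs given")

# Toy assembly with five contigs (arbitrary units), for illustration only.
toy_contigs = [5, 4, 3, 2, 1]
toy_n50 = n50(toy_contigs)
```

In the toy case the two longest contigs already hold more than half of the 15 total units, so the N50 is the length of the second one. A higher N50 for the same total size means the assembly is in fewer, longer pieces.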
Lastly, optical map data was used to improve upon the PacBio-only assembly. “We were able to show that a combination of PacBio-generated long reads and optical mapping yields a gapless telomere-to-telomere genome assembly,” the scientists write, “allowing in-depth genome analyses to facilitate functional studies into an organism’s biology.”
The team next sequenced another V. dahliae strain using the PacBio-and-optical-map strategy and obtained another gapless assembly complete with eight telomere-to-telomere chromosomes, which they used to correct the orientation of several scaffolds in a previously generated Sanger assembly of that strain.
Armed with these assemblies, the scientists delved into a study of transposable and repetitive elements in the fungus, finding that long-terminal-repeat retrotransposons were the most common transposable element in both genomes. “Strikingly, in total, the repetitive elements in the V. dahliae genomes amount to 12%, which is 3 times higher than all previous estimates for these genomes,” they report. Identifying and understanding these elements is especially important for this plant pathogen, the authors add, because transposable elements and repeat-driven expansion have been critical factors in its virulence.
The team concludes that these findings show the utility of SMRT Sequencing and optical mapping for producing cost-effective, complete genome assemblies for complex eukaryotic organisms. http://blog.pacificbiosciences.com/2015/08/the-gapless-assembly-scientists.html
Thursday, August 20, 2015--The Gapless Assembly: Scientists Describe Workflow for Producing Complete Eukaryote Genome ------- In a new mBio publication, scientists from Wageningen University and KeyGene in The Netherlands report results from several strategies used to assemble the genome of a filamentous fungus, and describe the specific pipeline they recommend for sequencing and assembling eukaryotic genomes.
“Single-Molecule Real-Time Sequencing Combined with Optical Mapping Yields Completely Finished Fungal Genome” comes from lead authors Luigi Faino and Michael Seidl, senior author Bart Thomma, and collaborators. Using Verticillium dahliae as a model, which is a plant pathogen responsible for the damaging verticillium wilt disease in many crop species, they compared short-read and long-read sequencing approaches and incorporated optical mapping data to develop the method that generated the highest-quality assembly for the 36 Mb genome. This particular fungus was an ideal fit for the project, the authors note, due to its extensive genomic rearrangements and enrichment for repetitive elements.
Starting with an exploration of hybrid strategies for assembly, they used a previously generated short-read assembly and employed optical mapping data, resulting in more than 4,500 contigs. This was followed by filling gaps in the assembly with SMRT® Sequencing data, which brought the total contig count down to about 500.
The researchers also tested single-step and two-step hybrid assemblies using both long and short reads, adding optical mapping data and using assemblers such as SPAdes. All of these approaches left a number of gaps in the assembly.
Next, they moved on to assemblies produced solely from PacBio® data, testing various levels of genome coverage and both the MHAP and HGAP assemblers. “All assemblies based on six or more SMRT Cells generated comparable assembly outputs, with a total assembly size of ~36.5 Mb composed of up to 49 contigs, an N50 that exceeded 2.9 Mb, and a largest contig exceeding 5.5 Mb in all cases,” the team reports. “All assemblies based on PacBio sequencing outperformed the hybrid assemblies as long as the sequencing depth exceeded 72x.” The authors noted that HGAP delivered a more accurate genome assembly due to the extra genome polishing step in the assembly protocol, whereas MHAP delivered a more contiguous genome assembly in instances of lower genome coverage.
Lastly, optical map data was used to improve upon the PacBio-only assembly. “We were able to show that a combination of PacBio-generated long reads and optical mapping yields a gapless telomere-to-telomere genome assembly,” the scientists write, “allowing in-depth genome analyses to facilitate functional studies into an organism’s biology.”
The team next sequenced another V. dahliae strain using the PacBio-and-optical-map strategy, obtaining another gapless assembly complete with eight telomere-to-telomere chromosomes, which they used to correct the orientation of several scaffolds in a previously generated Sanger assembly of that strain.
Armed with these assemblies, the scientists delved into a study of transposable and repetitive elements in the fungus, finding that long-terminal-repeat retrotransposons were the most common transposable element in both genomes. “Strikingly, in total, the repetitive elements in the V. dahliae genomes amount to 12%, which is 3 times higher than all previous estimates for these genomes,” they report. Identifying and understanding these elements is especially important for this plant pathogen, the authors add, because transposable elements and repeat-driven expansion have been critical factors in its virulence.
The team concludes that these findings show the utility of SMRT Sequencing and optical mapping for producing cost-effective, complete genome assemblies for complex eukaryotic organisms. http://blog.pacificbiosciences.com/2015/08/the-gapless-assembly-scientists.html
Developments in high throughput sequencing – June 2015 edition
Posted on June 17, 2015 by lexnederbragt
This is the fourth edition of this visualisation; previous editions were in June 2014, October 2013 and December 2012.
As before, full run throughput in gigabases (billion bases) is plotted against single-end read length for the different sequencing platforms, both on a log scale. Yes, I know a certain new instrument seems to be missing, hang on, I’m coming back to that…
https://flxlexblog.wordpress.com/2015/06/17/developments-in-high-throughput-sequencing-june-2015-edition/#more-674
PacBio Plans to Expand; Provides Update on Position in Increasingly Competitive Market
Aug 06, 2015 | Monica Heger
NEW YORK (GenomeWeb) – Pacific Biosciences has increased its headcount by 10 percent over the last year and plans to move into a larger office space with increased manufacturing capacity, the company said this week.
During a conference call discussing its second quarter performance, CEO Mike Hunkapiller gave an overview of the single-molecule sequencing company's position, saying that the increasingly competitive human genome sequencing space will eventually be the firm's largest market.
In addition, Hunkapiller said the company is on track to deliver a clinical sequencing system to Roche by next year, per the agreement the firms struck in 2013 to develop a system based on PacBio's single-molecule sequencing technology for the clinical market.
During the quarter, PacBio reached another $10 million development milestone, and Hunkapiller said the firm would reach another milestone in the fourth quarter. It has now secured $20 million of the $40 million tied to development milestones.
Due to the projected milestone, the firm raised its full-year revenue forecast, predicting revenue would grow more than 40 percent compared to 2014.
Hunkapiller discussed progress the company has made in its three main markets: de novo microbial sequencing, plant and animal sequencing, and human sequencing — both de novo and targeted sequencing to focus on difficult regions.
Hunkapiller asserted that PacBio is now the "gold standard" in microbial sequencing. As an example, the UK's Public Health England and Wellcome Trust Sanger Institute are in the process of sequencing 3,000 bacterial reference strains using PacBio technology, he said. So far, the group has completed the sequencing of 650 strains and will finish the project in the next year or two, Hunkapiller said.
The plant and animal sequencing market accounted for about half of the company's consumable revenue over the last year, Hunkapiller said. The RNA sequencing application, Iso-Seq, has proven to be especially popular in this market among researchers studying transcript diversity.
Human sequencing is the "newest, fastest growing, and eventually, likely, our largest" application, Hunkapiller said, and will include de novo sequencing, transcript isoform analysis, epigenetic analysis, and targeted gene sequencing. The system is especially suited to sequence "difficult" regions of the genome, such as repetitive regions, areas of high GC content, and homopolymers.
In addition, the firm expects to make a significant dent in the clinical sequencing market. The company is "developing a version of our technology that will go through regulatory approval … for [Roche] to sell into the clinical diagnostic arena," he said. "We and Roche are making a substantial investment in the clinical diagnostic sequencing arena, so one might expect that we both anticipate a substantial return from that."
Nonetheless, the human sequencing market is extremely competitive and will only become more so with the entry of companies like Oxford Nanopore and 10X Genomics, which could potentially also provide long reads but at a much lower price point and footprint.
While PacBio said last quarter that it would no longer report on the number of instruments sold or in its backlog, its instrument revenue was $4.3 million, down from $4.7 million in the previous year's second quarter and flat sequentially.
Investment firm Piper Jaffray estimated that PacBio installed eight systems, below its estimated 10, and took orders for eight systems, leaving 15 in its backlog.
However, customers with installed systems appear to be using them. Consumable usage was up nearly 50 percent at $4.5 million in the quarter from $3 million in Q2 2014. Annual per-system consumable revenue is over $130,000, Hunkapiller said.
"While PacBio is clearly making solid progress on their Roche collaboration, the lumpy system placements lead us to lower our 2016 revenue estimate" to $83.4 million from $88.3 million, William Quirk, a senior research analyst at Piper Jaffray, wrote in a note to investors.
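The year-over-year comparisons in this article can be checked with a one-line percent-change formula. The sketch below uses only the dollar figures quoted above (in millions); because those figures are rounded, the results are approximate, and the helper name is an assumption for illustration.

```python
def percent_change(old, new):
    """Percent change going from `old` to `new`."""
    if old == 0:
        raise ValueError("percent change is undefined for a zero baseline")
    return (new - old) / old * 100.0


# Figures quoted in the article, in millions of US dollars.
comparisons = {
    "Consumable revenue, Q2 2014 to Q2 2015": (3.0, 4.5),
    "Instrument revenue, Q2 2014 to Q2 2015": (4.7, 4.3),
    "Piper Jaffray 2016 revenue estimate": (88.3, 83.4),
}

for label, (old, new) in comparisons.items():
    print(f"{label}: {percent_change(old, new):+.1f}%")
```

On these rounded inputs, the consumable figure works out to the "nearly 50 percent" growth the article cites, while instrument revenue and the analyst estimate both move down by single-digit percentages.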
https://www.genomeweb.com/business-news/pacbio-plans-expand-provides-update-position-increasingly-competitive-market
Posted on August 4, 2015 -- New SMRT-BS Method to Revolutionize Quantitative, Multiplexed Targeted Bisulfite Sequencing for Methylation Analysis
Bisulfite sequencing offers researchers a profound look into the epigenome and the methylation status of genes. A significant driving force in the development of epigenetic research since 1992, the detection of CpG methylation and methylation abnormalities in DNA via bisulfite sequencing has become overwhelmingly popular – and interest continues to grow.
The unparalleled power of next-generation sequencing (NGS) platforms provides researchers with new insights into the nuances of gene expression and countless other critical cellular processes. Still, bioinformatics expertise, cost, and throughput limitations prevent NGS bisulfite sequencing research from achieving ultimate accessibility and utilization by a range of scientists.
Developing technologies, however, such as single molecule real-time (SMRT) DNA sequencing platforms and other progressive tools, including commercially available bisulfite conversion kits and the novel SMRT bisulfite sequencing (SMRT-BS) method, are paving the way for researchers to quantitatively sequence longer read lengths, at a lower cost, and at high-throughput capacities. In light of these progressions, the practicality and accuracy of targeted bisulfite sequencing using SMRT-BS could take epigenetic research in a once unimaginable direction.
Bisulfite Sequencing Methods
Next-generation bisulfite sequencing enables researchers to profile DNA methylation across an entire genome at an unmatched resolution. But next-generation bisulfite sequencing approaches such as reduced representation bisulfite sequencing (RRBS) and whole-genome bisulfite sequencing, although popularly used to detect CpG methylation in a single experiment, are limited.
These approaches are expensive per sample, restricted to low throughput, and require in-depth bioinformatics knowledge, prompting many researchers to keep using earlier targeted bisulfite sequencing techniques, which allow a more focused interrogation of a region of interest. Unfortunately, targeted bisulfite sequencing has its own constraints and offers little room for long reads and multiplexing. But what if these restrictions could be overcome?
A recent development in bisulfite conversion methods, known as single molecule real-time bisulfite sequencing (SMRT-BS), could do just that.
This newly developed protocol, along with commercially available kits and SMRT sequencing platforms such as the PacBio system, could potentially elevate the abilities of both large and small-scale labs to conduct cost-effective, highly accurate, quantitative CpG methylation detection analysis on a scale that many previously thought unattainable.
SMRT-BS
The single molecule real-time bisulfite sequencing (SMRT-BS) method was presented at ASHG in 2014 by Stuart A. Scott, PhD, from the Department of Genetics and Genomic Sciences at the Icahn School of Medicine at Mount Sinai. In an attempt to overcome the short read length and multiplexing limitations of targeted bisulfite sequencing, a team of researchers developed an innovative technique that combines bisulfite conversion with third-generation single molecule real-time (SMRT) sequencing without the need for library preparation.
Fig. 1. Schematic illustration of the SMRT bisulfite sequencing (SMRT-BS) procedure.
In the SMRT-BS procedure described in the study published in BMC Genomics, genomic DNA is first subjected to bisulfite conversion. Then, primers that are region-specific along with universal oligonucleotide tags are used to amplify the bisulfite-treated DNA. Universal anti-tag primers in addition to sample-specific multiplexing barcodes then re-amplify amplicon templates. After amplicon purification, pooling, and SMRT sequencing with the PacBio RS II system, CpG methylation can then be quantitated (Fig. 1).
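The final quantitation step relies on the general principle of bisulfite sequencing: unmethylated cytosines are converted and read as T after amplification, while methylated cytosines are protected and still read as C. A minimal sketch of per-CpG quantitation follows. It assumes reads are already aligned to the unconverted reference, on the same strand, with no insertions or deletions; the function name and the toy sequences are hypothetical and this is not the pipeline used in the study.

```python
def cpg_methylation(reference, aligned_reads):
    """Return {position: methylated fraction} for each CpG cytosine.

    `reference` is the unconverted sequence; every read in `aligned_reads`
    must have the same length and orientation as the reference.
    """
    levels = {}
    for pos in range(len(reference) - 1):
        if reference[pos:pos + 2] != "CG":
            continue  # only cytosines in a CpG context are scored
        methylated = unmethylated = 0
        for read in aligned_reads:
            base = read[pos]
            if base == "C":      # protected from conversion: methylated
                methylated += 1
            elif base == "T":    # converted: unmethylated
                unmethylated += 1
            # any other base (N, gap, error) is ignored
        covered = methylated + unmethylated
        if covered:
            levels[pos] = methylated / covered
    return levels


# Toy example: CpGs at positions 1 and 7, one non-CpG C at position 4
# that is fully converted to T in every read.
reference = "ACGTCATCGA"
reads = ["ACGTTATTGA", "ACGTTATCGA", "ATGTTATTGA", "ACGTTATCGA"]
print(cpg_methylation(reference, reads))
```

Because each long read spans the whole amplicon, the same reads contribute to every CpG in the region, which is what makes the per-site levels directly comparable within an amplicon.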
After testing six commercially available bisulfite conversion kits in the study, the researchers found that one, Epigentek’s Methylamp DNA Modification Kit, performed best, producing consistent amplification of the longer 1109 bp product at 65°C and amplifying products up to ~2.0 kb.
Typically, high throughput methods face limitations of amplicon size and short read length due to the harsh chemical fragmentation of DNA during bisulfite conversion, but the most robust amplification of the longer 1109 bp amplicon could be achieved using bisulfite-converted DNA from the Methylamp kit. Researchers also ensured that this was not due to poor bisulfite conversion efficiency.
This optimal, long-range amplification protocol allowed the team to attain significant amplification of DNA products up to ~2.0 kb, with the most consistently and stably amplified product at ~1.5 kb. Because PacBio’s system is capable of sequencing reads over 40,000 base pairs in comparison to Illumina sequencing technology which, at most, sequences amplicons 300 bp in length, it is highly suited to this developing bisulfite sequencing application alongside the Methylamp kit with its capacity to produce intact long fragment lengths.
SMRT-BS Validation and Reproducibility
Multiple analyses validate and strongly support the impressive accomplishments of this cost-effective targeted bisulfite sequencing approach, accomplishments which include the ability to sequence amplicons ~1.5 kb in length and, theoretically, ~91% of all CpG islands in the human genome.
The researchers compared methylation levels of the TUBGCP3, MEST, and EPHA8 CpG islands in peripheral blood DNA against two independent platforms, second-generation sequencing and microarray, and found that the SMRT-BS methylation levels correlated strongly with both: r = 0.906 ± 0.052 and r = 0.933 ± 0.031, respectively. For all methylation levels and amplicon sizes (625-1491 bp), reproducibility was very high. Between independent triplicate amplicons, the researchers found an average overall correlation of r = 0.972 ± 0.024.
They also subjected four CpG island amplicons and 30 hematological malignancy cell lines to the new method and concluded that SMRT-BS is also highly accurate and comparable to current methods. For instance, one of the cell lines subjected to SMRT-BS with a sequencing depth of 101X had also been investigated using RRBS methylation analysis with a sequencing depth of 31X through the ENCODE project. Among 129 CpG sites, 78 CpG sites had >20X coverage by RRBS and the Pearson correlation between the methylation levels of these two methods at these CpG sites was an exceptional 0.900.
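The comparison described above has two steps: keep only the CpG sites that the second method covers deeply enough, then compute the Pearson correlation between the two sets of methylation levels at those sites. A minimal sketch follows; the helper names and the site values are hypothetical and are not data from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one input is constant")
    return cov / math.sqrt(var_x * var_y)


def correlate_covered_sites(sites, min_coverage=20):
    """Filter sites by coverage in the second method, then correlate.

    Each site is (level_method_a, level_method_b, coverage_method_b).
    Returns (number of sites kept, Pearson r).
    """
    kept = [(a, b) for a, b, cov in sites if cov > min_coverage]
    r = pearson_r([a for a, _ in kept], [b for _, b in kept])
    return len(kept), r


# Hypothetical methylation levels and coverages (illustration only).
sites = [
    (0.10, 0.15, 35),
    (0.45, 0.40, 28),
    (0.80, 0.85, 41),
    (0.95, 0.90, 12),  # dropped: coverage is not above 20X
    (0.60, 0.55, 22),
]
n_kept, r = correlate_covered_sites(sites)
print(n_kept, round(r, 3))
```

Filtering first matters because methylation levels estimated from only a handful of reads are noisy and would pull the correlation down for reasons unrelated to the method being validated.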
SMRT-BS Considerations
Although this exciting methodology may open new doors for the expansion of epigenetic research publications for a wide range of researchers, there are some considerations that should be acknowledged.
Scientists should note that reduced sequencing depth was associated with increased variability in CpG methylation quantitation. This was especially applicable to CpG regions with intermediate methylation. Therefore, researchers should choose sequencing depth carefully in order to achieve optimal results with SMRT-BS. Overall, reduced correlation was observed with amplicons greater than 1.0 kb compared to amplicons less than 1.0 kb. Because reproducibility improved with greater sequencing depth and with shorter amplicon length, the researchers suggest overcoming the increased variability by increasing sequencing depth for large amplicons and for regions with intermediate methylation levels. This adjustment is expected to achieve similar confidence intervals and margins of error for large amplicons.
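One way to see why intermediate methylation needs more depth is a simple binomial model, in which each read at a CpG site is treated as an independent draw that is methylated with probability p. This is an assumption made here for illustration, not the statistical model used in the study. Under the normal approximation, the margin of error is z * sqrt(p(1 - p) / depth), which is largest at p = 0.5.

```python
import math


def margin_of_error(p, depth, z=1.96):
    """Approximate 95% margin of error for a methylation level `p`
    estimated from `depth` independent reads (normal approximation)."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    return z * math.sqrt(p * (1.0 - p) / depth)


def depth_for_margin(p, target, z=1.96):
    """Smallest read depth whose margin of error is at most `target`."""
    if target <= 0:
        raise ValueError("target margin must be positive")
    return math.ceil(z ** 2 * p * (1.0 - p) / target ** 2)


for p in (0.1, 0.5, 0.9):
    print(f"p={p}: margin at 100X = {margin_of_error(p, 100):.3f}, "
          f"depth for +/-0.05 = {depth_for_margin(p, 0.05)}")
```

Under this model a half-methylated site needs several times the depth of a nearly unmethylated or nearly fully methylated site to reach the same margin of error, which is consistent with the recommendation above to add depth for intermediate methylation levels.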
Additionally, researchers should be selective with their bisulfite conversion kit of choice for SMRT-BS, as only the best-performing bisulfite PCR protocol was successfully utilized in this study and very few kits tested produced desirable bisulfite conversion rates or amplicon lengths. The use of a third-generation sequencing platform such as the PacBio RS II system allows for very long reads and accurate sequencing of DNA, but DNA fragments of adequate quality and length must first be obtained for successful results.
Future Directions with SMRT-BS
With the exciting development of SMRT-BS and long-read multiplexed amplicon sequencing, researchers can interrogate the genome at an extraordinary depth and perform a closer examination of a specific region of interest at a lower cost. Combining a high quality bisulfite conversion kit with the exceptional sequencing power of the SMRT-based sequencing platform, such as the PacBio RS II system, will enable SMRT-BS to raise epigenetic research to new heights as a highly accurate, cost-effective, multiplexed bisulfite sequencing method for targeted CpG methylation analysis.
Reference:
Yang, Y., Sebra, R., Pullman, B.S., Qiao, W., Peter, I., Desnick, R.J., Geyer, C.R., DeCoteau, J.F., and Scott, S.A. (2015). Quantitative and multiplexed DNA methylation analysis using long-read single-molecule real-time bisulfite sequencing (SMRT-BS). BMC Genomics, 16:350.
http://www.whatisepigenetics.com/new-smrt-bs-method-to-revolutionize-quantitative-multiplexed-targeted-bisulfite-sequencing-for-methylation-analysis/
Widespread Polycistronic Transcripts in Fungi Revealed by Single-Molecule mRNA Sequencing-- http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0132628
Thu-23rd-July-2015 10:08 GMT
SINGAPORE – A*STAR’s Genome Institute of Singapore (GIS) recently expanded their collaboration with Pacific Biosciences (PacBio), a California-based provider of the PacBio® RS II Sequencing System, based on novel Single Molecule, Real-Time (SMRT) technology. Together, GIS and PacBio will combine efforts to advance research in infectious diseases, genomics, sequence analysis, and translational healthcare in Singapore.
The collaboration builds on the complementary strengths of both the GIS and PacBio in the analysis and understanding of bacteria and viruses, including those that cause diseases such as diarrhoea, meningitis, urinary tract infection, dengue, and liver cancer, as well as those that we continuously live with, termed the microbiota. Understanding the dynamics of bacterial genomes is particularly relevant to address the growing challenge of antibiotic resistance in Singapore and the rest of the world.
Dr Swaine Chen, Dr Martin Hibberd, and Dr Niranjan Nagarajan from the GIS are spearheading the collaboration. Dr Chen said, “Working together with PacBio, we are able to fully sequence bacterial genomes and arrive at deeper insights into how bacteria cause disease”.
This latest collaboration extends and expands the previous one with PacBio. Earlier in the year, Dr Chen and Dr Hibberd completed an initial formal collaboration with PacBio - one that provided insights into how E. coli causes urinary tract infection, the results of which are anticipated for publication later this year.
Ram Laxman, President and General Manager of Pacific Biosciences, Asia Pacific, commented, “SMRT® Sequencing technology is proving to be the Gold Standard in bacterial and viral sequencing due to its ability to fully resolve repeat regions and ‘finish’ genomes. Other NGS technologies are not able to finish even the smallest of bacterial genomes due to their sequence context bias and very short sequencing reads. It’s like trying to complete a complex puzzle with missing puzzle pieces”. He added that researchers have started using the PacBio RS II to track mutations in viruses that cause pandemics like MERS. The quick turnaround time, coupled with the highest consensus sequencing accuracy, is crucial for tracking the spread and mutation rates of these viruses.
Michael Hunkapiller, President and CEO of Pacific Biosciences, commented, “We see a huge potential in working with world-renowned institutions like GIS to further broaden the applications of SMRT Sequencing, especially those that have direct impact on human health.”
Prof Ng Huck Hui, Executive Director of GIS said, “This expanded collaboration with PacBio® enables us to further our research towards improving public healthcare in Singapore. By acquiring a deeper understanding of bacterial genomes, we can tackle common infectious diseases and antibiotic resistance in an efficient manner, leading to better patient outcomes.”
http://researchsea.com/html/article.php/aid/8950/cid/3/research/medicine/the_agency_for_science__technology_and_research__a_star_/gis_expands_collaboration_with_pacific_biosciences_to_advance_healthcare_research_in_singapore.html
Jobs
Postdoc position - Statistical bioinformatics in the context of splicing
Start date: 01.09.2015
Application deadline: 10.08.2015
Employment fraction: 100%
Job reference: Postdoc position - Statistical bioinformatics in the context of splicing
City: Zurich
Type of contract: 18 months with extension
Employer: University of Zurich
Job description
An exciting opportunity is now available in the Robinson group at the University of Zurich for a postdoc-level scientist with a background in statistics, genomics or a relevant complementary profile. In particular, applicants with other numerical backgrounds, such as Computer Science, Physics or Mathematics are encouraged to apply. Demonstrated data analysis and programming skills are required.
The funded project involves the exploration, discovery, analysis and interpretation of splicing patterns in disease-relevant genes across large current-generation RNA-seq datasets as well as targeted long read sequencing of cDNA using third-generation technologies (Pacific Biosciences).
The Robinson lab and the Institute of Molecular Life Sciences at UZH provide a rich and diverse scientific environment, with many connections to basic and clinical science as well as industry partnerships. We balance collaborative data analysis with the development of novel statistical methods and software packages for the analysis of genomic data.
Zurich is a world-class city to live in.
The position will be available as early as September 2015 and will be for 18 months in the first instance, with many opportunities for extension.
Job tasks
•Analysis of large-scale genomic data,
•Development of statistical methods and software tools.
Profile requirements
Demonstrated data analysis and programming experience.
Please send your application to
To apply, send a cover letter, CV and the names and email addresses of at least 2 references to Prof. Dr. Mark Robinson by August 10, 2015. In the cover letter, detail how your profile fits the research activities of the group and your long-term career goals.
http://www.isb-sib.ch/aboutsib/jobs/sib/details/960-postdoc-position-statistical-bioinformatics-in-the-context-of-splicing.html
Tuesday, July 14, 2015--SMRT Sequencing Contributes to Detection of DNA Methylation in C. elegans
A recent paper in the journal Cell presents novel findings of DNA methylation in C. elegans, an organism previously believed not to have such epigenetic marks. Scientists used several approaches to analyze the adenine N6-methylation (6mA) found in C. elegans, including SMRT® Sequencing to directly observe base modifications across the genome.
From lead authors Eric Greer and Mario Blanco with senior author Yang Shi at Harvard Medical School, “DNA Methylation on N6-Adenine in C. elegans” describes a range of technological methods deployed to assess methylation across the worm’s genome. The team queried the nematode with specific antibodies for 6mA; immunofluorescence; ultra-high-performance liquid chromatography combined with triple-quadrupole tandem mass spectrometry; SMRT Sequencing; and MeDIPseq, an antibody-based immunoprecipitation paired with DNA sequencing.
PacBio® sequencing was chosen for its ability to directly interrogate base modifications without using antibodies. “In this analysis, SMRT sequencing detected 6mA on 225,586 adenines—0.7% of the total adenines in the worm genome—which is equivalent to 0.3% bulk adenine methylation, as some adenines were methylated 10% of the time, whereas others were methylated 90% of the time,” the authors report. That value supported what was seen in the mass spec analysis. “Similar to the MeDIP-seq results, the SMRT sequencing analysis identified a broad distribution of 6mA across all chromosomes of the worm genome, with no one genomic feature being significantly enriched or depleted for 6mA,” they add.
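The two percentages in the quote measure different things. The 0.7% counts adenine positions where any 6mA was detected, while the 0.3% bulk figure weights each position by how often it is actually methylated. A minimal sketch of that relationship follows. The helper names and the toy numbers are hypothetical, and because the quoted percentages are rounded, the values derived from them are only approximate.

```python
def bulk_methylation(per_site_levels, total_adenines):
    """Bulk fraction of methylated adenines.

    `per_site_levels` holds, for each modified site, the fraction of
    molecules methylated there; unmodified adenines contribute zero.
    """
    if total_adenines <= 0:
        raise ValueError("total_adenines must be positive")
    return sum(per_site_levels) / total_adenines


def mean_site_level(site_fraction, bulk_fraction):
    """Average methylation level at modified sites implied by the share of
    sites that are modified and the bulk methylation fraction."""
    if site_fraction <= 0:
        raise ValueError("site_fraction must be positive")
    return bulk_fraction / site_fraction


# Toy case: two modified sites (10% and 90% methylated) among 1,000 adenines.
print(bulk_methylation([0.1, 0.9], total_adenines=1000))

# Figures quoted in the article: 0.7% of adenines modified, 0.3% bulk.
print(round(mean_site_level(0.007, 0.003), 2))

# Approximate number of adenines implied by 225,586 sites being 0.7%.
print(round(225_586 / 0.007))
```

Read this way, the quoted figures imply that a typical modified adenine is methylated in somewhat under half of the molecules, which fits the authors' remark that some sites were methylated 10% of the time and others 90%.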
Beyond finding this evidence of methylation in C. elegans, the team also showed that these marks are heritable, with a modification increasing over several generations. In this project, the scientists found a DNA demethylase as well as a potential methyltransferase. “Together, these data identify a DNA modification in C. elegans and raise the exciting possibility that 6mA may be a carrier of heritable epigenetic information in eukaryotes,” the scientists report. They also found indications of crosstalk between histone methylation and 6mA.
According to the paper, scientists have mostly studied 6mA in prokaryotes. In various organisms, its function has been linked to identifying invasive DNA, influencing gene expression, affecting RNA splicing, and more. “At the present time, the molecular function of 6mA is still unclear,” the researchers write. “Our study identifies a new DNA modification in C. elegans, as well as regulators that control the dynamics of this modification, and advances 6mA as a potential carrier of non-genetic information across generations.”
This paper is one of three in the Cell issue focused on the presence of 6mA in eukaryotes; the other two demonstrate the presence of this form of methylation in Chlamydomonas reinhardtii and Drosophila melanogaster. For a nice perspective on the overall trend of newly detected 6mA, check out “An Adenine Code for DNA: A Second Life for N6-Methyladenine” from the same issue. http://blog.pacificbiosciences.com/2015/07/smrt-sequencing-contributes-to.html
Thursday, July 9, 2015--The Festival of Genomics Review: A Celebration of Long Reads
At the inaugural Festival of Genomics event in Boston, more than 1,500 people turned out to see what was billed as a conference unlike any other. The meeting was indeed unique, featuring a play (starring well-known scientists), a giant chess board, and a Genome Dome, in addition to the more familiar lineup of excellent speakers and workshops.
To help kick off the festival, genomic luminaries Craig Venter and James Lupski presented plenary talks on day 1 and set the stage for some exciting science to follow. Lupski’s talk was particularly impactful, as he described how his team at Baylor recently sequenced his own personal genome using 10-fold PacBio® long-read coverage to analyze copy number changes underlying his rare genomic disorder.
Naturally, our favorite part was the dedicated track on long-read sequencing, chaired by our very own CSO Jonas Korlach. This track turned out to be the most popular session of the festival, with standing-room-only attendance. The impressive speaker lineup included Chad Nusbaum of the Broad Institute, Mike Snyder from Stanford, Mark Gerstein of Yale University, Dick McCombie from Cold Spring Harbor Laboratory, Somasekar Seshagiri from Genentech, William LaRochelle from Roche, and Sergey Koren from NBACC. Each speaker detailed the unique value of long-read sequencing for a wide variety of applications, including human genome de novo assembly, structural variant sequencing, full-length transcript profiling, cancer genome assembly, pseudogene analysis, and more.
In a panel discussion with lots of audience participation, Korlach was joined by McCombie, Gerstein, and Snyder for what turned out to be a wide-ranging conversation about the utility of long-read sequencing, and the impact longer reads will have as they gain greater adoption.
The audience was particularly interested in learning about what they might be missing with short-read data. Gerstein said that some of the most interesting parts of genomes are repeats, and that long reads are helpful in elucidating those regions, while Snyder noted that trinucleotide repeats are not reliably found by short reads. The panel also pointed out pseudogenes as an element that will be better viewed with long reads. Gerstein noted that if a type of sequence can’t be seen with short reads, increasing genome coverage doesn’t matter. He encouraged attendees to integrate long-read data whenever possible, adding that even a modest amount of long-read coverage can be very valuable.
Snyder reported that the main source of whole genome sequencing errors is mismapping of short reads. “There’s no question that long reads will help,” he said. He also cited extreme GC regions and homopolymers as elements that can be challenging to represent accurately with short-read sequence data.
Genome variants were another popular topic. The panelists agreed that long reads are key to tracking structural variants in genomes, with Gerstein suggesting that structural variation could help explain some of the “missing heritability” in the human genome. He predicted that many SNPs will turn out to be markers of a structural variant in linkage disequilibrium, rather than being causal elements on their own. McCombie said that while we still don’t know how much medical relevance this variation will have, he finds the subject “intriguing” and clearly worth further study.
There was a good deal of discussion about whole genome sequencing for humans, with Snyder envisioning a future where the concept of reference genomes is outdated because everyone will be his or her own personal reference. Until that’s a reality, Gerstein pointed out the value of generating more reference genomes to better represent common structural variation across a number of distinct ethnic populations. McCombie noted that as we sequence more people, it’s of great importance to make sure we think carefully about consent forms to maximize the value of all that data for lots of different uses in the future.
The conversation also included RNA as Snyder highlighted the underuse of transcriptome data for clinical research. He argued that genomes plus transcriptomes (and eventually microbiomes and methylomes, as well) will be the best way to put together a comprehensive picture of human health and disease. He noted that in cancer studies, his team produces a transcriptome sequence along with a genome sequence for each project.
Below are a few takeaway points from other speakers in the long-read track:
Nusbaum presented a single-contig assembly of M. tuberculosis, which has a genome with 66 percent GC content. “It’s a perfect assembly,” he said.
Koren facetiously used a mathematical proof to support his theorem that “long reads solve everything” before presenting his MHAP algorithm and the human genome assemblies it produced. In his latest assemblies, the largest contigs were approaching the size of individual chromosome arms, which further proved his point.
Seshagiri from Genentech told attendees he uses SMRT® Sequencing for genome, transcriptome, and epigenome characterization, which will be especially important in cancer research.
McCombie said that long reads offer consistent results, while analyzing genomes “like we used to” misses tens of thousands of variants.
In addition to lining up the fantastic science at the festival, the organizers offered a unique opportunity to support Greenwood Genetic Center by participating in a ‘Race the Helix’ event onsite. Our PacBio runners joined with Sage Science to tackle the treadmill on the show floor, finishing second in distance covered and winning the coveted ‘Best Dressed’ award.
We look forward to the next Festival of Genomics, taking place Nov. 3 – 5 just up the road from us in San Mateo, CA. http://blog.pacificbiosciences.com/2015/07/the-festival-of-genomics-review.html
Michael Hunkapiller: 100% Approval Rating https://www.owler.com/iaApp/101491/pacific-biosciences-company-profile?utm_source=twitter&utm_medium=social&utm_campaign=mtcpc
Pacific Biosciences of California Upgraded to Buy at Zacks (PACB)
July 7th, 2015 -- Zacks upgraded shares of Pacific Biosciences of California (NASDAQ:PACB) from a hold rating to a buy rating in a research report released on Tuesday morning, Market Beat Ratings reports. Zacks currently has a $6.00 price objective on the stock.
According to Zacks, “Pacific Biosciences reported dismal first-quarter 2015 results. The company reported net loss of $0.27 per share, in line with the Zacks Consensus Estimate but wider than the year-ago quarter loss. Nevertheless, we believe that Pacific Biosciences has significant growth prospects in plant & animal sequencing and human genome sequencing based on its SMRT technology. New products like the barcode sample prep kits continue to expand the company’s product portfolio. Additionally, development and distribution collaborations with the likes of Roche NimbleGen and RainDance Technologies will drive market penetration going forward. However, persistent losses and cash burning are the primary headwinds, in our view.” http://www.owler.com/iaApp/article/559c3e8de4b02f9936956e58.htm
Following Feasibility Study, UK Registry Plans to Implement PacBio for HLA Typing by Year's End
Jul 06, 2015 | Julia Karow
https://www.genomeweb.com/sequencing-technology/following-feasibility-study-uk-registry-plans-implement-pacbio-hla-typing
Diploid Human Genome Assembly using Pacbio Technology--June 29th, 2015 | Category: pacbio
We are surprised that the researchers are using technology from a company that was supposed to go out of business due to competition from Oxford Nanopore!!!! A new Nature Methods paper reports –
Assembly and diploid architecture of an individual human genome via single-molecule technologies
We present the first comprehensive analysis of a diploid human genome that combines single-molecule sequencing with single-molecule genome maps. Our hybrid assembly markedly improves upon the contiguity observed from traditional shotgun sequencing approaches, with scaffold N50 values approaching 30 Mb, and we identified complex structural variants (SVs) missed by other high-throughput approaches. Furthermore, by combining Illumina short-read data with long reads, we phased both single-nucleotide variants and SVs, generating haplotypes with over 99% consistency with previous trio-based studies. Our work shows that it is now possible to integrate single-molecule and high-throughput sequence data to generate de novo assembled genomes that approach reference quality.
A condensed version of the paper's major claims is available in this post on the PacBio blog.
http://www.homolog.us/blogs/blog/2015/06/29/diploid-human-genome-assembly-using-pacbio-technology/
Webinar detail--Date: Tuesday, June 30, 2015
Sequencing of long fragments provides greater insight into genetic variants, enabling a better view of indels and larger structural variants. Partnering this strength of PacBio sequencing with Roche NimbleGen target enrichment enables research breakthroughs through accurate phasing and the identification of haplotypes.
During the webinar, the presenters will:
• Describe a targeted sequencing workflow that combines Roche NimbleGen’s SeqCap EZ enrichment technology with Pacific Biosciences’ SMRT® Sequencing to generate multi-kilobase reads of regions of interest with even coverage.
• Demonstrate that 6 kb fragments can be captured, providing sequence information well beyond the targeted capture site and well into (and often across) the adjacent intronic regions.
• Show that multi-kilobase genomic regions can be phased and complex structural variants can be detected, enabling the generation of haplotypes.
http://www.nimblegen.com/news/events/webinar/index.html
GEN News Highlights
Jun 29, 2015
Human Genome Sequenced without Cloning Steps
(“Combining long read sequencing and BioNano genome mapping produces highly contiguous de novo assemblies, enabling unbiased comparison of nearly complete genomes—something we have been trying to do for years.”)!!!!!!!!!!
Researchers have sequenced the human diploid genome without the need for any DNA amplification techniques.[Darryl Leja, NHGRI]
Completion of the human genome sequence in 2003 was a milestone in the biological sciences that can be compared to few other endeavors. However, the project wasn’t without its pitfalls and limitations. In particular, the final assembled sequence, often referred to as the reference genome, is composed of a haploid sequence from its human donor. Since human genomes are diploid, receiving one set of chromosomes from maternal DNA and the other set of paternal DNA, there are many advantages to sequencing the genome in its entirety, simultaneously.
Now, a collaboration of scientists from BioNano Genomics and Pacific Biosciences, led by researchers from Icahn School of Medicine at Mt. Sinai, has created a comprehensive analysis of a diploid human genome using two complementary single DNA molecule methods for sequencing and genome mapping. Furthermore, the sequencing was accomplished using long sequencing reads without the need for any DNA amplification techniques, an essential step for all other large-scale sequencing projects that can often introduce replication errors and artifacts into the nascent strand.
The investigators sequenced, mapped, and analyzed a diploid human genome with the goal to integrate single-molecule sequence data and genome mapping data. This approach generated a de novo assembled genome that was reference quality and improved upon the contiguity observed from traditional sequencing methods. Moreover, the combination of BioNano genome mapping and Pacific Biosciences sequencing resulted in an improvement in the contiguity of the initial sequence assembly nearly 30-fold and the initial genome map assembly nearly 8-fold.
“This is the first study demonstrating that our genome mapping technology and single molecule sequencing technology complement each other to generate a reference quality whole genome assembly with haplotype blocks several hundreds of kilobases long,” explained Han Cao, Ph.D., founder and CSO of BioNano Genomics. “This is also the first full de novo assembly of a human genome leveraging intact long native DNA (> 150 kb), without any clone libraries and the artifacts that cloning can introduce.”
The results from this study were released today in Nature Methods through an article entitled “Assembly and Diploid Architecture of an Individual Human Genome via Single Molecule Technologies.”
The researchers’ initial objective was to investigate information often overlooked with sequencing, such as long range repeats and rearrangements, which can be clinically important in complex diseases such as cancer or cardiovascular disease.
Interestingly, as the research team was comparing their newly generated genomic sequence with the current reference genome they found an underrepresentation of lipoprotein A (LPA) gene tandem repeats in the reference sequence. The LPA gene is involved in regulating plasma lipid levels and has been shown to be associated with risk of cardiovascular disease. Quantifying these long 5.6 kb repeats over the span of hundreds of kilobases enables the researchers to assess health risk.
“Many large and complex forms of variation are missed by traditional next generation sequencing approaches,” said Ali Bashir, Ph.D. assistant professor of genetics and genomics at Mt. Sinai and senior author of the study. “Combining long read sequencing and BioNano genome mapping produces highly contiguous de novo assemblies, enabling unbiased comparison of nearly complete genomes—something we have been trying to do for years.”
http://www.genengnews.com/gen-news-highlights/human-genome-sequenced-without-cloning-steps/81251446/
Monday, June 29, 2015--Nature Methods Paper Uses Long-Read Data for Highly Contiguous Diploid Human Genome
A new publication in Nature Methods describes a new single-molecule assembly approach that resulted in “the most contiguous clone-free human genome assembly to date,” according to lead authors Matthew Pendleton, Robert Sebra, Andy Pang, and Ajay Ummat.
The paper, “Assembly and Diploid Architecture of an Individual Human Genome via Single Molecule Technologies,” comes from a large team of collaborators at the Icahn School of Medicine at Mount Sinai, Cornell, Cold Spring Harbor Laboratory, and other institutions.
Their new approach leverages the best aspects of each single-molecule data type by combining long-read sequencing for de novo assembly with single-molecule genome maps for scaffolding. The resulting hybrid assembly represents a mix of SMRT® Sequencing data and single-molecule genome maps from BioNano Genomics’ NanoChannel Arrays.
The paper describes sequencing the well-studied NA12878 genome using SMRT Sequencing and generating single-molecule genome maps with nicking enzymes. “Individually, the assemblies and genome maps markedly improve contiguity and completeness compared with de novo assemblies from clone-free, short-read shotgun sequencing data,” the authors write. “Moreover, by combining the two platforms, we achieve scaffold N50 values greater than 28 Mb, improving the contiguity of the initial sequence assembly nearly 30-fold and of the initial genome map nearly 8-fold.”
The scientists then compared their assembly to the human reference genome to identify a comprehensive set of genetic variants, including a wide variety of larger structural variants that are often overlooked by short-read SBS approaches. The scientists note that while short-read technologies are frequently used to survey genomes to identify single nucleotide variants, they cannot resolve most large-scale genetic variation, including a wide variety of structural variants and repetitive regions that confound short-read assemblies.
“Though the cost of sequencing has markedly decreased, de novo human genome analysis has, to some extent, regressed,” the authors report. “Although HuRef and the original Celera whole-genome shotgun assembly have scaffold N50 values … of 19.5 Mb and 29 Mb respectively, the best next-generation sequencing (NGS) assemblies have scaffold N50 values of 11.5 Mb, even with the use of high-coverage fosmid jumping libraries.” The biggest challenges in these short-read assemblies, they add, are repetitive structures, transposable elements, segmental duplications, and heterochromatin.
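For readers unfamiliar with the N50 figures quoted above: N50 is the length L such that contigs (or scaffolds) of length L or longer together contain at least half of all assembled bases. Here is a minimal Python sketch of that calculation. The function name and the toy lengths are made up for illustration and are not data from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("need at least one length")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running sum reaches half of the
        # total assembly size is the N50.
        if running >= half_total:
            return length


# Toy scaffold lengths in Mb, invented for illustration only.
toy_scaffolds = [40, 30, 20, 5, 3, 2]
print(n50(toy_scaffolds))  # 30
```

A higher N50 means that more of the assembly sits in long pieces, which is why the authors use it as a shorthand for contiguity.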
Advantages of this extraordinary contiguity in their single-molecule assembly, to which short-read NGS data was later added, include detecting large structural variants and successfully phasing both single nucleotide and structural variants. Comparisons of the assembly to reference genomes allowed the team to resolve and phase structural variants such as tandem repeats across the genome. They successfully separated maternal and paternal alleles, revealing complex events that had been missed in previous assemblies.
For structural elements, the authors report that “a major benefit of continuous long reads is the ability to directly observe structural variants,” an approach they say is more effective than relying on breakpoint analysis or local realignment.
The combination of SMRT Sequencing data, genome maps, and NGS data “allowed us to resolve long-standing assembly discrepancies,” the scientists write. http://blog.pacificbiosciences.com/2015/06/nature-methods-paper-uses-long-read.html
The Multi-Platform Approach to Clinical Sequencing with Bobby Sebra, Icahn School
Submitted by Ayanna Monteverdi on Mon, 06/15/2015
Bobby Sebra, Director of Technology Development at the Icahn School of Medicine, Mount Sinai Hospital
Before Bobby Sebra became the Director of Technology Development at the Icahn School of Medicine at Mt Sinai in New York he worked at Pacific Biosciences, helping to develop their single molecule, long read (SMRT) sequencing technology.
In today's interview Bobby says he left PacBio to be free to use all of the available sequencing platforms to develop clinical solutions. At the Icahn School, he has been scaling up the facilities to include Illumina, Ion Torrent, PacBio and BioNano Genomics sequencers, as well as researching some of the newer platforms such as 10X Genomics and Oxford Nanopore Technologies. Bobby’s work includes matching these various platforms with the right project, often going back and forth between short read and long read technologies to get an adequate result.
Building on his familiarity with the PacBio system, one of Bobby's primary projects at Icahn is to take PacBio’s new long read technology and develop new clinical applications, such as looking at more polymorphic domains in the human genome at high throughput.
What are his big challenges? Bobby says that a single cell approach is the next important step for clinical sequencing, and he looks forward to a platform which integrates single cell analysis into one workflow. He is also pushing sequencing tool providers to be able to work with lower input, or smaller initial samples.
What clinical projects have Bobby excited, and what is his reaction to recent skepticism about the clinical potential for the study of genomics? Join us for a wide ranging discussion on the latest in clinical sequencing.
- See more at: http://mendelspod.com/podcasts/multi-platform-approach-clinical-sequencing-bobby-sebra-icahn-school/#sthash.gL8uoRdi.dpuf
Mount Sinai Scientists Develop New Technique for Analyzing the Epigenetics of Bacteria, a Potential New Tool to Combat Pathogens and Overcome Antibiotic Resistance
New York – June 15, 2015 /Press Release/ ––
Scientists from the Icahn School of Medicine at Mount Sinai have developed a new technique to more precisely analyze bacterial populations, to reveal epigenetic mechanisms that can drive virulence. The new methods hold the promise of a potent new tool to offset the growing challenge of antibiotic resistance by bacterial pathogens. The research was published today in the journal Nature Communications, and conducted in collaboration with New York University Langone Medical Center and Brigham and Women’s Hospital of Harvard Medical School.
The information content of the genetic code in DNA is not limited to the primary nucleotide sequence of A’s, G’s, C’s and T’s. Individual DNA bases can be chemically modified, with significant functional consequences. In the bacterial kingdom, the most prevalent base modifications are in the form of DNA methylations, specifically to adenine and cytosine residues. Beyond their participation in host defense, increasing evidence suggests that these modifications also play important roles in the regulation of gene expression, virulence and antibiotic resistance.
The research team employed the PacBio® RS II system from Pacific Biosciences, which can collect data on base modifications simultaneously as it collects DNA sequence data. PacBio’s single molecule, real-time sequencing enables the detection of N6-methyladenine and 4-methylcytosine, two major types of DNA modifications comprising the bacterial methylome. However, existing methods for studying bacterial methylomes rely on a population-level consensus that lacks the single-cell resolution required to observe epigenetic heterogeneity.
“We created a technique for the detection and phasing of DNA methylation at the single molecule level. We found that a typical clonal bacterial population that would otherwise be considered homogeneous using conventional techniques has epigenetically distinct subpopulations with different gene expression patterns,” said Gang Fang, PhD, Assistant Professor of Genetics and Genomics at the Icahn School of Medicine at Mount Sinai and senior author of the study. “Given that phenotypic heterogeneity within a bacterial population can increase its advantage of survival under stress conditions such as antibiotic treatment, this new technique is quite promising for future treatment of bacterial pathogens, as it enables de novo detection and characterization of epigenetic heterogeneity in a bacterial population.”
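To illustrate why a population-level consensus can hide the subpopulations described above, here is a small Python sketch. It assumes per-molecule methylation calls are already available as 0/1 values at a few motif sites. The toy data and function names are hypothetical, and the sketch does not model how calls are derived from the sequencing signal or how the published method works.

```python
from collections import Counter


def site_fractions(molecules):
    """Population view: fraction of molecules methylated at each site."""
    n = len(molecules)
    return [sum(column) / n for column in zip(*molecules)]


def pattern_counts(molecules):
    """Single-molecule view: how many molecules carry each full pattern."""
    return Counter(tuple(m) for m in molecules)


# Hypothetical calls: one tuple per molecule, one 0/1 value per motif site.
molecules = [
    (1, 1, 0),
    (1, 1, 0),
    (1, 0, 1),
    (1, 0, 1),
]

print(site_fractions(molecules))   # [1.0, 0.5, 0.5]
print(pattern_counts(molecules))
```

The per-site fractions only say that sites 2 and 3 are each methylated in half of the molecules. The per-molecule patterns show that those two sites are never methylated together in this toy example, i.e. there are two distinct subpopulations, which is the kind of information phasing on single long reads can provide.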
The researchers studied seven bacterial strains, demonstrating that the new technique reveals distinct types of epigenetic heterogeneity. For Helicobacter pylori, a pathogenic bacterium that colonizes over 40% of the world population and is associated with gastric cancer, the team discovered that epigenetic heterogeneity can quickly emerge as a single cell divides, and different subpopulations with distinct methylation patterns have distinct gene expression patterns. This may have contributed to the increasing rate of antibiotic resistance of Helicobacter pylori.
“The application of this new technique will enable a more comprehensive characterization of the functions of DNA methylation and their impact on bacterial physiology. Resolving nucleotide modifications at the single molecule, single nucleotide level, especially when integrated with other single molecule- or single cell-level data, such as RNA and protein expression, will help resolve regulatory relationships that govern higher order phenotypes such as drug resistance,” said Eric Schadt, PhD, Founding Director of the Icahn Institute and Professor of Genomics at the Icahn School of Medicine at Mount Sinai. “The approach we developed can also be used to analyze DNA viruses and human mitochondrial DNA, both of which present significant epigenetic heterogeneity.”
Paper cited:
John Beaulaurier, Xue-Song Zhang, Shijia Zhu, Robert Sebra, Chaggai Rosenbluh, Gintaras Deikus, Nan Shen, Diana Munera, Matthew K. Waldor, Andrew Chess, Martin J. Blaser, Eric E. Schadt, and Gang Fang. "Single molecule-level detection and long read-based phasing of epigenetic variations in bacterial methylomes." Nature Communications. DOI: 10.1038/ncomms8438
About the Mount Sinai Health System
The Mount Sinai Health System is an integrated health system committed to providing distinguished care, conducting transformative research, and advancing biomedical education. Structured around seven hospital campuses and a single medical school, the Health System has an extensive ambulatory network and a range of inpatient and outpatient services—from community-based facilities to tertiary and quaternary care.
The System includes approximately 6,600 primary and specialty care physicians, 12 minority-owned free-standing ambulatory surgery centers, over 45 ambulatory practices throughout the five boroughs of New York City, Westchester, and Long Island, as well as 31 affiliated community health centers. Physicians are affiliated with the Icahn School of Medicine at Mount Sinai, which is ranked among the top 20 medical schools both in National Institutes of Health funding and by U.S. News & World Report.
For more information, visit http://www.mountsinai.org, or find Mount Sinai on Facebook, Twitter, YouTube and Instagram.
http://www.mountsinai.org/about-us/newsroom/press-releases/mount-sinai-scientists-develop-new-technique-for-analyzing-the-epigenetics-of-bacteria?utm_source=mssocial&utm_medium=social&utm_campaign=hs
Monday, June 15, 2015---Scientists Publish New Methylation Analysis Protocols Using SMRT Sequencing
Scientists from the Icahn School of Medicine at Mount Sinai and the University of Saskatchewan teamed up to develop an innovative approach to methylation analysis using Single Molecule, Real-Time (SMRT®) Sequencing. The resulting method was just published in BMC Genomics.
Lead author Yao Yang and colleagues note in the paper [“Quantitative and multiplexed DNA methylation analysis using long-read single-molecule real-time bisulfite sequencing (SMRT-BS)”] that existing methods for methylation analysis are limited by cost and throughput in the case of Sanger sequencing, or short read lengths with NGS technologies. Their goal was to develop a method combining long reads, high accuracy, and high throughput.
“Coupled with an optimized long-range bisulfite amplification protocol and empowered by the long read lengths of SMRT sequencing (up to ~20 kb), multiplexed SMRT bisulfite sequencing (SMRT-BS) can accurately measure CpG methylation across ~1.5 kb regions without the need for PCR amplicon subcloning,” Yang et al. write. “As a cost-effective alternative to other targeted bisulfite sequencing techniques, SMRT-BS is an efficient and highly quantitative method for DNA methylation analysis.”
The technique incorporates bisulfite conversion of DNA, followed by amplification based on targeted primers. Amplicon templates are re-amplified and barcoded for multiplexing, and then purified and sequenced prior to CpG methylation analysis. The scientists found that results from the procedure were “reproducible and highly concordant” with other methylation analysis methods, particularly as sequencing depth increased. SMRT-BS data was validated using orthogonal technologies including microarrays and short-read sequencers.
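The final CpG methylation analysis step rests on a simple principle: bisulfite treatment converts unmethylated cytosines so that they read as T after amplification, while methylated cytosines still read as C. Below is a minimal Python sketch of per-CpG quantification under strong simplifying assumptions (reads already aligned to the top strand, same length as the reference, no indels). The sequences are toy data and the function names are hypothetical. This is not the SMRT-BS analysis pipeline itself.

```python
def cpg_positions(reference):
    """Indices of the C in every CG dinucleotide of the reference."""
    return [i for i in range(len(reference) - 1) if reference[i:i + 2] == "CG"]


def cpg_methylation(reference, reads):
    """Fraction of reads showing C (methylated) at each reference CpG.

    Assumes bisulfite-converted reads aligned base-for-base to the
    reference top strand. A 'T' at a CpG cytosine counts as unmethylated,
    and any other base is ignored as uninformative.
    """
    levels = {}
    for pos in cpg_positions(reference):
        methylated = sum(1 for read in reads if read[pos] == "C")
        unmethylated = sum(1 for read in reads if read[pos] == "T")
        called = methylated + unmethylated
        levels[pos] = methylated / called if called else None
    return levels


# Toy reference with CpGs at positions 2, 6 and 10, plus four toy reads.
reference = "ATCGTTCGAACGT"
reads = [
    "ATCGTTTGAACGT",
    "ATCGTTTGAATGT",
    "ATCGTTCGAACGT",
    "ATCGTTTGAACGT",
]
print(cpg_methylation(reference, reads))  # {2: 1.0, 6: 0.25, 10: 0.75}
```

Because each long read covers all CpGs in the amplicon at once, the same per-read data could also be used to look at methylation patterns molecule by molecule rather than only site by site.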
“A key component to the development of SMRT-BS was the optimization of bisulfite conversion and PCR, which resulted in amplicons up to ~1.5-2.0 kb from bisulfite-converted DNA,” the researchers write, noting that amplicons of this length “theoretically can cover ~91% of CpG islands in the human genome.”
Using long-read sequencing technology “allows for more thorough regional CpG methylation assessment and increases the capacity for studying the relationship between phased single nucleotide variants and allele-specific CpG methylation,” they report.
Yang et al. predict that this approach could be used for diagnostic methylation analysis and for confirmation of epigenome-wide association studies, in addition to the usual research applications in transcriptional regulation, human imprinting disorders, and other methylation-specific studies.
In another recent publication, entitled “CGGBP1 mitigates cytosine methylation at repetitive DNA sequences,” scientists at the Science for Life Laboratory at Uppsala University used bisulfite conversion paired with PacBio® sequencing to examine the effect of depleting the transcription factor CGGBP1 on the level of methylation in Alu and LINE repeats.
CGGBP1 is known to bind CGG-rich regions of the genome and repress transcription of Alus and LINEs, but it is not known whether this binding in turn affects methylation status.
Lead author Prasoon Agarwaal and colleagues used genome-wide amplification of Alu and LINE-1 repeats using consensus primers and PacBio sequencing to examine the extent to which an observed genome-wide increase in CpG methylation after CGGBP1 knockdown was focused in these regions. Interestingly, although there was an increase in Alu methylation overall, “an inspection of the distribution of methylation frequencies indicated two different directions of methylation change,” the scientists report. Some Alus had 12% greater methylation, while others had 8% less methylation. Methylation was also increased in LINE-1 elements. The authors note that “the possibility of bi-directional change in Alu CpG suggests that different Alu elements may be subjected to different mechanisms of CpG methylations regulation by CGGBP1,” and cite the need for follow-up studies to identify the differences between the two populations of Alu elements.
More generally, they note that while this experiment reflects an overall characterization of methylation changes, “these data give a sound platform to build upon to uncover the sequence contexts in which CGGBP1 exerts methylation regulation at specific sites.” http://blog.pacificbiosciences.com/2015/06/scientists-publish-new-methylation.html
Thursday, June 11, 2015-Updated! Data Release: Human MCF-7 Transcriptome
UPDATE:
Our R&D team has added a new dataset for the MCF-7 human breast cancer transcriptome, originally released in 2013. The new results were produced using 28 SMRT® Cells with 4-hour movies and P6-C4 chemistry. Sizing was performed with the SageELF™ platform (fractions collected: 1-2 kb, 2-3 kb, 3-5 kb, and 5-10 kb). Sequencing of the larger fractions with our newer sequencing chemistry that generates longer reads added longer transcripts (up to 10 kb) to the MCF-7 dataset, which previously had only transcripts up to 4 kb.
New FASTA and GFF files are available, representing the new combined dataset. Raw data for both the 2013 and 2015 sequencing is also available.
ORIGINAL POST (December 11, 2013):
Understanding the biology of a genome requires knowing the full complement of mRNA isoforms. In recent years, microarrays, high-throughput cDNA sequencing, and RNA-seq have become very useful tools for studying transcriptomes. High-throughput cDNA sequencing is accurate but laborious, while the inherently complex nature of the transcriptome makes transcript assembly computationally intractable. Recently, Steijger et al. (1) showed that complete isoform reconstruction from RNA-seq short-read data remains challenging even when all constituent exons are identified.
A number of recent publications have demonstrated the utility of full-length transcript sequencing by taking advantage of the long read lengths of SMRT® Sequencing technology (2)–(4). SMRT Sequencing produces reads that originate from independent observations of single molecules; no assembly is needed if a read spans the entire length of the transcript. To demonstrate the capabilities of PacBio® Isoform Sequencing (Iso-Seq) technology and show a glimpse of the complexity of eukaryotic transcriptomes, we generated a deep dataset of full-length cDNA sequencing of RNA from MCF-7, a human breast cancer cell line. The sequencing data was collected from several internal training sessions where different library preparation techniques were tested. We are releasing the underlying data in an effort to aid the design of future PacBio Iso-Seq experiments and to spur advances in the development of bioinformatics tools for analyzing full-length transcripts.
In our final dataset, we obtained 44,531 non-redundant transcript-length consensus sequences ranging from 400 bp – 4,900 bp, with an average length of 1,929 bp (Fig. 1a). The total percentage of consensus bases that disagreed with the hg19 genome is 0.27%, out of which 0.16% are due to substitutions and thus could likely be true SNPs (Fig. 1b). About half of the transcribed loci have one observed isoform, while the rest have mostly 2-5 isoforms (Fig. 2). We compared our predicted full-length transcripts against the known annotations and found that we were able to recover full-length alternative splice forms (Fig. 3), alternative polyadenylation, novel transcripts, and known fusion genes (Fig. 4). We encourage interested researchers to explore the dataset.
Materials & Methods
Full-length cDNA was generated from polyA RNA using standard cDNA synthesis kits (Clontech® SMARTer™ and Invitrogen® Superscript® kits). To capture longer, rarer transcripts in sufficient abundance, parts of the double-stranded cDNA were size selected into three fractions, which were subsequently amplified and converted into SMRTbell™ templates. Details on the sample preparation can be found on Sample Net. SMRTbell libraries were sequenced using the P4-C2 sequencing chemistry with 2-hour movies.
After sequencing, we computationally determined the completeness of the sequences using polyA-tail signals and library adapters. To obtain a non-redundant set of full-length, high-quality transcript sequences without bias from other sequencing platforms, we developed a de novo, isoform-level clustering algorithm that uses only PacBio data. Briefly, the algorithm iteratively clusters reads to generate consensus sequences that represent the original transcripts. The algorithm takes into account the existence of the polyA-tail signal to differentiate isoforms with alternative stop sites. The final consensus sequences were called using Quiver and filtered to create the final polished, full-length, non-redundant dataset. Details of the clustering algorithm will be described in two upcoming webinars on Wednesday, January 22 at 8 AM PST and 5 PM PST.
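As a rough illustration of the completeness check described above, the Python sketch below calls a read full-length only if it carries a 5' adapter, a 3' adapter, and a polyA tail directly upstream of the 3' adapter. The adapter sequences, the minimum tail length, and the exact-match logic are all assumptions made for the example. The actual PacBio implementation is not described in the post and is likely more tolerant of sequencing errors.

```python
def is_full_length(read, five_prime, three_prime, min_polya=10):
    """Toy completeness check based on adapters and a polyA tail.

    Hypothetical rule: the read must start with the 5' adapter, end with
    the 3' adapter, and the insert between them must end in at least
    `min_polya` consecutive A bases.
    """
    if not (read.startswith(five_prime) and read.endswith(three_prime)):
        return False
    insert = read[len(five_prime):len(read) - len(three_prime)]
    tail_length = len(insert) - len(insert.rstrip("A"))
    return tail_length >= min_polya


# Placeholder adapter sequences, invented for illustration only.
FIVE_PRIME = "ACGTAC"
THREE_PRIME = "GTACTG"
BODY = "GATTACAGATTACA"

complete = FIVE_PRIME + BODY + "A" * 12 + THREE_PRIME
no_tail = FIVE_PRIME + BODY + THREE_PRIME
no_five_prime = BODY + "A" * 12 + THREE_PRIME

for read in (complete, no_tail, no_five_prime):
    print(is_full_length(read, FIVE_PRIME, THREE_PRIME))
```

Only the first toy read passes. The second lacks a long enough polyA run and the third lacks the 5' adapter, so both would be treated as incomplete under this rule.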
Some statistics from the sequencing and results are listed below:
Number of SMRT Cells: 119
no-size selection: 12
1-2 kb: 37
2-3 kb: 37
> 3 kb: 33
Total number of post-filtered bases: 14,062,161,755
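A quick sanity check of the figures listed above, written as a small Python snippet. The numbers are taken from the list. The variable names and the per-cell average are added here for illustration and do not appear in the original post.

```python
# SMRT Cell counts per size fraction, as listed in the post.
cells_per_fraction = {
    "no size selection": 12,
    "1-2 kb": 37,
    "2-3 kb": 37,
    "> 3 kb": 33,
}
total_cells = sum(cells_per_fraction.values())
print(total_cells)  # 119

# Total post-filter bases, as listed in the post.
total_bases = 14_062_161_755
mean_bases_per_cell = total_bases / total_cells
print(f"{mean_bases_per_cell / 1e6:.1f} Mb per SMRT Cell on average")
```

The four fractions add up to the stated 119 SMRT Cells, and the average works out to a little over 118 Mb of post-filter sequence per cell.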
Figure 4. Known cancer fusion gene BCAS4/BCAS3 identified. PacBio transcripts (top, red) show three different fusion variants of the BCAS4/BCAS3 genes. All three variants contain a portion of the 5’ region of the BCAS4 gene (chr20q13) and a portion of the 3’ region of the BCAS3 gene (chr17q23).
We welcome researchers to download and use the dataset for their research. For citation of the dataset, please use:
MCF-7 transcriptome sequence data was generated by Pacific Biosciences, Menlo Park, California, and additional information about the sequencing and assembly is provided at http://blog.pacificbiosciences.com/2013/12/data-release-human-mcf-7-transcriptome.html and http://datasets.pacb.com.s3.amazonaws.com/2013/IsoSeqHumanMCF7Transcriptome/list.html. The data used in the present study was retrieved from PacBio’s online database at http://datasets.pacb.com.s3.amazonaws.com/2013/IsoSeqHumanMCF7Transcriptome/list.html/ (date of retrieval).
References
(1) T. Steijger, J. F. Abril, P. G. Engström, et al., “Assessment of transcript reconstruction methods for RNA-seq,” Nat. Methods, vol. 10, no. 12, pp. 1177–1184, Nov. 2013.
(2) D. Sharon, H. Tilgner, F. Grubert, and M. Snyder, “A single-molecule long-read survey of the human transcriptome,” Nat. Biotechnol., vol. 31, no. 11, pp. 1009–1014, Nov. 2013.
(3) W. Zhang, P. Ciclitira, and J. Messing, “PacBio sequencing of gene families-a case study with wheat gluten genes,” Gene, 2013.
(4) K. F. Au, V. Sebastiano, P. T. Afshar, J. D. Durruthy, L. Lee, B. A. Williams, H. van Bakel, E. E. Schadt, R. A. Reijo-Pera, J. G. Underwood, and W. H. Wong, “Characterization of the human ESC transcriptome by hybrid sequencing,” Proc. Natl. Acad. Sci. U. S. A., Nov. 2013.
http://blog.pacificbiosciences.com/