Pacific Biosciences of California Inc (PACB): Long reads of the year 2013 January 8,...

Pacific Biosciences of California Inc (PACB)


Paulieme

Re: None

Wednesday, 01/08/2014 8:55:54 PM


Long reads of the year 2013
January 8, 2014 by Next-Gen Sequencing Data
A bit late to the yearly review posts, but here it is: Long Reads of the Year 2013. As you can see, these “Long Reads” are slightly different. Here we summarize a few “long read” sequence data sets that became publicly available last year and point to where one can download them. They are awesome resources, great to start playing with in the new year.

One of the most exciting things that happened in “next-gen sequencing” this year is the availability of “long” sequence reads, be they genomic or transcriptomic. Two sequencing technologies that already have “long reads” and got a lot of attention this year are Illumina’s Moleculo and PacBio. Oxford Nanopore data is just around the corner: with Oxford Nanopore’s early access program, we might see some data by February 2014 (AGBT 2014?).

The year 2013 started with Illumina acquiring Moleculo for its long-read technology. Another big change is that PacBio got more social (possibly realizing the threat from Illumina) :). PacBio started blogging in mid-2012 but published just two posts that year. In 2013 PacBio got really prolific, and it now has over 55 posts. In addition, PacBio started making its data publicly available through the blog.

Moleculo and PacBio sequence data from Drosophila
After acquiring Moleculo, Illumina launched the Fast Track Long Read sequencing service using Moleculo long-read technology. As part of the early access launch, Illumina shared a long-read data set from Dr. Dmitri Petrov’s group at Stanford, comprising two libraries of Drosophila melanogaster, each run on a single HiSeq lane and producing ~30 Gb of data. Visit Illumina’s BaseSpace to get the data.

Around the same time, Casey Bergman’s lab made PacBio long reads publicly available. The raw PacBio data total 1,357,183,439 bp, giving ~7.5x coverage of the 180 Mb male D. melanogaster genome. The 63G PacBio data set can be downloaded from the Bergman lab website. In addition, the Bergman lab had Illumina data from the same sample and combined it with the PacBio reads to offer error-corrected sequence data.
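
The ~7.5x figure follows directly from the two numbers quoted above: total sequenced bases divided by genome size. A minimal sketch of that arithmetic (the function name fold_coverage is made up for illustration and is not part of any Bergman lab or PacBio tool):

```python
def fold_coverage(total_bases: int, genome_size: int) -> float:
    """Average depth: total sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


# Figures quoted in the post for the D. melanogaster PacBio data set.
raw_bases = 1_357_183_439      # raw PacBio bases
genome_size = 180_000_000      # 180 Mb male D. melanogaster genome

coverage = fold_coverage(raw_bases, genome_size)
print(f"{coverage:.1f}x")      # prints 7.5x
```

This is only the raw average depth; it says nothing about how evenly the reads cover the genome.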

Another possible Moleculo data set comes from the first publication using Moleculo technology. The Moleculo team worked on the project before the technology was named Moleculo, and the results came out in a paper in eLife. However, it looks like the data are not freely available. Are there other Moleculo data out in the wild?

PacBio RNA-seq data from Human MCF-7
PacBio generated long-read sequencing data of RNA from MCF-7, a human breast cancer cell line, and made it available on its website. The data were obtained with P4-C2 sequencing chemistry and contain 44,531 non-redundant transcript-length consensus sequences with read lengths ranging from 400 bp to 4,900 bp (an average length of 1,929 bp). Here is the PacBio blog post offering more details on the “long read” data.

Long-Read Shotgun Sequencing of a Human Genome
PacBio released data generated with its P5-C3 scaffolding sequencing chemistry; the set contains over 3.6 M reads with an average length of 8,849 bases (half of the sequenced bases are in reads longer than 10,985 bp). The data is from an interesting human cell line derived from a complete hydatidiform mole (CHM).
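
The "half of sequenced bases" figure is what is usually called the read-length N50, and it differs from the average because long reads contribute more bases each. A minimal sketch of how such a number can be computed, using made-up toy read lengths rather than the actual CHM data (the function name n50 and the toy values are assumptions for illustration only):

```python
def n50(lengths):
    """Return the length L such that reads of length >= L hold at least
    half of all sequenced bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no read lengths given")


# Toy read lengths in bases (hypothetical, not from the PacBio release).
toy_lengths = [2000, 4000, 6000, 8000, 12000]

mean_length = sum(toy_lengths) / len(toy_lengths)
print(mean_length)        # 6400.0
print(n50(toy_lengths))   # 8000

# Rough lower bound on total yield from the figures quoted in the post:
# "over 3.6 M reads" times an average length of 8,849 bases.
approx_yield = 3_600_000 * 8_849
print(approx_yield)       # 31856400000, i.e. roughly 31.9 Gb
```

As in the toy example, an N50 above the mean (10,985 bp versus 8,849 bases in the release) indicates that a relatively small number of long reads carries a large share of the bases.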

A hydatidiform mole is defined as a pregnancy with no embryo and clinically presents in approximately 1 in 1,500 pregnant women in North America. The CHM cells have a diploid genome, typically XX, that is a result of replication of a haploid paternal (sperm) genome. Through the corresponding absence of allelic variation, this sample has been used to generate a haploid reference genome sequence, and many associated resources are available, including physical maps, genotypes (iSCAN), and a large-insert BAC library (CHORI-17). It is also one of the targets for the production of a higher quality “platinum” genome assembly.

Visit the PacBio blog to access the data.

PacBio RNA-seq data
Mike Snyder’s group at Stanford did the first long-read survey of the human transcriptome, generating 476,000 CCS reads from cDNA, with an average length of 1 kb, to investigate the isoform complement of a diverse pool of RNA samples representing 20 human tissues and organs. Data from the 454 platform on the same samples, with an average read length of 522 bp, are also available. PacBio RNA-seq data on ENA: PRJEB3969

PacBio RNA-seq data from hESC cell line
Wing Wong’s team at Stanford published in PNAS a new method that uses PacBio and Illumina reads to identify isoforms. The team used C2 chemistry to generate over 7.5 M reads with an average length of 2-3 kb from the hESC cell line H1. The data can be accessed at GSE51861.

Comments
Lex Nederbragt says:
January 8, 2014 at 4:18 pm
Great idea, this post! Some comments:

The Drosophila Moleculo data is available through Illumina’s BaseSpace (free registration required).

PacBio released several bacterial genome datasets, from projects illustrating the potential for finished genomes using this platform.

Lex Nederbragt says:
January 8, 2014 at 6:59 pm
And then I forgot to include the Arabidopsis PacBio long reads, as well as the reads generated from the Human Microbiome Project ‘mock community’ sample, both released by the company and available through pacbiodevnet.com

http://nextgenseek.com/2014/01/long-sequence-reads-to-play-with-during-the-holidays/