Re: Paulieme post# 316

Wednesday, 02/12/2014 4:19:23 PM

February 12th, 2014 | Genome Assembly, Pacbio
Hybrid Assembly – (ii): The Error Models of PacBio Reads
Now that we have aligned contigs assembled from short Illumina reads onto long PacBio reads, the question of which one to trust often pops up in our minds. Let us explain the issues more clearly.

——————————————————————–
A. We often come across regions where three quarters of a long Illumina contig matches the PacBio read very well (allowing for roughly 85% sequence identity, given the high error rate of raw PacBio reads), but the remaining quarter of the contig does not align anywhere nearby.
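
The partial match in A can be made concrete with local alignment, which reports how much of the contig actually aligns to the read. Below is a minimal Smith-Waterman sketch. The scoring (match +1, mismatch -1, gap -1), the helper names and the toy sequences are all hypothetical; a real long-read aligner would use affine gap penalties, seeding heuristics and scores tuned to PacBio's error profile.

```python
def local_align_span(contig: str, read: str,
                     match: int = 1, mismatch: int = -1, gap: int = -1):
    """Smith-Waterman local alignment of `contig` against `read`.

    Returns (best_score, contig_start, contig_end), contig_end exclusive.
    Linear gap penalty; only two DP rows are kept in memory.
    """
    m = len(read)
    prev_score = [0] * (m + 1)
    prev_start = [0] * (m + 1)  # row 0: an alignment would begin at contig[0]
    best = (0, 0, 0)
    for i in range(1, len(contig) + 1):
        cur_score = [0] * (m + 1)
        cur_start = [i] * (m + 1)  # a zero cell means "begin at contig[i]"
        for j in range(1, m + 1):
            s = match if contig[i - 1] == read[j - 1] else mismatch
            score, start = 0, i
            for cand, cand_start in (
                (prev_score[j - 1] + s, prev_start[j - 1]),  # diagonal step
                (prev_score[j] + gap, prev_start[j]),        # contig base unaligned
                (cur_score[j - 1] + gap, cur_start[j - 1]),  # read base unaligned
            ):
                if cand > score:
                    score, start = cand, cand_start
            cur_score[j], cur_start[j] = score, start
            if score > best[0]:
                best = (score, start, i)
        prev_score, prev_start = cur_score, cur_start
    return best


def aligned_fraction(contig: str, read: str, **scores) -> float:
    """Fraction of the contig covered by its best local alignment to the read."""
    _, start, end = local_align_span(contig, read, **scores)
    return (end - start) / len(contig)


# Hypothetical 40-bp contig; the read carries only its first 30 bases.
contig = "ACGTTGCAACGGATCCTAGCATGCAAGTCA" + "G" * 10
read = "T" * 10 + contig[:30] + "T" * 10
print(local_align_span(contig, read))  # (30, 0, 30)
print(aligned_fraction(contig, read))  # 0.75
```

A high-scoring aligned part combined with a fraction well below 1 is the pattern described in A; the alignment by itself cannot say which of the explanations that follow is right.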

Possibilities:

Illumina is correct.

One can argue that the Illumina contig is built from hundreds of short overlapping reads, and is therefore more accurate.

PacBio is correct.

One can also argue that this genomic region differs between the two copies of the chromosome, and that the PacBio read captures a different copy from the one assembled from the Illumina reads. The region may, for example, carry a large insertion or deletion on one copy.
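
The case for the Illumina contig in A is essentially a consensus argument. The minimal sketch below assumes independent errors at a fixed per-read rate and a simple majority vote at each base (assemblers do not literally call bases this way). It shows how fast the consensus error falls with depth; the error rates are illustrative values, not numbers measured from this data set.

```python
from math import comb


def majority_vote_error(per_read_error: float, depth: int) -> float:
    """Probability that more than half of `depth` reads covering a base are wrong.

    Assumes independent errors at the same rate in every read. Counting
    'more than half wrong' as a consensus failure is conservative, because
    real errors split across the three alternative bases. Use an odd depth
    so that ties cannot occur.
    """
    need = depth // 2 + 1
    return sum(
        comb(depth, k) * per_read_error ** k * (1 - per_read_error) ** (depth - k)
        for k in range(need, depth + 1)
    )


# Illustrative error rates only.
for label, err, depth in [("single short read", 0.01, 1),
                          ("short-read consensus", 0.01, 31),
                          ("single long read", 0.15, 1)]:
    print(f"{label:22s} depth={depth:2d}  P(error) = {majority_vote_error(err, depth):.2e}")
```

This speaks only to base-level accuracy. It says nothing about whether the contig joined the right pieces together, which is exactly what the PacBio explanation in A calls into question.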


——————————————————————–

B. We also come across regions where the Illumina contig matches the PacBio read closely, but the alignment has a large gap inside. The gap is usually filled with homopolymers.

Possibilities:

Once again, one can argue for either of the possibilities mentioned in A.
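
If the homopolymer-filled gap in B comes from run-length errors rather than real sequence, the two segments should agree once every homopolymer is collapsed to a single base. Homopolymer compression is a general trick for comparing sequences from platforms prone to run-length errors. The sketch below, with made-up segments and hypothetical helper names, illustrates that check; it is not a description of how these alignments were produced.

```python
from itertools import groupby


def hp_compress(seq: str) -> str:
    """Collapse every homopolymer run to a single base, e.g. 'AAACCG' -> 'ACG'."""
    return "".join(base for base, _ in groupby(seq))


def homopolymer_fraction(seq: str, min_run: int = 3) -> float:
    """Fraction of bases that sit in runs of at least `min_run` identical bases."""
    if not seq:
        return 0.0
    run_lengths = [len(list(run)) for _, run in groupby(seq)]
    return sum(n for n in run_lengths if n >= min_run) / len(seq)


# Hypothetical segments: the long read carries inflated T and C runs.
illumina_segment = "ACGTTAGCCAT"
pacbio_segment = "ACGTTTTTAGCCCCCAT"
print(hp_compress(illumina_segment) == hp_compress(pacbio_segment))  # True
print(round(homopolymer_fraction(pacbio_segment), 3))                 # 0.588
```

Agreement after compression is suggestive rather than conclusive, since compression also hides genuine homopolymer-length differences between the two chromosome copies.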
——————————————————————–

C. The third case of ambiguity is multiple copies of the same Illumina contig matching a single PacBio read.

Possibilities:

PacBio is correct.

By design, k-mer-based de Bruijn graph assembly compresses duplicated regions into a single block. Therefore, the contig assembly method used for the Illumina reads cannot resolve tandem repeat regions longer than the k-mer size.

Illumina is correct.

PacBio technology circularizes the chromosomal fragments and then reads around them again and again. The raw PacBio reads therefore contain multiple copies of the same chromosomal region, and the initial processing step splits them into separate reads at the circularization junctions. It is possible that this step missed a few junctions, leaving several copies of the same region in one read.
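
The de Bruijn argument in C can be seen directly on a toy example. When the repeat unit is at least as long as k, sequences with two and with three tandem copies yield exactly the same set of k-mers, so a graph built from distinct k-mers cannot tell them apart. The flanks, repeat unit, k and helper names below are made up for illustration.

```python
def kmers(seq: str, k: int) -> set:
    """Distinct k-mers of `seq`."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def de_bruijn_edges(seq: str, k: int) -> set:
    """Each distinct k-mer becomes an edge from its (k-1)-mer prefix to its (k-1)-mer suffix."""
    return {(km[:-1], km[1:]) for km in kmers(seq, k)}


# Hypothetical flanks and a 10-bp repeat unit; k = 5 is shorter than the unit.
left, unit, right = "GATTACA", "CCGTAGGTCA", "TTGACC"
two_copies = left + unit * 2 + right
three_copies = left + unit * 3 + right
k = 5
print(len(two_copies), len(three_copies))                                  # 33 43
print(de_bruijn_edges(two_copies, k) == de_bruijn_edges(three_copies, k))  # True
```

Because both inputs produce identical graphs, an assembler relying on graph structure alone reports a single walk through the repeat. K-mer coverage may hint that the repeat occurs more than once, but the exact copy number is not recoverable from the graph itself.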

—————————————————————–

Yesterday, we took time to work through a few cases meticulously to understand what is going on, and we will share the examples here. They are anecdotal cases rather than a systematic analysis of the entire data set, but they illustrate the points made in A, B and C and should help you appreciate the issues.

This commentary will be expanded with many figures and examples.

http://www.homolog.us/blogs/blog/2014/02/12/hybrid-assembly-ii-error-models-pacbio-reads/

Bobby Sebra's #AGBT14 poster: an all-in-one Infectious Disease Pipeline using @PacBio long reads, Thursday 1:00-2:30 (pic.twitter.com/2qTCXwEU2g)
https://twitter.com/IcahnInstitute/status/433706208562647040/photo/1