[mira_talk] Re: Making the best assembly

  • From: C Jenkins <cej.jenkins@xxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Mon, 7 Dec 2015 09:07:17 -0800

We are talking about cDNA/RNASeq data for a eukaryotic parasite.

There were 72276 reads used in the Illumina assembly, but a big chunk of my
raw reads were not used.

I relied on MIRA's own clipping, so I gave it unprocessed reads.

On Sun, Dec 6, 2015 at 10:11 PM, Bastien Chevreux <bach@xxxxxxxxxxxx> wrote:

On 06 Dec 2015, at 23:00 , C Jenkins <cej.jenkins@xxxxxxxxx> wrote:
I have a largely undescribed species of a trematode parasite. It is
similar in life cycle to Schistosoma mansoni.
I have 454 and Illumina single-end reads from 4 different populations. I
need to first create a reference transcriptome.
The Illumina data is... rough. I first assembled it using Trinity, and
found only 531 contigs... which is orders of magnitude fewer than I expected.
So I used MIRA to do a 454 assembly, an Illumina assembly and a hybrid
assembly. Now I'm trying to figure out which of them is any good.
[…]

We are talking about EST/cDNA/RNASeq, right? Because for eukaryotic
genomes, MIRA is definitely not the right tool.

First things first: if you used MIRA 4.0.x, then give the current
development version a try. It’s light years ahead of 4.0.
Second: that table of yours … there’s something I do not understand: the
columns. E.g.: the 454 assembly has 33k reads but a coverage of 43k? Or:
the Illumina assembly really has only 72k reads as input?
Third: for the Illumina assembly, did you give MIRA “unprocessed” reads?
This is recommended.

What I normally do for RNASeq assemblies with Illumina: I take a very
small subset (100k reads or so) and assemble that to see whether there are
unexpected things like, e.g., an unfiltered library with 80% rRNA or similar
funny surprises. Then a quick run with 1m reads and, if all seems OK, I
generally start the assembly with 10 to 15 million read pairs (20 to 30m
reads), as this is generally regarded as the sweet spot for transcriptome
assemblies.
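
For the small test subsets mentioned above, here is a minimal sketch of one
way to draw a random sample of reads from a FASTQ file. It is not part of
MIRA and not necessarily how the subsets in this thread were made. It
assumes plain, uncompressed FASTQ with exactly four lines per record; the
function names read_fastq and subsample are made up for illustration.

```python
import random
from itertools import islice


def read_fastq(handle):
    """Yield (header, sequence, separator, quality) tuples.

    Assumption: every record occupies exactly four lines (no wrapped
    sequence or quality lines).
    """
    while True:
        record = list(islice(handle, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record at end of input")
        yield tuple(line.rstrip("\n") for line in record)


def subsample(records, k, seed=0):
    """Return k records chosen uniformly at random (reservoir sampling).

    Only k records are held in memory at a time, so the input can be
    much larger than the sample. If the input has fewer than k records,
    all of them are returned in their original order.
    """
    rng = random.Random(seed)
    reservoir = []
    for i, rec in enumerate(records):
        if i < k:
            reservoir.append(rec)
        else:
            # Replace a random slot with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = rec
    return reservoir
```

Typical use would be to open the raw read file, call
subsample(read_fastq(handle), 100000), and write the returned records back
out as a new FASTQ file for the test assembly. For paired-end data the two
mate files would have to be sampled together so that pairs stay in sync,
which this sketch does not handle.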

I haven’t tried 4.9.x on 454 data though, so I cannot predict its
performance there.

B.



--
CJenkins, MS
PhD Candidate
Washington State University/University of Idaho
