[mira_talk] Re: adaptor trim

  • From: Huang Yi <huang.y.hy@xxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Thu, 12 Nov 2015 21:54:34 +0800

Thanks, Adrian. I have run into another question. I know MIRA needs no more
than 70-80x read coverage for assembly. But in my case, when I input 70x of
reads, the final assembly report told me that only 50x was used. To
increase the final coverage, I tried 90x and 100x of reads.

With 90x input reads, 62x were used.
With 100x input reads, MIRA stopped and said the coverage was 87, which was
too high to work with.

If I want exactly 70x of reads to be used for the assembly, I will have to
try several more times. Does this also happen to you? Or did I do something
wrong again?
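
The numbers above can be put side by side with a little arithmetic. The sketch below is only an illustration and not part of MIRA: the function names are made up, and the extrapolation assumes that the fraction of reads used stays constant as the input grows. The 100x run in this thread suggests that assumption may not hold.

```python
def used_fraction(input_cov, used_cov):
    """Fraction of the input coverage that ended up in the assembly."""
    if input_cov <= 0:
        raise ValueError("input coverage must be positive")
    return used_cov / input_cov


def input_needed(target_used_cov, fraction):
    """Input coverage needed to reach a target used coverage.

    Assumption: the used fraction is constant, which is a simplification.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return target_used_cov / fraction


# (input coverage, coverage reported as used) pairs quoted in this thread
runs = [(70, 50), (90, 62)]
for input_cov, used_cov in runs:
    frac = used_fraction(input_cov, used_cov)
    print(f"{input_cov}x in -> {used_cov}x used ({frac:.0%}); "
          f"70x used would need about {input_needed(70, frac):.0f}x in")
```

Under that assumption, both runs point to an input of roughly 100x for 70x used, which is the range where the 100x run was rejected, so simple scaling may not be enough to hit the target.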

2015-11-12 12:59 GMT+08:00 Adrian Pelin <apelin20@xxxxxxxxx>:

The difference in coverage is minimal, but could you get them to the same
level? Either 60x or 70x. Just make sure the seq depth of both assemblies
is the same so that we can judge the stats just based on
adaptor presence/absence and nothing else. Seems like a 2x change in N50;
that looks pretty drastic.


On Wednesday, 11 November 2015, Huang Yi <huang.y.hy@xxxxxxxxx> wrote:

Below are two assemblies I did:

1. Gave 90x adaptor-free reads to MIRA, 62x reads were used and generated
754 contigs:

Length assessment:
------------------
Number of contigs: 754
Total consensus: 7705193
Largest contig: 102257
N50 contig size: 22587
N90 contig size: 5455
N95 contig size: 2524


2. Gave 100x raw reads to MIRA, 70x reads were used and generated 366
contigs:

Length assessment:
------------------
Number of contigs: 366
Total consensus: 7261126
Largest contig: 152181
N50 contig size: 45496
N90 contig size: 12905
N95 contig size: 6847
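
For readers comparing the two reports, the N50/N90/N95 figures can be recomputed from a list of contig lengths. This is a minimal sketch using the usual definition (the contig length at which the running total of contigs, sorted longest first, reaches the given fraction of the total consensus). Whether MIRA computes its values in exactly this way is an assumption, and the function name `nxx` is made up for illustration.

```python
def nxx(lengths, fraction=0.5):
    """Return the Nxx contig size for a list of contig lengths.

    Contigs are sorted longest first; the result is the length of the
    contig at which the running total first reaches `fraction` of the
    summed length (fraction=0.5 gives N50, 0.9 gives N90).
    """
    if not lengths:
        raise ValueError("no contig lengths given")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= fraction * total:
            return length


# Toy lengths, not taken from the assemblies in this thread
toy = [10, 8, 6, 4, 2]
print("N50:", nxx(toy, 0.5))
print("N90:", nxx(toy, 0.9))
```

Because N50 depends on the total consensus length as well as on contig sizes, the comparison is most meaningful when both assemblies cover a similar total, as they roughly do here (7.7 Mb versus 7.3 Mb).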

I think using raw data is okay for MIRA. But I don't understand why the
trimmed data gave a worse result than the raw data. It should be better,
right? Or is something wrong with my settings?

Here is the config file I used:

project = 90xmiradenovo
job = genome,denovo,accurate
parameters = -DI:trt=/tmp -NW:cmrnl=no SOLEXA_SETTINGS -CL:pec=yes

readgroup = DataForautopairing
data = ./45x_1.fastq ./45x_2.fastq
technology = solexa
template_size = 300 700 autorefine
strain = B14

Thanks!

Yi

2015-11-11 21:57 GMT+08:00 Huang Yi <huang.y.hy@xxxxxxxxx>:

thank you!

2015-11-11 21:54 GMT+08:00 Bastien Chevreux <bach@xxxxxxxxxxxx>:

On 11 Nov 2015, at 8:49 , Huang Yi <huang.y.hy@xxxxxxxxx> wrote:

[…]
My other question is: there is a "Num. reads assembled" entry in the file
"info_assembly.txt", which I think shows how many reads were used for the
MIRA assembly. This number is less than what I fed to MIRA. Is it possible
to pull those reads out?


Yes. Reads mapped/assembled are tracked in the file
“*projectname*_info_contigreadlist.txt”, reads not mapped/assembled (and
the reason why) in the file “*projectname*_info_debrislist.txt”.

B.
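
One way to pull the unassembled reads out of the original FASTQ, given the debris list mentioned above, might look like the sketch below. It rests on assumptions that should be checked against the actual files: that each non-comment line of the debris list starts with the read name followed by whitespace, that comment lines start with "#", that the FASTQ uses plain 4-line records, and that the names in the debris list match the FASTQ headers exactly (MIRA may have adjusted read names, for example for pairing). The function names and the sample data are hypothetical.

```python
import io


def read_debris_names(handle):
    """Collect read names: first field of each non-comment, non-empty line."""
    names = set()
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        names.add(line.split()[0])
    return names


def filter_fastq(fastq_handle, names, out_handle):
    """Copy 4-line FASTQ records whose name is in `names`; return the count."""
    kept = 0
    while True:
        header = fastq_handle.readline()
        if not header.strip():
            break  # end of file (or a trailing blank line)
        rest = [fastq_handle.readline() for _ in range(3)]
        fields = header[1:].split()  # drop the leading '@'
        name = fields[0] if fields else ""
        if name in names:
            out_handle.write(header)
            out_handle.writelines(rest)
            kept += 1
    return kept


# In-memory demo; "SOME_REASON" is a placeholder, not a real MIRA code
debris = io.StringIO("# readname reason\nread2\tSOME_REASON\n")
fastq = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nTTTT\n+\nIIII\n")
out = io.StringIO()
print(filter_fastq(fastq, read_debris_names(debris), out), "read(s) kept")
```

With real data, the `io.StringIO` objects would be replaced by files opened with `open(...)`. The same approach could be applied to the contig read list, taking the column that holds the read name.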




