[mira_talk] Re: Solexa paired ends and 454 single reads

  • From: Bastien Chevreux <bach@xxxxxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Thu, 21 Jan 2010 20:45:50 +0100

On Donnerstag 21 Januar 2010 Mihaela Angelova wrote:
> Thanks a lot for answering me.
> MIRA still has not reached any pass. The assembly log is stuck here:
> 
> Now running threaded and partitioned skimmer with 272 partitions in 2
>  threads: [0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|....
>  [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%]
> ....|.... [90%] ....|.... [100%]
> 
> I have ~668,000 reads (from 454) and ~75,000,000 paired-end reads in total.

Hi Mihaela,

uh, 75m reads? If it's a bacterium, that's far too many. If it's something 
larger ... hmm, I don't know why it should take an eternity right there. If 
you are interested, I can give you a version with a lot of debugging output. It 
won't help you immediately, but at least I'd have an idea where the problem is.
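To put "75m reads" into perspective: average coverage is roughly the number of reads times the mean read length, divided by the genome size. A minimal Python sketch; the read lengths (36 bp for Solexa, 250 bp for 454) and the 5 Mb genome size below are hypothetical values chosen for illustration, not numbers from this thread:

```python
def estimate_coverage(num_reads: int, mean_read_len: float, genome_size: int) -> float:
    """Average coverage = total sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return num_reads * mean_read_len / genome_size


# Hypothetical inputs: read lengths and genome size are assumptions, not from the thread.
solexa = estimate_coverage(75_000_000, 36, 5_000_000)
roche454 = estimate_coverage(668_000, 250, 5_000_000)
print(f"Solexa: {solexa:.1f}x, 454: {roche454:.1f}x")
```

Under those assumptions the Solexa data alone would be several hundred x for a bacterial genome, which is why the read count looks excessive; for a much larger genome the same numbers give a far lower coverage.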


> I have noticed that a huge number of paired
> ends were discarded because they were too short :
> 
> AAA/2 too small even with paired end
> 
> Is this because their ends might have been clipped?

Yes, exactly.

> In the project_int_clippings.0.txt there are messages like :
> 
> proposed cutback 1b:  left AAA/2        9 -> 24
> 
> If I use the noclipping command, would it make the run more accurate?

I very, very strongly suggest using the proposed end clipping (PEC) for data 
sets with a coverage >15x (I suppose that is the case for yours). PEC is 
insanely effective as a pre-screen and clips extremely effectively without 
clipping too much. Without PEC, the assemblies are worse 95% of the time, 
especially with Solexa and 454 reads.

The output log gives you an idea of how much was actually clipped. It looks 
like this (example for a project with 800k 454 reads):

Looking for proposed cutbacks ... done.
Performed clips:
        Num reads cliped left: 31737
        Num reads cliped right: 131198
        Num reads completely killed: 7205
        Total bases clipped         : 2236490

Should these numbers be extremely high for you, one should ask why the 
data set looks like that.
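To judge whether the numbers are "extremely high", it helps to express them as a fraction of the total read count. A minimal Python sketch, assuming the "Performed clips:" block has exactly the layout shown in the example above (the label spelling is kept as the log prints it); the function names are hypothetical and the total of 800,000 reads is the approximate "800k" from the example:

```python
import re

# Matches counter lines such as "        Num reads cliped left: 31737"
# and "        Total bases clipped         : 2236490".
COUNTER_RE = re.compile(r"^\s*(.+?)\s*:\s*(\d+)\s*$")


def parse_clip_summary(log_text: str) -> dict[str, int]:
    """Collect the counters listed directly below a 'Performed clips:' line."""
    stats: dict[str, int] = {}
    in_block = False
    for line in log_text.splitlines():
        if line.strip() == "Performed clips:":
            in_block = True
            continue
        if in_block:
            m = COUNTER_RE.match(line)
            if not m:
                break  # the first non-counter line ends the block
            stats[m.group(1)] = int(m.group(2))
    return stats


def clip_fractions(stats: dict[str, int], total_reads: int) -> dict[str, float]:
    """Fraction of all reads clipped left, clipped right, or removed entirely."""
    return {
        "left": stats.get("Num reads cliped left", 0) / total_reads,
        "right": stats.get("Num reads cliped right", 0) / total_reads,
        "killed": stats.get("Num reads completely killed", 0) / total_reads,
    }


LOG = """Looking for proposed cutbacks ... done.
Performed clips:
        Num reads cliped left: 31737
        Num reads cliped right: 131198
        Num reads completely killed: 7205
        Total bases clipped         : 2236490
"""

for name, frac in clip_fractions(parse_clip_summary(LOG), 800_000).items():
    print(f"{name}: {frac:.1%}")
```

For the example project this puts the completely killed reads at under 1%, which is the kind of figure that does not call for a closer look at the data set.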

Regards,
  Bastien

-- 
You have received this mail because you are subscribed to the mira_talk mailing 
list. For information on how to subscribe or unsubscribe, please visit 
http://www.chevreux.org/mira_mailinglists.html
