[mira_talk] Re: After Scaffolding

  • From: Bastien Chevreux <bach@xxxxxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Wed, 16 Sep 2009 22:50:41 +0200

On Friday, 11 September 2009, Davide Sassera wrote:
> 1 Repeats: could you please tell me what nnr would you set in this
> situation?
>
> 0       8200338
> 1       1253412
> 2       168676
> 3       30776
> 4       6770
> 5       3196
> 6       2272
> 7       2074
> 8       1538
> 9       568
> 10      476
> 11      208
> 12      338
> 13      306
> 14      318
> 15      258
> 16      210
> 17      166
> 18      212
> 19      196
> 20      200
> 21      198
> 22      234
> 23      152
> 24      100
> 25      72
> 26      28
> 27      26
> 28      32
> 29      42
> 30      44
> 31      32
> 32      42
> 33      54
> 34      68
> 35      32
> 36      64
> 37      86
> 38      40
> 39      78
> 40      68
> 41      50
> 42      56
> 43      48
> 44      50
> 45      48
> 46      26
> 47      46
> 48      32
> 49      30
> 50      22
> 51      12
> 52      4
> 56      2
> 57      6

If MIRA does not complain (MEGAHUBS and such), none. If it complains: try 20. 
If still complaining: 15, 10, 9, 8, 7, ...
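
To get a feeling for what each of those candidate values would mean for the
histogram quoted above, here is a minimal Python sketch. It assumes that the
first column is a repeat-level bin and the second column the number of entries
counted in that bin; the thread does not define the columns, and the helper
names are made up for illustration. It only sums the tail of the histogram at
or above each candidate value and says nothing about how MIRA applies the
setting internally.

```python
# Histogram copied from the quoted mail: "bin count" pairs.
HISTOGRAM = """
0 8200338   1 1253412   2 168676   3 30776   4 6770
5 3196   6 2272   7 2074   8 1538   9 568
10 476   11 208   12 338   13 306   14 318
15 258   16 210   17 166   18 212   19 196
20 200   21 198   22 234   23 152   24 100
25 72   26 28   27 26   28 32   29 42
30 44   31 32   32 42   33 54   34 68
35 32   36 64   37 86   38 40   39 78
40 68   41 50   42 56   43 48   44 50
45 48   46 26   47 46   48 32   49 30
50 22   51 12   52 4   56 2   57 6
"""


def parse_histogram(text):
    """Parse whitespace-separated 'bin count' pairs; mail quote marks are ignored."""
    tokens = [t for t in text.split() if t != ">"]
    if len(tokens) % 2:
        raise ValueError("expected an even number of values (bin/count pairs)")
    return {int(b): int(n) for b, n in zip(tokens[::2], tokens[1::2])}


def tail_at_or_above(hist, threshold):
    """Return (count, fraction of total) for all bins >= threshold."""
    total = sum(hist.values())
    tail = sum(n for b, n in hist.items() if b >= threshold)
    return tail, (tail / total if total else 0.0)


hist = parse_histogram(HISTOGRAM)

# Candidate values in the order suggested above: 20 first, then stepping down.
for candidate in (20, 15, 10, 9, 8, 7):
    count, fraction = tail_at_or_above(hist, candidate)
    print(f"bins >= {candidate:>2}: {count:>6} entries ({fraction:.4%} of total)")
```

The point of the table this prints is only to show how quickly the affected
part of the histogram grows as the value is lowered; whether MIRA still
complains is something only an actual run will tell.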

> 2 sequencing errors: you said that having more zeros than 1s is bad, so
> in my situation what should I do? jump off a cliff? change sequence
> company?

Jumping off cliffs higher than 2.57cm is highly discouraged as this might have 
negative side-effects on the environment in case of blood and gore splashing 
around. Save the planet, just don't do it :-)

Now, things are not always as easy as I write. There are cases when the repeat 
histogram is not straightforward:
- misestimation of the repetitiveness of the data by SKIM. This happens often
  with highly repetitive genomes (>20%-40%) and always with non-normalised
  EST.
- high-coverage 454 and/or Solexa data

To decide whether to dump the sequencing company, you'd have to look at the
error rate of the reads in a finished project (telling MIRA *not* to use the
sequence editor, as it would otherwise edit away quite a lot of those errors).
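
As a rough illustration of what "error rate of the reads" could mean in
numbers, here is a small Python sketch. It assumes you already have each read
as an aligned string next to the matching stretch of the finished consensus,
both of the same length and with gaps written as '*'. That input format and
the function names are hypothetical; this is not a MIRA output format or a
MIRA tool.

```python
def read_error_rate(read, consensus):
    """Fraction of aligned columns where the read differs from the consensus."""
    if len(read) != len(consensus):
        raise ValueError("read and consensus must be aligned to equal length")
    if not read:
        return 0.0
    # Compare case-insensitively; a gap ('*') against a base counts as an error.
    diffs = sum(1 for r, c in zip(read.upper(), consensus.upper()) if r != c)
    return diffs / len(read)


def pooled_error_rate(pairs):
    """Overall error rate over many (read, consensus) pairs, weighted by length."""
    errors = 0
    columns = 0
    for read, consensus in pairs:
        errors += round(read_error_rate(read, consensus) * len(read))
        columns += len(read)
    return errors / columns if columns else 0.0
```

Comparing the pooled value between sequencing runs or providers is then a
matter of feeding in the reads from each one separately.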

> 3 chimeras: I remember in a previous version that after the annoying
> request of some italian guy (me) you implemented a chimera finder. it
> found around 4000 in my assembly. Is it possible to set some parameter
> to make this search less lenient. I'm willing to lose some good reads in
> order to get rid of chimeras

Nothing you can do as this is one of the rare algorithms in MIRA where one 
can't tweak anything at the moment: either there are overlaps spanning reads 
or not. What you can try is using a higher number for -SK:bph, but I doubt that 
it will lead to any change.

Why do you want to tweak? Do you have the impression that there are still 
chimeras?

Regards,
  Bastien


-- 
You have received this mail because you are subscribed to the mira_talk mailing 
list. For information on how to subscribe or unsubscribe, please visit 
http://www.chevreux.org/mira_mailinglists.html
