[mira_talk] Re: 0.5TB not enough space?

Many repeats in Mycoplasma, unfortunately. Also, many HGTs, so k-mer
frequencies may not be as uniform as in other bugs. But thanks for turning
my attention to this, I will try running the assembly with some nasty repeat
masking.

@Bastien "Heavens! Is there any valid reason you set up an assembly with a
coverage >= 1000x?"

Not one which I would like to put down in writing ;)

Thanks,

Iddo

On Mon, Aug 15, 2011 at 2:37 PM, Robert Bruccoleri <
bruc@xxxxxxxxxxxxxxxxxxxxx> wrote:

> Dear Iddo,
>     There's another issue with your data: it looks noisy. Look at this
> section of the log file:
>
> Measured avg. frequency coverage: 1014
>
> Deduced thresholds:
> -------------------
> Min normal cov: 405.6
> Max normal cov: 1622.4
> Repeat cov: 1926.6
> Heavy cov: 8112.0
> Crazy cov: 20280.0
> Mask cov: 101400
>
> Repeat ratio histogram:
> -----------------------
> 0     5028189
> 1     837017
> 2     269532
> 3     37454
> 4     4716
>
>
> The repeat ratio histogram of a clean sequence file from a genome
> sequencing with decent coverage will show the "1" bin to be the biggest. The
> fact that the 0 bin is biggest is a sign that your sequences are filled with
> random sequence not from your bacteria. That's even more true with such a
> high coverage.
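
A minimal sketch of the check described above, using the numbers copied from the log excerpt. The helper name largest_bin is hypothetical and not part of MIRA. The second half only observes that, in this particular log, each deduced threshold equals a fixed multiple of the measured average; whether MIRA always applies these factors is an assumption, not something the log states.

```python
# Repeat ratio histogram copied from the log excerpt: bin -> count.
histogram = {0: 5028189, 1: 837017, 2: 269532, 3: 37454, 4: 4716}


def largest_bin(hist):
    """Return the histogram bin that holds the highest count."""
    return max(hist, key=hist.get)


# Share of all entries that fall into bin 0.
total = sum(histogram.values())
fraction_in_bin0 = histogram[0] / total

# Deduced thresholds from the same log, divided by the measured average.
avg_coverage = 1014
thresholds = {
    "Min normal cov": 405.6,
    "Max normal cov": 1622.4,
    "Repeat cov": 1926.6,
    "Heavy cov": 8112.0,
    "Crazy cov": 20280.0,
    "Mask cov": 101400,
}
factors = {name: round(value / avg_coverage, 2)
           for name, value in thresholds.items()}

if __name__ == "__main__":
    print("largest bin:", largest_bin(histogram))
    print("fraction in bin 0: %.3f" % fraction_in_bin0)
    for name, factor in factors.items():
        print("%-15s %gx average" % (name, factor))
```

For this log the largest bin is 0 rather than 1, which is the pattern Bob flags as a sign of noisy input.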
>
> Regards,
> Bob
>
>
> Bastien Chevreux wrote:
>
> On Monday 15 August 2011 19:36:32 Iddo Friedberg wrote:
>
> > Oops. I put up the wrong logfile. The run was definitely not on an NFS
>
> > system
>
>
>  Heavens! Is there any valid reason you set up an assembly with a coverage
> >= 1000x ? No, not ten, not one hundred ... one thousand! You are aware that
> this actually decreases the quality of a genome assembly, right? Non-random
> errors in the sequencing will be the death of it.
>
>
>  You might want to read quickly through
>
>
> http://mira-assembler.sourceforge.net/docs/DefinitiveGuideToMIRA.html#sect_seqadv_a_word_or_two_on_coverage
>
>
>  especially the small paragraph labelled with a nice, warm and re-assuring
> "Warning".
>
>
>  Back to your project: slash down the amount of data by a factor of ten
> and all will be well :-)
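
One way to cut the data by a factor of ten is random subsampling of reads. The sketch below is an assumption about how that could be done, not a method stated in this thread: the function name subsample_fastq is hypothetical, it assumes plain 4-line FASTQ records (no wrapped sequence lines), and it treats reads as unpaired. Paired reads would need both mates kept or dropped together.

```python
import random


def subsample_fastq(lines, fraction=0.1, seed=0):
    """Keep each 4-line FASTQ record with probability `fraction`.

    `lines` is any iterable of text lines; a trailing incomplete
    record (fewer than 4 lines) is ignored.
    """
    rng = random.Random(seed)  # fixed seed makes the selection repeatable
    lines = list(lines)
    usable = len(lines) - len(lines) % 4
    kept = []
    for start in range(0, usable, 4):
        if rng.random() < fraction:
            kept.extend(lines[start:start + 4])
    return kept


if __name__ == "__main__":
    # Tiny synthetic input: 1000 identical-length dummy reads.
    reads = []
    for i in range(1000):
        reads += ["@read%d" % i, "ACGTACGT", "+", "IIIIIIII"]
    subset = subsample_fastq(reads, fraction=0.1, seed=42)
    print("kept %d of %d reads" % (len(subset) // 4, len(reads) // 4))
```

The number of reads kept varies around one tenth of the input; a fixed seed only makes a given run reproducible.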
>
>
>  B.
>
>
>  PS: and I'm actually now thinking of adding another warning flag which
> will let MIRA stop if it detects a coverage >= 150x in genome de-novo ...
> anyone having an opinion on this?
>


-- 
Iddo Friedberg
http://iddo-friedberg.net/contact.html
