[mira_talk] Re: "Simple" assembly: contigs and reads from closing PCRs

  • From: Bastien Chevreux <bach@xxxxxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Wed, 17 Aug 2011 22:34:48 +0200

On Wednesday 17 August 2011 18:27:27 Lionel Guy wrote:
> [...]
> Had to adjust the -SK:mmhr otherwise MIRA would find 1 megahub and stop.
> I got the thing to run (in a couple minutes, first time ever I got MIRA
> to run in less than an hour ;)),

Hey! Is that some kind of criticism I read there? If yes, I'm going to sulk in 
the corner :-)

> but the assembly is not good: I get
> only 216 reads assembled, and get 48 large contigs, totalling 720kb (i.e.
> a third of the scaffold). Most contigs consist of one of the old contigs
> plus one joining read, but I never got two of the old contigs joined.
> 
> mira --project=$ASS_ID --job=denovo,genome,sanger,accurate -OUT:ora=yes \
>   -GE:not=$NCORES -DI:trt=$TMP -SK:mmhr=10 -AS:urd=yes \
>   &> ${ASS_ID}_log_assembly.txt &

I suppose the reads you generated did not "feel" like real sequencing data, and 
the clipping routines had a hell of a joyful time slicing through that data 
set, thinking that these things looked like sequencing artefacts.

> I am aware this is a very particular case, but I'd like to know if
> someone has experience with that kind of cases. What parameters should I
> change?

You could either
a) switch off all clipping routines in MIRA, i.e.
     "--noclipping -CL:pec=no"
b) or, if you do not want to do that, change the data set to look a bit more
   like real sequencing data; see the next paragraph on how to do that.
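A minimal sketch of option a), assuming you drive MIRA from Python: it rebuilds the command line quoted above and appends the two clipping switches. The helper name `mira_args` and its parameters are hypothetical; only the flags come from this thread, and the sketch only assembles the argument list (which could then be handed to `subprocess.run`) rather than running MIRA.

```python
import shlex


def mira_args(project: str, ncores: int, tmpdir: str,
              no_clipping: bool = False) -> list[str]:
    """Build (but do not run) the MIRA command line used in this thread.

    Hypothetical helper: the flags are copied from the command quoted above.
    """
    args = [
        "mira",
        f"--project={project}",
        "--job=denovo,genome,sanger,accurate",
        "-OUT:ora=yes",
        f"-GE:not={ncores}",
        f"-DI:trt={tmpdir}",
        "-SK:mmhr=10",
        "-AS:urd=yes",
    ]
    if no_clipping:
        # Option a): switch off all clipping routines.
        args += ["--noclipping", "-CL:pec=no"]
    return args


if __name__ == "__main__":
    # Print a shell-quoted version for inspection; project/dir are placeholders.
    print(shlex.join(mira_args("myproject", 4, "/tmp/mira", no_clipping=True)))
```

Keeping the switches behind a flag makes it easy to run the same data once with and once without clipping and compare the results.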

Depending on the nastiness of your bug, additional measures might be necessary 
in case some of the 5kb overlaps are very similar and MIRA wants to join a 
couple of wrong ones. If that happens: for every 20kb piece you add, also add 
its reverse complement sequence, and give the bases of the fwd and rev pieces a 
quality of 45. You don't need to create quality files for this: if your PCR 
product sequences have qualities, then -AS:bdq=45 gives all reads without 
quality values a default quality of 45. Then tell the contig object to go on a 
repeat marker base hunt in low coverage, high quality mode:

  -CO:mrpg=2:mnq=90:mgqrt=90
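The data preparation described above could be sketched in Python as follows, assuming the 20kb pieces are held in a dict mapping a name to its sequence. All function names and the `_fwd`/`_rev` record suffixes are illustrative, not anything MIRA requires. `to_fasta_qual` writes a FASTA-style quality file with every base at 45 and is only needed if you prefer explicit quality values over relying on -AS:bdq=45.

```python
# Complement table for upper- and lower-case DNA bases; N maps to N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def with_reverse_complements(pieces: dict[str, str]) -> dict[str, str]:
    """For every piece, emit the forward sequence and its reverse complement."""
    out: dict[str, str] = {}
    for name, seq in pieces.items():
        out[f"{name}_fwd"] = seq
        out[f"{name}_rev"] = reverse_complement(seq)
    return out


def to_fasta(records: dict[str, str], width: int = 60) -> str:
    """Format records as FASTA, wrapping sequence lines at `width` bases."""
    lines = []
    for name, seq in records.items():
        lines.append(f">{name}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


def to_fasta_qual(records: dict[str, str], quality: int = 45,
                  per_line: int = 25) -> str:
    """Format a matching quality file: one space-separated value per base."""
    lines = []
    for name, seq in records.items():
        lines.append(f">{name}")
        quals = [str(quality)] * len(seq)
        lines.extend(" ".join(quals[i:i + per_line])
                     for i in range(0, len(quals), per_line))
    return "\n".join(lines) + "\n"
```

The resulting reads would then go into the assembly together with -AS:bdq=45 (when no quality file is written) and the -CO:mrpg=2:mnq=90:mgqrt=90 settings shown above.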

This still would not help if there are 100% identical 5kb overlaps in more 
than 2 reads. That has never happened to me, but in that case you would just 
need to fake SNPs in those overlaps to get them assembled correctly.
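For that rare case, a sketch of faking SNPs, under assumptions not stated in the thread: positions are 0-based coordinates within one piece, every `every`-th base of the overlap (starting `offset` bases in) gets a fixed substitution, and the helper name `fake_snps` is hypothetical. One way to apply it would be to mark the shared overlap of the two pieces that really belong together with the same `every` and `offset` (so the changes line up), and to give the identical copies from the other locus a different `offset`, so the overlaps no longer look the same.

```python
# Fixed, deterministic substitution so that identical overlaps get identical
# fake SNPs. Bases outside ACGT (e.g. N) are left untouched.
_SUBSTITUTE = {"A": "C", "C": "G", "G": "T", "T": "A"}


def fake_snps(seq: str, start: int, end: int,
              every: int = 500, offset: int = 0) -> str:
    """Return a copy of seq with substitutions at start+offset,
    start+offset+every, ... up to (but excluding) end.
    Substituted bases are returned in upper case."""
    if not 0 <= start <= end <= len(seq):
        raise ValueError("overlap must lie within the sequence")
    if every <= 0 or offset < 0:
        raise ValueError("every must be positive and offset non-negative")
    bases = list(seq)
    for pos in range(start + offset, end, every):
        base = bases[pos].upper()
        if base in _SUBSTITUTE:
            bases[pos] = _SUBSTITUTE[base]
    return "".join(bases)


if __name__ == "__main__":
    piece = "ACGT" * 5000                     # a 20kb dummy piece
    marked = fake_snps(piece, 15000, 20000)   # treat the last 5kb as overlap
    print(sum(a != b for a, b in zip(piece, marked)))  # prints 10
```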

B.
