[mira_talk] Re: Reference assembly issues...

  • From: Shankar Manoharan <shankarmanostar@xxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Tue, 3 Apr 2012 23:08:24 +0530

Thanks again, Dr. Chevreux (if that makes you happy :D). I used the
fastqselect tool to pull out the reads on the debris list from the reference
assembly and ran a de novo assembly of them with no trace info. There were
around 40000 debris reads, which produced roughly 600 contigs :D So, I'll
take your word for it and do as you said... Thanks again :)

If nobody has sent you gold coins as requested, I may in some time :D

Shankar Manoharan
Graduate Student
Department of Genetics
Madurai Kamaraj University
Ph. +919790167534

I strongly believe in doing my best and leaving the rest to God



On Tue, Apr 3, 2012 at 10:47 PM, Bastien Chevreux <bach@xxxxxxxxxxxx> wrote:

> On Apr 3, 2012, at 16:16 , Shankar Manoharan wrote:
>
> > Thank you professor. :) Helped a LOT.
>
>
> Hmmm ... "Prof. Dr. Chevreux" sounds good, but as I have no professor
> title (not even "h.c."), I think you shouldn't call me that :-)
>
> > My next plan of work is to recover the 40k-odd reads which are in the
> > debris of the reference assembly, try to do a de novo assembly of these and
> > try to fit them into the de novo assembly.
>
>
> Good strategy, I use it quite often.
>
> There is one caveat: you will also get all the error-ridden reads in the
> data set from the debris, and if you put all the debris into a de novo
> assembly, those error-rich reads may catch the statistics module off guard.
> You may want to assemble the debris as "est" instead of "genome". I know it
> sounds a bit weird, but it is the only workaround I can give at the moment
> for this special kind of data.
>
> > I'd like your opinion on that, professor. Plus, how can I extract debris
> > reads from the SFF file based on the headers that MIRA provides in the info
> > directory? Do we have a script for that or should I write my own? I'm a
> > rather lousy scripter :(
>
>
> Then it would be a good opportunity to improve ;-)
>
> On the other hand: you do not need to. convert_project comes with an
> option ("-n") to supply a names file which tells it to extract only certain
> reads from a data set. I think this will come in handy in your case.
>
> And you may want to extract the reads from the last "readpool.maf" in the
> checkpoint directory. They are as clean as MIRA could get them, so if you
> tell convert_project to extract clipped data ("-c"), this would probably
> also help you a lot (remember to turn off all clipping in MIRA if you use
> that already-clipped set as input).
>
> B.
>
>
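
The names file mentioned above can be prepared with a few lines of Python.
This is only a minimal sketch under stated assumptions: it assumes the debris
list in MIRA's info directory is a plain-text file with one read name at the
start of each line (possibly followed by other columns), and the function name
write_names_file and the file names in the usage comment are hypothetical, not
part of MIRA.

```python
from pathlib import Path


def write_names_file(debris_list, names_file):
    """Write one unique read name per line; return how many names were written.

    Assumption: the read name is the first whitespace-separated field on each
    line of the debris list. Blank lines and lines starting with '#' are skipped.
    """
    seen = set()
    names = []
    for line in Path(debris_list).read_text().splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue
        name = fields[0]
        if name not in seen:  # keep first occurrence, preserve input order
            seen.add(name)
            names.append(name)
    Path(names_file).write_text("".join(name + "\n" for name in names))
    return len(names)


# Hypothetical usage; both file names are placeholders:
# count = write_names_file("myproject_info_debrislist.txt", "debris_names.txt")
```

The resulting file is the kind of names file that the "-n" option of
convert_project is described as taking in the quoted message; check the
convert_project help output of your MIRA version for the exact command line.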
