Thanks again Dr. Chevreux (if that makes you happy :D)

I used the fastqselect tool to pull out the debris list from the
reference assembly and ran a de novo of the genome with no trace info.
There were around 40000 debris reads which produced roughly 600
contigs :D So, I'll take your word for it and do as you said...
Thanks again :)

If nobody has sent you gold coins as requested, I may in some time :D

Shankar Manoharan
Graduate Student
Department of Genetics
Madurai Kamaraj University
Ph. +919790167534

I strongly believe in doing my best and leaving the rest to God

On Tue, Apr 3, 2012 at 10:47 PM, Bastien Chevreux <bach@xxxxxxxxxxxx> wrote:

> On Apr 3, 2012, at 16:16 , Shankar Manoharan wrote:
>
> > Thank you professor. :) Helped a LOT.
>
> Hmmm ... "Prof. Dr. Chevreux" sounds good, but as I have no professor
> title (not even "h.c."), I think you shouldn't call me that :-)
>
> > My next plan of work is to recover the 40k odd reads which are in
> > the debris of the reference assembly, try to do a de novo assembly
> > of these and try to fit them into the de novo assembly.
>
> Good strategy, I use it quite often.
>
> There is one caveat: you will also get all the error-ridden reads in
> the data set from the debris, and if you put all the debris into a
> de novo, it may be that those error-rich reads catch the statistics
> module off-guard. You may want to assemble the debris as "est" instead
> of "genome". I know it sounds a bit weird, but it is the only
> work-around I can give at the moment for this special kind of data.
>
> > I'd like your opinion on that professor. Plus, how can I extract
> > debris reads from the SFF file based on the headers that MIRA
> > provides in the info directory? Do we have a script for that or
> > should I write my own? I'm a rather lousy scripter :(
>
> Then it would be a good opportunity to improve ;-)
>
> On the other hand: you do not need to. convert_project comes with an
> option ("-n") to supply a names file which tells it to extract only
> certain reads from a data set. I think this will come in handy in
> your case.
>
> And you may want to extract the reads from the last "readpool.maf" in
> the checkpoint directory. They are as clean as MIRA could get them, so
> if you tell convert_project to extract clipped data ("-c"), this would
> probably also help you a lot (remember to turn off all clipping in
> MIRA if you use that already clipped set as input).
>
> B.
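
For anyone who would rather script the extraction than rely on a tool, here is a minimal sketch in Python of selecting reads by name from a FASTQ file. It is not part of MIRA and is not what convert_project or fastqselect do internally. It assumes the names file holds one read name per line (only the first whitespace-separated token is used) and that the FASTQ uses plain four-line records with no wrapped sequences. The function names load_names and filter_fastq are made up for this example.

```python
import io


def load_names(handle):
    """Collect read names: first whitespace-separated token of each non-empty line."""
    names = set()
    for line in handle:
        fields = line.split()
        if fields:
            names.add(fields[0])
    return names


def filter_fastq(fastq_handle, names, out_handle):
    """Copy four-line FASTQ records whose read name is in `names`.

    Assumes unwrapped records (header, sequence, '+', qualities).
    Returns the number of records written.
    """
    written = 0
    while True:
        header = fastq_handle.readline()
        if not header:
            break  # end of file
        if not header.strip():
            continue  # tolerate blank lines between records
        if not header.startswith("@"):
            raise ValueError("expected a FASTQ header, got: " + header.strip())
        seq = fastq_handle.readline()
        plus = fastq_handle.readline()
        qual = fastq_handle.readline()
        fields = header[1:].split()
        read_name = fields[0] if fields else ""
        if read_name in names:
            out_handle.write(header + seq + plus + qual)
            written += 1
    return written


if __name__ == "__main__":
    # Tiny in-memory demonstration with invented read names.
    demo_names = load_names(io.StringIO("readB\n"))
    demo_fastq = io.StringIO("@readA\nACGT\n+\nIIII\n@readB\nGGCC\n+\nIIII\n")
    result = io.StringIO()
    count = filter_fastq(demo_fastq, demo_names, result)
    print(count, "record(s) kept")
    print(result.getvalue(), end="")
```

With real files, the same two functions can be called on handles from open() for the names list, the input FASTQ, and the output file. Because the names are held in a set, the lookup per read stays cheap even for a list of around 40000 entries.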