On Apr 3, 2012, at 16:16, Shankar Manoharan wrote:
> Thank you professor. :) Helped a LOT.

Hmmm ... "Prof. Dr. Chevreux" sounds good, but as I have no professor title (not even "h.c."), I think you shouldn't call me that :-)

> My next plan of work is to recover the 40k odd reads which are in the debris
> of the reference assembly, try to do a de novo assembly of these and try to
> fit them into the de novo assembly.

Good strategy, I use it quite often. There is one caveat: the debris will also contain all the error-ridden reads in the data set, and if you put all the debris into a de novo assembly, those error-rich reads may catch the statistics module off-guard. You may want to assemble the debris as "est" instead of "genome". I know it sounds a bit weird, but it is the only work-around I can give at the moment for this special kind of data.

> I'd like your opinion on that professor. Plus, how can I extract debris reads
> from the Sff file based on the headers that MIRA provides in the info
> directory ? Do we have a script for that or should I write my own ? I'm a
> rather lousy scripter :(

Then it would be a good opportunity to improve ;-)

On the other hand: you do not need to. convert_project comes with an option ("-n") to supply a names file which tells it to extract only certain reads from a data set. I think this will come in handy in your case.

And you may want to extract the reads from the last "readpool.maf" in the checkpoint directory. They are as clean as MIRA could get them, so if you tell convert_project to extract clipped data ("-c"), this would probably also help you a lot (remember to turn off all clipping in MIRA if you use that already clipped set as input).

B.
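
For the names file itself, a minimal Python sketch could look like the following. It rests on assumptions that are not stated above: that the debris list in the info directory is a plain text file with one read per line, and that the read name is the first whitespace-separated token on each line. The function name and the file names are made up for illustration, so check them against what your MIRA version actually writes.

```python
from pathlib import Path


def write_names_file(debris_list: str, names_file: str) -> int:
    """Copy unique read names from a debris list into a names file.

    Assumption: one read per line, with the read name as the first
    whitespace-separated token. Blank lines and lines starting with
    '#' are skipped. Returns the number of names written.
    """
    seen = set()
    names = []
    for line in Path(debris_list).read_text().splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue
        name = fields[0]
        if name not in seen:  # keep the first occurrence, preserve order
            seen.add(name)
            names.append(name)
    # One read name per line in the output file.
    Path(names_file).write_text("".join(n + "\n" for n in names))
    return len(names)
```

The resulting file is the kind of names file meant for the "-n" option of convert_project mentioned above. The rest of the convert_project command line depends on your input and output formats, so it is not shown here.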