On Dec 12, 2011, at 16:09, Christoph Hahn wrote:
> After your step 2 (convert_project -f maf -t fasta -A "SOLEXA_SETTINGS
> -CO:fnicpst=yes" derjavinoidesmtlane8_out.maf iteration1) I get a whole bunch
> of files:
>
> iteration1_AllStrains.padded.fasta
> [...]
> iteration1_default.padded.fasta
> [...]
> iteration1_derjavinoidesmtlane8.padded.fasta
> [...]
> iteration1_derjavinoides_mt.padded.fasta
> [...]
> I was continuing with the iteration1_derjavinoidesmtlane8.padded.fasta,
> although I am not sure if it would maybe be safer to continue with the
> unpadded file.

Take the unpadded version. From the manual:

  fasta contains the contig consensus sequences (and .fasta.qual the
  consensus qualities). Please note that they come in two flavours: padded
  and unpadded. The padded versions may contain stars (*) denoting gap base
  positions where there was some minor evidence for additional bases, but
  not strong enough to be considered as a real base. Unpadded versions have
  these gaps removed.

> What confuses me are all the other files. What are they? They are obviously
> all some variants of the reference..

If you map reads from different strains to a reference, what should MIRA give
you as consensus? See? Not that easy. So MIRA takes the broad approach: one
consensus per strain, plus one consensus for reads without strain info
("default") and one consensus for all strains together ("AllStrains").

> -) the file with the trimmed reads that I obtained from the first mapping
> attempt with Mira (mynewl8data.fastq) as well as the file I get from mirabait
> (mymtreads_iteration1.fastq) both start with the reference and are then
> followed by several sequences (header e.g. @rr_####50####) before the actual
> reads. Apparently these @rr_#### sequences are all part of the reference..
> what exactly is it?

Uhhhh ... you seeing these ###-reads tells me something went wrong. Somewhere.
Where exactly did you get the original file from?
Additional question: did the assembly run over several passes? If yes, why?

To answer your question: the rr_### reads are "rails", helper reads used by
MIRA during the assembly. They are not present in the final results, only in
intermediate files.

> -) Also I tried to use mirabait to identify reads that map onto the
> sequence of the host organism, but unfortunately it seems as if the reference
> sequences are too long. Is there a way of dealing with this, apart from
> cutting the reference in smaller bits? This is the error message I get:
> "Read gi|354459049|gb|AGKD01000001.1| is 194200 bp long and thus longer than
> MAXREADSIZEALLOWED (29900) bases. Skim cannot handle than, sorry."

Oooooooops! This is something which should not happen. Definitely a bug. I'll
have a look at it, as I am currently working on this part of MIRA.

In the meantime: sorry, you need to fragment :-/

Bastien
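
For the fragmenting workaround mentioned above, here is a minimal sketch in Python, assuming the reference is a plain FASTA file. Only the 29900 limit comes from the error message; the fragment size (20000), the overlap (100), the `_frag<start>` naming and all function names are illustrative choices, not anything MIRA or mirabait prescribes. The overlap is there so that a match spanning a cut point is not lost.

```python
import sys

MAX_LEN = 29900  # MAXREADSIZEALLOWED as quoted in the error message


def read_fasta(handle):
    """Yield (name, sequence) pairs from an open FASTA file."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the name.
            name, chunks = line[1:].split()[0], []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def fragment(seq, size=20000, overlap=100):
    """Cut seq into (start, piece) tuples; pieces are at most `size` long
    and neighbouring pieces share `overlap` bases (both values are assumptions)."""
    if not 0 <= overlap < size < MAX_LEN:
        raise ValueError("need 0 <= overlap < size < MAX_LEN")
    if not seq:
        return []
    step = size - overlap
    pieces = []
    start = 0
    while True:
        pieces.append((start, seq[start:start + size]))
        if start + size >= len(seq):
            break
        start += step
    return pieces


def main(in_path, out_path):
    """Write every sequence of in_path to out_path as overlapping fragments."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for name, seq in read_fasta(src):
            for start, piece in fragment(seq):
                # 1-based start position in the (hypothetical) fragment name.
                dst.write(f">{name}_frag{start + 1}\n")
                for i in range(0, len(piece), 60):
                    dst.write(piece[i:i + 60] + "\n")


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

Saved under a name of your choice, it would be called with the input FASTA and the output FASTA as its two arguments; the fragmented file can then be tried as bait in place of the original reference.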