Hi all,

I'm trying out our first Solexa data in MIRA with a preliminary reference (mapping) assembly: 2.7 million Illumina reads (non-paired, 38 bp) against a 2 Mbp genome. The call to MIRA (V3rc4) is:

  mira --project=BAnh1IrMREF02 --job=mapping,genome,normal,solexa -OUT:ora=yes -GE:not=1 -AS:urd=yes -SB:lb=yes:bft=fasta:bbq=20 SOLEXA_SETTINGS -LR:ft=fastq

In my output .caf and .ace files I find only very few of the reads from my input files (with names like HWI-EAS210R_0001:6:1:3:224#GATCAG/1). Instead, I find ~400000 reads with names like

  _cer_sxa_0_
  _cer_sxa_1_
  ..

These reads are generally much(!) longer than my input reads.

Does anyone know what these reads are? My guess is that they are fake reads created to reduce the number of reads while preserving coverage, but I am not sure. If that is the case:

- Does the coverage truly represent all mismatches (i.e. are "allele frequencies" truly preserved)?
- If I want to find all reads mapped to a certain site, is that information preserved somewhere?
- Is there a way to turn this feature off?

Grateful for any help,
Björn

====================================
Björn Nystedt, PhD
Molecular Evolution
EBC, Uppsala University
Norbyv. 18C, 752 36 Uppsala
Sweden
phone: +46 (0)18-471 45 88
email: Bjorn.Nystedt@xxxxxxxxx
====================================