On Monday 23 November 2009 Jeremiah Davie wrote:
> Hi All,
> I am learning to use MIRA and wanted to incorporate Sanger data
> along with a 454 run. The problem is that the sanger data is saved as
> *.ab1 files, as the sanger sequencing facility at our university uses
> an older ABI sequencer.

Hello Jeremiah,

first question is: does your provider just hand out the ab1 files, or do they also have a service where they preprocess the data? If they do, ask them to do that. There's a ton of stuff one should be aware of, and it's by no means trivial (quality clipping, sequence vector trimming, etc.). Most providers should still have pretty good pipelines from the heyday of Sanger sequencing. If they do, they can give you the data in almost any format and MIRA should be able to use it: FASTA + XML, EXP, or even masked FASTA if there's no other possibility.

> I can use Sequencher to convert those files to
> fasta/fastq files, but cannot generate the traceinfo.xml files that
> MIRA expects. Is there a way to avoid using the traceinfo.xml files?

Yes, by using EXP files or masked FASTA files.

> Conversely, does anyone know of a program that will convert an .ab1
> file to fasta/fastq/traceinfo.xml collection?

Nothing public that I know of (I know of at least two companies that have an internal pipeline for that). But there's still GAP4 and the pregap4 pipeline, which come with a pretty robust ab1 -> EXP conversion pipeline. MIRA can then read the Sanger reads in EXP format and the 454 reads in FASTA + XML. Have a look at it: http://staden.sourceforge.net/

> If not, can someone
> guide me to an easy to follow guide for writing a traceinfo.xml file?

The canonical source would be the NCBI (they standardized the format):
http://www.ncbi.nlm.nih.gov/Traces/trace.cgi?cmd=show&f=rfc_b&m=doc&s=rfc_b
Have a look at the example a bit further down the page; it's not really difficult.
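If you would rather script the traceinfo file than write it by hand, here is a minimal Python sketch using only the standard library. It is not part of MIRA or the Staden package. The names ReadInfo, build_traceinfo and write_traceinfo are hypothetical, and the <trace_volume> root element is an assumption taken from the NCBI RFC examples, so check it against that page before relying on it. The sketch writes every field into each <trace> entry rather than relying on <common_fields>. It also skips the optional template fields when they are not set.

```python
"""Hypothetical helper: write a traceinfo XML file with all fields per read.

Assumption: NCBI TRACEINFO layout with a <trace_volume> root and one <trace>
element per read. No <common_fields> block is written.
"""
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class ReadInfo:
    # Attribute names mirror the traceinfo tags used in this mail.
    trace_name: str
    trace_file: str
    clip_vector_left: int
    clip_vector_right: int
    clip_quality_left: int
    clip_quality_right: int
    # Optional: leave these as None if you don't work with templates.
    template_id: Optional[str] = None
    insert_size: Optional[int] = None
    insert_stdev: Optional[int] = None


# Order in which the tags are written inside every <trace> entry.
FIELDS = (
    "trace_name", "trace_file",
    "clip_vector_left", "clip_vector_right",
    "clip_quality_left", "clip_quality_right",
    "template_id", "insert_size", "insert_stdev",
)


def build_traceinfo(reads: Iterable[ReadInfo]) -> ET.ElementTree:
    """Build the XML tree, repeating every field in each read's <trace>."""
    root = ET.Element("trace_volume")
    for read in reads:
        trace = ET.SubElement(root, "trace")
        for field in FIELDS:
            value = getattr(read, field)
            if value is None:  # omit unset optional (template) fields
                continue
            ET.SubElement(trace, field).text = str(value)
    return ET.ElementTree(root)


def write_traceinfo(reads: Iterable[ReadInfo], path: str) -> None:
    """Write a pretty-printed traceinfo file (ET.indent needs Python 3.9+)."""
    tree = build_traceinfo(reads)
    ET.indent(tree)
    tree.write(path, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    # Example values taken from the minimal entry given in this mail.
    demo = [ReadInfo("HBBAA1U0001", "HBBAA1U0001.scf", 56, 737, 80, 700,
                     template_id="HBBAA1U0001", insert_size=1500,
                     insert_stdev=450)]
    tree = build_traceinfo(demo)
    ET.indent(tree)
    print(ET.tostring(tree.getroot(), encoding="unicode"))
```

The clip positions have to come from whatever preprocessing you ran. The sketch does no quality clipping or vector trimming itself.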
But please note that MIRA currently does not parse "<common_fields>" (it's a somewhat recent addition), so all these fields need to be placed in the file for each read. Here's a minimal entry per read that I would generate:

  <trace>
    <trace_name>HBBAA1U0001</trace_name>
    <trace_file>HBBAA1U0001.scf</trace_file>
    <clip_vector_left>56</clip_vector_left>
    <clip_vector_right>737</clip_vector_right>
    <clip_quality_left>80</clip_quality_left>
    <clip_quality_right>700</clip_quality_right>
    <template_id>HBBAA1U0001</template_id>
    <insert_size>1500</insert_size>
    <insert_stdev>450</insert_stdev>
  </trace>

Leave out "template_id" and the "insert_*" fields if you don't work with templates.

> Any help would be greatly appreciated; I'm pulling my hair out on
> this.
>
> Sincerely,
> Jeremiah

Don't! It's not worth it. Besides, they'll disappear in the coming years faster than you'd wish for :-)

Regards,
  Bastien

--
You have received this mail because you are subscribed to the mira_talk mailing list. For information on how to subscribe or unsubscribe, please visit http://www.chevreux.org/mira_mailinglists.html