[mira_talk] howto format paired-end reads for mira

This question came to me via mail, and Filip allowed me to post both the 
question and my answer to the MIRA talk list.

On Wednesday 24 June 2009 Filip Van Nieuwerburgh wrote:
> First of all, I want to thank you for making MIRA available for the
> community. It is the best free assembler I used until now (but very slow).

Hi Filip,

yeah, MIRA is not the fastest. But it's the one which creates the least 
trouble afterwards for finishing (at least for me). I don't mind the computer 
needing a few hours more if I can save time :-)

> I am very excited about your last update, which includes more support on
> true hybrid assembly.

454 & Sanger hybrid assembly has been working for ... quite some time. With 
Solexa, MIRA could do that since the early *40 versions, albeit not really 
well and not documented at all.

De-novo Solexa (or a denovo hybrid with Solexa) is still slow in the end-game 
... this is going to be improved in one of the next updates.

> I have a question on using Illumina mate paired reads in a denovo hybrid
> assembly (./bin/mira -project=bchoc -job=denovo,genome,accurate,454,solexa
> COMMON_SETTINGS -highlyrepetitive SOLEXA_SETTINGS
> -GE:tismin=2000:tismax=3000 >&log_assemblybchocHybridDate090624.txt).
>
> How should the reads be formatted? To illustrate my question, I copy/paste
> the requirements stated in the Velvet manual:
>
> 1.       For paired-end reads, the assumption is that each read is next to
> its mate read. In other words, if the reads are indexed from 0, then reads
> 0 and 1 are paired, 2 and 3, 4 and 5, etc. If for some reason you have
> forward and reverse reads in two different FASTA files but in corresponding
> order, the bundled Perl script shuffleSequences.pl will merge the two files
> into one as appropriate. To use it, type:
> ./shuffleSequences.pl forward_reads.fa reverse_reads.fa output.fa

This is not needed by MIRA; the reads of a pair do not have to sit next to 
each other in the file. The only requirement: the name of one read of the pair 
must end in "/1" and the name of its mate in "/2" (or in a non-Solexa-standard 
".f" and ".r", but then you need to adapt the -LR:rns switch).

As the "/1" "/2" naming is the current standard from the Illumina pipeline, 
you probably don't need to do anything.
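
If you want to double-check the naming before starting a long assembly, a 
small script along these lines could help. This is only a rough sketch in 
Python and not part of MIRA: the function names are made up for illustration, 
and it assumes you have already pulled the read names out of your input file 
into a list.

```python
from collections import defaultdict


def split_pair_name(read_name):
    """Split a read name ending in /1 or /2 into (template, mate).

    Names without such a suffix are returned as (name, None).
    """
    if read_name.endswith("/1") or read_name.endswith("/2"):
        return read_name[:-2], read_name[-1]
    return read_name, None


def find_unpaired(read_names):
    """Return the sorted template names lacking exactly one /1 and one /2 read.

    The order of the reads does not matter; only the names are compared.
    """
    mates = defaultdict(list)
    for name in read_names:
        template, mate = split_pair_name(name)
        mates[template].append(mate)
    return sorted(
        template
        for template, found in mates.items()
        if len(found) != 2 or set(found) != {"1", "2"}
    )


if __name__ == "__main__":
    # Hypothetical read names, deliberately not interleaved.
    names = ["readA/1", "readB/1", "readA/2", "readC/2"]
    print(find_unpaired(names))
```

With the example names above, readA is complete while readB and readC each 
miss their partner, so those two are reported.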

> 2.       Concerning read orientation, Velvet expects paired-end reads to
> come from opposite strands facing each other, as in the traditional Sanger
> format.

Yep, this is perfect and MIRA expects that, too, for all reads (Sanger, 454 
and Solexa). As 454 paired-end reads initially have a "forward/forward" 
orientation, one read of each pair must be turned around. But the sff_extract 
script from Jose does that automatically for you, and the output it produces 
is directly usable by MIRA.
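
Just to illustrate what "turned around" means here: the read is replaced by 
its reverse complement. The snippet below is a minimal Python sketch of that 
operation only; it is not how sff_extract is implemented, and you should not 
need to run anything like it yourself when using sff_extract.

```python
# Translation table for the plain bases plus N, in upper and lower case.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(reverse_complement("AACGT"))  # prints ACGTT
```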

> What are the requirements for MIRA? Maybe both reads have to be on the same
> strand and facing the same direction in MIRA?

Nope :-)

> Thank you very much for answering my question. I realize that you must
> receive many questions from all over the world.

This is why I prefer questions on the mailing list ... people tend to find 
answers there pretty quickly with Google once a similar question was asked.

Would you mind if I forward this answer mail to the list?

Regards,
  Bastien

--
You have received this mail because you are subscribed to the mira_talk mailing 
list. For information on how to subscribe or unsubscribe, please visit 
http://www.chevreux.org/mira_mailinglists.html
