[mira_talk] Re: force MIRA to produce only singletons and no debris

  • From: Michele Vidotto <michele.vidotto@xxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Wed, 16 Feb 2011 17:42:21 +0100

Let me explain the reason for the request (having all the
non-aligned sequences in the output as singletons, not as debris).

In a recent transcriptomics article in which the assembly was
performed with MIRA 3
(http://www.biomedcentral.com/1471-2164/11/635), the authors say:

"Due to the heuristic nature of the assembly process and
previous reports of redundancy (different contigs
belonging to the same region of the transcript sequences) in sets of
transcriptome contigs assembled with different
methods [17], a second run of assembly was conducted
using the previously obtained contigs and singlets as
input."

Inspired by this statement, I wrote a script that iterates successive
MIRA assemblies: at each cycle, the input sequences are the
contigs + singlets produced in the previous cycle.
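
A minimal sketch of such a loop, to make the idea concrete. It is not
the actual script: `assemble` is a hypothetical wrapper (an assumption,
not part of MIRA) that runs one assembly on the given FASTA file and
returns the path of the resulting contigs + singlets FASTA. The
`max_cycles` limit and the stopping rule (stop when the number of
sequences no longer decreases) are also assumptions made for
illustration.

```python
from pathlib import Path
from typing import Callable, List


def count_fasta_records(path: Path) -> int:
    """Count the sequences in a FASTA file by counting header lines."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def iterate_assembly(first_input: Path,
                     assemble: Callable[[Path, int], Path],
                     max_cycles: int = 5) -> List[int]:
    """Feed each cycle's contigs + singlets into the next assembly.

    `assemble(input_fasta, cycle)` is a hypothetical user-supplied
    function that runs one assembly and returns the output FASTA path.
    Returns the sequence count of the input and of each cycle's output.
    """
    counts = [count_fasta_records(first_input)]
    current = first_input
    for cycle in range(1, max_cycles + 1):
        current = assemble(current, cycle)
        counts.append(count_fasta_records(current))
        # Stop once a cycle no longer merges anything.
        if counts[-1] >= counts[-2]:
            break
    return counts
```

Comparing the input count with each cycle's output count also shows
how many sequences went missing as debris in that cycle, provided no
merging happened at the same time.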

In the first assembly it is fine to have the distinction between debris
and contigs + singlets. In the following cycles, however, I expect that
assembling the contigs + singlets obtained from the first assembly
yields only contigs + singletons and no more debris.
This is because in the second round I assemble sequences (contigs, some
made of a single read) that were produced by the program and are
composed, in turn, of reads that MIRA did not discard in the previous
round, and so are of good overall quality.

For this reason I would like to find a combination of MIRA settings that
gives me all sequences in the output as contigs or singlets, and none
as debris.

I take this opportunity to also ask whether such an approach can really
improve the assembly of a transcriptome or whether, in your opinion,
it does nothing but introduce errors.

Thank you very much!








2011/2/16 Bastien Chevreux <bach@xxxxxxxxxxxx>:
> On Wednesday 16 February 2011 15:32:47 Michele Vidotto wrote:
>> However now MIRA continues to give both singlets and debris. I was
>> wondering if I can get all the sequences that are not aligned, in
>> output, in the form of singletons (thus abolishing the distinction
>> between debris and singletons).
>
> Hmmm, I was going to reply that there is no distinction between debris and
> singlets as they're all unaligned reads. But rethinking it, there are
> differences.
>
> In MIRA: reads which get thrown out during quality checks or at different
> stages of clipping will never ever appear as singlets, most of the time
> there's normally just too much junk in this population. Sorry, this behaviour
> of MIRA will not be changed.
>
> Everything else which passes the stage can be put into singlets via
> -OUT:sssip:stsip if -AS:mrpc does not interfere.
>
>> In particular I would like all the non-aligned sequences to be
>> found in the file "*_out.unpadded.fasta" as singletons, and to be
>> listed in the file "*_info_contigreadlist.txt", without any being
>> listed in "*_info_debrislist.txt".
>
> All singlets will appear both in the FASTA as well as in the contigreadlist
> file. Debris will only appear in the debris file.
>
>> Is it possible with a particular combination of commands?
>
> As I wrote above: reads not passing the initial clipping stages will always
> end up in debris.
>
> Is there a particular application you're after that you need the singlets so
> badly in the result files?
>
> B.
>
> --
> You have received this mail because you are subscribed to the mira_talk 
> mailing list. For information on how to subscribe or unsubscribe, please 
> visit http://www.chevreux.org/mira_mailinglists.html
>



-- 
Michele Vidotto
(Ph.D. Student)
Department of Biology
Universita` degli Studi di Padova
Via Ugo Bassi 58/B,
35131, Padova, Italy
Phone: +39 049 827 6204
Fax: +39 049 827 6209
mailto: michele.vidotto@xxxxxxxxxxxxxxxxx
