[mira_talk] Re: Getting coverage by each technology per each contig

  • From: Laurent MANCHON <lmanchon@xxxxxxxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Wed, 21 May 2014 21:13:56 +0200

On 21/05/2014 21:07, Martin MOKREJŠ wrote:
Hi Bastien,
   thank you for encouraging me to dive into this myself. ;) It was quite simple.
Attached is what I stitched together in half an hour. It works for my purpose,
although some bits should be moved to separate functions. Anyway, if you want,
you can include it in the mira bundle, unless you are going to write the "whole"
thing yourself (which would be a good idea).
   The numpy dependency is not ideal; pulling it in just for median() and
average() is not pretty. But overall, it works.
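For what it's worth, Python's standard `statistics` module (Python 3.4+) provides `median()` and `mean()`. Those could stand in for the numpy calls. This is a minimal sketch, not the attached script. The input layout `{technology: [per-position depths]}` and the name `coverage_summary` are hypothetical.

```python
import statistics


def coverage_summary(depths_by_tech):
    """Map each technology to (median, mean) of its per-position coverage.

    Hypothetical input layout: {technology_name: [depth, depth, ...]}.
    Technologies with an empty depth list are skipped, because
    statistics.median() raises StatisticsError on empty data.
    """
    return {
        tech: (statistics.median(depths), statistics.mean(depths))
        for tech, depths in depths_by_tech.items()
        if depths
    }
```

Only the standard library is needed, so the script would no longer pull in numpy just for two summary statistics.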

   Had *_contigstats.txt contained one more column with the abbreviated
sequencing technology, things would have been even easier.

Martin

Bastien Chevreux wrote:
On 20 May 2014, at 12:27 , Martin MOKREJŠ <mmokrejs@xxxxxxxxx> wrote:
  has anybody tried to write a script to calculate how many reads were used for
each resulting contig? Or, even better, to get the coverage by each technology
for a given contig? I think it would be helpful to have these in the resulting
FASTA files. I think this was already mentioned on this list, but I just cannot
find it.
I do not remember seeing this on the list. I hope that this isn’t a sign of my 
memory starting to fail me … :-)

  What I am really after is splitting the resulting contigs into those specific
to one technology or another. It looks like mira_convert cannot do this right
away, but I hope to collect the contig names first and then ask for just those.
So two executions could do the job, I think.
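The second pass of that two-step idea can be sketched in plain Python once the contig names are collected. It does not rely on any particular mira_convert option. This is a minimal sketch that assumes the record name is the first whitespace-delimited token after `>`. `filter_fasta` is a hypothetical helper.

```python
def filter_fasta(lines, wanted):
    """Yield the lines of FASTA records whose name is in `wanted`.

    Assumption: the record name is the first whitespace-delimited token
    after '>'. Header and sequence lines are yielded unchanged.
    """
    keep = False
    for line in lines:
        if line.startswith(">"):
            parts = line[1:].split()
            # An empty header ('>' alone) never matches.
            keep = bool(parts) and parts[0] in wanted
        if keep:
            yield line
```

To get the selected subset, feed it an open FASTA file handle as `lines` and write the yielded lines to another file.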
There is indeed nothing you could use out of the box from the MIRA package.

The easiest solution I can come up with at the moment is to parse the
contigreads file in the info directory and categorise reads using a hopefully
common name identifier per technology. That would enable you to create lists of
contigs which are formed by one technology only (or predominantly by one
technology, if you want).
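A minimal sketch of that approach follows. It assumes each non-comment line of the contigreads file is `contig<TAB>read`, and that technologies can be told apart by a read-name prefix. Both the layout and the prefixes in `TECH_PREFIXES` are assumptions to adapt, not documented MIRA behaviour.

```python
import collections

# HYPOTHETICAL read-name prefixes; replace with your own naming scheme.
TECH_PREFIXES = {"ILL_": "illumina", "454_": "454", "PB_": "pacbio"}


def tech_of(read_name, prefixes=TECH_PREFIXES):
    """Return the technology whose prefix matches read_name, else 'unknown'."""
    for prefix, tech in prefixes.items():
        if read_name.startswith(prefix):
            return tech
    return "unknown"


def reads_per_tech(lines, prefixes=TECH_PREFIXES):
    """Count reads per technology for each contig.

    ASSUMED line layout: 'contig<TAB>read'; blank lines and lines
    starting with '#' are skipped.
    """
    counts = collections.defaultdict(collections.Counter)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        contig, read = line.split("\t")[:2]
        counts[contig][tech_of(read, prefixes)] += 1
    return counts


def single_tech_contigs(counts, min_fraction=1.0):
    """Map contig -> dominant technology if its read share >= min_fraction.

    min_fraction=1.0 keeps contigs formed by one technology only;
    lower values keep "predominantly one technology" contigs.
    """
    result = {}
    for contig, counter in counts.items():
        tech, n = counter.most_common(1)[0]
        if n / sum(counter.values()) >= min_fraction:
            result[contig] = tech
    return result
```

Counting reads rather than coverage keeps the sketch simple. Per-position coverage would need the alignment itself, for example from the MAF file.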

If that does not work for you, then you would need to parse the MAF file. The
documentation for it in the MIRA docs covers only MAF v1, but MAF v2 has not
changed much; it just added the concept of readgroups and a couple of other,
quite minor changes. Just ask if you need help with that.

B.



Very useful script,
thank you Martin.


--

+----------------------------------+
                             .-.
 Laurent MANCHON             /v\
 Email: lmanchon@xxxxxx     // \\
                           /(   )\
                            ^^_^^
+----------------------------------+

