[mira_talk] Re: Several illumina insert sizes

On Jul 12, 2011, at 15:41, Nestor Zaburannyi wrote:
> I have come to a point in which I must use several Illumina libraries with 
> various insert sizes for assembly. However, even after generating *.xml for 
> only one of them, I got a 1300 MB file. MIRA doesn't like this volume of XML, I 
> guess, because it takes all the RAM and all the swap space and doesn't 
> progress any further in the foreseeable future.

Hmmm ... I guess that expat starts to become a limitation, as does the 
implemented strategy of first loading everything into RAM before parsing.

But it should not be that big of a problem with a file size of "only" 1.3 GiB, 
so I'm a bit at a loss here.
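
To illustrate the difference between "load everything, then parse" and 
incremental parsing, here is a minimal Python sketch. It is not MIRA code and 
says nothing about how MIRA will handle this. The function name iter_traces 
and the sample data are made up for the example. Only the tag names come from 
the XML quoted in this thread.

```python
import io
import xml.etree.ElementTree as ET


def iter_traces(source):
    """Yield one dict per <TRACE> element without keeping the whole tree in RAM.

    `source` is a file name or a binary file-like object (hypothetical helper,
    not part of MIRA).
    """
    context = iter(ET.iterparse(source, events=("start", "end")))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag.upper() == "TRACE":
            # Collect the child tags of this record, e.g. INSERT_SIZE -> "300".
            yield {child.tag.upper(): (child.text or "").strip() for child in elem}
            # Drop records that were already handled so memory use stays flat.
            root.clear()


# Tiny made-up input in the same layout as the XML quoted in this thread.
SAMPLE = b"""<?xml version="1.0"?>
<TRACE_VOLUME>
  <TRACE>
    <TEMPLATE_ID>1:4:1:1426:1923#ATGTAA</TEMPLATE_ID>
    <INSERT_SIZE>300</INSERT_SIZE>
    <INSERT_STDEV>30</INSERT_STDEV>
    <TRACE_END>FORWARD</TRACE_END>
  </TRACE>
</TRACE_VOLUME>
"""

if __name__ == "__main__":
    for record in iter_traces(io.BytesIO(SAMPLE)):
        print(record["TEMPLATE_ID"], record["INSERT_SIZE"], record["INSERT_STDEV"])
```

With this pattern, memory use depends on the size of one record rather than 
on the size of the whole file.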

> My only idea is to assemble one library and then use the contigs as false 
> reads with quality information. Or to assemble without any paired information 
> at all.

I've got a better one: you wait until the end of this week, or the end of next 
week at the latest, and I'll write a quick hack to get that template insert 
size info loaded quickly. It will be dirty and not pretty, but the format I 
envision should be simple enough to get you going.

And I need to start thinking about changing the whole data load machinery of 
MIRA to accommodate this kind of data a lot more easily and efficiently.

> Has anybody encountered such problems? Also, is my *.xml correct?
> 
> <?xml version="1.0"?>
> <TRACE_VOLUME>
>        <TRACE>
>                <TI>1:4:1:1426:1923#ATGTAA/1</TI>
>                <TEMPLATE_ID>1:4:1:1426:1923#ATGTAA</TEMPLATE_ID>
>                <INSERT_SIZE>300</INSERT_SIZE>
>                <INSERT_STDEV>30</INSERT_STDEV>
>                <TRACE_END>FORWARD</TRACE_END>
>        </TRACE>

Ummm ... I do not think that using <TI> is what you want, except if your reads 
are named, e.g., "gnl|ti|1:4:1:1426:1923#ATGTAA/1" which I doubt they are.

Is there a reason you do not use <trace_name>?

Which makes me wonder whether this might actually be the problem in the first 
place, with the XML loading seemingly going into an infinite loop. I'd need 
to check that.
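
For reference, a small Python sketch of how such a file could be generated 
with <trace_name> instead of <TI>, writing one record at a time. Several 
things here are assumptions and not confirmed MIRA behaviour: the helper name 
write_traceinfo, the lowercase tag spelling (taken from <trace_name> above, 
while the quoted XML uses uppercase), and the mapping of a "/2" read to 
REVERSE (the quoted XML only shows "/1" as FORWARD).

```python
import io
from xml.sax.saxutils import escape


def write_traceinfo(read_names, out, insert_size, insert_stdev):
    """Write one <trace> record per read name to `out` (hypothetical helper).

    Read names are expected to end in /1 or /2, as in the quoted example.
    """
    out.write('<?xml version="1.0"?>\n<trace_volume>\n')
    for name in read_names:
        # "1:4:1:1426:1923#ATGTAA/1" -> template "1:4:1:1426:1923#ATGTAA", mate "1"
        template, sep, mate = name.rpartition("/")
        if not sep or mate not in ("1", "2"):
            raise ValueError(f"read name without /1 or /2 suffix: {name!r}")
        # Assumption: /1 is the forward read, /2 the reverse read.
        trace_end = "FORWARD" if mate == "1" else "REVERSE"
        out.write(
            "  <trace>\n"
            f"    <trace_name>{escape(name)}</trace_name>\n"
            f"    <template_id>{escape(template)}</template_id>\n"
            f"    <insert_size>{insert_size}</insert_size>\n"
            f"    <insert_stdev>{insert_stdev}</insert_stdev>\n"
            f"    <trace_end>{trace_end}</trace_end>\n"
            "  </trace>\n"
        )
    out.write("</trace_volume>\n")


if __name__ == "__main__":
    buf = io.StringIO()
    reads = ["1:4:1:1426:1923#ATGTAA/1", "1:4:1:1426:1923#ATGTAA/2"]
    write_traceinfo(reads, buf, insert_size=300, insert_stdev=30)
    print(buf.getvalue())
```

The point is only that the full read name goes into <trace_name> and the name 
without the mate suffix goes into <template_id>, so both mates share one 
template.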

B.

