On Jul 12, 2011, at 15:41 , Nestor Zaburannyi wrote:
> I have come to a point in which i must use several illumina libraries with 
> various insert sizes for assembly. However, even after generating *.xml for 
> only one of them, i got 1300MB file. Mira doesn't like this volume of xml i 
> guess, because it takes all the RAM and all the SWAP space and doesn't 
> progress any further in observable future.

Hmmm ... I guess expat is starting to become a limitation, as is the current 
strategy of loading everything into RAM before parsing.

But it should not be that big of a problem with a file size of "only" 1.3 GiB, 
so I'm a bit at a loss here.
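
To illustrate the difference between "load everything, then parse" and a 
streaming parse, here is a minimal sketch in Python. It is not MIRA's loader, 
just the general idea: records are handed out one at a time and the parsed 
elements are discarded as soon as they are used. The function name iter_traces 
and the lowercasing of tag names are assumptions made for this example.

```python
import io
import xml.etree.ElementTree as ET


def iter_traces(source):
    """Yield one dict per <trace> record without keeping the whole tree in RAM.

    `source` is a file name or a binary file object. Tag names are lowercased
    so that <TRACE> and <trace> are treated alike (an assumption of this sketch).
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag.lower() == "trace":
            yield {child.tag.lower(): (child.text or "").strip() for child in elem}
            root.clear()  # drop records already handed out to keep memory flat


# Small in-memory example; a real file would be passed by name instead.
sample = b"""<?xml version="1.0"?>
<trace_volume>
  <trace><trace_name>read1/1</trace_name><insert_size>300</insert_size></trace>
  <trace><trace_name>read1/2</trace_name><insert_size>300</insert_size></trace>
</trace_volume>"""

traces = list(iter_traces(io.BytesIO(sample)))
```

With this approach memory use depends on the size of a single record rather 
than on the size of the file.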

> My only idea is to assemble one library and then use the contigs as false 
> reads with quality information. Or to assemble without any paired information 
> at all.

I've got a better one: you wait until the end of this week, or the end of next 
at the latest, and I'll write a quick hack to get that template insert size 
info loaded quickly. It will be dirty and not pretty, but the format I envision 
should be simple enough to get you going.

And I need to start thinking about changing the whole data load machinery of 
MIRA to accommodate this kind of data a lot more easily and efficiently.

> Has anybody encountered such problems? Also, is my *.xml correct?
> <?xml version="1.0"?>
>        <TRACE>
>                <TI>1:4:1:1426:1923#ATGTAA/1</TI>
>                <TEMPLATE_ID>1:4:1:1426:1923#ATGTAA</TEMPLATE_ID>
>                <INSERT_SIZE>300</INSERT_SIZE>
>                <INSERT_STDEV>30</INSERT_STDEV>
>                <TRACE_END>FORWARD</TRACE_END>
>        </TRACE>

Ummm ... I do not think that using <TI> is what you want, except if your reads 
are named, e.g., "gnl|ti|1:4:1:1426:1923#ATGTAA/1" which I doubt they are.

Is there a reason you do not use <trace_name>?

Which makes me wonder whether this might actually be the problem in the first 
place, with the XML loading seemingly going into an infinite loop. I'd need to 
check that.
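
For reference, a record keyed by <trace_name> instead of <TI> could be 
generated along these lines. This is only a sketch: the helper trace_record is 
hypothetical, the lowercase tag names follow the NCBI TraceInfo convention, and 
it assumes read names end in "/1" or "/2" so that the template name is 
everything before the last slash. Whether MIRA cares about tag case or needs a 
wrapping element is not something this example settles.

```python
from xml.sax.saxutils import escape


def trace_record(read_name, insert_size, insert_stdev, trace_end):
    """Return one <trace> record as text, keyed by trace_name rather than TI.

    Assumes paired reads are named "<template>/1" and "<template>/2".
    """
    template = read_name.rsplit("/", 1)[0]  # strip the "/1" or "/2" suffix
    fields = [
        ("trace_name", read_name),
        ("template_id", template),
        ("insert_size", str(insert_size)),
        ("insert_stdev", str(insert_stdev)),
        ("trace_end", trace_end),
    ]
    lines = ["<trace>"]
    lines += [f"  <{tag}>{escape(value)}</{tag}>" for tag, value in fields]
    lines.append("</trace>")
    return "\n".join(lines)


record = trace_record("1:4:1:1426:1923#ATGTAA/1", 300, 30, "FORWARD")
print(record)
```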


