[mira_talk] Re: Genome assembly with Illumina MPs only

  • From: "Rohit Kolora" <rohit@xxxxxxxxxxxxxxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Mon, 9 Nov 2015 13:48:55 +0100

Hi Bernhard,

I don't think sub-sampling is needed unless you have >80X coverage.
Also, you need to know how much of the genome would still be represented
after sub-sampling.
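The 80X threshold refers to average depth, which is the total number of sequenced bases divided by the genome size. Below is a minimal sketch of that arithmetic, assuming you have an estimate of the genome size (the thread does not give one). The helper names and all the numbers in it are hypothetical. It also shows what fraction of reads to keep to reach a target depth. Random sub-sampling lowers depth everywhere, including in regions that were already thinly covered, which is the representation concern raised above.

```python
def coverage(total_bases: int, genome_size: int) -> float:
    """Average depth: total sequenced bases divided by the (estimated) genome size."""
    return total_bases / genome_size


def subsample_fraction(current_cov: float, target_cov: float) -> float:
    """Fraction of reads to keep so that depth drops to target_cov.

    Returns 1.0 when the data are already at or below the target,
    i.e. no sub-sampling is needed.
    """
    return min(1.0, target_cov / current_cov)


if __name__ == "__main__":
    # Hypothetical numbers, not taken from the thread:
    # 600 million read pairs of 2 x 100 bp against an assumed 1 Gb genome.
    total_bases = 600_000_000 * 2 * 100
    cov = coverage(total_bases, 1_000_000_000)
    frac = subsample_fraction(cov, 80)
    print(f"{cov:.0f}X coverage, keep {frac:.0%} of the pairs")
```

With these made-up inputs the script reports 120X and suggests keeping about two thirds of the pairs. With a real genome-size estimate, the same two lines tell you whether sub-sampling is worth doing at all.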

Just try to remove redundancy in the data. Since chimeras are inherent in
mate-pair data, they need to be handled well too :)
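One simple form of redundancy removal drops read pairs whose R1 and R2 sequences are both exact duplicates of an earlier pair, for example PCR duplicates. The sketch below assumes two synchronised FASTQ files with mates in the same order. All function names are hypothetical. It catches only exact duplicates, and it does not attempt chimera detection, which needs a separate step not shown here. Each pair is stored as a 16-byte digest, but with hundreds of millions of pairs the set still needs many gigabytes of RAM.

```python
import hashlib
from itertools import islice
from typing import Iterable, Iterator, Tuple

# One FASTQ record: (header, sequence, separator, qualities)
Record = Tuple[str, str, str, str]


def fastq_records(lines: Iterable[str]) -> Iterator[Record]:
    """Group a stream of lines into 4-line FASTQ records."""
    it = iter(lines)
    while True:
        chunk = [line.rstrip("\n") for line in islice(it, 4)]
        if not chunk:
            return
        if len(chunk) < 4:
            raise ValueError("truncated FASTQ record")
        yield (chunk[0], chunk[1], chunk[2], chunk[3])


def dedup_pairs(r1: Iterable[Record], r2: Iterable[Record]) -> Iterator[Tuple[Record, Record]]:
    """Yield only the first occurrence of each (R1 sequence, R2 sequence) combination.

    Mates are matched by position, so both inputs must be in the same order.
    A 16-byte BLAKE2b digest stands in for the full sequences to save memory.
    """
    seen = set()
    for a, b in zip(r1, r2):
        key = hashlib.blake2b(f"{a[1]}\t{b[1]}".encode(), digest_size=16).digest()
        if key in seen:
            continue  # exact duplicate pair: skip it
        seen.add(key)
        yield a, b


def dedup_files(in1: str, in2: str, out1: str, out2: str) -> None:
    """Stream two synchronised FASTQ files and write the de-duplicated pairs."""
    with open(in1) as f1, open(in2) as f2, open(out1, "w") as o1, open(out2, "w") as o2:
        for a, b in dedup_pairs(fastq_records(f1), fastq_records(f2)):
            o1.write("\n".join(a) + "\n")
            o2.write("\n".join(b) + "\n")
```

A near-duplicate that differs from an earlier pair by a single sequencing error is kept, so this gives a lower bound on the redundancy present.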

Cheers


Dear Rohit,

thank you for your help. I have stopped the previous MP assembly and
restarted it with a subsample (10%) of the MP data as a single-end read
assembly. Let's see what it can do.
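A 10% subsample can be drawn in one streaming pass. Making a single random decision per pair keeps the two mates together, even if they are later fed in as single-end reads. The sketch below is one possible way to do this and is not taken from the thread. The function names are hypothetical, and the seed is fixed so the subsample can be reproduced.

```python
import random
from itertools import islice
from typing import Iterable, Iterator, Tuple

# One FASTQ record: (header, sequence, separator, qualities)
Record = Tuple[str, str, str, str]


def fastq_records(lines: Iterable[str]) -> Iterator[Record]:
    """Group a stream of lines into 4-line FASTQ records."""
    it = iter(lines)
    while True:
        chunk = [line.rstrip("\n") for line in islice(it, 4)]
        if not chunk:
            return
        if len(chunk) < 4:
            raise ValueError("truncated FASTQ record")
        yield (chunk[0], chunk[1], chunk[2], chunk[3])


def subsample_pairs(r1: Iterable[Record], r2: Iterable[Record],
                    fraction: float, seed: int = 42) -> Iterator[Tuple[Record, Record]]:
    """Keep each pair independently with probability `fraction`.

    One random draw per pair, so a mate is never kept without its partner.
    """
    rng = random.Random(seed)  # private generator: same seed, same subsample
    for a, b in zip(r1, r2):
        if rng.random() < fraction:
            yield a, b


def subsample_files(in1: str, in2: str, out1: str, out2: str,
                    fraction: float, seed: int = 42) -> None:
    """Stream two synchronised FASTQ files and write the sampled pairs."""
    with open(in1) as f1, open(in2) as f2, open(out1, "w") as o1, open(out2, "w") as o2:
        for a, b in subsample_pairs(fastq_records(f1), fastq_records(f2), fraction, seed):
            o1.write("\n".join(a) + "\n")
            o2.write("\n".join(b) + "\n")
```

For example, `subsample_files("Metazoan-MP-R1-all.fastq", "Metazoan-MP-R2-all.fastq", "sub_R1.fastq", "sub_R2.fastq", 0.10)` would keep roughly, not exactly, 10% of the pairs. The output file names here are hypothetical.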

Many thanks,

Bernhard

On 11/06/2015 12:07 PM, Rohit Kolora wrote:
Hi,

If you have only mate-pair data, combine the two files in any order
and feed them in as single-end data.
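Combining the files can be as simple as concatenating R1 and R2 into one FASTQ file. In the manifest you would then presumably list that single file without the `autopairing` line, but check the MIRA manual for the exact single-end readgroup syntax. Mates in R1 and R2 often share the same read ID. As a precaution, the sketch below therefore appends a hypothetical `_1`/`_2` suffix to each ID, in case duplicate read names cause problems. The function names are illustrative only.

```python
from itertools import islice
from typing import Iterable, Iterator, Tuple

# One FASTQ record: (header, sequence, separator, qualities)
Record = Tuple[str, str, str, str]


def fastq_records(lines: Iterable[str]) -> Iterator[Record]:
    """Group a stream of lines into 4-line FASTQ records."""
    it = iter(lines)
    while True:
        chunk = [line.rstrip("\n") for line in islice(it, 4)]
        if not chunk:
            return
        if len(chunk) < 4:
            raise ValueError("truncated FASTQ record")
        yield (chunk[0], chunk[1], chunk[2], chunk[3])


def combine_as_single(r1: Iterable[Record], r2: Iterable[Record]) -> Iterator[Record]:
    """Yield all R1 records, then all R2 records, with no pairing information.

    '_1' / '_2' is appended to the read ID (the header up to the first space)
    so that the two mates of a pair no longer share a name.
    """
    for suffix, records in (("_1", r1), ("_2", r2)):
        for header, seq, _plus, qual in records:
            read_id, sep, comment = header.partition(" ")
            yield (read_id + suffix + sep + comment, seq, "+", qual)


def combine_files(in1: str, in2: str, out: str) -> None:
    """Write R1 followed by R2 into one single-end FASTQ file."""
    with open(in1) as f1, open(in2) as f2, open(out, "w") as o:
        for rec in combine_as_single(fastq_records(f1), fastq_records(f2)):
            o.write("\n".join(rec) + "\n")
```

If the read IDs are already unique, for example because they end in `/1` and `/2`, a plain `cat R1.fastq R2.fastq > all.fastq` does the same job.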

Mate pairs should only be used for scaffolding; for contig building,
these reads can be used as single reads without pairing information. But
I'm not sure how big your genome is or how much would be missing, as
paired-end merging helps a lot to generate longer contigs.


Rohit

Hello,

I'm attempting a genome assembly with only Illumina mate-pair reads,
with insert sizes of about 5 kb. It's a fairly large amount of data
(2x 50 GB fastq files). Unfortunately, something went wrong with the
corresponding paired-end reads: they do not pass quality filters and
cannot be assembled at all.

This is the manifest file for the assembly:

project = Metazoan_MP
job = genome,denovo,accurate
parameters = -GE:not=128
parameters = -GE:mps=1000

readgroup = DataIlluminaPairedLibMP
autopairing
data = Metazoan-MP-R1-all.fastq Metazoan-MP-R2-all.fastq
technology = solexa

It is running on an HPC where I've reserved 128 cores and 1024 GB of RAM.

The MP-only genome assembly has now been running for more than two weeks,
and only the first checkpoint has been passed (12 days ago). Since then,
two files have been constantly updated, sometimes growing, sometimes
shrinking:

Metazoan_MP_int_posmatchc_pass.1.bin
Metazoan_MP_int_posmatchf_pass.1.bin

After the walltime had already been extended twice, the HPC administrator
asked me whether there was any hope that this assembly could be finished
successfully at all.

MIRA in principle works like a charm; I've done several Illumina-only
and Illumina-454 hybrid transcriptome assemblies.

Can you help me determine whether this MP-only assembly may be completed
within another two weeks, or whether there is little hope for an assembly
with these raw data?

Many thanks for your help,

Bernhard

--
You have received this mail because you are subscribed to the mira_talk
mailing list. For information on how to subscribe or unsubscribe, please
visit http://www.chevreux.org/mira_mailinglists.html




--
Dr. Bernhard Egger FLS
Group leader
Institute of Zoology, University of Innsbruck
Technikerstr. 25
6020 Innsbruck
Austria

http://www.uibk.ac.at/zoology/staff/egger/

http://www.uibk.ac.at/zoology/research/regeneration/

