Hi Bernhard,
I don't think sub-sampling is needed unless you have >80X coverage.
You would also need to know how much of the genome is still represented
after sub-sampling.
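As a rough check against the 80X threshold, mean coverage can be estimated as total sequenced bases divided by genome size. The sketch below is illustrative only: the function names are made up for this example, and the numbers in the usage note are hypothetical, since the genome size is not stated in this thread.

```python
def estimate_coverage(total_bases: int, genome_size: int) -> float:
    """Mean depth of coverage: sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


def subsample_fraction(current_coverage: float, target_coverage: float) -> float:
    """Fraction of reads to keep to reach the target coverage (never above 1.0)."""
    if current_coverage <= 0:
        raise ValueError("current_coverage must be positive")
    return min(1.0, target_coverage / current_coverage)
```

With a hypothetical 40 Gbp of reads and a 500 Mbp genome, `estimate_coverage(40_000_000_000, 500_000_000)` gives 80.0, and `subsample_fraction(80.0, 40.0)` gives 0.5.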
Just try to remove redundancy in the data. Since chimeras are inherent in
mate-pair data, they need to be handled well too :)
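One simple reading of "remove redundancy" is dropping reads whose sequence is an exact duplicate of one already seen. This is only a sketch under that assumption: `dedupe_reads` is a hypothetical helper, it ignores reverse complements and near-duplicates, and holding every sequence in a set may need a lot of memory for a data set of this size.

```python
from typing import Iterable, Iterator, Tuple


def dedupe_reads(records: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs, keeping only the first read per exact sequence."""
    seen = set()
    for name, seq in records:
        if seq not in seen:
            seen.add(seq)
            yield name, seq
```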
Cheers
Dear Rohit,
thank you for your help. I have stopped the previous MP assembly and
restarted with a subsample (10%) of the MP data as a single read
assembly, let's see what it can do.
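A 10% subsample like the one described here can be drawn by keeping each FASTQ record with probability 0.1. This is a minimal sketch, not necessarily how the subsample in this thread was made; it assumes plain four-line FASTQ records, and `subsample_fastq` is a hypothetical name.

```python
import random
from typing import Iterable, Iterator, List


def subsample_fastq(lines: Iterable[str], fraction: float, seed: int = 0) -> Iterator[List[str]]:
    """Yield four-line FASTQ records, keeping each with probability `fraction`."""
    rng = random.Random(seed)  # fixed seed makes the selection reproducible
    it = iter(lines)
    # zip over the same iterator four times groups the lines into records;
    # a trailing incomplete record is silently dropped.
    for record in zip(it, it, it, it):
        if rng.random() < fraction:
            yield list(record)
```

One random number is drawn per record, so running this with the same seed on R1 and R2 files that have the same number of records selects the same record positions in both.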
Many thanks,
Bernhard
On 11/06/2015 12:07 PM, Rohit Kolora wrote:
Hi,
If you have just Mate-pair data then combine the two files in any order
and feed them as single end data.
Mate pairs should only be used for scaffolding; for contig building,
these reads can be used as single reads without pairing information.
I'm not sure how big your genome is, though, or how much would be
missing, since paired-end merging helps a lot in generating longer
contigs.
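Combining the two mate-pair files into one single-end file is plain concatenation. The sketch below assumes uncompressed FASTQ files; `concat_files` and the output file name in the usage note are hypothetical.

```python
import os
import shutil
from pathlib import Path
from typing import Sequence


def concat_files(inputs: Sequence[Path], output: Path) -> None:
    """Append each input file to `output`, in the order given."""
    with output.open("wb") as out:
        for path in inputs:
            with path.open("rb") as src:
                shutil.copyfileobj(src, out)
                # If a file lacks a final newline, add one so the next
                # file's first header starts on its own line.
                if src.tell() > 0:
                    src.seek(-1, os.SEEK_END)
                    if src.read(1) != b"\n":
                        out.write(b"\n")
```

For the files named in the manifest, this would be called as `concat_files([Path("Metazoan-MP-R1-all.fastq"), Path("Metazoan-MP-R2-all.fastq")], Path("Metazoan-MP-single.fastq"))`.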
Rohit
Hello,
I'm attempting a genome assembly with only Illumina mate pair reads,
with insert sizes of about 5 kb. It's a fairly large amount of data
(2x 50 GB fastq files). Unfortunately, something went wrong with the
corresponding paired end reads: they do not pass quality filters and
cannot be assembled at all.
This is the manifest file for the assembly:
project = Metazoan_MP
job = genome,denovo,accurate
parameters = -GE:not=128
parameters = -GE:mps=1000
readgroup = DataIlluminaPairedLibMP
autopairing
data = Metazoan-MP-R1-all.fastq Metazoan-MP-R2-all.fastq
technology = solexa
It is running on an HPC where I've reserved 128 cores and 1024 GB of
RAM.
The MP-only genome assembly has now been running for more than two
weeks, and only the first checkpoint has been passed (12 days ago).
Since then, two files have been constantly updated, sometimes growing,
sometimes shrinking:
Metazoan_MP_int_posmatchc_pass.1.bin
Metazoan_MP_int_posmatchf_pass.1.bin
After extending the walltime two times already, the HPC administrator
asked me if there was hope that this assembly could be finished
successfully at all.
Mira in principle works like a charm, I've done several Illumina-only
and Illumina-454 hybrid transcriptome assemblies.
Can you help me in determining if this MP-only assembly may be
completed within another two weeks, or if there is little hope for an
assembly with these raw data?
Many thanks for your help,
Bernhard
--
You have received this mail because you are subscribed to the mira_talk
mailing list. For information on how to subscribe or unsubscribe, please
visit http://www.chevreux.org/mira_mailinglists.html
--
Dr. Bernhard Egger FLS
Group leader
Institute of Zoology, University of Innsbruck
Technikerstr. 25
6020 Innsbruck
Austria
http://www.uibk.ac.at/zoology/staff/egger/
http://www.uibk.ac.at/zoology/research/regeneration/