[mira_talk] Settings for ddRAD data using EST mode

  • From: Magnus Popp <magnus.popp@xxxxxxxxxx>
  • To: "mira_talk@xxxxxxxxxxxxx" <mira_talk@xxxxxxxxxxxxx>
  • Date: Mon, 6 Oct 2014 16:48:25 +0000

Hi,

I’m attempting to assemble ddRAD data from an Ion Torrent PGM using the EST mode
of MIRA 4.0.2. The data comes from genomic DNA that has been completely digested
with two different restriction enzymes (REs), ligated to indexed adaptors,
size-selected (inserts ca. 300-420 bp) and then sequenced. The result is a
reduced genome representation; in this particular case I end up with 2-3k loci
of 350-420 bp for each sample. These are analogous to ESTs, except that the
contigs aren’t necessarily coding and don’t have a poly-A tail, so they’re
basically not very EST-like at all...

What I’ve seen so far using the EST mode is quite promising, but I need to get
rid of the low-quality 3’ ends of the reads in order to minimise the number of
IUPAC-coded bases in my consensus.
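One way to remove low-quality 3’ ends before MIRA ever sees the reads is sliding-window quality trimming. A minimal Python sketch, assuming Phred+33 FASTQ with 4-line records; the 15 bp window, the Q20 cutoff and the 70 bp minimum length (picked only to mirror -AS:mrl=70) are untested example values, not MIRA defaults, and `trim3.py` is a hypothetical script name:

```python
"""Sketch: sliding-window trimming of low-quality 3' ends in a FASTQ file.

Assumptions (not MIRA behaviour): Phred+33 qualities, 4-line FASTQ records,
window=15, min_q=20, min_len=70 (mirrors -AS:mrl=70 in the manifest).
"""
import sys
from typing import Iterator, List, TextIO, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality string)


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield (header, seq, qual) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield header.rstrip("\n"), seq, qual


def trim_3prime(qual: str, window: int = 15, min_q: float = 20.0,
                offset: int = 33) -> int:
    """Return how many 5' bases to keep.

    Slides a window from the 3' end towards the 5' end and stops at the first
    window whose mean quality is >= min_q; the read is cut at that window's
    3' edge. Returns 0 if no window qualifies or the read is shorter than
    the window.
    """
    scores: List[int] = [ord(c) - offset for c in qual]
    for end in range(len(scores), window - 1, -1):
        if sum(scores[end - window:end]) / window >= min_q:
            return end
    return 0


def main(in_path: str, out_path: str, min_len: int = 70) -> None:
    with open(in_path) as src, open(out_path, "w") as dst:
        for header, seq, qual in read_fastq(src):
            keep = trim_3prime(qual)
            if keep >= min_len:  # drop reads that end up too short
                dst.write(f"{header}\n{seq[:keep]}\n+\n{qual[:keep]}\n")


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

Usage would be `python trim3.py casfas_18.fastq casfas_18.trimmed.fastq`, then pointing `data =` at the trimmed file. Note that a window mean tolerates a few bad bases at the very end; a stricter per-base cutoff would clip more aggressively, at the cost of read length.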

To complicate matters, there’s some heterozygosity in my organisms (diploid and
possibly tetraploid plants), so I only really want to get rid of IUPAC calls
stemming from poor quality, not eliminate them completely. As this is Ion
Torrent PGM data, read length varies quite a bit, and many contigs have a 3’
tail covered by very few (1-4) reads.

What I’ve done so far is:

#1
project = casfas
job = est,accurate

readgroup = casfas
data = casfas_18.fastq
technology = iontor

PARAMETERS = COMMON_SETTINGS -AS:nop=5 sep=on IONTOR_SETTINGS -AS:mrl=70 mrpc=5

and then some tests with:

#2
project = casfas
job = est,accurate

readgroup = casfas
data = casfas_18.fastq
technology = iontor

PARAMETERS = COMMON_SETTINGS -AS:nop=5 sep=on IONTOR_SETTINGS -AS:mrl=70 mrpc=5 -CL:pec=on cpat=off

(I also ran this variant with cpat=on.)

and:

#3
project = casfas
job = est,accurate

readgroup = casfas
data = casfas_18.fastq
technology = iontor

PARAMETERS = COMMON_SETTINGS -AS:nop=5 sep=on IONTOR_SETTINGS -AS:mrl=70 mrpc=5 -CL:pec=on cpat=off qc=on

#2 reduced the number of IUPACs by 50% (or even more for some species), and #3
is better than #1 but worse than #2. #3 also gave an odd increase in the number
of contigs in my single test run, but I have to run a few more to see if that’s
consistent.
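To compare runs such as #1, #2 and #3 on the same footing, it helps to count ambiguity codes identically each time. A minimal Python sketch that counts IUPAC ambiguity codes in consensus FASTA files; it treats N as ambiguous (remove it from `AMBIGUOUS` if that is not wanted), skips `*` and `-` gap characters, and `count_iupac.py` plus the file names in the usage line are hypothetical:

```python
"""Sketch: count IUPAC ambiguity codes in consensus FASTA files."""
import sys
from collections import Counter
from typing import Dict, Iterable

# Two-, three- and four-fold IUPAC ambiguity codes; N is included by choice.
AMBIGUOUS = set("RYSWKMBDHVN")
GAPS = set("*-")


def count_iupac(lines: Iterable[str]) -> Dict[str, int]:
    """Count total bases and ambiguity codes in FASTA lines (case-insensitive)."""
    per_code: Counter = Counter()
    total = 0
    for line in lines:
        if line.startswith(">"):
            continue  # header line
        for base in line.strip().upper():
            if base in GAPS:
                continue
            total += 1
            if base in AMBIGUOUS:
                per_code[base] += 1
    return {"total_bases": total, "ambiguous": sum(per_code.values()), **per_code}


def main(paths: Iterable[str]) -> None:
    print("file\tambiguous\ttotal_bases\tper_kb")
    for path in paths:
        with open(path) as handle:
            c = count_iupac(handle)
        per_kb = 1000 * c["ambiguous"] / c["total_bases"] if c["total_bases"] else 0.0
        print(f"{path}\t{c['ambiguous']}\t{c['total_bases']}\t{per_kb:.2f}")


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1:])
```

For example, `python count_iupac.py run1_consensus.fasta run2_consensus.fasta`. Reporting ambiguities per kb rather than raw counts keeps the comparison fair when a setting such as qc=on also changes the number or length of contigs.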

So, my questions are:

1) Do you have any suggestions on how to reduce the number of IUPACs caused by
low quality in the 3’ ends of reads?
2) Is there a way of telling mira to clip a contig where it goes below a 
certain minimum coverage?
3) And somewhat unrelated - what’s the quality score in the fasta.qual files 
and how is it calculated?
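On question 3: independent of how MIRA derives the values, the .qual file can be inspected directly to see where low-quality consensus positions cluster (for instance at contig 3’ ends). A minimal Python sketch, assuming the usual FASTA-quality layout of a `>name` header followed by whitespace-separated integer scores, one per consensus base; the Q20 threshold is an arbitrary example:

```python
"""Sketch: summarise per-contig consensus qualities from a FASTA-style .qual file."""
import sys
from typing import Dict, Iterable, List


def parse_qual(lines: Iterable[str]) -> Dict[str, List[int]]:
    """Map each record name (first word after '>') to its list of quality values."""
    records: Dict[str, List[int]] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = (line[1:].split() or [""])[0]
            records[name] = []
        elif name is not None:
            records[name].extend(int(tok) for tok in line.split())
    return records


def summarise(records: Dict[str, List[int]], threshold: int = 20) -> None:
    """Print length, mean quality and the number of positions below threshold."""
    print("contig\tlength\tmean_q\tbelow_threshold")
    for name, quals in records.items():
        mean_q = sum(quals) / len(quals) if quals else 0.0
        low = sum(1 for q in quals if q < threshold)
        print(f"{name}\t{len(quals)}\t{mean_q:.1f}\t{low}")


if __name__ == "__main__" and len(sys.argv) > 1:
    for path in sys.argv[1:]:
        with open(path) as handle:
            summarise(parse_qual(handle))
```

Lines before the first header are ignored, so a malformed file yields an empty result rather than misattributed scores.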

Cheers,
Magnus