[mira_talk] Re: Settings for ddRAD data using EST mode

  • From: Magnus Popp <magnus.popp@xxxxxxxxxx>
  • To: "mira_talk@xxxxxxxxxxxxx" <mira_talk@xxxxxxxxxxxxx>
  • Date: Wed, 8 Oct 2014 16:01:20 +0000

Hi and thanks for the reply!

Concerning your examples below, I’m OK with the first case as I would interpret 
an IUPAC "R" as a result of merging what probably are two alleles, but ideally 
mira would give me two contigs. I see that wasn’t what I wrote in my question, 
but that’s what I meant, sorry…

In the second example, the preferred result would be that the contig (or perhaps
the read) is clipped when the quality drops that low.

I will align each RAD locus/“EST” across a handful of fairly closely related 
species, and although the errors (be it a single base or an IUPAC base) will 
most of the time only result in apomorphic characters, they will mess up the 
diversity and branch length estimates in my phylogenetic analyses. So I would 
rather have somewhat shorter RAD loci/ESTs with ± reliable data than longer 
ones with shaky 3’ ends at this stage.

Unless you suggest otherwise, I’ll probably use miraconvert and set -q to 
something that works with my data and then simply truncate at the first N, or 
perhaps just merge the fasta and qual files and use a suitable cut-off in a 
fastq quality trimmer.
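
A minimal Python sketch of the two post-processing options just described,
assuming the consensus has already been written out as FASTA (with low-quality
positions replaced by N) and that the quality values are available as lists of
integers. The function names are hypothetical and are not part of MIRA or
miraconvert.

```python
from typing import Iterator, List, Tuple


def read_fasta(lines: List[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def truncate_at_first_n(seq: str) -> str:
    """Keep only the part of the consensus before the first N (or n)."""
    pos = seq.upper().find("N")
    return seq if pos == -1 else seq[:pos]


def to_fastq(name: str, seq: str, quals: List[int]) -> str:
    """Merge a sequence and its quality values into one FASTQ record.

    Assumes Phred+33 encoding; values above 93 are capped so that the
    encoded character stays printable.
    """
    if len(seq) != len(quals):
        raise ValueError(f"{name}: {len(seq)} bases, {len(quals)} qualities")
    encoded = "".join(chr(min(q, 93) + 33) for q in quals)
    return f"@{name}\n{seq}\n+\n{encoded}\n"
```

Note that truncating at the first N discards everything downstream of it, so a
contig with an early N becomes very short or empty; filtering on a minimum
remaining length afterwards is probably sensible.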

Thanks again!
Cheers,
Magnus


On 8 Oct 2014, at 13:41, Bastien Chevreux <bach@xxxxxxxxxxxx> wrote:

On October 6, 2014 at 6:48 PM Magnus Popp <magnus.popp@xxxxxxxxxx> wrote:
To complicate matters, there's some heterozygosity in my organisms (di- and
possibly tetraploid plants)
so I only really want to get rid of IUPAC calls stemming from poor quality,
not completely eliminate them.
As this is iontorrent PGM data, the read length varies quite a bit and many
contigs have a 3’ tail consisting of very few (1-4) reads.

MIRA usually tries to get you unambiguous calls and only falls back to IUPAC if
that fails. From what you wrote you would hope for MIRA to give you

 - e.g.: an IUPAC in case it needs to decide between an A at qual 60 and a G at
qual 61
 - e.g.: a single base G in case it needs to decide between an A at qual 6 and a
G at qual 7

Am I summarising correctly? If yes: can you tell me why you think that this is a
good idea? I do have some trouble grasping the reasoning for this.
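
The two cases above can be written down as a toy decision rule. This is only a
sketch of the behaviour being asked about, not MIRA's consensus algorithm; the
function name and the min_qual threshold of 30 are assumptions made for
illustration.

```python
from typing import Dict

# Two-base IUPAC ambiguity codes.
IUPAC = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def call_base(candidates: Dict[str, int], min_qual: int = 30) -> str:
    """Pick a consensus call from competing bases and their qualities.

    If two candidates are both well supported, report the IUPAC code
    (a plausible heterozygous site). If fewer than two reach min_qual,
    report the single best-scoring base. More than two well-supported
    candidates are reported as N in this sketch.
    """
    supported = frozenset(b for b, q in candidates.items() if q >= min_qual)
    if len(supported) >= 2:
        return IUPAC.get(supported, "N")
    return max(candidates, key=candidates.get)


print(call_base({"A": 60, "G": 61}))  # both well supported
print(call_base({"A": 6, "G": 7}))    # both poorly supported
```

With the qualities from the two examples, the first call yields R and the
second yields G.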

1) Do you have any suggestions on how to reduce the number of IUPACs due to
low quality in the 3’ end of reads?

Clip the 3' a bit harder? Not ideal, I know.
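
For illustration, harder 3' clipping can be sketched as dropping trailing bases
until one reaches a minimum quality. This is a generic sketch, not how MIRA
clips internally; the function name and the default threshold are assumptions,
and real trimmers often use a sliding window instead of single bases.

```python
from typing import List, Tuple


def trim_3prime(seq: str, quals: List[int],
                min_qual: int = 20) -> Tuple[str, List[int]]:
    """Remove bases from the 3' end while their quality is below min_qual."""
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    end = len(seq)
    while end > 0 and quals[end - 1] < min_qual:
        end -= 1
    return seq[:end], quals[:end]


read, qual = trim_3prime("ACGTAC", [30, 30, 25, 30, 10, 5])
print(read, qual)
```

Raising min_qual shortens the reads but removes more of the low-quality tails
that give rise to spurious IUPAC calls.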

2) Is there a way of telling mira to clip a contig where it goes below a
certain minimum coverage?

No. I think there was functionality in miraconvert which would N out the
consensus where the coverage is below a given level, but atm I see that only
with respect to quality (see the -q parameter). I'm not sure why I dropped the
version with coverage ... need to check.
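
Outside of miraconvert, coverage-based masking or clipping could be
approximated with a short script, assuming a per-position coverage value is
available for each consensus base. Both function names are hypothetical and
how the coverage list is obtained is left open.

```python
from typing import List


def mask_low_coverage(consensus: str, coverage: List[int],
                      min_cov: int) -> str:
    """Replace every consensus base whose coverage is below min_cov with N."""
    if len(consensus) != len(coverage):
        raise ValueError("consensus and coverage lengths differ")
    return "".join(b if c >= min_cov else "N"
                   for b, c in zip(consensus, coverage))


def clip_low_coverage_ends(consensus: str, coverage: List[int],
                           min_cov: int) -> str:
    """Trim both contig ends until a position reaches min_cov coverage."""
    if len(consensus) != len(coverage):
        raise ValueError("consensus and coverage lengths differ")
    start, end = 0, len(consensus)
    while start < end and coverage[start] < min_cov:
        start += 1
    while end > start and coverage[end - 1] < min_cov:
        end -= 1
    return consensus[start:end]


print(mask_low_coverage("ACGTAC", [5, 6, 7, 3, 2, 1], 4))
print(clip_low_coverage_ends("ACGTAC", [5, 6, 7, 3, 2, 1], 4))
```

The clipping variant only removes the ends, so low-coverage stretches in the
middle of a contig are kept; the masking variant marks them as N instead.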

3) And somewhat unrelated - what’s the quality score in the fasta.qual files
and how is it calculated?

//www.freelists.org/post/mira_talk/Quality-Values,4

HTH,
  B.

--
You have received this mail because you are subscribed to the mira_talk mailing 
list. For information on how to subscribe or unsubscribe, please visit 
http://www.chevreux.org/mira_mailinglists.html

__________________________________________
Magnus Popp

Natural History Museum
University of Oslo
P.O. Box 1172 Blindern
NO-0318 Oslo, Norway
Phone: +47 22851875
Fax: +47 22851835

Visiting address: Office 112C, Botanical Museum, Sars gate 1, Tøyen

http://www.nhm.uio.no/english/about/organization/research-collections/people/magnuspo/index.html
www.forbio.uio.no/
__________________________________________



