[mira_talk] Re: Debris list classification

  • From: Hélène Boulain <helene.boulain@xxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Wed, 1 Apr 2015 11:01:21 +0200

My Sanger data come from GenBank, so they should normally be assembled ESTs.
Thanks for your help. I will run MIRA with the new options and also keep the
NO_OVERLAP reads, so that I can compare which genes I can identify in both.

Cheers

Helene

2015-04-01 4:02 GMT+02:00 Chris Hoefler <hoeflerb@xxxxxxxxx>:

So your Sanger data are reads, not assembled ESTs, correct?

I think especially with the 454 data, you want to use -AS:mrpc=1, but not
-OUT:sssip. Mira should give you everything that is significant. If you
really want to be sure, grab the NO_OVERLAP reads and keep those separate
instead of using -OUT:sssip. I suspect they will be mostly junk or chimeras.
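
A minimal sketch of "grab the NO_OVERLAP reads and keep those separate", in
Python. It assumes each debris list line holds a read name followed by
whitespace and a reason code, and that the reads are in plain FASTA; both
function names are made up for illustration. FASTQ input (e.g. the 454 reads)
would need a FASTQ-aware filter instead.

```python
def reads_with_reason(debris_lines, reason="NO_OVERLAP"):
    """Collect names of reads that were discarded for the given reason.

    Assumed line layout: "<read name><whitespace><reason code>".
    """
    names = set()
    for line in debris_lines:
        fields = line.split()
        if len(fields) >= 2 and fields[1] == reason:
            names.add(fields[0])
    return names


def filter_fasta(fasta_lines, wanted):
    """Yield the lines of FASTA records whose ID is in `wanted`.

    The ID is taken to be the first word after '>' on the header line.
    """
    keep = False
    for line in fasta_lines:
        if line.startswith(">"):
            header = line[1:].split()
            keep = bool(header) and header[0] in wanted
        if keep:
            yield line
```

Both functions accept any iterable of lines, so open file handles can be
passed in directly and the result of filter_fasta written out with
writelines().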

The advantage of contigs+singlets vs contigs+debris is that Mira will tell
you why it made a singlet (by using tags) which can help you identify
variants.



On Mar 31, 2015, at 10:35 AM, Hélène Boulain <helene.boulain@xxxxxxxxx>
wrote:

Thank you for your help!

I'm trying to do a hybrid assembly of 454 reads (with quality values) and old
published ESTs (without quality values). My aim is simply to identify which
genes are expressed in my specific tissue. I'm trying a hybrid assembly
because the genes identified from the two kinds of data show little overlap.
So I'm testing this approach and learning how to use MIRA.

I had already extracted the non-overlapping reads based on the debris list. I
will keep them, and I will also try to run MIRA with -AS:mrpc=1 and
-OUT:sssip=yes.

Does it make a difference whether I search for my expressed genes with
contigs + singlets (using the options -AS:mrpc=1 and -OUT:sssip=yes) or with
contigs + non-overlapping reads extracted from the debris list?
(I have a reference genome and I just want to create a confident list of
expressed genes in order to find candidates for functional analysis.)


Thank you


Helene


2015-03-31 17:02 GMT+02:00 Chris Hoefler <hoeflerb@xxxxxxxxx>:

I'm not quite sure I understand what you are trying to do. Is it a hybrid
EST assembly with 454 and Sanger data, or 454 data mapped to a Sanger EST
reference, or something else...?

The answer to your question, I think, is that you need to set -AS:mrpc=1
(minimum_reads_per_contig) to have Mira save "significant singlets" in the
result file as contig singlets. You can get a few more if you also set
-OUT:sssip=yes (savesimplesingletsinproject), but beware because not every
non-overlapping read is necessarily a genuine singlet (i.e. a rare
transcript/splice variant).

The other tags in the debris file should be more or less
self-explanatory. There are a lot of clipping and normalization routines
that Mira uses, so when it throws reads out during these steps it usually
tells you why in the debris file. You definitely do not want to use these
as singlets.



On Tue, Mar 31, 2015 at 1:55 AM, Hélène Boulain <helene.boulain@xxxxxxxxx>
wrote:


Hello,

I'm assembling 454 data together with EST data using MIRA 4, and I would like
to know where my singletons are in the debris list. I searched the manual
but did not find details about the debris list classification.

For example, I parsed the debris lists for the 454 reads and the Sanger ESTs
and obtained the following:

454 debris:

1 CLIP_KNOWNADAPTORRIGHT
11 CLIP_LOWERCASEFRONT
2 CLIP_MASKEDBASES
466 CLIP_POLYAT
106251 DIGITAL_NORMALISATION
26834 NO_OVERLAP
6834 SHORTONLOAD
14188 TINY_CLUSTER_ORPHAN
288 TINY_CONTIG

and EST debris:

133 CLIP_POLYAT
3537 DIGITAL_NORMALISATION
1129 NO_OVERLAP
214 SHORTONLOAD
538 TINY_CLUSTER_ORPHAN
30 TINY_CONTIG
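
Per-reason counts like the ones above can be tallied with a few lines of
Python. This is only a sketch: it assumes each debris list line holds a read
name followed by whitespace and a reason code, and the function name is made
up for illustration.

```python
from collections import Counter


def tally_debris_reasons(debris_lines):
    """Count how often each reason code occurs in a debris list.

    Assumed line layout: "<read name><whitespace><reason code>".
    Blank or malformed lines are skipped.
    """
    counts = Counter()
    for line in debris_lines:
        fields = line.split()
        if len(fields) >= 2:
            counts[fields[1]] += 1
    return counts
```

Passing an open file handle for the debris list and printing the sorted items
of the returned Counter gives one "count reason" line per category, in the
same layout as the lists above.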

I would like to keep singletons (from the Sanger ESTs in priority) to find my
expressed genes afterwards. Should I take only NO_OVERLAP, or other
categories as well? If so, which ones? Could you please explain the debris
list classification to me?

Thank you very much

Cheers

Helene Boulain




--
*Hélène Boulain - PhD student*
UMR INRA 1349 IGEPP (Institut de Génétique, Environnement et Protection
des Plantes)
Equipe Ecologie et Génétique des Insectes
Bâtiment 320, Domaine de la Motte BP 35327
35653 Le Rheu Cedex France
helene.boulain@xxxxxxxxx




