[mira_talk] Fwd: Re: no 3'end

  • From: Huang Yi <huang.y.hy@xxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Mon, 5 Oct 2015 14:55:23 +0800

Dear All,

Thanks very much for all your help. This virus is a single-stranded RNA
virus. We did not make any deletion at the 3' end, and the raw data came
from a host + viral mixture. I don't know how to set MIRA to assemble a
circular or a linear virus genome. I would appreciate it if any of you
could tell me how.

In the result folder there are two unpadded FASTA files: one is
Largecontig.fasta and the other is out.fasta. I found that there were some
short fragments which map to the 3'UTR. Although there are some short gaps,
I think they are also part of the assembled genome, right?

One other question: when I BLAST the 5'UTR sequence, its orientation is
opposite to that of other close relatives. This 5'UTR comes from a large
contig which contains an ORF in the correct orientation. It is not a short
fragment. Is there any method I can use to check how MIRA assembled this
part, so that I can tell whether this is a genuine feature of this virus or
something wrong with the data?
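
A quick sanity check on the orientation, independent of BLAST, could look
like the sketch below. It counts the k-mers that the 5'UTR shares with a
relative's sequence, once as given and once as reverse complement. The
function names and the k-mer length are assumptions chosen for
illustration; none of this is part of MIRA.

```python
# Minimal sketch: decide whether a query matches a reference better on the
# forward strand or as its reverse complement, by counting shared k-mers.
# All names here are hypothetical helpers, not MIRA or BLAST functionality.

COMPLEMENT = str.maketrans("ACGTUNacgtun", "TGCAANtgcaan")


def reverse_complement(seq):
    """Return the reverse complement of a nucleotide string (U is read as T)."""
    return seq.translate(COMPLEMENT)[::-1]


def shared_kmers(query, reference, k):
    """Count k-mer positions in `query` whose k-mer also occurs in `reference`."""
    ref_kmers = {reference[i:i + k] for i in range(len(reference) - k + 1)}
    return sum(
        1 for i in range(len(query) - k + 1) if query[i:i + k] in ref_kmers
    )


def guess_orientation(query, reference, k=11):
    """Return 'forward', 'reverse' or 'undetermined' for query vs. reference."""
    q = query.upper()
    r = reference.upper()
    forward = shared_kmers(q, r, k)
    reverse = shared_kmers(reverse_complement(q), r, k)
    if forward == reverse:
        return "undetermined"
    return "forward" if forward > reverse else "reverse"
```

If the 5'UTR scores "reverse" while the ORF taken from the same contig
scores "forward", that would point to a junction inside the contig worth
inspecting read by read.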

Thanks again!

Yi



---------- Forwarded message ----------
From: Martin MOKREJŠ <mmokrejs@xxxxxxxxx>
Date: 2015-10-04 0:51 GMT+08:00
Subject: [mira_talk] Re: no 3'end
To: mira_talk@xxxxxxxxxxxxx


Hi Yi,

Huang Yi wrote:

Hello,

A question again. I am working on a small virus genome now. The data are
Illumina reads. When I used de novo assembly, MIRA quickly produced a
strain which shares over 95% nucleotide identity with a reference virus
genome. But that de novo assembled strain did not contain the 3'UTR. When I
used reference assembly, MIRA gave me a "complete" strain which is highly
similar to the reference (~99%). Many reads map very well to the reference
genome's 3'UTR region, which is around 600 nt long.


You did not say whether it is a virus with a circular or a linear genome.
Also, given that Adrian Pelin proposed inspecting SNPs ... well, first of
all, is this an RNA or a DNA virus? You speak of a 3'-UTR so I guess it is
a +RNA virus. What is the target genome length? What coverage do you have
in de novo vs. mapping modes?
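
For the coverage question, a rough estimate is simply total sequenced bases
divided by genome length, and a per-position depth profile shows whether
the 3' end is covered more thinly than the rest. The sketch below assumes
read alignments are already available as 0-based, end-exclusive
(start, end) intervals on the genome; the function names are hypothetical.

```python
# Minimal sketch: expected average coverage and a per-position depth profile.
# Input intervals are assumed to be 0-based and end-exclusive.


def expected_coverage(num_reads, mean_read_length, genome_length):
    """Average coverage = total sequenced bases / genome length."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return num_reads * mean_read_length / genome_length


def depth_profile(alignments, genome_length):
    """Return a list with the number of reads covering each genome position."""
    diff = [0] * (genome_length + 1)
    for start, end in alignments:
        start = max(0, start)
        end = min(genome_length, end)
        if start < end:
            diff[start] += 1   # coverage rises where a read begins
            diff[end] -= 1     # and falls just past where it ends
    depth = []
    running = 0
    for i in range(genome_length):
        running += diff[i]
        depth.append(running)
    return depth
```

With made-up numbers, 20000 reads of 100 nt on a 10000 nt genome give an
expected coverage of 200.0; comparing the mean of the last 600 positions of
the depth profile against the rest would show how well the 3'UTR is
supported in mapping mode.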


I prefer to trust the de novo assembled strain because that virus was
isolated from a different host. It may not be the same as the reference.
But I am curious why MIRA didn't assemble the 3'UTR region. Does it mean
that the virus I study doesn't have that 600 nt long 3'UTR, or is there a
parameter I didn't set correctly? Thanks!


It could be that MIRA discarded the ends of reads because they were too
close to adapters. Did you use any custom adapters, e.g. for reverse
transcription of viral RNA? Did you remove them from your data?
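
One way to get a first impression of leftover adapters is to count how many
reads still contain the start of the adapter sequence. This is only a crude
exact-match sketch, not a replacement for a proper adapter-removal tool;
the function name and the `min_match` seed length are assumptions, and the
adapter sequence has to be whatever was actually used in the library prep.

```python
# Minimal sketch: fraction of reads that contain the first `min_match`
# bases of a given adapter as an exact substring (no mismatches allowed).


def fraction_with_adapter(reads, adapter, min_match=8):
    """Return the fraction of reads containing the adapter's leading seed."""
    if not reads:
        return 0.0
    seed = adapter[:min_match].upper()
    hits = sum(1 for read in reads if seed in read.upper())
    return hits / len(reads)
```

A clearly non-zero fraction among the reads that map to the 3'UTR of the
reference would support the idea that adapter remnants interfere with the
de novo assembly there.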

I could also imagine that it is a virus with a circular genome but
assembled in a linear assembly mode, or that MIRA discarded reads which
seemed to map to both linear ends. You should provide more details.
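
A circular genome assembled as a linear contig often shows the same
sequence at both ends of the contig. A simple check for such a terminal
overlap could be sketched as follows; the function name and the default
minimum overlap of 20 nt are assumptions, and only exact matches are
detected, so finding no overlap does not prove that the genome is linear.

```python
# Minimal sketch: find the longest exact overlap between the start and the
# end of a contig, as a hint that the underlying molecule may be circular.


def terminal_overlap(contig, min_overlap=20):
    """Return the length of the longest prefix that equals a suffix, or 0."""
    seq = contig.upper()
    # Try the longest candidate first; stop at half the contig length so
    # the prefix and the suffix cannot overlap each other.
    for length in range(len(seq) // 2, min_overlap - 1, -1):
        if seq[:length] == seq[-length:]:
            return length
    return 0
```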

Martin

--
Martin Mokrejs, Ph.D.
Adapter/artefact removal from datasets based on the following technologies:
454 / IonTorrent / Evrogen MINT / Clontech SMART / ..., Illumina
http://www.bioinformatics.cz/software/supported-protocols/


