[mira_talk] Re: Fwd: Re: no 3'end

  • From: Martin MOKREJŠ <mmokrejs@xxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Wed, 7 Oct 2015 23:01:01 +0200

Hi Yi,

Huang Yi wrote:

Dear All,

Thanks very much for all your help. This virus is a single-stranded RNA virus.
We didn't delete anything from the 3' end, and the raw data came from a host+virus
mixture. I don't know how to set MIRA to assemble a circular or a linear
virus genome. I would appreciate it if any of you could tell me.

You still did not answer, but I presume it is a single-stranded +RNA virus. So, a very high
mutation rate "by definition", and you sequenced a population of many slightly
different viruses. I wouldn't be surprised if mira was cautious and, in de novo assembly
mode, did not assemble the ends of the virus.

Does the virus contain, say, an inverted repeat in the 5'-UTR region (i.e. at the
genomic 5'-end)? Check with a virologist what kind of structure you should
observe. It could be that mira inverted the repeat region.

How about the 3'-end? It was also "lost" in de novo assembly mode, you said? Again, how about
an inverted repeat region in the genome? That is difficult for the assembler. Look for a
"panhandle" structure in a book about virology. ;-) The repeats at the 5'- and 3'-ends
could base-pair, bingo. ;-)
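One quick way to look for such terminal complementarity in an assembled contig (an illustration only, not a MIRA feature) is to compare the first k bases with the reverse complement of the last k bases. A minimal Python sketch; the function names, k=20 and the two-mismatch tolerance are assumptions:

```python
# Sketch only: checks whether a contig's two ends are reverse-complementary,
# as expected for a terminal inverted repeat ("panhandle"). The function
# names, k and the mismatch tolerance are illustrative assumptions.

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (N stays N)."""
    return seq.translate(COMPLEMENT)[::-1]


def has_terminal_inverted_repeat(contig: str, k: int = 20, max_mismatches: int = 2) -> bool:
    """True if the first k bases match the reverse complement of the
    last k bases with at most max_mismatches differences."""
    if len(contig) < 2 * k:
        return False
    head = contig[:k].upper()
    tail_rc = reverse_complement(contig[-k:]).upper()
    mismatches = sum(1 for a, b in zip(head, tail_rc) if a != b)
    return mismatches <= max_mismatches
```

A negative result only means the outermost k bases of this particular contig do not pair; if mira dropped or inverted the ends, the repeat may simply be missing from the contig.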


In the result folder, there are two unpadded FASTA files: one is Largecontig.fasta
and the other is out.fasta. I found there were some short fragments which
map to the 3'UTR. Although there are some short gaps, I think they are also part of the
assembled genome, right?

Don't know.


One other question: when I BLAST the 5'UTR sequence, its orientation is
opposite to that of other close relatives. This 5'UTR comes from a Largecontig
which contains an ORF in the correct orientation. It is not a short fragment. Is
there any method I can use to check how mira assembled this part, to find out
whether this is a special characteristic of this virus or something wrong with the
data?

You mean the contig covering the 5'-UTR is in reverse-complement orientation
while another contig with the ORF is in plus orientation? You will have to orient the contigs
yourself if you did not have paired-end data or there were too few usable
pairs.
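Orienting contigs by hand can be done against a close relative's genome. One crude approach, sketched here as an assumption and not how MIRA works, is to count k-mers the contig shares with the relative on each strand and reverse-complement the contig when the minus strand wins. The function names and k=15 are hypothetical:

```python
# Sketch only: choose the orientation of a contig by counting k-mers it shares
# with a close relative's genome on each strand. Function names and k=15 are
# illustrative assumptions; this is not how MIRA orients contigs.

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (N stays N)."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> set:
    """All distinct k-mers of seq (upper-cased); empty if seq is shorter than k."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def orient_contig(contig: str, reference: str, k: int = 15):
    """Return (sequence, strand): the contig unchanged with "+" if it shares at
    least as many k-mers with the reference as its reverse complement does,
    otherwise the reverse complement with "-"."""
    ref_kmers = kmers(reference, k)
    rc = reverse_complement(contig)
    plus = len(kmers(contig, k) & ref_kmers)
    minus = len(kmers(rc, k) & ref_kmers)
    if minus > plus:
        return rc, "-"
    return contig, "+"
```

With a divergent strain (~95% identity) exact 15-mers still occur often enough for this to work on contigs of a few hundred bases; very short or highly divergent contigs may share no k-mers at all and are then returned unchanged.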

Or do you mean that the very same contig (no N's in between? aka a scaffold?)
had a portion in the wrong orientation? Then try to show which paired reads support
the expected orientation.
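To see which pairs support a given orientation, one option is to map the reads back onto the contig with any aligner that writes SAM, then count pairs by strand in a window. The sketch below assumes a standard inward-facing (FR) Illumina paired-end library, where pairs inside a correctly assembled region should almost all lie on opposite strands; a cluster of same-strand pairs in one window would hint at a locally inverted segment. The function name and window arguments are hypothetical:

```python
# Sketch only: count read pairs on the same vs. opposite strands from SAM text
# produced by mapping reads back onto your contig (any aligner). Assumes a
# standard inward-facing (FR) Illumina paired-end library; the function name
# and window arguments are illustrative assumptions.
from collections import Counter
from typing import Iterable, Optional


def pair_orientations(sam_lines: Iterable[str],
                      start: Optional[int] = None,
                      end: Optional[int] = None) -> Counter:
    """Classify each mapped pair once (via its first mate) by strand,
    optionally restricted to first mates whose 1-based POS lies in [start, end]."""
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        flag, rname, pos, rnext = int(fields[1]), fields[2], int(fields[3]), fields[6]
        # Require: paired (0x1), read mapped (not 0x4), mate mapped (not 0x8),
        # first in pair (0x40), primary alignment (not 0x100 / 0x800).
        if not flag & 0x1 or flag & 0x4 or flag & 0x8 or not flag & 0x40 or flag & 0x900:
            continue
        if rnext not in ("=", rname):
            continue  # mate lies on a different contig
        if (start is not None and pos < start) or (end is not None and pos > end):
            continue
        # 0x10: this read reverse-complemented; 0x20: mate reverse-complemented.
        same_strand = bool(flag & 0x10) == bool(flag & 0x20)
        counts["same_strand" if same_strand else "opposite_strand"] += 1
    return counts
```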

You also did not answer what coverage you have in the UTRs and what you have
along the CDS (and how they compare to the coverages in mapping mode).
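Comparing those coverages is simple once per-position depths are available. A minimal Python sketch, assuming tab-separated "contig, position, depth" lines such as those written by `samtools depth`; the region labels and coordinates are hypothetical and have to come from your own annotation:

```python
# Sketch only: mean per-region depth from tab-separated "contig, position,
# depth" lines such as those written by `samtools depth`. Region names and
# coordinates are hypothetical; positions missing from the input count as 0.
from typing import Dict, Iterable, List, Tuple


def mean_coverage(depth_lines: Iterable[str], contig: str,
                  regions: List[Tuple[str, int, int]]) -> Dict[str, float]:
    """Return {label: mean depth} for 1-based inclusive (label, start, end) regions."""
    depth = {}
    for line in depth_lines:
        name, pos, d = line.rstrip("\n").split("\t")[:3]
        if name == contig:
            depth[int(pos)] = int(d)
    result = {}
    for label, start, end in regions:
        total = sum(depth.get(p, 0) for p in range(start, end + 1))
        result[label] = total / (end - start + 1)
    return result
```

Running it once on depths from the de novo contig and once on depths from the reference mapping gives the side-by-side numbers asked for above.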

Martin


Thanks again!

Yi



---------- Forwarded message ----------
From: Martin MOKREJŠ <mmokrejs@xxxxxxxxx>
Date: 2015-10-04 0:51 GMT+08:00
Subject: [mira_talk] Re: no 3'end
To: mira_talk@xxxxxxxxxxxxx


Hi Yi,

Huang Yi wrote:

Hello,

Another question. I am working on a small virus genome now. The data are Illumina
reads. When I used de novo assembly, mira quickly made a strain which shares over 95%
nucleotide identity with a reference virus genome. But that de novo assembled strain
didn't contain the 3'UTR. If I used reference assembly, mira gave me a "complete"
strain, which is highly similar to the reference (~99%). Many reads map very well to the
reference genome's 3'UTR region, which is around 600 nt.


You did not say whether it is a virus with a circular or a linear genome. Also,
given that Adrian Pelin proposed inspecting SNPs ... well, first of all, is this
an RNA or a DNA virus? You speak of a 3'-UTR, so I guess it is a +RNA virus. What is
the target genome length? What coverage do you have in de novo vs. mapping modes?


I prefer to trust the de novo assembled strain because that virus was
isolated from a different host. It may not be the same as the reference. But I am
curious why mira didn't assemble the 3'UTR region. Does it mean the virus I
studied doesn't have that 600 nt long 3'UTR, or is there a parameter I didn't set
correctly? Thanks!


It could be that mira discarded the ends of reads because they were too close to
adapters. Did you use custom adapters, e.g. for reverse transcription of the
viral RNA? Did you remove them from your data?
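A rough check for leftover adapters is to count how many reads contain the start of the adapter sequence. A minimal Python sketch, assuming plain 4-line FASTQ records; the adapter is whatever you actually used (a placeholder here), and the function names and 10-base probe length are assumptions. Adapter remnants shorter than the probe at the very end of a read are not detected:

```python
# Sketch only: estimate what fraction of reads in a FASTQ file contain the
# start of a given adapter. The adapter sequence is whatever you used for
# reverse transcription or library prep (a placeholder here); min_overlap
# is an arbitrary choice, and shorter adapter remnants at read ends are missed.
from typing import Iterable, Iterator


def fastq_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequence line (2nd line) of each 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()


def adapter_fraction(lines: Iterable[str], adapter: str, min_overlap: int = 10) -> float:
    """Fraction of reads containing the first min_overlap bases of adapter."""
    probe = adapter[:min_overlap].upper()
    total = hits = 0
    for seq in fastq_sequences(lines):
        total += 1
        if probe in seq.upper():
            hits += 1
    return hits / total if total else 0.0
```

An open file handle can be passed directly as `lines`; a fraction well above zero suggests the adapters were not removed before assembly.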

I could also imagine it was a virus with a circular genome but assembled in
linear assembly mode, or that mira discarded reads which seemed to map to both
linear ends. You should provide more details.
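A circular genome assembled as a linear sequence often shows up as a contig whose end repeats its start. A minimal Python sketch of that check (an illustration under stated assumptions, not a MIRA option); the function name and the overlap limits are hypothetical:

```python
# Sketch only: report how many bases at the end of a linear contig repeat its
# start, a typical sign of a circular genome assembled as a linear sequence.
# The function name and the min/max overlap limits are illustrative assumptions.


def end_overlap(contig: str, min_overlap: int = 20, max_overlap: int = 1000) -> int:
    """Length of the longest suffix that equals a prefix (between min_overlap
    and max_overlap bases, and shorter than the contig), or 0 if none."""
    seq = contig.upper()
    limit = min(max_overlap, len(seq) - 1)
    for k in range(limit, min_overlap - 1, -1):
        if seq[-k:] == seq[:k]:
            return k
    return 0
```

Only exact overlaps are found; with a variable virus population the two copies of the junction may differ by a few bases, so a result of 0 does not rule out circularity.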

Martin

--
Martin Mokrejs, Ph.D.
Adapter/artefact removal from datasets based on the following technologies:
454 / IonTorrent / Evrogen MINT / Clontech SMART / ..., Illumina
http://www.bioinformatics.cz/software/supported-protocols/
