[mira_talk] Re: virus genome assembly

  • From: Bastien Chevreux <bach@xxxxxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Thu, 22 May 2014 21:13:55 +0200

On 22 May 2014, at 17:25 , Laurent MANCHON <lmanchon@xxxxxxxxxxxxxx> wrote:
> no no sorry, it's a mistake, it was 454 single end.
> and the problem is to detect RNA splicing between 2 conditions.
> I don't know if it is possible to do that with Mira. 
> My first idea was to use first gmap to map the reads onto genome, then run 
> cufflinks on the resulting bam .

Disclaimer: my experience with viral sequences is minimal. I once played around 
a little bit with public data, but that is just about all.

That being said, maybe your problem can be reduced a bit. You wrote “detect” … 
does that mean “counting” known variants or “discovering” new variants?

For counting known variants, I’d start with a simple test: put all splicing 
variants into one reference file and map your reads against it (with relatively 
strict settings). Of course, reads coming from regions common to all variants 
will be distributed evenly among them, but reads specific to a splicing variant 
will automatically be mapped to the variant they match best. Then it’s just a 
counting problem. You can even “improve” that detection by taking the splice 
junction +/- 30 bp as bait for “mirabait”. If the 454 data is from late 454 
machines, there is some hope that there won’t be too many sequencing errors, 
and again you’d have a simple counting problem.
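
To make the junction-bait idea concrete, here is a minimal sketch in plain 
Python. It is not mirabait and does not use any MIRA code: the function names, 
the toy exon sequences and the exact-match criterion are all assumptions made 
for illustration. Real 454 reads would need some tolerance for homopolymer 
errors, which exact matching does not give you.

```python
def revcomp(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def junction_bait(transcript, junction_pos, flank=30):
    """Sequence spanning a splice junction, +/- `flank` bp.

    `junction_pos` is the 0-based index of the first base downstream of
    the junction within the spliced transcript.
    """
    start = max(0, junction_pos - flank)
    end = min(len(transcript), junction_pos + flank)
    return transcript[start:end]


def count_junction_reads(reads, baits):
    """Count reads containing each bait exactly, on either strand."""
    counts = {name: 0 for name in baits}
    for read in reads:
        read = read.upper()
        for name, bait in baits.items():
            bait = bait.upper()
            if bait in read or revcomp(bait) in read:
                counts[name] += 1
    return counts


# Toy data (hypothetical): variant A keeps exon 2, variant B skips it.
exon1 = "ATGGCTAGCTAGGATCCGATCGATTACGCA"
exon2 = "TTGACCGGTAACCGGTTTACGGATCCAAGT"
exon3 = "CCGTATAGCGCGATATCGCGGCTAAGCTTG"
variants = {"A": exon1 + exon2 + exon3, "B": exon1 + exon3}

# A short flank is used here only because the toy reads are short.
baits = {name: junction_bait(seq, len(exon1), flank=10)
         for name, seq in variants.items()}

reads = [
    exon1[10:] + exon2[:15],           # spans the exon1/exon2 junction
    exon1[15:] + exon3[:20],           # spans the exon1/exon3 junction
    revcomp(exon1[15:] + exon3[:20]),  # same junction, opposite strand
    exon2,                             # no junction covered
]
print(count_junction_reads(reads, baits))
```

Only reads that actually cross a junction are counted, so reads from regions 
shared by all variants do not blur the per-variant numbers.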

If it’s for de-novo discovery … hmmm … maybe I’d try a mapping with strict 
settings followed by a de-novo assembly of the unmapped reads. At least for 
Illumina, that works.
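
The step between the two runs, pulling out the unmapped reads, could look 
roughly like the sketch below. It assumes the mapping result is available as 
SAM text (an assumption, your mapper may write something else) and relies only 
on the SAM FLAG bit 0x4, which marks a read as unmapped. The function names and 
the toy records are hypothetical; the de-novo assembly itself is a separate run 
and is not shown.

```python
def unmapped_reads(sam_lines):
    """Yield (name, sequence, quality) for reads flagged as unmapped."""
    for line in sam_lines:
        # Skip blank lines and SAM header lines, which start with '@'.
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        seq, qual = fields[9], fields[10]
        if flag & 0x4:  # SAM FLAG bit 0x4: segment unmapped
            yield name, seq, qual


def to_fastq(records):
    """Format (name, sequence, quality) records as FASTQ text."""
    return "".join(f"@{n}\n{s}\n+\n{q}\n" for n, s, q in records)


# Toy SAM content (hypothetical): one mapped read, one unmapped read.
sam = [
    "@HD\tVN:1.6\tSO:unsorted",
    "r1\t0\tvariantA\t1\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGA\tHHHH",
]
print(to_fastq(unmapped_reads(sam)), end="")
```

The resulting FASTQ would then be the input for the de-novo run.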

B.


--
You have received this mail because you are subscribed to the mira_talk mailing 
list. For information on how to subscribe or unsubscribe, please visit 
http://www.chevreux.org/mira_mailinglists.html
