[mira_talk] Re: awkward letters in assembled data

  • From: Peter Cock <p.j.a.cock@xxxxxxxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Thu, 29 Sep 2011 19:39:21 +0100

2011/9/29 Visam Gültekin <teutara@xxxxxxxxxxx>:
> Hi again Peter,
>
> " What does the start of the sequence data look like? "
>
> >gi|31574137|gb|GT5716.1|GT5596
>
> others go sequential..
>
> V.G.

That suggests you have FASTA sequences (are there
matching QUAL files with quality scores?).
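
As a rough way to check this, here is a minimal Python sketch
that compares the record names in a FASTA file with those in a
QUAL file. It assumes the QUAL file uses the same ">name" header
lines as the FASTA file; the function names and the example
records are made up for illustration.

```python
def record_ids(lines):
    """Return the identifiers from header lines (those starting with '>')."""
    ids = []
    for line in lines:
        if line.startswith(">"):
            # The identifier is the first whitespace-separated word.
            fields = line[1:].split()
            ids.append(fields[0] if fields else "")
    return ids


def qual_matches_fasta(fasta_lines, qual_lines):
    """True if both inputs list the same identifiers in the same order."""
    return record_ids(fasta_lines) == record_ids(qual_lines)


# Made-up example records, not taken from the data discussed here.
fasta = [">read1", "ACGT", ">read2", "GGCC"]
qual = [">read1", "30 30 28 25", ">read2", "31 30 30 29"]
print(qual_matches_fasta(fasta, qual))
```

In practice you would pass open file handles instead of lists.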

The style of name with the vertical bars suggests it is
from the NCBI, but the identifiers don't seem to match
anything sensible.
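
For reference, the traditional NCBI style for GenBank records is
gi|number|gb|accession|locus. The sketch below splits a header
line into those fields so they can be inspected; it assumes that
five-field layout, and parse_ncbi_header is a name invented here,
not part of any library.

```python
def parse_ncbi_header(header):
    """Split an NCBI-style 'gi|...|db|accession|locus' name into fields.

    Returns None if the name does not follow that pattern.
    """
    words = header.lstrip(">").split()
    if not words:
        return None
    fields = words[0].split("|")
    if len(fields) < 4 or fields[0] != "gi":
        return None
    return {
        "gi": fields[1],
        "database": fields[2],
        "accession": fields[3],
        # The locus field is sometimes empty or missing.
        "locus": fields[4] if len(fields) > 4 else "",
    }


print(parse_ncbi_header(">gi|31574137|gb|GT5716.1|GT5596"))
```

This only splits the name apart; it does not check whether the
identifiers actually exist at the NCBI.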

You probably need to talk to your data provider to try
and get (a lot) more information from them about exactly
where the data came from: what organism, what platform,
what protocol, etc. As part of this, one good question I've
learned to ask as early as possible is how the biological
sample was prepared: were specific PCR primers used?
If so, you'll need their sequences to remove the PCR
adapters from your reads before assembly.
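
To illustrate the idea only, here is a very simple Python sketch
that removes a known primer from the start of a read when it
matches exactly. The primer and read shown are invented, and
trim_primer is a hypothetical helper; real primer removal usually
also has to allow for mismatches and for primers at the other end
of the read.

```python
def trim_primer(read, primer):
    """Remove primer from the start of read if it matches exactly.

    The comparison ignores case; the read is returned unchanged
    when the primer is empty or not found at the start.
    """
    if primer and read.upper().startswith(primer.upper()):
        return read[len(primer):]
    return read


# Invented sequences for illustration.
print(trim_primer("ACGTTTGACC", "ACGT"))
```

If you have quality scores, the same number of values would need
to be removed from the start of the matching QUAL record.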

Peter
