[mira_talk] Re: Unusual mira usage inquiry
- From: Bastien Chevreux <bach@xxxxxxxxxxxx>
- To: mira_talk@xxxxxxxxxxxxx
- Date: Fri, 29 Jun 2012 00:23:32 +0200
On Jun 28, 2012, at 15:34 , Nick Hathaway wrote:
> My lab is trying to use mira for something a little unusual. We are trying
> to see what different strains of plasmodium falciparum are in a sample. We
> used PCR to amplify one gene about 300 bp in length and used 454 sequencing
> to create non-paired end reads. We want to use mira in a preliminary step to
> see if we can simply get out the different strains by being very strict with
> parameters to see if we can form some contigs of the different strains. Do
> you have any suggestions for parameters we could play around with to do this.
> Also I'm brand new at this kind of work so sorry if I'm not being clear.
Let's see. With 300bp, most of your 454 sequences should cover the gene
completely, which is good.
You probably do not want to play around too much with parameters at first:
by default, MIRA is already very sensitive and will pick up even low-abundance
variants as soon as these reach a certain threshold (see -CO:mrpg for this).
What I did not quite understand in your question was: "very strict" parameters.
What are you looking for? Is "very strict" equal to "I want to find even the
lowest-abundance variants" or is it rather "don't care too much about variants,
I want to get a full gene"?
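To make the idea of a read-count threshold concrete, here is a toy sketch in
Python. It is not MIRA's actual algorithm and says nothing about how -CO:mrpg
is implemented internally; the function name, the min_reads parameter and the
example column are all made up for illustration. It only shows the general
principle: a variant at an alignment column is reported once enough reads
support it.

```python
from collections import Counter

def supported_variants(column_bases, min_reads=2):
    """Return (base, count) pairs seen in at least min_reads reads.

    column_bases: the bases that the reads show at ONE alignment column.
    min_reads: hypothetical threshold, for illustration only.
    """
    counts = Counter(base.upper() for base in column_bases)
    # most_common() sorts by descending count
    return [(base, n) for base, n in counts.most_common() if n >= min_reads]

# Ten reads covering one column: seven A, two G, one T.
column = list("AAAAAAAGGT")
print(supported_variants(column, min_reads=2))
print(supported_variants(column, min_reads=3))
```

With the lower threshold both A and G are kept while the single T is treated
as noise; raising the threshold drops G as well, which is the kind of effect
to expect when making variant detection less sensitive.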
There are several approaches you could take, depending on what you have and
what you want. I really cannot give you a recipe, just hints, because much of
what needs to be done is determined by what the data looks like.
Do you already have the basic sequence of the gene? If yes, then performing a
simple mapping with, say, 1000 to 2000 random reads should give you an overview
of what to expect. Just to get to know the data. I quickly googled
Plasmodium, it may have splicing, right? Mapping with just 1k to 2k reads will
tell you whether or not you really have to account for that. It will also tell
you about the most frequent variations (SNPs, small indels) you might need to
take into account. Again, 1k or 2k reads will give you a broad overview.
Assuming you have SFF files, just follow this basic guide:
- extract the data to a more manageable format, see
http://mira-assembler.sourceforge.net/docs/DefinitiveGuideToMIRA.html#sect_454_preparing_the_454_data_for_mira
and the section after it.
- extract the first 1000 or 2000 sequences from the resulting FASTQ and name
the file, say, testgene_in.454.fastq
- copy the sequence of your gene of interest as FASTA to your project directory
and name it, say, testgene_backbone_in.fasta
- start a mapping with your data, a bit like this:
  mira \
    --project=testgene --job=mapping,genome,accurate,454 \
    -AS:nop=1 \
    -SB:bsn=MyReferenceGene:bft=fasta:bbq=30 \
    454_SETTINGS \
    -SB:ads=yes:dsn=MyUnknownGenes \
    >&log_assembly.txt
Convert the result to gap4 or gap5 and have a look at it. Search for the
markers MIRA will have set (SROc and MCVc tags in the assembly). Increase
-CO:mrpg if the SNP marking was too sensitive. Once the first analysis is done,
tackle the rest of the reads.
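The subsetting step from the guide above (taking 1000 to 2000 reads out of
the full FASTQ) could be sketched in Python roughly as follows. This is a
minimal sketch under some assumptions: the FASTQ uses strict 4-line records
(no wrapped sequence lines), and the input file name in the usage comment is
hypothetical. It draws a random sample with reservoir sampling instead of
taking the first N records, so the result does not depend on the order of
reads in the file.

```python
import random

def read_fastq(handle):
    """Yield FASTQ records as 4-tuples of lines (assumes 4 lines per record)."""
    while True:
        lines = [handle.readline() for _ in range(4)]
        if not lines[0]:  # end of file
            return
        yield tuple(line.rstrip("\r\n") for line in lines)

def sample_fastq(in_path, out_path, n=2000, seed=42):
    """Write up to n randomly chosen records to out_path; return how many."""
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    reservoir = []
    with open(in_path) as fin:
        for i, record in enumerate(read_fastq(fin)):
            if i < n:
                reservoir.append(record)
            else:
                # Reservoir sampling: keep record i with probability n/(i+1)
                j = rng.randint(0, i)
                if j < n:
                    reservoir[j] = record
    with open(out_path, "w") as fout:
        for record in reservoir:
            fout.write("\n".join(record) + "\n")
    return len(reservoir)

# Usage (input file name is hypothetical):
# sample_fastq("all_reads.fastq", "testgene_in.454.fastq", n=2000)
```

If the file holds fewer than n reads, all of them are written. The whole
input is read once and only the n kept records are held in memory.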
In case you do not have the gene already as reference, well, then doing a
de-novo assembly in EST mode with 1000 to 2000 reads should give you a pretty
good idea of what to expect. Start MIRA like this:
  mira \
    --project=testgene --job=denovo,est,accurate,454 \
    >&log_assembly.txt
Then convert the result to gap4 or gap5 and have a look at it in the contig
editor. Using the contig merge functions of the editor you will get an idea of
which mutations caused the reads to be put into different contigs. I would try
to reconstruct a canonical gene, probably the version with the most abundant
variants. Using this as reference, then continue with a mapping approach as
described above. In case there are wildly different splice variants, do this
for several "canonical genes".
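As a rough illustration of what "the version with the most abundant variants"
means, here is a small Python sketch that takes the most frequent base at
every column of a set of already aligned reads. It is only a sketch under
assumptions: the reads are taken to be aligned and padded to equal length
beforehand (gaps written as '-', which is an assumption), and the function
name is made up. It is not what MIRA or the gap4/gap5 editors do internally,
and in practice the reconstruction would be done by hand in the contig editor.

```python
from collections import Counter

def majority_consensus(aligned_reads):
    """Return the column-wise most frequent character of aligned reads."""
    if not aligned_reads:
        return ""
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("all aligned reads must have the same length")
    consensus = []
    for column in zip(*aligned_reads):
        # On a tie, the character seen first in the column wins
        base, _count = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)

# Five toy reads; two of them carry a single differing base each.
reads = ["ACGTAC", "ACGTAC", "ACCTAC", "ACGTTC", "ACGTAC"]
print(majority_consensus(reads))
```

Note that a per-column majority can mix variants that never occur together
on one read, so the result should be checked against the actual reads before
using it as a mapping reference.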
hth
B.