[mira_talk] Re: change stringency in Mira mapping assembly

Dear Bastien,

Thanks again for your answer!

On 01/13/2012 08:52 PM, Bastien Chevreux wrote:
On Jan 13, 2012, at 9:30 , Christoph Hahn wrote:
I ran my initial assembly like so:
mira --project=derjavinoidesmtlane8 --job=mapping,genome,accurate,solexa -MI:somrnl=0 -AS:nop=1 -SB:bsn=derjavinoides_mt:bft=gbf:bbq=30 SOLEXA_SETTINGS -CO:msr=no -GE:uti=no:tismin=120:tismax=200 -SB:dsn=derjavinoidesmtlane8

I still don't quite get the issue with the strains in my particular case: I am not trying to map reads from different strains, and I am not giving any prior information about strains via a straindata file or anything like that. So, if I am correct, with "-SB:dsn=derjavinoidesmtlane8" I specify the default strain name for all reads.

The iteration1_default* files contain nothing but @. That seems logical to me, as no reads without strain information are used. The iteration_derjavinoidesmtlane8* files contain pretty much what I was expecting: stretches of rather conserved sequence where I get a sequence, with runs of @ in between where the reference is too divergent. What I am not clear about is why I get iteration1_derjavinoides_mt* files. derjavinoides_mt was just the strain name I specified for the backbone, right? The sequence in the derjavinoides_mt* files is identical to the reference, except that some 50 @ are added before the actual start of the reference sequence and some 10 @ after its end. What is that? The iteration1_Allstrains* file seems to contain a consensus sequence, as expected.

Read overhangs. You should look at the assembly in an assembly viewer/editor.

Imagine you have a backbone which consists of exactly four bases: ACGT. To that you map a read "TTTACGTAAA". The alignment now looks like this:

backbone       ACGT
read        TTTACGTAAA

When MIRA then creates strain specific FASTA, you will get

@@@ACGT@@@

for the strain of the backbone and

TTTACGTAAA

for the strain of your read.
ok. makes sense!
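
For anyone reading this in the archive, the overhang example above can be reproduced with a few lines of Python. This is only a toy model of the padding effect, not MIRA's actual code: the function name strain_fasta, the (offset, sequence) layout and the strain labels are made up for illustration.

```python
def strain_fasta(alignment, pad="@"):
    """Pad every strain's sequence to the full width of the alignment.

    alignment maps a strain label to (offset, sequence), where offset is
    the column at which that sequence starts. Columns a strain does not
    cover are filled with the pad character.
    """
    start = min(offset for offset, _ in alignment.values())
    end = max(offset + len(seq) for offset, seq in alignment.values())
    padded = {}
    for strain, (offset, seq) in alignment.items():
        left = pad * (offset - start)
        right = pad * (end - offset - len(seq))
        padded[strain] = left + seq + right
    return padded


# The four-base backbone and the overhanging read from the example above.
# The strain labels are hypothetical.
rows = {
    "backbone_strain": (3, "ACGT"),
    "read_strain": (0, "TTTACGTAAA"),
}
result = strain_fasta(rows)
for strain, sequence in result.items():
    print(strain, sequence)
```

The backbone strain gets three pad characters on each side because the read sticks out three bases past both ends of the backbone, while the read's own strain needs no padding.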

The original file is a standard fastq file. I had actually already trimmed the reads in the file quite thoroughly before the initial mapping attempt with MIRA.

Don't do a trimming yourself, let MIRA do that. It's way better, really (there's a thread somewhere on the mailing list about this, including tests :-)

Ok. Will do!
Then I wanted to obtain the even further clipped reads from MIRA after the first mapping assembly, as you suggested, with (convert_project -f maf -t FASTQ -C readpool.maf mynewl8data). The resulting fastq file already contains these rails, though, and so do all following fastq files. The assembly only ran for one pass. See the stdout of convert_project below. I don't know what the problem is. Do you have any suggestions for what I could try to get rid of these rails?

Via a detour: make a simple list of the reads you want to keep and use "convert_project -n"

Tried that. Created a file called list.txt containing a list of reads like this:
@PCUS-319-EAS487_0005_FC:8:1:3863:1082#0/1
@PCUS-319-EAS487_0005_FC:8:1:3863:1082#0/2
@PCUS-319-EAS487_0005_FC:8:1:4858:1080#0/1
@PCUS-319-EAS487_0005_FC:8:1:5224:1081#0/1
@PCUS-319-EAS487_0005_FC:8:1:5224:1081#0/2
@PCUS-319-EAS487_0005_FC:8:1:6053:1078#0/1
@PCUS-319-EAS487_0005_FC:8:1:6053:1078#0/2
@PCUS-319-EAS487_0005_FC:8:1:7388:1078#0/1
@PCUS-319-EAS487_0005_FC:8:1:7388:1078#0/2
@PCUS-319-EAS487_0005_FC:8:1:7439:1081#0/1

Then I ran convert_project -f maf -t fastq -n list.txt readpool.maf test

The file test.fastq is empty.

I also tried to extract the list from a regular fastq file: convert_project -f fastq -n list.txt testdata.fastq test

Again, the test.fastq file stays empty. See stdout below. What am I doing wrong?

Loading from fastq, saving to: fastq
Loading data from FASTQ ...Localtime: Mon Jan 16 23:28:01 2012
Counting sequences in FASTQ file: found 25 sequences.
Localtime: Mon Jan 16 23:28:01 2012
Using calculated FASTQ quality offset: 95
Localtime: Mon Jan 16 23:28:01 2012
Loading data from FASTQ file:
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]

Done.
Loaded 25 reads, Localtime: Mon Jan 16 23:28:01 2012
 done.

Data conversion process finished, no obvious errors encountered.
SC Read:: read name issorted (1) capacity 4294967295(4) size 1
SC Read:: scf path name issorted (1) capacity 255(1) size 1
SC Read:: exp path name issorted (1) capacity 255(1) size 1
SC Read:: machine type issorted (1) capacity 255(1) size 1
SC Read:: primer issorted (1) capacity 255(1) size 1
SC Read:: strain issorted (1) capacity 255(1) size 1
SC Read:: base caller issorted (1) capacity 255(1) size 1
SC Read:: dye issorted (1) capacity 255(1) size 1
SC Read:: process status issorted (1) capacity 255(1) size 1
SC Read:: clone vector name issorted (1) capacity 65535(2) size 1
SC Read:: sequencing vector name issorted (1) capacity 65535(2) size 1
SC asped issorted (1) capacity 4294967295(4) size 1
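
One thing that may be worth checking (this is an assumption, nothing in this thread confirms it): in FASTQ, the leading "@" only marks the start of a record and is not part of the read name, whereas every entry in list.txt above starts with "@". If "convert_project -n" matches on bare read names, none of those entries would match. Below is a minimal Python sketch for building a name list without the "@". The function name fastq_read_names is hypothetical, strict four-line FASTQ records are assumed, and the sequence and quality strings in the example are placeholders, not real data.

```python
def fastq_read_names(lines):
    """Return the read names from FASTQ lines, without the leading '@'.

    Assumes strict four-line records (header, sequence, '+', qualities),
    so only every fourth line is treated as a header. Anything after the
    first whitespace in a header is treated as a comment and dropped.
    """
    names = []
    for index, line in enumerate(lines):
        if index % 4 != 0:
            continue
        header = line.rstrip("\n")
        if not header.startswith("@"):
            raise ValueError(f"line {index + 1} is not a FASTQ header: {header!r}")
        fields = header[1:].split()
        if not fields:
            raise ValueError(f"line {index + 1} has an empty read name")
        names.append(fields[0])
    return names


# Two placeholder records using read names from the list above.
records = [
    "@PCUS-319-EAS487_0005_FC:8:1:3863:1082#0/1\n", "ACGT\n", "+\n", "hhhh\n",
    "@PCUS-319-EAS487_0005_FC:8:1:3863:1082#0/2\n", "TTGA\n", "+\n", "hhhh\n",
]
names = fastq_read_names(records)
print("\n".join(names))
```

Writing the printed names to list.txt, one per line, gives a list without the record marker. Whether that fixes the empty output would still need to be tested.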

cheers,
Christoph
