Dear Bastien,

Thanks again for your answer!

On 01/13/2012 08:52 PM, Bastien Chevreux wrote:
> On Jan 13, 2012, at 9:30, Christoph Hahn wrote:
>> I ran my initial assembly like so:
>>
>> mira --project=derjavinoidesmtlane8 --job=mapping,genome,accurate,solexa -MI:somrnl=0 -AS:nop=1 -SB:bsn=derjavinoides_mt:bft=gbf:bbq=30 SOLEXA_SETTINGS -CO:msr=no -GE:uti=no:tismin=120:tismax=200 -SB:dsn=derjavinoidesmtlane8
>>
>> I still don't quite get the issue with the strains for my particular case - I am not trying to map reads from different strains. I am not giving any prior information about strains via a straindata file or anything like that. So, if I am correct, with "-SB:dsn=derjavinoidesmtlane8" I specify the default strain name for all reads.
>>
>> The iteration1_default* files just contain @ - that seems logical to me, as no reads without strain information are used. That's that. The iteration_derjavinoidesmtlane8* files pretty much contain what I was expecting: stretches of rather conserved sequence where I get a sequence, with bits of @ in between where the reference is too divergent. What I am not clear about is why I get iteration1_derjavinoides_mt* files. derjavinoides_mt was just the strain name I specified for the backbone, right? The sequence in the derjavinoides_mt* files is identical to the reference, except that some 50 @ are added before the actual start of the reference sequence and some 10 @ after its end. What is that? The iteration1_Allstrains* file seems to contain a consensus sequence, as expected.
>
> Read overhangs. You should look at the assembly in an assembly viewer/editor.
>
> Imagine you have a backbone which consists of exactly four bases: ACGT. To that you map a read "TTTACGTAAA". The alignment now looks like this:
>
> backbone    ACGT
> read     TTTACGTAAA
>
> When MIRA then creates strain-specific FASTA, you will get @@@ACGT@@@ for the strain of the backbone and TTTACGTAAA for the strain of your read.
ok. makes sense!
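
The padding in the four-base example above can be reproduced with a small sketch. This is only an illustration of the idea (ungapped alignment, one read), not MIRA's actual code; the function name `strain_views` and the `read_offset` parameter are hypothetical and were chosen for this example.

```python
def strain_views(backbone: str, read: str, read_offset: int, pad: str = "@"):
    """Return (backbone_view, read_view) over the shared alignment columns.

    read_offset is the column of the read's first base relative to the
    backbone's first base (negative if the read starts before the backbone).
    Columns a sequence does not cover are filled with the pad character.
    """
    # Leftmost and rightmost (exclusive) columns covered by either sequence.
    start = min(0, read_offset)
    end = max(len(backbone), read_offset + len(read))

    def view(seq: str, offset: int) -> str:
        left = offset - start                 # uncovered columns on the left
        right = end - (offset + len(seq))     # uncovered columns on the right
        return pad * left + seq + pad * right

    return view(backbone, 0), view(read, read_offset)


backbone_view, read_view = strain_views("ACGT", "TTTACGTAAA", -3)
print(backbone_view)  # @@@ACGT@@@
print(read_view)      # TTTACGTAAA
```

With the read overhanging the backbone by three bases on each side, the backbone's strain gets three pad characters at both ends, which matches the @ runs seen before and after the reference in the derjavinoides_mt* files.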
>> The original file is a standard FASTQ file. I had actually already trimmed the reads in the file quite thoroughly before the initial mapping attempt with MIRA.
>
> Don't do any trimming yourself, let MIRA do that. It's way better, really (there's a thread somewhere on the mailing list about this, including tests :-)
Ok. Will do!
>> Then I wanted to obtain the even further clipped reads from MIRA after the first mapping assembly, as you suggested, with (convert_project -f maf -t FASTQ -C readpool.maf mynewl8data). The resulting FASTQ file already contains these rails, and so do all subsequent FASTQ files. The assembly only ran over one pass. See the stdout of convert_project below. I don't know what the problem is. Do you have any suggestions for what I could try to get rid of these rails?
>
> Via a detour: make a simple list of the reads you want to keep and use "convert_project -n"

Tried that. I created a file called list.txt containing a list of reads like this:
@PCUS-319-EAS487_0005_FC:8:1:3863:1082#0/1
@PCUS-319-EAS487_0005_FC:8:1:3863:1082#0/2
@PCUS-319-EAS487_0005_FC:8:1:4858:1080#0/1
@PCUS-319-EAS487_0005_FC:8:1:5224:1081#0/1
@PCUS-319-EAS487_0005_FC:8:1:5224:1081#0/2
@PCUS-319-EAS487_0005_FC:8:1:6053:1078#0/1
@PCUS-319-EAS487_0005_FC:8:1:6053:1078#0/2
@PCUS-319-EAS487_0005_FC:8:1:7388:1078#0/1
@PCUS-319-EAS487_0005_FC:8:1:7388:1078#0/2
@PCUS-319-EAS487_0005_FC:8:1:7439:1081#0/1

Then I ran:

  convert_project -f maf -t fastq -n list.txt readpool.maf test

The file test.fastq is empty. I also tried to extract the list from a regular FASTQ file:

  convert_project -f fastq -n list.txt testdata.fastq test

Again, the test.fastq file stays empty. See stdout below. What am I doing wrong?
Loading from fastq, saving to: fastq
Loading data from FASTQ ...
Localtime: Mon Jan 16 23:28:01 2012
Counting sequences in FASTQ file: found 25 sequences.
Localtime: Mon Jan 16 23:28:01 2012
Using calculated FASTQ quality offset: 95
Localtime: Mon Jan 16 23:28:01 2012
Loading data from FASTQ file:
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
Done.
Loaded 25 reads,
Localtime: Mon Jan 16 23:28:01 2012
done.
Data conversion process finished, no obvious errors encountered.
SC Read:: read name issorted (1) capacity 4294967295(4) size 1
SC Read:: scf path name issorted (1) capacity 255(1) size 1
SC Read:: exp path name issorted (1) capacity 255(1) size 1
SC Read:: machine type issorted (1) capacity 255(1) size 1
SC Read:: primer issorted (1) capacity 255(1) size 1
SC Read:: strain issorted (1) capacity 255(1) size 1
SC Read:: base caller issorted (1) capacity 255(1) size 1
SC Read:: dye issorted (1) capacity 255(1) size 1
SC Read:: process status issorted (1) capacity 255(1) size 1
SC Read:: clone vector name issorted (1) capacity 65535(2) size 1
SC Read:: sequencing vector name issorted (1) capacity 65535(2) size 1
SC asped issorted (1) capacity 4294967295(4) size 1

cheers,
Christoph
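
As a cross-check for the name-list extraction described above, independent of convert_project, the same selection can be sketched in a few lines of Python. In FASTQ the leading "@" on a header line marks the start of a record; whether "convert_project -n" expects names with or without it is not confirmed anywhere in this thread, so stripping it here is an assumption that may be worth testing. The helper names `load_names` and `filter_fastq` are hypothetical, and the sketch assumes plain four-line FASTQ records without line wrapping.

```python
from typing import Iterable, Iterator, Set


def load_names(lines: Iterable[str]) -> Set[str]:
    """Collect read names, one or more per line, dropping a leading '@'."""
    names = set()
    for line in lines:
        for token in line.split():
            # Assumption: the '@' is the FASTQ record marker, not part of the name.
            names.add(token[1:] if token.startswith("@") else token)
    return names


def filter_fastq(lines: Iterable[str], keep: Set[str]) -> Iterator[str]:
    """Yield the lines of every 4-line FASTQ record whose name is in keep."""
    it = iter(line.rstrip("\n") for line in lines)
    # zip over the same iterator four times groups lines into records;
    # an incomplete trailing record is silently dropped.
    for header, seq, plus, qual in zip(it, it, it, it):
        parts = header[1:].split()
        name = parts[0] if parts else ""
        if name in keep:
            yield from (header, seq, plus, qual)


# Small in-memory demonstration with made-up records.
demo_fastq = [
    "@read_a#0/1", "ACGT", "+", "IIII",
    "@read_b#0/1", "TTTT", "+", "IIII",
]
wanted = load_names(["@read_a#0/1"])
selected = list(filter_fastq(demo_fastq, wanted))
print("\n".join(selected))
```

For real files, the lists can be replaced by open file handles, for example `load_names(open("list.txt"))` and `filter_fastq(open("testdata.fastq"), wanted)`. If this selects the expected reads while convert_project still writes an empty file, the name format in list.txt is a likely place to look.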