[mira_talk] Re: Questions on consensus sequence vs .TCS file

From: yongmei <yongmei@xxxxxx>
To: "mira_talk@xxxxxxxxxxxxx" <mira_talk@xxxxxxxxxxxxx>
Date: Thu, 20 Feb 2014 10:04:09 +0800
Thanks, Francisco. I will try the ACE file and see if there is any 
difference. 
I used text files converted from both .caf and .maf files, and got the same 
results. 
Thank you. 

Yongmei 
________________________________________
From: mira_talk-bounce@xxxxxxxxxxxxx [mira_talk-bounce@xxxxxxxxxxxxx] On Behalf 
Of Francisco Pina Martins [f.pinamartins@xxxxxxxxx]
Sent: Thursday, February 20, 2014 12:57 AM
To: mira_talk@xxxxxxxxxxxxx
Subject: [mira_talk] Re: Questions on consensus sequence vs .TCS file

Ok, so I had a chance to take a look at the files you sent me via
dropbox.

Here's what I got.

The "contig" bases in the TCS file generated by CAF_2_TCS.py are
inconsistent with the bases from the reads. This is not due to a
counting error in the script, but rather the "contig" bases in the CAF
file seem to be different from what is represented in the reads.
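
A minimal sketch of the kind of check described above: compare each "contig" base with the majority base among the reads at the same position. It does not parse CAF or TCS files; the function names, the `min_fraction` threshold and the input layout (one string of read bases per alignment column) are assumptions made only for illustration.

```python
from collections import Counter


def majority_base(column_bases):
    """Return the most frequent symbol in one alignment column and its count."""
    counts = Counter(column_bases)
    base, n = counts.most_common(1)[0]
    return base, n


def flag_mismatches(consensus, columns, min_fraction=0.9):
    """List positions where the consensus disagrees with a strong read majority.

    consensus    -- string of contig bases, one per column (hypothetical input)
    columns      -- list of strings; each holds the read bases seen in a column,
                    with '*' standing for a gap
    min_fraction -- assumed threshold: only report a column when the majority
                    symbol covers at least this share of the reads
    """
    flagged = []
    for pos, (cons, col) in enumerate(zip(consensus, columns), start=1):
        if not col:
            continue  # no read coverage in this column, nothing to compare
        base, n = majority_base(col)
        fraction = n / len(col)
        if base.upper() != cons.upper() and fraction >= min_fraction:
            flagged.append((pos, cons, base, fraction))
    return flagged


if __name__ == "__main__":
    # Toy data: the second column is almost all "A" but the contig says "G".
    toy_columns = ["AAAA", "A" * 99 + "C", "TTT*"]
    for pos, cons, base, fraction in flag_mismatches("AGT", toy_columns):
        print(pos, cons, base, round(fraction, 2))
```

Positions reported by such a check are the ones worth inspecting by eye in an alignment viewer.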

This can be checked by running "miraconvert" to convert the CAF to ACE
and visualizing the results in a program like tablet.

This seems to indicate a problem with the generation of the CAF file in
mira.

I wish I could figure out what is going on with the CAF format, but my
skills in C are very close to 0.

I will also try to compare the CAF file with a MAF file from mira and
see what I get from there. But it seems to me that the result will be
the same.
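
For a comparison like this, once the contig consensus has been extracted from each file as a plain string, a position-by-position diff is enough. This is only a sketch: it assumes both strings are already extracted and have the same padded length, and `consensus_differences` is a made-up helper name, not part of mira or CAF_2_TCS.py.

```python
def consensus_differences(cons_a, cons_b):
    """Return (position, base_a, base_b) for every 1-based position that differs.

    Both inputs are assumed to be padded consensus strings of equal length;
    the comparison ignores upper/lower case.
    """
    if len(cons_a) != len(cons_b):
        raise ValueError("consensus strings differ in length")
    return [
        (pos, a, b)
        for pos, (a, b) in enumerate(zip(cons_a, cons_b), start=1)
        if a.upper() != b.upper()
    ]


if __name__ == "__main__":
    # Toy strings standing in for a CAF-derived and a MAF-derived consensus.
    print(consensus_differences("ACGT*A", "ACGA*a"))
```

An empty list would mean both files carry the same contig bases, which would point away from the CAF writer specifically.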

Cheers,

Francisco

On 02/10/2014 08:48 AM, yongmei wrote:
> Hi, Francisco,
> I sent you a Dropbox link to a .caf file and a .tcs file that was generated by 
> your CAF_2_TCS.py, at the end of January.
> I am just wondering whether you have time to have a look.
>
> Thanks.
>
> Yongmei
>
> ________________________________________
> From: mira_talk-bounce@xxxxxxxxxxxxx [mira_talk-bounce@xxxxxxxxxxxxx] On 
> Behalf Of Francisco Pina Martins [f.pinamartins@xxxxxxxxx]
> Sent: Friday, January 24, 2014 1:09 AM
> To: mira_talk@xxxxxxxxxxxxx
> Subject: [mira_talk] Re: Questions on consensus sequence vs .TCS file
>
> Ok, so I've checked things, and here's what I've gotten:
>
> CAF_2_TCS writes the "consensus base" directly from the CAF file contig
> data.
> This means it is not likely a bug in CAF_2_TCS.py in the sums of the
> number of bases.
> I'm inclined to think it might be a bug in the way the CAF file contig
> is generated. However, I would like to confirm this, as it might
> still be a bug in the way CAF_2_TCS.py considers the positions of
> the bases.
> Can I have a (small) example CAF file where this occurs please? Just so
> I can see exactly what is happening and where.
> If the file is very large (judging by the coverage values it must be,
> even if it contains only one contig), just PM me something like a
> dropbox link instead of sending an email attachment.
>
> Thanks,
>
> Francisco
>
>
> On 23/01/14 08:54, yongmei wrote:
>> Thanks for your email.
>>
>>>> I am very confused with the result.
>>>> Below is a part of the .tcs file converted from the mira output .caf file 
>>>> using CAF_2_TCS.py
>>>> […]
>>> Ummmm, CAF_2_TCS.py is nothing I wrote. Who’s the author, have you tried to 
>>> contact him?
>>> What does MIRA tell you in its result files (FASTA) what the seemingly 
>>> wrong bases are? Are these correct? If yes, this would really be a strong 
>>> indicator for a bug in the py script and not in MIRA.
>> The bases in the result FASTA file in the mira results folder are not 
>> correct either. They are the same as in the TCS.
>> I wrote my own R program to parse the .caf file from mira's output, and got 
>> the same information as CAF_2_TCS.py.
>>
>>>> I also tried to convert the fastq to fasta file and set the default_qual = 
>>>> 50 and use the fasta file to do the same mira assembly,
>>>> and I got the perfect results.
>> I mean the .fasta in the mira output folder looks perfect, as do the TCS 
>> file and my own results.
>>
>> Since we know what our sample is, it should be very similar to the reference 
>> (maybe with a couple of mutations per 1 kb).
>> When we use the fastq to do the assembly, the result shows lots of 
>> mutations. When I checked the .caf file using CAF_2_TCS.py
>> or my own R program, I found that many of the "mutations" actually are 
>> not mutations. For example, for one base there are
>> more than 20k "A" and fewer than 100 "C", "T", "G" and "*". I expected 
>> the result for this base to be "A"; however, the result file shows
>> a "G" for this base. We had quite a lot of cases like this.
>> However, if I use the fasta file to run mira, we do not have this kind of 
>> problem at all.
>> So I wonder whether there is some problem with our .fastq file or 
>> something else.
>>
>> Thank you very much for your help.
>> Best wishes,
>> Yongmei
>> ________________________________________
>> From: mira_talk-bounce@xxxxxxxxxxxxx [mira_talk-bounce@xxxxxxxxxxxxx] On 
>> Behalf Of Bastien Chevreux [bach@xxxxxxxxxxxx]
>> Sent: Thursday, January 23, 2014 3:31 PM
>> To: mira_talk@xxxxxxxxxxxxx
>> Subject: [mira_talk] Re: Questions on consensus sequence vs .TCS file
>>
>> On 23 Jan 2014, at 3:41 , yongmei <yongmei@xxxxxx> wrote:
>>> I am very confused with the result.
>>> Below is a part of the .tcs file converted from the mira output .caf file 
>>> using CAF_2_TCS.py
>>> […]
>> Ummmm, CAF_2_TCS.py is nothing I wrote. Who’s the author, have you tried to 
>> contact him?
>>
>> What does MIRA tell you in its result files (FASTA) what the seemingly wrong 
>> bases are? Are these correct? If yes, this would really be a strong 
>> indicator for a bug in the py script and not in MIRA.
>>
>>> I also tried to convert the fastq to fasta file and set the default_qual = 
>>> 50 and use the fasta file to do the same mira assembly,
>>> and I got the perfect results.
>> I’m not sure if I understood your last sentence correctly. What result is 
>> perfect? The TCS?
>>
>> B.
>>
>>
>> --
>> You have received this mail because you are subscribed to the mira_talk 
>> mailing list. For information on how to subscribe or unsubscribe, please 
>> visit http://www.chevreux.org/mira_mailinglists.html
>
>

