[mira_talk] Re: Questions on consensus sequence vs .TCS file

From: Clayton Coffman <clayton.coffman@xxxxxxxxx>
To: mira_talk@xxxxxxxxxxxxx
Date: Wed, 19 Feb 2014 20:14:55 -0600
UNSUBSCRIBE


On Wed, Feb 19, 2014 at 8:04 PM, yongmei <yongmei@xxxxxx> wrote:

> Thanks, Francisco. I will try the ACE file to see whether there is any
> difference.
> I used text files converted from both .caf and .maf files, and got the
> same results.
> Thank you.
>
> Yongmei
> ________________________________________
> From: mira_talk-bounce@xxxxxxxxxxxxx [mira_talk-bounce@xxxxxxxxxxxxx]
> On Behalf Of Francisco Pina Martins [f.pinamartins@xxxxxxxxx]
> Sent: Thursday, February 20, 2014 12:57 AM
> To: mira_talk@xxxxxxxxxxxxx
> Subject: [mira_talk] Re: Questions on consensus sequence vs .TCS file
>
> OK, so I had a chance to take a look at the files you sent me via
> Dropbox.
>
> Here's what I got.
>
> The "contig" bases in the TCS file generated by CAF_2_TCS.py are
> inconsistent with the bases from the reads. This is not due to a
> counting error in the script; rather, the "contig" bases in the CAF
> file themselves seem to differ from what the reads show.
>
> This can be checked by running "miraconvert" to convert the CAF to ACE
> and visualizing the results in a program like tablet.
>
> This seems to indicate a problem with the generation of the CAF file in
> mira.
>
> I wish I could figure out what is going on with the CAF format, but my
> skills in C are very close to 0.
>
> I will also try to compare the CAF file with a MAF file from mira and
> see what I get from there. But it seems to me that the result will be
> the same.
>
> Cheers,
>
> Francisco
>
> On 02/10/2014 08:48 AM, yongmei wrote:
> > Hi, Francisco,
> > I sent you a Dropbox link to a .caf file and a .tcs file that was
> > generated by your CAF_2_TCS.py at the end of January.
> > I am just wondering whether you have time to have a look.
> >
> > Thanks.
> >
> > Yongmei
> >
> > ________________________________________
> > From: mira_talk-bounce@xxxxxxxxxxxxx [mira_talk-bounce@xxxxxxxxxxxxx]
> > On Behalf Of Francisco Pina Martins [f.pinamartins@xxxxxxxxx]
> > Sent: Friday, January 24, 2014 1:09 AM
> > To: mira_talk@xxxxxxxxxxxxx
> > Subject: [mira_talk] Re: Questions on consensus sequence vs .TCS file
> >
> > Ok, so I've checked things, and here's what I've gotten:
> >
> > CAF_2_TCS writes the "consensus base" directly from the CAF file contig
> > data.
> > This means a bug in how CAF_2_TCS.py sums the number of bases is
> > unlikely.
> > I'm inclined to think it might be a bug in the way the CAF file contig
> > is generated. However, I would like to confirm this, as it might
> > instead be a bug in the way CAF_2_TCS.py handles the positions of
> > the bases.
> > Can I have a (small) example CAF file where this occurs please? Just so
> > I can see exactly what is happening and where.
> > If the file is very large (judging by the coverage values it must be,
> > even if it contains only one contig), just PM me something like a
> > dropbox link instead of sending an email attachment.
> >
> > Thanks,
> >
> > Francisco
> >
> >
> > On 23/01/14 08:54, yongmei wrote:
> >> Thanks for your email.
> >>
> >>>> I am very confused with the result.
> >>>> Below is a part of the .tcs file converted from the mira output .caf
> >>>> file using CAF_2_TCS.py
> >>>> [...]
> >>> Ummmm, CAF_2_TCS.py is nothing I wrote. Who's the author, have you
> >>> tried to contact him?
> >>> What does MIRA tell you in its result files (FASTA) what the seemingly
> >>> wrong bases are? Are these correct? If yes, this would really be a strong
> >>> indicator for a bug in the py script and not in MIRA.
> >> The result FASTA file in the mira results folder is not correct for
> >> these bases either. It shows the same bases as the TCS.
> >> I wrote my own R program to parse the .caf file from mira's output, and
> >> got the same information as CAF_2_TCS.py.
> >>
> >>>> I also tried to convert the fastq to fasta file and set the
> >>>> default_qual = 50 and use the fasta file to do the same mira assembly,
> >>>> and I got the perfect results.
> >> I mean the .fasta in the mira output folder looks perfect, and so do the
> >> TCS file and my own results.
> >>
> >> Since we know what our sample is, it should be very similar to the
> >> reference (maybe with a couple of mutations in every 1 kb).
> >> When we use the fastq to do the assembly, the result shows lots of
> >> mutations. When I checked the .caf file using CAF_2_TCS.py
> >> or my own R program, I found that many of the "mutations" actually
> >> are not mutations. For example, for one base there are
> >> more than 20k "A" and fewer than 100 "C", "T", "G" and "*". I
> >> expected the result for this base to be "A"; however, the result file
> >> shows a "G" for this base. We had quite a lot of cases like this.
> >> However, if I use the fasta file to run mira, we do not have this kind
> >> of problem at all.
> >> So I am wondering whether there is some problem with our .fastq file or
> >> something else.
> >>
> >> Thank you very much for your help.
> >> Best wishes,
> >> Yongmei
> >> ________________________________________
> >> From: mira_talk-bounce@xxxxxxxxxxxxx [mira_talk-bounce@xxxxxxxxxxxxx]
> >> On Behalf Of Bastien Chevreux [bach@xxxxxxxxxxxx]
> >> Sent: Thursday, January 23, 2014 3:31 PM
> >> To: mira_talk@xxxxxxxxxxxxx
> >> Subject: [mira_talk] Re: Questions on consensus sequence vs .TCS file
> >>
> >> On 23 Jan 2014, at 3:41 , yongmei <yongmei@xxxxxx> wrote:
> >>> I am very confused with the result.
> >>> Below is a part of the .tcs file converted from the mira output .caf
> >>> file using CAF_2_TCS.py
> >>> [...]
> >> Ummmm, CAF_2_TCS.py is nothing I wrote. Who's the author, have you
> >> tried to contact him?
> >>
> >> What does MIRA tell you in its result files (FASTA) what the seemingly
> >> wrong bases are? Are these correct? If yes, this would really be a strong
> >> indicator for a bug in the py script and not in MIRA.
> >>
> >>> I also tried to convert the fastq to fasta file and set the
> >>> default_qual = 50 and use the fasta file to do the same mira assembly,
> >>> and I got the perfect results.
> >> I'm not sure if I understood your last sentence correctly. What result
> >> is perfect? The TCS?
> >>
> >> B.
> >>
> >>
> >> --
> >> You have received this mail because you are subscribed to the mira_talk
> >> mailing list. For information on how to subscribe or unsubscribe, please
> >> visit http://www.chevreux.org/mira_mailinglists.html
> >
>
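
The check described in the quoted thread (tallying the bases in each
alignment column and comparing the most frequent one with the called
consensus base) can be sketched roughly as below. This is only an
illustration, not the logic of CAF_2_TCS.py or of MIRA: the function
names are hypothetical, the counts are made-up numbers in the spirit of
the "more than 20k A" example, and a plain majority vote is an
assumption. An assembler may weigh base qualities when calling the
consensus, so a disagreement here is a reason to inspect a column, not
proof of a bug.

```python
from collections import Counter


def majority_base(column_counts):
    """Return the most frequent symbol in one alignment column.

    Ties are broken alphabetically so the result is deterministic.
    """
    best = max(column_counts.values())
    return sorted(b for b, n in column_counts.items() if n == best)[0]


def find_disagreements(consensus, columns):
    """List (position, called_base, majority_base) for every column whose
    called consensus base differs from the simple majority of the reads.

    Positions are 1-based. `columns` holds one Counter per consensus base.
    """
    mismatches = []
    for pos, (called, counts) in enumerate(zip(consensus, columns), start=1):
        expected = majority_base(counts)
        if called.upper() != expected:
            mismatches.append((pos, called, expected))
    return mismatches


# Made-up per-column counts; "*" stands for a gap in a read.
columns = [
    Counter({"A": 20000, "C": 30, "G": 40, "T": 20, "*": 5}),
    Counter({"C": 15000, "T": 12}),
]
consensus = "GC"  # hypothetical consensus with a suspicious first base

print(find_disagreements(consensus, columns))
```

Columns reported by such a check are the ones worth viewing in an
alignment viewer after converting the assembly to ACE.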