[mira_talk] Re: caf2phdball

  • From: Sven Klages <sir.svencelot@xxxxxxxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Sun, 25 Oct 2009 18:43:39 +0100

Hi Lionel,

2009/10/25 Lionel Guy <guy.lionel@xxxxxxxxx>

> Hi Sven,
>
> Thanks for your suggestions, I'll implement them soon. About the
> chromatogram names, is it enough to give the name and positions in the phd
> file? Don't you need an actual file? Does it work for


Yes, you do need an actual file, as it will be opened when you request it
in consed.
You also need approximate peak positions for the chromatogram to be
displayed correctly.
If you leave all positions at 0, all "peaks" in the chromat are squashed
together and no (normal) editing is possible.



> Sanger reads too (I guess I could link the actual abi files there,
> otherwise)?
>

Link 'em to chromat_dir? It doesn't matter where you keep your chromats, as
long as they are in chromat_dir, /tmp or consed.uncompressedChromatDirectory
when requested by consed.
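
A minimal Perl sketch of the linking idea, assuming the chromatograms live
somewhere else on disk and only need to show up in chromat_dir under their
own file names. The sub name link_chromats and the example paths are
hypothetical and are not part of consed or of the caf2phdball script.

```perl
use strict;
use warnings;
use File::Spec;

# Symlink each chromatogram file into $chromat_dir, keeping its base name.
# Returns the list of links that were newly created.
sub link_chromats {
    my ($chromat_dir, @files) = @_;
    my @linked;
    for my $file (@files) {
        # splitpath returns (volume, directories, file); keep the file part
        my $name   = (File::Spec->splitpath($file))[2];
        my $target = File::Spec->catfile($chromat_dir, $name);
        # Skip names that already exist (file or dangling link)
        next if (-e $target or -l $target);
        symlink(File::Spec->rel2abs($file), $target)
            or die "Cannot link $file to $target: $!";
        push @linked, $target;
    }
    return @linked;
}
```

It could be called as link_chromats('../chromat_dir', glob('/some/archive/*.ab1')),
where both paths are placeholders for your own layout.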

We do keep our chromatograms in (indexed) tarballs. No filesystem problem,
very fast :-)


>
> Do the numbers (15, 19) in the calculation of $peakpos come from empirical
> data?
>

Yes, I shamelessly copied it from sff2phdball.c (from consed v17, AFAIR).
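
A minimal Perl sketch of how those two numbers could be applied when writing
the per-base lines of a phd entry. The spacing (19) and offset (15) are the
values quoted in this thread; the sub names peak_position and dna_block are
made up for illustration, and the "base quality position" line layout
between BEGIN_DNA and END_DNA is an assumption about the phd format, not
code taken from sff2phdball.c.

```perl
use strict;
use warnings;

# Synthetic peak position for the base at 1-based index $i:
# evenly spaced peaks, 19 apart, the first one at position 15.
sub peak_position {
    my ($i) = @_;
    return ($i - 1) * 19 + 15;
}

# Build the DNA block of a phd entry, one line per base.
# $bases is a string of base calls, $quals an array ref of qualities.
sub dna_block {
    my ($bases, $quals) = @_;
    my @lines = ('BEGIN_DNA');
    my $i = 0;
    for my $base (split //, lc $bases) {
        push @lines, join(' ', $base, $quals->[$i], peak_position($i + 1));
        $i++;
    }
    push @lines, 'END_DNA';
    return join("\n", @lines) . "\n";
}

# Example with four made-up bases and qualities
print dna_block('ACGT', [30, 32, 28, 40]);
```

With these values the peaks land at 15, 34, 53, 72 and so on, so no two
bases share a position and the trace is not squashed.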

cheers,
Sven


>
> Cheers,
>
> Lionel
>
>
> On 25 Oct 2009, at 17:11 , Sven Klages wrote:
>
>> Hi Lionel,
>>
>> if I find some time I'll test it as well.
>>
>> We even have phd.ball files of almost 30G(!), for more or less historical
>> reasons, as consed supports loading more than one phd.ball since v17 AFAIK.
>> We started using phd.balls quite early (we also wrote our own predPhrap),
>> because we were not able to (efficiently) handle 400,000 or more single phd
>> files in a single filesystem ...
>>
>> You should think about distinguishing Sanger and 454 data, as for 454 data
>> you can probably omit the following tags:
>>
>> CALL_METHOD:
>> QUALITY_LEVELS:
>>
>> I'd also think about adding real chromatogram names to the phd.ball, as
>> only this option lets you edit single reads (and thus lets you change the
>> consensus) ...
>>
>> If you do so, you need to calculate the peak positions as well.
>> $peakpos = (++$basepos - 1)*19 + 15;
>>
>> just some thoughts,
>> Sven
>>
>> 2009/10/23 Lionel Guy <guy.lionel@xxxxxxxxx>
>> Hi there,
>>
>> Following up on yesterday's message, I changed my original idea and finally
>> parsed the mira-produced caf file to obtain a phd.ball file to be used
>> with consed. The idea behind that is to have qualities associated with
>> reads when editing mira assemblies within consed. This is very important
>> for example when merging/tearing contigs, because the consensus is
>> recalculated in a very, very bad way if you don't have qualities
>> (especially because mira doesn't physically trim the vector sequences
>> off the reads...).
>>
>> The result is a small perl script that works for my data, but I would be
>> glad if others could test it to see if it works with other types of
>> data. All comments are welcome!
>>
>> CAVEAT: this script produces huuuuge files, because it writes one line
>> per base, plus headers. For example, I have 350'000 reads and some long
>> Sanger reads, and I get a file which is 1.4 GB...
>>
>> Cheers,
>>
>> Lionel
>>
>>
> ============================================
> Lionel Guy
> Thunmansgatan 25, SE-75421 Uppsala
>
> phone: +46 (0)18 245596
> mobile: +46 (0)73 9760618
> email: guy.lionel@xxxxxxxxx
> ============================================
>
