Hi all,

The SFF file format allows for two sets of left/right clipping points - quality based and adapter based:
http://eutils.ncbi.nih.gov/Traces/trace.fcgi?cmd=show&f=formats&m=main&s=formats

In practice, SFF files straight from the Roche 454 sequencer always seem to have just quality based trimming (the adapter clipping entries are zero, meaning no trimming). Perhaps some pipelines add adapter (or vector or barcode) based trimming values to the SFF file - but I suspect they are generally left unused (by Roche), and the quality clipping values serve double duty.

After all, the left "quality" clipping point always seems to account for the tcag key sequence at the start of every 454 read. Furthermore, in a raw MID barcoded SFF file the barcodes are not considered in the clipping (i.e. they are still part of the trimmed read), but after splitting an SFF file by MID, the "quality" left clipping values ARE changed to trim off the barcode (and the adapter clipping values remain unused).

i.e. as far as I know, Roche don't use the adapter clipping values in the SFF spec, instead they use the "quality" clipping values for both kinds of clipping. This fits with what Bastien wrote on the list back on 18 May 2010,

> ... the software from
> Roche still (after 5 years) is not able to make the distinction between
> clipping by quality and clipping by adaptor, although they did think of it
> when implementing data structures.

//www.freelists.org/post/mira_talk/Mixed-454-shotgun-and-paired-end-assembly-run-time,1

The NCBI traceinfo.xml also allows for two sets of left/right clipping points - this time quality based and vector based: CLIP_QUALITY_LEFT, CLIP_QUALITY_RIGHT and CLIP_VECTOR_LEFT, CLIP_VECTOR_RIGHT.
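
For illustration, here is a minimal Python sketch of how the two sets of SFF clip values combine into a single trimmed region. It follows my reading of the SFF format description linked above (1-based positions, zero meaning "no clipping of this kind"); the function name `sff_trim_window` is made up for this example and is not part of any tool mentioned here.

```python
def sff_trim_window(num_bases, clip_qual_left, clip_qual_right,
                    clip_adapter_left, clip_adapter_right):
    """Return the (first, last) 1-based inclusive positions of the trimmed read.

    A clip value of zero means "no clipping of this kind".
    """
    # The left end is the more restrictive of the two left clips (at least base 1).
    first = max(1, clip_qual_left, clip_adapter_left)
    # The right end is the more restrictive of the two right clips,
    # where an unset (zero) clip falls back to the full read length.
    last = min(clip_qual_right or num_bases, clip_adapter_right or num_bases)
    return first, last


# Typical Roche read: only the "quality" values are set, adapter values are zero.
print(sff_trim_window(300, 5, 250, 0, 0))    # (5, 250)
# The same read if a pipeline had also recorded a 3' adapter clip at base 200.
print(sff_trim_window(300, 5, 250, 0, 200))  # (5, 200)
```

Because only the combined window is used for trimming, a file that stores everything in the "quality" fields trims identically to one that splits the values, which is why the distinction is easy to lose.
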
The traceinfo.xml fields are described here:
http://eutils.ncbi.nih.gov/Traces/trace.fcgi?cmd=show&f=rfc&m=main&s=rfc

What puzzles me is why using sff_extract on a typical SFF file (with "quality" clipping points but not adapter clipping points) produces a traceinfo.xml file with vector trimming entries and NOT quality clipping. Is this just a practical solution to the fact that SFF files from Roche seem to have just a single value for quality+adapter clipping, so this cannot simply be mapped to separate quality+vector clipping values?

The reason I am asking is that MIRA can "unclip" or "untrim" reads to try and use the ends of a read which are labelled as poor quality (MIRA option -DP:ure for use read extension). To do this, you really need to know if the clipping information is quality clipping (when it is safe to extend), or adapter/vector clipping (when you should not extend the reads). If these two types of information were in the traceinfo.xml file given to MIRA, would it take advantage of this distinction?

From looking at the manual, by default ure is on for Sanger but off for 454 and the other sequencing technologies. Is there any way to specify ure only at the start or only at the end of reads? I'm thinking that for most 454 reads applying ure to the end (3') only might be safe: left clipping will normally be for the key sequence (tcag) and any barcode (MID), which should be respected, but right clipping will usually be quality clipping and can be unclipped. Except of course where there is a 3' MID or primer sequence ;)

Peter