In some of the genome assembly projects that I'm working on, I see an uneven GC content at the beginning (first 10 bases) of my reads. Since the library preparation is expected to be unbiased, uneven GC content suggests that there is a contaminant sequence at the beginning of some of my reads.
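For anyone who wants to check the same thing on their own reads, here is a minimal Python sketch that computes the GC fraction at each of the first 10 positions. It is only an illustration, not how I produced the numbers above: the function names `per_position_gc` and `fastq_sequences` are hypothetical, and it assumes an uncompressed FASTQ file with exactly four lines per record.

```python
def per_position_gc(reads, n_positions=10):
    """Return the GC fraction at each of the first n_positions bases.

    Positions with no called bases (A, C, G, T) give NaN.
    """
    gc = [0] * n_positions
    total = [0] * n_positions
    for seq in reads:
        for i, base in enumerate(seq[:n_positions].upper()):
            if base in "ACGT":  # skip N and other ambiguity codes
                total[i] += 1
                if base in "GC":
                    gc[i] += 1
    return [g / t if t else float("nan") for g, t in zip(gc, total)]


def fastq_sequences(path):
    """Yield sequence lines from a FASTQ file (assumes 4 lines per record)."""
    with open(path) as handle:
        for line_no, line in enumerate(handle):
            if line_no % 4 == 1:  # second line of each record is the sequence
                yield line.strip()
```

Calling `per_position_gc(fastq_sequences("reads.fastq"))` (the file name is a placeholder) returns one fraction per position; positions whose GC fraction differs markedly from the rest of the read are the ones in question.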
Let's assume for the sake of argument that the contaminant sequence is a short subsequence of an adapter, but it's too short to identify by sequence similarity. Does anyone have any ideas about how to handle the problem besides trimming the 5' end? Does the option -CL:possible_vector_leftover_clip handle this type of problem?
Thanks. --Bob
Robert Bruccoleri
President, Audacious Energy, LLC and Congenomics, LLC
USA
bruc@xxxxxxx