On 25 Jun 2014, at 17:36, Offord, Victoria <vofford@xxxxxxxxx> wrote:
> […]
> Does anyone have any advice on how I can avoid this? I have copied in the
> manifest below.

Phew … this thread is one reason I like the people on this list. There can be weeks of calm, but a seemingly simple question can lead to very interesting threads.

OK, there were pretty good answers on the list already, so here are just a couple of my thoughts to recapitulate what was said and add a couple of new things:

1. Most important: get the rRNA out. If you already know the sequence: mirabait. If you don’t: try sortmerna (never used it) or reconstruct the rRNA yourself with small test assemblies (100k to 200k, look for the highest expressed transcripts, then use mirabait with sequences confirmed via BLAST at NCBI). If you find good hits at NCBI that also contain ITS (internal transcribed spacers), filter these sequences away completely as well (ITS can also reach very high expression).

2. When using Illumina, don’t do a quality clip before going into MIRA, as MIRA does a better job than simpler clippers that just look at quality.

3. With Illumina data, you can do an adapter removal prior to MIRA if you want. I don’t think it is necessary, but hey :-)

4. Never run two poly-A removers on a data set. If you did a poly-A removal prior to MIRA, you should switch off the one in MIRA, as a second pass may remove legitimate poly-A/T CDS sequences which have a fringe similarity to poly-A tails.

5. The number of megahubs in your data set is pretty small; allowing the assembly to continue via -SK:mmhr will do no harm.

6. After the above (especially the rRNA removal in point 1), maybe you want to relax the -HS:nrr setting: you’ve used 15, which is not really necessary.

B.

--
You have received this mail because you are subscribed to the mira_talk mailing list. For information on how to subscribe or unsubscribe, please visit http://www.chevreux.org/mira_mailinglists.html
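
For readers who have not used a baiting tool, the idea behind point 1 (pulling out reads that share k-mers with known rRNA or ITS sequences) can be sketched roughly as below. This is only an illustration of generic k-mer baiting, not mirabait's actual implementation or interface. The function names, the k-mer length and the `min_hits` threshold are all assumptions made for the example.

```python
# Minimal sketch of k-mer "baiting": reads sharing at least `min_hits` k-mers
# with any bait sequence (e.g. a confirmed rRNA) are separated from the rest.
# All names and defaults here are hypothetical, chosen for illustration only.

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def kmers(seq, k):
    """Return the set of all k-mers in seq (empty if seq is shorter than k)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_bait_index(bait_seqs, k):
    """Collect k-mers from both strands of every bait sequence."""
    index = set()
    for bait in bait_seqs:
        index |= kmers(bait, k)
        index |= kmers(revcomp(bait), k)
    return index


def is_baited(read, index, k, min_hits=1):
    """True if the read has at least min_hits k-mer positions found in the index."""
    read = read.upper()
    hits = sum(1 for i in range(len(read) - k + 1) if read[i:i + k] in index)
    return hits >= min_hits


def split_reads(reads, index, k, min_hits=1):
    """Split reads into (kept, removed), where removed reads matched the baits."""
    kept, removed = [], []
    for read in reads:
        (removed if is_baited(read, index, k, min_hits) else kept).append(read)
    return kept, removed
```

The reads in the `kept` list would then go into the assembly, and the `removed` reads can be inspected separately to check that only rRNA/ITS was filtered. For real data sets a dedicated tool is the better choice; the sketch holds everything in memory and does no sequence-file parsing.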