Bastien Chevreux wrote:
> On 15 Jun 2014, at 17:34 , Martin MOKREJŠ <mmokrejs@xxxxxxxxx> wrote:
>> Could you ensure me that the normalization discarded from my 454 data only
>> reads shorter than Illumina?
>
> No, I cannot as this is not how LDN works. For repetitive reads, it simply
> checks whether all the kmers it is composed of have already been taken enough
> times. If yes, it discards the read.
>
>> Can I ensure mira does not apply diginorm to 454 data, except cases when
>> eventually 454-read is a substring of a LONGER Illumina read? How can I
>> disable it for 454 technology? Can be -HS:ldn specified under 454_SETTINGS?
>
> No. You can’t. No.
>
> In this order. Though I see your point, and making this partly configurable
> seems easy enough. I’ll give it a look. But read on.
>
>> […]
>> Still, I wonder what else could I consider before doing so.
>
> Looking through the code, one thing which should work is specifying the
> readgroup for the 454 before the Illumina reads. LDN works readgroup by
> readgroup (in the order as specified by the manifest). In your current
> configuration, it first looks at all the short Illuminas and takes these.
> Which then can leave quite a number of 454 out on the street as - from a kmer
> perspective - they don’t add to the assembly and are thus discarded.
>
> Turning this around in the manifest should alleviate the problem with the 454
> reads. However, the way your assembly is set up, you will run into similar
> trouble with all the Illumina readgroups: you’ll get overproportional amount
> of reads from the readgroups early in the process.

Even so, I lost about 1/2 of the 454 reads, but that is still better than
keeping only 1/4 of them. However, in the per-individual assemblies where
the 454 data was defined as the very last readgroup, I lost just 1/5 of the
reads. I thought I could get around this by simply disabling the second round
of diginorm, but I had too much data after merging 8 normalized individuals,
and too little with 4 individuals.
;)

>
>> Or should I have better used miraSearchForSNPs instead?
>
> You can’t: that has been discontinued. Sorry.

Please update the manual, which still refers to it:
http://mira-assembler.sourceforge.net/docs/DefinitiveGuideToMIRA.html#sect1_est_est_difference_assembly_clustering

It is also unclear to me what value to use for -CO:rodirs=.

I have 12 diploid animals, each normalized by mira with many settings left at
their defaults and notably with -CO:asir=yes (I think I should have assigned
strain names to each to enable a setup closer to -CO:asir=yes; too late now).
Then I extracted all assembled or potentially useful reads from the debris and
did two assemblies with the same settings, again with normalization enabled.

Now I just want to merge the redundant contigs from all previous assembly
attempts (with each allele) together.

project = Mybug_reassembly_of_contigs
job = est,denovo,accurate
readgroup = mira
data = ../all_15_assemblies_out.unpadded.fasta
technology = Sanger
parameters = COMMON_SETTINGS -GENERAL:number_of_threads=6 -HS:ldn=no

It seems I should add -AL:egp=no -CO:asir=yes, but what should I do about
-CO:rodirs= ? Can I disable clipping of contigs? What about the minimum
overlap and the minimum read count per contig? Basically, I believe everything
"redundant" appears 3-15x.

Or is it better to use the genomic assembly mode to assemble the partial
transcript contigs?

Thanks,
Martin
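
For illustration, the readgroup-order effect described above can be reproduced with a toy model. This is only a sketch of the rule as stated in this thread (discard a read once all of its kmers have been taken enough times, processing readgroups in manifest order); it is not MIRA's code. The function names, the kmer size, the threshold `max_count` and the sequences are all made up for the example, and it skips whatever step MIRA uses to decide that a read is repetitive in the first place.

```python
from collections import Counter


def kmers(seq, k):
    """Return all overlapping kmers of length k in seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def normalize(readgroups, k=4, max_count=1):
    """Toy digital normalization (hypothetical, not MIRA's implementation).

    readgroups: list of (name, reads) tuples, in manifest order.
    A read is discarded when every one of its kmers has already been
    counted at least max_count times in previously kept reads.
    """
    seen = Counter()
    kept = {}
    for name, reads in readgroups:
        kept[name] = []
        for read in reads:
            ks = kmers(read, k)
            # Reads shorter than k have no kmers; this sketch keeps them.
            if ks and all(seen[x] >= max_count for x in ks):
                continue  # nothing new from a kmer perspective
            seen.update(ks)
            kept[name].append(read)
    return kept


# Made-up 16-base "transcript" whose 4-mers are all distinct.
transcript = "ACGTTGCAAGGCTTAC"
illumina = [transcript[0:8], transcript[4:12], transcript[8:16]]  # short reads
long454 = [transcript, transcript[:14]]                           # long reads

illumina_first = normalize([("illumina", illumina), ("454", long454)])
ls454_first = normalize([("454", long454), ("illumina", illumina)])

print({name: len(reads) for name, reads in illumina_first.items()})
print({name: len(reads) for name, reads in ls454_first.items()})
```

In this toy setup the short reads, when listed first, already cover every kmer, so both long reads are dropped. With the order reversed, one long read is kept and the short reads are dropped instead, which is the "early readgroups win" behaviour discussed in the thread.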