On Jul 16, 2011, at 18:11, 000.calabi.yau.000@xxxxxxxxxxxxxx wrote:
> I have seen that PacBio released some E.coli datasets.
> (http://www.pacbiodevnet.com/share/datasets/EColiOutbreak).

Yep, seen them too.

> I wonder what your opinion is on using reads of this length for scaffolding
> in larger genome projects.

Reads of length 3kb? Wonderful.

> I mean the error rate seems pretty high still, but with such long reads this
> shouldn't be a too big problem, or?

Unfortunately, it is. A 15% error rate means an error every 6 to 7 bases on average. That's way too much for my liking. The normal MIRA workflow would also not work well on such data, but I had plans to test a couple of things.

> So I am wondering if one would think into that direction would it make sense
> to do a MIRA hybrid assembly or would this need more specialized assembly
> routines?
> And if yes are thinking about adding support like this to MIRA?

I am. PacBio probably also realised that they would not get much momentum if many of the available tools do not work with their data. At least that is my interpretation of their recent efforts to present long CCS reads (circular consensus sequence reads), which they say have 93% accuracy.

Now, that is something MIRA can start to work with. Not perfect, but not bad either. It will need some time though.

B.

-- 
You have received this mail because you are subscribed to the mira_talk mailing list.
For information on how to subscribe or unsubscribe, please visit
http://www.chevreux.org/mira_mailinglists.html
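
A quick check of the arithmetic in the reply, as a minimal sketch: assuming errors are spread uniformly and independently along a read (a simplification that real PacBio data need not follow), the mean distance between errors is simply 1 / error rate. The function name `mean_error_spacing` is made up for this illustration and is not part of MIRA or any PacBio tool.

```python
def mean_error_spacing(error_rate: float) -> float:
    """Mean number of bases per error, assuming uniform, independent errors."""
    if not 0.0 < error_rate <= 1.0:
        raise ValueError("error_rate must be in (0, 1]")
    return 1.0 / error_rate


# Figures quoted in the mail: 15% error for raw reads, 93% accuracy for CCS reads.
raw = mean_error_spacing(0.15)
ccs = mean_error_spacing(1.0 - 0.93)

print(f"raw reads: one error every {raw:.1f} bases")  # 6.7
print(f"CCS reads: one error every {ccs:.1f} bases")  # 14.3
```

Under that assumption the CCS reads have roughly twice the error-free stretch length of the raw reads, which fits the remark that they are something an assembler can start to work with.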