[mira_talk] Re: Reassembly contigs in MIRA

  • From: Andrej Benjak <abenjak@xxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Fri, 04 Jul 2014 09:32:09 +0200

For the record, I have never worked with transcriptomes; I am only giving my rationale here.


I still think this is not the way to go. Different kmers or not, these are contigs. Contig edges are areas where the assembler found conflicts in the alignments. Mostly these are repetitive areas, homopolymers, low-complexity regions, low-coverage regions, etc. Depending on the algorithm, these areas can be handled differently (that's why you get slightly different results depending on which program or parameters you use), but simply stitching contigs together based on partial overlaps is not correct. If it were that simple, Velvet or any other assembler would have made these contigs bigger.
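
A toy illustration of the point about repeats, as a minimal Python sketch. The sequences and the `overlap` helper are made up for illustration and are not part of MIRA or Velvet. Two contigs that end in the same repeat both overlap the start of a third contig, so the overlap alone cannot tell which join is real.

```python
def overlap(left: str, right: str, min_len: int = 4) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`.

    Returns 0 if no overlap of at least `min_len` bases exists.
    """
    for size in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


# Hypothetical sequences: REPEAT stands in for a repetitive region
# at which an assembler stopped extending its contigs.
REPEAT = "ACGTACGTAC"
contig_a = "TTGGCCAATG" + REPEAT   # ends in the repeat
contig_b = "GGATCCTTAA" + REPEAT   # different origin, ends in the same repeat
contig_c = REPEAT + "CCCTTTGGGA"   # starts with the repeat

overlap_ac = overlap(contig_a, contig_c)
overlap_bc = overlap(contig_b, contig_c)
print("A -> C overlap:", overlap_ac)
print("B -> C overlap:", overlap_bc)
```

Both joins look equally good from the contig sequences alone. The raw reads (coverage, reads spanning the repeat) are what could separate the two cases, and that information is gone once only contigs are left.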

Since you are assembling a transcriptome, you could order contigs into larger scaffolds using other programs. I forget which programs these are; look for Trinity or similar tools. Basically, these programs align your contigs against a protein database of a closely related species.

My personal strategy would be to use MIRA to assemble the raw reads. Then align the contigs against a good protein dataset in order to look for possible misassemblies or other oddities and, if possible, scaffold some contigs (I would be careful with the scaffolding though: there is a risk of producing chimeric scaffolds, especially in a non-haploid and heterozygous organism, I suppose).

andrej

On 07/04/2014 08:50 AM, suganthi appalasamy wrote:
Hi there,

Thank you for your feedback. I'm trying to assemble contigs that were assembled with Velvet at different kmers. So I have a few libraries, and I'm trying to reassemble all the libraries (different kmers). Does this sound legit?


Thanks!!

~sue~


On Fri, Jul 4, 2014 at 2:41 PM, Andrej Benjak <abenjak@xxxxxxxxx> wrote:

    Hi,

    The log says that the fasta file cannot be found, so are you sure
    the path is correct?

    But this doesn't really matter; the problem is that you are feeding
    MIRA assembled contigs as if they were Illumina reads. They are
    not Illumina reads anymore: you assembled them with Velvet.
    You can assemble these contigs with MIRA using 'technology=text'
    or 'sanger', sure, but then it would be in your interest to delete
    the results right away...

    Reassembling contigs can only produce unreliable results (unless
    you are after some special cases and you know what you are doing,
    I suppose). If anything gets assembled, it's likely to be
    misassembled. This is because contigs don't carry any information
    about coverage, repeats, errors, etc.

    So I really recommend using MIRA on the raw Illumina reads and
    leaving it at that. You can always compare the results from MIRA
    with those from Velvet, see where the differences are, etc.

    Cheers,
    Andrej
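
On the path question raised above, a quick way to check whether the files named in a manifest exist before starting MIRA is sketched below in Python. This is only a sketch under stated assumptions: the function name `missing_data_files` is hypothetical, and it assumes the paths sit on the `data =` line itself, separated by whitespace. It does not handle wildcards, continuation lines, or paths that an email client wrapped onto the next line. The demo builds a throwaway manifest in a temporary directory rather than using real project paths.

```python
import os
import tempfile


def missing_data_files(manifest_path: str) -> list:
    """Return the paths from `data =` lines that do not exist on disk."""
    missing = []
    with open(manifest_path) as handle:
        for line in handle:
            key, sep, value = line.partition("=")
            if sep and key.strip() == "data":
                # Assumption: paths are whitespace-separated on this line.
                for path in value.split():
                    if not os.path.exists(path):
                        missing.append(path)
    return missing


# Demo with made-up file names in a temporary directory.
with tempfile.TemporaryDirectory() as workdir:
    present = os.path.join(workdir, "transcripts_a.fa")
    absent = os.path.join(workdir, "transcripts_b.fa")  # never created
    with open(present, "w") as handle:
        handle.write(">contig1\nACGT\n")

    manifest = os.path.join(workdir, "manifest.conf")
    with open(manifest, "w") as handle:
        handle.write("readgroup = example\n")
        handle.write(f"data = {present} {absent}\n")
        handle.write("technology = text\n")

    not_found = missing_data_files(manifest)
    expected_missing = absent

print("missing:", not_found)
```

Any path the function returns is worth comparing character by character with the file name on disk, since a single mistyped character is enough to trigger the "was not found" error.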



    On 07/04/2014 05:57 AM, suganthi appalasamy wrote:
    Hi,

    I have assembled my transcriptome reads with Velvet (Oases).
    I have a few libraries of assembled contigs, all of which I wish
    to reassemble using MIRA. Can this be done?

    I tried, and I found this error message in my log.


    Looking for files named in data ...Data '/media/data2/dell2/home/su/bin/Project__C1B1/Sample_lane3/C1B1_VELVET/_77/transcript_77.fa' was not found (neithe$
    Data '/media/data2/dell2/home/su/bin/Project_C1B1/Sample_lane3/C1B1_VELVET/_79/trancript_79.fa' was not found (neither as file nor as directory) or led to a read error

    Fatal error (may be due to problems of the input data or parameters):

    ********************************************************************************
    * Some 'data' entries named in the manifest file could not be verified, see    *
    * the log above.                                                               *
    * Maybe some files are missing, not readable or there is a typo in the         *
    * manifest file?                                                               *
    ********************************************************************************
    ->Thrown: void Manifest::loadManifestFile(string & mfilename)




    My manifest.conf looks like this:


    projectname = C1B1
    job = genome,denovo,accurate
    parameters = -GE:not=4 \ COMMON_SETTINGS -NW:cmrnl=no \
    -OUT:ora=on -OUT:rtd=no -OUT:rrot=no \ SOLEXA_SETTINGS -CL:pec=on
    readgroup = unpaireddata
    data = /media/data2/dell2/home/su/bin/Project_C1B1/Sample_lane3/C1B1_VELVET/_77/transcript_77.fa /media/data2/dell2/home/su/bin/Project_C1B1/Sample_lane3/C1B1_VELVET/_79/transcript_79.fa
    technology = solexa
    template_size = 100 100000 autorefine



    Thank you!!


