[mira_talk] Re: --highlyrepetitive vs normal

  • From: Bastien Chevreux <bach@xxxxxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Sat, 13 Oct 2012 14:22:02 +0200

On Oct 11, 2012, at 18:43 , Juan Daniel Montenegro Cabrera wrote:
> [...]
> and the Y axis is the result of adding the --highlyrepetitive flag.
> 
> The assemblies are quite similar, but for some reason there is one huge 
> missassembly in the first reference contig. The mira manual does not say much 
> about the  flag and I wanted to know what are the exact changes of this flag 
> in the assembly process.

The changes made by "highlyrepetitive" are quite varied. Once the flag is
encountered, a whole set of standard parameters gets different default values
and some others are adjusted dynamically. First, the default settings applied:

        "\n_COMMON_SETTINGS"
        "\n\t-AS:sep=yes"
        "\n\t-CO:mr=yes:mroir=false"
        "\n\t-SK:mnr=yes:nrr=10"
        "\n"
        "\n_SANGER_SETTINGS"
        "\n\t-GE:uti=yes"
        "\n\t-AS:urdcm=1.2"
        "\n\t-CL:pvlc=yes:pvcmla=10"
        "\n\t-DP:ure=yes:feip=0:leip=0"
        "\n\t-CO:emea=15:amgb=yes:amgbemc=yes:amgbnbs=yes"
        "\n"
        "\n_454_SETTINGS"
        "\n\t-DP:ure=no"
        "\n\t-AS:urdcm=1.4"
        "\n\t-CL:pvlc=yes:pvcmla=10"
        "\n\t-CO:emea=5:amgb=no"
        "\n"
        // TODO: what for IonTorrent? atm like 454
        "\n_IONTOR_SETTINGS"
        "\n\t-DP:ure=no"
        "\n\t-AS:urdcm=1.4"
        "\n\t-CL:pvlc=yes:pvcmla=10"
        "\n\t-CO:emea=5:amgb=no"
        "\n"
        "\n_PCBIOHQ_SETTINGS"
        "\n\t-DP:ure=no"
        "\n\t-AS:urdcm=1.4"
        "\n\t-CL:pvlc=yes:pvcmla=10"
        "\n\t-AL:egp=no"
        "\n\t-CO:emea=5:amgb=no"
        "\n"
        // TODO: PacBio LQ
        "\n_PCBIOLQ_SETTINGS"
        "\n\t-DP:ure=no"
        "\n\t-AS:urdcm=1.4"
        "\n\t-CL:pvlc=yes:pvcmla=10"
        "\n\t-AL:egp=no"
        "\n\t-CO:emea=5:amgb=no"
        "\n"
        "\n_SOLEXA_SETTINGS"
        "\n\t-DP:ure=no"
        "\n\t-AS:urdcm=1.9"
        "\n\t-CL:pvlc=no"
        "\n\t-CO:emea=5:amgb=yes:amgbemc=yes:amgbnbs=yes"
        // for Text "technology", nothing

After that, -AS:nop and -AS:rbl are adjusted. The code and its output should
speak for themselves:

      if(tmpactpar->mp_assembly_params.as_numpasses<6){
        cout << "  - increassing number of passes (-AS:nop) ";
        tmpactpar->mp_assembly_params.as_numpasses++;
        if(tmpactpar->mp_assembly_params.as_numpasses<=5){
          tmpactpar->mp_assembly_params.as_numpasses++;
          cout << "by two.\n";
        }else{
          cout << "by one.\n";
        }
      }
      if(tmpactpar->mp_assembly_params.as_numrmbbreakloops<3){
        cout << "  - increasing maximum of RMB break loop (-AS:rbl).\n";
        tmpactpar->mp_assembly_params.as_numrmbbreakloops++;
      }
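
To make the effect of that snippet concrete, here is a minimal standalone
sketch that reproduces only its arithmetic outside of MIRA. The struct and
function names are made up for illustration and are not MIRA's actual types;
only the two field names are taken from the snippet above, and the log output
is left out.

```cpp
// Hypothetical stand-in for MIRA's assembly parameters: only the two
// fields touched by the quoted snippet are modelled here.
struct AssemblyParams {
  int as_numpasses;         // corresponds to -AS:nop
  int as_numrmbbreakloops;  // corresponds to -AS:rbl
};

// Applies the same adjustments as the quoted snippet and returns the result.
AssemblyParams adjustForHighlyRepetitive(AssemblyParams p) {
  if (p.as_numpasses < 6) {
    p.as_numpasses++;              // always at least one more pass
    if (p.as_numpasses <= 5) {
      p.as_numpasses++;            // a second one if still at 5 or below
    }
  }
  if (p.as_numrmbbreakloops < 3) {
    p.as_numrmbbreakloops++;       // one more RMB break loop, up to 3
  }
  return p;
}
```

With this logic, a starting -AS:nop of 1 to 4 goes up by two, a value of 5
goes up by one to 6, and 6 or more is left alone. -AS:rbl goes up by one only
if it was below 3.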


B.
