[codeface] Re: AW: Re: AW: Re: AW: Re: AW: Re: Preparing time series data - sloccount analysis

  • From: Wolfgang Mauerer <wm@xxxxxxxxxxxxxxxx>
  • To: codeface@xxxxxxxxxxxxx
  • Date: Wed, 04 Mar 2015 20:44:59 +0100


On 04/03/2015 at 15:58, Matthias Gemmer wrote:
>> From: codeface-bounce@xxxxxxxxxxxxx <codeface-bounce@xxxxxxxxxxxxx> on
>> behalf of Wolfgang Mauerer <wm@xxxxxxxxxxxxxxxx>
>> Sent: Tuesday, 3 March 2015 20:32
>> To: codeface@xxxxxxxxxxxxx
>> Subject: [codeface] Re: AW: Re: AW: Re: AW: Re: Preparing time series data - 
>> sloccount analysis
>>
>> On 03/03/2015 at 10:40, Matthias Gemmer wrote:
>>>> From: codeface-bounce@xxxxxxxxxxxxx <codeface-bounce@xxxxxxxxxxxxx> on
>>>> behalf of Wolfgang Mauerer <wm@xxxxxxxxxxxxxxxx>
>>>> Sent: Monday, 2 March 2015 18:28
>>>> To: codeface@xxxxxxxxxxxxx; Mitchell Joblin
>>>> Subject: [codeface] Re: AW: Re: AW: Re: Preparing time series data - 
>>>> sloccount analysis
>>>>
>>>> On 02/03/2015 at 18:09, Matthias Gemmer wrote:
>>>>>> From: codeface-bounce@xxxxxxxxxxxxx <codeface-bounce@xxxxxxxxxxxxx> on
>>>>>> behalf of Wolfgang Mauerer <wm@xxxxxxxxxxxxxxxx>
>>>>>> Sent: Monday, 2 March 2015 17:22
>>>>>> To: codeface@xxxxxxxxxxxxx
>>>>>> Subject: [codeface] Re: AW: Re: Preparing time series data - sloccount 
>>>>>> analysis
>>>>>>
>>>>>> On 02/03/2015 at 17:20, Wolfgang Mauerer wrote:
>>>>>>>> Enter an environment number, or 0 to exit  Selection: 9
>>>>>>>> Browsing in the environment with call:
>>>>>>>>    add.sloccount.ts(conf, sloccount.plot.id, commit.date, res)
>>>>>>>> Called from: debugger.look(ind)
>>>>>>>> Browse[1]> ls()
>>>>>>>> [1] "commit.date" "conf"        "plot.id"     "values"
>>>>>>>> Browse[1]> print(commit.date)
>>>>>>>> [1] "2006-09-05 20:20:16 UTC"
>>>>>>>> Browse[1]> print(values)
>>>>>>>> $lang.info
>>>>>>>>   lang lines  fraction
>>>>>>>> 1  xml    98 0.5833333
>>>>>>>> 2 perl    70 0.4166667
>>>>>>>>
>>>>>>>> $metrics
>>>>>>>>   person.months total.cost schedule.months avg.devel
>>>>>>>> 1          0.37       4426            1.71      0.22
>>>>>>>
>>>>>> that looks alright -- is plot.id properly assigned?
>>>>>
>>>>> Browse[1]> print(plot.id)
>>>>> numeric(0)
>>>> so that's the culprit... There is no valid plot ID for the time
>>>> series in the database. Can you please check that an appropriate
>>>> table is available in the database?
>>>>
>>>
>>> There is a table called timeseries with the column plotId.
>>> mysql> DESCRIBE timeseries;
>>> +--------------+------------+------+-----+---------+-------+
>>> | Field        | Type       | Null | Key | Default | Extra |
>>> +--------------+------------+------+-----+---------+-------+
>>> | plotId       | bigint(20) | NO   | MUL | NULL    |       |
>>> | time         | datetime   | NO   |     | NULL    |       |
>>> | value        | double     | NO   |     | NULL    |       |
>>> | value_scaled | double     | YES  |     | NULL    |       |
>>> +--------------+------------+------+-----+---------+-------+
>>> 4 rows in set (0.02 sec)
>>>
>>> The table is also filled with data. The table contains datasets for
>>> plotId=5, plotId=6, plotId=7 and plotId=8.
>>>
>>>>
>>>> Which values do sloccount.plot.id (and understand.plot.id) have
>>>> in do.complexity.analysis (Frame 3/4)?
>>>>
>>>
>>> The values for sloccount.plot.id and understand.plot.id are obviously
>>> invalid.
>>>
>>> Browse[1]> print(sloccount.plot.id)
>>> numeric(0)
>>> Browse[1]> print(understand.plot.id)
>>> numeric(0)
>>
>> it was not so obvious to me; I was trying to ensure that
>> parallelisation did not introduce any issues here. But your observation
>> clarified that this is not the case.
>>
>> Since the error seems to be deterministically reproducible at your
>> site, can you debug around the creation of the index (for instance by
>> printing out what's going on; alternatively, you could also use the
>> built-in debugger)?
>>
> 
> In the file codeface/R/complexity.r:
> 
> Assignment of sloccount.plot.id and understand.plot.id:
>   ## Obtain plot IDs for the sloccount and understand raw time series before
>   ## parallel processing commences to avoid race conditions
>   sloccount.plot.id <- get.or.create.plot.id(conf, "sloccount")
>   understand.plot.id <- get.or.create.plot.id(conf, "understand_raw")
>       -> sloccount.plot.id and understand.plot.id have the value "x".
>          Are these values plausible? Or shall I have a closer look at the
>          function 'get.or.create.plot.id'?

Since the SQL specification for the plot ID is

`id` BIGINT NOT NULL AUTO_INCREMENT

the column can only hold integers, so the value "x" should be impossible.
Can you please query your database to see what value is stored there?
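
As a cross-check on the R side, a small guard along the following lines
would reject both the "x" and the numeric(0) seen earlier in this thread
before add.sloccount.ts is called. This is only a sketch:
is.valid.plot.id is a hypothetical helper, not an existing codeface
function, and it assumes that a valid ID is a single positive whole
number, as an AUTO_INCREMENT column would normally produce.

```r
## Hypothetical helper (not part of codeface): check whether a value can
## serve as a plot ID, i.e. whether it is a single positive whole number.
is.valid.plot.id <- function(plot.id) {
  is.numeric(plot.id) && length(plot.id) == 1 &&
    !is.na(plot.id) && plot.id > 0 && plot.id == round(plot.id)
}

is.valid.plot.id(5)           # TRUE: a plausible AUTO_INCREMENT value
is.valid.plot.id(numeric(0))  # FALSE: empty result, as seen in the debugger
is.valid.plot.id("x")         # FALSE: a character value, as reported above
```

Calling stopifnot(is.valid.plot.id(sloccount.plot.id)) right after the
assignment would then stop the run at the point where the ID goes wrong
instead of inside the time series insertion.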

Best regards, Wolfgang Mauerer
> 
> The part where the sloccount analysis is performed:
>   if (conf$sloccount == TRUE) {
>       loginfo(str_c("Performing sloccount analysis for ", commit.hash, "\n"),
>                       logger="complexity")
>       res <- do.sloccount.analysis(code.dir)
>       
>       # This call fails
>       add.sloccount.ts(conf, sloccount.plot.id, commit.date, res)
>   }
>   logdevinfo("Finished analysing sample ", i, "\n", logger="complexity")
>   
>       -> After 'res <- do.sloccount.analysis(code.dir)', res contains:
>
>            lang.info.lang lang.info.lines lang.info.fraction
>          1            xml              98  0.583333333333333
>          2           perl              70  0.416666666666667
>
>            metrics.person.months metrics.total.cost metrics.schedule.months
>          1                  0.37               4426                    1.71
>          2                  0.37               4426                    1.71
>
>            metrics.avg.devel
>          1              0.22
>          2              0.22
>
> Best regards, Matthias Gemmer
> 
>> Best regards, Wolfgang Mauerer
> 
