[codeface] AW: Re: AW: Re: AW: Re: AW: Re: AW: Re: AW: Re: AW: Re: AW: Re: Preparing time series data - sloccount analysis

  • From: Matthias Gemmer <matthias.gemmer@xxxxxxxxxxxxxxxxxxxx>
  • To: "codeface@xxxxxxxxxxxxx" <codeface@xxxxxxxxxxxxx>
  • Date: Thu, 5 Mar 2015 14:17:28 +0000

>From: codeface-bounce@xxxxxxxxxxxxx <codeface-bounce@xxxxxxxxxxxxx> on behalf
>of Mitchell Joblin <joblin.m@xxxxxxxxx>
>Sent: Thursday, 5 March 2015 15:05
>To: codeface@xxxxxxxxxxxxx
>Subject: [codeface] Re: AW: Re: AW: Re: AW: Re: AW: Re: AW: Re: AW: Re: AW:
>Re: Preparing time series data - sloccount analysis
>
>On Thu, Mar 5, 2015 at 1:50 PM, Wolfgang Mauerer <wm@xxxxxxxxxxxxxxxx> wrote:
>>
>>
>> On 05/03/2015 at 14:19, Matthias Gemmer wrote:
>>>> From: codeface-bounce@xxxxxxxxxxxxx <codeface-bounce@xxxxxxxxxxxxx> on
>>>> behalf of Mitchell Joblin <joblin.m@xxxxxxxxx>
>>>> Sent: Thursday, 5 March 2015 13:14
>>>> To: Wolfgang Mauerer
>>>> Cc: codeface@xxxxxxxxxxxxx
>>>> Subject: [codeface] Re: AW: Re: AW: Re: AW: Re: AW: Re: AW: Re: AW: Re:
>>>> Preparing time series data - sloccount analysis
>>>>
>>>> On Thu, Mar 5, 2015 at 11:12 AM, Wolfgang Mauerer
>>>> <wolfgang.mauerer@xxxxxxxxxxx> wrote:
>>>>> On 05.03.2015 12:04, Matthias Gemmer wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Browse[1]> print(plot.id)
>>>>>>>>>>>>>> numeric(0)
>>>>>>>>>>>>>
>>>>>>>>>>>>> so that's the culprit... There is no valid plot ID for the time
>>>>>>>>>>>>> series in the database. Can you please check that an appropriate
>>>>>>>>>>>>> table is available in the database?
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> There is a table called timeseries with the column plotId.
>>>>>>>>>>>> mysql> DESCRIBE timeseries;
>>>>>>>>>>>> +--------------+------------+------+-----+---------+-------+
>>>>>>>>>>>> | Field        | Type       | Null | Key | Default | Extra |
>>>>>>>>>>>> +--------------+------------+------+-----+---------+-------+
>>>>>>>>>>>> | plotId       | bigint(20) | NO   | MUL | NULL    |       |
>>>>>>>>>>>> | time         | datetime   | NO   |     | NULL    |       |
>>>>>>>>>>>> | value        | double     | NO   |     | NULL    |       |
>>>>>>>>>>>> | value_scaled | double     | YES  |     | NULL    |       |
>>>>>>>>>>>> +--------------+------------+------+-----+---------+-------+
>>>>>>>>>>>> 4 rows in set (0.02 sec)
>>>>>>>>>>>>
>>>>>>>>>>>> The table is also filled with data. The table contains datasets for
>>>>>>>>>>>> plotId=5, plotId=6, plotId=7 and plotId=8.
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Which values do sloccount.plot.id (and understand.plot.id) have
>>>>>>>>>>>>> in do.complexity.analysis (Frame 3/4)?
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> The values for sloccount.plot.id and understand.plot.id are
>>>>>>>>>>>> obviously invalid.
>>>>>>>>>>>>
>>>>>>>>>>>> Browse[1]> print(sloccount.plot.id)
>>>>>>>>>>>> numeric(0)
>>>>>>>>>>>> Browse[1]> print(understand.plot.id)
>>>>>>>>>>>> numeric(0)
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> It was not so obvious to me; I was trying to ensure that
>>>>>>>>>>> parallelisation did not introduce any issues here. But your
>>>>>>>>>>> observation clarified that this is not the case.
>>>>>>>>>>>
>>>>>>>>>>> Since the error seems to be deterministically reproducible at your
>>>>>>>>>>> site, can you debug around the creation of the index (for instance
>>>>>>>>>>> by printing out what's going on; alternatively, you could also use
>>>>>>>>>>> the built-in debugger)?
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> In the file codeface/R/complexity.r:
>>>>>>>>>>
>>>>>>>>>> Assignment of sloccount.plot.id and understand.plot.id:
>>>>>>>>>>    ## Obtain plot IDs for the sloccount and understand raw time series
>>>>>>>>>>    ## before parallel processing commences to avoid race conditions
>>>>>>>>>>    sloccount.plot.id <- get.or.create.plot.id(conf, "sloccount")
>>>>>>>>>>    understand.plot.id <- get.or.create.plot.id(conf, "understand_raw")
>>>>>>>>>>        -> sloccount.plot.id and understand.plot.id have the value "x".
>>>>>>>>>>           Are these values feasible? Or shall I have a closer look
>>>>>>>>>>           at the function 'get.or.create.plot.id'?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> since the SQL specification for the plot ID is
>>>>>>>>>
>>>>>>>>> `id` BIGINT NOT NULL AUTO_INCREMENT
>>>>>>>>>
>>>>>>>>> the value "x" seems quite impossible. Can you please query your
>>>>>>>>> database to see what value is stored there?
>>>>>>>>>
>>>>>>>>
>>>>>>>> The table is empty.
>>>>>>>> mysql> select * from plots;
>>>>>>>> Empty set (0.01 sec)
>>>>>>>
>>>>>>>
>>>>>>> Please try to run the other SQL statements produced by the code to see
>>>>>>> why no entry is created. get.or.create.plot.id() inserts a new entry
>>>>>>> into the table if no ID for a desired plot is available.
>>>>>>
>>>>>>
>>>>>> The branch which creates a plot ID is not entered. The condition
>>>>>> 'length(res) < 1' is not satisfied in either case (sloccount.plot.id
>>>>>> and understand.plot.id).
>>>>>>
>>>>>> For sloccount.plot.id <- get.or.create.plot.id(conf, "sloccount"):
>>>>>>    res <- dbGetQuery(con, str_c(query, ";"))
>>>>>>    # str_c(query, ";"):
>>>>>>    #   SELECT id FROM plots WHERE name='sloccount' AND projectId=2;
>>>>>>    # res: "id"
>>>>>>    # length(res): 1
>>>>>>    if (length(res) < 1) {
>>>>>>      ## Plot ID is not assigned yet, create one
>>>>>>      res <- get.clear.plot.id.con(con, pid, plot.name, range.id)
>>>>>>    } else {
>>>>>>      res <- res$id
>>>>>>    }
>>>>>>    # res: "x"
>>>>>
>>>>>
>>>>> @Mitchell, could you try to reproduce this? I don't see why a result
>>>>> with non-zero length should be returned from the SQL query if the
>>>>> database is empty.
>>>>
>>>> The SQL query probably returns a data frame and length(..) called on a
>>>> data frame does not return the number of rows. To get the number of
>>>> rows of a data frame you should be using nrow(..) instead of
>>>> length(..).
>>>>
>>>> --Mitchell
>>>>
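
The difference between length(..) and nrow(..) described above can be reproduced without a database. The following is a minimal sketch, assuming (as the debugging output in this thread suggests) that the query returns a data frame with one column named id and zero rows when nothing matches. The helper is.empty.result is hypothetical and not part of codeface.

```r
## Stand-in for the result of a query that matches no row: a data frame
## with the selected column "id" but zero rows (assumption, see above).
res <- data.frame(id = numeric(0))

length(res)   # counts columns, so the check 'length(res) < 1' never fires
nrow(res)     # counts rows, which is what the check is meant to test
res$id        # zero-length vector, the invalid "plot ID" seen in the debugger

## Hypothetical helper (not part of codeface): TRUE if no row was returned
is.empty.result <- function(res) {
  nrow(res) < 1
}

is.empty.result(res)                        # empty result
is.empty.result(data.frame(id = c(5, 6)))   # result with two rows
```

Because a data frame is a list of columns, length(..) reports the number of columns regardless of how many rows there are, which is why the branch that creates a new plot ID was skipped.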
>>>
>>> That worked for me.
>>> After replacing 'length' with 'nrow' a new plot ID is created!
>>
>> The following patch should fix this for good then:
>>

After patching there is still a problem on my side.
The call to 'dbWriteTable' in the function add.sloccount.ts still produces (as
before) the message
Unknown column 'person.months' in 'field list'.

-- Matthias Gemmer
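
One way to narrow down an "Unknown column" error of this kind is to compare the column names of the data frame being written with the fields of the target table. The sketch below uses base R only; the data frame contents and the vector table.fields are hypothetical stand-ins (in practice the field names could come from DESCRIBE in mysql or from DBI's dbListFields), and the actual schema used by codeface may differ.

```r
## Hypothetical data frame to be written (column names are illustrative)
df <- data.frame(time = as.POSIXct("2015-03-05 14:17:28", tz = "UTC"),
                 person.months = 1.5,
                 total.cost = 100)

## Hypothetical field names of the target table (assumption, not the
## real codeface schema)
table.fields <- c("time", "person_months", "total_cost")

## Columns present in the data frame but unknown to the table
missing.in.table <- setdiff(names(df), table.fields)
print(missing.in.table)
```

Any name reported here would trigger the error on insert, so the result shows whether the table definition or the data frame needs to be adjusted.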

>>> diff --git a/codeface/R/db.r b/codeface/R/db.r
>>> index db53811..32da240 100644
>>> --- a/codeface/R/db.r
>>> +++ b/codeface/R/db.r
>>> @@ -59,10 +59,10 @@ get.clear.plot.id.con <- function(con, pid, plot.name, range.id=NULL,
>>>
>>>    res <- dbGetQuery(con, str_c("SELECT id", query))
>>>
>>> -  if (length(res) != 1) {
>>> +  if (nrow(res) != 1) {
>>>      stop("Internal error: Plot ", plot.name, " appears multiple times in DB",
>>>           "for project ID ", pid)
>>> -  }
>>> +}
>>>
>>>    return(res$id)
>>>  }
>>> @@ -81,7 +81,7 @@ get.plot.id.con <- function(con, pid, plot.name, range.id=NULL) {
>>>      query <- str_c(query, " AND releaseRangeId=", range.id)
>>>    }
>>>    res <- dbGetQuery(con, str_c("SELECT id", query))
>>> -  if (length(res) < 1) {
>>> +  if (nrow(res) < 1) {
>>>      stop("Internal error: Plot ", plot.name, " not found in DB",
>>>           " for project ID ", pid)
>>>    }
>>> @@ -104,7 +104,7 @@ get.or.create.plot.id.con <- function(con, pid, plot.name, range.id=NULL) {
>>>    }
>>>    res <- dbGetQuery(con, str_c(query, ";"))
>>>
>>> -  if (length(res) < 1) {
>>> +  if (nrow(res) < 1) {
>>>      ## Plot ID is not assigned yet, create one
>>>      res <- get.clear.plot.id.con(con, pid, plot.name, range.id)
>>>    } else {
>>> @@ -125,7 +125,7 @@ get.revision.id <- function(conf, tag) {
>>>                      str_c("SELECT id FROM release_timeline WHERE projectId=",
>>>                            conf$pid, " AND tag=", sq(tag), " AND type='release'"))
>>>
>>> -  if (length(res) > 1) {
>>> +  if (nrow(res) > 1) {
>>>      stop("Internal error: Revision if for tag ", tag, " (project ", conf$project,
>>>           ") appears multiple times in DB!")
>>>    }
>>
>> However, I don't really understand why it worked before in this case...
>> @Mitchell: I'll push the patch to master unless you object, but can you
>> please try to understand why we did not run into problems earlier?
>
>Please push to master after running the test suite, otherwise please
>make a pull request and then I will test it before merging.
>
>Thanks,
>
>Mitchell
>
>>
>> Thanks, Wolfgang
>>
