[codeface] AW: Re: AW: Re: AW: Re: AW: Re: AW: Re: AW: Re: AW: Re: AW: Re: AW: Re: Preparing time series data - sloccount analysis

  • From: Matthias Gemmer <matthias.gemmer@xxxxxxxxxxxxxxxxxxxx>
  • To: "codeface@xxxxxxxxxxxxx" <codeface@xxxxxxxxxxxxx>
  • Date: Thu, 5 Mar 2015 15:24:44 +0000

>Von: codeface-bounce@xxxxxxxxxxxxx <codeface-bounce@xxxxxxxxxxxxx> im Auftrag 
>von Wolfgang Mauerer <wolfgang.mauerer@xxxxxxxxxxx>
>Gesendet: Donnerstag, 5. März 2015 15:20
>An: codeface@xxxxxxxxxxxxx
>Betreff: [codeface] Re: AW: Re: AW: Re: AW: Re: AW: Re: AW: Re: AW: Re: AW: 
>Re: AW: Re: Preparing time series data - sloccount analysis
>
>On 05.03.2015 15:17, Matthias Gemmer wrote:
>>> Von: codeface-bounce@xxxxxxxxxxxxx <codeface-bounce@xxxxxxxxxxxxx> im 
>>> Auftrag von Mitchell Joblin <joblin.m@xxxxxxxxx>
>>> Gesendet: Donnerstag, 5. März 2015 15:05
>>> An: codeface@xxxxxxxxxxxxx
>>> Betreff: [codeface] Re: AW: Re: AW: Re: AW: Re: AW: Re: AW: Re: AW: Re: AW: 
>>> Re: Preparing time series data - sloccount analysis
>>>
>>> On Thu, Mar 5, 2015 at 1:50 PM, Wolfgang Mauerer <wm@xxxxxxxxxxxxxxxx> 
>>> wrote:
>>>>
>>>>
>>>> Am 05/03/2015 um 14:19 schrieb Matthias Gemmer:
>>>>>> Von: codeface-bounce@xxxxxxxxxxxxx <codeface-bounce@xxxxxxxxxxxxx> im 
>>>>>> Auftrag von Mitchell Joblin <joblin.m@xxxxxxxxx>
>>>>>> Gesendet: Donnerstag, 5. März 2015 13:14
>>>>>> An: Wolfgang Mauerer
>>>>>> Cc: codeface@xxxxxxxxxxxxx
>>>>>> Betreff: [codeface] Re: AW: Re: AW: Re: AW: Re: AW: Re: AW: Re: AW: Re: 
>>>>>> Preparing time series data - sloccount analysis
>>>>>>
>>>>>> On Thu, Mar 5, 2015 at 11:12 AM, Wolfgang Mauerer
>>>>>> <wolfgang.mauerer@xxxxxxxxxxx> wrote:
>>>>>>> On 05.03.2015 12:04, Matthias Gemmer wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Browse[1]> print(plot.id)
>>>>>>>>>>>>>>>> numeric(0)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> so that's the culprit... There is no valid plot ID for the time
>>>>>>>>>>>>>>> series in the database. Can you please check that an appropriate
>>>>>>>>>>>>>>> table is available in the database?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> There is a table called timeseries with the column plotId.
>>>>>>>>>>>>>> mysql> DESCRIBE timeseries;
>>>>>>>>>>>>>> +--------------+------------+------+-----+---------+-------+
>>>>>>>>>>>>>> | Field        | Type       | Null | Key | Default | Extra |
>>>>>>>>>>>>>> +--------------+------------+------+-----+---------+-------+
>>>>>>>>>>>>>> | plotId       | bigint(20) | NO   | MUL | NULL    |       |
>>>>>>>>>>>>>> | time         | datetime   | NO   |     | NULL    |       |
>>>>>>>>>>>>>> | value        | double     | NO   |     | NULL    |       |
>>>>>>>>>>>>>> | value_scaled | double     | YES  |     | NULL    |       |
>>>>>>>>>>>>>> +--------------+------------+------+-----+---------+-------+
>>>>>>>>>>>>>> 4 rows in set (0.02 sec)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The table is also filled with data. The table contains datasets 
>>>>>>>>>>>>>> for
>>>>>>>>>>>>>> plotId=5, plotId=6, plotId=7 and plotId=8.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Which values do sloccount.plot.id (and understand.plot.id) have
>>>>>>>>>>>>>>> in do.complexity.analysis (Frame 3/4)?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The values for sloccount.plot.id and understand.plot.id are
>>>>>>>>>>>>>> obviously
>>>>>>>>>>>>>> invalid.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Browse[1]> print(sloccount.plot.id)
>>>>>>>>>>>>>> numeric(0)
>>>>>>>>>>>>>> Browse[1]> print(understand.plot.id)
>>>>>>>>>>>>>> numeric(0)
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> it was not so obvious to me; I was trying to ensure that
>>>>>>>>>>>>> parallelisation did not introduce any issues here. But your
>>>>>>>>>>>>> observation
>>>>>>>>>>>>> clarified that this is not the case.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Since the error seems to be deterministically reproducible at your
>>>>>>>>>>>>> site, can you debug around the creation of the index (for 
>>>>>>>>>>>>> instance by
>>>>>>>>>>>>> printing out what's going on; alternatively, you could also use 
>>>>>>>>>>>>> the
>>>>>>>>>>>>> built-in debugger)?
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> In the file codeface/R/complexity.r:
>>>>>>>>>>>>
>>>>>>>>>>>> Assignment of sloccount.plot.id and understand.plot.id:
>>>>>>>>>>>>     ## Obtain a plot IDs for the sloccount and understand raw time
>>>>>>>>>>>> series before
>>>>>>>>>>>>     ## parallel processing commences to avoid race conditions
>>>>>>>>>>>>     sloccount.plot.id <- get.or.create.plot.id(conf, "sloccount")
>>>>>>>>>>>>     understand.plot.id <- get.or.create.plot.id(conf, 
>>>>>>>>>>>> "understand_raw")
>>>>>>>>>>>>         -> sloccount.plot.id and understand.plot.id have the value 
>>>>>>>>>>>> "x".
>>>>>>>>>>>>                Are these values feasible? Or Shall I have a closer 
>>>>>>>>>>>> look
>>>>>>>>>>>> at the function 'get.or.create.plot.id'?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> since the SQL specification for the plot ID is
>>>>>>>>>>>
>>>>>>>>>>> `id` BIGINT NOT NULL AUTO_INCREMENT
>>>>>>>>>>>
>>>>>>>>>>> the value "x" seems quite impossible. Can you please query your
>>>>>>>>>>> database to see what value is stored there?
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> The table is empty.
>>>>>>>>>> mysql> select * from plots;
>>>>>>>>>> Empty set (0.01 sec)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> please try to run the other SQL statements produced by the code to see
>>>>>>>>> why no entry is created. get.or.create.plot.id() inserts a new entry
>>>>>>>>> into the table is no ID for a desired plot is available.
>>>>>>>>
>>>>>>>>
>>>>>>>> The branch which creates a plot ID is not entered. The condition
>>>>>>>> 'length(res) < 1' is
>>>>>>>> in both cases (sloccount.plot.id and understand.plot.id) not satisfied.
>>>>>>>>
>>>>>>>> For sloccount.plot.id <- get.or.create.plot.id(conf, "sloccount"):
>>>>>>>>     res <- dbGetQuery(con, str_c(query, ";"))
>>>>>>>>     # str_c(query, ";"): SELECT id FROM plots WHERE name='sloccount' 
>>>>>>>> AND
>>>>>>>> projectId=2;
>>>>>>>>     # res: "id"
>>>>>>>>     # length(res): 1
>>>>>>>>     if (length(res) < 1) {
>>>>>>>>       ## Plot ID is not assigned yet, create one
>>>>>>>>       res <- get.clear.plot.id.con(con, pid, plot.name, range.id)
>>>>>>>>     } else {
>>>>>>>>       res <- res$id
>>>>>>>>     }
>>>>>>>>     # res: "x"
>>>>>>>
>>>>>>>
>>>>>>> @Mitchell, could you try to reproduce this? I don't see why a result
>>>>>>> with non-zero length should be returned from the SQL query if the
>>>>>>> database is empty.
>>>>>>
>>>>>> The SQL query probably returns a data frame and length(..) called on a
>>>>>> data frame does not return the number of rows. To get the number of
>>>>>> rows of a data frame you should be using nrow(..) instead of
>>>>>> length(..).
>>>>>>
>>>>>> --Mitchell
>>>>>>
>>>>>
>>>>> That worked for me.
>>>>> After replacing 'length' with 'nrow' a new plot ID is created!
>>>>
>>>> The following patch should fix this for good then:
>>>>
>>
>> After patching there is still a problem on my side.
>> The call 'dbWriteTable' in the function add.sloccount.ts produces (as 
>> before) the message
>> Unknown column 'person.months' in 'field list'.
>the patch was not supposed to address this issue. Did you check that the
>database schema is uptodate with the latest version?
>
>Best regards, Wolfgang Mauerer

The SQL-Schema on my side is identical to the schema of the latest commit 
(79ebe108e2) to the master-branch.

The SQL-Statement of 'dbWriteTable'
  Browse[1]> print(statement)
  [1] "LOAD DATA LOCAL INFILE '/tmp/RtmpZRB6lj/rsdbi503512c75278'  INTO TABLE 
`sloccount_ts`  LINES TERMINATED BY '\n'   (`plotId`, `time`, `person.months`, 
`total.cost`, `schedule.months`, `avg.devel`);"

matches the corresponding table.
  mysql> describe sloccount_ts;
  +-----------------+------------+------+-----+---------+-------+
  | Field           | Type       | Null | Key | Default | Extra |
  +-----------------+------------+------+-----+---------+-------+
  | plotId          | bigint(20) | NO   | MUL | NULL    |       |
  | time            | datetime   | NO   | PRI | NULL    |       |
  | person_months   | double     | NO   |     | NULL    |       |
  | total_cost      | double     | NO   |     | NULL    |       |
  | schedule_months | double     | NO   |     | NULL    |       |
  | avg_devel       | double     | NO   |     | NULL    |       |
  +-----------------+------------+------+-----+---------+-------+

Best Regards, Matthias Gemmer

Other related posts:

  • » [codeface] AW: Re: AW: Re: AW: Re: AW: Re: AW: Re: AW: Re: AW: Re: AW: Re: AW: Re: Preparing time series data - sloccount analysis - Matthias Gemmer