[codeface] Re: AW: Re: AW: Re: AW: Re: AW: Re: AW: Re: AW: Re: AW: Re: AW: Re: Preparing time series data - sloccount analysis

  • From: Wolfgang Mauerer <wolfgang.mauerer@xxxxxxxxxxx>
  • To: codeface@xxxxxxxxxxxxx
  • Date: Thu, 05 Mar 2015 15:20:22 +0100

On 05.03.2015 15:17, Matthias Gemmer wrote:
Von: codeface-bounce@xxxxxxxxxxxxx <codeface-bounce@xxxxxxxxxxxxx> im Auftrag von 
Mitchell Joblin <joblin.m@xxxxxxxxx>
Gesendet: Donnerstag, 5. März 2015 15:05
An: codeface@xxxxxxxxxxxxx
Betreff: [codeface] Re: AW: Re: AW: Re: AW: Re: AW: Re: AW: Re: AW: Re: AW: Re: 
Preparing time series data - sloccount analysis

On Thu, Mar 5, 2015 at 1:50 PM, Wolfgang Mauerer <wm@xxxxxxxxxxxxxxxx> wrote:


Am 05/03/2015 um 14:19 schrieb Matthias Gemmer:
Von: codeface-bounce@xxxxxxxxxxxxx <codeface-bounce@xxxxxxxxxxxxx> im Auftrag von 
Mitchell Joblin <joblin.m@xxxxxxxxx>
Gesendet: Donnerstag, 5. März 2015 13:14
An: Wolfgang Mauerer
Cc: codeface@xxxxxxxxxxxxx
Betreff: [codeface] Re: AW: Re: AW: Re: AW: Re: AW: Re: AW: Re: AW: Re: 
Preparing time series data - sloccount analysis

On Thu, Mar 5, 2015 at 11:12 AM, Wolfgang Mauerer
<wolfgang.mauerer@xxxxxxxxxxx> wrote:
On 05.03.2015 12:04, Matthias Gemmer wrote:

Browse[1]> print(plot.id)
numeric(0)

so that's the culprit... There is no valid plot ID for the time
series in the database. Can you please check that an appropriate
table is available in the database?


There is a table called timeseries with the column plotId.
mysql> DESCRIBE timeseries;
+--------------+------------+------+-----+---------+-------+
| Field        | Type       | Null | Key | Default | Extra |
+--------------+------------+------+-----+---------+-------+
| plotId       | bigint(20) | NO   | MUL | NULL    |       |
| time         | datetime   | NO   |     | NULL    |       |
| value        | double     | NO   |     | NULL    |       |
| value_scaled | double     | YES  |     | NULL    |       |
+--------------+------------+------+-----+---------+-------+
4 rows in set (0.02 sec)

The table is also filled with data. The table contains datasets for
plotId=5, plotId=6, plotId=7 and plotId=8.


Which values do sloccount.plot.id (and understand.plot.id) have
in do.complexity.analysis (Frame 3/4)?


The values for sloccount.plot.id and understand.plot.id are
obviously
invalid.

Browse[1]> print(sloccount.plot.id)
numeric(0)
Browse[1]> print(understand.plot.id)
numeric(0)


it was not so obvious to me; I was trying to ensure that
parallelisation did not introduce any issues here. But your
observation
clarified that this is not the case.

Since the error seems to be deterministically reproducible at your
site, can you debug around the creation of the index (for instance by
printing out what's going on; alternatively, you could also use the
built-in debugger)?


In the file codeface/R/complexity.r:

Assignment of sloccount.plot.id and understand.plot.id:
    ## Obtain a plot IDs for the sloccount and understand raw time
series before
    ## parallel processing commences to avoid race conditions
    sloccount.plot.id <- get.or.create.plot.id(conf, "sloccount")
    understand.plot.id <- get.or.create.plot.id(conf, "understand_raw")
        -> sloccount.plot.id and understand.plot.id have the value "x".
               Are these values feasible? Or Shall I have a closer look
at the function 'get.or.create.plot.id'?


since the SQL specification for the plot ID is

`id` BIGINT NOT NULL AUTO_INCREMENT

the value "x" seems quite impossible. Can you please query your
database to see what value is stored there?


The table is empty.
mysql> select * from plots;
Empty set (0.01 sec)


please try to run the other SQL statements produced by the code to see
why no entry is created. get.or.create.plot.id() inserts a new entry
into the table is no ID for a desired plot is available.


The branch which creates a plot ID is not entered. The condition
'length(res) < 1' is
in both cases (sloccount.plot.id and understand.plot.id) not satisfied.

For sloccount.plot.id <- get.or.create.plot.id(conf, "sloccount"):
    res <- dbGetQuery(con, str_c(query, ";"))
    # str_c(query, ";"): SELECT id FROM plots WHERE name='sloccount' AND
projectId=2;
    # res: "id"
    # length(res): 1
    if (length(res) < 1) {
      ## Plot ID is not assigned yet, create one
      res <- get.clear.plot.id.con(con, pid, plot.name, range.id)
    } else {
      res <- res$id
    }
    # res: "x"


@Mitchell, could you try to reproduce this? I don't see why a result
with non-zero length should be returned from the SQL query if the
database is empty.

The SQL query probably returns a data frame and length(..) called on a
data frame does not return the number of rows. To get the number of
rows of a data frame you should be using nrow(..) instead of
length(..).

--Mitchell


That worked for me.
After replacing 'length' with 'nrow' a new plot ID is created!

The following patch should fix this for good then:


After patching there is still a problem on my side.
The call 'dbWriteTable' in the function add.sloccount.ts produces (as before) 
the message
Unknown column 'person.months' in 'field list'.
the patch was not supposed to address this issue. Did you check that the
database schema is uptodate with the latest version?

Best regards, Wolfgang Mauerer


Other related posts: