>Von: codeface-bounce@xxxxxxxxxxxxx <codeface-bounce@xxxxxxxxxxxxx> im Auftrag >von Mitchell Joblin <joblin.m@xxxxxxxxx> >Gesendet: Donnerstag, 5. März 2015 15:05 >An: codeface@xxxxxxxxxxxxx >Betreff: [codeface] Re: AW: Re: AW: Re: AW: Re: AW: Re: AW: Re: AW: Re: AW: >Re: Preparing time series data - sloccount analysis > >On Thu, Mar 5, 2015 at 1:50 PM, Wolfgang Mauerer <wm@xxxxxxxxxxxxxxxx> wrote: >> >> >> Am 05/03/2015 um 14:19 schrieb Matthias Gemmer: >>>> Von: codeface-bounce@xxxxxxxxxxxxx <codeface-bounce@xxxxxxxxxxxxx> im >>>> Auftrag von Mitchell Joblin <joblin.m@xxxxxxxxx> >>>> Gesendet: Donnerstag, 5. März 2015 13:14 >>>> An: Wolfgang Mauerer >>>> Cc: codeface@xxxxxxxxxxxxx >>>> Betreff: [codeface] Re: AW: Re: AW: Re: AW: Re: AW: Re: AW: Re: AW: Re: >>>> Preparing time series data - sloccount analysis >>>> >>>> On Thu, Mar 5, 2015 at 11:12 AM, Wolfgang Mauerer >>>> <wolfgang.mauerer@xxxxxxxxxxx> wrote: >>>>> On 05.03.2015 12:04, Matthias Gemmer wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> Browse[1]> print(plot.id) >>>>>>>>>>>>>> numeric(0) >>>>>>>>>>>>> >>>>>>>>>>>>> so that's the culprit... There is no valid plot ID for the time >>>>>>>>>>>>> series in the database. Can you please check that an appropriate >>>>>>>>>>>>> table is available in the database? >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> There is a table called timeseries with the column plotId. >>>>>>>>>>>> mysql> DESCRIBE timeseries; >>>>>>>>>>>> +--------------+------------+------+-----+---------+-------+ >>>>>>>>>>>> | Field | Type | Null | Key | Default | Extra | >>>>>>>>>>>> +--------------+------------+------+-----+---------+-------+ >>>>>>>>>>>> | plotId | bigint(20) | NO | MUL | NULL | | >>>>>>>>>>>> | time | datetime | NO | | NULL | | >>>>>>>>>>>> | value | double | NO | | NULL | | >>>>>>>>>>>> | value_scaled | double | YES | | NULL | | >>>>>>>>>>>> +--------------+------------+------+-----+---------+-------+ >>>>>>>>>>>> 4 rows in set (0.02 sec) >>>>>>>>>>>> >>>>>>>>>>>> The table is also filled with data. The table contains datasets for >>>>>>>>>>>> plotId=5, plotId=6, plotId=7 and plotId=8. >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Which values do sloccount.plot.id (and understand.plot.id) have >>>>>>>>>>>>> in do.complexity.analysis (Frame 3/4)? >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> The values for sloccount.plot.id and understand.plot.id are >>>>>>>>>>>> obviously >>>>>>>>>>>> invalid. >>>>>>>>>>>> >>>>>>>>>>>> Browse[1]> print(sloccount.plot.id) >>>>>>>>>>>> numeric(0) >>>>>>>>>>>> Browse[1]> print(understand.plot.id) >>>>>>>>>>>> numeric(0) >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> it was not so obvious to me; I was trying to ensure that >>>>>>>>>>> parallelisation did not introduce any issues here. But your >>>>>>>>>>> observation >>>>>>>>>>> clarified that this is not the case. >>>>>>>>>>> >>>>>>>>>>> Since the error seems to be deterministically reproducible at your >>>>>>>>>>> site, can you debug around the creation of the index (for instance >>>>>>>>>>> by >>>>>>>>>>> printing out what's going on; alternatively, you could also use the >>>>>>>>>>> built-in debugger)? >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> In the file codeface/R/complexity.r: >>>>>>>>>> >>>>>>>>>> Assignment of sloccount.plot.id and understand.plot.id: >>>>>>>>>> ## Obtain a plot IDs for the sloccount and understand raw time >>>>>>>>>> series before >>>>>>>>>> ## parallel processing commences to avoid race conditions >>>>>>>>>> sloccount.plot.id <- get.or.create.plot.id(conf, "sloccount") >>>>>>>>>> understand.plot.id <- get.or.create.plot.id(conf, >>>>>>>>>> "understand_raw") >>>>>>>>>> -> sloccount.plot.id and understand.plot.id have the value >>>>>>>>>> "x". >>>>>>>>>> Are these values feasible? Or Shall I have a closer >>>>>>>>>> look >>>>>>>>>> at the function 'get.or.create.plot.id'? >>>>>>>>> >>>>>>>>> >>>>>>>>> since the SQL specification for the plot ID is >>>>>>>>> >>>>>>>>> `id` BIGINT NOT NULL AUTO_INCREMENT >>>>>>>>> >>>>>>>>> the value "x" seems quite impossible. Can you please query your >>>>>>>>> database to see what value is stored there? >>>>>>>>> >>>>>>>> >>>>>>>> The table is empty. >>>>>>>> mysql> select * from plots; >>>>>>>> Empty set (0.01 sec) >>>>>>> >>>>>>> >>>>>>> please try to run the other SQL statements produced by the code to see >>>>>>> why no entry is created. get.or.create.plot.id() inserts a new entry >>>>>>> into the table is no ID for a desired plot is available. >>>>>> >>>>>> >>>>>> The branch which creates a plot ID is not entered. The condition >>>>>> 'length(res) < 1' is >>>>>> in both cases (sloccount.plot.id and understand.plot.id) not satisfied. >>>>>> >>>>>> For sloccount.plot.id <- get.or.create.plot.id(conf, "sloccount"): >>>>>> res <- dbGetQuery(con, str_c(query, ";")) >>>>>> # str_c(query, ";"): SELECT id FROM plots WHERE name='sloccount' AND >>>>>> projectId=2; >>>>>> # res: "id" >>>>>> # length(res): 1 >>>>>> if (length(res) < 1) { >>>>>> ## Plot ID is not assigned yet, create one >>>>>> res <- get.clear.plot.id.con(con, pid, plot.name, range.id) >>>>>> } else { >>>>>> res <- res$id >>>>>> } >>>>>> # res: "x" >>>>> >>>>> >>>>> @Mitchell, could you try to reproduce this? I don't see why a result >>>>> with non-zero length should be returned from the SQL query if the >>>>> database is empty. >>>> >>>> The SQL query probably returns a data frame and length(..) called on a >>>> data frame does not return the number of rows. To get the number of >>>> rows of a data frame you should be using nrow(..) instead of >>>> length(..). >>>> >>>> --Mitchell >>>> >>> >>> That worked for me. >>> After replacing 'length' with 'nrow' a new plot ID is created! >> >> The following patch should fix this for good then: >> After patching there is still a problem on my side. The call 'dbWriteTable' in the function add.sloccount.ts produces (as before) the message Unknown column 'person.months' in 'field list'. -- Matthias Gemmer >>> diff --git a/codeface/R/db.r b/codeface/R/db.r >>> index db53811..32da240 100644 >>> --- a/codeface/R/db.r >>> +++ b/codeface/R/db.r >>> @@ -59,10 +59,10 @@ get.clear.plot.id.con <- function(con, pid, plot.name, >>> range.id=NULL, >>> >>> res <- dbGetQuery(con, str_c("SELECT id", query)) >>> >>> - if (length(res) != 1) { >>> + if (nrow(res) != 1) { >>> stop("Internal error: Plot ", plot.name, " appears multiple times in >>> DB", >>> "for project ID ", pid) >>> - } >>> +} >>> >>> return(res$id) >>> } >>> @@ -81,7 +81,7 @@ get.plot.id.con <- function(con, pid, plot.name, >>> range.id=NULL) { >>> query <- str_c(query, " AND releaseRangeId=", range.id) >>> } >>> res <- dbGetQuery(con, str_c("SELECT id", query)) >>> - if (length(res) < 1) { >>> + if (nrow(res) < 1) { >>> stop("Internal error: Plot ", plot.name, " not found in DB", >>> " for project ID ", pid) >>> } >>> @@ -104,7 +104,7 @@ get.or.create.plot.id.con <- function(con, pid, >>> plot.name, range.id=NULL) { >>> } >>> res <- dbGetQuery(con, str_c(query, ";")) >>> >>> - if (length(res) < 1) { >>> + if (nrow(res) < 1) { >>> ## Plot ID is not assigned yet, create one >>> res <- get.clear.plot.id.con(con, pid, plot.name, range.id) >>> } else { >>> @@ -125,7 +125,7 @@ get.revision.id <- function(conf, tag) { >>> str_c("SELECT id FROM release_timeline WHERE >>> projectId=", >>> conf$pid, " AND tag=", sq(tag), " AND >>> type='release'")) >>> >>> - if (length(res) > 1) { >>> + if (nrow(res) > 1) { >>> stop("Internal error: Revision if for tag ", tag, " (project ", >>> conf$project, >>> ") appears multiple times in DB!") >>> } >> >> However, I don't really understand why it worked before in this case... >> @Mitchell: I'll push the patch to master unless you object, but can you >> please try to understand why we did not run into problems earlier? > >Please push to master after running the test suite, otherwise please >make a pull request and then I will test it before merging. > >Thanks, > >Mitchell > >> >> Thanks, Wolfgang >>