[codeface] Re: Data model

  • From: Mitchell Joblin <joblin.m@xxxxxxxxx>
  • To: codeface@xxxxxxxxxxxxx
  • Date: Tue, 30 Jun 2015 15:01:27 +0000

Hi Claus,

Thanks for your questions.

On Mon, Jun 8, 2015 at 11:30 AM, Claus Hunsen <hunsen@xxxxxxxxxxxxxxxxx> wrote:

Hi everybody,

we had a short look at the data model of Codeface and now have some
questions, which somebody on this list might be able to answer. This
would help us to understand the model better and to implement our
software-metrics extensions properly.

- Does the data of the 'commit' table come solely from the blame analysis?

No. The blame analysis is used to map lines of code to developers. To
get the commit data we mostly use "git log" and "git show" commands
which are far faster than using blame. In the cluster.py file you can
see the commits getting added to the database table and from there you
can trace back where the commit object was generated. That mostly
occurs in VCS.py.
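
As a rough illustration of the "git log" approach (a sketch only, not the
actual code in VCS.py), commit metadata can be read in one pass using a
custom pretty format. The function names, the dictionary keys and the
choice of separator below are assumptions made for this example.

```python
import subprocess

# Separator that is unlikely to occur in names or addresses (assumption).
SEP = "\x1f"
# %H = commit hash, %an = author name, %ae = author email, %at = author time
LOG_FORMAT = SEP.join(["%H", "%an", "%ae", "%at"])


def parse_log(output):
    """Turn `git log --pretty=format:...` output into a list of dicts."""
    commits = []
    for line in output.splitlines():
        if not line.strip():
            continue
        commit_hash, name, email, timestamp = line.split(SEP)
        commits.append({
            "hash": commit_hash,
            "author_name": name,
            "author_email": email,
            "author_time": int(timestamp),
        })
    return commits


def read_commits(repo_path, rev_range="HEAD"):
    """Run git log inside repo_path and parse the result."""
    output = subprocess.check_output(
        ["git", "-C", repo_path, "log",
         "--pretty=format:" + LOG_FORMAT, rev_range],
        universal_newlines=True,
    )
    return parse_log(output)
```

One git call returns the metadata for the whole revision range, whereas
blame has to be run per file.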

- What is the column 'commit_dependency.impl'?

This column stores the implementation, i.e. the source code, of
whatever entity is recorded there. For features I don't think it gets
filled, but for functions/files it does.

- What is the 'author_commit_stats' table that seems to be more a view?

There is an author_commit_stats table, but it is not a view. There is
also an author_commit_stats_view, which is a view. The table stores
the number of added, deleted and total lines for a given developer,
and the number of commits they made.
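
A minimal sketch of that kind of per-developer aggregation, assuming each
commit is available as a dict with hypothetical keys "author", "added"
and "deleted", and assuming "total" means added plus deleted lines (the
real table may define it differently):

```python
from collections import defaultdict


def author_commit_stats(commits):
    """Aggregate line and commit counts per author."""
    stats = defaultdict(
        lambda: {"added": 0, "deleted": 0, "total": 0, "num_commits": 0}
    )
    for commit in commits:
        row = stats[commit["author"]]
        row["added"] += commit["added"]
        row["deleted"] += commit["deleted"]
        # Assumption: total = added + deleted
        row["total"] += commit["added"] + commit["deleted"]
        row["num_commits"] += 1
    return dict(stats)
```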

- The same question for 'commit_communication'?

I'm not sure if this is ever filled. Perhaps we will use it in the
future to see when two developers discuss a commit, for example on a
mailing list.

- Can someone explain the idea of the "time series and plots"
submodule? This seems quite confusing to us.

Wolfgang wrote that, so he can probably do a better job of explaining
what is going on there. I think that various data is queried over
different revisions (e.g., number of lines of code added during a
revision) to produce a number of time series. Those time series are
then analyzed and plotted. Not sure if that helps at all. @Wolfgang,
could you please shed a little more light on this?



Furthermore, we identified some smaller issues that might hinder
extension of the Codeface model and the linking of several related
projects. Hopefully, you can share your opinion on our observations.

- One author (table 'person') cannot be part of several projects. At
least, the author occurs several times in the database, once for each
project.
~> A 'person--project' table would help, while removing the FK from
the 'person' table.

Right, that is potentially an issue. So far we have not had a need to
do analyses that cut across multiple projects. I can see this would be
useful for ecosystems or projects that split their work into multiple
repositories. I'm supportive of this change. If you would like to
change the data model, then please alter the model using MySQL
Workbench 6 and then forward engineer the model to generate the
script. Please put both the changes to the model and the generated
script in the same commit. It's difficult to identify changes to the
model itself, since git sees it as just a binary. Thanks.
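
To make the proposed 'person--project' mapping concrete, here is a small
sketch. It uses an in-memory SQLite database only so that it is
self-contained (Codeface itself uses MySQL), and the table and column
names are hypothetical, not the ones in the Codeface schema.

```python
import sqlite3


def create_schema(conn):
    """Create person, project and a junction table linking the two."""
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE project (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        -- person no longer carries a project FK
        CREATE TABLE person (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        CREATE TABLE person_project (
            person_id  INTEGER NOT NULL REFERENCES person(id),
            project_id INTEGER NOT NULL REFERENCES project(id),
            PRIMARY KEY (person_id, project_id)
        );
    """)


def projects_of(conn, person_id):
    """Return the names of all projects a person contributes to."""
    rows = conn.execute(
        "SELECT p.name FROM project p "
        "JOIN person_project pp ON pp.project_id = p.id "
        "WHERE pp.person_id = ? ORDER BY p.name",
        (person_id,),
    )
    return [name for (name,) in rows]


conn = sqlite3.connect(":memory:")
create_schema(conn)
conn.executemany("INSERT INTO project (id, name) VALUES (?, ?)",
                 [(1, "projA"), (2, "projB")])
conn.execute("INSERT INTO person (id, name) VALUES (1, 'Alice')")
conn.executemany(
    "INSERT INTO person_project (person_id, project_id) VALUES (?, ?)",
    [(1, 1), (1, 2)],
)
```

With this layout one person row can be linked to any number of projects
instead of being duplicated once per project.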

- Additionally, the email addresses should also be moved to a mapping
table 'person--email'.

I don't quite get the rationale here, perhaps I am missing something.
Why not keep the emails in the person table?

- The 'release_timeline' table is lacking the commit hashes the
releases refer to. This way, we cannot identify the right commit in
the 'commit' table that corresponds to a 'release_timeline' object.
~> Adding the hash to the 'release_timeline' table and also adding a
mapping table 'commit--tag' would enable us to get the right commit
for a release tag.

Yes, I see the issue here. So you want the single commit that the tag
is referencing, not all the commits from a particular range. I
guess we would have to add the hash, no problem. Are you OK with
making those changes? I can have a look once you submit a patch. If
you need help with anything, just ask.
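
For illustration, a sketch of how a stored hash would let a release be
joined to its commit. As before, SQLite is used only to keep the example
self-contained, and the column names (commit_hash, tag) are assumptions
rather than the actual Codeface columns.

```python
import sqlite3


def create_schema(conn):
    """Create simplified commit and release_timeline tables."""
    conn.executescript("""
        CREATE TABLE "commit" (
            id          INTEGER PRIMARY KEY,
            commit_hash TEXT NOT NULL UNIQUE
        );
        CREATE TABLE release_timeline (
            id          INTEGER PRIMARY KEY,
            tag         TEXT NOT NULL,
            commit_hash TEXT          -- hash the release tag points to
        );
    """)


def commit_id_for_tag(conn, tag):
    """Return the commit id a release tag refers to, or None."""
    row = conn.execute(
        'SELECT c.id FROM release_timeline r '
        'JOIN "commit" c ON c.commit_hash = r.commit_hash '
        'WHERE r.tag = ?',
        (tag,),
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
create_schema(conn)
conn.executemany('INSERT INTO "commit" (id, commit_hash) VALUES (?, ?)',
                 [(1, "aaa111"), (2, "bbb222")])
conn.execute(
    "INSERT INTO release_timeline (id, tag, commit_hash) "
    "VALUES (1, 'v1.0', 'bbb222')"
)
```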


Kind regards,

Mitchell




Best regards,
Claus

