[codeface] Re: RC system

From: Andreas Ringlstetter <andreas.ringlstetter@xxxxxxxxxxxxxxxxxxxx>
To: <codeface@xxxxxxxxxxxxx>
Date: Mon, 15 Feb 2016 11:36:06 +0100

Hello,

after due deliberation, I came to the conclusion that the use cases of
the current RC system can be fully covered by the proposed branch
support system I'm currently working on.

I'm therefore proposing to drop that subsystem in favour of the more
generic solution.

There are a few minor differences between the current system and the
proposed one:

The branch system works purely by traversing the commit dependency
graph, rather than going by timestamps. As a result, edge cases such as
commits which only contribute to a possible development branch, but not
to the upcoming release, are no longer falsely attributed to the
stabilization phase.
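
To illustrate the difference, here is a minimal sketch of partitioning
by reachability. It is not the actual codeface implementation; the toy
history, the commit names and the helpers ancestors() and
commits_in_range() are made up for this example.

```python
def ancestors(parents, tip):
    """Return the set of commits reachable from tip, including tip."""
    seen = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, ()))
    return seen


def commits_in_range(parents, start, end):
    """Commits reachable from end but not from start (like start..end)."""
    return ancestors(parents, end) - ancestors(parents, start)


# Hypothetical toy history, mapping each commit to its parents.
parents = {
    "a": [],
    "b": ["a"],       # tagged as previous release
    "c": ["b"],       # merge window work
    "d": ["c"],       # tagged as RC1
    "e": ["d"],       # stabilization fix
    "f": ["b"],       # side branch, authored while RC phase is running
    "g": ["e"],       # tagged as release
    "h": ["g", "f"],  # side branch is merged only after the release
}

merge_window = commits_in_range(parents, "b", "d")
stabilization = commits_in_range(parents, "d", "g")
```

Commit "f" lies between RC1 and the release by timestamp, but it is not
an ancestor of the release tag, so it ends up in neither range.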

Instead, when partitioning the commit graph by RC1 and release tags,
phases can overlap in case the commit history itself isn't completely
linear.
This can lead to visible overlaps in the line charts plotting the churn
rate. I aim for a stacked representation in the case that activity from
two branches (or phases) overlaps.

This should not cause any significant differences for the projects
examined so far, especially the Linux kernel project. These projects
mostly don't record active development in the main repository, they
rarely make use of non-linear histories, and commits are eagerly
rewritten in favour of a clean history, so the non-linearity of the
development process is hidden anyway.
This effectively leaves the inspected repositories with a seemingly
linear history, where both partitioning by timestamp in the forcefully
serialized presentation and the clean approach of respecting the
commit tree yield mostly the same results.

I do expect significant differences to the existing system in projects
which require work to be tracked in the VCS and which also abstain from
rewriting the VCS history. I'm not aware that any project matching these
criteria has been analysed for RC phases with codeface yet.

Now as for how the branch system can be used to model the RC analysis,
and how to generate the same plots:

Simply speaking, each merge window and each stabilizing window can be
treated as a regular, disjoint branch, whereby the two branches of each
release cycle map to the same label. I already described in an earlier
mail how the partitioning is performed in detail.

I will add the option to group branches by an additional, arbitrary
attribute, for this use case e.g. "Is RC phase" with possible values
"Yes" and "No". I will treat each distinct value for that dimension as a
separate group, with no assumption about its semantics.

All analysis passes like churn rate, time series analysis, etc. are
applied to each individual branch - regardless of what it represents.
For further analysis, they can be attributed to the correct phase by the
label and attribute assigned in the project configuration.
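
A rough sketch of how the grouping could look. The field names
("name", "label", "is_rc_phase"), the tag names and the helper
group_branches() are assumptions for illustration only, not the final
configuration format.

```python
from collections import defaultdict

# Hypothetical branch descriptions as they might come from the
# project configuration: two branches per release share one label.
branches = [
    {"name": "v4.3..v4.4-rc1", "label": "v4.4", "is_rc_phase": "No"},
    {"name": "v4.4-rc1..v4.4", "label": "v4.4", "is_rc_phase": "Yes"},
    {"name": "v4.4..v4.5-rc1", "label": "v4.5", "is_rc_phase": "No"},
    {"name": "v4.5-rc1..v4.5", "label": "v4.5", "is_rc_phase": "Yes"},
]


def group_branches(branches, attribute):
    """Group branch names by the value of an arbitrary attribute.

    Values are treated as opaque keys; nothing is assumed about
    what "Yes" or "No" actually mean.
    """
    groups = defaultdict(list)
    for branch in branches:
        groups[branch[attribute]].append(branch["name"])
    return dict(groups)


by_phase = group_branches(branches, "is_rc_phase")
by_label = group_branches(branches, "label")
```

The same helper works for the label, so plots per release and plots
per phase would use the same mechanism.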

Feedback?
I'm mostly concerned about 3rd party scripts which are relying on the
current architecture. So if someone relies on the current system, please
tell me.

This would also be the chance to request possible extensions to the RC
analysis, or to the generalized system.

One specific question:
Is there a demand for multiple auxiliary dimensions per project? E.g. to
create grouped plots both for "Is RC phase" and some other attribute of
interest?
And what is the expected behaviour when two branches share the same
label AND value for an auxiliary dimension when plotting? Issue a
warning and fail?
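
To make the second question concrete, this is how such a collision
could be detected. It is only a sketch of one possible behaviour; the
field names and the helper find_collisions() are hypothetical, and
whether to warn, fail or merge the data is exactly what is open.

```python
from collections import Counter

# Hypothetical configuration in which two branches share both the
# label and the value of the auxiliary dimension.
branches = [
    {"name": "v4.3..v4.4-rc1", "label": "v4.4", "is_rc_phase": "No"},
    {"name": "v4.4-rc1..v4.4", "label": "v4.4", "is_rc_phase": "Yes"},
    {"name": "v4.4-rc2..v4.4", "label": "v4.4", "is_rc_phase": "Yes"},
]


def find_collisions(branches, attribute):
    """Return the (label, value) pairs used by more than one branch."""
    counts = Counter((b["label"], b[attribute]) for b in branches)
    return sorted(key for key, n in counts.items() if n > 1)


collisions = find_collisions(branches, "is_rc_phase")
if collisions:
    # One option: report the ambiguous pairs and refuse to plot.
    print("ambiguous label/value pairs:", collisions)
```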

Best regards,
Andreas Ringlstetter

On 21.12.2015 at 19:00, Wolfgang Mauerer wrote:

On 19/12/2015 at 16:26, Andreas Ringlstetter wrote:

Hello,

is the RC system still in use? There is the field releaseRCStartId in
the release_range model, and the only accessing location in the Python
part is project.py/project_analyse() via
dbmanager.py/get_release_range() and dispatching to
cluster.py/doProjectAnalysis() where it is then discarded silently.

In the R part, it is only used to plot something in
analyse_ts.r/do.ts.analysis(), and only the raw timestamp is ever being
used.

What was that system supposed to do? Are there any plans to reactivate it?

release candidate information is used in various places, particularly
for plotting time series and for classifying commits.

I need to change the semantics of the release_timeline. I can leave that
table as it is if required, but I can no longer use it as the
authoritative source for boundary tags when partitioning the commit list
into ranges.

what we need to keep is (if I'm not overlooking something):

- A classification for each commit of whether it falls into an RC phase
  or not
- Time stamp information for the RC start date.

Best regards, Wolfgang Mauerer

Greetings,
Andreas

