On 10.11.2015 at 22:25, Wolfgang Mauerer wrote:
On 10/11/2015 at 16:01, Andreas Ringlstetter wrote:
What is actually required to provide branch support in Codeface?
For starters, it changes a few assumptions:
- Branches can overlap, meaning that the corresponding ranges can overlap.
They no longer form a single series. This can be worked around by using
multiple projects for multiple series.
The least invasive way to model this is to define new "meta-projects"
that simply plot multiple regular projects against each other.
I'm not much in favour of this approach: All release ranges of a
project (and the associated inferred data) are currently dispatched
from a project-specific view. A branch is nothing more than a
(generalised) release range, so it should be accessible like any other
release range.
The optimal solution, actually allowing multiple series per project,
would break too much of the existing code base.
Why too much? I see these main modifications:
* Global time series (composed of multiple release range sub-series)
would be augmented with branch-specific time series (a branch
can, but need not, be part of the global series).
* There needs to be a strategy for how to present and order such series in
the web front-end. Widgets that compare ranges (like release distance)
need to be modified to compare meaningful ranges (the widget should
also be taught which ranges are pointless to compare, for instance
those generated by a sliding window approach).
* Clusters etc. need to be computed for every generalised range.
- A branch can't be isolated using the "start..end" syntax, since it may
have multiple anchor points belonging to different branches.
This requires the explicit multi-point notation of git,
specifying the start commits with "--not start" or "^start". No
additional end commits are required when using tags. It is safe to add
start commits to every range query.
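As a rough illustration of the multi-point notation, the following Python sketch assembles a `git rev-list` invocation listing the commits reachable from an end revision but from none of the given start revisions. The helper name `rev_list_args` and the tag names are hypothetical; only the `^rev` exclusion syntax (equivalent to listing the starts after `--not`) is standard git.

```python
from typing import List, Sequence


def rev_list_args(end: str, starts: Sequence[str]) -> List[str]:
    """Hypothetical helper: build a `git rev-list` command for the commits
    reachable from `end` but not from any of the `starts`."""
    # "^rev" excludes everything reachable from rev; passing the starts
    # after "--not" would be equivalent.
    return ["git", "rev-list", end] + ["^" + s for s in starts]


cmd = rev_list_args("v2.0", ["v1.0", "feature-base"])
print(cmd)  # ['git', 'rev-list', 'v2.0', '^v1.0', '^feature-base']
```

Running the resulting command inside the repository (for instance via `subprocess.run(cmd, capture_output=True, text=True)`) would print one commit hash per line. Since extra exclusions never add commits, always passing every start commit matches the observation that it is safe to add them to each range query.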
The start and end commits defining the range also need to be specified
when using the date-based range partitioning method; they cannot be
omitted.
I think this can be modeled by adding new branch boundary values to the
project configuration: a single value for the branch end and a list for
the branch base.
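A minimal sketch of how the proposed boundary values might be represented once the configuration file has been loaded into a dictionary. The key names `branch_end` and `branch_base`, the class `BranchBoundary` and the function `parse_boundary` are assumptions for illustration, not existing Codeface options.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Tuple


@dataclass(frozen=True)
class BranchBoundary:
    """Hypothetical model of the proposed branch boundary values."""
    end: str                # single revision terminating the branch
    bases: Tuple[str, ...]  # one or more revisions the branch starts from


def parse_boundary(conf: Mapping[str, Any]) -> BranchBoundary:
    """Read the (assumed) keys branch_end and branch_base from a loaded config."""
    end = conf["branch_end"]
    bases = conf["branch_base"]
    if isinstance(bases, str):  # tolerate a single base given as a string
        bases = [bases]
    if not isinstance(end, str) or not bases:
        raise ValueError("branch_end must be one revision, branch_base non-empty")
    return BranchBoundary(end=end, bases=tuple(bases))


print(parse_boundary({"branch_end": "v2.0", "branch_base": ["v1.0", "v1.5"]}))
# BranchBoundary(end='v2.0', bases=('v1.0', 'v1.5'))
```

Keeping the end as a single value and the bases as a list mirrors the asymmetry described above: one branch tip, but possibly several anchor points to exclude.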
This is mainly an issue of coming up with a good DSL for describing
generalised ranges in the configuration file. While the problem is
surely complex in its full generality, I don't think going fully
general is necessary: When the analysed branch structure becomes
too complicated, it's usually not worth examining. What
is important from my point of view is
* Tracking feature branches from the branch point to the merge point
* Slicing a history from A to B into N intervals (already supported
for full histories, but this could be generalised to more restricted
ranges)
* Combining sub-ranges into larger ranges (for instance, the way all
current release ranges are spliced together in some order to generate
the global time series)
Support for these would be much appreciated.
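The slicing and splicing operations listed above can be sketched on an ordered list of commits. This assumes the commits of the restricted range have already been listed oldest first, and it splits by commit count for simplicity; slicing by date would follow the same pattern. The names `slice_history` and `splice` are hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def slice_history(commits: Sequence[T], n: int) -> List[List[T]]:
    """Split an ordered commit list (oldest first) into n contiguous
    intervals whose sizes differ by at most one."""
    if n <= 0:
        raise ValueError("n must be positive")
    size, extra = divmod(len(commits), n)
    intervals: List[List[T]] = []
    pos = 0
    for i in range(n):
        step = size + (1 if i < extra else 0)  # spread the remainder
        intervals.append(list(commits[pos:pos + step]))
        pos += step
    return intervals


def splice(ranges: Sequence[Sequence[T]]) -> List[T]:
    """Concatenate sub-ranges, in the given order, into one larger range."""
    return [commit for r in ranges for commit in r]


print(slice_history(list("abcdefg"), 3))  # [['a', 'b', 'c'], ['d', 'e'], ['f', 'g']]
```

Splicing the slices back together restores the original order, which is the property the global time series relies on when sub-ranges are combined.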
A new sanity check is required to verify that all specified revisions
lie within the branch boundary.
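One possible shape for this sanity check, sketched on a hand-written parent map rather than a real repository (an assumption; against a real repository, `git merge-base --is-ancestor` could answer the individual reachability questions). A revision counts as inside the boundary if it is reachable from the branch end and not strictly behind any base; the bases themselves are allowed so they can serve as the start of the first range. All names are hypothetical.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical toy representation: commit -> list of parent commits.
Graph = Dict[str, List[str]]


def ancestors(graph: Graph, rev: str) -> Set[str]:
    """All commits reachable from rev, including rev itself."""
    seen: Set[str] = set()
    stack = [rev]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(graph.get(commit, []))
    return seen


def outside_boundary(graph: Graph, end: str, bases: Iterable[str],
                     revisions: Iterable[str]) -> List[str]:
    """Return the specified revisions that violate the branch boundary:
    not reachable from end, or strictly behind one of the bases."""
    base_list = list(bases)
    behind_bases: Set[str] = set()
    for base in base_list:
        behind_bases |= ancestors(graph, base)
    behind_bases -= set(base_list)  # the bases themselves stay allowed
    allowed = ancestors(graph, end) - behind_bases
    return [rev for rev in revisions if rev not in allowed]


# B and C branch off A, D merges them, E lies on an unrelated branch.
graph: Graph = {"A": [], "B": ["A"], "C": ["A"], "D": ["C", "B"], "E": ["A"]}
print(outside_boundary(graph, "D", ["A"], ["A", "B", "C", "E"]))  # ['E']
```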
Btw: There's definitely a bug in the current system. If two specified
revisions happen to be in parallel branches, the same commits will be
wrongly attributed to two different ranges. This is caused by using
only the "start..end" notation, whereas it would have been necessary to
explicitly exclude ALL commits reachable from earlier revisions in the
range.
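To make the double attribution concrete, this Python sketch computes both kinds of query on a toy graph with the same shape as the A/B/C/D graph discussed in this thread: B and C branch off A, and D merges C (first parent) and B. Reachability is simulated on a hand-written parent map, an assumption standing in for running `git rev-list` on a real repository.

```python
from typing import Dict, List, Set

# Toy parent map: commit -> parents.
GRAPH: Dict[str, List[str]] = {"A": [], "B": ["A"], "C": ["A"], "D": ["C", "B"]}


def reachable(rev: str) -> Set[str]:
    """Commits reachable from rev (rev included), like `git rev-list rev`."""
    seen: Set[str] = set()
    stack = [rev]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(GRAPH[commit])
    return seen


def rev_list(end: str, *excludes: str) -> Set[str]:
    """Set semantics of `git rev-list end ^x1 ^x2 ...`."""
    result = reachable(end)
    for x in excludes:
        result -= reachable(x)
    return result


series = ["A", "B", "C", "D"]
# Two-point notation: each range is "previous..current".
two_point = {cur: rev_list(cur, prev) for prev, cur in zip(series, series[1:])}
# Multi-point notation: exclude everything reachable from ALL earlier revisions.
multi_point = {cur: rev_list(cur, *series[:i]) for i, cur in enumerate(series) if i > 0}

print(sorted(two_point["D"]))    # ['B', 'D']: B is counted a second time
print(sorted(multi_point["D"]))  # ['D']
```

With the two-point form, commit B appears both in the range ending at B and in the range ending at D; the multi-point form assigns it only once.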
E.g. for the series "A B C D", the correct query for the commits
contributing to "D" is not "C..D", but actually "D ^A ^B ^C".
The semantic difference shows in the following graph:
D
|\
C|
|B
|/
A
This already breaks the assumption of non-overlapping ranges
(not much of a surprise...), but Codeface should actually be capable of
handling this as soon as the query is corrected.
This might have caused double attributions in past analyses done with
Codeface; I haven't checked whether this pattern occurred in any of the
selected revision sets for the project configs shipped with Codeface.
Generating commit lists is deliberately kept as simple as possible.
There is no "correct" solution for ordering contributions wrt. the
real world anyway -- just think of a developer who has experimented for
a couple of days, and then at some point in time squashes several
commits together to create a new one. The date attributed to this commit
will not show the real creation date of the code, but just the date
of the squash. Unless there's a realistic counterexample, we work on the
assumption that the currently used approach does not introduce
substantial perturbations. Some mis-attributed commits should usually
just cause insubstantial noise.
Can you provide an example of a real project where such a scenario
occurs?
Best regards, Wolfgang Mauerer