[codeface] Re: Branch support

  • From: Wolfgang Mauerer <wolfgang.mauerer@xxxxxxxxxxxxxxxxx>
  • To: Andreas Ringlstetter <andreas.ringlstetter@xxxxxxxxxxxxxxxxxxxx>, <codeface@xxxxxxxxxxxxx>
  • Date: Sat, 14 Nov 2015 17:24:54 +0100



On 11/11/2015 at 16:20, Andreas Ringlstetter wrote:

since the problem seems to affect only a tiny fraction of all projects,
the easiest approach is to not overcomplicate the range specification,
but rather to come up with means of bringing repos into the desired
form if required. In the worst case, we cannot analyse some commercial
projects, but that's the status quo anyway. Just think of projects
managed with ClearCase and other horrors along these lines.
It's not a problem. The partition logic required to properly isolate
branches requires me to make this change anyway.

I think I will go with the following notation in the config file:
- Unordered(!) set of tags used to partition the entire commit graph.
- List of series, each series being defined by upper and lower boundaries.

The boundaries specified in the series must only use listed tags.
Boundaries can be omitted.

An additional series with the name "global" is implicitly added if not
present, and has no boundaries by default.
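
To make the notation above concrete, here is a minimal sketch of how it
could look once loaded into memory. The key names ("tags", "series",
"lower", "upper") and the helper normalise() are assumptions made for
illustration only; they are not existing Codeface configuration syntax.

```python
# Hypothetical in-memory form of the proposed configuration.
# All key names here are illustrative assumptions.
config = {
    "tags": {"v1.0", "v1.1", "v2.0"},  # unordered set of partitioning tags
    "series": {
        "stable-1.x": {"lower": "v1.0", "upper": "v1.1"},
        "dev": {"lower": "v1.1"},  # upper boundary omitted
    },
}


def normalise(config):
    """Validate series boundaries and add the implicit "global" series."""
    tags = set(config["tags"])
    series = {name: dict(bounds) for name, bounds in config["series"].items()}
    for name, bounds in series.items():
        for key in ("lower", "upper"):
            tag = bounds.get(key)
            # Boundaries may be omitted, but if present must be listed tags.
            if tag is not None and tag not in tags:
                raise ValueError(
                    f"series {name!r}: {key} boundary {tag!r} is not a listed tag"
                )
    # "global" is added only if the user did not define it; no boundaries.
    series.setdefault("global", {})
    return {"tags": tags, "series": series}
```

Calling normalise(config) leaves "stable-1.x" and "dev" untouched and
adds an unbounded "global" series next to them.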

Ranges are defined by a set of head and a set of base revisions. In the
most trivial case, only a single head and no base revisions are specified.

"Base" might be misleading: it is actually a stop signal that halts
traversal of the graph when this revision is encountered. So multiple
bases can shadow each other.
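
The stop-signal reading of "base" can be sketched as a plain graph
walk. This is only an illustration under assumptions: the commit graph
is given as a dict mapping each commit to its list of parents, the
function name commits_in_range() is made up, and the base revision
itself is taken to be excluded from the range.

```python
def commits_in_range(parents, heads, bases):
    """Collect commits reachable from `heads`, halting at `bases`.

    `parents` maps a commit id to a list of its parent commit ids
    (a hypothetical representation of the revision graph).
    A base is a stop signal: it is not included and not walked past.
    """
    bases = set(bases)
    seen = set()
    stack = [h for h in heads if h not in bases]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        for parent in parents.get(commit, []):
            if parent not in bases and parent not in seen:
                stack.append(parent)
    return seen


# Linear history a <- b <- c <- d
history = {"a": [], "b": ["a"], "c": ["b"], "d": ["c"]}
```

With this history, heads ["d"] and bases ["b"] select c and d. Adding
"a" as a second base changes nothing, because "b" shadows it: the walk
stops before "a" is ever encountered.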

The revision graph is partitioned into disjoint ranges by the following
process:
- For each tag A construct a range with A as the only head revision.
Check for any other tag B if it is in the history of A.
If it is, add B as a base revision of the range.
- For each range, check if the time difference between the oldest and
newest included revision is within the maximum allowed range size. If
not, compute a number of intermediate timestamps and split the range
accordingly. (This requires multiple base and head revisions to get a
fully disjoint and complete partition across all possible border cases.
The split occurs on the latest revision preceding(!) the timestamp.)
- For each range, reduce the set of head and base revisions until each
is minimal. (Optional, but should improve human comprehension. I don't
expect to be able to remove head revisions, but base revisions can be
potentially eliminated if one is reachable by another.)

That gives me a complete and disjoint set of ranges which are common to
all series.
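
The first and third step of the process above can be sketched as
follows; the time-based split of the second step is left out. This is a
rough illustration, not the implementation: the graph is assumed to be
a dict from commit to parent list, tags are identified with the commits
they point to, and all function names are hypothetical. A base is
dropped only if the traversal would never encounter it anyway, which is
a slightly stricter test than "reachable from another base" and stays
correct in the presence of merges.

```python
def reachable(parents, heads, stop=()):
    """Commits reachable from `heads`; halts at (and excludes) `stop`."""
    stop, seen = set(stop), set()
    stack = [h for h in heads if h not in stop]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(p for p in parents.get(commit, ()) if p not in stop)
    return seen


def initial_ranges(parents, tags):
    """Step 1: one range per tag A; every other tag in A's history is a base."""
    ranges = {}
    for head in tags:
        history = reachable(parents, [head])
        ranges[head] = {t for t in tags if t != head and t in history}
    return ranges


def minimise_bases(parents, head, bases):
    """Step 3: drop bases that the traversal never reaches without them."""
    bases = set(bases)
    for base in sorted(bases):
        rest = bases - {base}
        if base not in reachable(parents, [head], rest):
            bases = rest  # shadowed by the remaining bases
    return bases


# Linear history a <- b <- c <- d, with tags on a, c and d.
parents = {"a": [], "b": ["a"], "c": ["b"], "d": ["c"]}
tags = ["a", "c", "d"]
ranges = {
    head: minimise_bases(parents, head, bases)
    for head, bases in initial_ranges(parents, tags).items()
}
```

In this example the range headed by "d" starts out with the bases "a"
and "c" and is reduced to "c" alone. The three resulting ranges cover
{a}, {b, c} and {d}, so every commit ends up in exactly one range.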

Each range can then be persisted in the database with the list of head
and base revisions.

For each series, I will then have to determine all ranges fully within
the boundary of the corresponding series.

Each series is then persisted in the database with the corresponding
list of ranges.


Beyond this point, no process should attempt to read the tag or series
list from the configuration file or even to re-construct it!

Two ranges can trivially be identified as consecutive by matching base
vs head commits in the database. (Careful: Two ranges can match on
more than one edge!)
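
Matching base against head commits could look roughly like the sketch
below. The in-memory layout (range name mapped to a pair of head and
base sets) merely stands in for whatever the database schema ends up
being, and consecutive_ranges() is a made-up name.

```python
def consecutive_ranges(ranges):
    """Find pairs of ranges where heads of one are bases of the other.

    `ranges` maps a range name to a (heads, bases) pair of sets.
    Returns {(older, newer): shared_commits}; the shared set can hold
    more than one commit, since two ranges may touch on several edges.
    """
    edges = {}
    for older, (heads, _) in ranges.items():
        for newer, (_, bases) in ranges.items():
            if older == newer:
                continue
            shared = heads & bases
            if shared:
                edges[(older, newer)] = shared
    return edges


# Hypothetical persisted ranges: r2 follows r1, touching on two commits.
stored = {
    "r1": ({"c", "x"}, {"a"}),
    "r2": ({"e"}, {"c", "x"}),
    "r3": ({"z"}, set()),
}
```

Here consecutive_ranges(stored) reports a single pair, ("r1", "r2"),
whose shared set contains both "c" and "x".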

So much for the setup and partitioning. Details on the modifications
necessary to later stages will follow as I get them fleshed out.

thanks for working out the details so far; the considerations look
valid. I have three requirements, though:

- The scheme, once it matures, should be discussed with concrete
examples. It should be possible to express complex situations,
but writing configuration files should still be possible for
casual users. The focus is on clarity.
- The nomenclature used should be unique, properly documented, and
applied consistently across the source base.
- The current syntax is quite appropriate for a large number of
projects, and should in all cases be supported.

I assume you are referring to a slight generalisation of it
with "list of series of lower and upper boundaries", except that in
the current notation, the upper and lower bounds of consecutive
intervals are identical, which leads to a simplified notation.

Thanks & best regards, Wolfgang Mauerer
