[codeface] Re: Multi datasource analysis

  • From: Andreas Ringlstetter <andreas.ringlstetter@xxxxxxxxxxxxxxxxxxxx>
  • To: <codeface@xxxxxxxxxxxxx>
  • Date: Tue, 27 Oct 2015 12:17:51 +0100



On 27.10.2015 at 12:04, Mitchell Joblin wrote:

- Partitioning all projects by the natural timestamps defined by
releases in the master project. This is most likely to break when projects
make heavy use of overlapping feature branches, and the correlation
of release cycles in master and slave repositories can't be taken as
given for all projects. Essentially, one repository is declared
authoritative, and every other repository is expected to follow the same
release cycles. If this premise doesn't hold, activity will be
misattributed.
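A minimal Python sketch of this first approach, assuming commits from all repositories are available as (repository, SHA, timestamp) records and the master's releases as a date-sorted list of (tag, date) pairs. `Commit` and `partition_by_master_releases` are hypothetical names for illustration, not Codeface's actual implementation:

```python
from bisect import bisect_left
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Commit:
    repo: str            # repository the commit belongs to (master or slave)
    sha: str
    timestamp: datetime


def partition_by_master_releases(master_releases, commits):
    """Assign commits from *all* repositories to the master's release ranges.

    master_releases: list of (tag, datetime) tuples, sorted by date.
    A commit with timestamp t lands in range "prev..next" when
    prev_date < t <= next_date. Commits before the first or after the
    last master release are left out.
    """
    dates = [date for _, date in master_releases]
    ranges = defaultdict(list)
    for commit in commits:
        # Index of the first master release at or after the commit's timestamp.
        idx = bisect_left(dates, commit.timestamp)
        if idx == 0 or idx == len(dates):
            continue  # outside the analysed release ranges
        label = f"{master_releases[idx - 1][0]}..{master_releases[idx][0]}"
        ranges[label].append(commit)
    return dict(ranges)
```

Note that a slave commit is placed purely by its timestamp; the slave's own release cycle never enters the computation, which is exactly where misattribution creeps in if the slave doesn't follow the master.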

My feeling is that this won't work that well. Release ranges are already
hard to interpret without considering how the release ranges are related
between repositories in an ecosystem project. I think the necessary
assumptions are too strong to be realistic.

- Grouping tags from multiple repositories by (API) compatibility.
Commits are not partitioned by timestamp, but exclusively assigned
to a tag. This approach lacks any natural temporal correlation
between corresponding commit sets from different repositories. There is
also the issue of being unable to correctly assign contributions to
a specific version if a component was made upward compatible ahead of
time. In exchange, this should yield the most coherent data regarding
actual development cycles, even when subcomponents don't release in a
timely manner. This approach is only applicable to data sources
where activity can be mapped directly to a specific version.
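A minimal Python sketch of the second approach, assuming each commit has already been mapped to the repository-local tag that first contains it, and that a hand-maintained table maps (repository, tag) pairs to a compatibility group. `group_by_compatible_tags` and both input tables are hypothetical, for illustration only:

```python
from collections import defaultdict


def group_by_compatible_tags(commit_tags, compat_groups):
    """Group commits across repositories by API-compatible tag sets.

    commit_tags:   dict (repo, sha) -> tag in that repo that first contains the commit
    compat_groups: dict (repo, tag) -> compatibility group label
    Timestamps are never consulted: each commit belongs to exactly one tag,
    and through it to at most one group.
    """
    groups = defaultdict(list)
    for (repo, sha), tag in commit_tags.items():
        group = compat_groups.get((repo, tag))
        if group is None:
            continue  # tag not mapped to any group: commit stays unassigned
        groups[group].append((repo, sha))
    return {label: sorted(members) for label, members in groups.items()}
```

A commit on a tag that nobody mapped to a group simply drops out, which mirrors the limitation above: the approach only works where activity maps directly to a version.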

Same feeling for this approach.

It should be suitable for, e.g., Android AOSP, which uses several dozen
repositories to isolate artifacts (mostly to work around the scalability
limits of Git), while all repositories in fact share the same major tags,
which are also reused in the bug tracker.

In fact, these two approaches should yield identical results for this
specific project. I will have to check if the same is true for other
projects where either approach would be applicable, in which case one of
them can be eliminated.
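One way to run that check is to compare the two partitions while ignoring their labels, since release ranges and compatibility groups are named differently. A minimal sketch, assuming each partition is a dict from label to commit identifiers; `partitions_agree` is a hypothetical helper:

```python
def _commit_sets(partition):
    """Turn {label: commits} into a set of commit sets, dropping labels."""
    return {frozenset(members) for members in partition.values() if members}


def partitions_agree(a, b):
    """True if both partitions group exactly the same commits together."""
    return _commit_sets(a) == _commit_sets(b)
```

If this holds for every project where both approaches apply, one of them is redundant.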

It's not a generic solution applicable to all projects; it only works for
projects that match a certain structure. If the project setup doesn't
enforce synchronization, it obviously makes little sense to assume it.

--Andreas
