[haiku-depot-web] Dealing with Multiple Repositories and Conflicts

  • From: Andrew Lindesay <apl@xxxxxxxxxxxxxx>
  • To: haiku-depot-web@xxxxxxxxxxxxx
  • Date: Fri, 22 May 2015 00:09:33 +1200

Hello;

I am working through some planning for the changes necessary to handle multiple repositories in HDS. This email sets out my thoughts and gives an opportunity to discuss any problems that anybody can see.

Terms
~~~~~

First, to be clear: I distinguish between a package and a package version like this:

Package = "apr"
Package Version = apr - 1.5.0-1 - x86_gcc2

A package has no version coordinates or arch and is identified by the name alone. The package version is the package with coordinates and arch.
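As a minimal sketch of this distinction (the class and field names here are illustrative assumptions, not the actual HDS data model, and the version coordinates are collapsed into a single string for brevity):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Package:
    # A package is identified by its name alone.
    name: str


@dataclass(frozen=True)
class PackageVersion:
    # A package version is the package plus version coordinates and arch.
    package: Package
    version: str        # assumed single string, e.g. "1.5.0-1"
    architecture: str   # e.g. "x86_gcc2"


apr = Package("apr")
apr_150 = PackageVersion(apr, "1.5.0-1", "x86_gcc2")
```

Two PackageVersion values compare equal only when the name, the version coordinates and the arch all match.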

Multiple Repositories - Not Really
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In a way, at the moment, HDS has "multiple repositories" configured, but actually they are all just different architectures for "HaikuPorts". I am planning to change the "repository" concept so that a repository is, for example, "HaikuPorts", which would then be associated with multiple URLs to feed in from. Once this is done, it will be possible to add other repositories, which is what I'm trying to achieve. I think this part of the change is simple, makes sense and is fairly straightforward. No problems here!
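A rough sketch of that shape, assuming hypothetical class names and placeholder URLs (none of these come from the real HDS configuration):

```python
from dataclasses import dataclass, field


@dataclass
class RepositorySource:
    # One URL that HPKR data is fed in from (placeholder values below).
    url: str


@dataclass
class Repository:
    # The logical repository, e.g. "HaikuPorts", owning several sources.
    name: str
    sources: list = field(default_factory=list)


haikuports = Repository("HaikuPorts", [
    RepositorySource("http://example.invalid/haikuports/x86_gcc2"),
    RepositorySource("http://example.invalid/haikuports/x86_64"),
])
```

Adding another repository would then just mean creating another Repository with its own sources.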

Conflicting Data Between Repositories
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Where matters get complex is in handling the inevitable situation wherein the same package and/or the same package version (determined by the version coordinates + arch) appears in two or more different repositories. This raises some challenges. At the moment this is _not_ allowed during import from HPKR.

My initial reaction was to consider the Package Version as being independent of anything found in a repository [1] and to then create an additional structure to capture the material from the HPKR that is specific to that Package Version as found on that specific repository URL.

After working through the implications, I decided that this approach is not ideal. It attempts to handle the conflicts gracefully, but in doing so creates complexity and has flaws. The main flaw would be that we cannot be sure that different sets of maintainers across different repositories are going to name their packages the same way, and we cannot be too sure that they will version their packages the same way either. So I would consider that any attempt to unify (at least at the version-level) data between repositories is probably going to lead to maintenance problems and anomalies.

I don't think unifying the data in this way makes sense.

A Better Approach
~~~~~~~~~~~~~~~~~

So my thinking now is that Package Version data and User Ratings (including aggregates) are separated by repository -- not shared between repositories. This keeps things logically and conceptually simple and I think it will work.

For example, if you were to search _across_ repositories and two repositories happen to have the same Package, then you will see this in the data because two Package Versions will be returned, each identified as belonging to its own repository. The system would not try to 'hide' this fact from you by presenting this as one search result.
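To illustrate the search behaviour, here is a small sketch. The repository name "otherrepo", the sample rows and the search() function are all made up for the example:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RepoPackageVersion:
    # Package version data is kept per repository, never merged.
    repository: str
    package: str
    version: str
    architecture: str


# Illustrative data only; "otherrepo" is a hypothetical second repository.
CATALOGUE = [
    RepoPackageVersion("haikuports", "apr", "1.5.0-1", "x86_gcc2"),
    RepoPackageVersion("otherrepo", "apr", "1.5.0-1", "x86_gcc2"),
    RepoPackageVersion("haikuports", "zlib", "1.2.8-4", "x86_gcc2"),
]


def search(package_name, catalogue=CATALOGUE):
    """Return every match, one result per repository that carries it."""
    return [pv for pv in catalogue if pv.package == package_name]


results = search("apr")
```

Here results holds two entries for "apr", one tagged "haikuports" and one tagged "otherrepo", rather than a single merged result.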

Packages themselves, as identified by their "name" in the HPKR data, would be logically the *same* between repositories. Is it reasonable to assume that repository maintainers are going to avoid Package name conflicts between repositories? I guess if this were not the case, there would be problems at the HaikuOS level. ^^ Assuming this, the following data would be shared across repositories:

* screenshots
* icons
* prominence
* categorization
* authorization
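The split between shared and per-repository data could be sketched roughly as below. All names are hypothetical, the field types are guesses, and ratings are keyed by (repository, package name) purely to keep the example short:

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class SharedPackageData:
    # Keyed by package name only, so shared across all repositories.
    name: str
    screenshots: list = field(default_factory=list)
    icons: list = field(default_factory=list)
    prominence: str = ""                               # assumed type
    categories: list = field(default_factory=list)
    authorized_users: set = field(default_factory=set)  # assumed shape


packages = {}  # package name -> SharedPackageData
ratings = {}   # (repository, package name) -> list of star ratings


def shared_data(package_name):
    return packages.setdefault(package_name, SharedPackageData(package_name))


def add_rating(repository, package_name, stars):
    ratings.setdefault((repository, package_name), []).append(stars)


def average_rating(repository, package_name):
    # Aggregates are computed per repository, never across repositories.
    values = ratings.get((repository, package_name), [])
    return mean(values) if values else None


shared_data("apr").icons.append("apr.hvif")
add_rating("haikuports", "apr", 4)
add_rating("haikuports", "apr", 2)
add_rating("otherrepo", "apr", 5)
```

The icon is visible whichever repository the package is viewed through, while each repository keeps its own rating aggregate.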

A conflicting package version + arch within any single repository source (URL) would be considered an error, but such a conflict would be allowed between repository sources and, by extension, between repositories.
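A sketch of how that rule might be checked during import. The function name, the exception and the input format are assumptions made for illustration, not the existing HPKR import code:

```python
class DuplicatePackageVersionError(Exception):
    """Raised when one source lists the same package version + arch twice."""


def import_sources(entries_by_source_url):
    """Check each source on its own.

    entries_by_source_url maps a source URL to a list of
    (name, version, arch) tuples read from that source.
    """
    imported = {}
    for url, entries in entries_by_source_url.items():
        seen = set()  # reset per source: conflicts only matter within one
        for key in entries:
            if key in seen:
                raise DuplicatePackageVersionError(
                    f"{key} appears more than once in {url}")
            seen.add(key)
        imported[url] = seen
    return imported
```

Because the seen set is reset for every source, the same (name, version, arch) tuple can arrive from two different URLs without complaint, but a repeat within one URL stops the import.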

cheers.

[1]
http://www.silvereye.co.nz/tmp/hds-img-datamodel-21may2015__DRAFT.pdf

--
Andrew Lindesay
