Hello;
I am working through some planning for the changes necessary to handle
multiple repositories in HDS. This email is to give my thoughts on this
and an opportunity to discuss any problems that anybody can see.
Terms
~~~~~
First, to be clear, I distinguish between a package and a package
version like this:
Package = "apr"
Package Version = apr - 1.5.0-1 - x86_gcc2
A package has no version coordinates or arch and is identified by the
name alone. The package version is the package with coordinates and arch.
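
To make the distinction concrete, here is a minimal sketch in Python. The
class and field names are hypothetical and only illustrate the terms
above; they are not taken from the HDS data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Package:
    # A package is identified by its name alone.
    name: str


@dataclass(frozen=True)
class PackageVersion:
    # A package version is the package plus version coordinates and arch.
    package: Package
    version: str  # e.g. "1.5.0-1"
    arch: str     # e.g. "x86_gcc2"


apr = Package("apr")
apr_1_5_0 = PackageVersion(apr, "1.5.0-1", "x86_gcc2")
```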
Multiple Repositories - Not Really
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In a way, at the moment, HDS has "multiple repositories" configured, but
actually they are all just different architectures for "HaikuPorts". I
am planning to change the "repository" concept so that a repository is
something like "HaikuPorts" itself, which would then be associated with
multiple URLs to feed in from. Once this is done, it will be possible to
add other repositories, which is what I'm trying to achieve. I think this
part of the change is simple, makes sense and is fairly straightforward.
No problems here!
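
As a rough sketch of that shape (Python, hypothetical names), a
repository would own a list of sources, each with its own URL. The
"arch" field on a source is my assumption based on the current
one-per-architecture setup, and the URLs are placeholders, not real
repository locations.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RepositorySource:
    url: str   # placeholder URL, not a real repository location
    arch: str  # assumption: one source per architecture


@dataclass
class Repository:
    code: str
    sources: List[RepositorySource] = field(default_factory=list)


haikuports = Repository(
    "haikuports",
    [
        RepositorySource("http://example.invalid/haikuports/x86_gcc2", "x86_gcc2"),
        RepositorySource("http://example.invalid/haikuports/x86", "x86"),
    ],
)

# Another repository can now be added alongside it.
other = Repository("otherrepo")
```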
Conflicting Data Between Repositories
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Where matters get complex is in handling the inevitable situation where
the same package and/or the same package version (determined by the
version coordinates + arch) appears in two or more different
repositories. This raises some challenges. At the moment this is _not_
allowed during import from HPKR.
My initial reaction was to consider the Package Version as being
independent of anything found in a repository [1] and to then create an
additional structure to capture the material from the HPKR that is
specific to that Package Version as found on that specific repository URL.
After working through the implications, I decided that this approach is
not ideal. It attempts to handle the conflicts gracefully, but in doing
so creates complexity and has flaws. The main flaw would be that we
cannot be sure that different sets of maintainers across different
repositories are going to name their packages the same and we cannot be
too sure that they will version their packages the same either. So I
would consider that any attempt to unify (at least at the version-level)
data between repositories is probably going to lead to maintenance
problems and anomalies.
In short, I don't think that unifying version data across repositories
makes sense.
A Better Approach
~~~~~~~~~~~~~~~~~
So my thinking now is that Package Version data and User Ratings
(including aggregates) are separated by repository -- not shared between
repositories. This keeps things logically and conceptually simple and I
think it will work.
For example, if you were to search _across_ repositories and two
repositories happened to have the same Package, then you would see this
in the data: two Package Versions would be returned, each identified as
belonging to its own repository. The system would not try to 'hide' this
fact from you by presenting it as one search result.
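
The search behaviour could be sketched like this in Python. The
catalogue contents, the "otherrepo" repository and the function name are
all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchResult:
    repository: str
    name: str
    version: str
    arch: str


# Made-up catalogue: "apr" exists in two repositories.
CATALOGUE = [
    SearchResult("haikuports", "apr", "1.5.0-1", "x86_gcc2"),
    SearchResult("otherrepo", "apr", "1.5.0-1", "x86_gcc2"),
    SearchResult("haikuports", "otherpkg", "1.0.0-1", "x86_gcc2"),
]


def search(name):
    # No merging: one result per repository that carries the package.
    return [r for r in CATALOGUE if r.name == name]


results = search("apr")
```

Here "results" holds two entries, one for each repository, rather than a
single merged result.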
Packages themselves, as identified by their "name" in the HPKR data,
would be logically the *same* between repositories. Is it reasonable to
assume that repository maintainers are going to avoid Package
name conflicts between repositories? I guess if this were not the case,
there would be problems at the HaikuOS level too. ^^ Assuming
this, the following data would be shared across repositories:
* screenshots
* icons
* prominence
* categorization
* authorization
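
A small Python sketch of the split between shared and per-repository
data follows. The dictionary layout, the "otherrepo" repository and the
rating values are invented for illustration only.

```python
# Shared data: keyed by package name only, so every repository sees it.
package_data = {
    "apr": {"screenshots": [], "icons": [], "prominence": None, "categories": []},
}

# Per-repository data: user ratings are keyed by repository as well,
# so they are never mixed between repositories.
ratings = {
    ("haikuports", "apr"): [4, 5],
    ("otherrepo", "apr"): [2],
}


def average_rating(repository, name):
    # The aggregate is computed within a single repository.
    values = ratings.get((repository, name), [])
    return sum(values) / len(values) if values else None
```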
A conflicting package version + arch within any single repository
source (URL) would be considered an error, but the same package version
+ arch would be allowed to appear in different repository sources and,
by extension, in different repositories.
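
That rule might look something like the following Python sketch. The
function, the exception name and the in-memory "store" are hypothetical;
this is not how the HDS import is actually written.

```python
class DuplicatePackageVersionError(Exception):
    pass


def import_source(store, source_url, entries):
    """Record (name, version, arch) entries found at one source URL."""
    seen = set()
    for name, version, arch in entries:
        key = (name, version, arch)
        if key in seen:
            # Same version + arch twice within a single source: an error.
            raise DuplicatePackageVersionError(f"{key} duplicated in {source_url}")
        seen.add(key)
    # Uniqueness is only checked per source, so other sources may
    # legitimately carry the same key.
    store[source_url] = seen


store = {}
entry = ("apr", "1.5.0-1", "x86_gcc2")
import_source(store, "http://example.invalid/repo-a", [entry])
import_source(store, "http://example.invalid/repo-b", [entry])  # allowed
```

Passing the same entry twice in one call would raise
DuplicatePackageVersionError and leave the store untouched.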
cheers.
[1]
http://www.silvereye.co.nz/tmp/hds-img-datamodel-21may2015__DRAFT.pdf
--
Andrew Lindesay