[codeface] Re: igraph vertex indexing irregularities

From: Mitchell Joblin <joblin.m@xxxxxxxxx>
To: codeface@xxxxxxxxxxxxx
Date: Wed, 20 Nov 2013 11:10:15 +0100
On Tue, Nov 19, 2013 at 6:27 PM, Wolfgang Mauerer
<wolfgang.mauerer@xxxxxxxxxxx> wrote:
> On 19/11/13 16:24, Mitchell Joblin wrote:
>
>> It has recently become apparent that we rely on igraph to automatically
>> index the vertices sequentially, although there are exceptions where
>> this sequential index assumption is simply incorrect. In general, we
>> need to be more careful of these indexing issues and incorporate more
>> sanity checks after graph manipulations. In this direction, the
> absolutely agreed, we need more sanity checks to avoid code silently
> producing bogus results without explicit errors. In-DB, local and
> global graph indexing is slightly messy right now. It would be great
> if you could work at that some time.
>
>> following patch can be used to introduce explicit ids for the vertices
>> by setting row and column names for the adjacency matrix. This is not an
>> appropriate solution because now igraph use vertex ids that are strings
>> by casting integer row and column names from the adjacency matrix. We
>> need the vertex ids to be integer type so that we can use it for
>> indexing other data structures. Until this moment I see no opportunity
>> to convert the vertex ids to integer type and using the V(g)$name
>> attribute is restricted to character type.
>
> I see the implicit conversion for get.data.frame() where the numeric
> IDs seem to be converted to character based ones. Are there any
> other places where this happens?

I see actually don't see there is any implicit conversion for
get.data.frame(). I see that get.data.frame() eventually calls
get.edges() which then uses the V(g)$name attribute to build up the
edgelist. If we don't explicitly set vertex ids then get.data.frame()
will return integers.
>>
>> While searching for a solution I also came across this posting
>> (http://lists.gnu.org/archive/html/igraph-help/2007-10/msg00014.html)
>> where the creator of igraph mentions that sometimes the vertex ids can
>> be silently renumbered.
>>
>> I propose that we should handle indexing entirely on our own by
>> assigning a vertex attribute that is not ever manipulated by igraph. We
>> should implement sanity checks after all graph manipulations to check
>> the vertex attribute has not been dropped or changed.
> If we proceed like this, we will need to pay attention to igraph
> operations that implicitly use the internal numbering system
> and not our proposed explicit stable numbering system, for instance
> get.data.frame(). When we assign consecutive indices to the adjacency
> matrix the graph is created from, get.data.frame() is the only opportunity
> when a conversion from character back to numeric is required as far as a
> saw. So how about first setting the ids sequentially, and then making
> sure that the edge list contains only numeric entries?

That would be a better solution, but the exact problem is that I see
no way to set the ids sequentially and making sure the edge list
contains only integer entries. Essentially that is the technical
problem. The edge list uses the V(g)$name attribute which is a
character. V(g)$name is the attribute igraph offers to set your own
vertex sequence. When we don't want to explicitly set the ids and rely
on igraph's implicit indexing we can have integer type. I need to look
into the igraph code a little more to see what happens when no
V(g)$name attribute is set, I think thats in the C code though.
>
> --- a/codeface/R/cluster/persons.r
> +++ b/codeface/R/cluster/persons.r
> @@ -648,7 +648,21 @@ plot.group <- function(N, .tags, .iddb, .comm) {
>  ## compute some attributes for proper visualisation, and export the
>  ## result as a graphviz dot format if a filename is provided.
>  save.group <- function(conf, .tags, .iddb, idx, .prank, .filename=NULL, 
> label) {
> -  g <- graph.adjacency(.tags[idx,idx], mode="directed")
> +  ## Select the subset of the global graph that forms the community,
> +  ## and ensure that the per-cluster indices range from 1..|V(g.cluster)|
> +  subset <- .tags[idx, idx]
> +
> +  ## A 1x1 matrix is not of class matrix, but of class numeric,
> +  ## so ncol() won't work any more in this case. Ensure that we are actually
> +  ## working with a matrix before explicitely setting the row and column
> +  ## names to consecutive numbers.
> +  if (class(subset) == "matrix") {
> +    rownames(subset) <- 1:ncol(subset)
> +    colnames(subset) <- 1:ncol(subset)
> +  }
> +
> +  g <- graph.adjacency(subset, mode="directed")
> +
>    ## as.character is important. The igraph C export routines bark
>    ## otherwise (not sure what the actual issue is)
>    ## NOTE: V(g)$name as label index does NOT work because the name attribute
> @@ -698,6 +712,8 @@ store.graph.db <- function(conf, baselabel, idx, .iddb, 
> g.reg, g.tr, j) {
>    ## Construct a systematic representation of the graph for the data base
>    edges <- get.data.frame(g.reg, what="edges")
>    colnames(edges) <- c("fromId", "toId")
> +  edges$fromId <- as.numeric(edges$fromId)
> +  edges$toId <- as.numeric(edges$toId)
>
> This way, we make the decision for sequential indices explicit,
> but avoid having to use special accessor functions for them.
> Do you see any issues with the approach? It survives basic testing,
> but I have not looked into the implications in detail.
> Code is in branch understand on github to make testing easier
> (see b4c809f44c).

Yeah we could definitely do that. I originally wanted to avoid having
to cast since I imagine we will again have bugs where we forgot to
cast it and just assumed it was integer type. That integer assumption
is what kind of lead us to this problem in the first place. The
biggest problem I see above is we still are relying on igraphs
indexing which can be silently changed, so I would rather avoid using
it altogether. If we do use the above solution I also think as.integer
would be better than as.numeric since R's numeric data type is
basically double precision float I think.

Kind regards,

Mitchell
>
> Thanks, Wolfgang
>
>>
>> Kind regards,
>>
>> Mitchell
>>
>> From 4d24b49cf3cbec631ce033bdf5d661c6cc7af5bd Mon Sep 17 00:00:00 2001
>> From: Mitchell Joblin <mitchell.joblin.ext@xxxxxxxxxxx
>> <mailto:mitchell.joblin.ext@xxxxxxxxxxx>>
>> Date: Sun, 17 Nov 2013 13:54:09 +0100
>> Subject: [PATCH] Fix: igraph graph.adjacency(..) vertex id assignment
>>
>> - we want to maintain a sequencial indexing for all graphs and
>>   subgraphs as a design decision
>> - graph.adjacency will use column names to assign vertex ids
>>   if they are provided, otherwise a sequencial index is the
>>   default with numeric type
>> - if indexing from a larger matrix, the global index will be
>>   used
>> - it is not possible to reassign vertex ids as an numeric
>>   type after a string has been used (e.g. V(g)$name <- numeric)
>> - to avoid a string cast everytime we want to use the vertex
>>   ids for indexing the only solution is to generate the igraph
>>   graph object from a matrix with no row/col lables
>>
>> Signed-off-by: Mitchell Joblin <mitchell.joblin.ext@xxxxxxxxxxx
>> <mailto:mitchell.joblin.ext@xxxxxxxxxxx>>
>> ---
>>  codeface/R/cluster/persons.r | 13 +++++++++++--
>>  1 file changed, 11 insertions(+), 2 deletions(-)
>>
>> diff --git a/codeface/R/cluster/persons.r b/codeface/R/cluster/persons.r
>> index 25e85be..47689db 100755
>> --- a/codeface/R/cluster/persons.r
>> +++ b/codeface/R/cluster/persons.r
>> @@ -647,8 +647,17 @@ plot.group <- function(N, .tags, .iddb, .comm) {
>>  ## Given a single cluster of persons, construct an igraph object,
>>  ## compute some attributes for proper visualisation, and export the
>>  ## result as a graphviz dot format if a filename is provided.
>> -save.group <- function(conf, .tags, .iddb, idx, .prank, .filename=NULL,
>> label) {
>> -  g <- graph.adjacency(.tags[idx,idx], mode="directed")
>> +save.group <- function(conf, .adj.mat, .iddb, idx, .prank,
>> .filename=NULL, label) {
>> +  adj.mat.sub <- .adj.mat[idx,idx]
>> +  ## remove row and column names otherwise igraph assigns vertices string
>> +  ## ids, even using V(g)$name <- numeric.vec won't eliminate the
>> string type id
>> +  ## which makes it impossible for using the vertex ids to index unless
>> +  ## we do a string to integer cast everytime we want to use the index
>> +  rownames(adj.mat.sub) <- NULL
>> +  colnames(adj.mat.sub) <- NULL
>> +
>> +  g <- graph.adjacency(adj.mat.sub, mode="directed")
>> +
>>    ## as.character is important. The igraph C export routines bark
>>    ## otherwise (not sure what the actual issue is)
>>    ## NOTE: V(g)$name as label index does NOT work because the name
>> attribute
>> --
>> 1.7.12.4 (Apple Git-37)
>>
>>
>>
>
>
Follow-Ups:
- [codeface] Re: igraph vertex indexing irregularities
  - From: Wolfgang Mauerer
References:
- [codeface] igraph vertex indexing irregularities
  - From: Mitchell Joblin
- [codeface] Re: igraph vertex indexing irregularities
  - From: Wolfgang Mauerer
[codeface] Re: igraph vertex indexing irregularities

Other related posts: