[codeface] Re: igraph vertex indexing irregularities

From: Wolfgang Mauerer <wolfgang.mauerer@xxxxxxxxxxx>
To: JOBLIN MITCHELL Z002UURH <mitchell.joblin.ext@xxxxxxxxxxx>
Date: Tue, 19 Nov 2013 18:27:38 +0100
On 19/11/13 16:24, Mitchell Joblin wrote:

> It has recently become apparent that we rely on igraph to automatically
> index the vertices sequentially, although there are exceptions where
> this sequential index assumption is simply incorrect. In general, we
> need to be more careful of these indexing issues and incorporate more
> sanity checks after graph manipulations. In this direction, the
absolutely agreed, we need more sanity checks to avoid code silently
producing bogus results without explicit errors. In-DB, local and
global graph indexing is slightly messy right now. It would be great
if you could work at that some time.

> following patch can be used to introduce explicit ids for the vertices
> by setting row and column names for the adjacency matrix. This is not an
> appropriate solution because now igraph use vertex ids that are strings
> by casting integer row and column names from the adjacency matrix. We
> need the vertex ids to be integer type so that we can use it for
> indexing other data structures. Until this moment I see no opportunity
> to convert the vertex ids to integer type and using the V(g)$name
> attribute is restricted to character type.

I see the implicit conversion for get.data.frame() where the numeric
IDs seem to be converted to character based ones. Are there any
other places where this happens?
> 
> While searching for a solution I also came across this posting
> (http://lists.gnu.org/archive/html/igraph-help/2007-10/msg00014.html)
> where the creator of igraph mentions that sometimes the vertex ids can
> be silently renumbered. 
> 
> I propose that we should handle indexing entirely on our own by
> assigning a vertex attribute that is not ever manipulated by igraph. We
> should implement sanity checks after all graph manipulations to check
> the vertex attribute has not been dropped or changed.
If we proceed like this, we will need to pay attention to igraph
operations that implicitly use the internal numbering system
and not our proposed explicit stable numbering system, for instance
get.data.frame(). When we assign consecutive indices to the adjacency
matrix the graph is created from, get.data.frame() is the only opportunity
when a conversion from character back to numeric is required as far as a
saw. So how about first setting the ids sequentially, and then making
sure that the edge list contains only numeric entries?

--- a/codeface/R/cluster/persons.r
+++ b/codeface/R/cluster/persons.r
@@ -648,7 +648,21 @@ plot.group <- function(N, .tags, .iddb, .comm) {
 ## compute some attributes for proper visualisation, and export the
 ## result as a graphviz dot format if a filename is provided.
 save.group <- function(conf, .tags, .iddb, idx, .prank, .filename=NULL, label) 
{
-  g <- graph.adjacency(.tags[idx,idx], mode="directed")
+  ## Select the subset of the global graph that forms the community,
+  ## and ensure that the per-cluster indices range from 1..|V(g.cluster)|
+  subset <- .tags[idx, idx]
+
+  ## A 1x1 matrix is not of class matrix, but of class numeric,
+  ## so ncol() won't work any more in this case. Ensure that we are actually
+  ## working with a matrix before explicitely setting the row and column
+  ## names to consecutive numbers.
+  if (class(subset) == "matrix") {
+    rownames(subset) <- 1:ncol(subset)
+    colnames(subset) <- 1:ncol(subset)
+  }
+
+  g <- graph.adjacency(subset, mode="directed")
+
   ## as.character is important. The igraph C export routines bark
   ## otherwise (not sure what the actual issue is)
   ## NOTE: V(g)$name as label index does NOT work because the name attribute
@@ -698,6 +712,8 @@ store.graph.db <- function(conf, baselabel, idx, .iddb, 
g.reg, g.tr, j) {
   ## Construct a systematic representation of the graph for the data base
   edges <- get.data.frame(g.reg, what="edges")
   colnames(edges) <- c("fromId", "toId")
+  edges$fromId <- as.numeric(edges$fromId)
+  edges$toId <- as.numeric(edges$toId)

This way, we make the decision for sequential indices explicit,
but avoid having to use special accessor functions for them.
Do you see any issues with the approach? It survives basic testing,
but I have not looked into the implications in detail. 
Code is in branch understand on github to make testing easier
(see b4c809f44c).

Thanks, Wolfgang

> 
> Kind regards, 
> 
> Mitchell
> 
> From 4d24b49cf3cbec631ce033bdf5d661c6cc7af5bd Mon Sep 17 00:00:00 2001
> From: Mitchell Joblin <mitchell.joblin.ext@xxxxxxxxxxx
> <mailto:mitchell.joblin.ext@xxxxxxxxxxx>>
> Date: Sun, 17 Nov 2013 13:54:09 +0100
> Subject: [PATCH] Fix: igraph graph.adjacency(..) vertex id assignment
> 
> - we want to maintain a sequencial indexing for all graphs and
>   subgraphs as a design decision
> - graph.adjacency will use column names to assign vertex ids
>   if they are provided, otherwise a sequencial index is the
>   default with numeric type
> - if indexing from a larger matrix, the global index will be
>   used
> - it is not possible to reassign vertex ids as an numeric
>   type after a string has been used (e.g. V(g)$name <- numeric)
> - to avoid a string cast everytime we want to use the vertex
>   ids for indexing the only solution is to generate the igraph
>   graph object from a matrix with no row/col lables
> 
> Signed-off-by: Mitchell Joblin <mitchell.joblin.ext@xxxxxxxxxxx
> <mailto:mitchell.joblin.ext@xxxxxxxxxxx>>
> ---
>  codeface/R/cluster/persons.r | 13 +++++++++++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/codeface/R/cluster/persons.r b/codeface/R/cluster/persons.r
> index 25e85be..47689db 100755
> --- a/codeface/R/cluster/persons.r
> +++ b/codeface/R/cluster/persons.r
> @@ -647,8 +647,17 @@ plot.group <- function(N, .tags, .iddb, .comm) {
>  ## Given a single cluster of persons, construct an igraph object,
>  ## compute some attributes for proper visualisation, and export the
>  ## result as a graphviz dot format if a filename is provided.
> -save.group <- function(conf, .tags, .iddb, idx, .prank, .filename=NULL,
> label) {
> -  g <- graph.adjacency(.tags[idx,idx], mode="directed")
> +save.group <- function(conf, .adj.mat, .iddb, idx, .prank,
> .filename=NULL, label) {
> +  adj.mat.sub <- .adj.mat[idx,idx]
> +  ## remove row and column names otherwise igraph assigns vertices string
> +  ## ids, even using V(g)$name <- numeric.vec won't eliminate the
> string type id
> +  ## which makes it impossible for using the vertex ids to index unless
> +  ## we do a string to integer cast everytime we want to use the index
> +  rownames(adj.mat.sub) <- NULL
> +  colnames(adj.mat.sub) <- NULL
> +
> +  g <- graph.adjacency(adj.mat.sub, mode="directed")
> +
>    ## as.character is important. The igraph C export routines bark
>    ## otherwise (not sure what the actual issue is)
>    ## NOTE: V(g)$name as label index does NOT work because the name
> attribute
> -- 
> 1.7.12.4 (Apple Git-37)
> 
> 
>
Follow-Ups:
- [codeface] Re: igraph vertex indexing irregularities
  - From: Mitchell Joblin
References:
- [codeface] igraph vertex indexing irregularities
  - From: Mitchell Joblin
[codeface] Re: igraph vertex indexing irregularities

Other related posts: