On Graphvix - Part 6
Subgraphs
Subgraphs vs Clusters
In DOT, a subgraph is any subset of the defined graph enclosed within subgraph ID { ... }
, or even just { ... }
. This can be used to specify a common style for a set of nodes or edges (subgraphs can have their own global properties, like the graph as a whole, as we saw in the previous post), or to help group the graph for directional ranking (as we’ll see in a future post).
From the DOT documentation: “A cluster is a subgraph placed in its own distinct rectangle of the layout. A subgraph is recognized as a cluster when its name has the prefix cluster
”, that is, any subset of the defined graph enclosed within subgraph cluster<ID> { ... }
.
In addition to the global properties for nodes and edges, a cluster can also define a set of layout options for itself, such as a label and border/fill colors.
So in Graphvix, we can model this either as two discrete struct types, or as a single struct type with a flag to determine whether the defined subgraph should be considered a cluster or not. There is enough overlap between subgraphs and clusters that sharing a struct type feels like the correct choice.
The Subgraph
struct
defmodule Subgraph do
defstruct [
id: nil,
is_cluster: false,
vertex_ids: [],
global_properties: [
node: [],
edge: []
],
cluster_properties: []
end
end
Nothing should be too surprising here. Like the Graph
module, we have a nested keyword map for global properties shared between all vertices and edges in the subgraph. There is also a list of cluster_properties
, but this will only be used if is_cluster
is set to true.
We have an id
field to ensure a unique reference and DOT
identifier for each subgraph. Finally, there is a list of vertex ids: we will continue to store vertices in the digraph
ETS table, and reference them in any subgraph.
In addition to the subgraph module, we need to make two changes to the Graph
module: add a way to store subgraphs created inside the main graph, and keep track of the next subgraph id to ensure unique references.
The first of these changes is straightforward: we can add a subgraphs: []
entry to our Graph
struct definition.
The second task is more involved, but the answer already exists in our code. When we looked into the :digraph
module, we saw that the neighbors table contained two fields tracking the next id to be used for the vertices and the edges:
iex> :ets.tab2list(graph.ntab)
[
{:"$eid", 0},
{:"$vid", 0},
...
]
and, in part 3 of this series, [link here], we saw how we can read those values to ensure new vertices have the correct id. Based on that, we can modify our Graph.new/0
function to insert another entry into that table, an id field that can track our subgraphs:
def new do
digraph = :digraph.new()
{_, _, _, ntab, _} = digraph
:ets.insert(ntab, {:"$sid", 0})
...
end
Then, when we create a new subgraph or cluster, we can pull the current value from that row, and increment the next value in the table, the same as we do when creating a new vertex.
This post is part of a series:
- Part 1 - The over engineering-ing
- Part 2 - A :digraph primer
- Part 3 - IDs
- Part 4 - A first API
- Part 5 - Global settings
- Part 6 - Subgraphs
- Part 7 - Records
- Part 8 - Records API
- Part 9 - Ranking
- Postscript - HTML Records
Creating a new subgraph
Creating a new subgraph is fairly straightforward. The function needs to take a fair number of arguments, but all fairly simple:
- the graph the subgraph belongs to
- the ids of the vertices included in the subgraph
- whether the subgraph should be considered a cluster
- global properties for contained nodes and edges
- styles for the cluster (if applicable)
It would then return a tuple of two elements, the graph (which has had the subgraph added to its list of subgraphs), and the id of the new subgraph.
DOT format for subgraphs
This is where things get interesting. Up until this point, our Graph.to_dot
function has printed all vertices and edges at the top level of the graph. Now that we have introduced subgraphs, this needs to change.
For each subgraph, it needs to be printed containing the definitions of any vertices it contains, as well as any edges which have both their endpoints inside the subgraph. In addition to this, those vertices and edges need to not be printed in the top level of the main graph.
Vertices
Handling the vertices is the simpler of these two tasks. Each subgraph stores the ids of the vertices, so when printing the subgraph, we can pass in the vertex table from the main graph, find the contained vertices, and print them. The inverse can be done when determining which vertices should be printed in the top-level graph: for each vertex, we check whether it is included in any of the graph’s subgraphs, and only print it if it is not.
In the subgraph, we can return a subset of a graph’s vertices whose ids appear in that subgraph:
defp vertices_in_this_subgraph(subgraph_vertex_ids, vertices_from_graph) do
vertices_from_graph
|> Enum.filter(fn {vid, _attributes} -> vid in subgraph_vertex_ids end)
end
And in the graph, we can select all vertices that appear in its subgraphs:
defp vertex_ids_in_subgraphs(%__MODULE__{subgraphs: subgraphs}) do
Enum.reduce(subgraphs, [], fn c, acc ->
acc ++ c.vertex_ids
end)
end
With these lists constructed, we can use the existing vertices_to_dot
function and helper functions to print each vertex in DOT
format in its correct place in the graph.
Edges
Determining whether an edge is entirely contained by a single subgraph (that is, the vertices on both ends of the edge are in the same subgraph) is a bit more complex, but follows the same pattern:
For a subgraph, we determine, for each edge in a graph, whether both its ends are in the subgraph in question:
def edges_with_both_vertices_in_subgraph(%{vertex_ids: vertex_ids}, graph) do
[_, etab, _] = Graphvix.Graph.digraph_tables(graph)
edges = :ets.tab2list(etab)
Enum.filter(edges, fn {_, vid1, vid2, _} ->
both_vertices_in_subgraph?(vertex_ids, vid1, vid2)
end)
end
def both_vertices_in_subgraph?(vertex_ids, vid1, vid2) do
vid1 in vertex_ids && vid2 in vertex_ids
end
And in the main graph, we find all edges which have both their ends in the same one of its subgraphs, using the helper function Subgraph.both_vertices_in_subgraph?/3
we defined above:
def edges_contained_in_subgraphs(graph = {subgraphs: subgraphs}) do
[_, etab, _] = digraph_tables(graph)
edges = :ets.tab2list(etab)
Enum.filter(edges, fn {_, vid1, vid2, _} ->
Enum.any?(subgraphs, fn %{vertex_ids: vertex_ids} ->
Subgraph.both_vertices_in_subgraph?(vertex_ids, vid1, vid2)
end)
end)
end
And, again, we can use the existing *_to_dot
functions to print out each edge at the correct level in the graph.
And that wraps up looking at subgraphs! Thanks for reading, and stay tuned for the next topic: Record Nodes.