Kernels for graphs
These are convolution kernels for graphs. To use one of these, when initializing the
model, set kernel_choice = 'kernel name', e.g.
kernel_choice = "GraphRBF".
Kernel Name |
Description |
kernel_settings |
|---|---|---|
GraphRBF |
Compares graphs by averaging over
an RBF kernel applied pairwise to
all node representations
in the two graphs.
|
“averaging”: str
One of ‘none’, ‘sqrt’, ‘full’. See
below.
“intercept”:bool
|
GraphMatern |
Compares graphs by averaging over
a Matern kernel applied pairwise to
all node representations
in the two graphs.
|
“averaging”: str
One of ‘none’, ‘sqrt’, ‘full’. See
below.
“matern_nu”:float
“intercept”:bool
|
GraphCauchy |
Compares graphs by averaging over
a Cauchy kernel applied pairwise to
all node representations
in the two graphs.
|
“averaging”: str
One of ‘none’, ‘sqrt’, ‘full’. See
below.
“intercept”:bool
|
Consider a graph where each node has an associated set of features. GraphRBF compares two graphs A and B by taking each node in graph A and evaluating an RBF kernel across that node vs each node in graph B. GraphMatern and GraphCauchy do the same thing, except using Cauchy or Matern kernels. A naive implementation would have quadratic scaling in the size of the graph; in xGPR, we implement these kernels with a “trick” that results in linear scaling with graph size and number of datapoints.
These convolution kernels can be slower than fixed-vector input kernels, especially for large graphs, because to limit memory use, the convolutions are performed in batches (rather than all at once).
When using any of these kernels, you are required to supply sequence_lengths
when building a dataset or doing inference. This is the number of nodes
in each graph. xGPR uses this information to mask out any zero-padding
you may have applied to make all the graphs the same size.
Note that all three offer averaging as an option. What this means is as follows. The GraphRBF kernel computes the similarity of two graphs with \(L_1\) nodes in graph 1, \(L_2\) nodes in graph 2 as:
Cauchy and Matern are the same except with Cauchy and Matern kernels substituted.
Notice this is actually performing \(L_1 * L_2\) node comparisons
between the two, so the result will be larger when the graphs are larger. We can compensate
for this by dividing by \(L_1 * L_2\), which is full averaging, or dividing by \(\sqrt{L_1 * L_2}\), which is
sqrt averaging. Averaging is helpful if the property you are trying to predict does not
depend on graph size. It is counterproductive if graph size actually is important.
Usually the validation set performance difference
between GraphMatern, GraphCauchy and GraphRBF is
small; if this is your primary concern, we recommend defaulting
to GraphRBF and experimenting with the others if desired to
see if some small further performance achievement can be obtained.