pwcca

CCA

Description

This function computes pairwise projection-weighted CCA (PWCCA) similarities between multiple representations (Morcos, Raghu, and Bengio 2018). Instead of averaging the canonical correlations with equal weight, PWCCA takes a weighted mean in which each canonical direction is weighted by the magnitude of the representation’s projection onto it. Directions that the representation relies on most therefore contribute most to the similarity score.
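As a rough illustration of the weighting described above, the following NumPy sketch computes a PWCCA score for a single pair of matrices. It is not the repsim implementation. The name `pwcca_sketch` is hypothetical, the code assumes n_samples exceeds the number of features and that each centered matrix has full column rank, and the weights come from the first argument only, so `pwcca_sketch(X, Y)` and `pwcca_sketch(Y, X)` can differ. How repsim obtains a symmetric matrix is not covered here.

```python
import numpy as np

def pwcca_sketch(X, Y):
    """Illustrative projection-weighted CCA score of X relative to Y.

    Assumes X is (n, p) and Y is (n, q) with n > max(p, q) and full column rank.
    """
    # Center each column so that correlations are well defined.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)

    # Orthonormal bases of the two column spaces (thin QR).
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)

    # The singular values of Qx^T Qy are the canonical correlations.
    U, rho, _ = np.linalg.svd(Qx.T @ Qy, full_matrices=False)
    rho = np.clip(rho, 0.0, 1.0)  # guard against round-off slightly above 1

    # Canonical variates of X: orthonormal columns, shape (n, k).
    H = Qx @ U

    # Weight of each canonical direction: summed absolute projections
    # of the columns of X onto that direction, normalized to sum to 1.
    alpha = np.abs(H.T @ Xc).sum(axis=1)
    alpha = alpha / alpha.sum()

    # Weighted mean of the canonical correlations.
    return float(alpha @ rho)
```

Under these assumptions, comparing a matrix with itself, or with an invertible linear transform of itself, gives a score of 1 up to round-off, because the two column spaces coincide.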

Usage

repsim.pwcca(mats)

pwcca(mats)

Arguments

mats: sequence of array-like, length M. List or tuple of M data representations, each of shape (n_samples, n_features_k). All matrices must have the same number of rows, with rows matched by sample across matrices. Each element can be a NumPy array or any object convertible to one via numpy.asarray.

mats: A list of length M containing data matrices of size (n_samples, n_features_k). All matrices must share the same number of rows for matching samples.
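The input requirements above can be checked before calling the function. The sketch below is a hypothetical helper, `validate_mats`, which is not part of repsim. It converts each element with `numpy.asarray` and verifies that every matrix is two-dimensional and shares the same number of rows.

```python
import numpy as np

def validate_mats(mats):
    """Hypothetical pre-check: coerce to 2-D float arrays with matching row counts."""
    arrays = [np.asarray(m, dtype=float) for m in mats]

    # Each representation must be a matrix of shape (n_samples, n_features_k).
    for k, a in enumerate(arrays):
        if a.ndim != 2:
            raise ValueError(f"mats[{k}] must be 2-D, got {a.ndim}-D")

    # All representations must describe the same samples, row by row.
    n_rows = {a.shape[0] for a in arrays}
    if len(n_rows) > 1:
        raise ValueError(f"row counts differ across matrices: {sorted(n_rows)}")

    return arrays
```

The number of columns is deliberately left unchecked, since each representation may have its own n_features_k.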

Returns

numpy.ndarray: Array of shape (M, M) of PWCCA similarities.

matrix: An (M, M) symmetric matrix of PWCCA similarities.

Examples

# load necessary packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
import repsim

# set a random seed
np.random.seed(1)

# prepare the two prototypes (iris subsample and USArrests, 50 rows each)
iris = load_iris(as_frame=True).frame.iloc[:, :4]
url = "https://vincentarelbundock.github.io/Rdatasets/csv/datasets/USArrests.csv"
usarrests = pd.read_csv(url, index_col=0)

X = StandardScaler().fit_transform(iris.sample(50, random_state=1))
Y = StandardScaler().fit_transform(usarrests)

n, p_X, p_Y = X.shape[0], X.shape[1], Y.shape[1]

# generate 10 of each by perturbation
mats = []
for _ in range(10):
    mats.append(X + np.random.normal(scale=1.0, size=(n, p_X)))
for _ in range(10):
    mats.append(Y + np.random.normal(scale=1.0, size=(n, p_Y)))

# compute similarities
out_pwcca = repsim.pwcca(mats)

# visualize
fig, ax = plt.subplots(figsize=(8, 4), constrained_layout=True)
labs = [f"rep {i}" for i in range(1, 21)]
even_idx = list(range(1, 20, 2))

im = ax.imshow(out_pwcca, origin="upper")
ax.set_title("PWCCA")

_ = ax.set_xticks(even_idx)
_ = ax.set_xticklabels([labs[i] for i in even_idx], rotation=90)
_ = ax.set_yticks(even_idx)
_ = ax.set_yticklabels([labs[i] for i in even_idx])

plt.show()

# load necessary packages
library(repsim)

# set a random seed
set.seed(1)

# prepare the two prototypes (iris subsample and USArrests, 50 rows each)
X <- as.matrix(scale(as.matrix(iris[sample(1:150, 50, replace = FALSE), 1:4])))
Y <- as.matrix(scale(as.matrix(USArrests)))
n   <- nrow(X)
p_X <- ncol(X)
p_Y <- ncol(Y)

# generate 10 of each by perturbation
mats <- vector("list", length = 20L)
for (i in 1:10){
  mats[[i]] <- X + matrix(rnorm(n * p_X, sd = 1), nrow = n)
}
for (j in 11:20){
  mats[[j]] <- Y + matrix(rnorm(n * p_Y, sd = 1), nrow = n)
}

# compute similarities
out_pwcca <- pwcca(mats)

# visualize as a heatmap
labs <- paste0("rep ", 1:20)
par(pty = "s")

image(out_pwcca[, 20:1], axes = FALSE, main = "PWCCA")
axis(1, at = seq(0, 1, length.out = 20), labels = labs, las = 2)
axis(2, at = seq(0, 1, length.out = 20), labels = rev(labs), las = 2)

References

Morcos, Ari S., Maithra Raghu, and Samy Bengio. 2018. “Insights on Representational Similarity in Neural Networks with Canonical Correlation.” In Proceedings of the 32nd International Conference on Neural Information Processing Systems, 5732–41. NIPS’18. Red Hook, NY, USA: Curran Associates Inc.