lin_reg

Alignment

Description

Compute the pairwise linear-regression similarity between multiple representations (Kornblith et al. 2019). After centering, a least-squares map is fitted for each pair to measure how well one representation linearly predicts the other, and the resulting scores are symmetrized.
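
The underlying score, following Kornblith et al. (2019), is the fraction of variance in one representation explained by a least-squares fit from the other:

\[
R^2_{\mathrm{LR}}(X, Y) \;=\; 1 \;-\; \min_{B} \frac{\lVert X - Y B \rVert_F^2}{\lVert X \rVert_F^2} \;=\; \frac{\lVert Q_Y^\top X \rVert_F^2}{\lVert X \rVert_F^2},
\]

where \(Q_Y\) is an orthonormal basis for the columns of the centered \(Y\). Averaging \(R^2_{\mathrm{LR}}(X, Y)\) and \(R^2_{\mathrm{LR}}(Y, X)\) is one reasonable symmetrization convention; the exact convention used by lin_reg is not pinned down here.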

Usage

repsim.lin_reg(mats)

lin_reg(mats)

Arguments

  • mats: sequence of array-like, length \(M\). A list or tuple of M data representations, each of shape (n_samples, n_features_k). All matrices must share the same number of rows so that samples are matched across representations. In Python, each element can be a NumPy array or any object convertible to one via numpy.asarray; in R, mats is a list of numeric matrices.

Returns

numpy.ndarray (Python) or matrix (R)
An (M, M) symmetric matrix of pairwise similarities.
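
A minimal shape check, assuming lin_reg accepts a plain Python list of NumPy arrays as in the Examples below (the diagonal values depend on the implementation and are not asserted here):

import numpy as np
import repsim

rng = np.random.default_rng(0)
n = 100                                                 # shared number of samples
mats = [rng.normal(size=(n, p)) for p in (4, 8, 16)]    # M = 3 representations

out = repsim.lin_reg(mats)
print(out.shape)                 # (3, 3)
print(np.allclose(out, out.T))   # symmetric similarities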

Examples

#| cache: true
# load necessary packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
import repsim

# set a random seed
np.random.seed(1)

# prepare the prototype
iris = load_iris(as_frame=True).frame.iloc[:, :4]
url = "https://vincentarelbundock.github.io/Rdatasets/csv/datasets/USArrests.csv"
usarrests = pd.read_csv(url, index_col=0)

X = StandardScaler().fit_transform(iris.sample(50, random_state=1))
Y = StandardScaler().fit_transform(usarrests)

n, p_X, p_Y = X.shape[0], X.shape[1], Y.shape[1]

# generate 10 of each by perturbation
mats = []
for _ in range(10):
    mats.append(X + np.random.normal(scale=1.0, size=(n, p_X)))
for _ in range(10):
    mats.append(Y + np.random.normal(scale=1.0, size=(n, p_Y)))

# compute similarities
out_R2 = repsim.lin_reg(mats)

# visualize
fig, ax = plt.subplots(figsize=(8, 4), constrained_layout=True)
labs = [f"rep {i}" for i in range(1, 21)]
even_idx = list(range(1, 20, 2))

im = ax.imshow(out_R2, origin="upper")
_ = ax.set_title("Linear Regression")
_ = ax.set_xticks(even_idx)
_ = ax.set_xticklabels([labs[i] for i in even_idx], rotation=90)
_ = ax.set_yticks(even_idx)
_ = ax.set_yticklabels([labs[i] for i in even_idx])

plt.show()

# load necessary packages
library(repsim)

# set a random seed
set.seed(1)

# prepare the prototype
X <- as.matrix(scale(as.matrix(iris[sample(1:150, 50, replace = FALSE), 1:4])))
Y <- as.matrix(scale(as.matrix(USArrests)))
n   <- nrow(X)
p_X <- ncol(X)
p_Y <- ncol(Y)

# generate 10 of each by perturbation
mats <- vector("list", length = 20L)
for (i in 1:10){
  mats[[i]] <- X + matrix(rnorm(n * p_X, sd = 1), nrow = n)
}
for (j in 11:20){
  mats[[j]] <- Y + matrix(rnorm(n * p_Y, sd = 1), nrow = n)
}

# compute similarities
out_R2 <- lin_reg(mats)

# visualize the similarity matrix as a heatmap
labs <- paste0("rep ", 1:20)
par(pty = "s")

image(out_R2[, 20:1], axes = FALSE, main = "Linear Regression")
axis(1, at = seq(0, 1, length.out = 20), labels = labs, las = 2)
axis(2, at = seq(0, 1, length.out = 20), labels = rev(labs), las = 2)

References

Kornblith, Simon, Mohammad Norouzi, Honglak Lee, and Geoffrey Hinton. 2019. “Similarity of Neural Network Representations Revisited.” In International Conference on Machine Learning, 3519–29. PMLR.