lin_reg

Alignment

Description

Compute the pairwise linear-regression similarity between multiple representations (Kornblith et al. 2019). After centering, a least-squares map is fitted for each pair to measure how well one representation linearly predicts the other, and the resulting scores are symmetrized.
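
The underlying score, following Kornblith et al. (2019), is the fraction of variance in one representation explained by a least-squares fit from the other:

\[
R^2_{\mathrm{LR}}(X, Y) \;=\; 1 \;-\; \min_{B} \frac{\lVert X - Y B \rVert_F^2}{\lVert X \rVert_F^2} \;=\; \frac{\lVert Q_Y^\top X \rVert_F^2}{\lVert X \rVert_F^2},
\]

where \(Q_Y\) is an orthonormal basis for the columns of the centered \(Y\). Averaging \(R^2_{\mathrm{LR}}(X, Y)\) and \(R^2_{\mathrm{LR}}(Y, X)\) is one reasonable symmetrization convention; the exact convention used by lin_reg is not pinned down here.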

Usage

repsim.lin_reg(mats)

lin_reg(mats)

Arguments

  • mats: sequence of array-like, length \(M\). A list or tuple of M data representations, each of shape (n_samples, n_features_k). All matrices must share the same number of rows so that samples are matched across representations. In Python, each element can be a NumPy array or any object convertible to one via numpy.asarray; in R, mats is a list of numeric matrices.

Returns

numpy.ndarray (Python) or matrix (R)
An (M, M) symmetric matrix of pairwise similarities.
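
A minimal shape check, assuming lin_reg accepts a plain Python list of NumPy arrays as in the Examples below (the diagonal values depend on the implementation and are not asserted here):

import numpy as np
import repsim

rng = np.random.default_rng(0)
n = 100                                                 # shared number of samples
mats = [rng.normal(size=(n, p)) for p in (4, 8, 16)]    # M = 3 representations

out = repsim.lin_reg(mats)
print(out.shape)                 # (3, 3)
print(np.allclose(out, out.T))   # symmetric similarities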

Examples

#| cache: true
# load necessary packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
import repsim

# set a random seed
np.random.seed(1)

# prepare the prototype
iris = load_iris(as_frame=True).frame.iloc[:, :4]
url = "https://vincentarelbundock.github.io/Rdatasets/csv/datasets/USArrests.csv"
usarrests = pd.read_csv(url, index_col=0)

X = StandardScaler().fit_transform(iris.sample(50, random_state=1))
Y = StandardScaler().fit_transform(usarrests)

n, p_X, p_Y = X.shape[0], X.shape[1], Y.shape[1]

# generate 10 of each by perturbation
mats = []
for _ in range(10):
    mats.append(X + np.random.normal(scale=1.0, size=(n, p_X)))
for _ in range(10):
    mats.append(Y + np.random.normal(scale=1.0, size=(n, p_Y)))

# compute similarities
out_R2 = repsim.lin_reg(mats)

# visualize
fig, ax = plt.subplots(figsize=(8, 4), constrained_layout=True)
labs = [f"rep {i}" for i in range(1, 21)]
even_idx = list(range(1, 20, 2))

im = ax.imshow(out_R2, origin="upper")
_ = ax.set_title("Linear Regression")
_ = ax.set_xticks(even_idx)
_ = ax.set_xticklabels([labs[i] for i in even_idx], rotation=90)
_ = ax.set_yticks(even_idx)
_ = ax.set_yticklabels([labs[i] for i in even_idx])

plt.show()

# load necessary packages
library(repsim)

# set a random seed
set.seed(1)

# prepare the prototype
X <- as.matrix(scale(as.matrix(iris[sample(1:150, 50, replace = FALSE), 1:4])))
Y <- as.matrix(scale(as.matrix(USArrests)))
n   <- nrow(X)
p_X <- ncol(X)
p_Y <- ncol(Y)

# generate 10 of each by perturbation
mats <- vector("list", length = 20L)
for (i in 1:10){
  mats[[i]] <- X + matrix(rnorm(n * p_X, sd = 1), nrow = n)
}
for (j in 11:20){
  mats[[j]] <- Y + matrix(rnorm(n * p_Y, sd = 1), nrow = n)
}

# compute similarities
out_R2 <- lin_reg(mats)

# visualize the similarity matrix as a heatmap
labs <- paste0("rep ", 1:20)
par(pty = "s")

image(out_R2[, 20:1], axes = FALSE, main = "Linear Regression")
axis(1, at = seq(0, 1, length.out = 20), labels = labs, las = 2)
axis(2, at = seq(0, 1, length.out = 20), labels = rev(labs), las = 2)

References

Kornblith, Simon, Mohammad Norouzi, Honglak Lee, and Geoffrey Hinton. 2019. “Similarity of Neural Network Representations Revisited.” In International Conference on Machine Learning, 3519–29. PMLR.