Distance Measures between Samples through Empirical Cumulative Distribution Functions

Computes pairwise distances between samples by comparing their empirical cumulative distribution functions (ECDFs). Unlike ecdfdist, this function takes raw data samples as input and computes the ECDFs internally before calculating distances.
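To make the idea concrete, here is a minimal base-R sketch of the Kolmogorov-Smirnov case for a single pair of samples. The helper name ks_dist is hypothetical and only illustrates the principle; it is not claimed to be how ecdfdistS is implemented internally.

```r
# Kolmogorov-Smirnov distance between two raw samples, via their ECDFs.
# ks_dist is a hypothetical helper for illustration, not a package function.
ks_dist <- function(x, y) {
  Fx <- stats::ecdf(x)            # step function: proportion of x <= t
  Fy <- stats::ecdf(y)
  grid <- sort(unique(c(x, y)))   # both ECDFs only jump at observed points
  max(abs(Fx(grid) - Fy(grid)))   # so the supremum is attained on this grid
}

set.seed(1)
x <- stats::rnorm(50, sd = 2)
y <- stats::runif(50, min = -5)
ks_dist(x, y)
```

Because both ECDFs are step functions that change only at observed values, evaluating them on the pooled sample is enough to find the largest gap exactly.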

ecdfdistS(
  veclist,
  method = c("KS", "Lp", "Wasserstein"),
  p = 1,
  as.dist = FALSE
)

Arguments

veclist: a list of length \(N\) whose elements are vectors of raw sample values.
method: name of the distance/dissimilarity measure; one of "KS", "Lp", or "Wasserstein". Case insensitive (default: "KS").
p: exponent for Lp or Wasserstein distance (default: p=1).
as.dist: a logical; TRUE to return a dist object, FALSE to return an \((N\times N)\) symmetric matrix of pairwise distances (default: FALSE).
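The role of the exponent p can be sketched for one pair of samples as follows. Both helpers (lp_dist, wasserstein_dist) are hypothetical names, and the definitions used here (the \(L_p\) norm of the ECDF difference, and the order-statistic form of the \(p\)-Wasserstein distance for equal-sized samples) are standard textbook forms assumed for illustration; the package may normalize or discretize differently.

```r
# Hypothetical helpers for illustration; not the package's internals.

# L_p distance between two ECDFs: (integral of |F - G|^p dt)^(1/p).
# Both ECDFs are constant between consecutive pooled points, so the
# integral reduces to an exact finite sum.
lp_dist <- function(x, y, p = 1) {
  Fx <- stats::ecdf(x)
  Fy <- stats::ecdf(y)
  grid <- sort(unique(c(x, y)))
  left <- grid[-length(grid)]          # left endpoint of each interval
  gap  <- abs(Fx(left) - Fy(left))     # |F - G| on [grid[i], grid[i+1])
  sum(gap^p * diff(grid))^(1 / p)
}

# p-Wasserstein distance for two samples of equal size:
# pair up the order statistics and average the p-th powers of the gaps.
wasserstein_dist <- function(x, y, p = 1) {
  stopifnot(length(x) == length(y))
  mean(abs(sort(x) - sort(y))^p)^(1 / p)
}

set.seed(1)
x <- stats::rnorm(50, sd = 2)
y <- stats::runif(50, min = -5)
lp_dist(x, y, p = 1)
wasserstein_dist(x, y, p = 1)
wasserstein_dist(x, y, p = 2)
```

For p = 1 the two quantities coincide mathematically, since the 1-Wasserstein distance on the real line equals the integral of the absolute ECDF difference; for other values of p they generally differ.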

Value

either a dist object or an \((N\times N)\) symmetric matrix of pairwise distances, depending on the as.dist argument.
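The two return forms can be illustrated with base R alone. The sketch below builds a symmetric matrix of pairwise Kolmogorov-Smirnov distances with a hypothetical helper (pairwise_ks, not a package function) and converts it with stats::as.dist; it shows the shape of the output only and makes no claim about how ecdfdistS constructs it.

```r
# pairwise_ks is a hypothetical helper for illustration only.
pairwise_ks <- function(veclist) {
  N <- length(veclist)
  stopifnot(N >= 1)
  D <- matrix(0, N, N)                       # zero diagonal: d(x, x) = 0
  for (i in seq_len(N - 1)) {
    for (j in (i + 1):N) {
      grid <- sort(unique(c(veclist[[i]], veclist[[j]])))
      Fi <- stats::ecdf(veclist[[i]])
      Fj <- stats::ecdf(veclist[[j]])
      D[i, j] <- D[j, i] <- max(abs(Fi(grid) - Fj(grid)))
    }
  }
  D
}

set.seed(1)
samples <- list(stats::rnorm(30), stats::rnorm(30), stats::runif(30, min = -5))

D      <- pairwise_ks(samples)   # (N x N) symmetric matrix form
D_dist <- stats::as.dist(D)      # compact dist form (lower triangle only)

# A dist object can be passed directly to functions such as stats::hclust.
hc <- stats::hclust(D_dist)
```

The matrix form is convenient for indexing and plotting with image, while the dist form is what clustering and scaling routines in stats typically expect.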

Examples

# \donttest{
## toy example : 10 samples each from a normal and a uniform distribution
mylist = list()
for (i in 1:10){
  mylist[[i]] = stats::rnorm(50, sd=2)
}
for (i in 11:20){
  mylist[[i]] = stats::runif(50, min=-5)
}

## compute three distances
d_KS = ecdfdistS(mylist, method="KS")
d_LP = ecdfdistS(mylist, method="Lp")
d_OT = ecdfdistS(mylist, method="Wasserstein")

## visualize
opar = par(no.readonly=TRUE)
par(mfrow=c(1,3), pty="s")
image(d_KS[,nrow(d_KS):1], axes=FALSE, main="Kolmogorov-Smirnov")
image(d_LP[,nrow(d_LP):1], axes=FALSE, main="Lp (p=1)")
image(d_OT[,nrow(d_OT):1], axes=FALSE, main="Wasserstein (p=1)")

par(opar)
# }