This function searches for clusters in the input data set using different strategies (every combination of the requested numbers of clusters, clustering methods, warping classes and centroid types) and generates an object of class mcaps, which stores multiple objects of class caps. This is a helper function to facilitate the comparison of clustering methods and the choice of an optimal one.
Usage
compare_caps(
  x,
  y,
  n_clusters = 1:5,
  is_domain_interval = FALSE,
  transformation = c("identity", "srsf"),
  metric = c("l2", "normalized_l2", "pearson"),
  clustering_method = c("kmeans", "hclust-complete", "hclust-average", "hclust-single",
    "dbscan"),
  warping_class = c("none", "shift", "dilation", "affine", "bpd"),
  centroid_type = c("mean", "medoid", "median", "lowess", "poly"),
  cluster_on_phase = FALSE
)
Arguments
- x
A numeric vector of length \(M\), a numeric matrix of shape \(N \times M\) or an object of class funData::funData. If a numeric vector or matrix, it specifies the grid(s) of size \(M\) on which each of the \(N\) curves has been observed. If an object of class funData::funData, it contains the whole functional data set and the y argument is not used.
- y
Either a numeric matrix of shape \(N \times M\), a numeric array of shape \(N \times L \times M\) or an object of class fda::fd. If a numeric matrix or array, it specifies the \(N\)-sample of \(L\)-dimensional curves observed on grids of size \(M\). If an object of class fda::fd, it contains all the information about the functional data set needed to evaluate it on user-defined grids.
- n_clusters
An integer vector specifying the numbers of clusters for which a clustering partition is created. Defaults to 1:5.
- is_domain_interval
A boolean specifying whether the sample of curves is defined on a fixed interval. Defaults to FALSE.
- transformation
A string specifying the transformation to apply to the original sample of curves. Choices are no transformation (transformation = "identity") or the square-root slope function (transformation = "srsf"). Defaults to "identity".
- metric
A string specifying the metric used to compare curves. Choices are "l2", "normalized_l2" or "pearson". If transformation == "srsf", the metric must be "l2" because the SRSF transform maps absolutely continuous functions to square-integrable functions. If transformation == "identity" and warping_class is either "dilation" or "affine", the metric can be either "normalized_l2" or "pearson", since the L2 distance is neither dilation-invariant nor affine-invariant. The metric can also be "l2" if warping_class == "shift". Defaults to "l2".
- clustering_method
A character vector specifying one or more clustering methods to be fit. Choices are "kmeans", "hclust-complete", "hclust-average", "hclust-single" or "dbscan". Defaults to all of them.
- warping_class
A character vector specifying one or more classes of warping functions to use for curve alignment. Choices are "affine", "dilation", "none", "shift" or "bpd". Defaults to all of them.
- centroid_type
A character vector specifying one or more ways to compute centroids. Choices are "mean", "medoid", "median", "lowess" or "poly". Defaults to all of them.
- cluster_on_phase
A boolean specifying whether clustering should be based on phase variation or amplitude variation. Defaults to FALSE, which implies amplitude variation.
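For reference, the square-root slope function of a univariate curve \(f\) on an interval \([a, b]\) is usually defined as below. This is the standard definition from the SRSF literature and is not restated elsewhere on this page. The second identity shows why "l2" is the required metric after this transformation: an absolutely continuous \(f\) has an integrable derivative, so \(q\) is square-integrable.

\[
q(t) = \operatorname{sign}\big(f'(t)\big)\sqrt{\lvert f'(t)\rvert},
\qquad
\int_a^b q(t)^2\,dt = \int_a^b \lvert f'(t)\rvert\,dt < \infty .
\]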
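The constraints in the metric description can be restated as a small R predicate. is_valid_metric() is a hypothetical helper written only for this illustration; it is not exported by the package. It also assumes that combinations the description does not mention (for example warping_class = "none" or "bpd" without SRSF) are allowed.

```r
# Hypothetical helper (not part of the package): checks a
# (transformation, metric, warping_class) triple against the rules stated
# in the metric argument description. Combinations those rules do not
# mention are ASSUMED to be allowed.
is_valid_metric <- function(transformation, metric, warping_class) {
  if (transformation == "srsf") {
    # SRSFs of absolutely continuous curves live in L2, so only "l2".
    return(metric == "l2")
  }
  if (warping_class %in% c("dilation", "affine")) {
    # The L2 distance is neither dilation- nor affine-invariant.
    return(metric %in% c("normalized_l2", "pearson"))
  }
  # "shift" (where "l2" is explicitly allowed) and unmentioned cases.
  TRUE
}

is_valid_metric("srsf", "pearson", "none")       # FALSE
is_valid_metric("identity", "l2", "affine")      # FALSE
is_valid_metric("identity", "pearson", "affine") # TRUE
is_valid_metric("identity", "l2", "shift")       # TRUE
```

Returning a logical rather than raising an error keeps the sketch easy to test; the package itself may report invalid combinations differently.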
Value
An object of class mcaps, which is a tibble::tibble storing the objects of class caps obtained for each combination of the choices given in the input arguments.
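To picture how many fits are stored, the combinations can be enumerated with base R's expand.grid(). This is only an illustration using the choices from the compare_caps() call in the Examples section; the column layout is an assumption and not the actual structure of an mcaps object.

```r
# Illustration only: enumerate one row per combination of input choices,
# mirroring the compare_caps() call in the Examples section.
# The column layout is an ASSUMPTION, not the real mcaps structure.
combos <- expand.grid(
  n_clusters = 1:5,
  clustering_method = "kmeans",
  warping_class = c("none", "shift", "dilation", "affine"),
  centroid_type = "mean",
  stringsAsFactors = FALSE
)

nrow(combos) # 20: 5 cluster counts x 4 warping classes
head(combos, 3)
```

With all defaults, the same enumeration gives 5 x 5 x 5 x 5 = 625 combinations, so restricting the choices as in the example can noticeably reduce computation.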
Examples
#----------------------------------
# Compare k-means results with k = 1, 2, 3, 4, 5 using mean centroid and
# various warping classes.
if (FALSE) {
  sim30_mcaps <- compare_caps(
    x = simulated30_sub$x,
    y = simulated30_sub$y,
    warping_class = c("none", "shift", "dilation", "affine"),
    clustering_method = "kmeans",
    centroid_type = "mean"
  )
}
#----------------------------------
# Then visualize the results
# Either with ggplot2 via ggplot2::autoplot(sim30_mcaps)
# or using graphics::plot()
# You can visualize the WSS values:
plot(sim30_mcaps, validation_criterion = "wss", what = "mean")
plot(sim30_mcaps, validation_criterion = "wss", what = "distribution")
# Or the average silhouette values:
plot(sim30_mcaps, validation_criterion = "silhouette", what = "mean")
plot(sim30_mcaps, validation_criterion = "silhouette", what = "distribution")