K-mean alignment and variants for functional data

kma(
  x,
  y,
  n_clusters = 1L,
  warping_class = c("affine", "dilation", "none", "shift", "srsf"),
  seeds = NULL,
  maximum_number_of_iterations = 100L,
  centroid_type = c("mean", "medoid"),
  distance = c("l2", "pearson"),
  warping_options = c(0.15, 0.15),
  number_of_threads = 1L,
  parallel_method = 0L,
  distance_relative_tolerance = 0.001,
  use_fence = FALSE,
  check_total_dissimilarity = TRUE,
  use_verbose = TRUE,
  compute_overall_center = FALSE
)

Arguments

x

A numeric matrix of shape nObs x nPts specifying the evaluation grid of each observation.

y

A numeric array of shape nObs x nDim x nPts specifying the observation values.
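As a minimal sketch of the input shapes described for x and y, the following base R snippet builds a grid matrix and a value array for a hypothetical sample of 5 one-dimensional curves observed at 50 points. The sizes and the sine-based curves are illustrative assumptions and do not come from the package.

```r
# Hypothetical sizes, chosen for illustration only.
n_obs <- 5L   # number of curves (nObs)
n_dim <- 1L   # dimension of each curve (nDim)
n_pts <- 50L  # grid size (nPts)

# x: one evaluation grid per row, shape nObs x nPts.
grid <- seq(0, 1, length.out = n_pts)
x <- matrix(grid, nrow = n_obs, ncol = n_pts, byrow = TRUE)

# y: observation values, shape nObs x nDim x nPts.
y <- array(0, dim = c(n_obs, n_dim, n_pts))
for (i in seq_len(n_obs)) {
  # Each curve is a sine wave shifted by a small, curve-specific amount.
  y[i, 1, ] <- sin(2 * pi * (x[i, ] - 0.02 * i))
}

dim(x)  # 5 50
dim(y)  # 5 1 50
```

Objects built this way have the layout that the x and y arguments expect, with observations along the first dimension.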

n_clusters

An integer value specifying the number of clusters. Defaults to 1L.

warping_class

A string specifying the warping class. Choices are "affine", "dilation", "none", "shift" or "srsf". Defaults to "affine". The SRSF class is the only class which is boundary-preserving.
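For intuition, the parametric classes are usually described in the k-mean alignment literature as simple transformations of the grid: a shift adds an offset, a dilation rescales, and an affine warp does both. The helper names below are hypothetical and only sketch these families under that assumption; they are not the package's internals.

```r
# Hypothetical helpers illustrating the parametric warping families.
warp_shift    <- function(t, shift) t + shift
warp_dilation <- function(t, dilation) dilation * t
warp_affine   <- function(t, dilation, shift) dilation * t + shift

t <- seq(0, 1, length.out = 5)
warp_shift(t, 0.1)        # 0.10 0.35 0.60 0.85 1.10
warp_dilation(t, 1.1)     # grid stretched by 10%
warp_affine(t, 1.1, 0.1)  # stretched, then shifted
```

In this sketch the shift and affine warps move both endpoints of the grid and the dilation moves the right one, which is consistent with the remark that only the SRSF class preserves boundaries.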

seeds

An integer vector of length n_clusters specifying the indices of the initial templates. Defaults to NULL, in which case the indices are sampled at random.

maximum_number_of_iterations

An integer specifying the maximum number of iterations before the algorithm stops. Defaults to 100L.

centroid_type

A string specifying the type of centroid to compute. Choices are "mean" or "medoid". Defaults to "mean". This is used only when warping_class != "srsf". When warping_class = "srsf", the mean is systematically used.

distance

A string specifying the distance used to compare curves. Choices are "l2" or "pearson". Defaults to "l2". This is used only when warping_class != "srsf".

warping_options

A numeric vector supplied as a helper to the chosen warping_class to decide on warping parameter bounds. This is used only when warping_class != "srsf".

number_of_threads

An integer value specifying the number of threads used for parallelization. Defaults to 1L. This is used only when warping_class != "srsf".

parallel_method

An integer value specifying the type of desired parallelization for template computation. If 0L, templates are computed in parallel. If 1L, parallelization occurs within a single template computation (only for the medoid method as of now). Defaults to 0L. This is used only when warping_class != "srsf".

distance_relative_tolerance

A numeric value specifying a relative tolerance on the distance update between two iterations. If no observation has sufficiently improved in that sense, the algorithm stops. Defaults to 1e-3. This is used only when warping_class != "srsf".
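One plausible reading of this criterion, given as a sketch rather than the package's actual rule, is that iteration stops once no observation's distance to its center has decreased by at least the relative tolerance. The function name and the exact formula below are assumptions.

```r
# Hypothetical stopping check; the exact formula used by kma() may differ.
has_converged <- function(d_old, d_new, tol = 1e-3) {
  # Relative improvement per observation, guarding against division by zero.
  rel_improvement <- (d_old - d_new) / pmax(d_old, .Machine$double.eps)
  all(rel_improvement < tol)
}

has_converged(c(1.0, 2.0), c(0.9995, 1.9990))  # TRUE
has_converged(c(1.0, 2.0), c(0.9000, 1.9990))  # FALSE
```

In the second call the first observation improves by 10%, which is well above the tolerance, so the check reports that the algorithm should keep iterating.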

use_fence

A boolean specifying whether the fence algorithm should be used to robustify the algorithm against outliers. Defaults to FALSE. This is used only when warping_class != "srsf".

check_total_dissimilarity

A boolean specifying whether an additional stopping criterion based on improvement of the total dissimilarity should be used. Defaults to TRUE. This is used only when warping_class != "srsf".

use_verbose

A boolean specifying whether the algorithm should output details of the steps to the console. Defaults to TRUE. This is used only when warping_class != "srsf".

compute_overall_center

A boolean specifying whether the overall center should be also computed. Defaults to FALSE. This is used only when warping_class != "srsf".

Value

An object of class kma, which is a list with the following components:

original_curves: A numeric array of shape \(N \times L \times M\) storing the original sample of \(N\) \(L\)-dimensional curves observed on grids of size \(M\);

original_grids: A numeric matrix of shape \(N \times M\) storing the original grids of size \(M\) on which the \(N\) curves were evaluated;

x: As input;

y: As input;

seeds: Indices used in the algorithm;

iterations: Number of iterations before the KMA algorithm stops;

n_clust: As input;

overall_center_grid: Overall center grid if compute_overall_center is set;

overall_center_values: Overall center values if compute_overall_center is set;

distances_to_overall_center: Distances of each observation to the overall center if compute_overall_center is set;

x_final: Aligned observation grids;

n_clust_final: Final number of clusters. Note that n_clust_final may differ from the initial number of clusters n_clust if some clusters are empty;

x_centers_final: Final center grids;

y_centers_final: Final center values;

template_grids: List of template grids at each iteration;

template_values: List of template values at each iteration;

labels: Cluster memberships;

final_dissimilarity: Distances of each observation to the center of its assigned cluster;

parameters_list: List of estimated warping parameters at each iteration;

parameters: Final estimated warping parameters;

warping_method: As input;

dissimilarity_method: As input;

center_method: As input;

optimizer_method: As input.

Examples

res <- kma(
  simulated30$x,
  simulated30$y,
  seeds = c(1, 21),
  n_clusters = 2,
  centroid_type = "medoid",
  warping_class = "affine",
  distance = "pearson"
)