fdacluster 0.3.0

An R package for jointly aligning and clustering functional data.

Keywords: software, functional data, clustering
Affiliation: Department of Mathematics Jean Leray, UMR CNRS 6629

Published: July 4, 2023

Overview

The fdacluster package implements the popular \(k\)-means, hierarchical agglomerative clustering (HAC) and DBSCAN methods for functional data (Ramsay and Silverman 2005). Variability in functional data can be divided into three components: amplitude, phase and ancillary variability (Vantini 2012; Marron et al. 2015). The first two can be captured by a statistical analysis that includes a curve alignment step. The \(k\)-means and HAC algorithms implemented in fdacluster produce clustering structures based either on amplitude variation (the default) or on phase variation (Marron et al. 2014), which is achieved by jointly clustering and aligning the functional data set. The three main functions are fdakmeans() for \(k\)-means, fdahclust() for HAC and fdadbscan() for DBSCAN.
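To make the distinction between amplitude and phase variability concrete, here is a minimal base R sketch. It does not use fdacluster itself: the template curve, the grid and the grid search over shifts are all assumptions chosen for illustration, and the package's own alignment procedure is likely more elaborate.

```r
# Common evaluation grid (an assumption of this sketch).
grid <- seq(0, 1, length.out = 201)

# Hypothetical template curve: a Gaussian bump centred at t = 0.5.
template <- function(t) exp(-(t - 0.5)^2 / (2 * 0.05^2))
reference <- template(grid)

# Amplitude variability: same timing, peak 1.5 times higher.
amp_curve <- 1.5 * template(grid)

# Phase variability: same height, peak delayed by 0.1.
phase_curve <- template(grid - 0.1)

# Squared L2 distance to the reference after the shift warping h(t) = t + b.
# The warped curve is evaluated by linear interpolation; values outside the
# grid are held constant at the boundary values (rule = 2).
shift_cost <- function(curve, b) {
  warped <- approx(grid, curve, xout = grid + b, rule = 2)$y
  sum((warped - reference)^2) * (grid[2] - grid[1])
}

# Grid search over candidate shifts.
shifts <- seq(-0.2, 0.2, by = 0.01)
phase_costs <- sapply(shifts, function(b) shift_cost(phase_curve, b))
amp_costs <- sapply(shifts, function(b) shift_cost(amp_curve, b))

best_shift <- shifts[which.min(phase_costs)]  # shift that best aligns phase_curve
min(phase_costs)  # close to zero: alignment removes the phase variability
min(amp_costs)    # stays clearly positive: amplitude variability remains
```

In this toy setting, the delayed curve can be matched to the reference by a shift alone, whereas no shift can account for the taller curve. This is the sense in which an alignment step separates the two sources of variability before curves are compared for clustering.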

The package supports:

  • functional data defined on one-dimensional domains, possibly taking values in multivariate codomains;
  • functional data stored in arrays, as well as in the fd class from the fda package and the funData class from the funData package;
  • shift, dilation and affine warping functions for functional data defined on the real line (Sangalli et al. 2010) and all boundary-preserving warping functions for functional data defined on a specific interval through the square-root slope function (SRSF) framework (Tucker, Wu, and Srivastava 2013).
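The warping classes named in the list above can be written down in a few lines of base R. This is only a sketch: the helper make_warping() and the function h_bp() are hypothetical names introduced here, not part of fdacluster, and the package's internal conventions (for example, whether the warping is applied to the grid or to the curve) may differ.

```r
# Build a warping function h of a given class (names are hypothetical).
#   shift:    h(t) = t + b
#   dilation: h(t) = a * t,      a > 0
#   affine:   h(t) = a * t + b,  a > 0
make_warping <- function(class = c("shift", "dilation", "affine"), a = 1, b = 0) {
  class <- match.arg(class)
  stopifnot(is.numeric(a), is.numeric(b), a > 0)  # also forces a and b
  switch(class,
    shift    = function(t) t + b,
    dilation = function(t) a * t,
    affine   = function(t) a * t + b
  )
}

# A boundary-preserving warping on [0, 1]: strictly increasing, with
# h(0) = 0 and h(1) = 1. This particular h is only an example.
h_bp <- function(t) (exp(2 * t) - 1) / (exp(2) - 1)

# Warping acts on a curve by composition: (f o h)(t) = f(h(t)).
f <- function(t) sin(2 * pi * t)
grid <- seq(0, 1, length.out = 101)

h_affine <- make_warping("affine", a = 1.2, b = -0.1)
warped_affine <- f(h_affine(grid))  # f evaluated at 1.2 * t - 0.1
warped_bp <- f(h_bp(grid))          # end points are left unchanged
```

Shift and dilation are the special cases of the affine class with a = 1 and b = 0 respectively. They move the end points of the domain, which is why they suit data defined on the real line, while boundary-preserving warpings are the natural choice when the domain is a fixed interval.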

Installation

You can install the released version of fdacluster from CRAN with:

install.packages("fdacluster")

Alternatively, you can install the development version of fdacluster from GitHub with:

# install.packages("remotes")
remotes::install_github("astamm/fdacluster")

News in v0.3.0

  • Added median centroid type;
  • Median and mean centroid types are now defined on the union of individual grids;
  • Simplified caps class to avoid storing objects multiple times under different names;
  • Added vignette on initialization strategies for k-means;
  • Added article on use case about the Berkeley growth study;
  • Added article on supported input formats.

References

Marron, J. S., J. O. Ramsay, L. M. Sangalli, and A. Srivastava. 2014. “Statistics of Time Warpings and Phase Variations.”
———. 2015. “Functional Data Analysis of Amplitude and Phase Variation.” Statistical Science, 468–84.
Ramsay, J., and B. W. Silverman. 2005. Functional Data Analysis. Springer Series in Statistics. Springer.
Sangalli, L. M., P. Secchi, S. Vantini, and V. Vitelli. 2010. “K-Mean Alignment for Curve Clustering.” Computational Statistics & Data Analysis 54 (5): 1219–33.
Tucker, J. D., W. Wu, and A. Srivastava. 2013. “Generative Models for Functional Data Using Phase and Amplitude Separation.” Computational Statistics & Data Analysis 61: 50–66.
Vantini, S. 2012. “On the Definition of Phase and Amplitude Variability in Functional Data Analysis.” Test 21 (4): 676–96. https://doi.org/10.1007/s11749-011-0268-9.