This function computes, and optionally displays, the functional boxplot of univariate or multivariate functional data.

fbplot(
  Data,
  Depths = "MBD",
  Fvalue = 1.5,
  adjust = FALSE,
  display = TRUE,
  xlab = NULL,
  ylab = NULL,
  main = NULL,
  ...
)

# S3 method for fData
fbplot(
  Data,
  Depths = "MBD",
  Fvalue = 1.5,
  adjust = FALSE,
  display = TRUE,
  xlab = NULL,
  ylab = NULL,
  main = NULL,
  ...
)

# S3 method for mfData
fbplot(
  Data,
  Depths = list(def = "MBD", weights = "uniform"),
  Fvalue = 1.5,
  adjust = FALSE,
  display = TRUE,
  xlab = NULL,
  ylab = NULL,
  main = NULL,
  ...
)

Arguments

Data

the univariate or multivariate functional dataset whose functional boxplot must be determined, in the form of an fData or mfData object.

Depths

either a vector containing the depths for each element of the dataset, or:

  • univariate case: a string containing the name of the depth function to use to compute them. The default is 'MBD'.

  • multivariate case: a list with elements def, containing the name of the depth notion to be used to compute depths (BD or MBD), and weights, containing the value of the weights parameter to be passed to the depth function. The default is list(def = 'MBD', weights = 'uniform').

In both cases, the named depth function must be available in the caller's environment.
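For reference, the modified band depth (MBD) can be sketched in base R through its rank formulation: at each grid point, a curve with rank \(r\) among \(N\) curves lies inside \((r - 1)(N - r)\) bands spanned by two other curves, plus the \(N - 1\) bands it spans together with another curve. This is an illustrative sketch with a hypothetical name (mbd_sketch), assuming bands formed by pairs of curves and ignoring any special tie handling; for the parallel curves of the first example it gives the deepest curve a depth of 10200/20100 ≈ 0.5075, which agrees with the largest depth printed there.

```r
# Hypothetical helper: modified band depth (bands of 2 curves) computed
# from pointwise ranks. Rows of `values` are curves, columns grid points.
mbd_sketch <- function(values) {
  N <- nrow(values)
  P <- ncol(values)
  # Rank of every curve at every grid point (N x P matrix)
  rk <- apply(values, 2, rank)
  # Average, over the grid, of the number of bands containing each curve
  n_bands <- rowSums((N - rk) * (rk - 1)) / P + N - 1
  # Normalise by the total number of bands, choose(N, 2)
  n_bands / choose(N, 2)
}

# Three parallel curves: the middle one lies in all 3 bands
mbd_sketch(rbind(rep(0, 5), rep(1, 5), rep(2, 5)))
# [1] 0.6666667 1.0000000 0.6666667
```

With an fData object fD, a vector computed this way could be supplied directly, e.g. fbplot(fD, Depths = mbd_sketch(fD$values)).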

Fvalue

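the value of the inflation factor \(F\): the fences beyond which curves are flagged as outliers are obtained by inflating the 50% central region (the envelope of the 50% deepest curves) by \(F\) times its pointwise range. The default is \(F = 1.5\).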

adjust

either FALSE, in which case the value given in Fvalue (by default \(F = 1.5\)) is used, or (for now only in the univariate functional case) a list specifying the parameters of the adjustment procedure:

  • N_trials: the number of repetitions of the adjustment procedure, each one simulating a Gaussian population of functional data and producing an adjusted value of \(F\); these values are averaged into the final adjusted value \(\bar{F}\). Default is 20.

  • trial_size: the number of elements in the Gaussian population of functional data simulated at each repetition of the adjustment procedure. Default is 8 * Data$N.

  • TPR: the True Positive Rate of outliers, i.e. the proportion of observations in a dataset without amplitude outliers that should be flagged as outliers. Default is 2 * pnorm(4 * qnorm(0.25)), about 0.007.

  • F_min: the minimum value of \(F\), i.e. the left boundary of the search interval of the optimization problem that finds the optimal value of \(F\) for each simulated Gaussian dataset associated with Data. Default is 0.5.

  • F_max: the maximum value of \(F\), i.e. the right boundary of the search interval of that optimization problem. Default is 5.

  • tol: the tolerance used in that optimization problem. Default is 1e-3.

  • maxiter: the maximum number of iterations allowed in that optimization problem. Default is 100.

  • VERBOSE: a parameter controlling the verbosity of the adjustment process.
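The default TPR is the probability that a standard normal observation falls outside the classical boxplot fences \(Q_1 - 1.5\,\mathrm{IQR}\) and \(Q_3 + 1.5\,\mathrm{IQR}\): for the standard normal, \(Q_3 + 1.5\,\mathrm{IQR} = -4\,Q_1\), so the two-sided tail probability is 2 * pnorm(4 * qnorm(0.25)). The following base-R check illustrates this identity.

```r
# Quartiles and interquartile range of the standard normal distribution
q1 <- qnorm(0.25)
q3 <- qnorm(0.75)
iqr <- q3 - q1
# Two-sided tail probability beyond the 1.5 * IQR fences
tail_prob <- pnorm(q1 - 1.5 * iqr) + pnorm(q3 + 1.5 * iqr, lower.tail = FALSE)
tpr_default <- 2 * pnorm(4 * qnorm(0.25))
all.equal(tail_prob, tpr_default)
# [1] TRUE
round(tpr_default, 4)
# [1] 0.007
```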

display

either a logical value indicating whether you want the functional boxplot to be displayed, or the number of the graphical device where you want the functional boxplot to be displayed.

xlab

the label to use on the x axis when displaying the functional boxplot.

ylab

the label (or list of labels for the multivariate functional case) to use on the y axis when displaying the functional boxplot.

main

the main title (or list of titles for the multivariate functional case) to be used when displaying the functional boxplot.

...

additional graphical parameters to be used in plotting functions.

Value

Even when used to plot the functional boxplot, the function returns a list of three elements:

  • Depth: the depths of each element of the functional dataset.

  • Fvalue: the value of \(F\) used to obtain the outliers.

  • ID_outliers: the vector of indices of the dataset elements flagged as outliers (if any).
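To make the three returned elements concrete, here is a base-R sketch of the outlier rule of Sun and Genton on which the functional boxplot is based: the central region is the pointwise envelope of the 50% deepest curves, the fences extend it by Fvalue times its pointwise width, and a curve is flagged when it crosses a fence at any grid point. The name fbplot_rule_sketch is hypothetical, and the sketch takes the depths as given instead of computing them; it is an illustration, not the package implementation.

```r
# Hypothetical helper: functional boxplot outlier rule for a matrix of
# curves (rows) and a vector of precomputed depths.
fbplot_rule_sketch <- function(values, depths, Fvalue = 1.5) {
  N <- nrow(values)
  # The ceiling(N / 2) deepest curves form the 50% central region
  central <- order(depths, decreasing = TRUE)[seq_len(ceiling(N / 2))]
  lower <- apply(values[central, , drop = FALSE], 2, min)
  upper <- apply(values[central, , drop = FALSE], 2, max)
  # Fences: envelope inflated by Fvalue times its pointwise width
  width <- upper - lower
  fence_low <- lower - Fvalue * width
  fence_high <- upper + Fvalue * width
  # Flag curves that leave the fences at some grid point
  out <- which(apply(values, 1, function(x) any(x < fence_low | x > fence_high)))
  list(Depth = depths, Fvalue = Fvalue, ID_outliers = out)
}

# Ten flat curves at levels 1, ..., 10 and one far away at level 50;
# the toy depth used here is minus the distance from the median level
values <- matrix(c(1:10, 50), nrow = 11, ncol = 3)
depths <- -abs(values[, 1] - median(values[, 1]))
fbplot_rule_sketch(values, depths)$ID_outliers
# [1] 11
```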

Adjustment

In the univariate functional case, when the adjustment option is selected, the value of \(F\) is optimized for the univariate functional dataset provided with Data.

In practice, the procedure is repeated adjust$N_trials times. At each repetition, a synthetic Gaussian population of size adjust$trial_size, with the same centerline and covariance (robustly estimated from Data) as Data, is simulated without outliers, and an optimized value \(F_i\) is computed so that a given proportion (adjust$TPR) of its observations is flagged as outliers. The final value of \(F\) for the functional boxplot is the average of \(F_1, F_2, \dots, F_{N_{trials}}\). At each repetition, the optimization problem is solved using stats::uniroot (Brent's method).
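The adjustment loop can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not the package code: the Gaussian population is simulated from a known centerline and covariance passed as arguments (the package instead estimates them robustly from Data), depths are modified band depths, outliers are flagged with the plain functional boxplot rule, and the helper names (mbd, prop_flagged, adjust_F_sketch) and the trial_size default of 200 are hypothetical. Since the proportion of flagged curves is a non-increasing step function of \(F\), uniroot locates the value of \(F\) at which it crosses TPR; clamping to F_min or F_max when there is no sign change is a choice of this sketch.

```r
# Hypothetical helper: modified band depth from pointwise ranks
mbd <- function(values) {
  N <- nrow(values)
  rk <- apply(values, 2, rank)
  (rowSums((N - rk) * (rk - 1)) / ncol(values) + N - 1) / choose(N, 2)
}

# Proportion of curves crossing the functional boxplot fences for a given F
prop_flagged <- function(values, depths, Fvalue) {
  central <- order(depths, decreasing = TRUE)[seq_len(ceiling(nrow(values) / 2))]
  lower <- apply(values[central, , drop = FALSE], 2, min)
  upper <- apply(values[central, , drop = FALSE], 2, max)
  width <- upper - lower
  mean(apply(values, 1, function(x)
    any(x < lower - Fvalue * width | x > upper + Fvalue * width)))
}

adjust_F_sketch <- function(centerline, Cov, N_trials = 20, trial_size = 200,
                            TPR = 2 * pnorm(4 * qnorm(0.25)),
                            F_min = 0.5, F_max = 5, tol = 1e-3, maxiter = 100) {
  P <- length(centerline)
  R <- chol(Cov)  # upper triangular, t(R) %*% R equals Cov
  F_values <- numeric(N_trials)
  for (i in seq_len(N_trials)) {
    # Simulate trial_size Gaussian curves without outliers
    sim <- matrix(rnorm(trial_size * P), trial_size, P) %*% R
    sim <- sweep(sim, 2, centerline, "+")
    d <- mbd(sim)
    # Target: proportion flagged minus TPR, non-increasing in F
    g <- function(f) prop_flagged(sim, d, f) - TPR
    F_values[i] <- if (g(F_min) <= 0) {
      F_min                       # no sign change: clamp to the left boundary
    } else if (g(F_max) >= 0) {
      F_max                       # no sign change: clamp to the right boundary
    } else {
      uniroot(g, lower = F_min, upper = F_max, tol = tol, maxiter = maxiter)$root
    }
  }
  mean(F_values)                  # averaged adjusted value of F
}

set.seed(1)
grid <- seq(0, 1, length.out = 50)
# Exponential covariance 0.3 * exp(-0.4 |s - t|), chosen for illustration
Cov <- 0.3 * exp(-0.4 * abs(outer(grid, grid, "-")))
F_hat <- adjust_F_sketch(sin(2 * pi * grid), Cov, N_trials = 3)
F_hat
```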

References

  1. Sun, Y., & Genton, M. G. (2011). Functional boxplots. Journal of Computational and Graphical Statistics, 20(2), 316-334.

  2. Sun, Y., & Genton, M. G. (2012). Adjusted functional boxplots for spatio-temporal data visualization and outlier detection. Environmetrics, 23(1), 54-64.


Examples


# UNIVARIATE FUNCTIONAL BOXPLOT - NO ADJUSTMENT

set.seed(1)

N = 2 * 100 + 1
P = 2e2

grid = seq( 0, 1, length.out = P )

# Each curve is 10 * sin(2 * pi * t) shifted by its own exponentially
# distributed offset (element i of the offset vector is added to row i)
D = 10 * matrix( sin( 2 * pi * grid ), nrow = N, ncol = P, byrow = TRUE )

D = D + rexp(N, rate = 0.05)


fD = fData( grid, D )

dev.new()
# Three panels side by side; par() returns the previous settings
oldpar <- par(mfrow = c(1, 3))

plot( fD, lwd = 2, main = 'Functional dataset',
      xlab = 'time', ylab = 'values' )

fbplot( fD, main = 'Functional boxplot', xlab = 'time', ylab = 'values', Fvalue = 1.5 )
#> $Depth
#>   [1] 0.506666667 0.446517413 0.189054726 0.181044776 0.435621891 0.067860697
#>   [7] 0.435621891 0.471194030 0.496268657 0.196965174 0.372935323 0.507263682
#>  [13] 0.423830846 0.019850746 0.468457711 0.473830846 0.256666667 0.496268657
#>  [19] 0.372935323 0.489502488 0.130895522 0.494726368 0.334278607 0.483383085
#>  [25] 0.139502488 0.086368159 0.487562189 0.029651741 0.449950249 0.487562189
#>  [31] 0.362388060 0.039353234 0.362388060 0.388009950 0.227611940 0.483383085
#>  [37] 0.351442786 0.503432836 0.506218905 0.277412935 0.465621891 0.481144279
#>  [43] 0.406716418 0.415472637 0.476368159 0.345820896 0.402189055 0.489502488
#>  [49] 0.459651741 0.227611940 0.423830846 0.181044776 0.048955224 0.478805970
#>  [55] 0.491343284 0.494726368 0.242338308 0.356965174 0.462686567 0.507263682
#>  [61] 0.113383085 0.459651741 0.297263682 0.328358209 0.009950249 0.427860697
#>  [67] 0.095472637 0.456517413 0.504278607 0.501442786 0.284129353 0.148009950
#>  [73] 0.058457711 0.328358209 0.415472637 0.048955224 0.388009950 0.334278607
#>  [79] 0.503432836 0.086368159 0.411144279 0.485522388 0.502487562 0.077164179
#>  [85] 0.164726368 0.506218905 0.316218905 0.427860697 0.383084577 0.212487562
#>  [91] 0.476368159 0.446517413 0.471194030 0.310000000 0.497711443 0.322338308
#>  [97] 0.442985075 0.249552239 0.172935323 0.383084577 0.249552239 0.462686567
#> [103] 0.449950249 0.406716418 0.419701493 0.104477612 0.220099502 0.156417910
#> [109] 0.481144279 0.507412935 0.322338308 0.270597015 0.029651741 0.164726368
#> [115] 0.367711443 0.501442786 0.431791045 0.095472637 0.256666667 0.340099502
#> [121] 0.284129353 0.505024876 0.297263682 0.491343284 0.009950249 0.212487562
#> [127] 0.039353234 0.378059701 0.122189055 0.439353234 0.316218905 0.411144279
#> [133] 0.402189055 0.397562189 0.148009950 0.500298507 0.204776119 0.130895522
#> [139] 0.220099502 0.507014925 0.468457711 0.478805970 0.172935323 0.442985075
#> [145] 0.263681592 0.505671642 0.485522388 0.419701493 0.303681592 0.204776119
#> [151] 0.456517413 0.290746269 0.290746269 0.077164179 0.242338308 0.502487562
#> [157] 0.263681592 0.500298507 0.058457711 0.019850746 0.345820896 0.493084577
#> [163] 0.310000000 0.356965174 0.351442786 0.270597015 0.113383085 0.196965174
#> [169] 0.378059701 0.122189055 0.497711443 0.277412935 0.493084577 0.189054726
#> [175] 0.397562189 0.340099502 0.235024876 0.499054726 0.439353234 0.507014925
#> [181] 0.505671642 0.303681592 0.367711443 0.499054726 0.392835821 0.473830846
#> [187] 0.507462687 0.465621891 0.235024876 0.139502488 0.453283582 0.506666667
#> [193] 0.156417910 0.431791045 0.392835821 0.507412935 0.505024876 0.453283582
#> [199] 0.504278607 0.067860697 0.104477612
#> 
#> $Fvalue
#> [1] 1.5
#> 
#> $ID_outliers
#> [1]   6  14  28  53  65  73 127 154
#> 

boxplot(fD$values[,1], ylim = range(fD$values), main = 'Boxplot of functional dataset at t_0 ' )

par(oldpar)

# UNIVARIATE FUNCTIONAL BOXPLOT - WITH ADJUSTMENT


set.seed( 161803 )

P = 2e2
grid = seq( 0, 1, length.out = P )

N = 1e2

# Generating a univariate synthetic gaussian dataset
Data = generate_gauss_fdata( N, centerline = sin( 2 * pi * grid ),
                             Cov = exp_cov_function( grid,
                                                     alpha = 0.3,
                                                     beta  = 0.4 ) )
fD = fData( grid, Data )

dev.new()
fbplot( fD, adjust = list( N_trials = 10,
                           trial_size = 5 * N,
                           VERBOSE = TRUE ),
        xlab = 'time', ylab = 'Values',
        main = 'My adjusted functional boxplot' )
#>  * * * Iteration  1  /  10 
#>  * * * * beginning optimization
#>  * * * * optimization finished.
#>  * * * Iteration  2  /  10 
#>  * * * * beginning optimization
#>  * * * * optimization finished.
#>  * * * Iteration  3  /  10 
#>  * * * * beginning optimization
#>  * * * * optimization finished.
#>  * * * Iteration  4  /  10 
#>  * * * * beginning optimization
#>  * * * * optimization finished.
#>  * * * Iteration  5  /  10 
#>  * * * * beginning optimization
#>  * * * * optimization finished.
#>  * * * Iteration  6  /  10 
#>  * * * * beginning optimization
#>  * * * * optimization finished.
#>  * * * Iteration  7  /  10 
#>  * * * * beginning optimization
#>  * * * * optimization finished.
#>  * * * Iteration  8  /  10 
#>  * * * * beginning optimization
#>  * * * * optimization finished.
#>  * * * Iteration  9  /  10 
#>  * * * * beginning optimization
#>  * * * * optimization finished.
#>  * * * Iteration  10  /  10 
#>  * * * * beginning optimization
#>  * * * * optimization finished.
#> $Depth
#>   [1] 0.30070101 0.44452323 0.36620606 0.45850303 0.37377980 0.48135354
#>   [7] 0.38181818 0.02659596 0.13369899 0.42945859 0.06214949 0.47510707
#>  [13] 0.30211111 0.23325253 0.17038384 0.12049495 0.42031313 0.49717778
#>  [19] 0.33230101 0.44418990 0.48692525 0.48365657 0.35160404 0.40534141
#>  [25] 0.39773131 0.42371515 0.22158586 0.36634141 0.14679192 0.39269293
#>  [31] 0.46537778 0.23329495 0.34899596 0.36024646 0.16920606 0.44801616
#>  [37] 0.11143434 0.46201818 0.28396162 0.46948687 0.08282020 0.13537576
#>  [43] 0.36103636 0.38312323 0.24703030 0.48692121 0.46313131 0.44435556
#>  [49] 0.22941616 0.49802424 0.12830505 0.31013333 0.23446263 0.20488081
#>  [55] 0.48862424 0.28376768 0.37487475 0.44553131 0.38107071 0.42778182
#>  [61] 0.46742828 0.43395758 0.17490101 0.46710303 0.43027879 0.38315152
#>  [67] 0.49235758 0.48834949 0.05709293 0.44494949 0.40256566 0.19737778
#>  [73] 0.46515152 0.30163030 0.46011111 0.43029495 0.46189091 0.33044242
#>  [79] 0.40226263 0.35060202 0.48137374 0.04209091 0.25347273 0.27454747
#>  [85] 0.25617778 0.28562424 0.39055152 0.34064242 0.38934141 0.32964040
#>  [91] 0.48462828 0.37410303 0.46637778 0.23939394 0.46542626 0.28972727
#>  [97] 0.29029697 0.46380606 0.41843434 0.39590101
#> 
#> $Fvalue
#> [1] 0.8693445
#> 
#> $ID_outliers
#> [1]  8 11 41 42 51 69 82
#> 

# MULTIVARIATE FUNCTIONAL BOXPLOT - NO ADJUSTMENT

set.seed( 1618033 )

P = 1e2
N = 1e2
L = 2

grid = seq( 0, 1, length.out = P )

C1 = exp_cov_function( grid, alpha = 0.3, beta = 0.4 )
C2 = exp_cov_function( grid, alpha = 0.3, beta = 0.4 )

# Generating a bivariate functional dataset of gaussian data with partially
# correlated components
Data = generate_gauss_mfdata( N, L,
                              centerline = matrix( sin( 2 * pi * grid ),
                                                   nrow = 2, ncol = P,
                                                   byrow = TRUE ),
                              correlations = rep( 0.5, 1 ),
                              listCov = list( C1, C2 ) )

mfD = mfData( grid, Data )

dev.new()
fbplot( mfD, Fvalue = 2.5, xlab = 'time', ylab = list( 'Values 1',
                                                       'Values 2' ),
        main = list( 'First component', 'Second component' ) )
#> $Depth
#>   [1] 0.43254949 0.39116364 0.35335556 0.39053131 0.08351313 0.16953131
#>   [7] 0.43867677 0.20252121 0.21967273 0.45222828 0.41967879 0.25533333
#>  [13] 0.42095758 0.35050101 0.38179192 0.35996364 0.29859798 0.20294747
#>  [19] 0.40174545 0.46333535 0.38374949 0.39712121 0.34678384 0.31019192
#>  [25] 0.48306263 0.31276768 0.35823030 0.28142222 0.31049495 0.44906869
#>  [31] 0.42941010 0.38867677 0.38472323 0.44426667 0.34880000 0.38969091
#>  [37] 0.45598182 0.33775758 0.45459192 0.48445051 0.31913737 0.37989091
#>  [43] 0.15915354 0.34753131 0.37781414 0.27654545 0.48208081 0.15098990
#>  [49] 0.36521818 0.26237778 0.44220000 0.44574949 0.24915556 0.37333131
#>  [55] 0.35525455 0.28274343 0.38808283 0.11473333 0.36927071 0.16206869
#>  [61] 0.33650909 0.39217778 0.39524040 0.44069091 0.04190707 0.47903636
#>  [67] 0.41923030 0.18813333 0.39609697 0.43845657 0.35709697 0.49089697
#>  [73] 0.39865657 0.46600606 0.39957576 0.35944444 0.19638990 0.30273333
#>  [79] 0.41812323 0.27323434 0.27932929 0.29413333 0.40971717 0.20922020
#>  [85] 0.29652727 0.49594141 0.38739798 0.26582020 0.11922424 0.31874747
#>  [91] 0.49965859 0.33760808 0.32843232 0.38696970 0.42256364 0.18884444
#>  [97] 0.46271919 0.44007475 0.36836970 0.22256364
#> 
#> $Fvalue
#> [1] 2.5
#> 
#> $ID_outliers
#> integer(0)
#>