Global Two-Sample Test for Network-Valued Data

This function carries out an hypothesis test where the null hypothesis is that the two populations of networks share the same underlying probabilistic distribution against the alternative hypothesis that the two populations come from different distributions. The test is performed in a non-parametric fashion using a permutational framework in which several statistics can be used, together with several choices of network matrix representations and distances between networks.

Usage

test2_global(
  x,
  y,
  representation = c("adjacency", "laplacian", "modularity", "transitivity"),
  distance = c("frobenius", "hamming", "spectral", "root-euclidean"),
  stats = c("flipr:t_ip", "flipr:f_ip"),
  B = 1000L,
  test = "exact",
  k = 5L,
  seed = NULL,
  ...
)

Arguments

x: Either an object of class nvd listing networks in sample 1 or a distance matrix of size \(n_1 + n_2\).
y: Either an object of class nvd listing networks in sample 2 or an integer value specifying the size of sample 1 or an integer vector specifying the indices of the observations belonging to sample 1.
representation: A string specifying the desired type of representation, among: "adjacency", "laplacian" and "modularity". Defaults to "adjacency".
distance: A string specifying the chosen distance for calculating the test statistic, among: "hamming", "frobenius", "spectral" and "root-euclidean". Defaults to "frobenius".
stats: A character vector specifying the chosen test statistic(s), among: "original_edge_count", "generalized_edge_count", "weighted_edge_count", "student_euclidean", "welch_euclidean" or any statistics based on inter-point distances available in the flipr package: "flipr:student_ip", "flipr:fisher_ip", "flipr:bg_ip", "flipr:energy_ip", "flipr:cq_ip". Defaults to c("flipr:student_ip", "flipr:fisher_ip").
B: The number of permutation or the tolerance. If this number is lower than 1, it is intended as a tolerance. Otherwise, it is intended as the number of required permutations. Defaults to 1000L.
test: A character string specifying the formula to be used to compute the permutation p-value. Choices are "estimate", "upper_bound" and "exact". Defaults to "exact" which provides exact tests.
k: An integer specifying the density of the minimum spanning tree used for the edge count statistics. Defaults to 5L.
seed: An integer for specifying the seed of the random generator for result reproducibility. Defaults to NULL.
...: Extra arguments to be passed to the distance function.

Value

A list with three components: the value of the statistic for the original two samples, the p-value of the resulting permutation test and a numeric vector storing the values of the permuted statistics.

Examples

n <- 5L
gnp_params <- list(n = 24L, p = 1/3)
degree_params <- list(out_degree = rep(2, 24L), method = "configuration")

# Two different models for the two populations
x <- nvd(sample_size = n, model = "gnp", !!!gnp_params)
#> ℹ Calling the `tidygraph::play_gnp()` function with the following arguments:
#> • n: 24
#> • p: 0.333333333333333
#> • directed: TRUE
#> • loops: FALSE
y <- nvd(sample_size = n, model = "degree", !!!degree_params)
#> ℹ Calling the `tidygraph::play_degree()` function with the following arguments:
#> • out_degree: 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, …, 2, and 2
#> • method: configuration
#> • in_degree: NULL
t1 <- test2_global(x, y, representation = "modularity")
#> ! Setting the seed for sampling permutations is mandatory for obtaining a continuous p-value function. Using `seed = 1234`.
t1$pvalue
#> [1] 0.002308242

# Same model for the two populations
x <- nvd(sample_size = n, model = "gnp", !!!gnp_params)
#> ℹ Calling the `tidygraph::play_gnp()` function with the following arguments:
#> • n: 24
#> • p: 0.333333333333333
#> • directed: TRUE
#> • loops: FALSE
y <- nvd(sample_size = n, model = "gnp", !!!gnp_params)
#> ℹ Calling the `tidygraph::play_gnp()` function with the following arguments:
#> • n: 24
#> • p: 0.333333333333333
#> • directed: TRUE
#> • loops: FALSE
t2 <- test2_global(x, y, representation = "modularity")
#> ! Setting the seed for sampling permutations is mandatory for obtaining a continuous p-value function. Using `seed = 1234`.
t2$pvalue
#> [1] 0.8234127