The package nevada (NEtwork-VAlued Data Analysis) is an R package for the statistical analysis of network-valued data. In this setting, a sample is made of statistical units that are themselves networks. The package provides a set of matrix representations for networks so that network-valued data can be transformed into matrix-valued data. It also provides a number of distances between matrices to quantify how far apart two networks are, and several test statistics for testing equality in distribution between samples of networks using exact permutation testing procedures. The permutation scheme is carried out by the flipr package, which also provides a number of test statistics based on inter-point distances that work well with network-valued data. The implementation is largely written in C++, and the matrix of inter- and intra-sample distances is pre-computed, which alleviates the computational burden often associated with permutation tests.
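As a rough illustration of the representation-plus-distance idea, the base-R sketch below builds adjacency matrices from edge lists, compares two networks with the Frobenius distance, and pre-computes the matrix of pairwise distances that a permutation test can then reuse. The helpers `adjacency_from_edges()`, `frobenius_distance()` and `distance_matrix()` are hypothetical and are not part of nevada, whose implementation is in C++.

```r
# Hypothetical helpers (not nevada's API): adjacency representation and
# Frobenius distance ||A - B||_F = sqrt(sum((A - B)^2)).

# Symmetric 0/1 adjacency matrix of an undirected graph from a 2-column edge list.
adjacency_from_edges <- function(edges, num_vertices) {
  A <- matrix(0, nrow = num_vertices, ncol = num_vertices)
  A[edges] <- 1                       # set entry (i, j) for every edge
  A[edges[, 2:1, drop = FALSE]] <- 1  # and its mirror (j, i): undirected graph
  A
}

frobenius_distance <- function(A, B) {
  sqrt(sum((A - B)^2))
}

# Pre-compute all pairwise distances once, so that a permutation test only
# needs to index into this matrix instead of recomputing distances.
distance_matrix <- function(mats) {
  n <- length(mats)
  D <- matrix(0, n, n)
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      D[i, j] <- D[j, i] <- frobenius_distance(mats[[i]], mats[[j]])
    }
  }
  D
}

triangle <- adjacency_from_edges(rbind(c(1, 2), c(2, 3), c(1, 3)), 3)
path     <- adjacency_from_edges(rbind(c(1, 2), c(2, 3)), 3)
frobenius_distance(triangle, path)  # sqrt(2): only edge {1, 3} differs
D <- distance_matrix(list(triangle, path, path))
```

The two networks differ by a single undirected edge, which changes two entries of the adjacency matrix, hence a distance of sqrt(2).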
Installation
You can install the latest stable version of nevada from CRAN with:

```r
install.packages("nevada")
```

Or you can install the development version from GitHub with:

```r
# install.packages("remotes")
remotes::install_github("astamm/nevada")
```
Usage
Example 1
In this first example, we compare two populations of networks generated according to two different models (Watts-Strogatz and Barabási-Albert). We use the adjacency matrix representation of networks, the Frobenius distance to compare individual networks, and a combination of Student-like and Fisher-like statistics based on inter-point distances to summarize the information and perform the permutation test.
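```r
set.seed(123)
n <- 10L
# Sample 1: Watts-Strogatz (small-world) networks
x <- nevada::nvd(
  model = "smallworld",
  n = n,
  model_params = list(dim = 1L, nei = 4L, p = 0.15)
)
# Sample 2: Barabási-Albert (preferential attachment) networks
y <- nevada::nvd(
  model = "pa",
  n = n,
  model_params = list(power = 1L, m = NULL, directed = FALSE)
)
```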
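To make the two generative models more concrete, here is a minimal base-R sketch of textbook versions of both: a Watts-Strogatz ring lattice with random rewiring, and linear preferential attachment adding one edge per new node. The helpers `watts_strogatz()` and `barabasi_albert()` are hypothetical. We assume, without checking the package source, that `model = "smallworld"` and `model = "pa"` correspond to these models with the parameters above. This is not nevada's or igraph's implementation, so details such as the rewiring rule and the seed graph may differ.

```r
# Hypothetical helpers (not nevada's or igraph's code): textbook generators
# returning 0/1 adjacency matrices of undirected simple graphs.

# Watts-Strogatz on a ring (dim = 1): link each node to its `nei` nearest
# neighbours on each side, then rewire each lattice edge with probability `p`.
# Assumes nei < num_vertices / 2 so that lattice edges are distinct.
watts_strogatz <- function(num_vertices, nei, p) {
  A <- matrix(0L, num_vertices, num_vertices)
  neighbour <- function(i, k) (i + k - 1) %% num_vertices + 1  # k steps clockwise
  for (i in seq_len(num_vertices)) {
    for (k in seq_len(nei)) {
      j <- neighbour(i, k)
      A[i, j] <- A[j, i] <- 1L
    }
  }
  for (i in seq_len(num_vertices)) {
    for (k in seq_len(nei)) {
      j <- neighbour(i, k)
      if (runif(1) < p) {
        # New endpoint: any node other than i that is not already linked to i
        candidates <- setdiff(which(A[i, ] == 0L), i)
        if (length(candidates) > 0) {
          l <- candidates[sample.int(length(candidates), 1)]
          A[i, j] <- A[j, i] <- 0L  # drop the lattice edge ...
          A[i, l] <- A[l, i] <- 1L  # ... and reconnect i elsewhere
        }
      }
    }
  }
  A
}

# Linear preferential attachment with one edge per new node: node v attaches
# to an earlier node with probability proportional to that node's degree.
barabasi_albert <- function(num_vertices) {
  stopifnot(num_vertices >= 3)
  A <- matrix(0L, num_vertices, num_vertices)
  A[1, 2] <- A[2, 1] <- 1L  # seed graph: a single edge
  for (v in 3:num_vertices) {
    degrees <- rowSums(A[seq_len(v - 1), , drop = FALSE])
    target <- sample.int(v - 1, 1, prob = degrees)
    A[v, target] <- A[target, v] <- 1L
  }
  A
}

set.seed(1)
ws <- watts_strogatz(25, nei = 4, p = 0.15)
ba <- barabasi_albert(25)
c(sum(ws), sum(ba)) / 2  # number of edges: 100 and 24
```

Rewiring moves edges without creating or deleting any, so the small-world network keeps its 25 × 4 = 100 edges. With one edge per new node, the preferential attachment network is a tree with 24 edges, which is one reason the two samples look so different.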
By default, the `nvd()` constructor generates networks with 25 nodes. One may wonder whether there is a difference between the distributions that generated these two samples (there is, given the models we used). The `test2_global()` function answers this question:
```r
t1_global <- nevada::test2_global(x, y, seed = 1234)
t1_global$pvalue
#> [1] 0.0009962984
```
The p-value is very small, leading to the conclusion that we should reject the null hypothesis of equal distributions.
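The logic behind such a permutation p-value can be sketched in base R on a pre-computed distance matrix. The helpers below are hypothetical. They use a simple energy-like statistic (mean between-sample distance minus mean within-sample distance) in place of flipr's Student-like and Fisher-like statistics, they draw random relabellings, and they apply the common (1 + count) / (B + 1) convention, which is not necessarily the one flipr uses.

```r
# Hypothetical helpers (not flipr's API). D is a pre-computed distance matrix
# whose first n1 rows/columns belong to sample 1 and the rest to sample 2.

# Energy-like statistic: large when units are further from the other sample
# than from their own sample.
between_minus_within <- function(D, idx1, idx2) {
  mean_within <- function(idx) {
    sub <- D[idx, idx, drop = FALSE]
    mean(sub[upper.tri(sub)])  # each unordered pair once, diagonal excluded
  }
  mean(D[idx1, idx2]) - (mean_within(idx1) + mean_within(idx2)) / 2
}

permutation_pvalue <- function(D, n1, B = 1000L) {
  n <- nrow(D)
  stopifnot(n1 >= 2, n - n1 >= 2)  # within-sample means need two units per sample
  stat <- function(labels) {
    between_minus_within(D, labels[seq_len(n1)], labels[-seq_len(n1)])
  }
  observed <- stat(seq_len(n))                   # original sample labels
  permuted <- replicate(B, stat(sample.int(n)))  # random relabellings
  (1 + sum(permuted >= observed)) / (B + 1)
}

set.seed(42)
# Toy data: two well-separated samples of 10 real numbers each
D <- as.matrix(dist(c(rnorm(10), rnorm(10, mean = 5))))
permutation_pvalue(D, n1 = 10)
```

Because the distances are computed once up front, each permutation only re-indexes `D`, which is the same reason nevada pre-computes its inter- and intra-sample distance matrix.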
Although this is a toy example, we can partition the 25 nodes into 5 blocks to try to localize the differences:
```r
# Assign the 25 nodes to 5 blocks of 5 consecutive nodes
partition <- rep(1:5, each = 5)
```
The `test2_local()` function then tests for differences within each block (`$intra`) and between each pair of blocks (`$inter`):
```r
t1_local <- nevada::test2_local(x, y, partition, seed = 1234)
t1_local
#> $intra
#> # A tibble: 5 × 3
#>   E      pvalue truncated
#>   <chr>   <dbl> <lgl>
#> 1 P1    0.175   TRUE
#> 2 P2    0.175   TRUE
#> 3 P3    0.175   TRUE
#> 4 P4    0.0859  TRUE
#> 5 P5    0.00299 FALSE
#>
#> $inter
#> # A tibble: 10 × 4
#>    E1    E2      pvalue truncated
#>    <chr> <chr>    <dbl> <lgl>
#>  1 P1    P2    0.175    TRUE
#>  2 P1    P3    0.175    TRUE
#>  3 P1    P4    0.0859   TRUE
#>  4 P1    P5    0.0240   FALSE
#>  5 P2    P3    0.175    TRUE
#>  6 P2    P4    0.00499  FALSE
#>  7 P2    P5    0.000996 FALSE
#>  8 P3    P4    0.0140   FALSE
#>  9 P3    P5    0.00200  FALSE
#> 10 P4    P5    0.0420   FALSE
```
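With 5 blocks there are 5 intra-block comparisons and choose(5, 2) = 10 inter-block comparisons, which matches the 5 rows of `$intra` and the 10 rows of `$inter`. The base-R sketch below, built around a hypothetical helper `split_by_partition()`, shows one way to extract the corresponding sub-networks from an adjacency matrix. It is not how `test2_local()` is implemented and does not reproduce its p-values.

```r
# Hypothetical helper (not nevada's API): split an adjacency matrix along a
# node partition into intra-block sub-matrices (edges inside one block) and
# inter-block sub-matrices (edges between two blocks).
split_by_partition <- function(A, partition) {
  blocks <- split(seq_along(partition), partition)  # node indices per block
  names(blocks) <- paste0("P", names(blocks))
  intra <- lapply(blocks, function(idx) A[idx, idx, drop = FALSE])
  pairs <- combn(names(blocks), 2, simplify = FALSE)  # all unordered block pairs
  inter <- lapply(pairs, function(pr) {
    A[blocks[[pr[1]]], blocks[[pr[2]]], drop = FALSE]
  })
  names(inter) <- vapply(pairs, paste, character(1), collapse = "-")
  list(intra = intra, inter = inter)
}

partition <- rep(1:5, each = 5)
A <- matrix(0L, 25, 25)
A[1, 6] <- A[6, 1] <- 1L  # one edge between node 1 (block 1) and node 6 (block 2)
parts <- split_by_partition(A, partition)
length(parts$intra)  # 5 intra-block sub-matrices
length(parts$inter)  # 10 inter-block sub-matrices
```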
Example 2
In this second example, we compare two populations of networks generated according to the same model (Watts-Strogatz), using the same network representation, distance and test statistics as in Example 1.
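This example needs two samples drawn from the same model. Here is a minimal sketch that reuses the `nevada::nvd()` call from Example 1. It assumes both samples use the Example 1 small-world parameters, and the seed is arbitrary. Because these settings are assumptions, running it will not necessarily reproduce the p-value reported for this example.

```r
# Assumption: both samples come from the same Watts-Strogatz model with the
# parameters used in Example 1; the seed is arbitrary.
set.seed(123)
n <- 10L
x <- nevada::nvd(
  model = "smallworld",
  n = n,
  model_params = list(dim = 1L, nei = 4L, p = 0.15)
)
y <- nevada::nvd(
  model = "smallworld",
  n = n,
  model_params = list(dim = 1L, nei = 4L, p = 0.15)
)
```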
One may wonder whether there is a difference between the distributions that generated these two samples (there should be none, since both samples come from the same model). The `test2_global()` function answers this question:
```r
t2 <- nevada::test2_global(x, y, seed = 1234)
t2$pvalue
#> [1] 0.9999973
```
The p-value is well above the usual 5% and 10% significance levels, so we fail to reject the null hypothesis of equal distributions at these thresholds.