Title: | Informative Procrustean Matrix Correlation |
---|---|
Description: | Estimates corrected Procrustean correlation between matrices for removing overfitting effect. Coissac Eric and Gonindard-Melodelima Christelle (2019) <doi:10.1101/842070>. |
Authors: | Eric Coissac, Christelle Gonindard-Melodelima |
Maintainer: | Eric Coissac <[email protected]> |
License: | CeCILL-2 |
Version: | 1.0.8 |
Built: | 2024-11-19 05:11:47 UTC |
Source: | https://github.com/cran/ProcMod |
The permutation schema is defined using the 'how' function. The implementation of this function is inspired from the VEGAN package and reproduced here to avoid an extra dependency on an hidden vegan function.
.getPermuteMatrix(permutations, n, strata = NULL)
.getPermuteMatrix(permutations, n, strata = NULL)
permutations |
a list of control values for the permutations as returned
by the function |
n |
numeric; the number of observations in the sample set.
May also be any object that nobs knows about;
see |
strata |
A factor, or an object that can be coerced to a factor via as.factor, specifying the strata for permutation. |
Internal function do not use.
Transforme the x
value into a numeric matrix
of
the correct size or into a dist
object.
.procmod_coerce_value(x, nrows = 0, contrasts = NULL)
.procmod_coerce_value(x, nrows = 0, contrasts = NULL)
x |
The data to coerce |
nrows |
an interger value specifying the number of row of the returned matrix |
contrasts |
see the |
a new numeric matrix with correct size.
Internal function do not use.
Eric Coissac <[email protected]>
Christelle Gonindard-Melodelima <[email protected]>
repeats several times the rows of a matrix to create a new matrix with more rows. The final row count must be a multiple of the initial row count
.rep_matrix(x, nrow)
.rep_matrix(x, nrow)
x |
The matrix to replicate |
nrow |
an interger value specifying the number of row of the returned matrix |
a new matrix with the same number of columns but with 'nrow' rows.
Internal function do not use.
Eric Coissac <[email protected]>
Christelle Gonindard-Melodelima <[email protected]>
The trace of a square matrix is defined as the sum of its diagonal elements.
.Trace(X)
.Trace(X)
X |
a square matrix |
the trace of X
Internal function do not use.
Eric Coissac
Christelle Gonindard-Melodelima
m <- matrix(1:16, nrow = 4) ProcMod:::.Trace(m)
m <- matrix(1:16, nrow = 4) ProcMod:::.Trace(m)
Conversion methods are proposed for list
,
matrix
and array
.
as_procmod_frame(data, ...) ## S3 method for class 'list' as_procmod_frame(data, ...) ## S3 method for class 'procmod_frame' as_procmod_frame(data, ...) ## S3 method for class 'array' as_procmod_frame(data, ...) ## S3 method for class 'matrix' as_procmod_frame(data, ...)
as_procmod_frame(data, ...) ## S3 method for class 'list' as_procmod_frame(data, ...) ## S3 method for class 'procmod_frame' as_procmod_frame(data, ...) ## S3 method for class 'array' as_procmod_frame(data, ...) ## S3 method for class 'matrix' as_procmod_frame(data, ...)
data |
a R object to coerce. |
... |
supplementary parameters used in some implementation of that method |
a procmod_frame
object
Eric Coissac
Christelle Gonindard-Melodelima
# Builds a list containing two random matrices m1 <- simulate_matrix(10,20) m2 <- simulate_matrix(10,30) l <- list(m1 = m1, m2 = m2) # Converts the list to a procmod_frame pmf1 <- as_procmod_frame(l) # Builds a procmod_frame from a matrix m3 <- matrix(1:12,nrow=3) pmf2 <- as_procmod_frame(matrix(1:12,nrow=3)) # Returns 4, the column count of the input matrix length(pmf2) # Builds a 3D array a <- array(1:24,dim = c(3,4,2)) # The conversion to a procmod_frame makes # an procmod element from each third dimension as_procmod_frame(a)
# Builds a list containing two random matrices m1 <- simulate_matrix(10,20) m2 <- simulate_matrix(10,30) l <- list(m1 = m1, m2 = m2) # Converts the list to a procmod_frame pmf1 <- as_procmod_frame(l) # Builds a procmod_frame from a matrix m3 <- matrix(1:12,nrow=3) pmf2 <- as_procmod_frame(matrix(1:12,nrow=3)) # Returns 4, the column count of the input matrix length(pmf2) # Builds a 3D array a <- array(1:24,dim = c(3,4,2)) # The conversion to a procmod_frame makes # an procmod element from each third dimension as_procmod_frame(a)
dist
object to a data.frame
object.The created data.frame
has a attribute is.dist
set to
the logical value TRUE
.
## S3 method for class 'dist' as.data.frame(x, row.names = NULL, optional = FALSE, ...)
## S3 method for class 'dist' as.data.frame(x, row.names = NULL, optional = FALSE, ...)
x |
the |
row.names |
NULL or a |
optional |
logical. If |
... |
additional arguments to be passed to or from methods. |
Eric Coissac
Christelle Gonindard-Melodelima
data(bacteria) bacteria_rel_freq <- sweep(bacteria, 1, rowSums(bacteria), "/") bacteria_hellinger <- sqrt(bacteria_rel_freq) bacteria_dist <- dist(bacteria_hellinger) bdf <- as.data.frame(bacteria_dist)
data(bacteria) bacteria_rel_freq <- sweep(bacteria, 1, rowSums(bacteria), "/") bacteria_hellinger <- sqrt(bacteria_rel_freq) bacteria_dist <- dist(bacteria_hellinger) bdf <- as.data.frame(bacteria_dist)
colSums and rowSums of the returned matrix are all equal to zero.
bicenter(m)
bicenter(m)
m |
a |
Inspired from the algorithm described in stackoverflow https://stackoverflow.com/questions/43639063/double-centering-in-r
a numeric
matrix centred by rows
and columns
Eric Coissac
Christelle Gonindard-Melodelima
data(bacteria) bact_bc <- bicenter(bacteria) sum(rowSums(bact_bc)) sum(colSums(bact_bc))
data(bacteria) bact_bc <- bicenter(bacteria) sum(rowSums(bact_bc)) sum(colSums(bact_bc))
performs a Monte-Carlo Test on the sum of the singular values of a
procustean rotation (see procuste.rtest
).
corls_test( ..., permutations = permute::how(nperm = 999), p_adjust_method = "holm" )
corls_test( ..., permutations = permute::how(nperm = 999), p_adjust_method = "holm" )
... |
the set of matrices or a |
permutations |
a list of control values for the permutations as returned
by the function |
p_adjust_method |
the multiple test correction method used
to adjust p values. |
Eric Coissac
Christelle Gonindard-Melodelima
Jackson DA (1995). “PROTEST: A PROcrustean Randomization TEST of community environment concordance.” Écoscience, 2(3), 297–303.
A <- simulate_matrix(10,3) B <- simulate_matrix(10,5) C <- simulate_correlation(B,10,r2=0.6) # Computes the correlation matrix data <- procmod_frame(A = A, B = B, C = C) corls_test(data, permutations = 100)
A <- simulate_matrix(10,3) B <- simulate_matrix(10,5) C <- simulate_correlation(B,10,r2=0.6) # Computes the correlation matrix data <- procmod_frame(A = A, B = B, C = C) corls_test(data, permutations = 100)
Dimension 1 is the number of rows (individus) shared by the aggregated matrices. Dimension 2 is the number of aggregated matrices
## S3 method for class 'procmod_frame' dim(x)
## S3 method for class 'procmod_frame' dim(x)
x |
a |
Eric Coissac
Christelle Gonindard-Melodelima
# Builds a procmod_frame with two random matrices m1 <- simulate_matrix(10,20) m2 <- simulate_matrix(10,30) pmf <- procmod_frame(m1 = m1, m2 = m2) dim(pmf)
# Builds a procmod_frame with two random matrices m1 <- simulate_matrix(10,20) m2 <- simulate_matrix(10,30) pmf <- procmod_frame(m1 = m1, m2 = m2) dim(pmf)
This data set of five data.frame
is a simplified version of a full data set
describing biodiversity changes along a South-North
gradient on the Australian East Coast, from Sidney to
North Cap using a DNA metabarcoding approach.
The gradient is constituted of 21 locations.
data(eukaryotes) data(bacteria) data(climat) data(soil) data(geography)
data(eukaryotes) data(bacteria) data(climat) data(soil) data(geography)
five data.frame of 21 rows
An object of class data.frame
with 21 rows and 2150 columns.
An object of class data.frame
with 21 rows and 6 columns.
An object of class data.frame
with 21 rows and 12 columns.
An object of class data.frame
with 21 rows and 2 columns.
is a 21 x 2150 data.frame
describing bacterial
community at each one of the 21 locations.
Each number is the relative frequency of a molecular operational
taxonomy unit (MOTU) at a site after data cleaning and
averaging of 135 pontual measures.
is a 21 x 1393 data.frame
describing eukariote
community at each one of the 21 locations.
Each number is the relative frequency of a molecular operational
taxonomy unit (MOTU) at a site after data cleaning and
averaging of 135 pontual measures.
is a 21 x 6 data.frame
describing climatic conditions
at each site using worldclim descriptors (https://www.worldclim.org).
Max Temperature of Warmest Month
Mean Diurnal Range / Temperature Annual Range, with
Mean of monthly (max temp - min temp)
Max Temperature of Warmest Month - Min Temperature of Coldest Month
s a 21 x 6 data.frame
describing soil chemistery
at each site.
Each variable is reduced and centered
Logarithm of the potassium concentration
Soil Ph
Logarithm of the aluminium concentration
Logarithm of the iron concentration
Logarithm of the phosphorus concentration
Logarithm of the sulphur concentration
Logarithm of the calcium concentration
Logarithm of the magnesium concentration
Logarithm of the manganese concentration
carbon / nitrogen concentration ratio
Logarithm of the carbon concentration
Logarithm of the nitrogen concentration
Christelle Gonindard-Melodelima
Eric Coissac
Actually a simplified version of the ADE4 implementation
(is.euclid
).
is_euclid(distances, tol = 1e-07)
is_euclid(distances, tol = 1e-07)
distances |
an object of class 'dist' |
tol |
a tolerance threshold : an eigenvalue is considered positive if it is larger than -tol*lambda1 where lambda1 is the largest eigenvalue. |
Eric Coissac
Christelle Gonindard-Melodelima
library(vegan) data(bacteria) bacteria_rel_freq <- sweep(bacteria, 1, rowSums(bacteria), "/") bacteria_bray <- vegdist(bacteria_rel_freq,method = "bray") is_euclid(bacteria_bray) bacteria_chao <- vegdist(floor(bacteria*10000),method = "chao") is_euclid(bacteria_chao)
library(vegan) data(bacteria) bacteria_rel_freq <- sweep(bacteria, 1, rowSums(bacteria), "/") bacteria_bray <- vegdist(bacteria_rel_freq,method = "bray") is_euclid(bacteria_bray) bacteria_chao <- vegdist(floor(bacteria*10000),method = "chao") is_euclid(bacteria_chao)
Check if an object is a ProcMod Frame.
is_procmod_frame(x)
is_procmod_frame(x)
x |
a R |
a logical
value equals to
TRUE
if x
is a procmod_frame
,
FALSE
otherwise.
Eric Coissac
Christelle Gonindard-Melodelima
# Builds a procmod_frame with two random matrices m1 <- simulate_matrix(10,20) m2 <- simulate_matrix(10,30) pmf <- procmod_frame(m1 = m1, m2 = m2) # Returns TRUE is_procmod_frame(pmf) # Returns FALSE is_procmod_frame(3)
# Builds a procmod_frame with two random matrices m1 <- simulate_matrix(10,20) m2 <- simulate_matrix(10,30) pmf <- procmod_frame(m1 = m1, m2 = m2) # Returns TRUE is_procmod_frame(pmf) # Returns FALSE is_procmod_frame(3)
Returns the names of the elements associated to a procmod_corls
object.
## S3 method for class 'procmod_corls' names(x)
## S3 method for class 'procmod_corls' names(x)
x |
a |
Eric Coissac
Christelle Gonindard-Melodelima
# Build Three matrices of 3 rows. A <- simulate_matrix(10,3) B <- simulate_matrix(10,5) C <- simulate_correlation(B,10,r2=0.6) # Computes the correlation matrix data <- procmod_frame(A = A, B = B, C = C) cls <- corls(data, nrand = 100) names(cls)
# Build Three matrices of 3 rows. A <- simulate_matrix(10,3) B <- simulate_matrix(10,5) C <- simulate_correlation(B,10,r2=0.6) # Computes the correlation matrix data <- procmod_frame(A = A, B = B, C = C) cls <- corls(data, nrand = 100) names(cls)
Returns the names of the elements associated to a procmod_varls
object.
## S3 method for class 'procmod_varls' names(x)
## S3 method for class 'procmod_varls' names(x)
x |
a |
Eric Coissac
Christelle Gonindard-Melodelima
# Build Three matrices of 3 rows. A <- simulate_matrix(10,3) B <- simulate_matrix(10,5) C <- simulate_correlation(B,10,r2=0.6) # Computes the variance covariance matrix data <- procmod_frame(A = A, B = B, C = C) v <- varls(data, nrand = 100) names(v)
# Build Three matrices of 3 rows. A <- simulate_matrix(10,3) B <- simulate_matrix(10,5) C <- simulate_correlation(B,10,r2=0.6) # Computes the variance covariance matrix data <- procmod_frame(A = A, B = B, C = C) v <- varls(data, nrand = 100) names(v)
Project a set of points defined by a distance matrix in
an eucleadean space using the Kruskal's Non-metric
Multidimensional Scaling. This function is mainly a simplified
interface on the isoMDS
function using as
much as possible dimensions to limit the stress. The aims of this
NDMS being only to project point in an orthogonal space therefore
without any correlation between axis. Because a non-metric method
is used no condition is required on the used distance.
nmds(distances, maxit = 100, trace = FALSE, tol = 0.001, p = 2)
nmds(distances, maxit = 100, trace = FALSE, tol = 0.001, p = 2)
distances |
a |
maxit |
The maximum number of iterations. |
trace |
Logical for tracing optimization. Default |
tol |
convergence tolerance. |
p |
Power for Minkowski distance in the configuration space. |
a numeric matrix with at most n-1
dimensions, with
n
the number pf observations. This matrix defines the
coordinates of each point in the orthogonal space.
Eric Coissac
Christelle Gonindard-Melodelima
data(bacteria) bacteria_rel_freq <- sweep(bacteria, 1, rowSums(bacteria), "/") bacteria_hellinger <- sqrt(bacteria_rel_freq) bacteria_dist <- dist(bacteria_hellinger) project <- nmds(bacteria_dist)
data(bacteria) bacteria_rel_freq <- sweep(bacteria, 1, rowSums(bacteria), "/") bacteria_hellinger <- sqrt(bacteria_rel_freq) bacteria_dist <- dist(bacteria_hellinger) project <- nmds(bacteria_dist)
Project a set of points defined by a distance matrix
or a set of variables in an eucleadean space.
If the distance matrix is a metric, this is done using
the pcoa
function,
for other distance the nmds
is used.
When points are described by a set of variable the
nmds
is used.
ortho(data, ...) ## S3 method for class 'dist' ortho(data, tol = 1e-07, ...) ## S3 method for class 'matrix' ortho(data, scale = FALSE, ...) ## S3 method for class 'data.frame' ortho(data, scale = FALSE, ...) ## S3 method for class 'procmod_frame' ortho(data, ...)
ortho(data, ...) ## S3 method for class 'dist' ortho(data, tol = 1e-07, ...) ## S3 method for class 'matrix' ortho(data, scale = FALSE, ...) ## S3 method for class 'data.frame' ortho(data, scale = FALSE, ...) ## S3 method for class 'procmod_frame' ortho(data, ...)
data |
a numeric matrix describing the points |
... |
other parameters specific to some implementation |
tol |
a tolerance threshold : an eigenvalue is considered positive if it is larger than -tol*lambda1 where lambda1 is the largest eigenvalue. |
scale |
a |
a numeric matrix with at most n-1
dimensions, with
n
the number pf observations. This matrix defines the
coordinates of each point in the orthogonal space.
Eric Coissac
Christelle Gonindard-Melodelima
library(vegan) data(bacteria) data(eukaryotes) data(soil) dataset <- procmod_frame(euk = vegdist(decostand(eukaryotes, method = "hellinger"), method = "euclidean"), bac = vegdist(decostand(bacteria, method = "hellinger"), method = "euclidean"), soil = scale(soil, center = TRUE, scale = TRUE)) dp <- ortho(dataset) bacteria_rel_freq <- sweep(bacteria, 1, rowSums(bacteria), "/") bacteria_hellinger <- sqrt(bacteria_rel_freq) bacteria_dist <- dist(bacteria_hellinger) project <- ortho(bacteria_dist)
library(vegan) data(bacteria) data(eukaryotes) data(soil) dataset <- procmod_frame(euk = vegdist(decostand(eukaryotes, method = "hellinger"), method = "euclidean"), bac = vegdist(decostand(bacteria, method = "hellinger"), method = "euclidean"), soil = scale(soil, center = TRUE, scale = TRUE)) dp <- ortho(dataset) bacteria_rel_freq <- sweep(bacteria, 1, rowSums(bacteria), "/") bacteria_hellinger <- sqrt(bacteria_rel_freq) bacteria_dist <- dist(bacteria_hellinger) project <- ortho(bacteria_dist)
Project a set of points defined by a set of numeric variables in
an eucleadean space using the pricipal componant analysis.
This function is mainly a simplified
interface on the prcomp
function using as
much as possible dimensions to keep all the variation. The aims of this
PCA being only to project point in an orthogonal space therefore
without any correlation between axis. Data are centered by not scaled by
default.
pca(data, scale = FALSE)
pca(data, scale = FALSE)
data |
a numeric matrix describing the points |
scale |
a |
a numeric matrix with at most n-1
dimensions, with
n
the number pf observations. This matrix defines the
coordinates of each point in the orthogonal space.
Eric Coissac
Christelle Gonindard-Melodelima
data(bacteria) bacteria_rel_freq <- sweep(bacteria, 1, rowSums(bacteria), "/") bacteria_hellinger <- sqrt(bacteria_rel_freq) project <- pca(bacteria_hellinger)
data(bacteria) bacteria_rel_freq <- sweep(bacteria, 1, rowSums(bacteria), "/") bacteria_hellinger <- sqrt(bacteria_rel_freq) project <- pca(bacteria_hellinger)
Project a set of points defined by a distance matrix in
an eucleadean space using the Principal Coordinates Analysis
method. This function is mainly a simplified
interface on the cmdscale
function using as
much as possible dimensions for the projection. The aims of this
PCoA being only to project point in an orthogonal space therefore
without any correlation between axis. Because a metric method
is used the used distance must be euclidean
(cf is_euclid
).
pcoa(distances)
pcoa(distances)
distances |
a |
a numeric matrix with at most n-1
dimensions, with
n
the number pf observations. This matrix defines the
coordinates of each point in the orthogonal space.
Eric Coissac
Christelle Gonindard-Melodelima
data(bacteria) bacteria_rel_freq <- sweep(bacteria, 1, rowSums(bacteria), "/") bacteria_hellinger <- sqrt(bacteria_rel_freq) bacteria_dist <- dist(bacteria_hellinger) project <- pcoa(bacteria_dist)
data(bacteria) bacteria_rel_freq <- sweep(bacteria, 1, rowSums(bacteria), "/") bacteria_hellinger <- sqrt(bacteria_rel_freq) bacteria_dist <- dist(bacteria_hellinger) project <- pcoa(bacteria_dist)
Print a procrustean Correlation Matrix.
## S3 method for class 'procmod_corls' print(x, ...)
## S3 method for class 'procmod_corls' print(x, ...)
x |
a |
... |
other parameters passed to other functions |
Eric Coissac
Christelle Gonindard-Melodelima
# Build Three matrices of 3 rows. A <- simulate_matrix(10,3) B <- simulate_matrix(10,5) C <- simulate_correlation(B,10,r2=0.6) # Computes the correlation matrix data <- procmod_frame(A = A, B = B, C = C) cls <- corls(data, nrand = 100) print(cls)
# Build Three matrices of 3 rows. A <- simulate_matrix(10,3) B <- simulate_matrix(10,5) C <- simulate_correlation(B,10,r2=0.6) # Computes the correlation matrix data <- procmod_frame(A = A, B = B, C = C) cls <- corls(data, nrand = 100) print(cls)
Print procrustean Variance / Covariance Matrix.
## S3 method for class 'procmod_varls' print(x, ...)
## S3 method for class 'procmod_varls' print(x, ...)
x |
a |
... |
other parameters passed to other functions |
Eric Coissac
Christelle Gonindard-Melodelima
# Build Three matrices of 3 rows. A <- simulate_matrix(10,3) B <- simulate_matrix(10,5) C <- simulate_correlation(B,10,r2=0.6) # Computes the variance covariance matrix data <- procmod_frame(A = A, B = B, C = C) v <- varls(data, nrand = 100) print(v)
# Build Three matrices of 3 rows. A <- simulate_matrix(10,3) B <- simulate_matrix(10,5) C <- simulate_correlation(B,10,r2=0.6) # Computes the variance covariance matrix data <- procmod_frame(A = A, B = B, C = C) v <- varls(data, nrand = 100) print(v)
Estimates corrected Procrustean correlation between matrices for removing overfitting effect.
The functions in the ProcMod package aims to estimate and to test correlation between matrices, correcting for the spurious correlations because of the over-fitting effect.
The ProcMod package is developed on the metabarcoding.org gitlab (https://git.metabarcoding.org/lecasofts/ProcMod). The gitlab of metabarcoding.org provides up-to-date information and forums for bug reports.
Christelle Gonindard-Melodelima
Eric Coissac
A procmod_frame
can be considered as the analog of a
data.frame
for vector data. In a procmod_frame
each element, equivalent to a column in a data.frame
is a numeric matrix or a distance matrix object (dist
).
Every element must describe the same number of individuals.
Therefore every numeric matrix must have the same number of row
(nrow
) and every distance matrix must have the same size
(attr(d,"Size")
). A procmod_frame
can simultaneously
contain both types of data, numeric and distance matrix.
procmod_frame( ..., row_names = NULL, check_rows = TRUE, reorder_rows = TRUE, contrasts_arg = NULL )
procmod_frame( ..., row_names = NULL, check_rows = TRUE, reorder_rows = TRUE, contrasts_arg = NULL )
... |
a set of objects to aggregate into a
|
row_names |
a character vector containing names associated to each row. |
check_rows |
a logical value. When set to |
reorder_rows |
a logical value. When set to |
contrasts_arg |
A list, whose entries are values (numeric matrices or character strings naming functions) to be used as replacement values for the contrasts replacement function and whose names are the names of columns of data containing factors. |
a procmod_frame
instance.
Eric Coissac
Christelle Gonindard-Melodelima
library(vegan) data(bacteria) data(eukaryotes) data(soil) dataset <- procmod_frame(euk = vegdist(decostand(eukaryotes, method = "hellinger"), method = "euclidean"), bac = vegdist(decostand(bacteria, method = "hellinger"), method = "euclidean"), soil = scale(soil, center = TRUE, scale = TRUE)) length(dataset) nrow(dataset) ncol(dataset) dataset$euk
library(vegan) data(bacteria) data(eukaryotes) data(soil) dataset <- procmod_frame(euk = vegdist(decostand(eukaryotes, method = "hellinger"), method = "euclidean"), bac = vegdist(decostand(bacteria, method = "hellinger"), method = "euclidean"), soil = scale(soil, center = TRUE, scale = TRUE)) length(dataset) nrow(dataset) ncol(dataset) dataset$euk
src
matrix to fit into the space of the dest
matrix.The optimal rotation is computed according to the procruste methode. Rotation is based on singular value decomposition (SVD). No scaling and no centrering are done, before computing the SVD.
protate(src, dest)
protate(src, dest)
src |
a numeric matrix to be rotated |
dest |
a numeric matrix used as reference space |
a numeric matrix
Christelle Gonindard-Melodelima
Eric Coissac
# Generates two random matrices of size 10 x 15 m1 <- simulate_matrix(10, 15) m2 <- simulate_matrix(10, 20) # Rotates matrix m1 on m2 mr <- protate(m1, m2)
# Generates two random matrices of size 10 x 15 m1 <- simulate_matrix(10, 15) m2 <- simulate_matrix(10, 20) # Rotates matrix m1 on m2 mr <- protate(m1, m2)
Simulates a set of point correlated to another set according to the
procrustean correlation definition.
Points are simulated by drawing values of each dimension from a normal
distribution of mean 0 and standard deviation equals to 1.
The mean of each dimension is forced to 0 (data are centred).
By default variable are also scaled to enforce a strandard deviation
strictly equal to 1. Covariances between dimensions are not controled.
Therefore they are expected to be equal to 0 and reflect only the
random distribution of the covariance between two random vectors.
The intensity of the correlation is determined by the r2
parameter.
simulate_correlation(reference, p, r2, equal_var = TRUE)
simulate_correlation(reference, p, r2, equal_var = TRUE)
reference |
a numeric matrix to which the simulated data will be correlated |
p |
an |
r2 |
the fraction of variation shared between the |
equal_var |
a |
a numeric matrix of nrow(reference)
rows and p
columns
Eric Coissac
Christelle Gonindard-Melodelima
sim1 <- simulate_matrix(25,10) class(sim1) dim(sim1) sim2 <- simulate_correlation(sim1,20,0.8) corls(sim1, sim2)^2
sim1 <- simulate_matrix(25,10) class(sim1) dim(sim1) sim2 <- simulate_correlation(sim1,20,0.8) corls(sim1, sim2)^2
Points are simulated by drawing values of each dimension from a normal distribution of mean 0 and standard deviation equals to 1. The mean of each dimension is forced to 0 (data are centred). By default variable are also scaled to enforce a strandard deviation strictly equal to 1. Covariances between dimensions are not controled. Therefore they are expected to be equal to 0 and reflect only the random distribution of the covariance between two random vectors.
simulate_matrix(n, p, equal_var = TRUE)
simulate_matrix(n, p, equal_var = TRUE)
n |
an |
p |
an |
equal_var |
a |
a numeric matrix of n
rows and p
columns
Eric Coissac
Christelle Gonindard-Melodelima
sim1 <- simulate_matrix(25,10) class(sim1) dim(sim1)
sim1 <- simulate_matrix(25,10) class(sim1) dim(sim1)
This is the implementation of the subset
generic function for
procmod_frame
.
## S3 method for class 'procmod_frame' subset(x, subset, select, drop = FALSE, ...)
## S3 method for class 'procmod_frame' subset(x, subset, select, drop = FALSE, ...)
x |
object to be subsetted. |
subset |
logical expression indicating elements or rows to keep: missing values are taken as false. |
select |
expression, indicating columns to select from a data frame. |
drop |
passed on to [ indexing operator. |
... |
further arguments to be passed to or from other methods. |
The subset argument works on rows. Note that subset will be evaluated in the
procmod_frame
, so columns can be referred to (by name) as variables
in the expression (see the examples).
The select argument if provided indicates with matrices
have to be conserved. It works by first replacing column names in the selection
expression with the corresponding column numbers in the procmod_frame
and then using
the resulting integer vector to index the columns. This allows the use of the
standard indexing conventions so that for example ranges of columns can
be specified easily, or single columns can be dropped (see the examples). Remember
that each column of a procmod_frame
is actually a matrix.
The drop argument is passed on to the procmod_frame
indexing method.
The default value is FALSE
.
A procmod_frame
containing just the selected rows and columns.
Eric Coissac
Christelle Gonindard-Melodelima
library(vegan) data(bacteria) data(eukaryotes) data(soil) dataset <- procmod_frame(euk = vegdist(decostand(eukaryotes, method = "hellinger"), method = "euclidean"), bac = vegdist(decostand(bacteria, method = "hellinger"), method = "euclidean"), soil = scale(soil, center = TRUE, scale = TRUE)) dim(dataset) higher_ph = subset(dataset,soil[,"pH"] > 0) dim(higher_ph) without_bacteria = subset(dataset,soil[,"pH"] > 0, -bac) dim(without_bacteria)
library(vegan) data(bacteria) data(eukaryotes) data(soil) dataset <- procmod_frame(euk = vegdist(decostand(eukaryotes, method = "hellinger"), method = "euclidean"), bac = vegdist(decostand(bacteria, method = "hellinger"), method = "euclidean"), soil = scale(soil, center = TRUE, scale = TRUE)) dim(dataset) higher_ph = subset(dataset,soil[,"pH"] > 0) dim(higher_ph) without_bacteria = subset(dataset,soil[,"pH"] > 0, -bac) dim(without_bacteria)
varls
, corls
compute the procrustean
variance / covariance, or correlation matrices
between a set of real matrices and dist
objects.
varls(..., nrand = 100, p_adjust_method = "holm") corls(..., nrand = 100, p_adjust_method = "holm")
varls(..., nrand = 100, p_adjust_method = "holm") corls(..., nrand = 100, p_adjust_method = "holm")
... |
the set of matrices or a |
nrand |
number of randomisation used to estimate the mean
covariance observed between two random matrix.
If rand is |
p_adjust_method |
the multiple test correction method used
to adjust p values. |
Procrustean covariance between two matrices X and Y, is defined as the sum of the singular values of the X'Y matrix (Gower 1971; Lingoes and Schönemann 1974). Both the X and Y matrices must have the same number of rows.
The variances and covariances and correlations are corrected to avoid over fitting (Coissac and Gonindard-Melodelima 2019).
The inputs must be numeric matrices or dist
object.
The set of input matrices can be aggregated un a
procmod_frame
.
Before computing the coefficients, matrices are projected into an
orthogonal space using the ortho
function.
The denominator n - 1 is used which gives an unbiased estimator of the (co)variance for i.i.d. observations.
a procmod_varls
object which corresponds to a numeric
matrix annotated by several attributes.
The following attribute is always added:
- nrand
an integer value indicating the number of
randomisations used to estimate the mean of the random
covariance.
When nrand
is greater than 0 a couple of attributes
is added:
- rcovls
a numeric matrix containing the estimation
of the mean of the random covariance.
- p.value
a numeric matrix containing the estimations
of the p.values of tests checking that the observed
covariance is larger than the mean of the random covariance.
p.values are corrected for multiple tests according to the
method specified by the p_adjust_method
parameter.
Eric Coissac
Christelle Gonindard-Melodelima
Gower JC (1971). “Statistical methods of comparing different multivariate analyses of the same data.” Mathematics in the archaeological and historical sciences, 138–149.
Lingoes JC, Schönemann PH (1974). “Alternative measures of fit for the Schönemann-carroll matrix fitting algorithm.” Psychometrika, 39(4), 423–427.
Coissac E, Gonindard-Melodelima C (2019). “Assessing the shared variation among high-dimensional data matrices: a modified version of the Procrustean correlation coefficient.” in prep.
# Build Three matrices of 3 rows. A <- simulate_matrix(10,3) B <- simulate_matrix(10,5) C <- simulate_correlation(B,10,r2=0.6) # Computes the variance covariance matrix varls(A = A, B = B, C = C) data = procmod_frame(A = A, B = B, C = C) varls(data) # Computes the correlation matrix corls(data, nrand = 100)
# Build Three matrices of 3 rows. A <- simulate_matrix(10,3) B <- simulate_matrix(10,5) C <- simulate_correlation(B,10,r2=0.6) # Computes the variance covariance matrix varls(A = A, B = B, C = C) data = procmod_frame(A = A, B = B, C = C) varls(data) # Computes the correlation matrix corls(data, nrand = 100)