Package 'ProcMod' reference manual

Title:	Informative Procrustean Matrix Correlation
Description:	Estimates corrected Procrustean correlation between matrices for removing overfitting effect. Coissac Eric and Gonindard-Melodelima Christelle (2019) <doi:10.1101/842070>.
Authors:	Eric Coissac, Christelle Gonindard-Melodelima
Maintainer:	Eric Coissac <[email protected]>
License:	CeCILL-2
Version:	1.0.8
Built:	2025-03-19 04:32:40 UTC
Source:	https://github.com/cran/ProcMod

Generate permutation matrix according to a schema.

Description

The permutation schema is defined using the 'how' function. The implementation of this function is inspired from the VEGAN package and reproduced here to avoid an extra dependency on an hidden vegan function.

Usage

.getPermuteMatrix(permutations, n, strata = NULL)
.getPermuteMatrix(permutations, n, strata = NULL)

Arguments

`permutations`	a list of control values for the permutations as returned by the function `how`, or the number of permutations required.
`n`	numeric; the number of observations in the sample set. May also be any object that nobs knows about; see `nobs` methods.
`strata`	A factor, or an object that can be coerced to a factor via as.factor, specifying the strata for permutation.

Note

Internal function do not use.

Internal function coercing the data to a matrix.

Description

Transforme the x value into a numeric matrix of the correct size or into a dist object.

Usage

.procmod_coerce_value(x, nrows = 0, contrasts = NULL)
.procmod_coerce_value(x, nrows = 0, contrasts = NULL)

Arguments

`x`	The data to coerce
`nrows`	an interger value specifying the number of row of the returned matrix
`contrasts`	see the `contrasts_arg` argument of the `procmod_frame` constructor.

Value

a new numeric matrix with correct size.

Note

Internal function do not use.

Author(s)

Eric Coissac <[email protected]>

Christelle Gonindard-Melodelima <[email protected]>

Internal function repeating a matrix.

Description

repeats several times the rows of a matrix to create a new matrix with more rows. The final row count must be a multiple of the initial row count

Usage

.rep_matrix(x, nrow)
.rep_matrix(x, nrow)

Arguments

`x`	The matrix to replicate
`nrow`	an interger value specifying the number of row of the returned matrix

Value

a new matrix with the same number of columns but with 'nrow' rows.

Note

Internal function do not use.

Author(s)

Eric Coissac <[email protected]>

Christelle Gonindard-Melodelima <[email protected]>

Compute the trace of a square matrix.

Description

The trace of a square matrix is defined as the sum of its diagonal elements.

Usage

.Trace(X)
.Trace(X)

Arguments

`X`	a square matrix

Value

the trace of X

Note

Internal function do not use.

Author(s)

Eric Coissac

Christelle Gonindard-Melodelima

Examples

m <- matrix(1:16, nrow = 4)
ProcMod:::.Trace(m)
m <- matrix(1:16, nrow = 4)
ProcMod:::.Trace(m)

Coerce to a ProcMod Frame.

Description

Conversion methods are proposed for list, matrix and array.

Usage

as_procmod_frame(data, ...)

## S3 method for class 'list'
as_procmod_frame(data, ...)

## S3 method for class 'procmod_frame'
as_procmod_frame(data, ...)

## S3 method for class 'array'
as_procmod_frame(data, ...)

## S3 method for class 'matrix'
as_procmod_frame(data, ...)
as_procmod_frame(data, ...)

## S3 method for class 'list'
as_procmod_frame(data, ...)

## S3 method for class 'procmod_frame'
as_procmod_frame(data, ...)

## S3 method for class 'array'
as_procmod_frame(data, ...)

## S3 method for class 'matrix'
as_procmod_frame(data, ...)

Arguments

`data`	a R object to coerce.
`...`	supplementary parameters used in some implementation of that method

Value

a procmod_frame object

Author(s)

Eric Coissac

Christelle Gonindard-Melodelima

Examples

# Builds a list containing two random matrices
m1 <- simulate_matrix(10,20)
m2 <- simulate_matrix(10,30)
l <- list(m1 = m1, m2 = m2)

# Converts the list to a procmod_frame
pmf1 <- as_procmod_frame(l)

# Builds a procmod_frame from a matrix
m3 <- matrix(1:12,nrow=3)
pmf2 <- as_procmod_frame(matrix(1:12,nrow=3))
# Returns 4, the column count of the input matrix
length(pmf2)

# Builds a 3D array
a <- array(1:24,dim = c(3,4,2))

# The conversion to a procmod_frame makes
# an procmod element from each third dimension
as_procmod_frame(a)

# Builds a list containing two random matrices
m1 <- simulate_matrix(10,20)
m2 <- simulate_matrix(10,30)
l <- list(m1 = m1, m2 = m2)

# Converts the list to a procmod_frame
pmf1 <- as_procmod_frame(l)

# Builds a procmod_frame from a matrix
m3 <- matrix(1:12,nrow=3)
pmf2 <- as_procmod_frame(matrix(1:12,nrow=3))
# Returns 4, the column count of the input matrix
length(pmf2)

# Builds a 3D array
a <- array(1:24,dim = c(3,4,2))

# The conversion to a procmod_frame makes
# an procmod element from each third dimension
as_procmod_frame(a)

Converts a `dist` object to a `data.frame` object.

Description

The created data.frame has a attribute is.dist set to the logical value TRUE.

Usage

## S3 method for class 'dist'
as.data.frame(x, row.names = NULL, optional = FALSE, ...)
## S3 method for class 'dist'
as.data.frame(x, row.names = NULL, optional = FALSE, ...)

Arguments

`x`	the `dist` object to be converted
`row.names`	NULL or a `character` vector giving the row names for the data frame. Missing values are not allowed.
`optional`	logical. If `TRUE`, setting row names and converting column names (to syntactic names: see make.names) is optional. Note that all of R's base package as.data.frame() methods use optional only for column names treatment, basically with the meaning of data.frame(*, check.names = !optional). See also the make.names argument of the `matrix` method.
`...`	additional arguments to be passed to or from methods.

Author(s)

Eric Coissac

Christelle Gonindard-Melodelima

Examples

data(bacteria)
bacteria_rel_freq <- sweep(bacteria,
                           1,
                           rowSums(bacteria),
                           "/")
bacteria_hellinger <- sqrt(bacteria_rel_freq)
bacteria_dist <- dist(bacteria_hellinger)
bdf <- as.data.frame(bacteria_dist)

data(bacteria)
bacteria_rel_freq <- sweep(bacteria,
                           1,
                           rowSums(bacteria),
                           "/")
bacteria_hellinger <- sqrt(bacteria_rel_freq)
bacteria_dist <- dist(bacteria_hellinger)
bdf <- as.data.frame(bacteria_dist)

Double centering of a matrix.

Description

colSums and rowSums of the returned matrix are all equal to zero.

Usage

bicenter(m)
bicenter(m)

Arguments

`m`	a `numeric` matrix

Details

Inspired from the algorithm described in stackoverflow https://stackoverflow.com/questions/43639063/double-centering-in-r

Value

a numeric matrix centred by rows and columns

Author(s)

Eric Coissac

Christelle Gonindard-Melodelima

Examples

data(bacteria)
bact_bc <- bicenter(bacteria)
sum(rowSums(bact_bc))
sum(colSums(bact_bc))

data(bacteria)
bact_bc <- bicenter(bacteria)
sum(rowSums(bact_bc))
sum(colSums(bact_bc))

Monte-Carlo Test on the sum of the singular values of a procustean rotation.

Description

performs a Monte-Carlo Test on the sum of the singular values of a procustean rotation (see procuste.rtest).

Usage

corls_test(
  ...,
  permutations = permute::how(nperm = 999),
  p_adjust_method = "holm"
)
corls_test(
  ...,
  permutations = permute::how(nperm = 999),
  p_adjust_method = "holm"
)

Arguments

`...`	the set of matrices or a `procmod_frame` object.
`permutations`	a list of control values for the permutations as returned by the function `how`, or the number of permutations required.
`p_adjust_method`	the multiple test correction method used to adjust p values. `p_adjust_method` belongs one of the folowing values: `"holm"`, `"hochberg"`, `"hommel"`, `"bonferroni"`, `"BH"`, `"BY"`, `"fdr"`, `"none"`. The default is,set to `"holm"`.

Author(s)

Eric Coissac

Christelle Gonindard-Melodelima

References

Jackson DA (1995). “PROTEST: A PROcrustean Randomization TEST of community environment concordance.” Écoscience, 2(3), 297–303.

Examples

A <- simulate_matrix(10,3)
B <- simulate_matrix(10,5)
C <- simulate_correlation(B,10,r2=0.6)

# Computes the correlation matrix
data <- procmod_frame(A = A, B = B, C = C)

corls_test(data, permutations = 100)

A <- simulate_matrix(10,3)
B <- simulate_matrix(10,5)
C <- simulate_correlation(B,10,r2=0.6)

# Computes the correlation matrix
data <- procmod_frame(A = A, B = B, C = C)

corls_test(data, permutations = 100)

Dimensions of a ProcMod Frame.

Description

Dimension 1 is the number of rows (individus) shared by the aggregated matrices. Dimension 2 is the number of aggregated matrices

Usage

## S3 method for class 'procmod_frame'
dim(x)
## S3 method for class 'procmod_frame'
dim(x)

Arguments

`x`	a `procmod_frame` object

Author(s)

Eric Coissac

Christelle Gonindard-Melodelima

Examples

# Builds a procmod_frame with two random matrices
m1 <- simulate_matrix(10,20)
m2 <- simulate_matrix(10,30)
pmf <- procmod_frame(m1 = m1, m2 = m2)
dim(pmf)

# Builds a procmod_frame with two random matrices
m1 <- simulate_matrix(10,20)
m2 <- simulate_matrix(10,30)
pmf <- procmod_frame(m1 = m1, m2 = m2)
dim(pmf)

DNA metabarcoding Australia South-North Gradient

Description

This data set of five data.frame is a simplified version of a full data set describing biodiversity changes along a South-North gradient on the Australian East Coast, from Sidney to North Cap using a DNA metabarcoding approach. The gradient is constituted of 21 locations.

Usage

data(eukaryotes)

data(bacteria)

data(climat)

data(soil)

data(geography)
data(eukaryotes)

data(bacteria)

data(climat)

data(soil)

data(geography)

Format

five data.frame of 21 rows

An object of class data.frame with 21 rows and 2150 columns.

An object of class data.frame with 21 rows and 6 columns.

An object of class data.frame with 21 rows and 12 columns.

An object of class data.frame with 21 rows and 2 columns.

Details

bacteria

is a 21 x 2150 data.frame describing bacterial community at each one of the 21 locations. Each number is the relative frequency of a molecular operational taxonomy unit (MOTU) at a site after data cleaning and averaging of 135 pontual measures.

bacteria

is a 21 x 1393 data.frame describing eukariote community at each one of the 21 locations. Each number is the relative frequency of a molecular operational taxonomy unit (MOTU) at a site after data cleaning and averaging of 135 pontual measures.

climat

is a 21 x 6 data.frame describing climatic conditions at each site using worldclim descriptors (https://www.worldclim.org).

Aspect

TempSeasonality

MaxMonTemp

Max Temperature of Warmest Month

MeanMonTempRange

AnnMeanTemp

Isothemality

Mean Diurnal Range / Temperature Annual Range, with

Mean Diurnal Range: Mean of monthly (max temp - min temp)
Temperature Annual Range: Max Temperature of Warmest Month - Min Temperature of Coldest Month

soil

s a 21 x 6 data.frame describing soil chemistery at each site. Each variable is reduced and centered

KLg: Logarithm of the potassium concentration
pH: Soil Ph
AlLg: Logarithm of the aluminium concentration
FeLg: Logarithm of the iron concentration
PLg: Logarithm of the phosphorus concentration
SLg: Logarithm of the sulphur concentration
CaLg: Logarithm of the calcium concentration
MgLg: Logarithm of the magnesium concentration
MnLg: Logarithm of the manganese concentration
CNratio: carbon / nitrogen concentration ratio
CLg: Logarithm of the carbon concentration
NLg: Logarithm of the nitrogen concentration

geography

Author(s)

Christelle Gonindard-Melodelima

Eric Coissac

Test if the distance matrix is euclidean.

Description

Actually a simplified version of the ADE4 implementation (is.euclid).

Usage

is_euclid(distances, tol = 1e-07)
is_euclid(distances, tol = 1e-07)

Arguments

`distances`	an object of class 'dist'
`tol`	a tolerance threshold : an eigenvalue is considered positive if it is larger than -tol*lambda1 where lambda1 is the largest eigenvalue.

Author(s)

Eric Coissac

Christelle Gonindard-Melodelima

Examples

library(vegan)
data(bacteria)

bacteria_rel_freq <- sweep(bacteria,
                           1,
                           rowSums(bacteria),
                           "/")

bacteria_bray <- vegdist(bacteria_rel_freq,method = "bray")
is_euclid(bacteria_bray)

bacteria_chao <- vegdist(floor(bacteria*10000),method = "chao")
is_euclid(bacteria_chao)

library(vegan)
data(bacteria)

bacteria_rel_freq <- sweep(bacteria,
                           1,
                           rowSums(bacteria),
                           "/")

bacteria_bray <- vegdist(bacteria_rel_freq,method = "bray")
is_euclid(bacteria_bray)

bacteria_chao <- vegdist(floor(bacteria*10000),method = "chao")
is_euclid(bacteria_chao)

Check if an object is a ProcMod Frame.

Description

Check if an object is a ProcMod Frame.

Usage

is_procmod_frame(x)
is_procmod_frame(x)

Arguments

`x`	a R `object to test`

Value

a logical value equals to TRUE if x is a procmod_frame, FALSE otherwise.

Author(s)

Eric Coissac

Christelle Gonindard-Melodelima

Examples

# Builds a procmod_frame with two random matrices
m1 <- simulate_matrix(10,20)
m2 <- simulate_matrix(10,30)
pmf <- procmod_frame(m1 = m1, m2 = m2)

# Returns TRUE
is_procmod_frame(pmf)

# Returns FALSE
is_procmod_frame(3)

# Builds a procmod_frame with two random matrices
m1 <- simulate_matrix(10,20)
m2 <- simulate_matrix(10,30)
pmf <- procmod_frame(m1 = m1, m2 = m2)

# Returns TRUE
is_procmod_frame(pmf)

# Returns FALSE
is_procmod_frame(3)

The Names of the elements of a Correlation Matrix

Description

Returns the names of the elements associated to a procmod_corls object.

Usage

## S3 method for class 'procmod_corls'
names(x)
## S3 method for class 'procmod_corls'
names(x)

Arguments

`x`	a `procmod_corls` object

Author(s)

Eric Coissac

Christelle Gonindard-Melodelima

Examples

# Build Three matrices of 3 rows.
A <- simulate_matrix(10,3)
B <- simulate_matrix(10,5)
C <- simulate_correlation(B,10,r2=0.6)

# Computes the correlation matrix
data <- procmod_frame(A = A, B = B, C = C)
cls <- corls(data, nrand = 100)

names(cls)

# Build Three matrices of 3 rows.
A <- simulate_matrix(10,3)
B <- simulate_matrix(10,5)
C <- simulate_correlation(B,10,r2=0.6)

# Computes the correlation matrix
data <- procmod_frame(A = A, B = B, C = C)
cls <- corls(data, nrand = 100)

names(cls)

The Names of the elements of a Variance / Covariance Matrix.

Description

Returns the names of the elements associated to a procmod_varls object.

Usage

## S3 method for class 'procmod_varls'
names(x)
## S3 method for class 'procmod_varls'
names(x)

Arguments

`x`	a `procmod_varls` object

Author(s)

Eric Coissac

Christelle Gonindard-Melodelima

Examples

# Build Three matrices of 3 rows.
A <- simulate_matrix(10,3)
B <- simulate_matrix(10,5)
C <- simulate_correlation(B,10,r2=0.6)

# Computes the variance covariance matrix
data <- procmod_frame(A = A, B = B, C = C)
v <- varls(data, nrand = 100)

names(v)

# Build Three matrices of 3 rows.
A <- simulate_matrix(10,3)
B <- simulate_matrix(10,5)
C <- simulate_correlation(B,10,r2=0.6)

# Computes the variance covariance matrix
data <- procmod_frame(A = A, B = B, C = C)
v <- varls(data, nrand = 100)

names(v)

Project a distance matrix in a euclidean space (NMDS).

Description

Project a set of points defined by a distance matrix in an eucleadean space using the Kruskal's Non-metric Multidimensional Scaling. This function is mainly a simplified interface on the isoMDS function using as much as possible dimensions to limit the stress. The aims of this NDMS being only to project point in an orthogonal space therefore without any correlation between axis. Because a non-metric method is used no condition is required on the used distance.

Usage

nmds(distances, maxit = 100, trace = FALSE, tol = 0.001, p = 2)
nmds(distances, maxit = 100, trace = FALSE, tol = 0.001, p = 2)

Arguments

`distances`	a `dist` object or a `matrix` object representing a distance matrix.
`maxit`	The maximum number of iterations.
`trace`	Logical for tracing optimization. Default `TRUE`.
`tol`	convergence tolerance.
`p`	Power for Minkowski distance in the configuration space.

Value

a numeric matrix with at most n-1 dimensions, with n the number pf observations. This matrix defines the coordinates of each point in the orthogonal space.

Author(s)

Eric Coissac

Christelle Gonindard-Melodelima

Examples

data(bacteria)
bacteria_rel_freq <- sweep(bacteria,
                           1,
                           rowSums(bacteria),
                           "/")
bacteria_hellinger <- sqrt(bacteria_rel_freq)
bacteria_dist <- dist(bacteria_hellinger)

project <- nmds(bacteria_dist)

data(bacteria)
bacteria_rel_freq <- sweep(bacteria,
                           1,
                           rowSums(bacteria),
                           "/")
bacteria_hellinger <- sqrt(bacteria_rel_freq)
bacteria_dist <- dist(bacteria_hellinger)

project <- nmds(bacteria_dist)

Project a dataset in a euclidean space.

Description

Project a set of points defined by a distance matrix or a set of variables in an eucleadean space. If the distance matrix is a metric, this is done using the pcoa function, for other distance the nmds is used. When points are described by a set of variable the nmds is used.

Usage

ortho(data, ...)

## S3 method for class 'dist'
ortho(data, tol = 1e-07, ...)

## S3 method for class 'matrix'
ortho(data, scale = FALSE, ...)

## S3 method for class 'data.frame'
ortho(data, scale = FALSE, ...)

## S3 method for class 'procmod_frame'
ortho(data, ...)
ortho(data, ...)

## S3 method for class 'dist'
ortho(data, tol = 1e-07, ...)

## S3 method for class 'matrix'
ortho(data, scale = FALSE, ...)

## S3 method for class 'data.frame'
ortho(data, scale = FALSE, ...)

## S3 method for class 'procmod_frame'
ortho(data, ...)

Arguments

`data`	a numeric matrix describing the points
`...`	other parameters specific to some implementation
`tol`	a tolerance threshold : an eigenvalue is considered positive if it is larger than -tol*lambda1 where lambda1 is the largest eigenvalue.
`scale`	a `logical` value indicating if the dimensions must be scaled to force for every column that `sd=1`. `FALSE` by default.

Value

a numeric matrix with at most n-1 dimensions, with n the number pf observations. This matrix defines the coordinates of each point in the orthogonal space.

Author(s)

Eric Coissac

Christelle Gonindard-Melodelima

Examples

library(vegan)
data(bacteria)
data(eukaryotes)
data(soil)

dataset <- procmod_frame(euk = vegdist(decostand(eukaryotes,
                                                 method = "hellinger"),
                                       method = "euclidean"),
                         bac = vegdist(decostand(bacteria,
                                                 method = "hellinger"),
                                       method = "euclidean"),
                         soil = scale(soil,
                                      center = TRUE,
                                      scale  = TRUE))

dp <- ortho(dataset)

bacteria_rel_freq <- sweep(bacteria,
                           1,
                           rowSums(bacteria),
                           "/")
bacteria_hellinger <- sqrt(bacteria_rel_freq)
bacteria_dist <- dist(bacteria_hellinger)

project <- ortho(bacteria_dist)

library(vegan)
data(bacteria)
data(eukaryotes)
data(soil)

dataset <- procmod_frame(euk = vegdist(decostand(eukaryotes,
                                                 method = "hellinger"),
                                       method = "euclidean"),
                         bac = vegdist(decostand(bacteria,
                                                 method = "hellinger"),
                                       method = "euclidean"),
                         soil = scale(soil,
                                      center = TRUE,
                                      scale  = TRUE))

dp <- ortho(dataset)

bacteria_rel_freq <- sweep(bacteria,
                           1,
                           rowSums(bacteria),
                           "/")
bacteria_hellinger <- sqrt(bacteria_rel_freq)
bacteria_dist <- dist(bacteria_hellinger)

project <- ortho(bacteria_dist)

Project a set of points in a euclidean space (PCA).

Description

Project a set of points defined by a set of numeric variables in an eucleadean space using the pricipal componant analysis. This function is mainly a simplified interface on the prcomp function using as much as possible dimensions to keep all the variation. The aims of this PCA being only to project point in an orthogonal space therefore without any correlation between axis. Data are centered by not scaled by default.

Usage

pca(data, scale = FALSE)
pca(data, scale = FALSE)

Arguments

`data`	a numeric matrix describing the points
`scale`	a `logical` value indicating if the dimensions must be scaled to force for every column that `sd=1`. `FALSE` by default.

Value

a numeric matrix with at most n-1 dimensions, with n the number pf observations. This matrix defines the coordinates of each point in the orthogonal space.

Author(s)

Eric Coissac

Christelle Gonindard-Melodelima

Examples

data(bacteria)
bacteria_rel_freq <- sweep(bacteria,
                           1,
                           rowSums(bacteria),
                           "/")
bacteria_hellinger <- sqrt(bacteria_rel_freq)

project <- pca(bacteria_hellinger)

data(bacteria)
bacteria_rel_freq <- sweep(bacteria,
                           1,
                           rowSums(bacteria),
                           "/")
bacteria_hellinger <- sqrt(bacteria_rel_freq)

project <- pca(bacteria_hellinger)

Project a distance matrix in a euclidean space (PCOA).

Description

Project a set of points defined by a distance matrix in an eucleadean space using the Principal Coordinates Analysis method. This function is mainly a simplified interface on the cmdscale function using as much as possible dimensions for the projection. The aims of this PCoA being only to project point in an orthogonal space therefore without any correlation between axis. Because a metric method is used the used distance must be euclidean (cf is_euclid).

Usage

pcoa(distances)
pcoa(distances)

Arguments

distances

a dist object or a matrix object representing a distance matrix.

Value

a numeric matrix with at most n-1 dimensions, with n the number pf observations. This matrix defines the coordinates of each point in the orthogonal space.

Author(s)

Eric Coissac

Christelle Gonindard-Melodelima

Examples

data(bacteria)
bacteria_rel_freq <- sweep(bacteria,
                           1,
                           rowSums(bacteria),
                           "/")
bacteria_hellinger <- sqrt(bacteria_rel_freq)
bacteria_dist <- dist(bacteria_hellinger)

project <- pcoa(bacteria_dist)

data(bacteria)
bacteria_rel_freq <- sweep(bacteria,
                           1,
                           rowSums(bacteria),
                           "/")
bacteria_hellinger <- sqrt(bacteria_rel_freq)
bacteria_dist <- dist(bacteria_hellinger)

project <- pcoa(bacteria_dist)

Print a procrustean Correlation Matrix.

Description

Print a procrustean Correlation Matrix.

Usage

## S3 method for class 'procmod_corls'
print(x, ...)
## S3 method for class 'procmod_corls'
print(x, ...)

Arguments

`x`	a `procmod_corls` object
`...`	other parameters passed to other functions

Author(s)

Eric Coissac

Christelle Gonindard-Melodelima

Examples

# Build Three matrices of 3 rows.
A <- simulate_matrix(10,3)
B <- simulate_matrix(10,5)
C <- simulate_correlation(B,10,r2=0.6)

# Computes the correlation matrix
data <- procmod_frame(A = A, B = B, C = C)
cls <- corls(data, nrand = 100)

print(cls)

# Build Three matrices of 3 rows.
A <- simulate_matrix(10,3)
B <- simulate_matrix(10,5)
C <- simulate_correlation(B,10,r2=0.6)

# Computes the correlation matrix
data <- procmod_frame(A = A, B = B, C = C)
cls <- corls(data, nrand = 100)

print(cls)

Print procrustean Variance / Covariance Matrix.

Description

Print procrustean Variance / Covariance Matrix.

Usage

## S3 method for class 'procmod_varls'
print(x, ...)
## S3 method for class 'procmod_varls'
print(x, ...)

Arguments

`x`	a `procmod_varls` object
`...`	other parameters passed to other functions

Author(s)

Eric Coissac

Christelle Gonindard-Melodelima

Examples

# Build Three matrices of 3 rows.
A <- simulate_matrix(10,3)
B <- simulate_matrix(10,5)
C <- simulate_correlation(B,10,r2=0.6)

# Computes the variance covariance matrix
data <- procmod_frame(A = A, B = B, C = C)
v <- varls(data, nrand = 100)

print(v)

# Build Three matrices of 3 rows.
A <- simulate_matrix(10,3)
B <- simulate_matrix(10,5)
C <- simulate_correlation(B,10,r2=0.6)

# Computes the variance covariance matrix
data <- procmod_frame(A = A, B = B, C = C)
v <- varls(data, nrand = 100)

print(v)

Informative Procrustean Matrix Correlation

Description

Estimates corrected Procrustean correlation between matrices for removing overfitting effect.

Details

The functions in the ProcMod package aims to estimate and to test correlation between matrices, correcting for the spurious correlations because of the over-fitting effect.

The ProcMod package is developed on the metabarcoding.org gitlab (https://git.metabarcoding.org/lecasofts/ProcMod). The gitlab of metabarcoding.org provides up-to-date information and forums for bug reports.

Author(s)

Christelle Gonindard-Melodelima

Eric Coissac

The procmod_frame data structure.

Description

A procmod_frame can be considered as the analog of a data.frame for vector data. In a procmod_frame each element, equivalent to a column in a data.frame is a numeric matrix or a distance matrix object (dist). Every element must describe the same number of individuals. Therefore every numeric matrix must have the same number of row (nrow) and every distance matrix must have the same size (attr(d,"Size")). A procmod_frame can simultaneously contain both types of data, numeric and distance matrix.

Usage

procmod_frame(
  ...,
  row_names = NULL,
  check_rows = TRUE,
  reorder_rows = TRUE,
  contrasts_arg = NULL
)
procmod_frame(
  ...,
  row_names = NULL,
  check_rows = TRUE,
  reorder_rows = TRUE,
  contrasts_arg = NULL
)

Arguments

`...`	a set of objects to aggregate into a `procmod_frame`. These objects can be numeric matrices, or dist objects. Every objects must have the same number of row.
`row_names`	a character vector containing names associated to each row.
`check_rows`	a logical value. When set to `TRUE`, its default value, the number of row of every elements of the `procmod_frame` are tested for equality. Otherwise no check is done.
`reorder_rows`	a logical value. When set to `TRUE`, its default value, every elements of the `procmod_frame` are reordered according to the `row_names` order. Otherwise nothing is done.
`contrasts_arg`	A list, whose entries are values (numeric matrices or character strings naming functions) to be used as replacement values for the contrasts replacement function and whose names are the names of columns of data containing factors.

Value

a procmod_frame instance.

Author(s)

Eric Coissac

Christelle Gonindard-Melodelima

Examples

library(vegan)
data(bacteria)
data(eukaryotes)
data(soil)

dataset <- procmod_frame(euk = vegdist(decostand(eukaryotes,
                                                 method = "hellinger"),
                                       method = "euclidean"),
                         bac = vegdist(decostand(bacteria,
                                                 method = "hellinger"),
                                       method = "euclidean"),
                         soil = scale(soil,
                                      center = TRUE,
                                      scale  = TRUE))
length(dataset)
nrow(dataset)
ncol(dataset)
dataset$euk

library(vegan)
data(bacteria)
data(eukaryotes)
data(soil)

dataset <- procmod_frame(euk = vegdist(decostand(eukaryotes,
                                                 method = "hellinger"),
                                       method = "euclidean"),
                         bac = vegdist(decostand(bacteria,
                                                 method = "hellinger"),
                                       method = "euclidean"),
                         soil = scale(soil,
                                      center = TRUE,
                                      scale  = TRUE))
length(dataset)
nrow(dataset)
ncol(dataset)
dataset$euk

Rotate the `src` matrix to fit into the space of the `dest` matrix.

Description

The optimal rotation is computed according to the procruste methode. Rotation is based on singular value decomposition (SVD). No scaling and no centrering are done, before computing the SVD.

Usage

protate(src, dest)
protate(src, dest)

Arguments

`src`	a numeric matrix to be rotated
`dest`	a numeric matrix used as reference space

Value

a numeric matrix

Author(s)

Christelle Gonindard-Melodelima

Eric Coissac

Examples

# Generates two random matrices of size 10 x 15
m1 <- simulate_matrix(10, 15)
m2 <- simulate_matrix(10, 20)

# Rotates matrix m1 on m2
mr <- protate(m1, m2)

# Generates two random matrices of size 10 x 15
m1 <- simulate_matrix(10, 15)
m2 <- simulate_matrix(10, 20)

# Rotates matrix m1 on m2
mr <- protate(m1, m2)

Simulate n points of dimension p correlated to a reference matrix.

Description

Simulates a set of point correlated to another set according to the procrustean correlation definition. Points are simulated by drawing values of each dimension from a normal distribution of mean 0 and standard deviation equals to 1. The mean of each dimension is forced to 0 (data are centred). By default variable are also scaled to enforce a strandard deviation strictly equal to 1. Covariances between dimensions are not controled. Therefore they are expected to be equal to 0 and reflect only the random distribution of the covariance between two random vectors. The intensity of the correlation is determined by the r2 parameter.

Usage

simulate_correlation(reference, p, r2, equal_var = TRUE)
simulate_correlation(reference, p, r2, equal_var = TRUE)

Arguments

`reference`	a numeric matrix to which the simulated data will be correlated
`p`	an `int` value indicating the number of dimensions (variables) simulated
`r2`	the fraction of variation shared between the `reference` and the simulated data
`equal_var`	a `logical` value indicating if the dimensions must be scaled to force `sd=1`. `TRUE` by default.

Value

a numeric matrix of nrow(reference) rows and p columns

Author(s)

Eric Coissac

Christelle Gonindard-Melodelima

Examples

sim1 <- simulate_matrix(25,10)
class(sim1)
dim(sim1)
sim2 <- simulate_correlation(sim1,20,0.8)
corls(sim1, sim2)^2

sim1 <- simulate_matrix(25,10)
class(sim1)
dim(sim1)
sim2 <- simulate_correlation(sim1,20,0.8)
corls(sim1, sim2)^2

Simulate n points of dimension p.

Description

Points are simulated by drawing values of each dimension from a normal distribution of mean 0 and standard deviation equals to 1. The mean of each dimension is forced to 0 (data are centred). By default variable are also scaled to enforce a strandard deviation strictly equal to 1. Covariances between dimensions are not controled. Therefore they are expected to be equal to 0 and reflect only the random distribution of the covariance between two random vectors.

Usage

simulate_matrix(n, p, equal_var = TRUE)
simulate_matrix(n, p, equal_var = TRUE)

Arguments

`n`	an `int` value indicating the number of observations.
`p`	an `int` value indicating the number of dimensions (variables) simulated
`equal_var`	a `logical` value indicating if the dimensions must be scaled to force `sd=1`. `TRUE` by default.

Value

a numeric matrix of n rows and p columns

Author(s)

Eric Coissac

Christelle Gonindard-Melodelima

Examples

sim1 <- simulate_matrix(25,10)
class(sim1)
dim(sim1)

sim1 <- simulate_matrix(25,10)
class(sim1)
dim(sim1)

Subsetting Procmod Frames

Description

This is the implementation of the subset generic function for procmod_frame.

Usage

## S3 method for class 'procmod_frame'
subset(x, subset, select, drop = FALSE, ...)
## S3 method for class 'procmod_frame'
subset(x, subset, select, drop = FALSE, ...)

Arguments

`x`	object to be subsetted.
`subset`	logical expression indicating elements or rows to keep: missing values are taken as false.
`select`	expression, indicating columns to select from a data frame.
`drop`	passed on to [ indexing operator.
`...`	further arguments to be passed to or from other methods.

Details

The subset argument works on rows. Note that subset will be evaluated in the procmod_frame, so columns can be referred to (by name) as variables in the expression (see the examples).

The select argument if provided indicates with matrices have to be conserved. It works by first replacing column names in the selection expression with the corresponding column numbers in the procmod_frame and then using the resulting integer vector to index the columns. This allows the use of the standard indexing conventions so that for example ranges of columns can be specified easily, or single columns can be dropped (see the examples). Remember that each column of a procmod_frame is actually a matrix.

The drop argument is passed on to the procmod_frame indexing method. The default value is FALSE.

Value

A procmod_frame containing just the selected rows and columns.

Author(s)

Eric Coissac

Christelle Gonindard-Melodelima

Examples

library(vegan)
data(bacteria)
data(eukaryotes)
data(soil)

dataset <- procmod_frame(euk = vegdist(decostand(eukaryotes,
                                                 method = "hellinger"),
                                       method = "euclidean"),
                         bac = vegdist(decostand(bacteria,
                                                 method = "hellinger"),
                                       method = "euclidean"),
                         soil = scale(soil,
                                      center = TRUE,
                                      scale  = TRUE))
dim(dataset)

higher_ph = subset(dataset,soil[,"pH"] > 0)
dim(higher_ph)

without_bacteria = subset(dataset,soil[,"pH"] > 0, -bac)
dim(without_bacteria)

library(vegan)
data(bacteria)
data(eukaryotes)
data(soil)

dataset <- procmod_frame(euk = vegdist(decostand(eukaryotes,
                                                 method = "hellinger"),
                                       method = "euclidean"),
                         bac = vegdist(decostand(bacteria,
                                                 method = "hellinger"),
                                       method = "euclidean"),
                         soil = scale(soil,
                                      center = TRUE,
                                      scale  = TRUE))
dim(dataset)

higher_ph = subset(dataset,soil[,"pH"] > 0)
dim(higher_ph)

without_bacteria = subset(dataset,soil[,"pH"] > 0, -bac)
dim(without_bacteria)

Procrustean Correlation, and Variance / Covariance Matrices.

Description

varls, corls compute the procrustean variance / covariance, or correlation matrices between a set of real matrices and dist objects.

Usage

varls(..., nrand = 100, p_adjust_method = "holm")

corls(..., nrand = 100, p_adjust_method = "holm")
varls(..., nrand = 100, p_adjust_method = "holm")

corls(..., nrand = 100, p_adjust_method = "holm")

Arguments

`...`	the set of matrices or a `procmod_frame` object.
`nrand`	number of randomisation used to estimate the mean covariance observed between two random matrix. If rand is `NULL` or equal to `0`, no correction is estimated and the raw procrustean covariances are estimated.
`p_adjust_method`	the multiple test correction method used to adjust p values. `p_adjust_method` belongsone of the folowing values: `"holm"`, `"hochberg"`, `"hommel"`, `"bonferroni"`, `"BH"`, `"BY"`, `"fdr"`, `"none"`. The default is,set to `"holm"`.

Details

Procrustean covariance between two matrices X and Y, is defined as the sum of the singular values of the X'Y matrix (Gower 1971; Lingoes and Schönemann 1974). Both the X and Y matrices must have the same number of rows.

The variances and covariances and correlations are corrected to avoid over fitting (Coissac and Gonindard-Melodelima 2019).

The inputs must be numeric matrices or dist object. The set of input matrices can be aggregated un a procmod_frame.

Before computing the coefficients, matrices are projected into an orthogonal space using the ortho function.

The denominator n - 1 is used which gives an unbiased estimator of the (co)variance for i.i.d. observations.

Value

a procmod_varls object which corresponds to a numeric matrix annotated by several attributes.

The following attribute is always added:

- nrand an integer value indicating the number of randomisations used to estimate the mean of the random covariance.

When nrand is greater than 0 a couple of attributes is added:

- rcovls a numeric matrix containing the estimation of the mean of the random covariance.

- p.value a numeric matrix containing the estimations of the p.values of tests checking that the observed covariance is larger than the mean of the random covariance. p.values are corrected for multiple tests according to the method specified by the p_adjust_method parameter.

Author(s)

Eric Coissac

Christelle Gonindard-Melodelima

References

Gower JC (1971). “Statistical methods of comparing different multivariate analyses of the same data.” Mathematics in the archaeological and historical sciences, 138–149.

Lingoes JC, Schönemann PH (1974). “Alternative measures of fit for the Schönemann-carroll matrix fitting algorithm.” Psychometrika, 39(4), 423–427.

Coissac E, Gonindard-Melodelima C (2019). “Assessing the shared variation among high-dimensional data matrices: a modified version of the Procrustean correlation coefficient.” in prep.

Examples

# Build Three matrices of 3 rows.
A <- simulate_matrix(10,3)
B <- simulate_matrix(10,5)
C <- simulate_correlation(B,10,r2=0.6)

# Computes the variance covariance matrix
varls(A = A, B = B, C = C)

data = procmod_frame(A = A, B = B, C = C)
varls(data)

# Computes the correlation matrix
corls(data, nrand = 100)

# Build Three matrices of 3 rows.
A <- simulate_matrix(10,3)
B <- simulate_matrix(10,5)
C <- simulate_correlation(B,10,r2=0.6)

# Computes the variance covariance matrix
varls(A = A, B = B, C = C)

data = procmod_frame(A = A, B = B, C = C)
varls(data)

# Computes the correlation matrix
corls(data, nrand = 100)

Package 'ProcMod'

Help Index

Generate permutation matrix according to a schema.

Description

Usage

Arguments

Note

Internal function coercing the data to a matrix.

Description

Usage

Arguments

Value

Note

Author(s)

Internal function repeating a matrix.

Description

Usage

Arguments

Value

Note

Author(s)

Compute the trace of a square matrix.

Description

Usage

Arguments

Value

Note

Author(s)

Examples

Coerce to a ProcMod Frame.

Description

Usage

Arguments

Value

Author(s)

Examples

Converts a dist object to a data.frame object.

Description

Usage

Arguments

Author(s)

Examples

Double centering of a matrix.

Description

Usage

Arguments

Details

Value

Author(s)

Examples

Monte-Carlo Test on the sum of the singular values of a procustean rotation.

Description

Usage

Arguments

Author(s)

References

See Also

Examples

Dimensions of a ProcMod Frame.

Description

Usage

Arguments

Author(s)

Examples

DNA metabarcoding Australia South-North Gradient

Description

Usage

Format

Details

Author(s)

Test if the distance matrix is euclidean.

Description

Usage

Arguments

Author(s)

Examples

Check if an object is a ProcMod Frame.

Description

Usage

Arguments

Converts a `dist` object to a `data.frame` object.