Definitions

Matrices

In R, matrices (matrix objects) are two-dimensional arrays. They can carry row and column names. All elements of a matrix must share the same mode (numeric, character, logical…).

As an example, a matrix can be created as follows.

x <- round(runif(25), 2)
mat <- matrix(data=x, 
              ncol = 5, 
              byrow = TRUE)
print(mat)
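
To make the effect of byrow concrete, here is a minimal sketch on a small integer vector. The names v, by_row and by_col are illustrative and not part of the tutorial. By default matrix() fills the cells column by column, while byrow = TRUE fills them row by row.

```r
# Illustrative names (v, by_row, by_col); not from the tutorial itself.
v <- 1:6
by_row <- matrix(data = v, ncol = 3, byrow = TRUE)  # fill row by row
by_col <- matrix(data = v, ncol = 3)                # default: fill column by column
print(by_row)
print(by_col)
```

Both matrices hold the same six values; only their arrangement in the 2 x 3 grid differs.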

You can also create a matrix by grouping vectors of the same size using the functions cbind() (column bind) or rbind() (row bind).

mat <- cbind(0:5, 20:25, 30:35)
mat
mat <- rbind(0:5, 20:25, 30:35)
mat
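
As a quick check of the difference between the two functions, the sketch below compares the dimensions of both results. The names by_cols, by_rows and named are illustrative only. It also shows that naming the arguments of cbind() sets the column names.

```r
# Illustrative names; not part of the tutorial.
by_cols <- cbind(0:5, 20:25, 30:35)  # each vector becomes a column
by_rows <- rbind(0:5, 20:25, 30:35)  # each vector becomes a row
dim(by_cols)  # 6 rows, 3 columns
dim(by_rows)  # 3 rows, 6 columns

# Named arguments become column names.
named <- cbind(low = 0:5, high = 20:25)
colnames(named)
```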

Functions for matrix objects

Row and column names

Column/row names can be changed as follows:

set.seed(1)
mat <- matrix(data=round(rnorm(20), 2), 
            ncol = 4, 
            byrow = TRUE)

colnames(mat) <- LETTERS[1:4]
rownames(mat) <- letters[1:5]
print(mat)

Matrix dimensions

We can find out the number of rows, columns and dimensions of the matrix with the nrow(), ncol() and dim() functions respectively.

nrow(mat)
ncol(mat)
dim(mat)
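
The relationship between these functions can be sketched on a small matrix (m is an illustrative name). dim() returns the number of rows followed by the number of columns, and dimnames() returns the row and column names together as a list.

```r
# Illustrative matrix m; not part of the tutorial.
m <- matrix(1:20, ncol = 4, byrow = TRUE)
colnames(m) <- LETTERS[1:4]
rownames(m) <- letters[1:5]

dim(m)        # same as c(nrow(m), ncol(m))
length(m)     # total number of cells: nrow(m) * ncol(m)
dimnames(m)   # list(row names, column names)
```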

Exercise

  • Given the following matrix, use the paste0() function to create row names of the form gene_1, gene_2, gene_3… and column names of the form sample_1, sample_2, sample_3… Assign these row and column names to the matrix mat.
set.seed(123)
mat <- matrix(data=round(runif(200, 0, 100), 0), 
            ncol = 10, 
            byrow = TRUE)

rown <- paste0("gene_", 1:nrow(mat))
rownames(mat) <- rown
coln <- paste0("sample_", 1:ncol(mat))
colnames(mat) <- coln

The transposition function

Transposing a matrix (\(mat^{T}\)) swaps its rows and columns. In machine learning, features (e.g. genes) often appear in columns, while samples are in rows. Use the t() function to perform the transposition.

mat
t(mat)
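
A minimal sketch of what t() does to dimensions and names, using a small hypothetical samples-by-genes matrix (m, tm and the sample_/gene_ labels are illustrative).

```r
# Hypothetical 2 samples x 3 genes matrix.
m <- matrix(1:6, nrow = 2,
            dimnames = list(c("sample_1", "sample_2"),
                            c("gene_1", "gene_2", "gene_3")))
tm <- t(m)
dim(m)    # 2 rows, 3 columns
dim(tm)   # 3 rows, 2 columns

# The cell (i, j) of m becomes the cell (j, i) of t(m).
tm["gene_3", "sample_1"] == m["sample_1", "gene_3"]

# Transposing twice gives back the original matrix.
identical(t(tm), m)
```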

The diag() function

You can easily manipulate the matrix object with various specific functions. For example, getting and modifying the diagonal values can be performed by the diag() function.

Let’s imagine a matrix that is the adjacency matrix of a graph: for each pair of proteins among A to H, a 1 indicates that the two proteins interact and a 0 that they do not. The proteins are the nodes (or vertices) of the graph and the interactions are its edges.

Let’s create such a matrix (the graph representation follows just below).

mat <- matrix(c(0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 
                0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 
                1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 
                0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 
                0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 
                0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 
                1, 1, 0, 0),
            ncol = 8, 
            byrow = TRUE)
cr_names <- LETTERS[1:8]
colnames(mat) <- rownames(mat) <- cr_names
print(mat)

Using the igraph library, the graph can be created with the graph_from_adjacency_matrix() function. We declare the graph as undirected (mode="undirected") because protein-protein interactions have no particular source or target here (i.e. we don’t know whether one protein activates or represses the other, only that they interact).

library(igraph)
my_graph <- igraph::graph_from_adjacency_matrix(mat, mode="undirected")
plot(my_graph)

From the diagram, and by extracting the values from the matrix diagonal, we can see that B interacts with itself, as does D. These proteins may form homodimers. To list all the proteins that can form homodimers, we can simply ask for the matrix diagonal.

diag(mat)

If we do not want to focus on these homodimeric interactions we may simply set the diagonal values to 0.

diag(mat) <- 0
all(diag(mat) == 0) # TRUE
print(mat)

  • By creating a graph with graph_from_adjacency_matrix() from the igraph library, check graphically that the homodimeric interactions are no longer present in the graph.
mat
library(igraph)
my_graph <- igraph::graph_from_adjacency_matrix(mat, mode="undirected")
plot(my_graph)
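
The diagonal can also be used to name the self-interacting proteins directly, without a plot. Below is a minimal sketch on a small hypothetical adjacency matrix (adj and the labels P1 to P3 are illustrative, not the tutorial's data).

```r
# Hypothetical 3-protein adjacency matrix; P1 and P3 interact with themselves.
prot <- c("P1", "P2", "P3")
adj <- matrix(c(1, 1, 0,
                1, 0, 1,
                0, 1, 1),
              ncol = 3, byrow = TRUE,
              dimnames = list(prot, prot))

# Names of the proteins with a 1 on the diagonal.
self <- rownames(adj)[diag(adj) == 1]
self

# Remove the self-interactions; off-diagonal cells are untouched.
diag(adj) <- 0
sum(diag(adj))
```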

The lower.tri() function

The functions upper.tri() and lower.tri() return a logical matrix indicating whether each cell of the matrix belongs to the upper or lower triangle, respectively. By default, the diagonal is excluded.

upper.tri(mat)
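
One possible use, sketched here under the assumption of a symmetric 0/1 adjacency matrix: each interaction is stored twice (once on each side of the diagonal), so restricting the count to the upper triangle counts every interaction once. The names adj and n_edges are illustrative.

```r
# Hypothetical symmetric adjacency matrix with two interactions.
adj <- matrix(c(0, 1, 1,
                1, 0, 0,
                1, 0, 0),
              ncol = 3, byrow = TRUE)

upper.tri(adj)               # TRUE strictly above the diagonal
upper.tri(adj, diag = TRUE)  # diag = TRUE also marks the diagonal

# Count each interaction once by keeping only the upper triangle.
n_edges <- sum(adj[upper.tri(adj)])
n_edges
```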

Indexing

Indexing by a matrix

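A test can be applied to all cells of the matrix at once; the result is a logical matrix of the same dimensions. For example, with the 0/1 adjacency matrix above, testing whether a value is greater than 0.5 amounts to testing whether it is equal to 1.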

mat > 0.5

We can build more complex tests with logical operators. For instance, we can test whether a cell value is 1 (greater than 0.5) and whether the cell belongs to the lower triangle.

mat > 0.5 & lower.tri(mat)
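
A logical matrix like this one can then be placed inside the indexing operator to extract or modify the selected cells. The sketch below uses a small hypothetical matrix (m, sel, sel_values and pos are illustrative names).

```r
# Hypothetical 0/1 matrix with named rows and columns.
m <- matrix(c(0, 1, 0,
              1, 0, 1,
              0, 1, 0),
            ncol = 3, byrow = TRUE,
            dimnames = list(LETTERS[1:3], LETTERS[1:3]))

sel <- m > 0.5 & lower.tri(m)        # logical matrix, same shape as m
sel_values <- m[sel]                 # selected cell values, as a vector
pos <- which(sel, arr.ind = TRUE)    # row and column positions of those cells
pos

m[sel] <- 0                          # a logical matrix also works for assignment
m
```

Indexing with a logical matrix returns a plain vector, so which(..., arr.ind = TRUE) is useful when the positions matter.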

Two-dimensional indexing

Since a matrix has rows and columns, we’ll (most of the time) use two-dimensional indexing. Two pieces of information are passed to the indexing operator in the form [rows, columns], where rows and columns are vectors selecting the rows and columns to extract (by position, by name or with logical values). If rows is left empty (e.g. [, columns]), all rows are extracted. The same principle applies to columns.
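
Before the exercises, here is a minimal sketch of the main forms on a small hypothetical matrix (m and its row names r1 to r3 are illustrative). Negative indices, which exclude rows or columns, are shown as an extra.

```r
# Hypothetical 3 x 4 matrix, filled column by column.
m <- matrix(1:12, nrow = 3,
            dimnames = list(c("r1", "r2", "r3"), c("A", "B", "C", "D")))

m[2, 3]             # one cell: row 2, column 3
m[2, ]              # whole row 2 (columns left empty)
m[, c("A", "D")]    # columns selected by name
m[-1, ]             # negative index: every row except the first
m[m[, "A"] > 1, ]   # logical vector: rows where column A exceeds 1
```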

Given the matrix declared below:

  • Extract the value of the cell at position 1,1 and store the result in a variable a.
  • Extract the values of the cells at position 1,1 and 1,2 and store the result in a variable b.
  • Extract cell values from row 1 and store the result in a variable c. 
  • Extract cell values from rows 1 and 3 and store the result in a variable d. 
  • Extract cell values from column 1 and store the result in a variable e.
  • Extract cell values from columns 1 and 3 and store the result in a variable f. 
set.seed(123)
mat <- matrix(data=sample(1:20, size=40, replace = TRUE), 
            ncol = 4, 
            byrow = TRUE)
colnames(mat) <- LETTERS[1:4]
rownames(mat) <- letters[1:10]

a <- mat[1, 1]
b <- mat[1, c(1, 2)]
c <- mat[1, ]
d <- mat[c(1,3), ]
e <- mat[, 1]
f <- mat[ ,c(1, 3)]

Given the matrix declared below:

  • Extract the values from the cells in columns 1 and 3 for rows 1 and 3, and store the result in a variable g.
  • Extract all rows where the values in column 1 are greater than 11, and store the result in a variable h.
  • Extract the cell where the row name is “a” and the column name is “B”, and store the result in a variable i.
  • Extract the cells where the row names are “a”, “b”, and “c”, and the column name is “B”, and store the result in a variable j.
  • Extract all columns where the values in row 1 are greater than 10, and store the result in a variable k.
set.seed(123)
mat <- matrix(data=sample(1:20, size=40, replace = TRUE), 
            ncol = 4, 
            byrow = TRUE)
colnames(mat) <- LETTERS[1:4]
rownames(mat) <- letters[1:10]

g <- mat[c(1, 3) ,c(1, 3)]
h <- mat[mat[, 1] > 11, ]
i <- mat["a", "B"]
j <- mat[c("a", "b", "c"), "B"]
k <- mat[, mat[1,] > 10]

Implicit Conversion by the Indexing Function

The indexing function can cause a type conversion that is not always desired (but is often very practical). For example, below, if we select a column from the matrix, we end up with a vector, which seems quite natural (the same is observed if we select a row).

mat <- matrix(1:20, ncol=4)
is(mat[, 1])

We can prevent this default behavior by setting the drop argument of the indexing operator to FALSE (it is TRUE by default).

mat <- matrix(1:20, ncol=4)
is(mat[, 1, drop=FALSE])
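
The difference is easiest to see on the dimensions of the result. A minimal sketch (m, v and k are illustrative names):

```r
m <- matrix(1:20, ncol = 4)

v <- m[, 1]                 # dimensions dropped: a plain integer vector
k <- m[, 1, drop = FALSE]   # still a matrix, with 5 rows and 1 column

is.matrix(v)
dim(v)      # NULL: a vector has no dimensions
is.matrix(k)
dim(k)
```

Keeping drop = FALSE is mostly useful in code that expects a matrix whatever the number of selected rows or columns.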

The apply Function

Using the apply() function, we can apply a function that takes a vector as its first argument (e.g. mean(), median(), var(), sd()…) to each row or each column of a matrix.

The syntax and arguments of the apply() function are as follows: apply(X, MARGIN, FUN, ...).

  • X is a matrix or a data.frame
  • MARGIN indicates whether the function should be applied to:
    • the rows (MARGIN=1)
    • or the columns (MARGIN=2)
  • FUN is the function to be applied
  • ... are additional arguments passed on to FUN

If we write apply(X=mat, MARGIN=2, FUN=median), each column (MARGIN=2) of mat (X=mat) will be passed successively to the median() function. This will return a vector of size ncol(mat) containing the median values of each column.
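
A minimal sketch of the two MARGIN values on a small hypothetical matrix (m, col_med and row_med are illustrative names):

```r
# Hypothetical 2 x 3 matrix.
m <- matrix(c(1, 2, 3,
              4, 5, 6),
            ncol = 3, byrow = TRUE)

col_med <- apply(X = m, MARGIN = 2, FUN = median)  # one value per column
row_med <- apply(X = m, MARGIN = 1, FUN = median)  # one value per row

length(col_med) == ncol(m)
length(row_med) == nrow(m)
```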

Given the matrix \(mat\), use apply() to:

  • Calculate the mean (mean()) of each row and store the result in the variable a.
  • Calculate the variance (var()) of each row and store the result in the variable b.
  • Calculate the standard deviation (sd()) of each row and store the result in the variable c.
  • Calculate the interquartile range (IQR()) of each row and store the result in the variable d.
set.seed(123)
mat <- matrix(data=sample(1:20, size=40, replace = TRUE), 
            ncol = 4, 
            byrow = TRUE)
colnames(mat) <- LETTERS[1:4]
rownames(mat) <- letters[1:10]

a <- apply(mat, 1, mean)
b <- apply(mat, 1, var)
c <- apply(mat, 1, sd)
d <- apply(mat, 1, IQR)

When the function being called needs additional arguments, they can be passed after FUN in the call to apply():

# E.g. apply a trimmed mean
# to the rows, dropping 20% of the
# values at each end before averaging.
apply(mat, 1, mean, trim = 0.2)
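
The same mechanism works for any named argument of the applied function. As a further sketch, na.rm = TRUE is forwarded to mean() to ignore missing values (the matrix m and the name row_means are illustrative).

```r
# Hypothetical matrix with one missing value.
m <- matrix(c(1, NA, 3,
              4, 5, 100),
            ncol = 3, byrow = TRUE)

apply(m, 1, mean)                             # first row gives NA
row_means <- apply(m, 1, mean, na.rm = TRUE)  # na.rm is passed on to mean()
row_means
```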

  • Check the help for the quantile() function. Calculate the values of the \(1^{st}\) and \(3^{rd}\) quartiles for each column. Store the results in q_25 and q_75 respectively.
set.seed(123)
mat <- matrix(data=sample(1:20, size=40, replace = TRUE), 
            ncol = 4, 
            byrow = TRUE)

q_25 <- apply(mat, MARGIN = 2, FUN = quantile, probs = 0.25)
q_75 <- apply(mat, MARGIN = 2, FUN = quantile, probs = 0.75)

Mathematical Operations

We will often work with numeric matrices on which we can perform mathematical operations. As with vectors, these operations are generally greatly simplified because they implicitly apply to all elements of the matrix. Thus, we can write the following instructions:

mat
mat + 10
mat / 2
abs(mat)^0.5
mat + mat ^ 2
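
To see what "applies to all elements" means in practice, here is a minimal sketch on a 2 x 2 matrix (m, squared and scaled are illustrative names). Note that * is the element-wise product, and that a shorter vector is recycled down the columns, as it would be with vectors.

```r
m <- matrix(1:4, ncol = 2)   # columns: (1, 2) and (3, 4)

m + 10                       # adds 10 to every cell
squared <- m * m             # element-wise product: each cell is squared
scaled <- m * c(1, 10)       # vector recycled down the columns:
                             # row 1 is multiplied by 1, row 2 by 10
squared
scaled
```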

End of the section

Thank you for following this tutorial.

The ‘matrix’ object