Indexing vectors.

Indexing

Principle

Indexing retrieves specific elements from an object using brackets [].

# target_positions holds the positions we wish to extract from the vector 'my_vector'.
my_vector[target_positions]

Here, target_positions can be:

A vector of indices (positions from 1 to the length of the vector).
A logical vector (of the same length as the vector).
A character vector of position names.

We’ll look at these different cases below.
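As a quick preview, the three kinds of index can be sketched on a small named vector. The vector scores and its names are made up for this illustration.

```r
# Hypothetical named vector used only for this illustration.
scores <- c(a = 10, b = 20, c = 30)

by_position <- scores[c(1, 3)]               # positions 1 and 3
by_logical  <- scores[c(TRUE, FALSE, TRUE)]  # keep the positions marked TRUE
by_name     <- scores[c("a", "c")]           # positions named "a" and "c"

# All three calls select the same two elements.
identical(by_position, by_logical)
identical(by_position, by_name)
```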

Indexing vectors by position

Principle

The indexing argument may be a set of numerical positions.

Note: R uses one-based indexing, where the first position is 1 (unlike Python’s zero-based indexing, starting at 0).

set.seed(123) 
x <- sort(round(rnorm(10), digits = 2), decreasing = TRUE)
x

Retrieving positions 1 to 5 can thus be written:

x[1:5]

It’s also possible to request all but certain positions by prefixing them with a minus sign (-).

print(x[-c(1,3)])
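The behavior of positional indexing at the edges can be sketched on a short vector. The vector v below is an illustrative assumption, not part of the tutorial data.

```r
v <- c(10, 20, 30, 40, 50)  # illustrative vector

first_el <- v[1]          # first element (one-based indexing): 10
last_el  <- v[length(v)]  # last element: 50
dropped  <- v[-c(1, 3)]   # everything except positions 1 and 3: 20 40 50
beyond   <- v[7]          # a position beyond the end gives NA
```

Note that a request for a position that does not exist returns NA rather than raising an error, so it is worth checking lengths when positions are computed.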

Exercise

Given the vector below, use the indexing operator to:
- Store position 6 of x in variable a.
- Store positions 5, 18 and 27 in variable b.
- After setting the random seed to 123, randomly draw 10 positions of x without replacement and store the corresponding values in variable d.

set.seed(123)
x <- sort(sample(1:1000, size=100))

set.seed(123)
x <- sort(sample(1:1000, size=100))
a <- x[6]
b <- x[c(5, 18, 27)]
set.seed(123)
rnd_pos <- sample(seq_along(x), size=10, replace=FALSE)
d <- x[rnd_pos]

Logical indexing of vectors

Principle

The principle is to pass the indexing operator a logical vector of the same length as the vector. Only the positions holding TRUE are selected and returned.

print(x)

x > 0

x[x > 0]
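The same steps can be followed on a short vector where the result is easy to check by eye. The values in v are illustrative; which() and sum() are base R functions that are often handy alongside a logical index.

```r
v <- c(-1.5, 0.2, 3, -0.7, 1.1)  # illustrative values

is_pos   <- v > 0         # FALSE TRUE TRUE FALSE TRUE
selected <- v[is_pos]     # 0.2 3.0 1.1
where    <- which(is_pos) # positions holding TRUE: 2 3 5
how_many <- sum(is_pos)   # number of TRUE values: 3
```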

The logical operators & (and) and | (or) combine logical vectors element by element. A condition on one vector can therefore be combined with a condition on another vector, provided both have the same length.
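A minimal sketch of element-wise combination, using two short vectors a and b invented for the purpose:

```r
a <- c(1, -2, 3, -4)   # illustrative values
b <- c(5, 6, -7, -8)

both_pos   <- a > 0 & b > 0   # TRUE FALSE FALSE FALSE
either_pos <- a > 0 | b > 0   # TRUE TRUE TRUE FALSE

a[both_pos]     # 1
a[either_pos]   # 1 -2 3
```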

Example

Imagine, for example, the x and y coordinates of 2D points.

set.seed(123)
# We create normally distributed values 
# on the x-axis.
x <- rnorm(100)
# Add a little noise to x
# to create y.
y <- x + rnorm(100, mean=0.3, sd=0.4)

One can visualize the result using the plot() function.

# plot() creates a scatterplot
plot(x, y)
# Add a vertical/horizontal grid
grid()
# Add a vertical (argument v) line
abline(v = 0, col="black")
# Add a horizontal (argument h) line
abline(h= 0, col="black")
# Add a diagonal line
# with equation y=x (intercept/a = 0, slope/b = 1)
abline(a= 0, b=1, col="black")

Indexing allows you to highlight specific points, for example by coloring the points for which both x and y are positive. Use the points() function to overlay these points on an existing plot.

plot(x, y)
grid()
abline(v = 0, col="black")
abline(h= 0, col="black")
abline(a= 0, b=1, col="black")

# Which positions are positive in both x and y?
pos <- x > 0 & y > 0
# points() overlays points
# on an existing plot.
# pch= point type
# col= color
points(x[pos], y[pos], pch=16, col="red")
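As an alternative sketch (not the approach used above), the logical vector can be converted into one color per point with ifelse() and passed directly to the col argument of plot(). The name point_col and the choice of "grey" for the remaining points are assumptions made for this illustration.

```r
set.seed(123)
x <- rnorm(100)
y <- x + rnorm(100, mean = 0.3, sd = 0.4)

pos <- x > 0 & y > 0
# One color per point: red where pos is TRUE, grey elsewhere.
point_col <- ifelse(pos, "red", "grey")
plot(x, y, pch = 16, col = point_col)
grid()
```

This draws every point once, whereas points() draws the selected points a second time on top of the existing ones. Either approach works; overlaying is convenient when the base plot already exists.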

Exercise

Complete the code to display in red those points for which the x-values are greater than the y-values, and in blue those for which y is greater than x.

set.seed(123)
x <- rnorm(100)
y <- x + rnorm(100, mean=0.1, sd=0.3)
plot(x,y)
grid()

set.seed(123)
x <- rnorm(100)
y <- x + rnorm(100, mean=0.1, sd=0.3)
plot(x,y)
grid()
abline(v = 0, col="black")
abline(h= 0, col="black")
abline(a= 0, b=1, col="black")
points(x[x > y], y[x > y], pch=16, col="red")
points(x[x < y], y[x < y], pch=16, col="blue")

Complete the code to display in red all points within a circle centered at (0,0) with radius 1. Use Pythagoras’ theorem: points should satisfy sqrt(x^2 + y^2) <= 1 (sqrt() is the square root).

To raise to a power, use the ^ operator. For example, write x ^ 2.

set.seed(123)
x <- rnorm(100)
y <- rnorm(100)
plot(x,y)

set.seed(123)
x <- rnorm(100)
y <- rnorm(100)
plot(x,y)
grid()
abline(v = 0, col="black")
abline(h= 0, col="black")
abline(a= 0, b=1, col="black")
pos <- sqrt(x ^ 2 + y ^ 2) <= 1
points(x[pos], y[pos], col="red", pch=16)

Indexing vectors by position names

Principle and examples

Vector positions can be named and extracted by passing their names as a vector to the indexing operator []. For example, a vector might store gene expression levels, with positions named after gene symbols.

gene_expression <- c(2, 8, 5.6, 10, 2.7)
names(gene_expression) <- c("CD3E", "CD4", "ZAP70", "PCNA", "BUB3")
gene_expression

We can then look up expression values in the vector by passing the gene names as a character vector.

gene_expression[c("ZAP70", "CD3E", "CD4")]
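One case worth knowing is what happens when a requested name is absent. The sketch below rebuilds the same vector and asks for "TP53", a name chosen here because it is deliberately not in the vector.

```r
gene_expression <- c(CD3E = 2, CD4 = 8, ZAP70 = 5.6, PCNA = 10, BUB3 = 2.7)
wanted <- c("CD4", "TP53")   # "TP53" is deliberately absent from the vector

with_missing <- gene_expression[wanted]   # CD4 is found; the missing name yields NA

# Keep only the names that actually exist in the vector.
found <- wanted[wanted %in% names(gene_expression)]
gene_expression[found]                    # only CD4
```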

Using regular expressions for indexing

Consider the same vector.

gene_expression

The grep() function searches for genes matching a regular expression, which defines patterns in character strings using a set of operators.

Operator	Meaning
.	Any character (except \n).
[ABc]	A choice. Here A, B or c.
[a-z]	A lower-case letter (or any interval, e.g. [u-w]).
[A-Z]	An upper-case letter (or any interval, e.g. [A-W]).
[^ABab]	A negation. Here any character but A, B, a or b.
x*	Zero or more repetitions of the previous character (here “x”).
x+	One or more repetitions of the previous character (here “x”).
x{n,m}	n to m repetitions of the previous character (here “x”).
(foo|bar)	One character string or another (foo or bar).
^	The beginning of the line.
$	The end of the line.
\n	A newline.
\t	A tab.
\	Escape character (e.g. \. means “a dot”, not “any character” as in the first row of the table).
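A few operators from the table can be tried on the gene names with grepl(), which returns TRUE or FALSE for each string. The patterns below are illustrative choices, not part of the original tutorial.

```r
genes <- c("CD3E", "CD4", "ZAP70", "PCNA", "BUB3")  # names from the example above

starts_cd   <- grepl("^CD", genes)       # starts with "CD":       TRUE TRUE FALSE FALSE FALSE
ends_digit  <- grepl("[0-9]$", genes)    # ends with a digit:      FALSE TRUE TRUE FALSE TRUE
cd_or_bub   <- grepl("(CD|BUB)", genes)  # contains "CD" or "BUB": TRUE TRUE FALSE FALSE TRUE
three_four  <- grepl("^.{3,4}$", genes)  # 3 or 4 characters long: TRUE TRUE FALSE TRUE TRUE
```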

The grep() function returns positions matching a regular expression, e.g., genes whose names start (^) with “CD”.

pos <- grep(pattern = "^CD", x = names(gene_expression), perl = TRUE)
pos

These positions can then be used to extract the corresponding values.

gene_expression[pos]
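Two related variants may be useful here, sketched on the same vector: the value = TRUE argument of grep() returns the matching names instead of their positions, and grepl() returns a logical vector that can be negated. The variable names cd_names and is_cd are chosen for this illustration.

```r
gene_expression <- c(CD3E = 2, CD4 = 8, ZAP70 = 5.6, PCNA = 10, BUB3 = 2.7)

# value = TRUE returns the matching names instead of their positions.
cd_names <- grep("^CD", names(gene_expression), value = TRUE, perl = TRUE)
gene_expression[cd_names]

# grepl() returns a logical vector, usable for logical indexing.
is_cd <- grepl("^CD", names(gene_expression), perl = TRUE)
gene_expression[is_cd]    # same elements as above
gene_expression[!is_cd]   # all genes whose name does not start with "CD"
```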

NB: By default, grep() does not support certain regular expression operators. Setting perl = TRUE, as in the call above, switches to Perl-compatible regular expressions, which support a wider set of operators. When in doubt, it is preferable to set this argument to TRUE.
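As a hedged illustration of the difference, the sketch below uses a lookahead, (?=...), which is a Perl-style construct not listed in the table above and, as far as we know, not accepted by the default engine. The gene names and the pattern are assumptions made for this example.

```r
genes <- c("CD3E", "CD4", "CDK11B", "CD52")

# Lookahead (?=...): "CD" must be immediately followed by a digit.
pattern <- "^CD(?=[0-9])"

perl_pos <- grep(pattern, genes, perl = TRUE)   # 1 2 4

# Without perl = TRUE the lookahead is expected to be rejected;
# tryCatch() turns the error (if any) into NULL so the script keeps running.
default_pos <- tryCatch(grep(pattern, genes), error = function(e) NULL)
```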

Exercises

In the example below, extract the expression values for genes whose name starts with ‘CD’ and ends with a digit (store the results in a variable named g_exp_sub). Use the table above to construct the regular expression.

gene_expression <- c(2, 8, 5.6, 10, 2.7, 4, 9, 12)
names(gene_expression) <- c("CD3E", "CD4", "ZAP70", "PCNA", "BUB3", "CDC42BPA", "CDK11B", "CD52")

gene_expression <- c(2, 8, 5.6, 10, 2.7, 4, 9, 12)
names(gene_expression) <- c("CD3E", "CD4", "ZAP70", "PCNA", "BUB3", "CDC42BPA", "CDK11B", "CD52")
pos <- grep(pattern = "^CD.*[0-9]$", x = names(gene_expression), perl = TRUE)
g_exp_sub <- gene_expression[pos]

In the example below, extract the expression values for genes whose name starts with ‘CD’ and ends with an alphanumeric character (store the results in a variable named g_exp_sub). Use the table above to construct the regular expression.

gene_expression <- c(2, 8, 5.6, 10, 2.7, 4, 9, 12)
names(gene_expression) <- c("CD3E", "CD4", "ZAP70", "PCNA", "BUB3", "CDC42BPA", "CDK11B", "CD52")

gene_expression <- c(2, 8, 5.6, 10, 2.7, 4, 9, 12)
names(gene_expression) <- c("CD3E", "CD4", "ZAP70", "PCNA", "BUB3", "CDC42BPA", "CDK11B", "CD52")
pos <- grep(pattern = "^CD.*[0-9A-Za-z]$", x = names(gene_expression), perl = TRUE)
g_exp_sub <- gene_expression[pos]

End of the section

Thank you for following this tutorial.