Indexing
Principle
Indexing retrieves specific elements from an object using the bracket operator [].
# target_positions are the positions we wish to extract from the vector 'my_vector'.
my_vector[target_positions]
Here, target_positions can be:
- A vector of indices (positions from 1 to the length of the vector).
- A logical vector (of the same length as the vector).
- Named positions in the vector.
We’ll look at these different cases below.
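As a quick preview, the three kinds of index can be sketched on a small vector. The vector `scores` and its names are hypothetical and serve only as an illustration; all three calls are expected to select the same two elements.

```r
# Hypothetical named vector, used only for this illustration
scores <- c(a = 10, b = 20, c = 30, d = 40)

# 1. A vector of positions
by_position <- scores[c(1, 3)]

# 2. A logical vector of the same length as 'scores'
by_logical <- scores[c(TRUE, FALSE, TRUE, FALSE)]

# 3. A vector of names
by_name <- scores[c("a", "c")]

# Each result holds the elements named "a" and "c" (values 10 and 30)
by_position
```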
Indexing vectors by position
Principle
The indexing argument may be a set of numerical positions.
Note: R uses one-based indexing, where the first position is 1 (unlike Python’s zero-based indexing, starting at 0).
set.seed(123)
x <- sort(round(rnorm(10), digits = 2), decreasing = TRUE)
x
Retrieving positions 1 to 5 can thus be written:
x[1:5]
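A few more positional requests, sketched on a small hypothetical vector `v` (not part of the tutorial data), show how one-based positions behave, including a position beyond the end of the vector.

```r
# Hypothetical vector, used only for this illustration
v <- c(5.2, 3.1, 4.8, 1.0)

v[1]          # first element: 5.2
v[length(v)]  # last element: 1
v[2:3]        # a range of positions: 3.1 4.8
v[c(4, 1)]    # the result follows the order of the index: 1 5.2
v[10]         # a position past the end returns NA
```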
It’s also possible to request all but certain positions by using negative indices (-).
print(x[-c(1,3)])
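Negative indexing can be sketched on a small hypothetical vector `v`, where the excluded positions are easy to check by eye.

```r
# Hypothetical vector, used only for this illustration
v <- c(10, 20, 30, 40, 50)

v[-c(1, 3)]    # drop positions 1 and 3: 20 40 50
v[-(1:2)]      # drop the first two positions: 30 40 50
v[-length(v)]  # drop the last position: 10 20 30 40
```

Note the parentheses in `-(1:2)`: without them, `-1:2` expands to -1, 0, 1, 2, and R refuses to mix negative and positive positions in the same index.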
Exercise
- Given the vector below, use the indexing operator to:
- Store position 6 of x in variable a.
- Store positions 5, 18 and 27 in variable b.
- After setting the random seed to 123, randomly draw 10 positions of x without replacement and store the corresponding values in variable d.
set.seed(123)
x <- sort(sample(1:1000, size=100))
a <- x[6]
b <- x[c(5, 18, 27)]
set.seed(123)
rnd_pos <- sample(seq_along(x), size = 10, replace = FALSE)
d <- x[rnd_pos]
Logical indexing of vectors
Principle
The indexing operator can also take a logical vector of the same length as the vector being indexed. Only the positions where this logical vector is TRUE are returned.
print(x)
x > 0
x[x > 0]
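The same idea can be followed step by step on a small hypothetical vector `v`: the comparison first produces a logical vector, which is then used as the index. `sum()` and `which()` are added here as convenient ways to count and locate the TRUE positions.

```r
# Hypothetical vector, used only for this illustration
v <- c(-2, 5, 0, 3, -1)

# The comparison is evaluated element by element
keep <- v > 0
keep         # FALSE TRUE FALSE TRUE FALSE

v[keep]      # values at the TRUE positions: 5 3
sum(keep)    # number of TRUE positions: 2
which(keep)  # the TRUE positions themselves: 2 4
```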
The logical operators & (and) and | (or) combine tests element by element. This makes it possible to select positions of one vector according to conditions on another vector, provided both have the same length.
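A minimal sketch of these two operators, using two hypothetical vectors `a` and `b` of the same length:

```r
# Hypothetical vectors of equal length, used only for this illustration
a <- c(1, -2, 3, -4)
b <- c(-1, -2, 3, 4)

# TRUE where both conditions hold
both <- a > 0 & b > 0
both       # FALSE FALSE TRUE FALSE

# TRUE where at least one condition holds
either <- a > 0 | b > 0
either     # TRUE FALSE TRUE TRUE

a[both]    # 3
a[either]  # 1 3 -4
```

The doubled forms `&&` and `||` compare single values rather than whole vectors, so they are not suited to building an index.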
Example
Imagine, for example, the x and y coordinates of 2D points.
set.seed(123)
# We create normally distributed values
# on the x-axis.
x <- rnorm(100)
# Add a little noise to x
# to create y.
y <- x + rnorm(100, mean=0.3, sd=0.4)
The result can be visualized with the plot() function.
# plot() creates a scatterplot
plot(x, y)
# Add a vertical/horizontal grid
grid()
# Add a vertical (argument v) line
abline(v = 0, col = "black")
# Add a horizontal (argument h) line
abline(h = 0, col = "black")
# Add a diagonal line
# with equation y = x (intercept a = 0, slope b = 1)
abline(a = 0, b = 1, col = "black")
Indexing allows you to highlight specific points, for example by coloring the points whose x and y values are both positive. Use the points() function to overlay these points on an existing plot.
plot(x, y)
grid()
abline(v = 0, col="black")
abline(h= 0, col="black")
abline(a= 0, b=1, col="black")
# Which positions are positive in both x and y?
pos <- x > 0 & y > 0
# points() overlays points
# on an existing plot.
# pch = point type
# col = color
points(x[pos], y[pos], pch=16, col="red")
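As an alternative to overlaying points, the same logical vector can be turned into a vector of colors, one per point, and passed directly to plot(). This is only a sketch of another possible approach, not the method used in this tutorial; the name `point_col` and the color "grey60" are arbitrary choices.

```r
set.seed(123)
x <- rnorm(100)
y <- x + rnorm(100, mean = 0.3, sd = 0.4)

# TRUE where both coordinates are positive
pos <- x > 0 & y > 0

# ifelse() is vectorized: it returns one color per point
# ('point_col' is a hypothetical name chosen for this sketch)
point_col <- ifelse(pos, "red", "grey60")

# col accepts a vector with one color per point
plot(x, y, pch = 16, col = point_col)
grid()
```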
Exercise
Complete the code to display in red those points for which the x-values are greater than the y-values, and in blue those for which y is greater than x.
set.seed(123)
x <- rnorm(100)
y <- x + rnorm(100, mean=0.1, sd=0.3)
plot(x,y)
grid()
abline(v = 0, col="black")
abline(h= 0, col="black")
abline(a= 0, b=1, col="black")
points(x[x > y], y[x > y], pch=16, col="red")
points(x[x < y], y[x < y], pch=16, col="blue")
Complete the code to display in red all points within a circle centered at (0,0) with radius 1. Use Pythagoras’ theorem: points should satisfy sqrt(x^2 + y^2) <= 1 (sqrt() is the square root).
To raise to a power, use the ^ operator. For example, write x ^ 2.
set.seed(123)
x <- rnorm(100)
y <- rnorm(100)
plot(x,y)
grid()
abline(v = 0, col="black")
abline(h= 0, col="black")
abline(a= 0, b=1, col="black")
pos <- sqrt(x ^ 2 + y ^ 2) <= 1
points(x[pos], y[pos], col="red", pch=16)
Indexing vectors by position names
Principle and examples
Vector positions can be named and extracted by passing their names as a vector to the indexing operator []. For example, a vector might store gene expression levels, with positions named after gene symbols.
gene_expression <- c(2, 8, 5.6, 10, 2.7)
names(gene_expression) <- c("CD3E", "CD4", "ZAP70", "PCNA", "BUB3")
gene_expression
We can then look up expression values by passing the gene names to the indexing operator as a character vector.
gene_expression[c("ZAP70", "CD3E", "CD4")]
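One case worth knowing is a name that is absent from the vector. The sketch below uses "TP53" as a hypothetical query that is not among the names: indexing by name then returns NA for it, whereas a logical index built with %in% simply ignores it.

```r
gene_expression <- c(CD3E = 2, CD4 = 8, ZAP70 = 5.6, PCNA = 10, BUB3 = 2.7)

# "TP53" is a hypothetical query that is absent from the vector
wanted <- c("CD4", "TP53")

# Indexing by name returns NA for the missing name
res <- gene_expression[wanted]
res

# A logical index built with %in% keeps only the names that exist
present <- names(gene_expression) %in% wanted
gene_expression[present]
```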
Using regular expressions for indexing
Consider the same vector.
gene_expression
The grep() function searches for genes matching a regular expression, which defines patterns in character strings using a set of operators.
Operator | Meaning |
---|---|
. | Any character (except \n). |
[ABc] | A choice of characters. Here A, B or c. |
[a-z] | A lower-case letter (or any range, e.g. [u-w]). |
[A-Z] | An upper-case letter (or any range, e.g. [A-W]). |
[^ABab] | A negation. Here any character except A, B, a or b. |
x* | Zero or more repetitions of the preceding character (here “x”). |
x+ | One or more repetitions of the preceding character (here “x”). |
x{n,m} | Between n and m repetitions of the preceding character (here “x”). |
(foo|bar) | One character string or another (here foo or bar). |
^ | The beginning of the line. |
$ | The end of the line. |
\n | A newline. |
\t | A tab character. |
\ | Escape character (e.g. \. means “a dot”, not “any character” as in the first row of the table). |
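Some of these operators can be tried out with grepl(), which returns TRUE or FALSE for each string tested. The strings in `genes` are hypothetical and chosen only to make the matches easy to check. Note that inside an R string the backslash itself must be doubled, so the pattern \. is written "\\.".

```r
# Hypothetical strings, used only for this illustration
genes <- c("CD4", "CDK1", "ACD4", "A.B", "AxB")

# Starts with "CD" followed by a digit
grepl("^CD[0-9]", genes, perl = TRUE)  # TRUE FALSE FALSE FALSE FALSE

# Ends with a digit
grepl("[0-9]$", genes, perl = TRUE)    # TRUE TRUE TRUE FALSE FALSE

# Unescaped dot: "A", any character, "B"
grepl("A.B", genes, perl = TRUE)       # FALSE FALSE FALSE TRUE TRUE

# Escaped dot: a literal "."
grepl("A\\.B", genes, perl = TRUE)     # FALSE FALSE FALSE TRUE FALSE

# Alternation: starts with "CD" or "AC"
grepl("^(CD|AC)", genes, perl = TRUE)  # TRUE TRUE TRUE FALSE FALSE
```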
The grep() function returns positions matching a regular expression, e.g., genes whose names start (^) with “CD”.
pos <- grep(pattern = "^CD", x = names(gene_expression), perl = TRUE)
pos
These positions can then be used to extract the corresponding values:
gene_expression[pos]
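Two close relatives of this call may also be useful: grep() with value = TRUE returns the matching strings instead of their positions, and grepl() returns a logical vector that can serve directly as an index. A minimal sketch on the same gene names (the variable names `gene_names` and `is_cd` are chosen for this example):

```r
gene_expression <- c(CD3E = 2, CD4 = 8, ZAP70 = 5.6, PCNA = 10, BUB3 = 2.7)
gene_names <- names(gene_expression)

# Positions of the matching names
pos <- grep("^CD", gene_names, perl = TRUE)
pos      # 1 2

# The matching names themselves
matched <- grep("^CD", gene_names, perl = TRUE, value = TRUE)
matched  # "CD3E" "CD4"

# A logical vector, usable as a logical index
is_cd <- grepl("^CD", gene_names, perl = TRUE)
gene_expression[is_cd]
```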
NB: By default, grep() uses extended regular expressions, which do not include certain operators. Setting perl = TRUE, as in the call above, switches to Perl-compatible regular expressions, which support additional operators. When in doubt, it is preferable to set this argument to TRUE.
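One operator that depends on perl = TRUE is the negative lookahead (?!...), which is not listed in the table above and is used here only as an illustration. The sketch below looks for names that start with "CD" and are not immediately followed by "4". The second call omits perl = TRUE and is wrapped in tryCatch() because it is expected to fail with an "invalid regular expression" error; the exact message may vary between R versions.

```r
gene_names <- c("CD3E", "CD4", "ZAP70", "PCNA", "BUB3")

# Negative lookahead: "CD" at the start, not followed by "4"
pos_perl <- grep("^CD(?!4)", gene_names, perl = TRUE)
gene_names[pos_perl]  # "CD3E"

# Same pattern without perl = TRUE: the lookahead is not understood,
# so the error message is captured instead of stopping the script
default_result <- tryCatch(
  grep("^CD(?!4)", gene_names),
  error = function(e) conditionMessage(e)
)
default_result
```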
Exercises
In the example below, extract the expression values for genes whose name starts with ‘CD’ and ends with a digit (store the results in a variable named g_exp_sub). Use the table above to construct the regular expression.
gene_expression <- c(2, 8, 5.6, 10, 2.7, 4, 9, 12)
names(gene_expression) <- c("CD3E", "CD4", "ZAP70", "PCNA", "BUB3", "CDC42BPA", "CDK11B", "CD52")
pos <- grep(pattern = "^CD.*[0-9]$", x = names(gene_expression), perl = TRUE)
g_exp_sub <- gene_expression[pos]
In the example below, extract the expression values for genes whose name starts with ‘CD’ and ends with an alphanumeric character (store the results in a variable named g_exp_sub). Use the table above to construct the regular expression.
gene_expression <- c(2, 8, 5.6, 10, 2.7, 4, 9, 12)
names(gene_expression) <- c("CD3E", "CD4", "ZAP70", "PCNA", "BUB3", "CDC42BPA", "CDK11B", "CD52")
pos <- grep(pattern = "^CD.*[0-9A-Za-z]$", x = names(gene_expression), perl = TRUE)
g_exp_sub <- gene_expression[pos]
End of the section
Thank you for following this tutorial.