Facets
Principle
The ggplot2 library offers an extremely powerful tool for dividing a
plot into panels (facets) based on the levels of specified categorical
variables. Facets allow for data exploration based on a factor or a
given group of factors. For the following example, we will load a
table containing the results of a fictitious ELISA test, where
measurements were taken on two different days in experiments
conducted by four different operators.
library(ggplot2)  # needed by every plot in this section
url <- "https://zenodo.org/record/8210893/files/elisa_artificial.txt"
elisa <- read.table(url, sep="\t", header=TRUE, row.names=1)
head(elisa)
This is an artificial dataset from an ELISA test using 96-well plates. Eight ELISA plates (12 columns / 8 rows) were used, as can be verified here.
table(elisa$rows, elisa$columns)
These eight plates were produced by 4 experimenters on two different days.
table(elisa$user, elisa$day)
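The two table() checks above can be reproduced without downloading anything. The sketch below builds a hypothetical stand-in for the ELISA table with expand.grid(); the user labels are invented for illustration, and only the row names and day names come from this tutorial.

```r
# Hypothetical stand-in for the ELISA table: one row per well.
# The user labels are invented for illustration.
design <- expand.grid(
  rows    = c("cont", letters[1:7]),   # 8 plate rows
  columns = 1:12,                      # 12 plate columns
  user    = paste0("user", 1:4),       # 4 experimenters (assumed labels)
  day     = c("Monday", "Friday")      # 2 days
)

# Each well position occurs once per plate, so every cell of this
# table equals the number of plates.
wells_per_position <- table(design$rows, design$columns)
print(unique(as.vector(wells_per_position)))

# Each user/day combination corresponds to one full plate.
wells_per_plate <- table(design$user, design$day)
print(wells_per_plate)
```

A cell that deviates from the others in either table would point to a missing or duplicated well.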
The facet_wrap() Function
With the ggplot2 syntax, it becomes very easy to produce histograms
of the staining intensity obtained in each well (value) for a given
experimenter (user). In the example below, note the use of the
facet_wrap() function:
- This function creates a one-dimensional sequence of facets, which
can optionally be wrapped across multiple rows or columns using the
nrow and ncol arguments.
- The facets argument passed to facet_wrap() is typically given as a
one-sided formula (formula). In our example, facets = ~ user
translates to ‘create graphical panels based on the value of the user
variable.’
p <- ggplot(data = elisa,
mapping = aes(x=value))
p + geom_histogram() +
facet_wrap(facets = ~ user, ncol=2)
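To see what nrow and ncol change, here is a minimal sketch on synthetic data (the toy data frame and its user labels are assumptions, not the ELISA values). The same four panels are laid out once as a single row and once as a 2 x 2 arrangement.

```r
library(ggplot2)

# Synthetic measurements; the user labels are invented for illustration.
set.seed(1)
toy <- data.frame(
  user  = rep(paste0("user", 1:4), each = 50),
  value = rnorm(200, mean = rep(1:4, each = 50))
)

base <- ggplot(toy, aes(x = value)) + geom_histogram(bins = 20)

# Same four panels, wrapped into two different layouts.
p_one_row <- base + facet_wrap(facets = ~ user, nrow = 1)   # 1 x 4
p_two_col <- base + facet_wrap(facets = ~ user, ncol = 2)   # 2 x 2

print(p_one_row)
print(p_two_col)

# One panel is drawn per level of `user`.
n_panels <- length(unique(layer_data(p_two_col)$PANEL))
print(n_panels)
```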
For exploratory purposes, we can similarly analyze the distributions
of the values obtained based on the operator (user) and the day (day).
p <- ggplot(data = elisa,
mapping = aes(x=value))
p + geom_histogram() +
facet_wrap(facets = ~ user + day, ncol=2)
- Display the density distributions of value in facets based on the
experimenter and the day.
p <- ggplot(data = elisa,
mapping = aes(x=value, fill=user))
p + geom_density(color=NA) +
facet_wrap(facets = ~ user + day, ncol=2)
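When two variables are combined in facet_wrap(), one panel is produced per observed combination. The sketch below uses synthetic data (invented user labels) and adds labeller = label_both, an optional ggplot2 labeller that prints the variable name next to each value in the strip, which can help when several variables share a strip.

```r
library(ggplot2)

# Synthetic data: 4 users x 2 days x 30 replicates (labels are assumptions).
set.seed(2)
toy <- expand.grid(
  user      = paste0("user", 1:4),
  day       = c("Monday", "Friday"),
  replicate = 1:30
)
toy$value <- rnorm(nrow(toy))

p_both <- ggplot(toy, aes(x = value, fill = user)) +
  geom_density(color = NA) +
  facet_wrap(facets = ~ user + day, ncol = 2, labeller = label_both)
print(p_both)

# 4 users x 2 days give 8 panels.
n_panels <- length(unique(layer_data(p_both)$PANEL))
print(n_panels)
```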
The facet_grid() Function
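Since each user performed an ELISA experiment on Monday and Friday,
we can choose a two-dimensional faceted representation with
facet_grid() (a grid/matrix of facets). Note that the facets argument
is set to user ~ day, indicating that user will be displayed in rows
and day in columns. In recent ggplot2 versions, facets is deprecated
in facet_grid() in favor of rows and cols, so facet_grid(rows = user ~ day)
or simply facet_grid(user ~ day) is the preferred spelling.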
p <- ggplot(data = elisa,
mapping = aes(x=value, fill=user))
p + geom_histogram() +
facet_grid(facets = user ~ day) +
scale_fill_manual(values=c("#1B9E77", "#D95F02", "#7570B3", "#E7298A"))
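As a sketch of the row/column logic, the example below draws the same grid twice on synthetic data (invented user labels): once with a two-sided formula and once with the rows and cols arguments combined with vars(). Both spellings are expected to yield the same 4 x 2 grid.

```r
library(ggplot2)

# Synthetic data: 4 users x 2 days x 30 replicates (labels are assumptions).
set.seed(3)
toy <- expand.grid(
  user      = paste0("user", 1:4),
  day       = c("Monday", "Friday"),
  replicate = 1:30
)
toy$value <- rnorm(nrow(toy))

base <- ggplot(toy, aes(x = value, fill = user)) +
  geom_histogram(bins = 15)

# Formula spelling: left-hand side in rows, right-hand side in columns.
p_formula <- base + facet_grid(user ~ day)

# Explicit spelling with vars().
p_vars <- base + facet_grid(rows = vars(user), cols = vars(day))

print(p_formula)
print(p_vars)

n_formula <- length(unique(layer_data(p_formula)$PANEL))
n_vars    <- length(unique(layer_data(p_vars)$PANEL))
```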
- Display horizontal boxplots (using coord_flip()) corresponding to
the distributions of the variable value for each user, creating a
facet based on the day.
p <- ggplot(data = elisa,
mapping = aes(x=user, y=value, fill=user))
p + geom_boxplot() + coord_flip() +
facet_wrap(facets = ~ day, ncol=2)
Application Example: Heatmap
Based on the numerical data loaded into R, we may want to create a color-coded image (heatmap) of the ELISA plates produced by different users.
- The coordinates x (elisa$columns) and y (elisa$rows) of the wells
in the plate are available.
- The fill aesthetic will be mapped to value.
We can use geom_raster() to represent an ELISA plate and partition
the plot based on user and day.
p <- ggplot(data = elisa,
mapping = aes(x=columns, y=rows, fill=value))
p + geom_raster() +
facet_grid(facets = user ~ day)
For geom_raster(), which represents the continuous numeric variable
value, one of the following functions can be used to control the
colors:
- scale_fill_gradient(): specifies a gradual fill between a single start color (low) and a single end color (high).
- scale_fill_gradient2(): similar to scale_fill_gradient(), but it takes an additional color assigned to a midpoint value of the scale, creating a diverging gradient (low, mid, high).
- scale_fill_gradientn(): specifies a fill gradient passing through any number of custom colors, which you choose based on your data and preferences.
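The first two functions can be sketched on a single synthetic plate (random values between 0 and 2; the colors and the midpoint of 1 are arbitrary choices for illustration).

```r
library(ggplot2)

# One synthetic 8 x 12 plate with random intensities (not real ELISA data).
set.seed(4)
plate <- expand.grid(
  rows    = c("cont", letters[1:7]),
  columns = 1:12
)
plate$value <- runif(nrow(plate), min = 0, max = 2)

base <- ggplot(plate, aes(x = columns, y = rows, fill = value)) +
  geom_raster()

# Sequential gradient: one start color, one end color.
p_two <- base + scale_fill_gradient(low = "white", high = "darkred")

# Diverging gradient: a third color is placed at the chosen midpoint.
p_div <- base + scale_fill_gradient2(low = "blue", mid = "white",
                                     high = "red", midpoint = 1)

print(p_two)
print(p_div)
```

A diverging scale is mostly useful when the midpoint has a meaning (for instance a control level); otherwise a sequential gradient is usually easier to read.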
p <- ggplot(data = elisa,
mapping = aes(x=columns, y=rows, fill=value))
p + geom_raster() +
facet_grid(facets = user ~ day) +
scale_fill_gradientn(colours = c("#0000BF", "#0000FF",
"#0080FF", "#00FFFF",
"#40FFBF", "#80FF80",
"#BFFF40", "#FFFF00",
"#FF8000", "#FF0000",
"#BF0000"))
Ordering Rows/Columns
You may have noticed that the rows are not ideally ordered. We would prefer the order: ‘cont’, ‘a’, ‘b’, ‘c’, …
In ggplot2, a discrete axis follows the order of the factor levels,
so to order the categories you need to define that order explicitly.
This can be done, as we saw earlier, with the levels argument of the
factor() function, together with ordered=TRUE to create an ordinal
variable.
- Modify the following code so that the rows are ordered.
___
p <- ggplot(data = elisa,
mapping = aes(x=columns, y=rows, fill=value))
p + geom_raster() +
facet_grid(facets = user ~ day) +
scale_fill_gradientn(colours = c("#0000BF", "#0000FF",
"#0080FF", "#00FFFF",
"#40FFBF", "#80FF80",
"#BFFF40", "#FFFF00",
"#FF8000", "#FF0000",
"#BF0000"))
# Define the row order explicitly: 'cont' first, then 'a' to 'g'
elisa$rows <- factor(x = elisa$rows, ordered = TRUE, levels=c('cont', letters[1:7]))
p <- ggplot(data = elisa,
mapping = aes(x=columns,
y=rows,
fill=value))
p <- p + geom_raster() +
facet_grid(facets = user ~ day) +
scale_fill_gradientn(colours = c("#0000BF", "#0000FF",
"#0080FF", "#00FFFF",
"#40FFBF", "#80FF80",
"#BFFF40", "#FFFF00",
"#FF8000", "#FF0000",
"#BF0000"))
print(p)
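The effect of the levels argument can be checked on a small vector before touching the plot. This is a minimal sketch with four invented row labels.

```r
# A few row labels in arbitrary order (toy vector, not the ELISA table).
rows <- c("b", "cont", "a", "c")

# Without `levels`, factor() sorts the labels alphabetically.
default_levels <- levels(factor(rows))
print(default_levels)

# With `levels`, the order is the one we ask for.
custom <- factor(rows, levels = c("cont", "a", "b", "c"), ordered = TRUE)
custom_levels <- levels(custom)
print(custom_levels)

# Sorting follows the level order, as a discrete axis in ggplot2 does.
print(sort(custom))
```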
Predefined Graphic Themes
Introduction to Themes
There are many ways to adjust the overall visual appearance of a plot. As a first step, you can apply a predefined theme, which affects various parameters of the plot (fonts, character sizes, axis styles, background color, contrast, etc.). ggplot2 includes around ten built-in themes. The names of these global configuration functions usually start with ‘theme_’.
apropos("^theme_")
[1] "theme_bw" "theme_classic" "theme_dark" "theme_get"
[5] "theme_gray" "theme_grey" "theme_light" "theme_linedraw"
[9] "theme_minimal" "theme_replace" "theme_set" "theme_test"
[13] "theme_update" "theme_void"
For example:
- theme_gray(): The signature theme of ggplot2 with a gray background and white grid lines, designed to highlight the data while facilitating comparisons.
- theme_bw(): The classic dark-on-light ggplot2 theme, with a white background and thin gray grid lines. It may be better suited for presentations displayed using a projector.
- theme_linedraw(): A theme with only black lines of varying widths on a white background, reminiscent of a line drawing. The goal is similar to that of theme_bw().
- theme_void(): A completely empty theme.
- theme_minimal(): A very uncluttered, minimalist theme.
- theme_dark(): Its name speaks for itself…
- theme_classic(), theme_test(), theme_light()…
Other predefined themes are available in the ggthemes library.
- For example, theme_excel(), for those nostalgic for the Microsoft spreadsheet… The help page says: "Theme to replicate the ugly monstrosity that was the old gray-background Excel chart. Please never use this." :) Note that you can also enjoy the hideous Excel palette (scale_colour_excel()). A must-have… :)
- Or theme_wsj(), to imitate a Wall Street Journal chart…
Exercise
- Try adding each of the following themes in turn to the plot p: theme_bw(), theme_classic(), theme_dark(), theme_gray(), theme_grey(), theme_light(), theme_minimal(), theme_void(), ggthemes::theme_wsj(), ggthemes::theme_excel(), ggthemes::theme_excel_new(), ggthemes::theme_economist()…
p <- p + theme_bw()
print(p)
p <- p + theme_bw()
print(p)
p <- p + theme_classic()
print(p)
p <- p + theme_dark()
print(p)
p <- p + theme_light()
print(p)
p <- p + theme_minimal()
print(p)
p <- p + ggthemes::theme_wsj()
print(p)
p <- p + ggthemes::theme_excel()
print(p)
p <- p + ggthemes::theme_excel_new()
print(p)
p <- p + ggthemes::theme_economist()
print(p)
#...
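Because a complete theme replaces all previous theme settings, adding themes one after another to the same object works as shown above. An alternative, sketched here on a toy scatter plot (the data and the selection of themes are arbitrary), is to keep the base plot unchanged and build one variant per theme, which makes side-by-side comparison easier.

```r
library(ggplot2)

# Toy plot used only to compare themes.
toy  <- data.frame(x = 1:10, y = (1:10)^2)
base <- ggplot(toy, aes(x = x, y = y)) + geom_point()

# A named list of built-in complete themes.
themes <- list(
  bw      = theme_bw(),
  classic = theme_classic(),
  dark    = theme_dark(),
  minimal = theme_minimal(),
  void    = theme_void()
)

# Add each theme to the unchanged base plot.
plots <- lapply(themes, function(th) base + th)

# Display each variant, using the theme name as title.
for (name in names(plots)) {
  print(plots[[name]] + ggtitle(name))
}
```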
Fine-Tuning Graphs
The theme() and element_*() Functions
Beyond applying predefined themes (like theme_minimal() or
theme_bw()), you can customize every aspect of a graph according to
your needs.
- The theme() function offers maximum flexibility for customizing the appearance of your graphs. For example, you can specify the font, font size, and text color for axis titles and legends, or change the background of the graph to fit a dark or light theme.
- The element_*() functions (notably element_text(), element_line(), element_rect(), element_blank()…) are used in combination with theme() to control specific elements of the graph.
- The element_text() function is used to define the font, font size, and color of a text element.
- The element_line() function customizes graph lines, such as line thickness or line type.
- The element_rect() function customizes box/rectangle-type elements.
- You can use element_blank() to completely remove certain elements of the graph if needed.
- Place your cursor between the parentheses and press the Tab key on your keyboard to view all the arguments of theme(). This reveals all the modifiable elements of the graph (and there are many…).
theme()
# rect
# text
# title
# aspect.ratio
# axis.title
# axis.title.x
# axis.title.x.top
# axis.title.x.bottom
# axis.title.y
# axis.title.y.left
# axis.title.y.right
# axis.text
# axis.text.x
# axis.text.x.top
# axis.text.x.bottom
# axis.text.y
# axis.text.y.left
# axis.text.y.right
# axis.ticks
# axis.ticks.x
# axis.ticks.x.top
# axis.ticks.x.bottom
# axis.ticks.y
# axis.ticks.y.left
# axis.ticks.y.right
# axis.ticks.length
# axis.ticks.length.x
# axis.ticks.length.x.top
# axis.ticks.length.x.bottom
# axis.ticks.length.y
# axis.ticks.length.y.left
# axis.ticks.length.y.right
# axis.line
# axis.line.x
# axis.line.x.top
# axis.line.x.bottom
# axis.line.y
# axis.line.y.left
# axis.line.y.right
# legend.background
# legend.margin
# legend.spacing
# legend.spacing.x
# legend.spacing.y
# legend.key
# legend.key.size
# legend.key.height
# legend.key.width
# legend.text
# legend.text.align
# legend.title
# legend.title.align
# legend.position
# legend.direction
# legend.justification
# legend.box
# legend.box.just
# legend.box.margin
# legend.box.background
# legend.box.spacing
# panel.background
# panel.border
# panel.spacing
# panel.spacing.x
# panel.spacing.y
# panel.grid
# panel.grid.major
# panel.grid.minor
# panel.grid.major.x
# panel.grid.major.y
# panel.grid.minor.x
# panel.grid.minor.y
# panel.ontop
# plot.background
# plot.title
# plot.title.position
# plot.subtitle
# plot.caption
# plot.caption.position
# plot.tag
# plot.tag.position
# plot.margin
# strip.background
# strip.background.x
# strip.background.y
# strip.clip
# strip.placement
# strip.text
# strip.text.x
# strip.text.x.bottom
# strip.text.x.top
# strip.text.y
# strip.text.y.left
# strip.text.y.right
# strip.switch.pad.grid
# strip.switch.pad.wrap
Examples
Below, we customize various elements of a graph (with varying levels
of aesthetic appeal…). You will notice that it is quite intuitive to
determine whether to use element_text(), element_line(), or
element_rect() depending on the context. Note that the argument names
are largely consistent across these three functions (color, and size
for text or linewidth for lines and rectangles…), which makes them
easy to use.
p <- p + theme_minimal()
p <- p + theme(strip.background = element_rect(color="red", fill="orange"),
               strip.text = element_text(color="white", face="bold"),
               axis.text.x = element_text(color="blue", size=7, angle=45, family = "Helvetica", face="bold"),
               axis.text.y = element_text(color="darkviolet", size=10, family = "Times", face="bold"),
               axis.ticks.x = element_line(color="brown", linewidth=1),
               axis.ticks.y = element_line(color="darkturquoise", linewidth=1),
               plot.background = element_rect(fill="paleturquoise"))
#...
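The names listed for theme() are hierarchical: axis.text.x, for instance, falls back on axis.text for anything it does not set itself. The sketch below illustrates this with calc_element(), a ggplot2 helper that resolves an element after inheritance; the blue, bold styling is an arbitrary choice.

```r
library(ggplot2)

# Start from a complete theme and override only the generic axis.text.
th <- theme_minimal() +
  theme(axis.text = element_text(color = "blue", face = "bold"))

# calc_element() returns the element as it will actually be drawn,
# once values inherited from parent elements have been filled in.
x_text <- calc_element("axis.text.x", th)
y_text <- calc_element("axis.text.y", th)

print(x_text$colour)
print(y_text$face)
```

In practice this means a setting can often be made once on the parent element, with the more specific names reserved for exceptions.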
Exercises
In the following plot:
- Change the font of the graph title (family="Times").
- Adjust the angle of the x-axis text (angle=45).
- Modify the background color (fill).
- Remove the secondary grid lines (using element_blank()).
- Add a border line to the boxes containing the legends.
p <- p + ggtitle("Flipper Lengths vs Bill Lengths") +
theme(plot.title = ___,
axis.text.x = ___,
plot.background = ___,
panel.grid.minor = ___,
legend.background = element_blank(),
legend.box.background = ___
)
print(p)
p <- p + ggtitle("Flipper Lengths vs Bill Lengths") +
theme(plot.title = element_text(family="Times"),
axis.text.x = element_text(angle=45),
plot.background = element_rect(fill="#EEDDAA"),
panel.grid.minor = element_blank(),
legend.background = element_blank(),
legend.box.background = element_rect(color = "black", linewidth=1)
)
print(p)
Exercises
The Dataset
- Here, our dataset contains several pieces of information related to nearly all known transcripts in the human genome (one per row). This data was produced in tsv format using the pygtftk software (v1.6.3) from a GTF file downloaded from Ensembl (genome version GRCh38, release 92).
Since the file is somewhat large, we will download it and place it in your user folder so that it does not need to be downloaded again later.
options(timeout=10000)
dir_path <- file.path(fs::path_home(), ".rtrainer")
dir.create(dir_path, showWarnings = FALSE)
## The URL pointing to the dataset
url <- "https://zenodo.org/record/8211383/files/Homo_sapiens.GRCh38.110.subset_2.tsv.gz"
# Download
file_path <- file.path(dir_path, "Homo_sapiens.GRCh38.110.subset_2.tsv.gz")
if(!file.exists(file_path)) download.file(url=url, destfile = file_path, quiet = TRUE)
We will load the file into R using the read.table() function. At the
same time, we will use the transcript_id column (the sixth column) as
row names (row.names=6).
tx_info <- read.table(file=file_path, header=TRUE, sep="\t", row.names=6)
dim(tx_info)
Here is our dataset:
head(tx_info)
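The behavior of a numeric row.names argument can be checked on a tiny table. In this sketch the tab-separated content is held in a string, and the column names and values are invented; only the read.table() arguments mirror the call above.

```r
# Tiny tab-separated table held in a string (invented names and values).
tsv <- "gene_name\ttranscript_id\tnb_exons\nGENE_A\tTX0001\t3\nGENE_B\tTX0002\t1\n"

# row.names = 2 turns the second column into row names
# and removes it from the regular columns.
toy <- read.table(text = tsv, header = TRUE, sep = "\t", row.names = 2)

print(rownames(toy))
print(colnames(toy))

# Rows can then be looked up by transcript identifier.
print(toy["TX0002", "nb_exons"])
```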
Number of Transcripts per Chromosome
- Create a diagram with geom_bar() showing the number of different
transcripts per chromosome (seqid).
- Use + coord_flip() to rotate the diagram.
- Order the chromosomes as follows: 1, 2, 3 .. 22, X, Y, MT.
___
p <- ggplot(data=tx_info, ___) +
___
tx_info$seqid <- factor(tx_info$seqid,
levels = c(as.character(1:22), "X", "Y", "MT"),
ordered = TRUE)
p <- ggplot(data=tx_info,
mapping=aes(x=seqid)) +
geom_bar() + coord_flip() +
theme_bw()
print(p)
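Explicit levels matter here because chromosome names are character strings: sorted as text, "10" comes before "2". A minimal sketch on an invented vector of chromosome names:

```r
# A handful of chromosome names in arbitrary order (toy vector).
chrom <- c("10", "2", "X", "1", "MT", "Y", "2")

# Default levels are sorted as text, so "10" precedes "2".
default_levels <- levels(factor(chrom))
print(default_levels)

# Same level definition as in the solution above.
wanted  <- c(as.character(1:22), "X", "Y", "MT")
chrom_f <- factor(chrom, levels = wanted, ordered = TRUE)

# Counts now follow the karyotype order; droplevels() hides the
# chromosomes absent from this toy vector.
counts <- table(droplevels(chrom_f))
print(counts)
```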
Transcript Sizes
- In the data.frame tx_info, create a new column tx_genomic_size_log10 containing the variable tx_genomic_size converted to base 10 logarithm (log10()).
- Using histograms and facets, explore the variable tx_genomic_size_log10 (transcript size including introns, in base 10 logarithm).
- Use geom_histogram() and facet_grid(gene_biotype~., scales="free_y"). The argument scales="free_y" allows each facet to have its own y-axis scale.
- Appropriately configure (theme()) the size and orientation of the textual elements.
# Base 10 logarithm of the genomic size (introns included)
tx_info$tx_genomic_size_log10 <- log10(tx_info$tx_genomic_size)
p <- ggplot(data=tx_info,
            mapping=aes(x=tx_genomic_size_log10)) +
  geom_histogram(bins=50) +
  facet_grid(gene_biotype~., scales="free_y") +
  labs(x="Genomic Size of Transcripts (log10)") +
  theme_minimal() +
  theme(panel.grid.minor = element_blank(),
        strip.text.y = element_text(angle=0, size=5),
        axis.text.y = element_text(size=5))
print(p)
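The reason for freeing the y axis shows up as soon as the groups have very different sizes. The sketch below uses two invented groups (5000 versus 50 log-normally distributed sizes, not the transcript data) and draws the same histograms with a shared and with a free y scale.

```r
library(ggplot2)

# Two synthetic groups of very different sizes (invented labels and values).
set.seed(5)
toy <- data.frame(
  biotype = rep(c("common", "rare"), times = c(5000, 50)),
  size    = c(rlnorm(5000, meanlog = 9, sdlog = 1.5),
              rlnorm(50,   meanlog = 7, sdlog = 1))
)
toy$size_log10 <- log10(toy$size)

base <- ggplot(toy, aes(x = size_log10)) + geom_histogram(bins = 30)

# Shared y scale: the small group is flattened against the axis.
p_fixed <- base + facet_grid(biotype ~ .)

# Free y scale: each facet uses its own count range.
p_free <- base + facet_grid(biotype ~ ., scales = "free_y")

print(p_fixed)
print(p_free)
```

With a free scale the shapes become comparable, but bar heights can no longer be compared across facets.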
Number of Exons
If processed pseudogenes no longer have introns, we should only find one exon…
Transform the nb_exons column into its base 10 logarithm and place
the result in the nb_exons_log10 column. What can you say about the
number of exons (nb_exons_log10) for transcripts based on
gene_biotype? Use a boxplot or violin plot to present this
information.
biotypes <- unique(tx_info$gene_biotype)
palette <- setNames(rainbow(length(biotypes)), biotypes)
tx_info$nb_exons_log10 <- log10(tx_info$nb_exons)
p <- ggplot(data=tx_info,
mapping=aes(x=gene_biotype,
y=nb_exons_log10,
fill=gene_biotype)) +
geom_boxplot() +
theme_minimal() +
labs(y="Number of exons (log10)") +
coord_flip() +
scale_fill_manual(values=palette) +
theme(legend.position = "none")
print(p)
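The violin-plot variant mentioned in the exercise can be sketched as follows on synthetic data (two invented categories, not the real gene_biotype values). On the log10 scale a single-exon transcript sits at 0, since log10(1) = 0.

```r
library(ggplot2)

# Synthetic exon counts for two invented categories.
set.seed(6)
toy <- data.frame(
  biotype  = rep(c("mostly_single_exon", "multi_exon"), each = 200),
  nb_exons = c(sample(1:2, 200, replace = TRUE, prob = c(0.9, 0.1)),
               rpois(200, lambda = 8) + 1)
)
toy$nb_exons_log10 <- log10(toy$nb_exons)

# A transcript with one exon maps to 0 on this scale.
single_exon_value <- log10(1)
print(single_exon_value)

p_violin <- ggplot(toy, aes(x = biotype, y = nb_exons_log10, fill = biotype)) +
  geom_violin() +
  coord_flip() +
  theme_minimal() +
  theme(legend.position = "none")
print(p_violin)
```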
Chromosomal Distribution of Gene Types
- Create a bar chart (geom_bar()) showing the number of transcripts for each gene_biotype class on each chromosome. Use geom_bar() with the position argument set to "stack", "dodge", or "fill". Depending on this argument, do you get the same impression about the distribution of gene biotypes across chromosomes? What are the advantages and disadvantages of each representation? What can you say about the number and types of genes present on the Y chromosome?
p <- ggplot(data=tx_info,
            mapping=aes(x=seqid,
                        fill=gene_biotype)) +
  geom_bar(position="fill") +
  theme_minimal() +
  labs(y="Proportion of transcripts",
       x="Chromosome") +
  coord_flip() +
  theme(legend.position = "bottom")
print(p)
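To compare the three position settings numerically, the sketch below uses a small invented table (one large and one small chromosome, two biotype labels chosen for illustration) and reads the bar heights back with layer_data().

```r
library(ggplot2)

# Invented counts: a large chromosome "1" and a small chromosome "Y".
toy <- data.frame(
  chrom   = rep(c("1", "Y"), times = c(100, 10)),
  biotype = c(rep(c("protein_coding", "lncRNA"), times = c(70, 30)),
              rep(c("protein_coding", "lncRNA"), times = c(2, 8)))
)

base <- ggplot(toy, aes(x = chrom, fill = biotype))

p_stack <- base + geom_bar(position = "stack")  # totals per chromosome
p_dodge <- base + geom_bar(position = "dodge")  # one bar per biotype
p_fill  <- base + geom_bar(position = "fill")   # proportions per chromosome

# Highest point reached by the bars in each representation.
top_stack <- max(layer_data(p_stack)$ymax)  # total of the largest chromosome
top_dodge <- max(layer_data(p_dodge)$ymax)  # largest single group
top_fill  <- max(layer_data(p_fill)$ymax)   # every bar is rescaled to 1

print(c(stack = top_stack, dodge = top_dodge, fill = top_fill))
```

With "fill", the small chromosome looks as prominent as the large one, so the composition is easy to compare but the absolute numbers are hidden; "stack" and "dodge" keep the counts but let small chromosomes almost disappear.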
End of the section
Thank you for following this tutorial.