Simple Bar Charts
The geom_bar() function in ggplot2 creates bar charts, which are
particularly suited to representing categorical data. By default it
counts the rows in each category of the variable mapped to the x
aesthetic in aes() and draws one vertical bar per category. In recent
versions of ggplot2, mapping the variable to y instead produces
horizontal bars.
In this example, we create a bar chart of the frequency of the
different diamond cuts (cut) in the diamonds dataset. The function
counts the number of occurrences of each cut: ‘Fair’, ‘Good’, ‘Very Good’,
etc. The position = "dodge" argument places bars side by side when
several groups share the same x value. Here each cut has a single bar,
so the argument does not change the plot, but it becomes useful once a
second variable is mapped to fill. Note below the use of labs(), which
controls the title and the axis labels.
## Load the ggplot2 library
library(ggplot2)
## Next, load the diamonds dataset (shipped with ggplot2)
data(diamonds)
p <- ggplot(data = diamonds, aes(x = cut, fill = cut)) +
  geom_bar(position = "dodge") +
  labs(title = "Number of Diamond Cuts",
       x = "Cut",
       y = "Count")
print(p)
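To make the counting step explicit, the sketch below tabulates cut by hand with base R's table() and draws the result with geom_col(), which plots the values it is given instead of counting rows. The names cut_counts and p_manual are illustrative and not part of the original example; coord_flip() is included only to show one way of obtaining horizontal bars.

```r
library(ggplot2)

# Count the diamonds in each cut category by hand.
# table() returns one count per factor level, in level order.
cut_counts <- as.data.frame(table(diamonds$cut))
names(cut_counts) <- c("cut", "n")
print(cut_counts)

# geom_col() takes the bar lengths directly from the data,
# so the bars show the same counts that geom_bar() computes itself.
p_manual <- ggplot(cut_counts, aes(x = cut, y = n, fill = cut)) +
  geom_col() +
  coord_flip() +  # swap the axes to get horizontal bars
  labs(title = "Number of Diamond Cuts",
       x = "Cut",
       y = "Count")
print(p_manual)
```

The design choice here is only pedagogical: geom_bar() is shorter when the raw data are available, while geom_col() is the natural fit when the counts have already been computed.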
- Modify the following code to associate a unique color with each bar.
p <- ggplot(data = diamonds, aes(x = cut)) +
  geom_bar(position = "dodge") +
  labs(title = "Number of Diamonds by Cut",
       x = "Cut",
       y = "Count")
print(p)
## One possible solution: map fill to cut and assign one color per level
col_palette <- c("Ideal" = "#A22200",
                 "Premium" = "#0871A4",
                 "Very Good" = "#00B850",
                 "Good" = "#226666",
                 "Fair" = "#FF8900")
p <- ggplot(data = diamonds, aes(x = cut, fill = cut)) +
  geom_bar(position = "dodge") +
  labs(title = "Number of Diamonds by Cut",
       x = "Cut",
       y = "Count",
       fill = "Cut") +
  scale_fill_manual(values = col_palette)
print(p)
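When the values passed to scale_fill_manual() are named, ggplot2 matches them to the factor levels by name rather than by position, so a misspelled name leaves the corresponding bar without its intended color. A quick check, sketched here with base R's setdiff(), compares the palette names with the levels of cut. The variables missing_levels and unused_names are illustrative.

```r
library(ggplot2)

col_palette <- c("Ideal" = "#A22200",
                 "Premium" = "#0871A4",
                 "Very Good" = "#00B850",
                 "Good" = "#226666",
                 "Fair" = "#FF8900")

# Levels of cut that have no color assigned (ideally none)
missing_levels <- setdiff(levels(diamonds$cut), names(col_palette))

# Palette names that match no level of cut (typically typos)
unused_names <- setdiff(names(col_palette), levels(diamonds$cut))

print(missing_levels)
print(unused_names)
```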
Example with Two Categorical Variables
In this more complex example, we create a bar chart with cut on the
x-axis and clarity as the fill color, in order to explore the
frequency of the different diamond clarities (clarity) within each
type of cut (cut). The idea is that the bars are colored (fill)
according to the clarity variable, while the y-axis still shows the
count.
We use the position = "stack" argument (i.e. stacked) to place the
counts of the different clarities on top of one another.
p <- ggplot(data = diamonds, aes(x = cut, fill = clarity)) +
  geom_bar(position = "stack") +
  labs(title = "Number of Clarity Types by Cut",
       x = "Cut",
       y = "Count")
print(p)
Note that clarity is an ordinal categorical variable (an ordered
factor), so ggplot2 by default represents it with a discrete
sequential color scale (viridis) rather than with unrelated hues.
head(diamonds$clarity)
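The ordering can also be checked directly instead of being read off the printed output. The following is a minimal sketch using only base R functions on the same column.

```r
library(ggplot2)

# TRUE if clarity is stored as an ordered factor
print(is.ordered(diamonds$clarity))

# The levels, listed from lowest to highest
print(levels(diamonds$clarity))

# Number of distinct levels, i.e. the number of colors the fill scale needs
print(nlevels(diamonds$clarity))
```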
As in the previous examples, you can change the colors with
scale_fill_manual(). At this stage, however, we can introduce a new
function, scale_fill_brewer(), which uses the palettes of the
RColorBrewer package. The function itself belongs to ggplot2, and
RColorBrewer is normally installed along with it as a dependency;
loading RColorBrewer is needed only to display the available palettes
with RColorBrewer::display.brewer.all(). Choose the name of one of the
displayed palettes and pass it to the function (e.g.,
scale_fill_brewer(palette='Purples')).
library('RColorBrewer')
display.brewer.all()
There are three types of palettes in RColorBrewer: sequential,
diverging, and qualitative.
- Sequential palettes are suitable for ordered data that progresses from the lowest to the highest value.
- Diverging palettes emphasize critical middle values and extremes at both ends of the data range.
- Qualitative palettes are best suited for representing nominal or categorical data.
- Test different sequential palettes from RColorBrewer (e.g. Blues, BuGn, BuPu, GnBu, Greens, Greys, Oranges, OrRd, PuBu, PuBuGn, PuRd, Purples, RdPu, Reds, YlGn, YlGnBu, YlOrBr, YlOrRd) for the representation.
p <- ggplot(data = diamonds, aes(x = cut, fill = clarity)) +
  geom_bar(position = "stack", color = "black") +
  labs(title = "Number of Clarity Types by Cut",
       x = "Cut",
       y = "Count") +
  scale_fill_brewer(palette = '___')
print(p)
p <- ggplot(data = diamonds, aes(x = cut, fill = clarity)) +
  geom_bar(position = "stack", color = "black") +
  labs(title = "Number of Clarity Types by Cut",
       x = "Cut",
       y = "Count") +
  scale_fill_brewer(palette = 'Purples')
print(p)
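The three palette families described above can also be listed programmatically. RColorBrewer ships a data frame, brewer.pal.info, with one row per palette and a category column coded "seq", "div" or "qual". The sketch below extracts the sequential palettes and checks that each offers enough colors for the levels of clarity. The variable names (pal_info, seq_palettes, n_needed, enough) are illustrative.

```r
library(ggplot2)
library(RColorBrewer)

# One row per palette; row names are the palette names
pal_info <- brewer.pal.info

# Keep only the sequential palettes
seq_palettes <- rownames(pal_info)[pal_info$category == "seq"]
print(seq_palettes)

# A palette is usable here only if it provides at least as many
# colors as clarity has levels
n_needed <- nlevels(diamonds$clarity)
enough <- pal_info[seq_palettes, "maxcolors"] >= n_needed
print(all(enough))
```

The same filter with "div" or "qual" lists the diverging or qualitative palettes.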
Bar Positioning
In ggplot2, the ‘position’ argument of geom_bar() determines how the
bars are placed relative to one another, and it can substantially
change how the chart is perceived:
- ‘Stack’ stacks the bars to represent cumulative values, useful for showing totals and proportions of subcategories.
- ‘Dodge’ places the bars side by side without overlap, ideal for directly comparing values between categories.
- ‘Dodge2’ is similar to ‘dodge’ but adds a small padding between the bars of a group. It was designed for elements of variable width, such as boxplots.
- ‘Fill’ stacks the bars and rescales each stack to the same total height (1). Useful for visualizing proportions/percentages.
- ‘Identity’ places the bars in front of each other. Be cautious, as this is rarely what you want.
- ‘Jitter’ adds a bit of noise to the x-axis. In the context of bar charts, this argument has little value. Again, the bars are placed in front of each other.
- In the diagram below, change the position successively to ‘stack’, ‘dodge’, and ‘fill’.
p <- ggplot(data = diamonds, aes(x = cut, fill = clarity)) +
  geom_bar(position = "___", color = "black") +
  labs(title = "Number of Clarity Types by Cut",
       x = "Cut",
       y = "Count") +
  scale_fill_brewer(palette = 'Oranges')
print(p)
## position = "stack": clarity counts stacked within each cut
ggplot(data = diamonds, aes(x = cut, fill = clarity)) +
  geom_bar(position = "stack", color = "black") +
  labs(title = "Number of Clarity Types by Cut",
       x = "Cut",
       y = "Count") +
  scale_fill_brewer(palette = 'Purples')
## position = "dodge": one bar per clarity, side by side
ggplot(data = diamonds, aes(x = cut, fill = clarity)) +
  geom_bar(position = "dodge", color = "black") +
  labs(title = "Number of Clarity Types by Cut",
       x = "Cut",
       y = "Count") +
  scale_fill_brewer(palette = 'Purples')
## position = "fill": each stack rescaled to a total of 1
ggplot(data = diamonds, aes(x = cut, fill = clarity)) +
  geom_bar(position = "fill", color = "black") +
  labs(title = "Number of Clarity Types by Cut",
       x = "Cut",
       y = "Proportion") +
  scale_fill_brewer(palette = 'Purples')
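The proportions displayed with position = "fill" can be reproduced numerically, which helps when exact values are needed. The sketch below cross-tabulates cut and clarity with base R's table() and divides each row by its total with prop.table(). The names counts and props are illustrative.

```r
library(ggplot2)

# Cross-tabulate cut (rows) against clarity (columns)
counts <- table(diamonds$cut, diamonds$clarity)

# Divide each row by its total: the share of each clarity within a cut.
# These shares correspond to the segment heights drawn by position = "fill".
props <- prop.table(counts, margin = 1)
print(round(props, 3))

# Each row sums to 1, just as each filled bar spans the full height
print(rowSums(props))
```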
Consolidation Exercises
Iris Dataset
- Create a plot like the one shown below that compares the sepal lengths (Sepal.Length) for each flower species (Species) in the iris dataset (data(iris)). The plot, stored in a variable p, should display the distributions as boxes (geom_boxplot()) with jittered points (geom_jitter()) to visualize each individual observation. The axes should be renamed “Species” and “Sepal Lengths”. Each species should be associated with a unique box color (‘fill’).
- Control colors using scale_fill_brewer(palette='Dark2').
library(ggplot2)
set.seed(456)
data(iris)
# Compare distributions with boxplots and jittered points
p <- ggplot(iris, aes(x = Species, y = Sepal.Length, fill = Species)) +
  geom_boxplot(alpha = 0.7) +
  geom_jitter(height = 0) +  # jitter horizontally only, keep true y values
  scale_fill_brewer(palette = 'Dark2') +
  labs(title = "Comparison of Sepal Lengths by Species",
       x = "Species",
       y = "Sepal Lengths",
       fill = "Species")
print(p)
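One detail worth noting: geom_boxplot() draws outliers as points by default, and geom_jitter() then draws every observation again, so outlying values can appear twice. A possible variant, which is not part of the reference solution, hides the boxplot's own outlier points with outlier.shape = NA and restricts the jitter to a narrower horizontal band. The width of 0.2 and the name p_alt are arbitrary choices made for this sketch.

```r
library(ggplot2)
set.seed(456)  # geom_jitter() is random; fix the seed for reproducibility

p_alt <- ggplot(iris, aes(x = Species, y = Sepal.Length, fill = Species)) +
  geom_boxplot(alpha = 0.7, outlier.shape = NA) +  # do not draw outliers twice
  geom_jitter(width = 0.2, height = 0) +           # horizontal jitter only
  scale_fill_brewer(palette = "Dark2") +
  labs(title = "Comparison of Sepal Lengths by Species",
       x = "Species",
       y = "Sepal Lengths",
       fill = "Species")
print(p_alt)
```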
ToothGrowth Dataset
- Create a violin plot like the one shown below to compare the distribution of tooth lengths (len) based on the method of vitamin C administration (supp) in the ToothGrowth dataset. The categories ‘OJ’ and ‘VC’ should be replaced with ‘Orange juice’ and ‘Ascorbic acid’.
- Add “rugs” (small lines) to the plot (geom_rug()) to visualize the data distribution along the y-axis. The axes should be renamed “Supplements” and “Tooth Lengths”. Each violin should have its own unique color (‘fill’).
- Control colors with scale_fill_brewer(palette='Accent') and scale_color_brewer(palette='Accent').
data(ToothGrowth)
# Rename the levels; the order must match the existing levels ("OJ", "VC")
levels(ToothGrowth$supp) <- c('Orange juice', 'Ascorbic acid')
p <- ggplot(ToothGrowth, aes(x = supp, y = len, fill = supp, color = supp)) +
  geom_violin(alpha = 0.7, color = "black") +
  geom_rug() +
  scale_fill_brewer(palette = 'Accent') +
  scale_color_brewer(palette = 'Accent') +
  labs(title = "Comparison of Tooth Length by Supplement",
       x = "Supplements",
       y = "Tooth Lengths",
       fill = "Supplements",
       color = "Supplements")
print(p)
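Assigning a plain vector to levels() relies on the new labels being given in the same order as the existing levels. A slightly more defensive alternative, sketched below on a copy of the column, renames the levels through a named lookup so that the order of the labels no longer matters. The names supp and new_labels are illustrative.

```r
# Work on a copy of the original column; its levels are "OJ" and "VC"
supp <- datasets::ToothGrowth$supp

# Named lookup: old code -> new label (any order)
new_labels <- c(VC = "Ascorbic acid", OJ = "Orange juice")

# Index the lookup by the current levels so each code gets its own label
levels(supp) <- unname(new_labels[levels(supp)])

print(levels(supp))
print(table(supp))
```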
Palmerpenguins Dataset
For this example, we will use the penguins dataset from the
palmerpenguins package. This dataset contains information about
penguins from the Palmer Archipelago in Antarctica.
- Using the penguins dataset, generate a plot p identical to the diagram below. In this example, we use flipper_length_mm (flipper length) on the x-axis, bill_length_mm (bill length) on the y-axis, body_mass_g (body mass) for the size of the points, and species (species) for the color of the points. You must also ensure that the x and y axis labels match the given names.
library(palmerpenguins)
data("penguins")
penguins <- na.omit(penguins)
# Create a custom color palette
palette_couleurs <- c("dodgerblue", "darkorange", "forestgreen")
# Create the colored bubble plot
p <- ggplot(penguins, aes(x = flipper_length_mm, y = bill_length_mm,
                          size = body_mass_g, color = species)) +
  geom_point(alpha = 0.7) +
  scale_color_manual(values = palette_couleurs) +
  labs(x = "Flipper Length (mm)",
       y = "Bill Length (mm)",
       color = "Species",
       size = "Body Mass (g)")
print(p)
End of the section
Thank you for following this tutorial.