Introduction to Class Systems in R
Class systems are a fundamental element of object-oriented programming in R. They let you build structured objects that combine attributes (data) and methods (functions) to represent real-world entities in a modular way. The two most widely used class systems in R are S3 and S4, each with its own characteristics and conventions. This course focuses on S4 classes, which are widely used in the Bioconductor project, a collection of R packages for biological data analysis.
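To make the difference between the two systems concrete, here is a minimal sketch. The class names toy_cell_s3 and toy_cell_s4 are hypothetical and exist only for this illustration. It suggests that an S3 class is a label attached to an ordinary object, whereas an S4 class declares its slots and their types, which new() then checks.

```r
library(methods)

# S3: the class is just an attribute on an ordinary list; nothing is checked.
s3_cell <- structure(list(id = "c1", counts = c(3, 0, 5)),
                     class = "toy_cell_s3")

# S4: slots and their types are declared up front.
setClass("toy_cell_s4",
         representation(id = "character", counts = "numeric"))
s4_cell <- new("toy_cell_s4", id = "c1", counts = c(3, 0, 5))

# Supplying a value of the wrong type for a slot makes new() fail.
bad <- tryCatch(
  new("toy_cell_s4", id = 1, counts = c(3, 0, 5)),
  error = function(e) "rejected"
)
print(bad)
```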
S4 Classes
Creating a ‘single_cell’ Class
Below is an example of how to implement a single_cell class that stores a gene expression matrix, metadata about the cells (a data.frame), gene information (a data.frame), and a list containing free-form information about the experiment:
# Define the S4 class "single_cell"
setClass(
Class = "single_cell",
representation(
expression_matrix = "matrix",
cell_metadata = "data.frame",
gene_information = "data.frame",
exp_info="list"
)
)
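Once a class is defined, its declaration can be inspected. The sketch below repeats the definition above so that it runs on its own, then lists the slots with getSlots() and builds an empty object. The variable names slot_types and empty_sc are introduced here for illustration only.

```r
library(methods)

# Same definition as in the text, repeated so the snippet is self-contained.
setClass(
  Class = "single_cell",
  representation(
    expression_matrix = "matrix",
    cell_metadata = "data.frame",
    gene_information = "data.frame",
    exp_info = "list"
  )
)

# Named character vector: slot names and their declared types.
slot_types <- getSlots("single_cell")
print(slot_types)

# Without arguments, new() fills each slot with an empty default value.
empty_sc <- new("single_cell")
print(is(empty_sc, "single_cell"))
```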
Definition of the Constructor Method
In object-oriented programming (OOP), a method is a function associated with a class. It defines the behavior or actions that objects of this class can perform. Methods are called on objects to perform specific operations.
The “initialize” method is a special method in R, often called a constructor method. It is called when an object of the class is created, and its purpose is to assign initial values to the attributes of the object so that the object is ready to be used. For the “single_cell” class defined above, the initialize() method is responsible for initializing the attributes of the class.
NB: The initialize() method expects an argument named .Object (see the Usage section of ?initialize).
# Create a constructor for the "single_cell" class
setMethod("initialize",
signature(.Object = "single_cell"),
function(.Object,
expression_matrix=matrix(),
cell_metadata=data.frame(),
gene_information=data.frame(),
exp_info=list(author=NULL, date=NULL, laboratory=NULL)) {
.Object@expression_matrix <- expression_matrix
# Name the columns after the cell identifiers and the rows after the gene identifiers
colnames(.Object@expression_matrix) <- cell_metadata$cell_id
rownames(.Object@expression_matrix) <- gene_information$gene_id
.Object@cell_metadata <- cell_metadata
.Object@gene_information <- gene_information
.Object@exp_info <- exp_info
return(.Object)
}
)
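The constructor is also a natural place to check that the inputs are consistent. The following is a minimal sketch on a hypothetical toy class (toy_counts, with slots counts and cell_id, none of which belong to the tutorial). It assumes that a mismatch between the number of identifiers and the number of columns should stop object creation.

```r
library(methods)

# Hypothetical toy class, used only for this illustration.
setClass("toy_counts",
         representation(counts = "matrix", cell_id = "character"))

setMethod("initialize",
  signature(.Object = "toy_counts"),
  function(.Object,
           counts = matrix(numeric(0), nrow = 0, ncol = 0),
           cell_id = character(0)) {
    # Fail early if the identifiers do not match the matrix columns.
    stopifnot(length(cell_id) == ncol(counts))
    if (length(cell_id) > 0) {
      colnames(counts) <- cell_id
    }
    .Object@counts <- counts
    .Object@cell_id <- cell_id
    return(.Object)
  }
)

# new() forwards its named arguments to initialize().
toy <- new("toy_counts",
           counts = matrix(1:6, nrow = 2),
           cell_id = c("c1", "c2", "c3"))

# Omitted arguments fall back to the defaults declared in the constructor.
empty_toy <- new("toy_counts")

# Inconsistent input is refused.
bad <- tryCatch(
  new("toy_counts", counts = matrix(1:6, nrow = 2), cell_id = "c1"),
  error = function(e) "rejected"
)
```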
Creating an Instance
Creating an Instance of the ‘single_cell’ Class
Given the following data (a matrix m, a data.frame cell_data, a data.frame g_info, and a list exp_info):
set.seed(123)
m <- matrix(rnorm(200),
nrow = 20,
ncol = 10)
cell_data <- data.frame(cell_id = 1:10,
cell_pop = 1:10,
sample_type = sample(c("A", "B"), 10, replace=TRUE))
g_info <- data.frame(gene_id = 1:20,
gene_name = letters[1:20])
exp_info <- list(author="D. Puthier",
date="Tue Nov 7 09:29:10 2023",
laboratory="TAGC")
We will use the following code to store them in an object of the ‘single_cell’ class. We say that the object sc_data is an instance of the ‘single_cell’ class.
sc_data <- new("single_cell",
expression_matrix = m,
cell_metadata = cell_data,
gene_information = g_info,
exp_info=exp_info
)
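The attributes (slots) of an instance are read and modified with the @ operator or with slot(). Here is a minimal sketch on a hypothetical toy class (toy_run is not part of the tutorial), assuming direct slot access is acceptable for exploration:

```r
library(methods)

# Hypothetical toy class, used only for this illustration.
setClass("toy_run", representation(values = "numeric", info = "list"))
run <- new("toy_run", values = c(1.5, 2.5), info = list(author = "someone"))

# Check which class the object belongs to.
print(is(run, "toy_run"))

# Two equivalent ways of reading a slot.
print(run@values)
print(slot(run, "info")$author)

# Slots can be reassigned; validObject() re-checks the declared types.
run@values <- c(run@values, 4)
validObject(run)
```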
For now, when we type the name of the object, R prints everything it contains. With large data this is not practical, so we will redefine the show() method.
sc_data
Defining Generic Methods
Defining the ‘show()’ Method
A number of functions are said to be generic. They exist by default in the R language (e.g., ncol(), nrow(), dim(), summary(), show(), [ …) and are defined for many different kinds of objects. In the same way, we can define them for our ‘single_cell’ class.
The show() method is the so-called “representation” method. It defines what is displayed when an object of the ‘single_cell’ class is called (i.e., when its name is typed in the R console and Enter is pressed). This function exists for all objects, but we are redefining it here for our ‘single_cell’ class.
NB: By default, this function expects an argument named object (see ?show).
# Define the show() method for the "single_cell" class
setMethod("show",
signature(object = "single_cell"),
function(object) {
nr <- nrow(object@expression_matrix)
nc <- ncol(object@expression_matrix)
cat("A single cell object with:\n")
cat(paste0("\t- ", nr, " genes\n"))
cat(paste0("\t- ", nc, " samples\n"))
}
)
When this object is called, it is displayed via its show() method:
sc_data
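To see when show() is actually invoked, here is a minimal sketch on a hypothetical toy class (toy_expr is not part of the tutorial). It illustrates that typing the object's name, calling print(), and calling show() directly all go through the same method.

```r
library(methods)

# Hypothetical toy class, used only for this illustration.
setClass("toy_expr", representation(values = "matrix"))

setMethod("show",
  signature(object = "toy_expr"),
  function(object) {
    cat("A toy_expr object with", nrow(object@values), "rows and",
        ncol(object@values), "columns\n")
  }
)

toy <- new("toy_expr", values = matrix(0, nrow = 4, ncol = 3))

toy          # auto-printing at top level calls show()
print(toy)   # print() on an S4 object also calls show()

# capture.output() collects what show() writes, which is handy for checks.
shown <- capture.output(show(toy))
```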
Defining ncol() and nrow()
Define the two methods ncol() and nrow() that return the number of columns and rows, respectively, of the expression_matrix from a ‘single_cell’ object. You can refer to the code provided below.
NB: ncol() and nrow() are functions whose only argument is x (see ?ncol).
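# Skeleton to complete
setMethod("ncol",
signature(x = "single_cell"),
function(x) {
# Return the number of columns of the expression matrix here
}
)
# One possible solution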
setMethod("ncol",
signature(x = "single_cell"),
function(x) {
return(ncol(x@expression_matrix))
}
)
setMethod("nrow",
signature(x = "single_cell"),
function(x) {
return(nrow(x@expression_matrix))
}
)
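As an alternative that the tutorial does not use, a single dim() method can be defined instead. Base R's ncol() and nrow() are defined in terms of dim(), so they pick up the result. The sketch below uses a hypothetical toy class (toy_grid) to illustrate the idea.

```r
library(methods)

# Hypothetical toy class, used only for this illustration.
setClass("toy_grid", representation(values = "matrix"))

# One method on dim() describes the shape of the object.
setMethod("dim",
  signature(x = "toy_grid"),
  function(x) {
    dim(x@values)
  }
)

toy <- new("toy_grid", values = matrix(0, nrow = 4, ncol = 3))

toy_dim <- dim(toy)
print(toy_dim)

# nrow() and ncol() call dim() internally, so no extra methods are needed.
print(nrow(toy))
print(ncol(toy))
```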
Defining summary()
Define the generic method summary() so that it displays a numeric summary of the object. The summary() method expects ‘object’ as an argument (see ?summary). Here, the method should return the mean of each column of the ‘expression_matrix’ attribute.
setMethod("summary",
signature(object = "single_cell"),
function(object) {
col_means <- colMeans(object@expression_matrix)
names(col_means) <- colnames(object@expression_matrix)
return(col_means)
}
)
Creating New Generic Methods
Generic Method ‘gene_name()’
Define a new function gene_name() that returns the vector of gene names (gene_name) from the ‘gene_information’ attribute.
setGeneric("gene_name", function(object) {
standardGeneric("gene_name")
})
setMethod("gene_name",
signature(object = "single_cell"),
function(object) {
return(object@gene_information$gene_name)
}
)
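An accessor such as this one is often paired with a replacement method so that the attribute can also be updated. Here is a minimal sketch on a hypothetical toy class (toy_genes, with the generics gene_label() and gene_label<-(), none of which belong to the tutorial):

```r
library(methods)

# Hypothetical toy class, used only for this illustration.
setClass("toy_genes", representation(labels = "character"))

# Getter: a new generic plus its method for the class.
setGeneric("gene_label", function(object) {
  standardGeneric("gene_label")
})
setMethod("gene_label",
  signature(object = "toy_genes"),
  function(object) {
    return(object@labels)
  }
)

# Setter: the generic name ends in "<-" and takes the new value as 'value'.
setGeneric("gene_label<-", function(object, value) {
  standardGeneric("gene_label<-")
})
setMethod("gene_label<-",
  signature(object = "toy_genes"),
  function(object, value) {
    object@labels <- value
    validObject(object)
    return(object)
  }
)

toy <- new("toy_genes", labels = c("a", "b"))
gene_label(toy) <- c("x", "y")
print(gene_label(toy))
```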
Summary
Complete Code for the Class
To summarize, the following code allows us to create a single_cell class and its associated methods.
# Define the S4 class "single_cell"
setClass(
Class = "single_cell",
representation(
expression_matrix = "matrix",
cell_metadata = "data.frame",
gene_information = "data.frame",
exp_info="list"
)
)
# Create a constructor for the "single_cell" class
setMethod("initialize",
signature(.Object = "single_cell"),
function(.Object,
expression_matrix=matrix(),
cell_metadata=data.frame(),
gene_information=data.frame(),
exp_info=list(author=NULL, date=NULL, laboratory=NULL)) {
.Object@expression_matrix <- expression_matrix
# Name the columns after the cell identifiers and the rows after the gene identifiers
colnames(.Object@expression_matrix) <- cell_metadata$cell_id
rownames(.Object@expression_matrix) <- gene_information$gene_id
.Object@cell_metadata <- cell_metadata
.Object@gene_information <- gene_information
.Object@exp_info <- exp_info
return(.Object)
}
)
# Define the generic method `show()`
setMethod("show",
signature(object = "single_cell"),
function(object) {
nr <- nrow(object@expression_matrix)
nc <- ncol(object@expression_matrix)
cat("A single cell object with:\n")
cat(paste0("\t- ", nr, " genes\n"))
cat(paste0("\t- ", nc, " samples\n"))
}
)
setMethod("ncol",
signature(x = "single_cell"),
function(x) {
return(ncol(x@expression_matrix))
}
)
setMethod("nrow",
signature(x = "single_cell"),
function(x) {
return(nrow(x@expression_matrix))
}
)
setMethod("summary",
signature(object = "single_cell"),
function(object) {
col_means <- colMeans(object@expression_matrix)
names(col_means) <- colnames(object@expression_matrix)
return(col_means)
}
)
setGeneric("author", function(object) {
standardGeneric("author")
})
setMethod("author",
signature(object = "single_cell"),
function(object) {
if("author" %in% names(object@exp_info)){
return(object@exp_info$author)
}else{
return(NULL)
}
}
)
setGeneric("gene_name", function(object) {
standardGeneric("gene_name")
})
setMethod("gene_name",
signature(object = "single_cell"),
function(object) {
return(object@gene_information$gene_name)
}
)
End of the section
Thank you for following this tutorial.