Introduction to Class Systems in R

Class systems are the foundation of object-oriented programming in R. They let you create structured objects that combine attributes (data) and methods (functions) to represent real-world entities in a modular way. R's two most widely used class systems are S3 and S4, each with its own characteristics and conventions. This course focuses on S4 classes, which are used extensively in the Bioconductor project, a collection of R packages for biological data analysis.
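To make the contrast between the two systems concrete, here is a minimal sketch. The class names toy_s3_cell and toy_s4_cell and the n_genes field are hypothetical and serve only as an illustration.

```r
library(methods)

# S3: a class is just an attribute on an ordinary object; nothing is checked.
s3_cell <- structure(list(n_genes = 20), class = "toy_s3_cell")  # hypothetical class
print.toy_s3_cell <- function(x, ...) {
  cat("S3 object with", x$n_genes, "genes\n")
  invisible(x)
}
print(s3_cell)

# S4: the class is declared formally and the type of each slot is enforced.
setClass("toy_s4_cell", representation(n_genes = "numeric"))     # hypothetical class
s4_cell <- new("toy_s4_cell", n_genes = 20)
s4_cell@n_genes

# Supplying a value of the wrong type for an S4 slot raises an error.
res <- tryCatch(new("toy_s4_cell", n_genes = "twenty"),
                error = function(e) "error")
res
```

The formal declaration is what allows S4 to reject the character value, whereas the S3 object would accept any content.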

S4 Classes

Creating a ‘single_cell’ Class

Below is an example of how to implement a single_cell class that allows for storing a gene expression matrix, metadata about the cells (a data.frame), gene information (a data.frame), and a list containing free-form information about the experiment:

# Define the S4 class "single_cell"
setClass(
  Class = "single_cell",
  representation(
    expression_matrix = "matrix",
    cell_metadata = "data.frame",
    gene_information = "data.frame",
    exp_info="list"
  )
)
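Once a class is defined, its declaration can be inspected. The sketch below repeats the definition above so that it runs on its own, then lists the slots and their types; the variable names slots and sc_empty are chosen here for illustration.

```r
library(methods)

# Same definition as above, repeated so that the example is self-contained
setClass(
  Class = "single_cell",
  representation(
    expression_matrix = "matrix",
    cell_metadata = "data.frame",
    gene_information = "data.frame",
    exp_info = "list"
  )
)

# Named character vector: slot name -> required class
slots <- getSlots("single_cell")
slots

# An object created without arguments gets an empty default for each slot
sc_empty <- new("single_cell")
is(sc_empty, "single_cell")
```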

Definition of the Constructor Method

In object-oriented programming (OOP), a method is a function associated with a class. It defines the behavior or actions that objects of this class can perform. Methods are called on objects to perform specific operations.

The “initialize” method is a special method in R, often called a constructor method. It is used to initialize an object of the class when it is created. More specifically, its purpose is to configure or assign initial values to the attributes of the object, so that the object is ready to be used in your application. In the case of the “single_cell” class you have defined, the initialize() method is responsible for initializing the attributes of the class.

NB: The initialize() method expects its first argument to be named .Object (see the Usage section of ?initialize).

# Create a constructor for the "single_cell" class
setMethod("initialize", 
          signature(.Object = "single_cell"),
  function(.Object, 
           expression_matrix=matrix(), 
           cell_metadata=data.frame(), 
           gene_information=data.frame(),
           exp_info=list(author=NULL, date=NULL, laboratory=NULL)) {
    
    .Object@expression_matrix <- expression_matrix
    # Label columns with cell identifiers and rows with gene identifiers
    colnames(.Object@expression_matrix) <- cell_metadata$cell_id
    rownames(.Object@expression_matrix) <- gene_information$gene_id
    .Object@cell_metadata <- cell_metadata
    .Object@gene_information <- gene_information
    .Object@exp_info <- exp_info
    
    return(.Object)
  }
)
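The same constructor pattern can be tried on a smaller class. This is a sketch under stated assumptions: the class toy_counts and its slots counts and cells are hypothetical, and the guard on missing identifiers is an addition that the constructor above does not contain.

```r
library(methods)

# Hypothetical two-slot class used only to illustrate the constructor pattern
setClass("toy_counts",
         representation(counts = "matrix", cells = "data.frame"))

setMethod("initialize",
          signature(.Object = "toy_counts"),
  function(.Object, counts = matrix(), cells = data.frame()) {
    .Object@counts <- counts
    # Label the columns only when identifiers are actually available
    if (!is.null(cells$cell_id)) {
      colnames(.Object@counts) <- cells$cell_id
    }
    .Object@cells <- cells
    return(.Object)
  }
)

# new() forwards its named arguments to initialize()
tc <- new("toy_counts",
          counts = matrix(1:6, nrow = 2),
          cells = data.frame(cell_id = c("c1", "c2", "c3"),
                             stringsAsFactors = FALSE))
colnames(tc@counts)   # "c1" "c2" "c3"

# Without arguments, the defaults of the constructor are used
empty <- new("toy_counts")
dim(empty@counts)     # 1 1, because matrix() is a 1 x 1 matrix holding NA
```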

Creating an Instance

Creating an Instance of the ‘single_cell’ Class

Given the following data (matrix m, data.frame cell_data, data.frame g_info):

set.seed(123)
m <- matrix(rnorm(200), 
            nrow = 20, 
            ncol = 10)

cell_data <- data.frame(cell_id = 1:10, 
                        cell_pop = 1:10, 
                        sample_type = sample(c("A", "B"), 10, replace=TRUE))

g_info <- data.frame(gene_id = 1:20, 
                     gene_name = letters[1:20])

exp_info <- list(author="D. Puthier", 
                 date="Tue Nov  7 09:29:10 2023", 
                 laboratory="TAGC")

We will use the following code to store them in an object of the ‘single_cell’ class. We say that the object sc_data is an instance of the ‘single_cell’ class.

sc_data <- new("single_cell",
  expression_matrix = m,
  cell_metadata = cell_data,
  gene_information = g_info,
  exp_info=exp_info
)
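An instance can be queried for its class and its slots. The following sketch uses a hypothetical one-slot class, toy_instance, so that it runs independently of the data above.

```r
library(methods)

# Hypothetical class with a single numeric slot
setClass("toy_instance", representation(values = "numeric"))
obj <- new("toy_instance", values = c(1, 2, 3))

is(obj, "toy_instance")   # TRUE: obj is an instance of the class
isS4(obj)                 # TRUE: obj was built by the S4 system
slotNames(obj)            # "values"

# Two equivalent ways to read a slot
obj@values
slot(obj, "values")
```

The @ operator is the usual notation; slot() is convenient when the slot name is stored in a variable.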

For now, when we call the object, it shows us everything it contains. If the data is large, this is not practical. Therefore, we will redefine the show() method.

sc_data

Defining Generic Methods

Defining the ‘show()’ Method

A number of functions are called generic. They exist by default in R (e.g., ncol(), nrow(), dim(), summary(), show(), …) and are defined for many different kinds of objects. We can define them for our ‘single_cell’ class as well. show() is the so-called “representation” method: it defines what is displayed when an object of the ‘single_cell’ class is called (i.e., when its name is typed in the console and Enter is pressed). This function exists for all objects, but we redefine it here for ‘single_cell’ objects.

NB: By default, this function expects an argument named object (see ?show).

# Define the `show()` method
setMethod("show", 
          signature(object = "single_cell"),
  function(object) {
    
    nr <- nrow(object@expression_matrix)
    nc <- ncol(object@expression_matrix)
    
    cat("A single cell object with:\n")
    cat(paste0("\t- ", nr, " genes\n"))
    cat(paste0("\t- ", nc, " samples\n"))
  }
)

When this object is called, it is displayed via its show() method:

sc_data
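The sketch below illustrates when show() is invoked, using a hypothetical class toy_expr with a single slot named values. It is a reduced version of the method above, not a replacement for it.

```r
library(methods)

# Hypothetical class used only to demonstrate how show() is dispatched
setClass("toy_expr", representation(values = "matrix"))

setMethod("show",
          signature(object = "toy_expr"),
  function(object) {
    cat("A toy_expr object with", nrow(object@values), "rows and",
        ncol(object@values), "columns\n")
  }
)

te <- new("toy_expr", values = matrix(0, nrow = 3, ncol = 2))

te          # typing the name triggers show()
print(te)   # print() also ends up calling show() for S4 objects

# Capture the displayed text to check it programmatically
shown <- capture.output(show(te))
shown       # "A toy_expr object with 3 rows and 2 columns"
```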

Defining ncol() and nrow()

Define the two methods ncol() and nrow() that return the number of columns and rows, respectively, of the expression_matrix from a ‘single_cell’ object. You can refer to the code provided below:

NB: ncol() and nrow() take a single argument named x (see ?ncol). They are ordinary functions in base R; calling setMethod() on them creates the corresponding S4 generics implicitly.

# Exercise skeleton: complete the body of the function
setMethod("ncol", 
          signature(x = "single_cell"),
  function(x) {
    # Your code here
  }
)

# Solution
setMethod("ncol", 
          signature(x = "single_cell"),
  function(x) {
    return(ncol(x@expression_matrix))
  }
)

setMethod("nrow", 
          signature(x = "single_cell"),
  function(x) {
    return(nrow(x@expression_matrix))
  }
)
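As an alternative sketch, not the approach taken in this tutorial, a single dim() method can be enough: the base functions nrow() and ncol() are defined in terms of dim(x), so they follow automatically. The class name toy_mat and its slot values are hypothetical.

```r
library(methods)

# Hypothetical class wrapping a matrix
setClass("toy_mat", representation(values = "matrix"))

# Only dim() is defined; it delegates to the matrix stored in the slot
setMethod("dim",
          signature(x = "toy_mat"),
  function(x) {
    return(dim(x@values))
  }
)

tm <- new("toy_mat", values = matrix(0, nrow = 4, ncol = 3))

dim(tm)    # 4 3
nrow(tm)   # 4, because base nrow() calls dim(x)[1L]
ncol(tm)   # 3, because base ncol() calls dim(x)[2L]
```

Defining nrow() and ncol() explicitly, as done above, keeps each method visible and independent; the dim() route is shorter when all three should agree.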

Defining summary()

Define a summary() method that returns a numeric summary of the object. summary() expects an argument named ‘object’ (see ?summary). Here, the method should return the mean of each column of the ‘expression_matrix’ attribute.

setMethod("summary", 
          signature(object = "single_cell"),
  function(object) {
    col_means <- colMeans(object@expression_matrix)
    names(col_means) <- colnames(object@expression_matrix)
    return(col_means)
  }
)
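To see what such a method returns, here is a small sketch on a hypothetical class, toy_summary, whose values slot holds a matrix small enough to check the means by hand.

```r
library(methods)

# Hypothetical class used only to illustrate a summary() method
setClass("toy_summary", representation(values = "matrix"))

setMethod("summary",
          signature(object = "toy_summary"),
  function(object) {
    # colMeans() keeps the column names of the matrix
    return(colMeans(object@values))
  }
)

# Columns are (1, 2), (3, 4) and (5, 6)
ts <- new("toy_summary",
          values = matrix(1:6, nrow = 2,
                          dimnames = list(NULL, c("c1", "c2", "c3"))))

col_means <- summary(ts)
col_means   # c1 = 1.5, c2 = 3.5, c3 = 5.5
```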

Creating New Generic Methods

Generic Method ‘author()’

It is possible to create new, user-defined methods. In this case, you first need to create a generic function with setGeneric() and then define a method for the target class (here, ‘single_cell’). For example, here we create the author() function, which returns the name of the experimenter (or NULL if it is not defined).

setGeneric("author", function(object) {
  standardGeneric("author")
})

setMethod("author", 
          signature(object = "single_cell"),
  function(object) {
    if("author" %in% names(object@exp_info)){
      return(object@exp_info$author)
    }else{
      return(NULL)
    }
  }
)
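A getter such as author() is often paired with a setter. The sketch below shows one possible way to write such a pair; the class toy_exp and the generics experimenter() and experimenter<-() are hypothetical names, chosen so that they do not overwrite the author() generic defined above.

```r
library(methods)

# Hypothetical class holding only the free-form experiment information
setClass("toy_exp", representation(exp_info = "list"))

# Getter: a generic, then a method for the class
setGeneric("experimenter", function(object) {
  standardGeneric("experimenter")
})

setMethod("experimenter",
          signature(object = "toy_exp"),
  function(object) {
    # `$` on a list returns NULL when the element is absent
    return(object@exp_info$author)
  }
)

# Setter: replacement functions are named "name<-" and take `value` last
setGeneric("experimenter<-", function(object, value) {
  standardGeneric("experimenter<-")
})

setMethod("experimenter<-",
          signature(object = "toy_exp"),
  function(object, value) {
    object@exp_info$author <- value
    validObject(object)   # check that the object is still valid
    return(object)
  }
)

te <- new("toy_exp", exp_info = list(date = "2023-11-07"))
before <- experimenter(te)          # NULL: no author stored yet
experimenter(te) <- "D. Puthier"
experimenter(te)                    # "D. Puthier"
```

Going through accessor functions rather than writing to slots with @ directly keeps user code independent of how the information is stored inside the object.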

Generic Method ‘gene_name()’

Define a new function gene_name() that returns the vector of gene names (the gene_name column) from the ‘gene_information’ attribute.

setGeneric("gene_name", function(object) {
  standardGeneric("gene_name")
})

setMethod("gene_name", 
          signature(object = "single_cell"),
  function(object) {
    return(object@gene_information$gene_name)
  }
)

Summary

Complete Code for the Class

To summarize, the following code allows us to create a single_cell class and its associated methods.

# Define the S4 class "single_cell"
setClass(
  Class = "single_cell",
  representation(
    expression_matrix = "matrix",
    cell_metadata = "data.frame",
    gene_information = "data.frame",
    exp_info="list"
  )
)

# Create a constructor for the "single_cell" class
setMethod("initialize", 
          signature(.Object = "single_cell"),
  function(.Object, 
           expression_matrix=matrix(), 
           cell_metadata=data.frame(), 
           gene_information=data.frame(),
           exp_info=list(author=NULL, date=NULL, laboratory=NULL)) {
    
    .Object@expression_matrix <- expression_matrix
    # Label columns with cell identifiers and rows with gene identifiers
    colnames(.Object@expression_matrix) <- cell_metadata$cell_id
    rownames(.Object@expression_matrix) <- gene_information$gene_id
    .Object@cell_metadata <- cell_metadata
    .Object@gene_information <- gene_information
    .Object@exp_info <- exp_info
    
    return(.Object)
  }
)

# Define the generic method `show()`
setMethod("show", 
          signature(object = "single_cell"),
  function(object) {
    
    nr <- nrow(object@expression_matrix)
    nc <- ncol(object@expression_matrix)
    
    cat("A single cell object with:\n")
    cat(paste0("\t- ", nr, " genes\n"))
    cat(paste0("\t- ", nc, " samples\n"))
  }
)

setMethod("ncol", 
          signature(x = "single_cell"),
  function(x) {
    return(ncol(x@expression_matrix))
  }
)

setMethod("nrow", 
          signature(x = "single_cell"),
  function(x) {
    return(nrow(x@expression_matrix))
  }
)

setGeneric("author", function(object) {
  standardGeneric("author")
})

setMethod("author", 
          signature(object = "single_cell"),
  function(object) {
    if("author" %in% names(object@exp_info)){
      return(object@exp_info$author)
    }else{
      return(NULL)
    }
  }
)


setGeneric("gene_name", function(object) {
  standardGeneric("gene_name")
})

setMethod("gene_name", 
          signature(object = "single_cell"),
  function(object) {
    return(object@gene_information$gene_name)
  }
)

End of the section

Thank you for following this tutorial.

Object-Oriented Programming in R