Difference between unique() and duplicated()

When we work with data, we often run into an obstacle: repeated values. These values are not a critical problem as long as we are able to identify them. Once we have the list of repeated values, it is very easy to discard, remove or simply extract them.

We are going to look at two R functions that allow us to identify repeated values: unique() and duplicated(). As we will see below, we can use these functions with different types of data, such as vectors, matrices or data frames.

  • The unique() function returns the distinct values themselves, each one once, in order of first appearance.
  • Instead, the duplicated() function returns a logical vector that is TRUE for every element that repeats an earlier one.
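The two behaviours described above can be sketched on a small vector. The vector x is a hypothetical example chosen for illustration, not data from the original post.

```r
# Hypothetical example vector with some repeated values
x <- c(1, 2, 2, 3, 3, 3, 4)

unique(x)          # distinct values, in order of first appearance
duplicated(x)      # logical vector, TRUE where a value repeats an earlier one
x[duplicated(x)]   # extract the repeated elements
x[!duplicated(x)]  # drop the repeats (same values as unique(x))
```

Because duplicated() returns logical values, it can be used directly as an index, either to extract the repeats or, negated with !, to discard them.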

We can also use these functions on a matrix, where by default they compare whole rows:
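A minimal sketch of the matrix case follows. The matrix m is a hypothetical example in which the third row repeats the first. Both functions work row by row here because their MARGIN argument defaults to 1.

```r
# Hypothetical 4 x 2 matrix in which row 3 repeats row 1
m <- matrix(c(1, 2,
              3, 4,
              1, 2,
              5, 6), ncol = 2, byrow = TRUE)

unique(m)            # matrix containing only the distinct rows
duplicated(m)        # one logical value per row, TRUE only for row 3
m[!duplicated(m), ]  # same rows as unique(m)
```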

Now, we will identify unique and duplicated rows, using a very common data frame called iris. We will also select the non-repeated rows:
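One possible way to do this is sketched below. The variable names dup_rows and iris_unique are only illustrative. The iris data frame ships with base R, so nothing has to be loaded.

```r
# iris is part of the datasets package that comes with R
dup_rows <- duplicated(iris)      # logical vector, one entry per row

sum(dup_rows)                     # number of repeated rows
iris[dup_rows, ]                  # the repeated rows themselves
iris_unique <- iris[!dup_rows, ]  # keep only the non-repeated rows
nrow(iris_unique)                 # rows left after removing the repeats
```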

Finally, we can see that we obtain the same result with unique(iris) and iris[!duplicated(iris), ].
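As a quick check, the result of unique() applied to the whole data frame can be compared with the result of indexing by negated duplicated(). The names a and b are just placeholders for this sketch.

```r
a <- unique(iris)                # distinct rows via unique()
b <- iris[!duplicated(iris), ]   # distinct rows via negated duplicated()

identical(a, b)                  # TRUE: both keep the same rows
```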

How to create a fast and easy heatmap with ggplot2

Heatmaps are a data visualization tool widely used with biological data. The concept is to represent a matrix of values as colors, usually organized along a gradient. We can find a large number of these plots in scientific articles related to gene expression, such as microarray or RNA-seq studies.

In the next example, we are going to represent a data frame of gene expression values for 20 genes and 20 patients.
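The original data are not reproduced here, so the following is only a sketch with simulated values. The gene and patient labels, the seed and the normal distribution are all assumptions made for illustration. Only the name df_heatmap and the 20 x 20 size come from the text.

```r
set.seed(42)  # arbitrary seed, only for reproducibility

genes    <- paste0("Gene_", 1:20)      # hypothetical gene labels
patients <- paste0("Patient_", 1:20)   # hypothetical patient labels

# One row per gene/patient pair (long format), convenient for ggplot2
df_heatmap <- expand.grid(gene = genes, patient = patients)

# Simulated expression values; real data would replace this column
df_heatmap$expression <- rnorm(nrow(df_heatmap), mean = 10, sd = 2)

head(df_heatmap)
```

The long format (one row per cell of the matrix) is chosen because each ggplot2 aesthetic maps to a single column.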

Once we have our data frame (df_heatmap), we can visualize the values with the ggplot2 package.
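A minimal sketch of such a plot, using geom_tile(), could look like the following. The column names gene, patient and expression, the simulated values and the color choices are assumptions, not the settings of the original figures. The data frame is rebuilt inside the block so that it runs on its own.

```r
library(ggplot2)

# Simulated data (assumed structure: one row per gene/patient pair)
set.seed(42)
df_heatmap <- expand.grid(gene = paste0("Gene_", 1:20),
                          patient = paste0("Patient_", 1:20))
df_heatmap$expression <- rnorm(nrow(df_heatmap), mean = 10, sd = 2)

# One tile per gene/patient pair, filled according to the expression value
p <- ggplot(df_heatmap, aes(x = patient, y = gene, fill = expression)) +
  geom_tile(color = "white") +
  scale_fill_gradient(low = "white", high = "steelblue") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  labs(x = "Patient", y = "Gene", fill = "Expression")

print(p)
```

The gradient can be changed through the low and high arguments of scale_fill_gradient(), and scale_fill_gradient2() adds a midpoint color if a diverging scale is preferred.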

Difference between paste() and paste0()

The paste() function is probably one of the most used functions in R. Its objective is to concatenate a series of strings.

The arguments of the function are:

... = The place to write the series of strings (or other R objects) to concatenate.

sep = The element which separates every term. It must be specified as a character string.

collapse = The element which joins all the results into a single string. It must be specified as a character string, and it is optional.

We can see an example where both arguments work together:
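The original examples are not preserved, so the four calls below are an illustrative sketch. The prefix "sample" and the numbers 1 to 5 are hypothetical inputs.

```r
# Hypothetical inputs: a fixed prefix and the numbers 1 to 5
r1 <- paste("sample", 1:5)                              # 1: default sep = " "
r2 <- paste("sample", 1:5, sep = "_")                   # 2: custom separator
r3 <- paste("sample", 1:5, sep = "")                    # 3: no separator, five strings
r4 <- paste("sample", 1:5, sep = "_", collapse = ", ")  # 4: sep and collapse together

r3          # "sample1" "sample2" "sample3" "sample4" "sample5"
r4          # "sample_1, sample_2, sample_3, sample_4, sample_5"
length(r4)  # 1
```

Here sep acts inside each pair of terms, while collapse acts afterwards, joining the five results into one string.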

As we can see in the fourth example, if we specify a value for the collapse argument, we obtain a single string instead of five, as in the previous example.

The difference between paste() and paste0() is the separator: in paste() the sep argument is " " by default, while paste0() always uses "" and has no sep argument.
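The difference in separators can be seen with a short comparison. The strings "gene" and 1 are hypothetical inputs chosen for illustration.

```r
with_space    <- paste("gene", 1)            # "gene 1"
without_space <- paste0("gene", 1)           # "gene1"
explicit_sep  <- paste("gene", 1, sep = "")  # "gene1"

identical(without_space, explicit_sep)       # TRUE
```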

In conclusion, paste0() is quicker to write than paste() if our objective is to concatenate strings without spaces, because we don’t have to specify the sep argument.