Fast functions for correlation and hierarchical clustering
R code examples
Peter Langfelder1 and Steve Horvath1,2
1 Dept. of Human Genetics, UC Los Angeles, 2 Dept. of Biostatistics, UC Los Angeles
Peter (dot) Langfelder (at) gmail (dot) com, SHorvath (at) mednet (dot) ucla (dot) edu
We provide several R scripts that compare the performance of our correlation and hierarchical clustering functions with that of the standard R functions. To run these examples, the packages flashClust (version 1.20 or higher) and WGCNA (version 1.13 or higher) must be installed. The R code was last updated July 1, 2015, with small updates to both code and text.
1. Example of module stability analysis using resampling of microarray samples
We provide an example of a module stability analysis that resamples microarray samples in expression data from livers of female mice of an F2 cross (Ghazalpour et al., 2006). We provide two versions of the example. The “large” version uses the full data set of over 23000 probe sets and requires a computer with at least 16 GB (32 GB preferred) of RAM. For users who do not have access to a computer with that much memory, we also provide a smaller version of the same analysis that uses only 5000 probes and will run on a standard modern desktop or laptop with at least 2 GB of memory.
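The general idea of a resampling-based stability check can be illustrated with a minimal sketch in base R. This is not the analysis script referred to on this page: the data are simulated, and the helper names (`findModules`, `randIndex`), the sizes, the choice of resampling with replacement, and the use of a fixed number of clusters are all assumptions made for illustration.

```r
# Minimal sketch of module stability under resampling (base R, simulated data).
set.seed(1)
nSamples <- 60   # hypothetical sizes, far smaller than the real data
nGenes <- 90

# Simulate three modules: genes in a module share a latent profile plus noise.
trueModule <- rep(1:3, each = nGenes / 3)
latent <- matrix(rnorm(nSamples * 3), nSamples, 3)
expr <- latent[, trueModule] +
  matrix(rnorm(nSamples * nGenes, sd = 0.5), nSamples, nGenes)

# Assign genes to modules: average-linkage clustering on 1 - |cor|,
# cut into k branches (a simplification of a real module detection step).
findModules <- function(datExpr, k = 3) {
  dissim <- as.dist(1 - abs(cor(datExpr)))
  cutree(hclust(dissim, method = "average"), k = k)
}

# Rand index: fraction of gene pairs on which two labelings agree
# (both put the pair in the same module, or both separate it).
randIndex <- function(a, b) {
  sameA <- outer(a, a, "==")
  sameB <- outer(b, b, "==")
  pairs <- upper.tri(sameA)
  mean(sameA[pairs] == sameB[pairs])
}

# Reference modules from all samples.
reference <- findModules(expr)

# Resample the samples (rows) with replacement, re-detect modules,
# and score agreement with the reference assignment.
nRuns <- 20
stability <- sapply(seq_len(nRuns), function(run) {
  keep <- sample(nSamples, replace = TRUE)
  randIndex(reference, findModules(expr[keep, , drop = FALSE]))
})
print(summary(stability))
```

Values of the Rand index close to 1 indicate that the module assignment changes little when the set of samples is perturbed. Because each run recomputes a full gene-by-gene correlation matrix and a clustering tree, the cost of such an analysis on thousands of probes is dominated by exactly the two operations discussed on this page.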
Download data and custom function for the analysis. The following two files are necessary for either version of the analysis.
- Expression data necessary to run the analysis
- R function file containing functions necessary for this analysis
R code that performs the large analysis: Please choose your preferred format of the actual R code:
R code that performs the small analysis: Please choose your preferred format of the actual R code:
2. Timing comparisons of correlation calculations
We provide several R scripts that compare the correlation calculations implemented in the WGCNA package to the standard R function cor.
- Comparison of speed suitable for a standard desktop computer. While this comparison will run on any system, for the main paper we ran it under Windows.
- Comparison of speed and quantification of errors when using a non-zero setting of the argument quick. This script is suitable for a standard desktop computer. While this comparison will run on any system, for the main paper we ran it under Windows.
- Comparison of speed suitable for a large workstation. To run this script, the computer should have at least 16 GB of memory and run a version of R that can use the full system memory (in particular, it must be a 64-bit version).
- Comparison of speed and quantification of errors when using a non-zero setting of the argument quick, a version for a large workstation. Same minimum requirements as above apply.
- Synthesis of timing results – this script puts together the timing results of correlation speed and draws Figure 2 for the main article.
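A comparison of this kind can be sketched in a few lines. The following is a small illustration, not one of the scripts listed above: the matrix size and the number of missing entries are arbitrary assumptions, and the WGCNA part runs only if the package is installed. It assumes that `WGCNA::cor` accepts the `use` and `quick` arguments as described in that package's documentation.

```r
# Small timing sketch: stats::cor versus WGCNA::cor, with and without 'quick'.
set.seed(2)
nSamples <- 200   # hypothetical sizes chosen so the sketch runs in seconds
nGenes <- 500
x <- matrix(rnorm(nSamples * nGenes), nSamples, nGenes)
x[sample(length(x), 50)] <- NA   # a few missing values, so 'quick' matters

# Baseline: the standard R function with pairwise deletion of missing data.
timeStats <- system.time(
  cStats <- stats::cor(x, use = "pairwise.complete.obs")
)[["elapsed"]]

if (requireNamespace("WGCNA", quietly = TRUE)) {
  # quick = 0 requests the exact handling of missing data.
  timeExact <- system.time(
    cExact <- WGCNA::cor(x, use = "pairwise.complete.obs", quick = 0)
  )[["elapsed"]]
  # A non-zero 'quick' trades accuracy for speed when data are missing.
  timeQuick <- system.time(
    cQuick <- WGCNA::cor(x, use = "pairwise.complete.obs", quick = 1)
  )[["elapsed"]]
  print(c(stats = timeStats, exact = timeExact, quick = timeQuick))
  # Quantify the error relative to the standard function.
  print(c(errExact = max(abs(cExact - cStats)),
          errQuick = max(abs(cQuick - cStats))))
} else {
  message("WGCNA is not installed; stats::cor took ", timeStats, " s")
}
```

Timings depend strongly on the machine, the BLAS library, and the number of threads, so a single run like this indicates only the order of magnitude.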
3. Timing comparisons of hierarchical clustering
We provide an R script that compares the performance of the hierarchical clustering implemented in the flashClust package to that of the standard R function hclust. As written, the script will run only on a large computer (see above), but it can easily be modified to run on standard desktop computers as well.
Update (October 2014): The R core team recently modified the code of the standard function hclust in package stats. The new “standard” hclust is now as fast as or faster than the flashClust function presented here. The R timing code below will still work, but flashClust will no longer be much (if at all) faster than the “standard” hclust.
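For readers who want a quick check on their own machine, a desktop-sized comparison might look like the following. This is an illustrative sketch, not the timing script referred to above: the number of objects, the simulated data, and the choice of average linkage are assumptions, and the flashClust part runs only if that package is installed.

```r
# Small timing sketch: stats::hclust versus flashClust::flashClust.
set.seed(3)
nObjects <- 1000   # hypothetical size that fits easily in desktop memory
d <- dist(matrix(rnorm(nObjects * 10), nObjects, 10))

# Time the standard function from package stats.
timeStd <- system.time(
  treeStd <- stats::hclust(d, method = "average")
)[["elapsed"]]

if (requireNamespace("flashClust", quietly = TRUE)) {
  # Time the flashClust implementation on the same distances.
  timeFlash <- system.time(
    treeFlash <- flashClust::flashClust(d, method = "average")
  )[["elapsed"]]
  print(c(hclust = timeStd, flashClust = timeFlash))
  # With continuous random data (no ties), the sorted merge heights
  # of the two trees should agree up to numerical error.
  print(max(abs(sort(treeStd$height) - sort(treeFlash$height))))
} else {
  message("flashClust is not installed; stats::hclust took ", timeStd, " s")
}
```

With a recent version of R, the two timings are expected to be similar, in line with the update note above.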