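How to draw a heatmap with R
Today I came across a FlowingData article about making a heatmap in R. heatmap is only an ordinary graphics function in R. Still, the example uses stats for the top 50 NBA players of the 2008-2009 season as an excellent demonstration, and the result looks great. If you already know some R, you can type the code straight into the R console.
If you have never used R, read on. Below I explain step by step how to turn the stats of these 50 NBA players into a heatmap in R.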
A heatmap (热图 in Chinese) is explained clearly on Wikipedia:
A heat map is a graphical representation of data where the values taken by a variable in a two-dimensional map are represented as colors. Heat maps originated in 2D displays of the values in a data matrix. Larger values were represented by small dark gray or black squares (pixels) and smaller values by lighter squares.
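The following minimal sketch in R makes the definition concrete. It assumes a small hypothetical matrix `m` with made-up values. A hypothetical helper `value_to_gray()` (not part of R) bins each value and maps larger values to darker grays, as the quote describes. Base R's `image()` then draws one colored rectangle per cell.

```r
# Hypothetical 3 x 4 data matrix (made-up values for illustration only)
m <- matrix(c(1, 5, 9, 2,
              4, 8, 3, 7,
              6, 0, 10, 5), nrow = 3, byrow = TRUE)

# Hypothetical helper: cut values into n equal-width bins and map the bins
# to grays running from light (small values) to dark (large values)
value_to_gray <- function(x, n = 5) {
  pal <- gray.colors(n, start = 0.9, end = 0.1)  # light -> dark
  pal[cut(x, breaks = n, labels = FALSE)]
}

cols <- value_to_gray(as.vector(m))

# image() puts the rows of its matrix on the x axis, so transpose m to put
# its columns along x (row 1 of m ends up at the bottom of the plot)
image(t(m), col = gray.colors(5, start = 0.9, end = 0.1), axes = FALSE)
```

Binning first and then indexing a palette is the same idea `heatmap()` relies on: its `col` argument is simply a vector of colors, one per value range.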
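The figure below is the heatmap FlowingData built with a few R functions from the stats of the top 50 NBA players of the 2008-2009 season:
The chart lists 50 players, and basketball fans will probably recognize every name on its right side. Each player has 19 stats, including games played (G), minutes played (MIN) and points (PTS). Together they form a 50-row × 19-column matrix. That is a lot of numbers, and we need a good way to display them. Enter the heatmap!
Yao Ming's 3PP (3-point percentage) cell is interesting: it stands out sharply. When I checked the value, it was 100%. Thinking back, Yao seems to have attempted a single three-pointer that season, made it, and never tried another. That is a tiny sample!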
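This sketch goes beyond the FlowingData tutorial and uses base R's `binom.test()` to show how small that sample is. `binom.test()` reports an exact (Clopper-Pearson) 95% confidence interval. For 1 make in 1 attempt, the interval runs from 2.5% to 100%. The comparison shooter with 150 makes in 400 attempts is hypothetical.

```r
# 1 made three-pointer out of 1 attempt (the 100% case described above)
ci_one <- binom.test(1, 1)$conf.int

# Hypothetical comparison: 150 makes out of 400 attempts
ci_many <- binom.test(150, 400)$conf.int

# Width of each 95% interval: the 1-for-1 interval is far wider
width_one  <- diff(as.numeric(ci_one))
width_many <- diff(as.numeric(ci_many))

print(ci_one)
print(ci_many)
```

A 100% built on one attempt tells us almost nothing about Yao's real three-point accuracy. On a color scale, however, it still lands at the extreme of its column.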
Step 0. Download R
The official R site is http://www.r-project.org. R is free, and the site offers builds for Windows, Mac and Linux (as well as the source code).
Step 1. Load the data
R can read a file directly from a URL, so load the CSV file with read.csv:
nba <- read.csv("http://datasets.flowingdata.com/ppg2008.csv", sep=",")
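To check what read.csv produces without a network connection, pass inline text through its `text` argument. The minimal sketch below assumes a tiny hypothetical file with the same layout as ppg2008.csv: a `Name` column followed by numeric stats. The player names and numbers are made up, and only a few columns are shown.

```r
# Hypothetical miniature of the NBA file: Name first, then numeric columns
csv_text <- "Name,G,MIN,PTS
Player A,79,38.6,30.2
Player B,81,37.7,28.4
Player C,82,37.7,26.8"

nba_demo <- read.csv(text = csv_text)

str(nba_demo)   # column names and types
dim(nba_demo)   # rows x columns
```

Running `str()` on the real `nba` object the same way is a quick check that the download worked before moving on.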
Step 2. Sort data
nba <- nba[order(nba$PTS),]
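`order()` returns the row indices that sort `PTS` in ascending order. Indexing with those indices reorders the whole data frame. This is a minimal sketch on a hypothetical three-row data frame; `decreasing = TRUE` gives the opposite direction. The direction matters because `heatmap()` draws the first row of its matrix at the bottom. An ascending sort therefore puts the top scorer at the top of the chart.

```r
# Hypothetical data frame with made-up points per game
df <- data.frame(Name = c("Player A", "Player B", "Player C"),
                 PTS  = c(26.8, 30.2, 28.4))

idx <- order(df$PTS)     # row indices in ascending order of PTS
df_sorted <- df[idx, ]   # the trailing comma keeps all columns

# Highest scorer first instead
df_desc <- df[order(df$PTS, decreasing = TRUE), ]
```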
Step 3. Prepare data
row.names(nba) <- nba$Name
nba <- nba[,2:20] # or nba <- nba[,-1]
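These two lines move the player names out of the data and into the row names. `heatmap()` can then use them as row labels while every remaining column stays numeric. Below is a sketch on a hypothetical two-player frame. The negative index `-1` drops the first column however many columns follow, which makes it a safer alternative to hard-coding `2:20`.

```r
# Hypothetical two-player frame (made-up numbers)
df <- data.frame(Name = c("Player A", "Player B"),
                 G    = c(79, 81),
                 PTS  = c(30.2, 28.4))

row.names(df) <- df$Name   # player names become row labels
df <- df[, -1]             # drop the Name column itself
```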
Step 4. Prepare data, again
Convert the data frame into the numeric matrix that heatmap() expects:
nba_matrix <- data.matrix(nba)
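`data.matrix()` turns every column into numbers and keeps the row names. A character or factor column would be converted to integer codes, which is why the `Name` column was removed first. Here is a sketch on a hypothetical frame:

```r
# Hypothetical frame with player names already stored as row names
df <- data.frame(G = c(79, 81), PTS = c(30.2, 28.4),
                 row.names = c("Player A", "Player B"))

m <- data.matrix(df)   # numeric matrix; row and column names preserved
```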
Step 5. Make a heatmap
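# By default R also clusters the rows and columns and draws dendrograms on the left and top;
# Rowv=NA, Colv=NA removes them and keeps the original row and column order
heatmap(nba_matrix, Rowv=NA, Colv=NA, col=cm.colors(256), revC=FALSE, scale="column")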
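`scale = "column"` matters here because the stats live on very different ranges: games played run into the dozens, while percentages stay below 1. `heatmap()` centers and scales the values in the chosen direction. For columns, that amounts to a per-column z-score, which base R's `scale()` also computes. The minimal sketch below checks this on a hypothetical matrix.

```r
# Hypothetical stats on very different scales (made-up numbers)
m <- cbind(G = c(79, 81, 60, 82), PCT = c(0.35, 0.38, 0.30, 1.00))

# Column z-scores by hand: subtract each column mean, divide by its sd
z_manual <- sweep(m, 2, colMeans(m))
z_manual <- sweep(z_manual, 2, apply(m, 2, sd), "/")

# Base R's scale() centers and scales columns the same way
z_scale <- scale(m)
```

The 1.00 entry gets by far the largest z-score in its column. This is how a single 100% like Yao Ming's ends up as the most extreme color in the 3PP column.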
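Step 6. Color selection
Switch the palette from cm.colors to heat.colors. Also widen the right margin with margins=c(5,10) so the player names are not cut off; the two numbers are the margins for column names and row names.
heatmap(nba_matrix, Rowv=NA, Colv=NA, col=heat.colors(256), revC=FALSE, scale="column", margins=c(5,10))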
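`col` is just a vector of colors, so any function that returns `n` colors can be passed in. The sketch below compares the two built-in palettes used above with a custom ramp built by `colorRampPalette()`. The white-to-steelblue endpoints are an arbitrary choice for illustration.

```r
pal_heat <- heat.colors(256)   # red through yellow to pale yellow
pal_cm   <- cm.colors(256)     # cyan through white to magenta

# Custom two-color ramp; colorRampPalette() returns a function of n
pal_blue <- colorRampPalette(c("white", "steelblue"))(256)

# Possible use with the matrix from Step 4:
# heatmap(nba_matrix, Rowv = NA, Colv = NA, col = pal_blue,
#         scale = "column", margins = c(5, 10))
```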
From Chapter 10 of Bioinformatics and Computational Biology Solutions Using R and Bioconductor:
Heatmaps, or false color images have a reasonably long history, as has the
notion of rearranging the columns and rows to show structure in the data.
They were applied to microarray data by Eisen et al. (1998) and have
become a standard visualization method for this type of data.
A heatmap is a two-dimensional, rectangular, colored grid. It displays
data that themselves come in the form of a rectangular matrix. The color
of each rectangle is determined by the value of the corresponding entry
in the matrix. The rows and columns of the matrix can be rearranged
independently. Usually they are reordered so that similar rows are placed
next to each other, and the same for columns. Among the orderings that
are widely used are those derived from a hierarchical clustering, but many
other orderings are possible. If hierarchical clustering is used, then it is
customary that the dendrograms are provided as well. In many cases the
resulting image has rectangular regions that are relatively homogeneous
and hence the graphic can aid in determining which rows (generally the
genes) have similar expression values within which subgroups of samples
(generally the columns).
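Here is a minimal sketch of the hierarchical-clustering ordering the excerpt describes, using a hypothetical matrix. Rows 1 and 3 are nearly identical, as are rows 2 and 4. `hclust(dist(m))` builds the tree, and `hc$order` gives a row order in which similar rows sit next to each other. `heatmap()` starts from the same kind of tree by default (`distfun = dist`, `hclustfun = hclust`), then reorders its branches by row means and draws the dendrogram.

```r
# Hypothetical matrix: rows r1/r3 are similar, as are rows r2/r4
m <- rbind(r1 = c(1.0, 1.0, 1.0),
           r2 = c(10.0, 10.0, 10.0),
           r3 = c(1.1, 1.0, 1.0),
           r4 = c(10.0, 10.2, 10.0))

hc <- hclust(dist(m))           # Euclidean distance, complete linkage
m_reordered <- m[hc$order, ]    # similar rows are now adjacent

rownames(m_reordered)
```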
The function heatmap is an implementation with many options. In particular,
users can control the ordering of rows and columns independently
from each other. They can use row and column labels of their own choosing
or select their own color scheme.
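The options mentioned above can be sketched on a hypothetical random matrix. `Rowv = NA` keeps the rows in their original order while the columns are still clustered, `labRow` supplies custom row labels, and `col` sets the color scheme. `heatmap()` invisibly returns the row and column orders it used.

```r
set.seed(1)
# Hypothetical 5 x 4 matrix of random values
m <- matrix(rnorm(20), nrow = 5,
            dimnames = list(paste0("gene", 1:5), paste0("s", 1:4)))

# Rows: no clustering, no dendrogram. Columns: default clustering.
hm <- heatmap(m, Rowv = NA,
              labRow = paste("row", 1:5),   # custom row labels
              col = gray.colors(32))        # custom color scheme

hm$rowInd   # 1 2 3 4 5 (rows not reordered)
hm$colInd   # column order chosen by clustering
```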
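library("ALL")            # ALL data set (also attaches Biobase for exprs())
library("genefilter")     # rowttests()
library("RColorBrewer")   # brewer.pal()
data("ALL")
# keep only the two molecular subtypes being compared
selSamples <- ALL$mol.biol %in% c("ALL1/AF4", "E2A/PBX1")
ALLs <- ALL[, selSamples]
ALLs$mol.biol <- factor(ALLs$mol.biol)   # drop unused factor levels
colnames(exprs(ALLs)) <- paste(ALLs$mol.biol, colnames(exprs(ALLs)))
meanThr <- log2(100)
g <- ALLs$mol.biol
# probes whose mean expression exceeds the threshold in group 1 or group 2
s1 <- rowMeans(exprs(ALLs)[, g == levels(g)[1]]) > meanThr
s2 <- rowMeans(exprs(ALLs)[, g == levels(g)[2]]) > meanThr
# probes with a two-group t-test p-value below 2e-04
s3 <- rowttests(ALLs, g)$p.value < 2e-04
selProbes <- (s1 | s2) & s3
ALLhm <- ALLs[selProbes, ]
hmcol <- colorRampPalette(brewer.pal(10, "RdBu"))(256)
# color bar above the columns marks each sample's subtype
spcol <- ifelse(ALLhm$mol.biol == "ALL1/AF4", "goldenrod", "skyblue")
heatmap(exprs(ALLhm), col = hmcol, ColSideColors = spcol)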
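Without Bioconductor, the same recipe can be sketched in base R. Keep rows with a high mean in at least one group and a small two-group t-test p-value, then color the column bar by group. Everything here is an assumption for illustration: the data are simulated and the thresholds are arbitrary. `t.test(..., var.equal = TRUE)` stands in for `genefilter::rowttests()`, and a hand-picked red-white-blue ramp approximates the "RdBu" palette.

```r
set.seed(42)
# Simulated expression matrix: 200 genes x 10 samples in two groups
g <- factor(rep(c("A", "B"), each = 5))
x <- matrix(rnorm(200 * 10, mean = 7), nrow = 200,
            dimnames = list(paste0("g", 1:200), paste(g, 1:10)))
x[1:20, g == "B"] <- x[1:20, g == "B"] + 5   # first 20 genes differ

meanThr <- 6.5                                # arbitrary threshold
s1 <- rowMeans(x[, g == levels(g)[1]]) > meanThr
s2 <- rowMeans(x[, g == levels(g)[2]]) > meanThr

# Per-row equal-variance t-test p-values
pvals <- apply(x, 1, function(r)
  t.test(r[g == levels(g)[1]], r[g == levels(g)[2]],
         var.equal = TRUE)$p.value)
s3 <- pvals < 1e-3
sel <- (s1 | s2) & s3

hmcol <- colorRampPalette(c("firebrick", "white", "steelblue"))(256)
spcol <- ifelse(g == "A", "goldenrod", "skyblue")   # one color per column
heatmap(x[sel, ], col = hmcol, ColSideColors = spcol)
```

`ColSideColors` must supply one color per column of the plotted matrix, which is why `spcol` is built from the group factor `g`.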
Run help(heatmap) to read the documentation and try the examples it provides.
