t_kahi’s blog

Articles about KNIME, CellProfiler, and drug discovery, plus day-to-day notes

Comparison of diversity indices (Quadratic entropy, Shannon entropy & Simpson index)

I previously posted about the calculation of Quadratic Entropy (QE).

www.t-kahi.com

Today, I will calculate and compare three diversity indices: Quadratic Entropy (QE), Shannon entropy, and the Simpson index.

These diversity indices have been used to evaluate tumor cell diversity: Evaluating tumor heterogeneity in immunohistochemistry-stained breast cancer tissue | Laboratory Investigation
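For reference, writing p_i for the probability of bin i and d_ij for the distance between bins i and j, the three indices are computed as follows. The factor 1/2 in QE matches the code below and my previous post; Rao's quadratic entropy is sometimes written without it.

QE = \frac{1}{2}\sum_{i,j} d_{ij}\, p_i\, p_j, \qquad H = -\sum_i p_i \log_2 p_i, \qquad \lambda = \sum_i p_i^2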


As in the previous post, I calculate QE, Shannon entropy, and the Simpson index for the same simulated intensity data.

library(ggplot2)

# Intensity range used for binning
min <- 0
max <- 20

# Simulate a 50:50 mixture of two normal distributions (sd = 1)
hist1 <- rnorm(500, 10, 1)
hist2 <- rnorm(500, 10, 1)
hist <- c(hist1, hist2)
data <- data.frame(intensity = hist)
head(data)

# Number of bins by Sturges' rule
len <- length(data$intensity)
K <- 1 + log2(len)

plt <- ggplot(data, aes(x = intensity)) +
  geom_histogram(bins = round(K))

plt

# Bin width and break points covering [min, max]
add <- max / K

break_data <- seq(min, max, add)
break_data <- c(break_data, break_data[length(break_data)] + add)

# Assign each intensity value to an integer bin code
data$bins <- cut(data$intensity, breaks = break_data, labels = FALSE)

# Count observations per bin and zero-fill bins that received no data
data <- na.omit(data)
hist_data <- data.frame(table(data$bins))

result <- data.frame(Num = seq_len(length(break_data)))
result <- merge(result, hist_data, by.x = "Num", by.y = "Var1", all = TRUE)
result[is.na(result)] <- 0

result$"Freq" <- result$"Freq"/sum(result$"Freq")
result$"Num" <- (result$"Num"-min(result$"Num"))/(max(result$"Num")-min(result$"Num"))

distance <-  dist(result$"Num",method = "euclidean")

D <- as.matrix(distance)
p <- as.vector(result$"Freq")

# Quadratic Entropy: QE = p' D p / 2
QE <- c(crossprod(p, D %*% p)) / 2
QE

# Shannon Entropy: H = -sum(p * log2(p)); 1e-15 avoids log2(0)
shannon <- -sum(result$Freq * log2(result$Freq + 1e-15))
shannon

# Simpson index: sum of squared bin probabilities
simp <- sum(result$Freq * result$Freq)
simp

I tested multiple histograms, varying the rnorm mean values and the number of data points.
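To make such a test easy to repeat, the whole calculation above can be wrapped in a helper function. This is a minimal sketch; the function name diversity_indices and its arguments are my own, not part of the original code.

diversity_indices <- function(mean1, mean2, n1 = 500, n2 = 500,
                              min = 0, max = 20) {
  # Simulate a two-component mixture (sd = 1 for both components)
  intensity <- c(rnorm(n1, mean1, 1), rnorm(n2, mean2, 1))
  K <- 1 + log2(length(intensity))            # Sturges' rule
  add <- max / K
  breaks <- seq(min, max, add)
  breaks <- c(breaks, breaks[length(breaks)] + add)
  bins <- cut(intensity, breaks = breaks, labels = FALSE)
  # Zero-filled bin counts, as in the merge/all = TRUE step above
  freq <- tabulate(bins[!is.na(bins)], nbins = length(breaks))
  p <- freq / sum(freq)                       # bin probabilities
  x <- (seq_along(p) - 1) / (length(p) - 1)   # bin positions scaled to [0, 1]
  D <- as.matrix(dist(x))                     # pairwise bin distances
  c(QE      = c(crossprod(p, D %*% p)) / 2,
    Shannon = -sum(p * log2(p + 1e-15)),
    Simpson = sum(p^2))
}

For example, diversity_indices(10, 12) returns the three indices for a 50:50 mix whose means differ by 2.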

The following figure shows the calculated QE, Shannon entropy, and Simpson index.
The 50:50 mix consists of two unit-variance normal distributions, and the mean of one distribution is shifted as follows:
[Figure: QE, Shannon entropy, and Simpson index for 50:50 mixtures with increasing separation between the two means]
These data show that QE increases as the two components of the histogram become more separated.
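With the helper sketched above, a comparable 50:50 sweep can be run like this (the mean values here are my own illustrative choices, not the exact ones in the figure):

# 50:50 mix: shift the second mean away from the first
sapply(c(10, 11, 12, 13, 14), function(m2) diversity_indices(10, m2))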

Next, the following figure shows the results for a 10:1 mix of two unit-variance normal distributions.
[Figure: QE, Shannon entropy, and Simpson index for 10:1 mixtures with increasing separation between the two means]
Shannon entropy and the Simpson index are not sensitive to the small 10% subpopulation; in contrast, QE increases steadily with the separation.
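The 10:1 case can be reproduced the same way by shrinking the second component to 50 points (again, illustrative values):

# 10:1 mix: the second component is a 10% subpopulation
sapply(c(10, 11, 12, 13, 14), function(m2)
  diversity_indices(10, m2, n1 = 500, n2 = 50))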

These data show that quadratic entropy is better at detecting small distributional differences than Shannon entropy or the Simpson index.
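A tiny hand-made example (my own, not from the paper) shows why: Shannon entropy and the Simpson index depend only on the bin probabilities, while QE also uses the distances between the occupied bins.

x <- c(0, 1/3, 2/3, 1)              # normalized positions of 4 bins
D <- as.matrix(dist(x))             # pairwise bin distances

p_near <- c(0.5, 0.5, 0, 0)         # mass in two adjacent bins
p_far  <- c(0.5, 0, 0, 0.5)         # mass in two distant bins

QE      <- function(p) c(crossprod(p, D %*% p)) / 2
shannon <- function(p) -sum(p * log2(p + 1e-15))
simpson <- function(p) sum(p^2)

QE(p_near); QE(p_far)               # ~0.083 vs 0.25: QE sees the separation
shannon(p_near); shannon(p_far)     # both 1 bit
simpson(p_near); simpson(p_far)     # both 0.5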