kmeans: K-means
Paper link
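library(mclust)  # adjustedRandIndex
library(Rtsne)
# data: genes x cells, so transpose to get one row per cell
res <- kmeans(t(data), centers = 9)
adjustedRandIndex(res$cluster, meta$label)
# cluster centers projected on the first two features, one colour per center
plot(res$centers[, 1:2], col = topo.colors(9))
# t-SNE on cells; for small datasets the perplexity argument may need lowering
tsne_out <- Rtsne(t(data))
plot(tsne_out$Y, col = topo.colors(9)[res$cluster])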
SAIC: combines k-means and ANOVA within the iterative clustering procedure
SCUBA: k-means; uses the gap statistic to identify bifurcation events
scVDMC: single-cell variance-driven multi-task clustering
pcaReduce:
Paper link
library(pcaReduce)
library(mclust)  # adjustedRandIndex
res <- PCAreduce(t(data), nbt = 1, q = 7, method = "S")
res[[1]]  # cluster assignments from the first (and only) run
adjustedRandIndex(res[[1]][, 1], meta$label)
k-medoids
library(fpc)     # pamk
library(mclust)  # adjustedRandIndex
res <- pamk(data = t(data), krange = 7)
adjustedRandIndex(res$pamobject$clustering, meta$label)
BackSPIN:two-way biclustering algorithm;
cellTree: builds a minimum spanning tree;
CIDR: imputation of missing values
Paper link
# Rows correspond to features (genes, transcripts, etc.) and columns correspond to cells.
library(cidr)
library(mclust)  # adjustedRandIndex
load("/Biase.Rdata")
cellType <- factor(meta$label)
# One colour per cell type (the palette covers up to 8 types).
scols <- c("red", "blue", "green", "brown", "pink", "purple", "darkgreen", "grey")
cols <- scols[as.integer(cellType)]
# nPC: number of principal coordinates, by default 4.
# nCluster: the number of clusters.
sdata <- scDataConstructor(as.matrix(data))  # build the scData object
sdata <- determineDropoutCandidates(sdata)   # identify dropout candidates
sdata <- wThreshold(sdata)                   # compute the weights
sdata <- scDissim(sdata)                     # compute the dissimilarity matrix
sdata <- scPCA(sdata)                        # principal coordinate analysis (PCoA)
sdata <- nPC(sdata)                          # determine the number of principal coordinates
n_pc <- sdata@nPC                            # kept as n_pc so the nPC() function is not shadowed
nCluster(sdata)                              # plot to help choose the number of clusters
sdata <- scCluster(sdata, nPC = n_pc)        # CIDR hierarchical clustering
adjustedRandIndex(sdata@clusters, meta$label)
sdata@nCluster
plot(
  sdata@PC[, c(1, 2)],
  col = cols,
  pch = sdata@clusters,
  main = "CIDR",
  xlab = "PC1",
  ylab = "PC2"
)
RCA: reference component analysis
Paper link
GMM: Gaussian mixture model
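library(mclust)  # Mclust, adjustedRandIndex
pc_res <- prcomp(t(data))$x
tmp_pca_mat <- pc_res[, 1:10]            # keep the first 10 principal components
res <- Mclust(tmp_pca_mat, G = 2:10)     # number of components chosen by BIC
clusterid <- apply(res$z, 1, which.max)  # most probable component for each cell
adjustedRandIndex(clusterid, meta$label)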
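TSCAN: uses GMM and MST to find the pseudo-time ordering
Paper link
TCC: transcript compatibility counts;
SIMLR: learns a similarity measure from single-cell RNA-seq data to perform dimension reduction, clustering, and visualization
Paper link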
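library(SIMLR)
library(Seurat)
library(mclust)  # adjustedRandIndex
# Elbow plot to help choose the number of clusters. ElbowPlot() needs PCA
# results, so the Seurat object is kept separate and run through RunPCA() first
# (for very small datasets the npcs argument of RunPCA() may need lowering).
seu <- CreateSeuratObject(data)
seu <- RunPCA(ScaleData(FindVariableFeatures(NormalizeData(seu))))
ElbowPlot(seu)
# SIMLR expects a numeric genes x cells matrix, not a Seurat object.
SIMLR_res <- SIMLR(X = as.matrix(data), c = 3)  # c: number of clusters
adjustedRandIndex(SIMLR_res$y$cluster, meta$label)
plot(SIMLR_res$ydata,
     col = topo.colors(7)[as.integer(factor(meta$label))],
     pch = 20)
heatmap(SIMLR_res$S)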
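SNN-cliq: clique detection;
① Compute the similarity between the initial data points (Euclidean distance);
② Use the similarity matrix to list the k nearest neighbors (KNN) of each data point;
③ Compute a secondary similarity matrix from the shared nearest neighbors (SNN) of every pair of data points;
④ Build the SNN graph, in which nodes represent data points and edges represent the similarity between them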
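Steps ① to ④ can be sketched in base R as follows. This is a simplified illustration, not the SNN-cliq implementation: the secondary similarity here is just the count of shared neighbors, and the toy data and k = 2 are assumptions made for the example.

```r
# Toy data: 6 points in 2-D forming two obvious groups (illustrative values).
x <- rbind(c(0, 0), c(0, 1), c(1, 0),
           c(10, 10), c(10, 11), c(11, 10))
k <- 2  # number of nearest neighbors (assumed for this toy example)
n <- nrow(x)

# Step 1: pairwise Euclidean distances.
d <- as.matrix(dist(x))

# Step 2: k nearest neighbors of every point (position 1 of the ordering is the point itself).
knn <- t(apply(d, 1, function(row) order(row)[2:(k + 1)]))

# Step 3: secondary similarity = number of shared neighbors, where each
# point's neighborhood is taken to include the point itself.
snn <- matrix(0, n, n)
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    if (i != j) {
      snn[i, j] <- length(intersect(c(i, knn[i, ]), c(j, knn[j, ])))
    }
  }
}

# Step 4: SNN graph as an adjacency matrix; an edge exists when two points
# share at least one neighbor, and snn holds the edge weights.
adj <- snn > 0
print(snn)
```

In this toy example the two groups end up as two disconnected components of the SNN graph, which is the structure a clique or community search would then pick up.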
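Louvain: clusters cells with a community detection algorithm. A network is first built from the scRNA-seq data, with nodes representing cells and edges representing the similarity between cells; the community detection algorithm then partitions this network. The clustering result depends heavily on how the similarity network is constructed.
Paper link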
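The two stages described for Louvain (build a similarity network, then partition it) can be sketched as below. This is a minimal sketch that assumes the igraph package is installed; the toy data, k = 2, and the Gaussian edge weights are illustrative choices and not part of any specific scRNA-seq pipeline.

```r
library(igraph)

# Toy data: 6 "cells" in 2-D forming two groups (illustrative values).
x <- rbind(c(0, 0), c(0, 1), c(1, 0),
           c(10, 10), c(10, 11), c(11, 10))
n <- nrow(x)
k <- 2  # neighbors per cell (assumed)

# Stage 1: build a weighted kNN similarity network.
d <- as.matrix(dist(x))
adj <- matrix(0, n, n)
for (i in seq_len(n)) {
  nn <- order(d[i, ])[2:(k + 1)]   # skip position 1, the cell itself
  adj[i, nn] <- exp(-d[i, nn]^2)   # Gaussian similarity as edge weight
}
adj <- pmax(adj, t(adj))           # symmetrize: keep an edge if either cell lists the other

g <- graph_from_adjacency_matrix(adj, mode = "undirected",
                                 weighted = TRUE, diag = FALSE)

# Stage 2: partition the network with the Louvain algorithm.
comm <- cluster_louvain(g)
m <- as.integer(membership(comm))
print(m)
```

Changing k or the weighting function changes the network and can change the partition, which is the dependence on network construction noted above.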
DBSCAN:
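① Start from a random unvisited data point x and find all neighboring points within radius eps;
② If the eps-neighborhood of x contains enough points (at least minPts), a cluster is started and x becomes the first core point of the new cluster. Otherwise x is labeled as noise. In either case x is marked as "visited";
③ For each core point x in the new cluster, all points within its eps-neighborhood are assigned to the same cluster. The same procedure is then repeated for every point newly added to the cluster.
④ Repeat steps ② and ③ until all points of the cluster have been determined, that is, until every point in the eps-neighborhoods of the cluster has been visited and labeled.
⑤ Once this cluster is complete, continue with a new unvisited point, which leads to a new cluster or to noise. Repeat until every point has been marked as visited; at that point all points have been clustered.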
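The steps above can be sketched in base R as follows. This is an illustrative sketch rather than the algorithm as implemented in the dbscan package: points are visited in index order instead of at random, the function name dbscan_sketch is hypothetical, and the toy data, eps, and minPts are assumptions.

```r
# Minimal DBSCAN sketch. Returns an integer vector: 0 = noise, 1..K = cluster id.
dbscan_sketch <- function(x, eps, minPts) {
  d <- as.matrix(dist(x))
  n <- nrow(d)
  cluster <- integer(n)   # 0 means noise or not yet assigned
  visited <- logical(n)
  current <- 0L
  for (i in seq_len(n)) {
    if (visited[i]) next
    visited[i] <- TRUE
    neighbours <- which(d[i, ] <= eps)       # eps-neighborhood, includes i itself
    if (length(neighbours) < minPts) next    # noise for now; may become a border point later
    current <- current + 1L                  # i is a core point: start a new cluster
    cluster[i] <- current
    queue <- setdiff(neighbours, i)
    while (length(queue) > 0) {
      j <- queue[1]
      queue <- queue[-1]
      if (cluster[j] == 0L) cluster[j] <- current  # claim unassigned or noise points
      if (visited[j]) next
      visited[j] <- TRUE
      nb_j <- which(d[j, ] <= eps)
      if (length(nb_j) >= minPts) {
        # j is also a core point: expand through its unassigned neighbors
        queue <- union(queue, nb_j[cluster[nb_j] == 0L])
      }
    }
  }
  cluster
}

# Toy data: two squares of four points each and one isolated outlier.
x <- rbind(c(0, 0), c(0, 1), c(1, 0), c(1, 1),
           c(10, 10), c(10, 11), c(11, 10), c(11, 11),
           c(50, 50))
res <- dbscan_sketch(x, eps = 1.5, minPts = 3)
print(res)
```

The outlier stays at 0 because its eps-neighborhood contains only itself, while each square forms one cluster.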
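library(dbscan)
library(mclust)  # adjustedRandIndex
# k-nearest-neighbor distance plot; the "knee" of the curve suggests a value for eps
kNNdistplot(t(data), k = 5)
res <- dbscan::dbscan(t(data), minPts = 5, eps = 340)
res$cluster  # cluster 0 denotes noise points
adjustedRandIndex(res$cluster, meta$label)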
GiniClust: discovers rare subpopulations
Paper link
Monocle
Paper link
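density peak clustering: relies on the distances between data points rather than on a density threshold, and assumes that cluster centers are local maxima of the data-point density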
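A minimal base-R sketch of the density-peak idea follows. It assumes the common formulation in which each point gets a local density rho and a distance delta to the nearest point of higher density, with centers chosen as the points having the largest rho * delta; the toy data, the cutoff dc, and the choice of two centers are assumptions made for illustration.

```r
# Toy data: two groups, each with four corner points and one central point.
x <- rbind(c(0, 0), c(0, 1), c(1, 0), c(1, 1), c(0.5, 0.5),
           c(10, 10), c(10, 11), c(11, 10), c(11, 11), c(10.5, 10.5))
dc <- 1.0        # cutoff distance for the density estimate (assumed)
n_centers <- 2   # number of cluster centers to pick (assumed)

d <- as.matrix(dist(x))
n <- nrow(d)

# Local density: number of other points closer than dc.
rho <- rowSums(d < dc) - 1

# delta: distance to the nearest point that comes earlier in the density
# ordering; the densest point gets its maximum distance instead.
ord <- order(rho, decreasing = TRUE)
delta <- numeric(n)
nearest_higher <- integer(n)
delta[ord[1]] <- max(d[ord[1], ])
for (pos in 2:n) {
  i <- ord[pos]
  higher <- ord[1:(pos - 1)]
  j <- higher[which.min(d[i, higher])]
  delta[i] <- d[i, j]
  nearest_higher[i] <- j
}

# Centers are density peaks that are also far from any denser point.
gamma <- rho * delta
centers <- order(gamma, decreasing = TRUE)[1:n_centers]

# Assign every other point to the cluster of its nearest denser neighbor,
# walking in decreasing density so that neighbor is already labeled.
# (This sketch assumes the densest point is among the chosen centers.)
cluster <- integer(n)
cluster[centers] <- seq_len(n_centers)
for (i in ord) {
  if (cluster[i] == 0L) cluster[i] <- cluster[nearest_higher[i]]
}
print(cluster)
```

No density threshold is needed to separate the groups: the two central points stand out because they combine a high density with a large distance to any denser point.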
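SOM: competitive learning for clustering; stochastic gradient descent; sensitive to parameter tuning (learning rate)
SCRAT: single-cell R-analysis tools; visualizes 2D heat maps that represent correlations among single-cell genes
SOMSC: compresses high-dimensional gene expression data into two dimensions, used for cellular state transition identification and pseudotemporal ordering of cells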
SC3
Paper link
library(SingleCellExperiment)
library(SC3)
library(scater)  # runPCA, plotPCA
library(mclust)  # adjustedRandIndex
sce <- SingleCellExperiment(assays = list(
  counts = as.matrix(data),
  logcounts = log2(as.matrix(data) + 1)
))
# define feature names in feature_symbol column
rowData(sce)$feature_symbol <- rownames(sce)
# remove features with duplicated names
sce <- sce[!duplicated(rowData(sce)$feature_symbol), ]
sce <- runPCA(sce)
# optionally let SC3 estimate the number of clusters
sce <- sc3_estimate_k(sce)
metadata(sce)$sc3$k_estimation
# cluster with k = 3; the plots and the ARI below all refer to this k
res <- sc3(sce, ks = 3)
sc3_plot_consensus(res, k = 3)
sc3_plot_silhouette(res, k = 3)
adjustedRandIndex(res$sc3_3_clusters, meta$label)
plotPCA(res, colour_by = "sc3_3_clusters")
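RAFSIL: first performs feature construction on the data, then learns cell-to-cell similarities. It can be used for typical exploratory data analysis tasks such as dimension reduction, visualization, and clustering.
Paper link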
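library(RAFSIL)
library(mclust)  # adjustedRandIndex
# embedding_data (expression matrix) and label (true cell labels) must already be defined.
# Variant 1: RAFSIL1
cluster_result <- RAFSIL(data = embedding_data,
                         NumC = 6,
                         method = "RAFSIL1")$lab
# Variant 2: RAFSIL2; this call uses the transposed matrix and overwrites the RAFSIL1 result
cluster_result <- RAFSIL(data = t(embedding_data),
                         NumC = 6,
                         method = "RAFSIL2")$lab
final_ARI <- adjustedRandIndex(cluster_result, label)
print(final_ARI)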
LAK
Paper link
library(mclust)                # adjustedRandIndex
library(SingleCellExperiment)  # assays(), colData()
setwd("/LAK-master")
source("LAK.R")
#Biase <- readRDS("Single Cell Data/biase.rds")
yan <- readRDS("/yan.rds")
m <- assays(yan)[[1]][, -(50:56)]  # drop cells 50 to 56
LAK_ann <- LAK(m, 3)
yan_ann <- colData(yan)$cell_type1[-(50:56)]
# convert the cell-type labels to integer codes
yan_ann_numeric <- as.integer(factor(yan_ann))
adjustedRandIndex(LAK_ann[[1]]$Cs, yan_ann_numeric)