将单细胞的表达矩阵输出为matrix.mtx.gz

前言,今天在群里看到有人提出说遇到一个稀疏矩阵转普通矩阵的报错问题,感觉这个问题自己以后可能也有遇到,所以做了这个报错的可能解决办法的记录。


报错例子

这里举出两种语言去将单细胞的表达矩阵输出为matrix.mtx.gz

Python 方式

加载库

# 加载库
from scipy.io import mmread
import pandas as pd
import numpy as np

读取10X单细胞文件

_index = pd.read_csv("./features.tsv.gz", index_col=0,sep = '\t',header=None)
_index.index.name =None #把索引列的列名去掉
_col   = pd.read_csv("./barcodes.tsv.gz", index_col=0,sep = '\t',header=None)
_col.index.name =None #把列名向量的名去掉
_data  = mmread("./matrix.mtx.gz").todense()

将稀疏矩阵转换成DataFrame

rna_count = pd.DataFrame(data=_data,index = _index.index,columns=_col.index)
print(rna_count .iloc[0:3,0:2])
print("gene_ID_len : "+str(rna_count .shape[0]))  #获取表达矩阵基因长度

对pd类型的表达矩阵简单标准化(可选)

rna_count  = ( rna_count +1 ).applymap(np.log2)

将数据输出为cellranger标准格式

import os
import shutil 
import gzip
import scipy
import time
fmt='%Y-%m-%d %a %H:%M:%S'
Date=time.strftime(fmt,time.localtime(time.time()))
outdir = ".Matrix_reAnno"
os.makedirs(outdir, exist_ok=True)
##save matrix.mtx.gz
reAnno_count_sparse_mtx = scipy.sparse.coo_matrix(rna_countrna_count_combine.values)
scipy.io.mmwrite(os.path.join(outdir,'matrix.mtx'),
                 reAnno_count_sparse_mtx,
                 comment='This counts is regenerate and remapped symbol by zhuzhiyong \n Generate DateTime::'+str(Date)
                )
with open(os.path.join(outdir,'matrix.mtx'),'rb') as mtx_in:
        with gzip.open(os.path.join(outdir,'matrix.mtx') + '.gz','wb') as mtx_gz: #创建一个读写文件'matrix.mtx.gz',用以将matrix.mtx拷贝过去
            shutil.copyfileobj(mtx_in, mtx_gz)
os.remove(os.path.join(outdir,'matrix.mtx'))
##save barcodes.tsv.gz
barcodesFile = pd.DataFrame(rna_countrna_count_combine.columns)
barcodesFile.to_csv(os.path.join(outdir,"barcodes.tsv.gz"),sep='\t',header =False,index=False)
##save features.tsv.gz
featuresFile = pd.DataFrame(rna_countrna_count_combine.index)
featuresFile.to_csv(os.path.join(outdir,"features.tsv.gz"),sep='\t',header =False,index=False)

R语言方式

写出expr counts 为matrix.mtx.gz

library(Matrix)
sparse.gbm <- Matrix(scRNA@assays$RNA@counts, sparse = T )
write(x = sparse.gbm@Dimnames[[1]], file = "features.tsv")
write.table([email protected], file = 'scRNA_ref_meta.tsv', sep = '\t', quote = FALSE)
writeMM(obj = sparse.gbm, file="matrix.mtx")
system("gzip matrix.mtx")  #创建压缩文件并删除原文件 matrix.mtx.gz
scales::number_bytes(file.size("matrix.mtx.gz"))

这部分内容参考推文

如果我还是希望可以将稀疏矩阵转为普通矩阵呢,那怎么办

州更大佬写了个Rcpp加速的转换函数,见下:

Rcpp::sourceCpp(code='
#include 
using namespace Rcpp;


// [[Rcpp::export]]
IntegerMatrix asMatrix(NumericVector rp,
                       NumericVector cp,
                       NumericVector z,
                       int nrows,
                       int ncols){

  int k = z.size() ;

  IntegerMatrix  mat(nrows, ncols);

  for (int i = 0; i < k; i++){
      mat(rp[i],cp[i]) = z[i];
  }

  return mat;
}
' )


as_matrix <- function(mat){

  row_pos <- mat@i
  col_pos <- findInterval(seq(mat@x)-1,mat@p[-1])
  
  tmp <- asMatrix(rp = row_pos, cp = col_pos, z = mat@x,
                 nrows =  mat@Dim[1], ncols = mat@Dim[2])

  row.names(tmp) <- mat@Dimnames[[1]]
  colnames(tmp) <- mat@Dimnames[[2]]
  return(tmp)
}

你可能感兴趣的:(将单细胞的表达矩阵输出为matrix.mtx.gz)