Concatenation — anndata 0.9.0.dev38+g3c5f63d documentation
With concat(), AnnData objects can be combined via a composition of two operations: concatenation and merging.
Concatenation keeps all sub-elements of each object and stacks them in order.
Merging combines a set of collections into a single collection that contains elements from the input objects.
Note
This function borrows from similar functions in pandas and xarray. Arguments that control concatenation are modeled after pandas.concat(), while strategies for merging are inspired by the compat argument of xarray.merge().
Let’s start off with an example:
>>> import scanpy as sc, anndata as ad, numpy as np, pandas as pd
>>> from scipy import sparse
>>> from anndata import AnnData
>>> pbmc = sc.datasets.pbmc68k_reduced()
>>> pbmc
AnnData object with n_obs × n_vars = 700 × 765
    obs: 'bulk_labels', 'n_genes', 'percent_mito', 'n_counts', 'S_score', 'G2M_score', 'phase', 'louvain'
    var: 'n_counts', 'means', 'dispersions', 'dispersions_norm', 'highly_variable'
    uns: 'bulk_labels_colors', 'louvain', 'louvain_colors', 'neighbors', 'pca', 'rank_genes_groups'
    obsm: 'X_pca', 'X_umap'
    varm: 'PCs'
    obsp: 'distances', 'connectivities'
>>> groups = pbmc.obs.groupby("louvain").indices
>>> pbmc_concat = ad.concat([pbmc[inds] for inds in groups.values()], merge="same")
>>> assert np.array_equal(pbmc.X, pbmc_concat[pbmc.obs_names].X)
>>> pbmc_concat
AnnData object with n_obs × n_vars = 700 × 765
    obs: 'bulk_labels', 'n_genes', 'percent_mito', 'n_counts', 'S_score', 'G2M_score', 'phase', 'louvain'
    var: 'n_counts', 'means', 'dispersions', 'dispersions_norm', 'highly_variable'
    obsm: 'X_pca', 'X_umap'
    varm: 'PCs'
Note that we concatenated along the observations by default, and that most elements aligned to the observations were concatenated as well. A notable exception is obsp, which is dropped by default because it is not obvious that combining graphs or distance matrices padded with 0s is particularly useful, and the result may be unintuitive. Concatenating obsp can be enabled with the pairwise keyword argument.