smart.utils

smart.utils.clustering(adata, n_clusters=7, key='emb', add_key='SMART', method='SMART', start=0.1, end=3.0, increment=0.01, use_pca=False, n_comps=20)

Perform clustering on latent representations with multiple supported methods.

Parameters:
  • adata (anndata.AnnData) – AnnData object, as used by scanpy.

  • n_clusters (int, default=7) – Number of clusters.

  • key (str, default="emb") – Key of input representation in adata.obsm.

  • add_key (str, default="SMART") – Key to store clustering results in adata.obs.

  • method (str, default="SMART") – Clustering method. Options: ["mclust", "leiden", "louvain", "gmm", "kmeans"].

  • start (float, default=0.1) – Start resolution for search (used in leiden/louvain).

  • end (float, default=3.0) – End resolution for search (used in leiden/louvain).

  • increment (float, default=0.01) – Step size for resolution search.

  • use_pca (bool, default=False) – Whether to reduce dimensions using PCA.

  • n_comps (int, default=20) – Number of components for PCA if use_pca=True.

Returns:

Updates adata.obs[add_key] with clustering results.

Return type:

None

smart.utils.getcolordict(adata, my_cluster, true_cluster, colordict)

Map predicted clusters to the colors of their matching true clusters using a color dictionary.

Parameters:
  • adata (anndata.AnnData) – AnnData object with clustering results.

  • my_cluster (str) – Column name of predicted clusters in adata.obs.

  • true_cluster (str) – Column name of true clusters in adata.obs.

  • colordict (dict) – Dictionary mapping true clusters to colors.

Returns:

Mapping from predicted cluster IDs to colors.

Return type:

dict
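
The reference above does not say how a predicted cluster is matched to a true cluster. The following is a minimal sketch that assumes a majority-overlap rule: each predicted cluster takes the color of the true cluster it shares the most observations with. The function name `map_cluster_colors` and the toy labels are hypothetical and are not part of smart.utils. It works on plain label sequences rather than on adata.obs columns.

```python
from collections import Counter, defaultdict


def map_cluster_colors(pred_labels, true_labels, colordict):
    """Assign each predicted cluster the color of its best-overlapping true cluster.

    Assumption: "best" means the true cluster that co-occurs most often
    with the predicted cluster (majority vote).
    """
    overlap = defaultdict(Counter)
    for pred, true in zip(pred_labels, true_labels):
        overlap[pred][true] += 1
    # most_common(1) returns [(true_label, count)] for the dominant true cluster.
    return {
        pred: colordict[counts.most_common(1)[0][0]]
        for pred, counts in overlap.items()
    }


# Toy data standing in for adata.obs[my_cluster] and adata.obs[true_cluster].
pred = ["0", "0", "1", "1", "1", "2"]
true = ["L1", "L1", "L2", "L2", "L1", "L3"]
colors = {"L1": "#1f77b4", "L2": "#ff7f0e", "L3": "#2ca02c"}

mapping = map_cluster_colors(pred, true, colors)
print(mapping)
```

A mapping of this shape can be passed as a palette when plotting predicted clusters, so that matching domains share a color with the ground-truth plot.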

smart.utils.harmony(adata, feature_labels, batch_labels, use_gpu=True)

Perform batch correction using Harmony.

Parameters:
  • adata (anndata.AnnData) – AnnData object containing features in .obsm.

  • feature_labels (str) – Key in adata.obsm for feature representation.

  • batch_labels (str) – Key in adata.obs for batch labels.

  • use_gpu (bool, default=True) – Whether to use GPU acceleration.

Returns:

Updates adata.obsm with corrected representation: {feature_labels}_harmony.

Return type:

None

smart.utils.mclust_R(adata, num_cluster, modelNames='EEE', used_obsm='emb_pca', random_seed=2020)

Perform clustering using R package mclust.

Parameters:
  • adata (anndata.AnnData) – AnnData object containing representation in .obsm.

  • num_cluster (int) – Number of clusters.

  • modelNames (str, default="EEE") – Model type in mclust.

  • used_obsm (str, default="emb_pca") – Key in adata.obsm to use for clustering.

  • random_seed (int, default=2020) – Random seed for reproducibility.

Returns:

adata – Updated AnnData with adata.obs['mclust'].

Return type:

anndata.AnnData

smart.utils.pca(adata, use_reps=None, n_comps=10)

Perform dimensionality reduction using PCA.

Parameters:
  • adata (anndata.AnnData) – AnnData object.

  • use_reps (str, optional) – Key in adata.obsm to use as input features. If None, use adata.X.

  • n_comps (int, default=10) – Number of principal components.

Returns:

PCA-reduced features of shape [n_samples, n_comps].

Return type:

np.ndarray

smart.utils.search_res(adata, n_clusters, method='leiden', use_rep='emb', start=0.1, end=3.0, increment=0.01)

Search for a resolution value that yields the desired number of clusters.

Parameters:
  • adata (anndata.AnnData) – AnnData object.

  • n_clusters (int) – Target number of clusters.

  • method (str, default="leiden") – Clustering method. Options: ["leiden", "louvain"].

  • use_rep (str, default="emb") – Representation key for clustering.

  • start (float, default=0.1) – Start resolution.

  • end (float, default=3.0) – End resolution.

  • increment (float, default=0.01) – Resolution step size.

Returns:

res – Resolution value that yields n_clusters clusters.

Return type:

float
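
The search described by start, end, and increment can be pictured as a linear scan over a resolution grid. The sketch below rests on several assumptions: the grid is scanned in ascending order, the end value is included, and the first match is returned. The actual scan order and stopping rule are not stated in this reference. `search_resolution` and `toy_count` are hypothetical names. `toy_count` stands in for a Leiden or Louvain run and simply returns more clusters as the resolution grows.

```python
from typing import Callable


def search_resolution(
    count_clusters: Callable[[float], int],
    n_clusters: int,
    start: float = 0.1,
    end: float = 3.0,
    increment: float = 0.01,
) -> float:
    """Return the first resolution on the grid that yields n_clusters clusters."""
    steps = int(round((end - start) / increment))
    for i in range(steps + 1):
        # Build each value from the index to avoid accumulating float error.
        res = round(start + i * increment, 10)
        if count_clusters(res) == n_clusters:
            return res
    raise ValueError(
        f"No resolution in [{start}, {end}] gave {n_clusters} clusters."
    )


def toy_count(res: float) -> int:
    """Hypothetical stand-in for clustering: cluster count grows with resolution."""
    return 1 + int(res * 4)


res = search_resolution(toy_count, n_clusters=7)
print(res, toy_count(res))
```

With a real clustering call in place of `toy_count`, every grid point costs one full clustering run, so a larger increment or a narrower [start, end] range shortens the search.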

smart.utils.set_seed(seed=2024)

Set random seed for reproducibility across Python, NumPy, PyTorch, and CUDA.

Parameters:
  • seed (int, default=2024) – Random seed value.

Return type:

None
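
For orientation, seeding Python, NumPy, PyTorch, and CUDA typically involves calls like the ones below. This is a sketch under assumptions, not the implementation of smart.utils.set_seed: the name `set_seed_sketch` is hypothetical, the cuDNN flags are a common but optional choice, and NumPy and PyTorch are imported lazily so the snippet still runs if either is missing.

```python
import random


def set_seed_sketch(seed: int = 2024) -> None:
    """Seed the common random number generators (illustrative only)."""
    random.seed(seed)  # Python's built-in generator

    try:
        import numpy as np
        np.random.seed(seed)  # NumPy's legacy global generator
    except ImportError:
        pass

    try:
        import torch
        torch.manual_seed(seed)           # CPU generator
        torch.cuda.manual_seed_all(seed)  # all GPUs; ignored if CUDA is unavailable
        # Assumption: trade speed for determinism in cuDNN kernels.
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
    except ImportError:
        pass


set_seed_sketch(2024)
first = random.random()
set_seed_sketch(2024)
second = random.random()
print(first == second)
```

Calling the seeding function once at the start of a script, before any model or data loader is created, is the usual way to make runs repeatable.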