Tutorial 3: MISAR-seq mouse brain data integration

In addition to integrating spatial transcriptomic and proteomic data, SMART can also fuse spatial transcriptomics with chromatin accessibility. We evaluated SMART’s performance on brain datasets from four developing mice at distinct embryonic stages (E11.0, E13.5, E15.5, and E18.5). These datasets were generated using MISAR-seq, a technique that enables spatially resolved joint profiling of chromatin accessibility and gene expression. The following tutorial demonstrates how SMART performs on the E18.5 embryonic stage dataset.

The preprocessed data can be downloaded from: https://doi.org/10.5281/zenodo.17093158.

Load packages

[6]:
import os
import torch
import pandas as pd
import scanpy as sc
import warnings
from muon import prot as pt
from muon import atac as ac
from smart.train import train_SMART
from smart.utils import set_seed
from smart.utils import pca
from smart.utils import clustering
from smart.build_graph import Cal_Spatial_Net
from smart.MNN import Mutual_Nearest_Neighbors
import matplotlib.pyplot as plt
from sklearn.metrics import adjusted_rand_score

set_seed(2024)

warnings.filterwarnings('ignore')
# Environment configuration. SMART pacakge can be implemented with either CPU or GPU. GPU acceleration is highly recommend for imporoved efficiency.
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

# the location of R, which is required for the 'mclust' algorithm. Please replace the path below with local R installation path, use the command `R RHOME` to find the path.
os.environ['R_HOME'] = '/public/home/cit_wlhuang/.conda/envs/smart/lib/R'

Load data

[7]:
# read data
file_fold = '../../../../SMART-main_orignal/datasets/MISAR-seq_mouse_brain/Mouse_Brain_E18_S1/'  #please replace 'file_fold' with the download path

adata_omics1 = sc.read_h5ad(file_fold + 'adata_RNA.h5ad')
adata_omics2 = sc.read_h5ad(file_fold + 'adata_ATAC.h5ad')


adata_omics1.var_names_make_unique()
adata_omics2.var_names_make_unique()


adata_omics1.obs["anno"]=pd.read_csv(file_fold+"anno.csv",index_col=0)["cluster"]

adata_omics1,adata_omics2
[7]:
(AnnData object with n_obs × n_vars = 2129 × 32285
     obs: 'gex_barcode', 'atac_barcode', 'is_cell', 'excluded_reason', 'gex_raw_reads', 'gex_mapped_reads', 'gex_conf_intergenic_reads', 'gex_conf_exonic_reads', 'gex_conf_intronic_reads', 'gex_conf_exonic_unique_reads', 'gex_conf_exonic_antisense_reads', 'gex_conf_exonic_dup_reads', 'gex_exonic_umis', 'gex_conf_intronic_unique_reads', 'gex_conf_intronic_antisense_reads', 'gex_conf_intronic_dup_reads', 'gex_intronic_umis', 'gex_conf_txomic_unique_reads', 'gex_umis_count', 'gex_genes_count', 'atac_raw_reads', 'atac_unmapped_reads', 'atac_lowmapq', 'atac_dup_reads', 'atac_chimeric_reads', 'atac_mitochondrial_reads', 'atac_fragments', 'atac_TSS_fragments', 'atac_peak_region_fragments', 'atac_peak_region_cutsites', 'anno'
     var: 'gene_ids', 'feature_types', 'genome'
     obsm: 'spatial',
 AnnData object with n_obs × n_vars = 2129 × 117473
     obs: 'gex_barcode', 'atac_barcode', 'is_cell', 'excluded_reason', 'gex_raw_reads', 'gex_mapped_reads', 'gex_conf_intergenic_reads', 'gex_conf_exonic_reads', 'gex_conf_intronic_reads', 'gex_conf_exonic_unique_reads', 'gex_conf_exonic_antisense_reads', 'gex_conf_exonic_dup_reads', 'gex_exonic_umis', 'gex_conf_intronic_unique_reads', 'gex_conf_intronic_antisense_reads', 'gex_conf_intronic_dup_reads', 'gex_intronic_umis', 'gex_conf_txomic_unique_reads', 'gex_umis_count', 'gex_genes_count', 'atac_raw_reads', 'atac_unmapped_reads', 'atac_lowmapq', 'atac_dup_reads', 'atac_chimeric_reads', 'atac_mitochondrial_reads', 'atac_fragments', 'atac_TSS_fragments', 'atac_peak_region_fragments', 'atac_peak_region_cutsites'
     var: 'gene_ids', 'feature_types', 'genome'
     obsm: 'spatial')

Data pre-processing

[8]:
# RNA
sc.pp.filter_genes(adata_omics1, min_cells=10)

sc.pp.highly_variable_genes(adata_omics1, flavor="seurat_v3", n_top_genes=3000)
sc.pp.normalize_total(adata_omics1, target_sum=1e4)
sc.pp.log1p(adata_omics1)
sc.pp.scale(adata_omics1)

adata_omics1_high = adata_omics1[:, adata_omics1.var['highly_variable']]
adata_omics1.obsm['feat'] = pca(adata_omics1_high, n_comps=30)

# ATAC
adata_omics2 = adata_omics2[adata_omics1.obs_names].copy()
ac.pp.tfidf(adata_omics2, scale_factor=1e4)
sc.pp.normalize_per_cell(adata_omics2, counts_per_cell_after=1e4)
sc.pp.log1p(adata_omics2)
adata_omics2.obsm['feat'] = pca(adata_omics2, n_comps=60)

Spatial neighbour graph construction

[9]:
Cal_Spatial_Net(adata_omics1, model="KNN", n_neighbors=4)
Cal_Spatial_Net(adata_omics2, model="KNN", n_neighbors=4)
The graph contains 8516 edges, 2129 cells.
4.0000 neighbors per cell on average.
The graph contains 8516 edges, 2129 cells.
4.0000 neighbors per cell on average.

MNN triplet samples calculation

[10]:
adata_list=[adata_omics1,adata_omics2]
x = [torch.FloatTensor(adata.obsm["feat"]).to(device) for adata in adata_list]
edges =[torch.LongTensor(adata.uns["edgeList"]).to(device) for adata in adata_list ]
triplet_samples_list = [Mutual_Nearest_Neighbors(adata, key="feat", n_nearest_neighbors=3,farthest_ratio=0.6) for adata in adata_list]
Distances calculation completed!
The data using feature 'feat' contains 1430 mnn_anchors
Distances calculation completed!
The data using feature 'feat' contains 1142 mnn_anchors

Model training

[11]:
model=train_SMART(features=x,
            edges=edges,
            triplet_samples_list=triplet_samples_list,
            weights=[1,1,1,1],
            emb_dim=64,
            n_epochs=300,
            lr=1e-3,
            weight_decay=1e-5,
            device = device,
            window_size=10,
            slope=1e-4)

adata_omics1.obsm["SMART"]=model(x, edges)[0].cpu().detach().numpy()

100%|██████████| 300/300 [00:12<00:00, 24.25it/s]

Clustering

[12]:
tool = 'mclust'  # mclust, leiden, and louvain
clustering(adata_omics1, key='SMART', add_key='SMART', n_clusters=10, method=tool, use_pca=True)
ari = adjusted_rand_score(adata_omics1.obs['anno'], adata_omics1.obs['SMART'])
print(ari)
R[write to console]:                    __           __
   ____ ___  _____/ /_  _______/ /_
  / __ `__ \/ ___/ / / / / ___/ __/
 / / / / / / /__/ / /_/ (__  ) /_
/_/ /_/ /_/\___/_/\__,_/____/\__/   version 6.1.1
Type 'citation("mclust")' for citing this R package in publications.

fitting ...
  |======================================================================| 100%
0.47239400575078005
[13]:
fig, ax_list = plt.subplots(1, 3, figsize=(11, 3))
sc.pp.neighbors(adata_omics1, use_rep='SMART', n_neighbors=10)
sc.tl.umap(adata_omics1)

sc.pl.umap(adata_omics1, color='SMART', ax=ax_list[0], title='SMART', s=20, show=False)
sc.pl.embedding(adata_omics1, basis='spatial', color='SMART', ax=ax_list[1], title='SMART', s=30, show=False)
sc.pl.embedding(adata_omics1, basis='spatial', color='anno', ax=ax_list[2], title='anno', s=30, show=False)
plt.tight_layout(w_pad=0.3)
plt.show()
../_images/tutorials_Tutorial_3_MISAR-seq_mouse_brain_data_integration_17_0.png