tfmindi.tl.cluster_seqlets

tfmindi.tl.cluster_seqlets(adata, resolution=3.0, pca_svd_solver=None, *, recompute=False)

Perform the complete clustering workflow: dimensionality reduction, clustering, and functional annotation.

This function performs the following steps:

1. PCA on the similarity matrix (GPU-accelerated if available); skipped if already present
2. Compute the neighborhood graph (GPU-accelerated if available); skipped if already present
3. Generate the t-SNE embedding (GPU-accelerated if available); skipped if already present
4. Leiden clustering at the specified resolution (GPU-accelerated if available); always computed
5. Calculate mean contribution scores from the stored seqlet matrices
6. Assign DBD annotations based on top motif similarity per seqlet
7. Map Leiden clusters to consensus DBD annotations
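Steps 6 and 7 can be illustrated with a minimal pure-Python sketch. It is not the tfmindi implementation: it assumes that "top motif similarity" means the motif with the highest similarity value in each seqlet's row, and that the consensus annotation is a simple majority vote within each cluster. The function names `assign_seqlet_dbd` and `consensus_cluster_dbd` and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def assign_seqlet_dbd(similarity, motif_dbd):
    """Give each seqlet the DBD of its most similar motif (assumed rule)."""
    labels = []
    for row in similarity:
        top = max(range(len(row)), key=row.__getitem__)  # index of the best motif
        labels.append(motif_dbd[top])
    return labels


def consensus_cluster_dbd(clusters, seqlet_dbd):
    """Map each cluster to its most frequent seqlet DBD (assumed majority vote)."""
    per_cluster = defaultdict(Counter)
    for cluster, dbd in zip(clusters, seqlet_dbd):
        per_cluster[cluster][dbd] += 1
    return {c: counts.most_common(1)[0][0] for c, counts in per_cluster.items()}


# Toy data: 4 seqlets x 3 motifs (values are made up for illustration)
similarity = [
    [0.9, 0.1, 0.2],
    [0.8, 0.3, 0.1],
    [0.2, 0.7, 0.4],
    [0.1, 0.2, 0.6],
]
motif_dbd = ["bHLH", "Homeodomain", "C2H2 ZF"]  # one DBD label per motif
clusters = ["0", "0", "0", "1"]                 # one cluster label per seqlet

seqlet_dbd = assign_seqlet_dbd(similarity, motif_dbd)
cluster_dbd = consensus_cluster_dbd(clusters, seqlet_dbd)
```

In this toy case the third seqlet is labelled differently from its cluster's consensus, which shows why a per-seqlet annotation and a per-cluster annotation can both be worth keeping.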

Performance Optimization: By default, PCA, neighborhood graph, and t-SNE computations are reused if already present in the AnnData object. This allows fast re-clustering with different resolutions without recomputing expensive preprocessing steps.
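The reuse behaviour described above can be sketched as a simple presence check. This is an illustration under stated assumptions, not the library's code: `plan_steps` is a hypothetical helper, the `"neighbors"` key in `.uns` follows the scanpy convention and is assumed here, and a `SimpleNamespace` stands in for a real AnnData object.

```python
from types import SimpleNamespace


def plan_steps(adata, recompute=False):
    """Return the steps that would run for the given object (illustrative only)."""
    steps = []
    if recompute or "X_pca" not in adata.obsm:
        steps.append("pca")
    if recompute or "neighbors" not in adata.uns:  # scanpy-style key, assumed
        steps.append("neighbors")
    if recompute or "X_tsne" not in adata.obsm:
        steps.append("tsne")
    steps.append("leiden")  # clustering is always recomputed
    return steps


# Stand-ins for AnnData objects that expose only .obsm and .uns
fresh = SimpleNamespace(obsm={}, uns={})
cached = SimpleNamespace(obsm={"X_pca": None, "X_tsne": None}, uns={"neighbors": {}})

first_run = plan_steps(fresh)
rerun = plan_steps(cached)
forced = plan_steps(cached, recompute=True)
```

Under these assumptions, a second call on the same object only repeats the Leiden step, which is what makes a sweep over several `resolution` values cheap.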

GPU Acceleration: When tfmindi[gpu] is installed and CUDA is available, this function automatically uses RAPIDS-accelerated implementations. The API remains identical between CPU and GPU versions.
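One common way to keep an API identical across CPU and GPU builds is to pick a backend module at call time. The sketch below shows that general pattern only; how tfmindi actually selects its backend is not documented here. The helper `pick_backend` is hypothetical, the default module names are assumptions, and the check covers only whether a module is importable, not whether CUDA is usable.

```python
import importlib.util


def pick_backend(gpu_module="rapids_singlecell", cpu_module="scanpy"):
    """Return the GPU module name if it is importable, else the CPU module name.

    Both default names are assumptions made for this illustration.
    """
    if importlib.util.find_spec(gpu_module) is not None:
        return gpu_module
    return cpu_module


# Demonstration with standard-library modules so the example runs anywhere
fallback = pick_backend(gpu_module="no_such_module_xyz", cpu_module="json")
preferred = pick_backend(gpu_module="json", cpu_module="os")
```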

Parameters:
  • adata (AnnData) – AnnData object with similarity matrix in .X and seqlet data in .obs. Expects .obs to contain seqlet matrices and .var to contain motif annotations.

  • resolution (float (default: 3.0)) – Resolution for the Leiden algorithm. Higher values yield more, smaller clusters.

  • pca_svd_solver (str | None (default: None)) – SVD solver used to compute the PCA; see https://scanpy.readthedocs.io/en/stable/generated/scanpy.pp.pca.html#scanpy.pp.pca for the available options. If None, the solver is chosen automatically.

  • recompute (bool (default: False)) – If False, reuse existing PCA, neighborhood graph, and t-SNE results if available. If True, always recompute PCA, neighbors, and t-SNE from scratch. Leiden clustering is computed in either case.

Return type:

None

Returns:

Modifies adata in-place with cluster assignments and annotations:

  • adata.obsm["X_pca"]: PCA coordinates
  • adata.obsm["X_tsne"]: t-SNE coordinates
  • adata.obs["leiden"]: Cluster assignments
  • adata.obs["mean_contrib"]: Mean contribution scores per seqlet
  • adata.obs["seqlet_dbd"]: DBD annotations per seqlet
  • adata.obs["cluster_dbd"]: Consensus DBD annotations per cluster

Examples

>>> import tfmindi as tm
>>> # adata created with tm.pp.create_seqlet_adata()
>>>
>>> # Initial clustering - computes PCA, neighbors, t-SNE, and clustering
>>> tm.tl.cluster_seqlets(adata, resolution=3.0)
>>> print(f"Found {adata.obs['leiden'].nunique()} clusters")
>>>
>>> # Fast re-clustering with different resolution - reuses PCA, neighbors, t-SNE
>>> tm.tl.cluster_seqlets(adata, resolution=5.0)
>>> print(f"Found {adata.obs['leiden'].nunique()} clusters")
>>>
>>> # Force recomputation of all steps
>>> tm.tl.cluster_seqlets(adata, resolution=3.0, recompute=True)