tfmindi.tl.run_topic_modeling

tfmindi.tl.run_topic_modeling(adata, n_topics=40, alpha=50, eta=0.1, n_iter=150, random_state=123, filter_unknown=True)

Discover co-occurring motif patterns using topic modeling on region-level data.

This function performs the following steps:

  1. Group seqlets by genomic region using the stored coordinates
  2. Create a region-cluster count matrix from the leiden assignments
  3. Fit an LDA model to discover topics (co-occurring cluster patterns)
  4. Store the fitted model and results in adata.uns and adata.obsm
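The first three steps can be sketched on toy data as follows. This is only an illustration under stated assumptions, not the tfmindi implementation: the `obs` table is a made-up stand-in for `adata.obs`, scikit-learn's `LatentDirichletAllocation` is used as a stand-in LDA backend, and the prior values are illustrative and may not map one-to-one onto the `alpha` and `eta` arguments of this function.

```python
import pandas as pd
from sklearn.decomposition import LatentDirichletAllocation

# Toy stand-in for adata.obs: one row per seqlet (values are made up).
obs = pd.DataFrame(
    {
        "example_idx": [0, 0, 0, 1, 1, 2, 2, 2],
        "leiden": ["0", "1", "1", "0", "2", "2", "2", "1"],
    }
)

# Steps 1-2: group seqlets by region and count how often each
# cluster occurs in each region (regions x clusters).
counts = pd.crosstab(obs["example_idx"], obs["leiden"])

# Step 3: fit LDA, treating regions as "documents" and clusters as "words".
# Assumption: scikit-learn stands in for whatever backend tfmindi uses.
n_topics = 2
lda = LatentDirichletAllocation(
    n_components=n_topics,
    doc_topic_prior=0.5,    # illustrative document-topic prior
    topic_word_prior=0.1,   # illustrative topic-word prior
    max_iter=150,
    random_state=123,
)

# Region-topic matrix: one row per region, one column per topic.
region_topic = lda.fit_transform(counts.to_numpy())

# Topic-cluster matrix: one row per topic, one column per cluster.
topic_cluster = lda.components_

print(counts)
print(region_topic.shape, topic_cluster.shape)
```

In this sketch each row of `region_topic` is a normalized topic distribution for one region, which is the kind of region-level summary the function is described as producing.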

Parameters:
  • adata (AnnData) – AnnData object with cluster assignments and genomic coordinates. Must contain:
      - adata.obs["leiden"]: Cluster assignments
      - adata.obs["example_idx"]: Example indices for region grouping
      - adata.obs["start"]: Seqlet start positions
      - adata.obs["end"]: Seqlet end positions
      - adata.obs["cluster_dbd"]: DBD annotations per cluster (optional)

  • n_topics (int (default: 40)) – Number of topics to discover

  • alpha (float (default: 50)) – Dirichlet prior for document-topic distribution

  • eta (float (default: 0.1)) – Dirichlet prior for topic-word distribution

  • n_iter (int (default: 150)) – Number of LDA iterations

  • random_state (int (default: 123)) – Random seed for reproducibility

  • filter_unknown (bool (default: True)) – Whether to filter out seqlets with unknown DBD annotations

Return type:

None

Returns:

None. Results are stored in adata:
  - adata.uns['topic_modeling']: Model, parameters, and all topic-related matrices

Examples

>>> import tfmindi as tm
>>> # adata with clustering results
>>> tm.tl.run_topic_modeling(adata, n_topics=40)
>>> print(f"Discovered {adata.uns['topic_modeling']['params']['n_topics']} topics")
>>> print(f"Region-topic matrix shape: {adata.obsm['X_topics'].shape}")
>>> # The plots can now be drawn directly from adata
>>> tm.pl.dbd_topic_heatmap(adata)
>>> tm.pl.region_topic_tsne(adata)