tfmindi.tl.run_topic_modeling
- tfmindi.tl.run_topic_modeling(adata, n_topics=40, alpha=50, eta=0.1, n_iter=150, random_state=123, filter_unknown=True)
Discover co-occurring motif patterns using topic modeling on region-level data.
This function performs the following steps:
1. Group seqlets by genomic regions using stored coordinates
2. Create region-cluster count matrix from leiden assignments
3. Fit LDA model to discover topics (co-occurring cluster patterns)
4. Store fitted model and results in adata.uns and adata.obsm
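Steps 1 and 2 above can be illustrated with a minimal, standard-library sketch. It is not the library's implementation: the seqlet records are made-up values that mirror the `adata.obs` columns this function expects, and the grouping here uses `example_idx` only, since how `start` and `end` enter the grouping is not described on this page. Each region plays the role of a "document" and each leiden cluster the role of a "word".

```python
from collections import Counter

# Hypothetical seqlet table: one dict per seqlet, mirroring the adata.obs
# columns listed for this function (all values are made up for illustration).
seqlets = [
    {"example_idx": 0, "start": 10, "end": 25, "leiden": "0"},
    {"example_idx": 0, "start": 40, "end": 52, "leiden": "2"},
    {"example_idx": 0, "start": 70, "end": 85, "leiden": "0"},
    {"example_idx": 1, "start": 5, "end": 20, "leiden": "1"},
    {"example_idx": 1, "start": 33, "end": 47, "leiden": "2"},
    {"example_idx": 2, "start": 12, "end": 30, "leiden": "1"},
]

# Step 1 (assumed here to mean grouping by example_idx): count how often
# each cluster occurs in each region.
pair_counts = Counter((s["example_idx"], s["leiden"]) for s in seqlets)

# Step 2: lay the counts out as a dense region x cluster matrix.
regions = sorted({s["example_idx"] for s in seqlets})
clusters = sorted({s["leiden"] for s in seqlets})
count_matrix = [[pair_counts[(r, c)] for c in clusters] for r in regions]

for region, row in zip(regions, count_matrix):
    print(region, row)
```

A matrix of this shape (rows are regions, columns are clusters) is the kind of input an LDA implementation expects in step 3. With a pandas DataFrame, `pandas.crosstab` produces the same table in one call.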
- Parameters:
adata (AnnData) – AnnData object with cluster assignments and genomic coordinates. Must contain:
  - adata.obs["leiden"]: Cluster assignments
  - adata.obs["example_idx"]: Example indices for region grouping
  - adata.obs["start"]: Seqlet start positions
  - adata.obs["end"]: Seqlet end positions
  - adata.obs["cluster_dbd"]: DBD annotations per cluster (optional)
n_topics (int, default: 40) – Number of topics to discover
alpha (float, default: 50) – Dirichlet prior for document-topic distribution
eta (float, default: 0.1) – Dirichlet prior for topic-word distribution
n_iter (int, default: 150) – Number of LDA iterations
random_state (int, default: 123) – Random seed for reproducibility
filter_unknown (bool, default: True) – Whether to filter out seqlets with unknown DBD annotations
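To build intuition for the two Dirichlet priors, the sketch below samples from symmetric Dirichlet distributions with a small and a large concentration value and compares how peaked the draws are. It is a generic illustration, not a description of this function's internals: whether `run_topic_modeling` uses `alpha` as given or rescales it (for example by the number of topics) is not stated on this page, and the helper names are hypothetical.

```python
import random


def sample_symmetric_dirichlet(concentration, size, rng):
    """Draw one vector from a symmetric Dirichlet via normalized Gamma draws."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]


def mean_largest_weight(concentration, size=40, n_draws=200, seed=123):
    """Average the largest component over many draws (hypothetical helper)."""
    rng = random.Random(seed)
    largest = [
        max(sample_symmetric_dirichlet(concentration, size, rng))
        for _ in range(n_draws)
    ]
    return sum(largest) / n_draws


# Small concentration: most of the mass falls on a few components.
peaked = mean_largest_weight(0.1)
# Large concentration: the mass is spread almost evenly over all components.
flat = mean_largest_weight(50.0)

print(f"mean largest weight, concentration 0.1: {peaked:.3f}")
print(f"mean largest weight, concentration 50:  {flat:.3f}")
```

In general, a smaller concentration favors sparse distributions (a topic dominated by a few clusters, or a region dominated by a few topics), while a larger one favors even mixtures.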
- Return type:
None
- Returns:
None. Results are stored in adata:
  - adata.uns['topic_modeling']: Model, parameters, and all topic-related matrices
Examples
>>> import tfmindi as tm
>>> # adata with clustering results
>>> tm.tl.run_topic_modeling(adata, n_topics=40)
>>> print(f"Discovered {adata.uns['topic_modeling']['params']['n_topics']} topics")
>>> print(f"Region-topic matrix shape: {adata.obsm['X_topics'].shape}")
>>> # Now can plot directly from adata
>>> tm.pl.dbd_topic_heatmap(adata)
>>> tm.pl.region_topic_tsne(adata)