tfmindi.tl.evaluate_topic_models

tfmindi.tl.evaluate_topic_models(adata, n_topics_range=None, alpha=50, eta=0.1, n_iter=150, random_state=123, **kwargs)

Evaluate multiple topic models to find the optimal number of topics.

Parameters:
  • adata (AnnData) – AnnData object with cluster assignments and genomic coordinates

  • n_topics_range (list[int] | None (default: None)) – List of topic numbers to evaluate. If None, [10, 15, 20, 25, 30, 35, 40, 50] is used

  • alpha (float (default: 50)) – Dirichlet prior for the document-topic distribution

  • eta (float (default: 0.1)) – Dirichlet prior for the topic-word distribution

  • n_iter (int (default: 150)) – Number of LDA iterations

  • random_state (int (default: 123)) – Random seed for reproducibility

  • **kwargs – Additional arguments passed to run_topic_modeling

Return type:

dict[int, float]

Returns:

Mapping from each evaluated number of topics to its log-likelihood score (higher is better)

Note: The best-performing model is automatically stored in adata.

Examples

>>> import tfmindi as tm
>>> # Evaluate different numbers of topics
>>> scores = tm.tl.evaluate_topic_models(adata, n_topics_range=[10, 20, 30, 40])
>>> best_n_topics = max(scores, key=scores.get)
>>> print(f"Best number of topics: {best_n_topics}")
>>> # Best model is already stored in adata for plotting
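
The returned mapping can also be inspected beyond its maximum, for instance to see how close the runner-up models are. The following is a minimal sketch that works on a plain dict[int, float]; the score values are made-up placeholders rather than real output, and rank_topic_counts is a hypothetical helper, not part of tfmindi.

```python
# Placeholder scores for illustration only; real values would come from
# tm.tl.evaluate_topic_models(adata, n_topics_range=[10, 20, 30, 40]).
scores = {10: -152340.5, 20: -148910.2, 30: -147805.9, 40: -148120.4}


def rank_topic_counts(scores):
    """Return (n_topics, log_likelihood) pairs sorted best (highest) first.

    Hypothetical helper, not part of tfmindi.
    """
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranking = rank_topic_counts(scores)
best_n_topics, best_score = ranking[0]

# Print every candidate with its gap to the best log-likelihood.
for n_topics, log_likelihood in ranking:
    gap = log_likelihood - best_score
    print(f"{n_topics:>3} topics: log-likelihood {log_likelihood:.1f} (gap to best: {gap:.1f})")
```

A small gap between the top candidates may suggest that the choice of topic count is not sharply determined by the log-likelihood alone.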