tfmindi.tl.create_patterns


tfmindi.tl.create_patterns(adata, max_n=None, method='tomtom', by='leiden', **kwargs)

Generate aligned PWM patterns from seqlet clusters using stored data.

This function performs the following steps for each cluster:

1. Extract seqlets belonging to that cluster
2. Use TomTom to align seqlets within the cluster
3. Find consensus root seqlet (lowest mean similarity)
4. Apply strand and offset corrections using stored sequence data
5. Generate Pattern object with PWM, contribution scores, and seqlet instances
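
Step 3 can be pictured with a small sketch. It assumes a precomputed pairwise score matrix for one cluster and follows the wording above by picking the seqlet with the lowest mean score against the others. `pick_root` and `scores` are hypothetical names for illustration and are not part of tfmindi; the scoring used internally may differ.

```python
def pick_root(similarity):
    """Return the index whose mean score against all other items is lowest."""
    n = len(similarity)
    if n == 0:
        raise ValueError("similarity matrix is empty")
    if n == 1:
        return 0

    def mean_to_others(i):
        # Average over every other item, skipping the diagonal entry.
        return sum(similarity[i][j] for j in range(n) if j != i) / (n - 1)

    return min(range(n), key=mean_to_others)


# Hypothetical 3x3 pairwise score matrix for one cluster.
scores = [
    [0.0, 0.2, 0.9],
    [0.2, 0.0, 0.4],
    [0.9, 0.4, 0.0],
]
root = pick_root(scores)
```

In this toy matrix the second seqlet (index 1) has the lowest mean score, so it would serve as the reference that the other seqlets are aligned to.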

Parameters:
  • adata (AnnData) – AnnData object with cluster assignments and stored seqlet data. Must contain:
      - adata.obs["seqlet_matrix"]: Individual seqlet contribution matrices
      - adata.uns["unique_examples"]["oh"]: Unique example one-hot sequences
      - adata.uns["unique_examples"]["contrib"]: Unique example contribution scores
      - adata.obs["example_oh_idx"]: Index into unique examples for OH sequences
      - adata.obs["example_contrib_idx"]: Index into unique examples for contributions

  • max_n (int | None (default: None)) – Maximum number of seqlets to use per cluster for pattern creation. If None, all seqlets in each cluster are used. If an integer is provided, seqlets are randomly subsampled to speed up pattern creation.

  • method (Literal['tomtom', 'kmer', 'mafft'] (default: 'tomtom')) – Method used for aligning seqlet instances. Options are 'tomtom', 'kmer', or 'mafft'.

  • by (str (default: 'leiden')) – Name of the annotation in adata.obs whose cluster labels are used to group seqlets into patterns.

  • **kwargs – Extra keyword arguments passed to the alignment functions.
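
The effect of max_n can be illustrated with a minimal sketch of per-cluster random subsampling. This is not the tfmindi implementation: `subsample_indices` and its `seed` argument are hypothetical, and the sketch only assumes that clusters larger than max_n are reduced to max_n randomly chosen seqlets while smaller clusters are kept whole.

```python
import random


def subsample_indices(labels, max_n, seed=0):
    """Group row indices by cluster label and cap each group at max_n."""
    rng = random.Random(seed)  # hypothetical seed, for reproducibility only
    by_cluster = {}
    for idx, label in enumerate(labels):
        by_cluster.setdefault(label, []).append(idx)

    if max_n is None:
        # No cap: every seqlet in every cluster is used.
        return by_cluster

    return {
        label: sorted(rng.sample(idxs, max_n)) if len(idxs) > max_n else idxs
        for label, idxs in by_cluster.items()
    }


# Five seqlets in cluster "0", two in cluster "1".
labels = ["0"] * 5 + ["1"] * 2
picked = subsample_indices(labels, max_n=3)
```

Here cluster "0" is cut down to three seqlets, while cluster "1" is left untouched because it is already below the cap.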

Return type:

dict[str, Pattern | None]

Returns:

Dictionary mapping cluster IDs to Pattern objects. As the return type indicates, a value may be None.
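
Because the return type allows None values, it can help to filter the dictionary before accessing Pattern attributes. The sketch below uses placeholder objects in place of real Pattern instances; `drop_missing` is a hypothetical helper, and the conditions under which a cluster maps to None are not stated here.

```python
def drop_missing(patterns):
    """Keep only the clusters whose pattern is not None."""
    return {cid: p for cid, p in patterns.items() if p is not None}


# Placeholder values stand in for Pattern objects.
example = {"0": object(), "1": None, "2": object()}
usable = drop_missing(example)
skipped = sorted(set(example) - set(usable))
```

With real output, the same filter could be applied to the result of create_patterns before looking up an individual cluster.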

Examples

>>> import tfmindi as tm
>>> # adata with clustering results
>>> patterns = tm.tl.create_patterns(adata)
>>> print(f"Found {len(patterns)} patterns")
>>> # Use subsampling to speed up pattern creation
>>> patterns_fast = tm.tl.create_patterns(adata, max_n=300)
>>> pattern_0 = patterns["0"]
>>> print(f"Pattern 0 has {pattern_0.n_seqlets} seqlets")
>>> print(f"Pattern 0 PWM shape: {pattern_0.ppm.shape}")