tfmindi.pp.extract_seqlets

tfmindi.pp.extract_seqlets(contrib, oh, threshold=0.05, additional_flanks=3)

Extract, scale, and process seqlets from saliency maps using Tangermeme.

Seqlets are normalized based on their maximum absolute contribution value.
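
The normalization step can be illustrated with a minimal NumPy sketch. This is not the tfmindi implementation: the helper name `normalize_seqlet` and the handling of all-zero input are assumptions made for illustration only. It shows what dividing a single (4, length) contribution matrix by its maximum absolute value looks like.

```python
import numpy as np


def normalize_seqlet(seqlet):
    """Scale a (4, length) contribution matrix by its maximum absolute value.

    Hypothetical helper for illustration; not part of tfmindi.
    """
    seqlet = np.asarray(seqlet, dtype=float)
    max_abs = np.max(np.abs(seqlet))
    if max_abs == 0:
        # Assumption: an all-zero seqlet is returned unchanged to avoid 0/0.
        return seqlet.copy()
    return seqlet / max_abs


# Toy seqlet with 4 nucleotide channels and 3 positions.
example = np.array(
    [
        [0.1, -0.4, 0.0],
        [0.2, 0.0, 0.0],
        [0.0, 0.0, 0.05],
        [0.0, 0.0, 0.0],
    ]
)
scaled = normalize_seqlet(example)
```

After scaling, the largest absolute entry of `scaled` is 1 and the signs of all entries are preserved, so seqlets with very different raw magnitudes become comparable.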

Parameters:
  • contrib (ndarray) – Contribution scores array with shape (n_examples, 4, length)

  • oh (ndarray) – One-hot encoded sequences array with shape (n_examples, 4, length)

  • threshold (float (default: 0.05)) – Importance threshold for seqlet extraction

  • additional_flanks (int (default: 3)) – Additional flanking bases to include around seqlets
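
As a rough sketch of the expected input layout, the snippet below builds a one-hot array and a matching contribution array, both with shape (n_examples, 4, length). The helper `one_hot_encode`, the A, C, G, T channel order, and the random contribution scores are assumptions for illustration; real contribution scores would come from an attribution method applied to a trained model.

```python
import numpy as np

ALPHABET = "ACGT"  # assumed channel order; check what your model expects


def one_hot_encode(sequences):
    """Encode equal-length DNA strings as an (n_examples, 4, length) array.

    Hypothetical helper for illustration; not part of tfmindi.
    """
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all sequences must have the same length")
    oh = np.zeros((len(sequences), 4, length), dtype=np.float32)
    for i, seq in enumerate(sequences):
        for j, base in enumerate(seq.upper()):
            k = ALPHABET.find(base)
            if k >= 0:  # unknown bases such as N stay all-zero
                oh[i, k, j] = 1.0
    return oh


oh = one_hot_encode(["ACGT", "TTNA"])

# Placeholder contribution scores with the same shape as `oh`.
rng = np.random.default_rng(0)
contrib = rng.normal(size=oh.shape).astype(np.float32)
```

Both arrays share the same leading and trailing dimensions, which is what the `contrib` and `oh` parameters above require.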

Return type:

tuple[DataFrame, list[ndarray]]

Returns:

  • DataFrame with seqlet coordinates [example_idx, start, end, chrom, g_start, g_end]

  • List of processed seqlet contribution matrices

Examples

>>> from tfmindi.pp import extract_seqlets
>>> seqlets_df, seqlet_matrices = extract_seqlets(contrib, oh, threshold=0.05)
>>> print(seqlets_df.columns.tolist())
['example_idx', 'start', 'end', 'attribution', 'p-value']
>>> print(len(seqlet_matrices))
1250