tfmindi.pp.extract_seqlets
- tfmindi.pp.extract_seqlets(contrib, oh, threshold=0.05, additional_flanks=3)
Extract, scale, and process seqlets from saliency maps using Tangermeme.
Seqlets are normalized based on their maximum absolute contribution value.
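The normalization described above can be pictured with a small NumPy sketch. The helper name `scale_by_max_abs` and the zero-guard are assumptions made for illustration; this is not the library's internal code, only one plausible reading of "normalized based on their maximum absolute contribution value".

```python
import numpy as np


def scale_by_max_abs(seqlet: np.ndarray) -> np.ndarray:
    """Hypothetical helper: divide a (4, width) seqlet by its largest absolute value."""
    peak = np.abs(seqlet).max()
    # Assumption: an all-zero window is returned unchanged to avoid dividing by zero.
    return seqlet / peak if peak > 0 else seqlet.copy()


# Toy contribution window: 4 nucleotide channels x 4 positions.
seqlet = np.array([
    [0.0, 0.2, 0.0, 0.0],
    [0.0, 0.0, -0.4, 0.0],
    [0.1, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.05],
])
scaled = scale_by_max_abs(seqlet)
```

After scaling, the strongest position has magnitude 1 and keeps its sign, so seqlets from examples with very different attribution ranges become comparable.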
- Parameters:
contrib (ndarray) – Contribution scores array with shape (n_examples, 4, length)
oh (ndarray) – One-hot encoded sequences array with shape (n_examples, 4, length)
threshold (float, default: 0.05) – Importance threshold for seqlet extraction
additional_flanks (int, default: 3) – Additional flanking bases to include around seqlets
- Return type:
tuple (DataFrame, list)
- Returns:
DataFrame with seqlet coordinates [example_idx, start, end, chrom, g_start, g_end]
List of processed seqlet contribution matrices
Examples
>>> seqlets_df, seqlet_matrices = extract_seqlets(contrib, oh, threshold=0.05)
>>> print(seqlets_df.columns.tolist())
['example_idx', 'start', 'end', 'attribution', 'p-value']
>>> print(len(seqlet_matrices))
1250
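The example assumes `contrib` and `oh` already exist. The following is a minimal sketch of building inputs with the documented shape (n_examples, 4, length), plus an illustration of what `additional_flanks` could mean for a seqlet window. The helpers `one_hot` and `add_flanks`, the A/C/G/T channel order, and the clipping at sequence boundaries are all assumptions, and the random `contrib` array is only a stand-in for real attribution scores.

```python
import numpy as np

ALPHABET = "ACGT"  # assumed channel order; match whatever your model was trained with


def one_hot(seqs):
    """Hypothetical helper: encode equal-length DNA strings as (n_examples, 4, length)."""
    oh = np.zeros((len(seqs), 4, len(seqs[0])), dtype=np.float32)
    for i, seq in enumerate(seqs):
        for j, base in enumerate(seq.upper()):
            if base in ALPHABET:  # N and other symbols stay as all-zero columns
                oh[i, ALPHABET.index(base), j] = 1.0
    return oh


def add_flanks(start, end, length, flanks=3):
    """Hypothetical helper: widen [start, end) by `flanks` bases, clipped to the sequence."""
    return max(0, start - flanks), min(length, end + flanks)


oh = one_hot(["ACGTACGTAC", "TTGACNGGCA"])

# Placeholder attributions with the same shape as `oh`; real scores would come
# from an attribution method applied to a trained model.
rng = np.random.default_rng(0)
contrib = rng.normal(size=oh.shape).astype(np.float32)

window = add_flanks(1, 4, length=oh.shape[-1], flanks=3)
```

Arrays shaped like these can then be passed as `extract_seqlets(contrib, oh, threshold=0.05, additional_flanks=3)`.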