Types
Core data types used throughout TF-MInDi.
- class tfmindi.Pattern(ppm, contrib_scores, hypothetical_contrib_scores, seqlets, cluster_id, n_seqlets, dbd=None)
A pattern object representing aligned seqlets from a cluster.
- ppm
Position probability matrix (length x 4) representing the consensus sequence
- contrib_scores
Mean contribution scores (length x 4) for the pattern
- hypothetical_contrib_scores
Mean hypothetical contribution scores (length x 4)
- seqlets
List of aligned Seqlet objects in this pattern
- cluster_id
The cluster ID this pattern represents
- n_seqlets
Number of seqlets in this pattern
- dbd
DNA-binding domain annotation for this pattern (optional)
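To make the (length x 4) shape of `ppm` concrete, here is a minimal sketch that averages a few aligned one-hot instances into per-position base frequencies. It uses only NumPy and does not call TF-MInDi; the `one_hot` helper, the A, C, G, T column order, the example sequences, and the use of a plain mean are all assumptions for illustration, and the library may build its PPM differently.

```python
import numpy as np

ALPHABET = "ACGT"  # assumed column order, not confirmed by the docs


def one_hot(seq: str) -> np.ndarray:
    """Hypothetical helper: return a (length x 4) one-hot matrix for a DNA string."""
    out = np.zeros((len(seq), 4), dtype=float)
    for i, base in enumerate(seq):
        out[i, ALPHABET.index(base)] = 1.0
    return out


# Three made-up aligned seqlet instances of equal length.
instances = [one_hot(s) for s in ("TGAC", "TGAC", "TGAT")]

# Averaging over seqlets gives base frequencies per position (length x 4).
ppm = np.mean(instances, axis=0)
n_seqlets = len(instances)

print(ppm.shape)  # (4, 4)
print(ppm[3])     # base frequencies at the last position
```

Each row of `ppm` sums to 1 because every instance contributes exactly one nucleotide per position.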
- class tfmindi.Seqlet(seq_instance, start, end, region_one_hot, is_revcomp, example_idx, seqlet_idx, contrib_scores=None, hypothetical_contrib_scores=None)
A seqlet object representing an aligned sequence instance.
- seq_instance
One-hot encoded aligned sequence instance (length x 4)
- start
Start position in the original sequence
- end
End position in the original sequence
- region_one_hot
Full one-hot encoded sequence this seqlet comes from (4 x seq_length)
- is_revcomp
Whether this seqlet is reverse complemented
- contrib_scores
Actual contribution scores masked by sequence content (length x 4). Non-zero only where nucleotides are present (seq_instance * raw_contributions).
- hypothetical_contrib_scores
Raw contribution scores showing potential importance at each position (length x 4). Contains values for all four nucleotides, regardless of which one is actually present.
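The relationship between the two score arrays, and the differing shapes of `region_one_hot` (4 x seq_length) and `seq_instance` (length x 4), can be sketched with plain NumPy. This does not call TF-MInDi: the region string, the score values, the A, C, G, T ordering, and the half-open `[start, end)` coordinates on the forward strand are assumptions made for illustration.

```python
import numpy as np

# Made-up region, stored as (4 x seq_length); row order A, C, G, T is assumed.
region = "GGACGTT"
region_one_hot = np.zeros((4, len(region)))
for pos, base in enumerate(region):
    region_one_hot["ACGT".index(base), pos] = 1.0

# Assumed convention: half-open [start, end) on the forward strand.
start, end = 2, 5
seq_instance = region_one_hot[:, start:end].T  # (length x 4), here "ACG"

# Made-up hypothetical scores: one value per nucleotide at each position.
hypothetical_contrib_scores = np.array([
    [0.5, -0.1, 0.2, -0.3],
    [-0.2, 0.8, 0.1, -0.4],
    [0.0, -0.6, 0.9, 0.3],
])

# Masking keeps only the score of the nucleotide that is actually present.
contrib_scores = seq_instance * hypothetical_contrib_scores

print(contrib_scores)
```

After masking, each row holds at most one non-zero entry, located in the column of the observed nucleotide.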