Types#

Core data types used throughout TF-MInDi.

class tfmindi.Pattern(ppm, contrib_scores, hypothetical_contrib_scores, seqlets, cluster_id, n_seqlets, dbd=None)#

A pattern object representing aligned seqlets from a cluster.

ppm#

Position probability matrix (length x 4) representing the consensus sequence

contrib_scores#

Mean contribution scores (length x 4) for the pattern

hypothetical_contrib_scores#

Mean hypothetical contribution scores (length x 4)

seqlets#

List of aligned Seqlet objects in this pattern

cluster_id#

The cluster ID this pattern represents

n_seqlets#

Number of seqlets in this pattern

dbd#

DNA-binding domain annotation for this pattern (optional)
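
As a rough illustration of how the ppm attribute can be read, the sketch below derives a consensus string from a (length x 4) position probability matrix by taking the most probable base at each position. The column order A, C, G, T and the helper name consensus_from_ppm are assumptions made for this example; neither is stated in the documentation above.

```python
import numpy as np

# Assumed column order; the documentation does not state it.
ALPHABET = "ACGT"


def consensus_from_ppm(ppm):
    """Return the most probable base at each position of a (length x 4) PPM.

    Hypothetical helper for illustration; not part of tfmindi.
    """
    ppm = np.asarray(ppm, dtype=float)
    if ppm.ndim != 2 or ppm.shape[1] != 4:
        raise ValueError("expected an array of shape (length, 4)")
    # argmax over the nucleotide axis picks one base per position
    return "".join(ALPHABET[i] for i in ppm.argmax(axis=1))


# Made-up PPM with three positions; each row sums to 1.
example_ppm = np.array([
    [0.70, 0.10, 0.10, 0.10],
    [0.10, 0.10, 0.70, 0.10],
    [0.05, 0.05, 0.05, 0.85],
])

consensus = consensus_from_ppm(example_ppm)
print(consensus)
```

With a real Pattern object, the same helper would be called as consensus_from_ppm(pattern.ppm), provided the column order assumption holds.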

class tfmindi.Seqlet(seq_instance, start, end, region_one_hot, is_revcomp, example_idx, seqlet_idx, contrib_scores=None, hypothetical_contrib_scores=None)#

A seqlet object representing an aligned sequence instance.

seq_instance#

Aligned, one-hot encoded sequence instance (length x 4)

start#

Start position in the original sequence

end#

End position in the original sequence

region_one_hot#

Full one-hot encoded sequence this seqlet comes from (4 x seq_length)

is_revcomp#

Whether this seqlet is reverse complemented
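
The fields described so far suggest how a seqlet window relates to its source region. The following is a minimal sketch, not the library's implementation. It assumes that start and end form a half-open interval into region_one_hot, that the nucleotide order is A, C, G, T, and that reverse complementing is applied after slicing. The helpers extract_seqlet, one_hot and decode are hypothetical.

```python
import numpy as np

ALPHABET = "ACGT"  # assumed nucleotide order


def one_hot(seq):
    """Encode a DNA string as a (4 x seq_length) array, like region_one_hot."""
    out = np.zeros((4, len(seq)), dtype=int)
    rows = [ALPHABET.index(base) for base in seq]
    out[rows, np.arange(len(seq))] = 1
    return out


def decode(mat):
    """Decode a (length x 4) one-hot array back to a string."""
    return "".join(ALPHABET[i] for i in mat.argmax(axis=1))


def extract_seqlet(region_one_hot, start, end, is_revcomp):
    """Slice [start, end) from a (4 x L) region and return (length x 4)."""
    window = np.asarray(region_one_hot)[:, start:end].T
    if is_revcomp:
        # Reversing rows reverses the sequence; reversing columns swaps
        # A<->T and C<->G under the assumed A, C, G, T order.
        window = window[::-1, ::-1]
    return window


region = one_hot("TTACGTT")
forward = decode(extract_seqlet(region, 2, 5, is_revcomp=False))
reverse = decode(extract_seqlet(region, 2, 5, is_revcomp=True))
print(forward, reverse)
```

Note the transpose: region_one_hot is documented as (4 x seq_length), whereas seq_instance is (length x 4).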

contrib_scores#

Actual contribution scores (length x 4), obtained by masking the raw contributions with the one-hot sequence (seq_instance * raw_contributions). Entries can be non-zero only at the nucleotide actually present at each position

hypothetical_contrib_scores#

Raw contribution scores showing the potential importance of each nucleotide at each position (length x 4). Values are given for all four nucleotides, regardless of which one is actually present
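
The relationship between the two score arrays can be shown with a small numeric sketch. The values below are made up, and the column order A, C, G, T is an assumption. The masking itself follows the seq_instance * raw_contributions formula given for contrib_scores.

```python
import numpy as np

# One-hot seqlet of length 3, spelling "ACG" under the assumed order.
seq_instance = np.array([
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
], dtype=float)

# Made-up hypothetical scores: one value per nucleotide at every position.
hypothetical_contrib_scores = np.array([
    [0.5, -0.1, 0.2, -0.3],
    [-0.2, 0.8, 0.1, 0.0],
    [0.1, 0.0, 0.6, -0.4],
])

# Element-wise masking keeps only the score of the observed nucleotide.
contrib_scores = seq_instance * hypothetical_contrib_scores

# Summing over the nucleotide axis gives one score per position.
per_position = contrib_scores.sum(axis=1)
print(per_position)
```

Each row of contrib_scores has at most one non-zero entry, so the per-position sum recovers the score of the nucleotide that is actually present.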