cassiopeia.solver.dissimilarity_functions.weighted_hamming_distance

cassiopeia.solver.dissimilarity_functions.weighted_hamming_distance(s1, s2, missing_state_indicator=-1, weights=None)

Computes the weighted Hamming distance between two samples.

Evaluates the dissimilarity of two phylogenetic samples on the basis of their shared indel states and the probability of these indel states occurring. Specifically, for a given character, if two states are identical we decrement the dissimilarity by the probability of these two occurring independently; if the two states disagree, we increment the dissimilarity by the probability of these states occurring. We normalize the dissimilarity by the number of non-missing characters shared by the two samples.

If weights are not given, the dissimilarity is incremented by 2 if the two states are different indels, by 1 if one state is uncut and the other is an indel, and by 0 if the two states are identical.
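The unweighted rule can be sketched in a few lines of plain Python. This is an illustrative re-implementation of the description above, not the library's source. It assumes that the uncut state is encoded as 0 and that a pair with no shared non-missing characters scores 0; neither detail is stated here, and the name `unweighted_dissimilarity` is hypothetical.

```python
from typing import List

UNCUT_STATE = 0  # assumption: 0 encodes the uncut (unedited) state


def unweighted_dissimilarity(
    s1: List[int], s2: List[int], missing_state_indicator: int = -1
) -> float:
    """Sketch of the unweighted scoring rule (hypothetical helper)."""
    total = 0
    num_present = 0
    for a, b in zip(s1, s2):
        # Skip characters that are missing in either sample.
        if a == missing_state_indicator or b == missing_state_indicator:
            continue
        num_present += 1
        if a == b:
            continue  # identical states add 0
        if a == UNCUT_STATE or b == UNCUT_STATE:
            total += 1  # one uncut state, one indel
        else:
            total += 2  # two different indels
    if num_present == 0:
        return 0.0  # assumption: no shared characters gives a score of 0
    # Normalize by the number of characters present in both samples.
    return total / num_present


score = unweighted_dissimilarity([0, 1, 2, -1], [0, 0, 3, 4])
```

Here the first three characters contribute 0, 1, and 2, and the fourth is skipped because it is missing in the first sample, so the total of 3 is divided by 3 shared characters.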

Parameters:

s1 List[int]: Character states of the first sample
s2 List[int]: Character states of the second sample
missing_state_indicator (default: -1): The character representing missing values
weights Optional[Dict[int, Dict[int, float]]] (default: None): A dictionary storing the state weights for each character, derived from the state priors. This should be a nested dictionary in which each key is a character index that maps to another dictionary storing the weight of each observed state. (Character -> State -> Weight)
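As a rough illustration of the Character -> State -> Weight layout, the snippet below builds a nested dictionary of that shape from made-up priors. The negative-log transform is only one plausible way to derive weights from priors and is an assumption here; the exact transform is not specified above. The helper `priors_to_weights` and the prior values are hypothetical.

```python
import math
from typing import Dict

# Hypothetical priors: character index -> state -> prior probability.
priors: Dict[int, Dict[int, float]] = {
    0: {1: 0.5, 2: 0.5},
    1: {1: 0.25, 3: 0.75},
}


def priors_to_weights(
    priors: Dict[int, Dict[int, float]]
) -> Dict[int, Dict[int, float]]:
    """Hypothetical helper: map each prior p to the weight -log(p).

    Assumption: rarer states receive larger weights. The actual transform
    used to derive weights from priors may differ.
    """
    return {
        character: {state: -math.log(p) for state, p in state_priors.items()}
        for character, state_priors in priors.items()
    }


weights = priors_to_weights(priors)

# Look up the weight of state 3 at character 1.
weight_for_state = weights[1][3]
```

The resulting `weights` object has the nested shape expected by the `weights` parameter: the outer key selects the character and the inner key selects the state.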

Return type:

float

Returns:

A dissimilarity score.