cassiopeia.solver.SharedMutationJoiningSolver
- class cassiopeia.solver.SharedMutationJoiningSolver(similarity_function=<function hamming_similarity_without_missing>, prior_transformation='negative_log')
Shared-Mutation-Joining class for Cassiopeia.
Implements an iterative, bottom-up agglomerative clustering procedure. The algorithm iteratively clusters the samples in the sample pool by the number of shared mutations that they have in their character information. The algorithm has theoretical guarantees on correctness given a sufficiently large number of characters and bounds on edge lengths in the tree generative process.
- TODO(mgjones, rzhang): Make the solver work with similarity maps as flattened arrays
- Parameters:
- similarity_function (Optional[Callable[[array, array, int, Optional[Dict[int, Dict[int, float]]]], float]], default: <function hamming_similarity_without_missing>): Function that can be used to compute the similarity between samples.
- prior_transformation (str, default: 'negative_log'): Function to use when transforming priors into weights. Supports the following transformations:
  - "negative_log": Transforms each probability p by taking the negative log of p (default)
  - "inverse": Transforms each probability p by taking 1/p
  - "square_root_inverse": Transforms each probability p by taking the square root of 1/p
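The three prior transformations listed above can be sketched as follows. This is a standalone illustration, not the library's code: the function name `transform_priors_sketch` and the nested-dictionary layout `{character: {state: probability}}` are assumptions made for the example.

```python
import math


def transform_priors_sketch(priors, prior_transformation="negative_log"):
    """Map {character: {state: probability}} to {character: {state: weight}}.

    Hypothetical helper written for illustration only.
    """
    transforms = {
        "negative_log": lambda p: -math.log(p),
        "inverse": lambda p: 1.0 / p,
        "square_root_inverse": lambda p: math.sqrt(1.0 / p),
    }
    if prior_transformation not in transforms:
        raise ValueError(f"Unsupported transformation: {prior_transformation!r}")
    transform = transforms[prior_transformation]

    weights = {}
    for character, state_priors in priors.items():
        weights[character] = {}
        for state, p in state_priors.items():
            # All three transformations are undefined at p = 0.
            if not 0.0 < p <= 1.0:
                raise ValueError(f"Prior {p} is not in the interval (0, 1]")
            weights[character][state] = transform(p)
    return weights


example_priors = {0: {1: 0.5, 2: 0.25}}
print(transform_priors_sketch(example_priors, "inverse"))
```

Under each of these transformations a rarer state (smaller p) receives a larger weight, so a shared rare mutation counts for more than a shared common one.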
- similarity_function
Function used to compute similarity between samples.
- prior_transformation
Function to use when transforming priors into weights.
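A custom similarity function has to match the callable signature documented for `similarity_function`: two character vectors, the missing-state indicator, and optional weights, returning a float. The sketch below is a hypothetical example of such a function, not the source of `hamming_similarity_without_missing`. It assumes that state `0` denotes an unmutated character and that `weights` is keyed first by character index and then by state.

```python
def shared_mutation_similarity(s1, s2, missing_state_indicator=-1, weights=None):
    """Count (optionally weighted) mutations that two samples share.

    Hypothetical example of a callable with the documented signature.
    Assumes state 0 means "unmutated".
    """
    similarity = 0.0
    for i, (a, b) in enumerate(zip(s1, s2)):
        # Positions with missing data carry no evidence either way.
        if a == missing_state_indicator or b == missing_state_indicator:
            continue
        # Unmutated positions are not shared mutations.
        if a == 0 or b == 0:
            continue
        if a == b:
            similarity += weights[i][a] if weights else 1.0
    return similarity


# One shared mutation: state 1 at character 0.
print(shared_mutation_similarity([1, 2, 0, -1], [1, 3, 0, 4]))
```

A function of this shape could be passed as `similarity_function` when constructing the solver, provided it accepts the same four arguments.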
Methods
- solve(cassiopeia_tree, layer=None, collapse_mutationless_edges=False, logfile='stdout.log')
Solves a tree for the SharedMutationJoiningSolver.
The solver routine calculates an n x n similarity matrix of all pairwise sample similarities based on a provided similarity function on the character vectors. The general solver routine proceeds by iteratively finding pairs of samples to join together into a “cherry” until all samples are joined. At each iterative step, the two samples with the most shared character/state mutations are joined. Then, an LCA node with a character vector containing only the mutations shared by the joined samples is added to the sample pool, and the similarity matrix is updated with respect to the new LCA node. The function will update the tree attribute of the input CassiopeiaTree.
- Parameters:
- cassiopeia_tree (CassiopeiaTree): CassiopeiaTree object to be populated
- layer (Optional[str], default: None): Layer storing the character matrix for solving. If None, the default character matrix is used in the CassiopeiaTree.
- collapse_mutationless_edges (bool, default: False): Indicates if the final reconstructed tree should collapse mutationless edges based on internal states inferred by Camin-Sokal parsimony. In scoring accuracy, this removes artifacts caused by arbitrarily resolving polytomies.
- logfile (str, default: 'stdout.log'): Location to write standard out. Not currently used.
- Return type: None
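The joining routine described for `solve` can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the solver's implementation: it recomputes pairwise similarities at each step instead of maintaining a similarity matrix, treats state `0` as unmutated and `-1` as missing, treats missing entries as unshared, and returns the topology as nested tuples rather than populating a CassiopeiaTree. All function names are hypothetical.

```python
from itertools import combinations

MISSING = -1  # assumed missing-state indicator


def count_shared(s1, s2):
    """Number of positions where both samples carry the same mutation."""
    return sum(1 for a, b in zip(s1, s2) if a == b and a not in (0, MISSING))


def lca_vector(s1, s2):
    """Character vector keeping only mutations shared by both samples."""
    return [a if a == b and a not in (0, MISSING) else 0 for a, b in zip(s1, s2)]


def shared_mutation_joining_sketch(samples):
    """Join samples bottom-up; returns the topology as nested tuples."""
    pool = dict(samples)                      # node name -> character vector
    subtree = {name: name for name in samples}
    next_id = 0
    while len(pool) > 1:
        # Pick the pair with the most shared mutations (first pair wins ties).
        a, b = max(
            combinations(sorted(pool), 2),
            key=lambda pair: count_shared(pool[pair[0]], pool[pair[1]]),
        )
        new_name = f"lca_{next_id}"
        next_id += 1
        new_vector = lca_vector(pool[a], pool[b])
        new_subtree = (subtree.pop(a), subtree.pop(b))
        # Replace the joined pair with their LCA node in the sample pool.
        del pool[a], pool[b]
        pool[new_name] = new_vector
        subtree[new_name] = new_subtree
    (root,) = subtree.values()
    return root


samples = {"a": [1, 2, 0], "b": [1, 2, 3], "c": [1, 0, 0], "d": [0, 0, 4]}
print(shared_mutation_joining_sketch(samples))
```

In this toy input, `a` and `b` share two mutations and are joined first; their LCA shares one mutation with `c`, and `d` joins last.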
- find_cherry(similarity_matrix)
Finds a pair of samples to join into a cherry.
Finds the pair of samples with the highest pairwise similarity to join.
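Selecting the most similar pair amounts to finding the largest off-diagonal entry of the similarity matrix. The sketch below illustrates this on a list of lists; it assumes a symmetric matrix, ignores the diagonal, and returns the first maximal pair it encounters. The function name is hypothetical, and the tie-breaking rule is an assumption of the example rather than documented behavior.

```python
def find_most_similar_pair(similarity_matrix):
    """Return indices (i, j), i < j, of the largest off-diagonal entry."""
    n = len(similarity_matrix)
    if n < 2:
        raise ValueError("Need at least two samples to form a cherry")
    best_pair = (0, 1)
    best_value = similarity_matrix[0][1]
    # Only the upper triangle is scanned because the matrix is symmetric.
    for i in range(n):
        for j in range(i + 1, n):
            if similarity_matrix[i][j] > best_value:
                best_pair = (i, j)
                best_value = similarity_matrix[i][j]
    return best_pair


matrix = [
    [0, 1, 3],
    [1, 0, 2],
    [3, 2, 0],
]
print(find_most_similar_pair(matrix))  # (0, 2)
```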
- update_similarity_map_and_character_matrix(character_matrix, similarity_function, similarity_map, cherry, new_node, missing_state_indicator=-1, weights=None)
Update similarity map after finding a cherry.
Adds the new LCA node into the character matrix with the mutations shared by the joined nodes as its character vector. Then, updates the similarity matrix by calculating the pairwise similarity between the new LCA node and all existing nodes.
- Parameters:
- character_matrix (DataFrame): Contains the character information for all nodes, updated as nodes are joined and new internal LCA nodes are added
- similarity_function (Callable[[array, array, int, Dict[int, Dict[int, float]]], float]): A similarity function
- similarity_map (DataFrame): A similarity map to update
- cherry (Tuple[str, str]): A tuple of indices in the similarity map that are joining
- new_node (str): New node name, to be added to the updated similarity map
- missing_state_indicator (int, default: -1): Character representing missing data
- weights (default: None): Weighting of each (character, state) pair. Typically a transformation of the priors.
- Return type: DataFrame
- Returns: A new similarity map, updated with the new node
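The update step can be illustrated with plain dictionaries in place of DataFrames. This is a sketch under several assumptions, not the method's implementation: the joined nodes are dropped from both structures, state `0` is treated as unmutated, the diagonal entry for the new node is set to `0.0`, and both updated structures are returned instead of modifying the character matrix in place. All names below are hypothetical.

```python
def shared_count(s1, s2, missing_state_indicator=-1, weights=None):
    """Unweighted or weighted count of shared mutations (state 0 = unmutated)."""
    total = 0.0
    for i, (a, b) in enumerate(zip(s1, s2)):
        if a == b and a not in (0, missing_state_indicator):
            total += weights[i][a] if weights else 1.0
    return total


def update_after_join(character_matrix, similarity_function, similarity_map,
                      cherry, new_node, missing_state_indicator=-1, weights=None):
    """Return (new_similarity_map, new_character_matrix) after joining a cherry."""
    i, j = cherry
    # The LCA keeps only the mutations shared by the two joined nodes.
    lca = [
        a if a == b and a not in (0, missing_state_indicator) else 0
        for a, b in zip(character_matrix[i], character_matrix[j])
    ]
    # Drop the joined nodes from both structures (an assumption of this sketch).
    new_matrix = {k: v for k, v in character_matrix.items() if k not in cherry}
    new_map = {
        row: {col: value for col, value in entries.items() if col not in cherry}
        for row, entries in similarity_map.items()
        if row not in cherry
    }
    # Similarity between the new LCA node and every remaining node.
    new_map[new_node] = {new_node: 0.0}
    for name, vector in new_matrix.items():
        s = similarity_function(lca, vector, missing_state_indicator, weights)
        new_map[new_node][name] = s
        new_map[name][new_node] = s
    new_matrix[new_node] = lca
    return new_map, new_matrix


cm = {"a": [1, 2, 0], "b": [1, 2, 3], "c": [1, 0, 0]}
sim = {
    "a": {"a": 0.0, "b": 2.0, "c": 1.0},
    "b": {"a": 2.0, "b": 0.0, "c": 1.0},
    "c": {"a": 1.0, "b": 1.0, "c": 0.0},
}
new_sim, new_cm = update_after_join(cm, shared_count, sim, ("a", "b"), "ab")
print(new_sim)
print(new_cm)
```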