cassiopeia.solver.UPGMASolver#

class cassiopeia.solver.UPGMASolver(dissimilarity_function=<function weighted_hamming_distance>, prior_transformation='negative_log', fast=False, implementation='ccphylo_upgma')[source]#

UPGMA CassiopeiaSolver.

Implements the UPGMA algorithm described as a derived class of DistanceSolver. This class inherits the generic solve method, but implements its own procedure for finding cherries by minimizing the dissimilarity between samples. After joining nodes, the dissimilarities are updated by averaging the distances of elements in the new cluster with each existing node. Produces a rooted tree that is assumed to be ultrametric. If fast is set to True, a fast UPGMA implementation of is used.

Parameters:
dissimilarity_function (array, array, int, {int: {int: float}}) → float | NoneOptional[Callable[[array, array, int, Dict[int, Dict[int, float]]], float]] (default: <function weighted_hamming_distance at 0x7ff09d141700>)

A function by which to compute the dissimilarity map. Optional if a dissimilarity map is already provided.

prior_transformation str (default: 'negative_log')

Function to use when transforming priors into weights. Supports the following transformations:

”negative_log”: Transforms each probability by the negative

log (default)

”inverse”: Transforms each probability p by taking 1/p “square_root_inverse”: Transforms each probability by the

the square root of 1/p

fast bool (default: False)

Whether to use a fast implementation of UPGMA.

implementation str (default: 'ccphylo_upgma')

Which fast implementation to use. Options are: “ccphylo_upgma”: Uses the fast UPGMA implementation from CCPhylo.

dissimilarity_function#

Function used to compute dissimilarity between samples.

add_root#

Whether or not to add an implicit root the tree.

prior_transformation#

Function to use when transforming priors into weights.

Methods

root_tree(tree, root_sample, remaining_samples)[source]#

Roots a tree produced by UPGMA.

Adds the root at the top of the UPGMA reconstructed tree. By the ultrametric assumption, the root is placed as the parent to the last two unjoined nodes.

Parameters:
tree Graph

Networkx object representing the tree topology

root_sample str

Ignored in this case, the root is known in this case

remaining_samples List[str]

The last two unjoined nodes in the tree

Returns:

A rooted tree.

find_cherry(dissimilarity_matrix)[source]#

Finds a pair of samples to join into a cherry.

Finds the pair of samples with the minimum dissimilarity by finding the minimum value in the provided dissimilarity matrix

Parameters:
dissimilarity_matrix array

A sample x sample dissimilarity matrix

Return type:

Tuple[int, int]

Returns:

A tuple of integers representing rows in the dissimilarity matrix

to join.

update_dissimilarity_map(dissimilarity_map, cherry, new_node)[source]#

Update dissimilarity map after finding a cherry.

Updates the dissimilarity map after joining together two nodes (m1, m2) at a cherry m. For all other nodes v, the new dissimilarity map d’ is:

d’(m, v) = (<m1> * d(m1, v) + <m2> * d(m2, v))/(<m1> + <m2>)

where <m1> is the size of cluster m1, i.e. the number of sample leaves under node m1.

Parameters:
dissimilarity_map DataFrame

A dissimilarity map to update

cherry Tuple[str, str]

A tuple of indices in the dissimilarity map that are joining

new_node str

New node name, to be added to the new dissimilarity map

Return type:

DataFrame

Returns:

A new dissimilarity map, updated with the new node

setup_root_finder(cassiopeia_tree)[source]#

Defines the implicit rooting strategy for the UPGMASolver.

By default, the UPGMA algorithm returns an rooted tree. Therefore, the implicit root will be placed and specified at the end of the solving procedure as the parent of the last two unjoined nodes.

Parameters:
cassiopeia_tree CassiopeiaTree

Input CassiopeiaTree to solve

Return type:

None