cassiopeia.solver.UPGMASolver

class cassiopeia.solver.UPGMASolver(dissimilarity_function=<function weighted_hamming_distance>, prior_transformation='negative_log')[source]

UPGMA CassiopeiaSolver.

Implements the UPGMA algorithm described as a derived class of DistanceSolver. This class inherits the generic solve method, but implements its own procedure for finding cherries by minimizing the dissimilarity between samples. After joining nodes, the dissimilarities are updated by averaging the distances of elements in the new cluster with each existing node. Produces a rooted tree that is assumed to be ultrametric.

Parameters
dissimilarity_function : Callable[[array, array, int, Dict[int, Dict[int, float]]], float], NoneOptional[Callable[[array, array, int, Dict[int, Dict[int, float]]], float]] (default: <function weighted_hamming_distance at 0x7f6b4c018ae8>)

A function by which to compute the dissimilarity map. Optional if a dissimilarity map is already provided.

prior_transformation : strstr (default: 'negative_log')

Function to use when transforming priors into weights. Supports the following transformations:

”negative_log”: Transforms each probability by the negative

log (default)

”inverse”: Transforms each probability p by taking 1/p “square_root_inverse”: Transforms each probability by the

the square root of 1/p

dissimilarity_function

Function used to compute dissimilarity between samples.

add_root

Whether or not to add an implicit root the tree.

prior_transformation

Function to use when transforming priors into weights.

Methods

root_tree(tree, root_sample, remaining_samples)[source]

Roots a tree produced by UPGMA.

Adds the root at the top of the UPGMA reconstructed tree. By the ultrametric assumption, the root is placed as the parent to the last two unjoined nodes.

Parameters
tree : GraphGraph

Networkx object representing the tree topology

root_sample : strstr

Ignored in this case, the root is known in this case

remaining_samples : List[str]List[str]

The last two unjoined nodes in the tree

Returns

A rooted tree.

find_cherry(dissimilarity_matrix)[source]

Finds a pair of samples to join into a cherry.

Finds the pair of samples with the minimum dissimilarity by finding the minimum value in the provided dissimilarity matrix

Parameters
dissimilarity_matrix : arrayarray

A sample x sample dissimilarity matrix

Return type

Tuple[int, int]Tuple[int, int]

Returns

A tuple of integers representing rows in the dissimilarity matrix

to join.

update_dissimilarity_map(dissimilarity_map, cherry, new_node)[source]

Update dissimilarity map after finding a cherry.

Updates the dissimilarity map after joining together two nodes (m1, m2) at a cherry m. For all other nodes v, the new dissimilarity map d’ is:

d’(m, v) = (<m1> * d(m1, v) + <m2> * d(m2, v))/(<m1> + <m2>)

where <m1> is the size of cluster m1, i.e. the number of sample leaves under node m1.

Parameters
dissimilarity_map : DataFrameDataFrame

A dissimilarity map to update

cherry : Tuple[str, str]Tuple[str, str]

A tuple of indices in the dissimilarity map that are joining

new_node : strstr

New node name, to be added to the new dissimilarity map

Return type

DataFrameDataFrame

Returns

A new dissimilarity map, updated with the new node

setup_root_finder(cassiopeia_tree)[source]

Defines the implicit rooting strategy for the UPGMASolver.

By default, the UPGMA algorithm returns an rooted tree. Therefore, the implicit root will be placed and specified at the end of the solving procedure as the parent of the last two unjoined nodes.

Parameters
cassiopeia_tree : CassiopeiaTreeCassiopeiaTree

Input CassiopeiaTree to solve

Return type

NoneNone