cassiopeia.solver.NeighborJoiningSolver#

class cassiopeia.solver.NeighborJoiningSolver(dissimilarity_function=<function weighted_hamming_distance>, add_root=False, prior_transformation='negative_log', fast=False, implementation='ccphylo_dnj', threads=1)[source]#

Neighbor-Joining class for Cassiopeia.

Implements the Neighbor-Joining algorithm described by Saitou and Nei (1987) as a derived class of DistanceSolver. This class inherits the generic solve method, but implements its own procedure for finding cherries by minimizing the Q-criterion between samples. If fast is set to True, a fast NJ implementation of is used.

Parameters:
dissimilarity_function Optional[Callable[[array, array, int, Dict[int, Dict[int, float]]], float]] (default: <function weighted_hamming_distance at 0x7ff880f86700>)

A function by which to compute the dissimilarity map. Optional if a dissimilarity map is already provided.

add_root bool (default: False)

Whether or not to add an implicit root the tree, i.e. a root with unmutated characters. If set to False, and no explicit root is provided in the CassiopeiaTree, then will return an unrooted, undirected tree

prior_transformation str (default: 'negative_log')

Function to use when transforming priors into weights. Supports the following transformations:

”negative_log”: Transforms each probability by the negative

log (default)

”inverse”: Transforms each probability p by taking 1/p “square_root_inverse”: Transforms each probability by the

the square root of 1/p

fast bool (default: False)

Whether to use a fast implementation of Neighbor-Joining.

implementation str (default: 'ccphylo_dnj')

Which fast implementation to use. Options are: “ccphylo_dnj”: CCPhylo implementation the Dynamic Neighbor-Joining

algorithm described by Clausen (2023). Solution in guaranteed to be exact.

”ccphylo_hnj”: CCPhylo implementation of the Heuristic

Neighbor-Joining algorithm described by Clausen (2023). Solution is not guaranteed to be exact.

”ccphylo_nj”: CCPhylo implementation of the Neighbor-Joining.

threads int (default: 1)

Number of threads to use for solver.

dissimilarity_function#

Function used to compute dissimilarity between samples.

add_root#

Whether or not to add an implicit root the tree.

prior_transformation#

Function to use when transforming priors into weights.

threads#

Number of threads to use for solver.

Methods

root_tree(tree, root_sample, remaining_samples)[source]#

Roots a tree produced by Neighbor-Joining at the specified root.

Uses the specified root to root the tree passed in

Parameters:
tree Graph

Networkx object representing the tree topology

root_sample str

Sample to treat as the root

remaining_samples List[str]

The last two unjoined nodes in the tree

Return type:

DiGraph

Returns:

A rooted tree

find_cherry(dissimilarity_matrix)[source]#

Finds a pair of samples to join into a cherry.

Proceeds by minimizing the Q-criterion as in Saitou and Nei (1987) to select a pair of samples to join.

Parameters:
dissimilarity_matrix array

A sample x sample dissimilarity matrix

Return type:

Tuple[int, int]

Returns:

A tuple of intgers representing rows in the dissimilarity matrix

to join.

static compute_q(dissimilarity_map)[source]#

Computes the Q-criterion for every pair of samples.

Computes the Q-criterion defined by Saitou and Nei (1987):

Q(i,j) = d(i, j) - 1/(n-2) (sum(d(i, :)) + sum(d(j,:)))

Parameters:
dissimilarity_map <class ‘int’>

A sample x sample dissimilarity map

Return type:

array

Returns:

A matrix storing the Q-criterion for every pair of samples.

update_dissimilarity_map(dissimilarity_map, cherry, new_node)[source]#

Update dissimilarity map after finding a cherry.

Updates the dissimilarity map after joining together two nodes (m1, m2) at a cherry m. For all nodes v, the new dissimilarity map d’ is:

d’(m, v) = 0.5 * (d(v, m1) + d(v, m2) - d(m1, m2))

Parameters:
dissimilarity_map DataFrame

A dissimilarity map to update

cherry Tuple[str, str]

A tuple of indices in the dissimilarity map that are joining

new_node str

New node name, to be added to the new dissimilarity map

Return type:

DataFrame

Returns:

A new dissimilarity map, updated with the new node

setup_root_finder(cassiopeia_tree)[source]#

Defines the implicit rooting strategy for the NeighborJoiningSolver.

By default, the NeighborJoining algorithm returns an unrooted tree. To root this tree, an implicit root of all zeros is added to the character matrix. Then, the dissimilarity map is recalculated using the updated character matrix. If the tree already has a computed dissimilarity map, only the new dissimilarities are calculated.

Parameters:
cassiopeia_tree CassiopeiaTree

Input CassiopeiaTree to solve

Return type:

None