cassiopeia.solver.HybridSolver

class cassiopeia.solver.HybridSolver(top_solver, bottom_solver, lca_cutoff=None, cell_cutoff=None, threads=1, prior_transformation='negative_log')

The Hybrid Cassiopeia solver.

HybridSolver is a class representing the structure of Cassiopeia Hybrid inference algorithms. The solver procedure builds a tree starting with a top-down greedy algorithm until a predetermined criterion is reached, at which point a more complex algorithm is used to reconstruct each subproblem. The top-down algorithm _must_ be a subclass of GreedySolver, as it must provide the functions find_split and perform_split. The solver employed at the bottom of the tree can be any CassiopeiaSolver subclass and need only have a solve method.

Parameters:
top_solver GreedySolver

An algorithm to be applied at the top of the tree. Must be a subclass of GreedySolver.

bottom_solver CassiopeiaSolver

An algorithm to be applied at the bottom of the tree. Must be a subclass of CassiopeiaSolver.

lca_cutoff float | None (default: None)

Distance to the latest-common-ancestor (LCA) of a subclade to be used as a cutoff for transitioning to the bottom solver.

cell_cutoff int | None (default: None)

Number of cells in a subclade to be used as a cutoff for transitioning to the bottom solver.

threads int (default: 1)

Number of threads to be used. This corresponds to the number of subproblems to be run concurrently with the bottom solver.

prior_transformation str (default: 'negative_log')

Function to use when transforming priors into weights. Supports the following transformations:

"negative_log": Transforms each probability p by taking the negative log of p (default)

"inverse": Transforms each probability p by taking 1/p

"square_root_inverse": Transforms each probability p by taking the square root of 1/p
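The three transformations can be sketched in a few lines of plain Python. This is only an illustration of the formulas listed above, not the library's own code: the function names transform_prior and priors_to_weights are hypothetical, and the nested dictionary layout (character to state to value) is assumed from the type of the weights argument documented for apply_top_solver.

```python
import math
from typing import Dict


def transform_prior(p: float, transformation: str = "negative_log") -> float:
    """Map a prior probability p in (0, 1] to a non-negative weight.

    Hypothetical helper for illustration; not part of Cassiopeia's API.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("prior must lie in (0, 1]")
    if transformation == "negative_log":
        return -math.log(p)
    if transformation == "inverse":
        return 1.0 / p
    if transformation == "square_root_inverse":
        return math.sqrt(1.0 / p)
    raise ValueError(f"unknown transformation: {transformation}")


def priors_to_weights(
    priors: Dict[int, Dict[int, float]],
    transformation: str = "negative_log",
) -> Dict[int, Dict[int, float]]:
    """Apply the chosen transformation to every character-state prior."""
    return {
        character: {
            state: transform_prior(p, transformation)
            for state, p in states.items()
        }
        for character, states in priors.items()
    }


# Example priors: character 0 has two possible states with probabilities
# 0.25 and 0.75.
weights = priors_to_weights({0: {1: 0.25, 2: 0.75}}, "inverse")
```

All three transformations assign larger weights to rarer states; they differ in how strongly low-probability states are emphasized.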

Methods

solve(cassiopeia_tree, layer=None, collapse_mutationless_edges=False, logfile='stdout.log')

The general hybrid solver routine.

The hybrid solver proceeds by clustering cells together using the algorithm stored in top_solver until a criterion is reached. Once this criterion is reached, the bottom_solver is applied to each subproblem left over from the “greedy” clustering.

Parameters:
cassiopeia_tree CassiopeiaTree

CassiopeiaTree that stores the character matrix and priors for reconstruction.

layer str | None (default: None)

Layer storing the character matrix for solving. If None, the default character matrix is used in the CassiopeiaTree.

collapse_mutationless_edges bool (default: False)

Indicates if the final reconstructed tree should collapse mutationless edges based on internal states inferred by Camin-Sokal parsimony. In scoring accuracy, this removes artifacts caused by arbitrarily resolving polytomies.

logfile str (default: 'stdout.log')

Location to log progress.
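The control flow of the hybrid routine can be illustrated with a toy sketch: split the samples top-down until a cutoff is reached, then collect the remaining groups as subproblems for the bottom solver. Everything in this sketch is a stand-in chosen for illustration: hybrid_partition, halve, and the size-based cutoff are hypothetical and do not reproduce Cassiopeia's internals, which also build the in-progress tree and track node IDs.

```python
from typing import Callable, List, Tuple

Split = Tuple[List[str], List[str]]


def hybrid_partition(
    samples: List[str],
    find_split: Callable[[List[str]], Split],
    cutoff_reached: Callable[[List[str]], bool],
) -> List[List[str]]:
    """Split greedily top-down; return the groups left for a bottom solver."""
    if len(samples) <= 1 or cutoff_reached(samples):
        return [samples]
    left, right = find_split(samples)
    if not left or not right:
        # The split made no progress, so stop and hand the group over as-is.
        return [samples]
    return hybrid_partition(left, find_split, cutoff_reached) + hybrid_partition(
        right, find_split, cutoff_reached
    )


def halve(samples: List[str]) -> Split:
    """Hypothetical stand-in for a greedy split: cut the list in half."""
    mid = len(samples) // 2
    return samples[:mid], samples[mid:]


# Eight samples, with the cutoff reached once a group has two or fewer cells.
subproblems = hybrid_partition(list("abcdefgh"), halve, lambda s: len(s) <= 2)
# Each entry in `subproblems` would then be passed to the bottom solver.
```

In this toy run the eight samples are divided into four groups of two, each of which would be solved independently by the bottom solver.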

apply_top_solver(character_matrix, samples, tree, node_name_generator, weights=None, missing_state_indicator=-1, root=None)

Applies the top solver to samples.

A helper method for applying the top solver to the samples until a criterion is hit. Subproblems and the root ID are returned.

Parameters:
character_matrix DataFrame

Character matrix

samples List[str]

Samples in the subclade of interest.

tree DiGraph

In-progress tree for the HybridSolver.

node_name_generator Generator[str, None, None]

Generator for creating unique node names while applying the top-solver.

weights Dict[int, Dict[int, float]] | None (default: None)

Weights of character-state combinations, derived from priors if these are available.

missing_state_indicator int (default: -1)

Indicator for missing data

root int | None (default: None)

Node ID of the root in the subtree containing the samples.

Return type:

Tuple[int, List[Tuple[int, List[str]]]]

Returns:

The ID of the node serving as the root of the tree containing the samples, and a list of subproblems in the form [subtree-root, subtree-samples].

apply_bottom_solver(cassiopeia_tree, root, samples=typing.List[str], logfile='stdout.log', layer=None)

Apply the bottom solver to subproblems.

A private method for solving subproblems identified by the top-down solver with the more precise bottom solver for this instantiation of the HybridSolver. This function will create a unique log file based on the root, set up a new instance of the bottom solver, and solve the subproblem.

The function will return a tree for the subproblem and the identifier of the root of the tree.

Parameters:
cassiopeia_tree CassiopeiaTree

CassiopeiaTree for the entire dataset. This will be subsetted with respect to the samples specified.

root int

Identifier of the root in the master tree

samples default: typing.List[str]

A list of samples for which to infer a tree.

logfile str (default: 'stdout.log')

Base location for logging output. A specific logfile will be created from this base logfile name.

layer str | None (default: None)

Layer storing the character matrix for solving. If None, the default character matrix is used in the CassiopeiaTree.

Return type:

Tuple[DiGraph, int]

Returns:

A tree in the form of a Networkx graph and the original root identifier.

assess_cutoff(samples, character_matrix, missing_state_indicator=-1)

Assesses samples with respect to hybrid cutoff.

Parameters:
samples List[str]

A list of samples in a clade.

character_matrix DataFrame

Character matrix

missing_state_indicator int (default: -1)

Indicator for missing data.

Return type:

bool

Returns:

True if the cutoff is reached, False if not.
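A rough sketch of how such a cutoff check might work is given below. It rests on several assumptions that the documentation above does not confirm: the character matrix is represented as a plain dictionary rather than a DataFrame to keep the example self-contained, the LCA of a clade is taken to carry a state only where all non-missing cells agree (and 0 otherwise), the distance to the LCA is a Hamming distance that skips missing entries, and cell_cutoff is checked first if both cutoffs are set. The names lca_states and cutoff_reached are hypothetical.

```python
from typing import Dict, List, Optional


def lca_states(rows: List[List[int]], missing: int = -1) -> List[int]:
    """Per character: the shared state if all non-missing cells agree, else 0."""
    lca = []
    for column in zip(*rows):
        observed = {state for state in column if state != missing}
        lca.append(observed.pop() if len(observed) == 1 else 0)
    return lca


def cutoff_reached(
    samples: List[str],
    character_matrix: Dict[str, List[int]],
    lca_cutoff: Optional[float] = None,
    cell_cutoff: Optional[int] = None,
    missing: int = -1,
) -> bool:
    """Return True once the clade is small or homogeneous enough (assumed rule)."""
    if cell_cutoff is not None:
        return len(samples) <= cell_cutoff
    if lca_cutoff is not None:
        rows = [character_matrix[s] for s in samples]
        lca = lca_states(rows, missing)
        # Hamming distance from the LCA to each cell, skipping missing entries.
        distances = [
            sum(1 for a, b in zip(lca, row) if b != missing and a != b)
            for row in rows
        ]
        return max(distances) <= lca_cutoff
    return False


# Toy matrix: three cells, three characters, -1 marks missing data.
matrix = {"c1": [1, 2, 0], "c2": [1, 2, 3], "c3": [1, 0, -1]}
cells = ["c1", "c2", "c3"]
small_enough = cutoff_reached(cells, matrix, cell_cutoff=3)
close_enough = cutoff_reached(cells, matrix, lca_cutoff=2)
```

In the toy matrix only the first character is shared by all cells, so the assumed LCA is [1, 0, 0] and the farthest cell (c2) differs from it at two positions.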