cassiopeia.solver.VanillaGreedySolver

class cassiopeia.solver.VanillaGreedySolver(missing_data_classifier=<function assign_missing_average>, prior_transformation='negative_log')

A class for the basic Cassiopeia-Greedy solver.

The VanillaGreedySolver implements a top-down algorithm that optimizes for parsimony by recursively splitting the sample set based on the presence, or absence, of the most frequent mutation. Multiple missing data imputation methods are included for handling samples that have a missing value on the character being split, where presence or absence of the mutation is ambiguous. The user can also supply a custom missing data method.
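The recursion described above can be sketched in plain Python. This is a simplified illustration, not the library's implementation: the function names `most_frequent_mutation` and `greedy_tree`, the dict-of-lists matrix layout, and the use of 0 as the unmutated state are all assumptions made for the example, and the toy data contains no missing values.

```python
from collections import Counter

MISSING = -1  # assumed missing-state indicator
UNCUT = 0     # assumed "no mutation" state (hypothetical convention)


def most_frequent_mutation(matrix, samples):
    """Return the most common (character, state) pair, or None if no pair splits the set."""
    counts = Counter()
    for s in samples:
        for character, state in enumerate(matrix[s]):
            if state not in (MISSING, UNCUT):
                counts[(character, state)] += 1
    # A mutation carried by every sample cannot separate them, so drop it.
    informative = {k: v for k, v in counts.items() if v < len(samples)}
    if not informative:
        return None
    # Highest count wins; ties go to the lowest character, then lowest state.
    return max(informative, key=lambda k: (informative[k], -k[0], -k[1]))


def greedy_tree(matrix, samples):
    """Recursively split samples; returns nested tuples of sample names."""
    if len(samples) == 1:
        return samples[0]
    pair = most_frequent_mutation(matrix, samples)
    if pair is None:
        return tuple(samples)  # nothing left to split on: unresolved group
    character, state = pair
    left = [s for s in samples if matrix[s][character] == state]
    right = [s for s in samples if matrix[s][character] != state]
    return (greedy_tree(matrix, left), greedy_tree(matrix, right))


matrix = {
    "a": [1, 1, 0],
    "b": [1, 1, 0],
    "c": [1, 0, 2],
    "d": [0, 0, 2],
}
tree = greedy_tree(matrix, ["a", "b", "c", "d"])
print(tree)
```

In this toy matrix the mutation at character 0 is shared by three samples, so it defines the first split; the remaining group is then split on character 1, leaving `a` and `b` together because no mutation distinguishes them.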

TODO(richardyz98): Implement fuzzysolver

Parameters:

missing_data_classifier Callable (default: <function assign_missing_average>)

Takes either a string specifying one of the included missing data imputation methods, or a function implementing a user-specified missing data method. The default is the “average” method.

prior_transformation str (default: 'negative_log')

The name of the transformation applied to the priors to form the weights that scale frequencies. One of the following:

“negative_log”: Transforms each probability p by taking the negative log of p (default)

“inverse”: Transforms each probability p by taking 1/p

“square_root_inverse”: Transforms each probability p by taking the square root of 1/p
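As a rough illustration of the three options, the following sketch applies each transformation to a prior probability and builds a nested weights dictionary keyed by character and then state. The helper name `transform_prior` and the example priors are hypothetical, and the natural logarithm is assumed for “negative_log”; the library's own code may differ.

```python
import math


def transform_prior(p, prior_transformation="negative_log"):
    """Turn a prior probability p (0 < p <= 1) into a weight."""
    if not 0 < p <= 1:
        raise ValueError("prior must be in (0, 1]")
    if prior_transformation == "negative_log":
        return -math.log(p)  # natural log assumed for this sketch
    if prior_transformation == "inverse":
        return 1 / p
    if prior_transformation == "square_root_inverse":
        return math.sqrt(1 / p)
    raise ValueError(f"unknown transformation: {prior_transformation}")


# Hypothetical priors: character -> state -> probability of that state.
priors = {0: {1: 0.5, 2: 0.25}}

weights = {
    character: {
        state: transform_prior(p, "inverse") for state, p in states.items()
    }
    for character, states in priors.items()
}
print(weights)
```

All three transformations decrease as p grows, so a rarer state receives a larger weight than a common one.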

prior_transformation: Function to transform priors, if these are available.

missing_data_classifier: Function to classify missing data during character splits.

Methods

perform_split(character_matrix, samples, weights=None, missing_state_indicator=-1)

Partitions based on the most frequent (character, state) pair.

Uses the (character, state) pair to split the list of samples into two partitions. In doing so, the procedure makes use of the missing data classifier to classify samples that have missing data at that character where presence or absence of the character is ambiguous.

Parameters:

character_matrix DataFrame: Character matrix
samples List[int]: A list of samples to partition
weights Optional[Dict[int, Dict[int, float]]] (default: None): Weighting of each (character, state) pair. Typically a transformation of the priors.
missing_state_indicator int (default: -1): Character representing missing data.

Return type:

Tuple[List[str], List[str]]

Returns:

A tuple of lists, representing the left and right partition groups
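A single split of this kind might look like the sketch below. It is not the library's `perform_split`: the function name `split_sketch`, the dict-of-lists matrix, and the use of 0 as the unmutated state are hypothetical. The missing-data rule used here (place an ambiguous sample on the side whose members share more mutations with it on average) is an assumption made for illustration, since the text above does not spell out what the “average” method computes.

```python
from collections import defaultdict


def split_sketch(matrix, samples, weights=None, missing_state_indicator=-1):
    """Split samples on the best-scoring (character, state) pair."""
    skip = (0, missing_state_indicator)  # 0 = unmutated (assumed)

    # Count each mutation, then scale the count by its weight if given.
    freq = defaultdict(int)
    for s in samples:
        for c, state in enumerate(matrix[s]):
            if state not in skip:
                freq[(c, state)] += 1
    if not freq:
        return list(samples), []

    def score(pair):
        c, state = pair
        return freq[pair] * (weights[c][state] if weights else 1.0)

    # sorted() makes ties deterministic: the lowest pair wins.
    best_c, best_state = max(sorted(freq), key=score)

    left = [s for s in samples if matrix[s][best_c] == best_state]
    right = [s for s in samples
             if matrix[s][best_c] not in (best_state, missing_state_indicator)]
    ambiguous = [s for s in samples
                 if matrix[s][best_c] == missing_state_indicator]

    def shared(a, b):
        # Number of characters where both samples carry the same mutation.
        return sum(1 for x, y in zip(matrix[a], matrix[b])
                   if x == y and x not in skip)

    def avg_shared(s, group):
        return sum(shared(s, g) for g in group) / len(group) if group else 0.0

    # Compare against the unambiguous members only.
    base_left, base_right = list(left), list(right)
    for s in ambiguous:
        if avg_shared(s, base_left) >= avg_shared(s, base_right):
            left.append(s)
        else:
            right.append(s)
    return left, right


matrix = {
    "a": [1, 2, 0],
    "b": [1, 0, 0],
    "c": [1, 0, 0],
    "d": [0, 0, 3],
    "e": [-1, 2, 0],
    "f": [-1, 0, 3],
}
names = list(matrix)
unweighted = split_sketch(matrix, names)
reweighted = split_sketch(
    matrix, names, weights={0: {1: 0.5}, 1: {2: 1.0}, 2: {3: 3.0}}
)
print(unweighted)
print(reweighted)
```

Without weights the split falls on character 0, and the two samples with missing data there are placed by similarity: `e` joins the left group and `f` the right. With the example weights, the rarer mutation at character 2 scores highest and defines the split instead.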