cassiopeia.solver.MaxCutGreedySolver#

class cassiopeia.solver.MaxCutGreedySolver(missing_data_classifier=<function assign_missing_average>, prior_transformation='negative_log')[source]#

A CassiopeiaGreedy solver with the max cut criterion.

The MaxCutGreedySolver implements a top-down algorithm that recursively splits the sample set based on the presence/absence of the most frequent mutation. Additionally, the hill-climbing procedure from the MaxCutSolver is used to further optimize each split for the max cut on the similarity graph on the samples. This effectively moves samples across the partition so that samples with similar mutations are grouped together and samples with different mutations are seperated. Multiple missing data imputation methods are included for handling the case when a sample has a missing value on the character being split, where presence or absence of the character is ambiguous. The user can also specify a missing data method.

TODO: Implement fuzzy solver

Parameters:

prior_transformation str (default: 'negative_log')

A function defining a transformation on the priors in forming weights to scale frequencies and the contribution of each mutation in the connectivity graph. One of the following:

”negative_log”: Transforms each probability by the negative
log (default)

”inverse”: Transforms each probability p by taking 1/p “square_root_inverse”: Transforms each probability by the

the square root of 1/p

prior_transformation#: Function to transform priors, if these are available.

missing_data_classifier#: Function to classify missing data during character splits.

Methods

perform_split(character_matrix, samples, weights=None, missing_state_indicator=-1)[source]#

Performs a partition using both Greedy and MaxCut criteria.

First, uses the most frequent (character, state) pair to split the list of samples. In doing so, the procedure makes use of the missing data classifier. Then, it optimizes this partition for the max cut on a connectivity graph constructed on the samples using a hill-climbing method.

Parameters:

character_matrix DataFrame: Character matrix
samples List[int]: A list of samples to partition
weights {int: {int: float}} | NoneOptional[Dict[int, Dict[int, float]]] (default: None): Weighting of each (character, state) pair. Typically a transformation of the priors.
missing_state_indicator int (default: -1): Character representing missing data.

Return type:

Tuple[List[str], List[str]]

Returns:

A tuple of lists, representing the left and right partition groups