cassiopeia.solver.SpectralGreedySolver#

class cassiopeia.solver.SpectralGreedySolver(missing_data_classifier=<function assign_missing_average>, similarity_function=<function hamming_similarity_without_missing>, threshold=0, prior_transformation='negative_log')[source]#

A CassiopeiaGreedy solver with a spectral heuristic.

The SpectralGreedySolver implements a top-down algorithm that recursively splits the sample set based on the presence, or absence, of the most frequent mutation. Additionally, the hill-climbing procedure from the SpectralSolver is used to further optimize each split for the normalized cut on the similarity graph on the samples. This effectively moves samples across the partition so that both sides of partition have higher internal similarity in their mutations. Multiple missing data imputation methods are included for handling the case when a sample has a missing value on the character being split, where presence or absence of the character is ambiguous. The user can also specify a missing data method.

TODO: Implement FuzzySolver TODO: Experiment to find the best default similarity function

Parameters:

missing_data_classifier Callable (default: <function assign_missing_average at 0x7ff880855e50>)

Takes either a string specifying one of the included missing data imputation methods, or a function implementing the user-specified missing data method. The default is the “average” method.

similarity_function Optional[Callable[[List[int], List[int], int, Optional[Dict[int, Dict[int, float]]]], float]] (default: <function hamming_similarity_without_missing at 0x7ff8808ab8b0>)

A function that calculates a similarity score between two given samples and their observed mutations. The default is “hamming_distance_without_missing”

threshold Optional[int] (default: 0)

A minimum similarity threshold to include an edge in the similarity graph

prior_transformation str (default: 'negative_log')

A function defining a transformation on the priors in forming weights to scale frequencies and the contribution of each mutation in the similarity graph. One of the following:

”negative_log”: Transforms each probability by the negative
log (default)

”inverse”: Transforms each probability p by taking 1/p “square_root_inverse”: Transforms each probability by the

the square root of 1/p

similarity_function#: A function that calculates a similarity score between two given samples and their observed mutations

weights#: Weights on character/mutation pairs, derived from priors

threshold#: A minimum similarity threshold

prior_transformation#: Function to use when transforming priors into weights.

Methods

perform_split(character_matrix, samples, weights=None, missing_state_indicator=-1)[source]#

Performs a partition using both Greedy and Spectral criteria.

First, uses the most frequent (character, state) pair to split the list of samples. In doing so, the procedure makes use of the missing data classifier. Then, it optimizes this partition for the normalized cut on a similarity graph constructed on the samples using a hill- climbing method.

Parameters:

character_matrix DataFrame: Character matrix
samples List[int]: A list of samples to partition
weights Optional[Dict[int, Dict[int, float]]] (default: None): Weighting of each (character, state) pair. Typically a transformation of the priors.
missing_state_indicator int (default: -1): Character representing missing data.

Return type:

Tuple[List[str], List[str]]

Returns:

A tuple of lists, representing the left and right partition groups