cassiopeia.solver.SpectralGreedySolver

class cassiopeia.solver.SpectralGreedySolver(missing_data_classifier=<function assign_missing_average>, similarity_function=<function hamming_similarity_without_missing>, threshold=0, prior_transformation='negative_log')[source]

A CassiopeiaGreedy solver with a spectral heuristic.

The SpectralGreedySolver implements a top-down algorithm that recursively splits the sample set based on the presence, or absence, of the most frequent mutation. Additionally, the hill-climbing procedure from the SpectralSolver is used to further optimize each split for the normalized cut on the similarity graph on the samples. This effectively moves samples across the partition so that both sides of partition have higher internal similarity in their mutations. Multiple missing data imputation methods are included for handling the case when a sample has a missing value on the character being split, where presence or absence of the character is ambiguous. The user can also specify a missing data method.

TODO: Implement FuzzySolver TODO: Experiment to find the best default similarity function

Parameters
missing_data_classifier : CallableCallable (default: <function assign_missing_average at 0x7fa4d0bc2268>)

Takes either a string specifying one of the included missing data imputation methods, or a function implementing the user-specified missing data method. The default is the “average” method.

similarity_function : Callable[[List[int], List[int], int, Optional[Dict[int, Dict[int, float]]]], float], NoneOptional[Callable[[List[int], List[int], int, Optional[Dict[int, Dict[int, float]]]], float]] (default: <function hamming_similarity_without_missing at 0x7fa4d0baf1e0>)

A function that calculates a similarity score between two given samples and their observed mutations. The default is “hamming_distance_without_missing”

threshold : int, NoneOptional[int] (default: 0)

A minimum similarity threshold to include an edge in the similarity graph

prior_transformation : strstr (default: 'negative_log')

A function defining a transformation on the priors in forming weights to scale frequencies and the contribution of each mutation in the similarity graph. One of the following:

”negative_log”: Transforms each probability by the negative

log (default)

”inverse”: Transforms each probability p by taking 1/p “square_root_inverse”: Transforms each probability by the

the square root of 1/p

similarity_function

A function that calculates a similarity score between two given samples and their observed mutations

weights

Weights on character/mutation pairs, derived from priors

threshold

A minimum similarity threshold

prior_transformation

Function to use when transforming priors into weights.

Methods

perform_split(character_matrix, samples, weights=None, missing_state_indicator=- 1)[source]

Performs a partition using both Greedy and Spectral criteria.

First, uses the most frequent (character, state) pair to split the list of samples. In doing so, the procedure makes use of the missing data classifier. Then, it optimizes this partition for the normalized cut on a similarity graph constructed on the samples using a hill- climbing method.

Parameters
character_matrix : DataFrameDataFrame

Character matrix

samples : List[int]List[int]

A list of samples to partition

weights : {int: {int: float}}, NoneOptional[Dict[int, Dict[int, float]]] (default: None)

Weighting of each (character, state) pair. Typically a transformation of the priors.

missing_state_indicator : intint (default: -1)

Character representing missing data.

Return type

Tuple[List[str], List[str]]Tuple[List[str], List[str]]

Returns

A tuple of lists, representing the left and right partition groups