cassiopeia.solver.VanillaGreedySolver
- class cassiopeia.solver.VanillaGreedySolver(missing_data_classifier=<function assign_missing_average>, prior_transformation='negative_log')
A class for the basic Cassiopeia-Greedy solver.
The VanillaGreedySolver implements a top-down algorithm that optimizes for parsimony by recursively splitting the sample set based on the presence or absence of the most frequent mutation. Several missing data imputation methods are included to handle samples that have a missing value at the character being split, for which the presence or absence of the mutation is ambiguous. Users can also supply their own missing data method.
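The top-down procedure described above can be sketched in plain Python. This is a minimal, hypothetical illustration, not Cassiopeia's implementation: it assumes the character matrix is a dict mapping each sample name to a list of integer states, with 0 meaning unmutated and -1 meaning missing, and it sidesteps missing data imputation by placing samples that are missing the split character on the "absent" side. The helper names `most_frequent_mutation` and `greedy_tree` are made up for this example.

```python
from typing import Dict, List, Optional, Tuple, Union

MISSING = -1   # assumed missing-data value
UNMUTATED = 0  # assumed unmutated state

Matrix = Dict[str, List[int]]  # hypothetical layout: sample -> list of states
Tree = Union[str, tuple]


def most_frequent_mutation(matrix: Matrix, samples: List[str]) -> Optional[Tuple[int, int]]:
    """Most frequent (character, state) mutation that not every sample carries.

    Ties are broken by first occurrence.
    """
    counts: Dict[Tuple[int, int], int] = {}
    for s in samples:
        for char, state in enumerate(matrix[s]):
            if state not in (MISSING, UNMUTATED):
                counts[(char, state)] = counts.get((char, state), 0) + 1
    # A mutation shared by every sample would leave one side of the split empty.
    candidates = [(pair, n) for pair, n in counts.items() if n < len(samples)]
    if not candidates:
        return None
    return max(candidates, key=lambda item: item[1])[0]


def greedy_tree(matrix: Matrix, samples: List[str]) -> Tree:
    """Recursively split on presence/absence of the most frequent mutation."""
    if len(samples) == 1:
        return samples[0]
    pair = most_frequent_mutation(matrix, samples)
    if pair is None:
        return tuple(samples)  # nothing left to split on: leave a polytomy
    char, state = pair
    present = [s for s in samples if matrix[s][char] == state]
    # Simplification: samples missing this character join the "absent" side
    # instead of being imputed by a missing data classifier.
    absent = [s for s in samples if matrix[s][char] != state]
    return (greedy_tree(matrix, present), greedy_tree(matrix, absent))


matrix = {
    "a": [1, 0, 2],
    "b": [1, 0, 0],
    "c": [1, 3, 0],
    "d": [0, 3, 0],
}
print(greedy_tree(matrix, ["a", "b", "c", "d"]))
# (('a', ('c', 'b')), 'd')
```

Each recursive call splits on the mutation carried by the most samples and stops at single samples or at groups with no informative mutation left.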
- Parameters:
- missing_data_classifier (Callable, default: assign_missing_average): Takes either a string specifying one of the included missing data imputation methods, or a function implementing a user-specified missing data method. The default is the “average” method.
- prior_transformation (str, default: 'negative_log'): Names the transformation applied to the priors when forming the weights that scale mutation frequencies. One of the following:
- “negative_log”: Transforms each probability p by taking its negative log, -log(p) (default).
- “inverse”: Transforms each probability p by taking 1/p.
- “square_root_inverse”: Transforms each probability p by taking the square root of 1/p.
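The three prior transformations can be written as plain functions of a prior probability p. In the sketch below, the `{character: {state: probability}}` layout of the priors mirrors the `weights` type documented for `perform_split`; that layout and the helper `priors_to_weights` are assumptions for illustration, not part of Cassiopeia's API.

```python
import math
from typing import Dict

# Hypothetical mapping from each documented option name to a function of a
# prior probability p (assumed 0 < p <= 1).
TRANSFORMS = {
    "negative_log": lambda p: -math.log(p),
    "inverse": lambda p: 1.0 / p,
    "square_root_inverse": lambda p: math.sqrt(1.0 / p),
}


def priors_to_weights(
    priors: Dict[int, Dict[int, float]], transformation: str = "negative_log"
) -> Dict[int, Dict[int, float]]:
    """Turn {character: {state: probability}} into {character: {state: weight}}."""
    if transformation not in TRANSFORMS:
        raise ValueError(f"Unknown prior transformation: {transformation!r}")
    f = TRANSFORMS[transformation]
    return {
        character: {state: f(p) for state, p in states.items()}
        for character, states in priors.items()
    }


priors = {0: {1: 0.5, 2: 0.25}, 1: {1: 0.1}}
weights = priors_to_weights(priors, "inverse")
print(weights)  # {0: {1: 2.0, 2: 4.0}, 1: {1: 10.0}}
```

All three transformations decrease as p grows, so rarer mutations receive larger weights and count for more when frequencies are scaled.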
Attributes
- prior_transformation
Function to transform priors, if these are available.
- missing_data_classifier
Function to classify missing data during character splits.
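The default classifier is `assign_missing_average`. The sketch below illustrates one plausible reading of an "average" rule and is not the library's code: each sample with a missing value at the split character joins the group whose members it shares more mutated states with on average. The function names, the `(matrix, left, right, missing)` argument order, and the tie-breaking toward the right group are all hypothetical.

```python
from typing import Dict, List, Tuple

MISSING = -1   # assumed missing-data value (the documented default indicator)
UNMUTATED = 0  # assumed unmutated state


def shared_mutations(x: List[int], y: List[int]) -> int:
    """Number of characters at which both samples carry the same mutated state."""
    return sum(1 for a, b in zip(x, y) if a == b and a not in (MISSING, UNMUTATED))


def assign_missing_by_average(
    matrix: Dict[str, List[int]],
    left: List[str],
    right: List[str],
    missing: List[str],
) -> Tuple[List[str], List[str]]:
    """Hypothetical 'average' rule: put each ambiguous sample with the group it
    shares more mutations with on average (ties go to the right group)."""

    def average_agreement(sample: str, group: List[str]) -> float:
        if not group:
            return 0.0
        return sum(shared_mutations(matrix[sample], matrix[t]) for t in group) / len(group)

    new_left, new_right = list(left), list(right)
    for s in missing:
        # Scores use the original groups, so the order of `missing` does not matter.
        if average_agreement(s, left) > average_agreement(s, right):
            new_left.append(s)
        else:
            new_right.append(s)
    return new_left, new_right


matrix = {
    "a": [1, 2, 0],
    "b": [1, 2, 3],
    "c": [0, 4, 3],
    "m": [-1, 2, 3],  # missing at character 0, the split character
}
print(assign_missing_by_average(matrix, ["a", "b"], ["c"], ["m"]))
# (['a', 'b', 'm'], ['c'])
```

Here `m` shares 1.5 mutations on average with `a` and `b` but only 1 with `c`, so it is placed on the left.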
Methods
- perform_split(character_matrix, samples, weights=None, missing_state_indicator=-1)
Partitions based on the most frequent (character, state) pair.
Uses the (character, state) pair to split the list of samples into two partitions. Samples that have missing data at that character, for which the presence or absence of the state is ambiguous, are assigned to a partition by the missing data classifier.
- Parameters:
- character_matrix (DataFrame): Character matrix.
- samples (List[int]): A list of samples to partition.
- weights (Optional[Dict[int, Dict[int, float]]], default: None): Weighting of each (character, state) pair. Typically a transformation of the priors.
- missing_state_indicator (int, default: -1): State value representing missing data.
- Return type: Tuple[List, List]
- Returns:
A tuple of two lists representing the left and right partition groups.
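A rough, hypothetical sketch of the split documented above: count each mutated (character, state) pair, scale the counts by `weights` when given, split on the highest-scoring pair, and hand samples that are missing that character to a missing data classifier. The dict-based matrix, the `classify_missing(left, right, missing)` callback signature, the fallback of sending missing samples to the right group when no classifier is given, and treating state 0 as unmutated are assumptions, not Cassiopeia's actual behavior.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional, Tuple

Matrix = Dict[str, List[int]]          # hypothetical: sample -> list of states
Weights = Dict[int, Dict[int, float]]  # character -> state -> weight
MissingClassifier = Callable[
    [List[str], List[str], List[str]], Tuple[List[str], List[str]]
]


def perform_split_sketch(
    matrix: Matrix,
    samples: List[str],
    weights: Optional[Weights] = None,
    missing_state_indicator: int = -1,
    classify_missing: Optional[MissingClassifier] = None,
) -> Tuple[List[str], List[str]]:
    freq: Counter = Counter()       # (character, state) -> samples carrying it
    n_missing: Counter = Counter()  # character -> samples missing it
    for s in samples:
        for char, state in enumerate(matrix[s]):
            if state == missing_state_indicator:
                n_missing[char] += 1
            elif state != 0:  # assumed: 0 is the unmutated state
                freq[(char, state)] += 1

    best_pair, best_score = None, 0.0
    for (char, state), count in freq.items():
        # Skip pairs carried by every sample observed at this character;
        # splitting on them would leave the right group empty.
        if count >= len(samples) - n_missing[char]:
            continue
        weight = weights.get(char, {}).get(state, 1.0) if weights else 1.0
        score = count * weight  # weights scale the raw frequency
        if score > best_score:
            best_pair, best_score = (char, state), score
    if best_pair is None:
        return list(samples), []

    char, state = best_pair
    left, right, missing = [], [], []
    for s in samples:
        value = matrix[s][char]
        if value == state:
            left.append(s)
        elif value == missing_state_indicator:
            missing.append(s)
        else:
            right.append(s)
    if missing:
        if classify_missing is not None:
            left, right = classify_missing(left, right, missing)
        else:
            right.extend(missing)  # assumed fallback, not the library default
    return left, right


matrix = {"a": [1, 2], "b": [1, 0], "c": [1, 2], "d": [0, 0], "e": [-1, 0]}
samples = ["a", "b", "c", "d", "e"]
print(perform_split_sketch(matrix, samples))
# (['a', 'b', 'c'], ['d', 'e'])
print(perform_split_sketch(matrix, samples, weights={0: {1: 0.5}, 1: {2: 2.0}}))
# (['a', 'c'], ['b', 'd', 'e'])
```

Without weights, (0, 1) wins with a count of 3. With the example weights, (1, 2) scores 2 × 2.0 = 4.0 against 3 × 0.5 = 1.5, so the split moves to character 1.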