cassiopeia.solver.MaxCutSolver#
- class cassiopeia.solver.MaxCutSolver(sdimension=3, iterations=50, prior_transformation='negative_log')[source]#
A MaxCut graph-based CassiopeiaSolver.
The MaxCutSolver implements a top-down algorithm that recursively partitions the sample set based on connectivity. At each recursive step, a connectivity graph is generated for the sample set, where edge weights between samples represent the number of triplets that separate those samples minus the number of triplets that have those samples as an ingroup. Shared mutations are coded as strong negative connections and differing mutations are coded as positive connections. Then a partition is generated by finding a maximum weight cut over the graph. The final partition is also improved upon by a greedy hill-climbing procedure that further optimizes the cut.
- Parameters:
- prior_transformation
str
(default:'negative_log'
) A function defining a transformation on the priors in forming weights to scale frequencies and the contribution of each mutation in the connectivity graph. One of the following:
- ”negative_log”: Transforms each probability by the negative
log (default)
”inverse”: Transforms each probability p by taking 1/p “square_root_inverse”: Transforms each probability by the
the square root of 1/p
- sdimension
Optional
[int
] (default:3
) The number of dimensions to use for the embedding space. Acts as a hyperparameter
- iterations
Optional
[int
] (default:50
) The number of iterations in updating the embeddings. Acts as a hyperparameter
- prior_transformation
- prior_transformation#
Function to use when transforming priors into weights.
- sdimension#
The number of dimensions to use for the embedding space
- iterations#
The number of iterations in updating the embeddings
Methods
- perform_split(character_matrix, samples, weights=None, missing_state_indicator=-1)[source]#
Generate a partition of the samples by finding the max-cut.
First, a connectivity graph is generated with samples as nodes such that samples with shared mutations have strong negative edge weight and samples with distant mutations have positive edge weight. Then, the algorithm finds a partition by using a heuristic method to find the max-cut on the connectivity graph. The samples are randomly embedded in a d-dimensional sphere and the embeddings for each node are iteratively updated based on neighboring edge weights in the connectivity graph such that nodes with stronger connectivity cluster together. The final partition is generated by choosing random hyperplanes to bisect the d-sphere and taking the one that maximizes the cut.
- Parameters:
- character_matrix
DataFrame
Character matrix
- samples
List
[int
] A list of samples to partition
- weights
Optional
[Dict
[int
,Dict
[int
,float
]]] (default:None
) Weighting of each (character, state) pair. Typically a transformation of the priors.
- missing_state_indicator
int
(default:-1
) Character representing missing data.
- character_matrix
- Return type:
- Returns:
A tuple of lists, representing the left and right partition groups