cassiopeia.solver.ILPSolver#

class cassiopeia.solver.ILPSolver(convergence_time_limit=12600, convergence_iteration_limit=0, maximum_potential_graph_layer_size=10000, maximum_potential_graph_lca_distance=None, weighted=False, seed=None, mip_gap=0.01, prior_transformation='negative_log')[source]#

The Cassiopeia ILP-based maximum parsimony solver.

ILPSolver is a subclass of CassiopeiaSolver and implements the Cassiopeia-ILP algorithm described in Jones et al, Genome Biol 2020. The solver proceeds by constructing a tree over a network of possible evolutionary states known as the potential graph. The procedure for constructing this tree is done by solving for a Steiner Tree with an integer linear programming (ILP) optimization approach. The ILP optimization is performed using Gurobi.

Parameters:
convergence_time_limit int (default: 12600)

Amount of time allotted to the ILP for

0. convergence. Ignored if set to

convergence_iteration_limit int (default: 0)

Number of iterations allowed for ILP convergence. Ignored if set to 0.

maximum_potential_graph_layer_size int (default: 10000)

Maximum size allowed for an iteration of the potential graph inference procedure. If this is exceeded, we return the previous iteration’s graph or abort altogether.

maximum_potential_graph_lca_distance int | NoneOptional[int] (default: None)

Maximum height of LCA to add to the potential graph. If this parameter is not provided or the specified value is 0, the maximum distance between any pair of samples is used as the maximum lca height.

weighted bool (default: False)

Weight edges on the potential graph by the negative log likelihood of the mutations.

seed int | NoneOptional[int] (default: None)

Random seed to use during ILP optimization.

mip_gap float (default: 0.01)

Objective gap for mixed integer linear programming problem.

logfile

A file to log output to. This will contain information around the potential graph inference procedure as well as the Steiner Tree optimization.

prior_transformation str (default: 'negative_log')

Function to use when transforming priors into weights. Supports the following transformations:

”negative_log”: Transforms each probability by the negative

log (default)

”inverse”: Transforms each probability p by taking 1/p “square_root_inverse”: Transforms each probability by the

the square root of 1/p

Methods

solve(cassiopeia_tree, layer=None, collapse_mutationless_edges=False, logfile='stdout.log')[source]#

Infers a tree with Cassiopeia-ILP.

Solves a tree using the Cassiopeia-ILP algorithm and populates a tree in the provided CassiopeiaTree. Specifically, a potential graph of theoretical evolutionary intermediates is estimated from the input data and Gurobi is used to optimize and ILP searching for a maximum parsimony subgraph of the potential graph, resulting in a proposed maximum parsimony phylogeny.

Parameters:
cassiopeia_tree CassiopeiaTree

Input CassiopeiaTree

layer str | NoneOptional[str] (default: None)

Layer storing the character matrix for solving. If None, the default character matrix is used in the CassiopeiaTree.

collapse_mutationless_edges bool (default: False)

Indicates if the final reconstructed tree should collapse mutationless edges based on internal states inferred by Camin-Sokal parsimony. In scoring accuracy, this removes artifacts caused by arbitrarily resolving polytomies.

logfile str (default: 'stdout.log')

Location to log progress.

infer_potential_graph(character_matrix, pid, lca_height, weights=None, missing_state_indicator=-1)[source]#

Infers a potential graph for the observed states.

Using the set of samples in the character matrix for this solver, this procedure creates a network which contains potential ancestors, or evolutionary intermediates.

This procedure invokes ilp_solver_utilities.infer_potential_graph_cython which returns the edges of the potential graph in character string format (e.g., “1|2|3|…”). The procedure here decodes these strings, creates a Networkx directed graph, and adds edges to the graph. These weights are added to the edges of the graph using priors, if they are specified in the CassiopeiaTree, or the number of mutations along an edge.

Parameters:
character_matrix DataFrame

Character matrix

root

Specified root node, represented as a list of character states

pid int

Process ID for future reference

lca_height int

Maximum lca height to consider for connecting nodes to an LCA

weights {int: {int: str}} | NoneOptional[Dict[int, Dict[int, str]]] (default: None)

Weights for character-state pairs, derived from the priors if these are available.

missing_state_indicator int (default: -1)

Indicator for missing data.

Return type:

DiGraph

Returns:

A potential graph represented by a directed graph.

add_edge_weights(potential_graph, weights=None, missing_state_indicator=-1)[source]#

Annotates edges with the weight.

Given a graph where nodes are iterable entities containing character states, annotated edges with respect to the number of mutations. If a prior dictionary is passed into the constructor, the log likelihood of the mutations is added instead. These values are stored in the weight attribute of the networkx graph.

Parameters:
potential_graph DiGraph

Potential graph

weights {int: {int: str}} | NoneOptional[Dict[int, Dict[int, str]]] (default: None)

Weights to use when comparing states between characters

missing_state_indicator int (default: -1)

Variable to indicate missing state information.

Return type:

DiGraph

Returns:

The potential graph with edge weights added, stored in the weight

attribute.

generate_steiner_model(potential_graph, root, targets)[source]#

Generates a Gurobi instance for Steiner Tree inference.

Given a potential graph, a root to treat as the source, and a list of targets, create a Gurobi mixed integer linear programming instance for computing the Steiner Tree.

Parameters:
potential_graph DiGraph

Potential graph representing the evolutionary space on which to solve for the Steiner Tree.

root List[int]

A node in the graph to treat as the source.

targets List[List[int]]

A list of nodes in the tree that serve as targets for the Steiner Tree procedure.

Returns:

A Gurobipy Model instance and the edge variables involved.

solve_steiner_instance(model, edge_variables, potential_graph, pid, logfile)[source]#

Solves for a Steiner Tree from the Gurobi instance.

This function works with a model that has been specified via Gurobi, and will solve the model using the stopping criteria that the user has specified in this class instance.

Parameters:
model

A Gurobi model corresponding to the Steiner Tree problem. This should be created with generate_steiner_model.

edge_variables

Edge variables that were created during model generation. These are Gurobi variables that indicate whether two nodes are connected to one another in the Potential Graph; we use these variables to recreate a tree at the end from the Gurobi solution.

potential_graph DiGraph

Potential Graph that was used as input to the Steiner Tree problem.

pid int

Process ID

logfile str

Location to store standard out.

Return type:

List[DiGraph]

Returns:

A list of solutions

post_process_steiner_solution(solution, root)[source]#

Post-processes the returned graph from Gurobi.

This procedure post-processes the proposed Steiner Tree from Gurobi by enforcing that no self-loops occur and that every node at most one parent.

Parameters:
solution DiGraph

The Gurobi solution

root List[int]

The root node

targets

A list of targets

pid

Process id

Return type:

DiGraph

Returns:

A cleaned up networkx solution