cassiopeia.solver.ILPSolver#
- class cassiopeia.solver.ILPSolver(convergence_time_limit=12600, convergence_iteration_limit=0, maximum_potential_graph_layer_size=10000, maximum_potential_graph_lca_distance=None, weighted=False, seed=None, mip_gap=0.01, prior_transformation='negative_log')[source]#
The Cassiopeia ILP-based maximum parsimony solver.
ILPSolver is a subclass of CassiopeiaSolver and implements the Cassiopeia-ILP algorithm described in Jones et al, Genome Biol 2020. The solver proceeds by constructing a tree over a network of possible evolutionary states known as the potential graph. The procedure for constructing this tree is done by solving for a Steiner Tree with an integer linear programming (ILP) optimization approach. The ILP optimization is performed using Gurobi.
- Parameters:
- convergence_time_limit
int
(default:12600
) Amount of time allotted to the ILP for
- 0. convergence. Ignored if set to
- convergence_iteration_limit
int
(default:0
) Number of iterations allowed for ILP convergence. Ignored if set to 0.
- maximum_potential_graph_layer_size
int
(default:10000
) Maximum size allowed for an iteration of the potential graph inference procedure. If this is exceeded, we return the previous iteration’s graph or abort altogether.
- maximum_potential_graph_lca_distance
Optional
[int
] (default:None
) Maximum height of LCA to add to the potential graph. If this parameter is not provided or the specified value is 0, the maximum distance between any pair of samples is used as the maximum lca height.
- weighted
bool
(default:False
) Weight edges on the potential graph by the negative log likelihood of the mutations.
- seed
Optional
[int
] (default:None
) Random seed to use during ILP optimization.
- mip_gap
float
(default:0.01
) Objective gap for mixed integer linear programming problem.
- logfile
A file to log output to. This will contain information around the potential graph inference procedure as well as the Steiner Tree optimization.
- prior_transformation
str
(default:'negative_log'
) Function to use when transforming priors into weights. Supports the following transformations:
- ”negative_log”: Transforms each probability by the negative
log (default)
”inverse”: Transforms each probability p by taking 1/p “square_root_inverse”: Transforms each probability by the
the square root of 1/p
- convergence_time_limit
Methods
- solve(cassiopeia_tree, layer=None, collapse_mutationless_edges=False, logfile='stdout.log')[source]#
Infers a tree with Cassiopeia-ILP.
Solves a tree using the Cassiopeia-ILP algorithm and populates a tree in the provided CassiopeiaTree. Specifically, a potential graph of theoretical evolutionary intermediates is estimated from the input data and Gurobi is used to optimize and ILP searching for a maximum parsimony subgraph of the potential graph, resulting in a proposed maximum parsimony phylogeny.
- Parameters:
- cassiopeia_tree
CassiopeiaTree
Input CassiopeiaTree
- layer
Optional
[str
] (default:None
) Layer storing the character matrix for solving. If None, the default character matrix is used in the CassiopeiaTree.
- collapse_mutationless_edges
bool
(default:False
) Indicates if the final reconstructed tree should collapse mutationless edges based on internal states inferred by Camin-Sokal parsimony. In scoring accuracy, this removes artifacts caused by arbitrarily resolving polytomies.
- logfile
str
(default:'stdout.log'
) Location to log progress.
- cassiopeia_tree
- infer_potential_graph(character_matrix, pid, lca_height, weights=None, missing_state_indicator=-1)[source]#
Infers a potential graph for the observed states.
Using the set of samples in the character matrix for this solver, this procedure creates a network which contains potential ancestors, or evolutionary intermediates.
This procedure invokes ilp_solver_utilities.infer_potential_graph_cython which returns the edges of the potential graph in character string format (e.g., “1|2|3|…”). The procedure here decodes these strings, creates a Networkx directed graph, and adds edges to the graph. These weights are added to the edges of the graph using priors, if they are specified in the CassiopeiaTree, or the number of mutations along an edge.
- Parameters:
- character_matrix
DataFrame
Character matrix
- root
Specified root node, represented as a list of character states
- pid
int
Process ID for future reference
- lca_height
int
Maximum lca height to consider for connecting nodes to an LCA
- weights
Optional
[Dict
[int
,Dict
[int
,str
]]] (default:None
) Weights for character-state pairs, derived from the priors if these are available.
- missing_state_indicator
int
(default:-1
) Indicator for missing data.
- character_matrix
- Return type:
DiGraph
- Returns:
A potential graph represented by a directed graph.
- add_edge_weights(potential_graph, weights=None, missing_state_indicator=-1)[source]#
Annotates edges with the weight.
Given a graph where nodes are iterable entities containing character states, annotated edges with respect to the number of mutations. If a prior dictionary is passed into the constructor, the log likelihood of the mutations is added instead. These values are stored in the weight attribute of the networkx graph.
- Parameters:
- Return type:
DiGraph
- Returns:
- The potential graph with edge weights added, stored in the weight
attribute.
- generate_steiner_model(potential_graph, root, targets)[source]#
Generates a Gurobi instance for Steiner Tree inference.
Given a potential graph, a root to treat as the source, and a list of targets, create a Gurobi mixed integer linear programming instance for computing the Steiner Tree.
- Parameters:
- Returns:
A Gurobipy Model instance and the edge variables involved.
- solve_steiner_instance(model, edge_variables, potential_graph, pid, logfile)[source]#
Solves for a Steiner Tree from the Gurobi instance.
This function works with a model that has been specified via Gurobi, and will solve the model using the stopping criteria that the user has specified in this class instance.
- Parameters:
- model
A Gurobi model corresponding to the Steiner Tree problem. This should be created with generate_steiner_model.
- edge_variables
Edge variables that were created during model generation. These are Gurobi variables that indicate whether two nodes are connected to one another in the Potential Graph; we use these variables to recreate a tree at the end from the Gurobi solution.
- potential_graph
DiGraph
Potential Graph that was used as input to the Steiner Tree problem.
- pid
int
Process ID
- logfile
str
Location to store standard out.
- Return type:
List
[DiGraph
]- Returns:
A list of solutions