cassiopeia.sim.ecDNABirthDeathSimulator

class cassiopeia.sim.ecDNABirthDeathSimulator(birth_waiting_distribution, initial_birth_scale, death_waiting_distribution=<function ecDNABirthDeathSimulator.<lambda>>, mutation_distribution=None, fitness_distribution=None, fitness_base=2.718281828459045, num_extant=None, experiment_time=None, collapse_unifurcations=True, prune_dead_lineages=True, random_seed=None, initial_copy_number=array([1]), cosegregation_coefficient=0.0, splitting_function=<function ecDNABirthDeathSimulator.<lambda>>, fitness_array=array([0, 1]), fitness_function=None, capture_efficiency=1.0, initial_tree=None)

Simulator class for a forward birth-death process with fitness in a population with ecDNA.

“Implements a flexible phylogenetic tree simulator using a forward birth-death process. In this process, starting from an initial root lineage, births represent the branching of a new lineage and deaths represent the cessation of an existing lineage. The process is represented as a tree, with internal nodes representing division events, branch lengths representing the lifetimes of individuals, and leaves representing samples observed at the end of the experiment.

Allows any distribution on birth and death waiting times to be specified, including constant, exponential, Weibull, etc. If no death waiting time distribution is provided, the process reduces to a Yule birth process. Also simulates differing fitness on lineages within a simulated tree. Fitness in this context represents potential mutations that may be acquired on a lineage and that change the rate at which new members are born. Each lineage maintains its own birth scale parameter, altered from an initial experiment-wide birth scale parameter by accrued mutations. Different fitness regimes can be specified through user-provided distributions on how often fitness mutations occur and how strong they are.
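The way accrued mutations rescale a lineage's birth scale can be sketched as follows. This is a minimal illustration rather than the library's code: the helper name `mutated_birth_scale` is hypothetical, and dividing the scale by the multiplicative coefficient is an assumption (a scale is the reciprocal of a rate, so a fitter lineage is taken to have a smaller scale).

```python
import random


def mutated_birth_scale(birth_scale, num_mutations, fitness_distribution, fitness_base):
    """Hypothetical helper: rescale a birth scale after some fitness mutations."""
    if num_mutations < 0:
        raise ValueError("negative number of mutations sampled")
    coefficient = 1.0
    for _ in range(num_mutations):
        # Each mutation contributes a factor of fitness_base ** (sampled exponent).
        coefficient *= fitness_base ** fitness_distribution()
    # Assumption: higher fitness means shorter waiting times, so divide the scale.
    return birth_scale / coefficient


rng = random.Random(0)
new_scale = mutated_birth_scale(0.5, 2, lambda: rng.uniform(-1, 1), 2.0)
```

With no mutations the scale is returned unchanged, so lineages only diverge in birth rate once mutations are sampled.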

There are two stopping conditions for the simulation. The first is “number of extant nodes”, which runs the simulation until the first moment at which the specified number of extant nodes exists. The second is “experiment time”, which specifies the time at which lineages are sampled. At least one of these two stopping criteria must be provided. Both can be provided, in which case the simulation runs until the first of the two conditions is reached.”
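To make the two stopping conditions concrete, here is a toy pure-birth (Yule) process that stops on whichever condition is reached first. It is a sketch under simplifying assumptions (exponential waiting times, no death, no fitness, no tree bookkeeping); `simulate_yule` is a hypothetical name and not part of cassiopeia.

```python
import heapq
import random


def simulate_yule(birth_scale, num_extant=None, experiment_time=None, seed=None):
    """Toy pure-birth process; returns (stop_time, number_of_extant_lineages)."""
    if num_extant is None and experiment_time is None:
        raise ValueError("provide num_extant, experiment_time, or both")
    rng = random.Random(seed)
    now = 0.0
    # Each heap entry is the absolute time at which one extant lineage divides.
    division_times = [rng.expovariate(1.0 / birth_scale)]
    while True:
        # Stopping condition 1: enough lineages are alive at the same time.
        if num_extant is not None and len(division_times) >= num_extant:
            return now, len(division_times)
        # Stopping condition 2: the next division would happen after sampling time.
        if experiment_time is not None and division_times[0] > experiment_time:
            return experiment_time, len(division_times)
        now = heapq.heappop(division_times)
        for _ in range(2):  # the dividing lineage is replaced by two daughters
            heapq.heappush(division_times, now + rng.expovariate(1.0 / birth_scale))


stop_time, extant = simulate_yule(0.5, num_extant=8, experiment_time=10.0, seed=0)
```

When both arguments are given, the run ends either with exactly `num_extant` lineages or at `experiment_time`, matching the "first condition reached" rule described above.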

Example use snippet:

    import numpy as np
    from cassiopeia.sim import BirthDeathFitnessSimulator

    # Note that numpy parameterizes the exponential distribution by its
    # scale parameter, which is 1/rate.
    birth_waiting_distribution = lambda scale: np.random.exponential(scale)
    # The death distribution must be a callable that takes no arguments.
    death_waiting_distribution = lambda: np.random.exponential(1.5)
    initial_birth_scale = 0.5
    mutation_distribution = lambda: 1 if np.random.uniform() > 0.5 else 0
    fitness_distribution = lambda: np.random.uniform(-1, 1)
    fitness_base = 2

    bd_sim = BirthDeathFitnessSimulator(
        birth_waiting_distribution,
        initial_birth_scale,
        death_waiting_distribution=death_waiting_distribution,
        mutation_distribution=mutation_distribution,
        fitness_distribution=fitness_distribution,
        fitness_base=fitness_base,
        num_extant=8,
    )
    tree = bd_sim.simulate_tree()
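The snippet above uses an exponential birth distribution, but any callable that maps a scale to a waiting time should fit the documented `Callable[[float], float]` signature. The sketch below shows constant, exponential, and Weibull variants using Python's standard `random` module as a stand-in for numpy; the function names and the Weibull shape value are illustrative assumptions.

```python
import random

rng = random.Random(42)


def constant_waiting(scale):
    """Every lineage waits exactly `scale` time units before dividing."""
    return scale


def exponential_waiting(scale):
    # random.expovariate takes a rate, which is the reciprocal of the scale.
    return rng.expovariate(1.0 / scale)


def weibull_waiting(scale, shape=2.0):
    # random.weibullvariate(alpha, beta): alpha is the scale, beta is the shape.
    return rng.weibullvariate(scale, shape)


samples = [f(0.5) for f in (constant_waiting, exponential_waiting, weibull_waiting)]
```

Each function accepts the scale as its first positional argument, which is the only input the simulator is documented to pass.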

Parameters:
birth_waiting_distribution Callable[[float], float]

A function that samples waiting times from the birth distribution. Determines how often births occur. Must take a scale parameter as the input

initial_birth_scale float

The initial scale parameter that is used at the start of the experiment

death_waiting_distribution Optional[Callable[[], float]] (default: <function ecDNABirthDeathSimulator.<lambda>>)

A function that samples waiting times from the death distribution. Determines how often deaths occur. Default is no-death.

mutation_distribution Optional[Callable[[], int]] (default: None)

A function that samples the number of mutations that occur at a division event. If None, then no mutations are sampled

fitness_distribution Optional[Callable[[], float]] (default: None)

One of the two elements that determine the multiplicative coefficient of a fitness mutation. A function that samples the exponent to which the fitness base is raised. Determines the distribution of fitness mutation strengths. Must not be None if mutation_distribution is provided

fitness_base float (default: 2.718281828459045)

One of the two elements that determine the multiplicative strength of a fitness mutation. The base that is raised to the value given by the fitness distribution. Determines the base strength of fitness mutations. The default is e, Euler’s number

num_extant Optional[int] (default: None)

Specifies the number of extant lineages existing at the same time as a stopping condition for the experiment

experiment_time Optional[float] (default: None)

Specifies the total time that the experiment runs as a stopping condition for the experiment

collapse_unifurcations bool (default: True)

Specifies whether to collapse unifurcations in the tree resulting from pruning dead lineages

prune_dead_lineages bool (default: True)

Whether or not to prune dead (unobserved) lineages. Can be more efficient to not prune, and instead compute statistics on living lineages downstream.

random_seed Optional[int] (default: None)

A seed for reproducibility

initial_copy_number array (default: array([1]))

Initial copy number for parental lineage.

cosegregation_coefficient float (default: 0.0)

A coefficient describing how likely it is for one species to be co-inherited with one specific species (currently modeled as the first in the array). TODO: how do we make this generalizable to multiple species each with different pairwise covariances?

splitting_function Callable[[int], int] (default: <function ecDNABirthDeathSimulator.<lambda>>)

As implemented, the function that describes segregation of each species at cell division. TODO: fix this implementation to allow for non-independent segregation.

fitness_array array (default: array([0, 1]))

Fitnesses with respect to copy number of each species in a cell. This should be a matrix in R^e (where e is the number of ecDNA species being modelled).

fitness_function Optional[Callable[[int, int, float], float]] (default: None)

A function that produces a fitness value as a function of copy number and the selection coefficient encoded by the fitness array.

capture_efficiency float (default: 1.0)

Probability of observing an ecDNA species. Used as the success probability of a binomial process.
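As a rough illustration of how a capture efficiency can act as the success probability of a binomial process, the sketch below thins each true copy number independently. Treating the true copy number as the number of binomial trials is an assumption, and `observe_copy_numbers` is a hypothetical helper, not a cassiopeia function.

```python
import random


def observe_copy_numbers(true_copy_numbers, capture_efficiency, rng):
    """Hypothetical helper: detect each copy independently with a fixed probability."""
    if not 0.0 <= capture_efficiency <= 1.0:
        raise ValueError("capture_efficiency must lie in [0, 1]")
    observed = []
    for n in true_copy_numbers:
        # A Binomial(n, p) draw written as n independent Bernoulli trials.
        observed.append(sum(rng.random() < capture_efficiency for _ in range(n)))
    return observed


rng = random.Random(7)
observed = observe_copy_numbers([10, 0, 3], 0.8, rng)
```

With the default `capture_efficiency=1.0` every copy is detected, so the observed copy numbers equal the true ones.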

Raises:
  • TreeSimulatorError if invalid stopping conditions are provided or if a fitness distribution is not provided when a mutation distribution is
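The documented error conditions can be sketched as a small argument check. This is an illustration, not the library's validation code: `TreeSimulatorError` is redefined locally as a stand-in, `check_arguments` is a hypothetical name, and rejecting non-positive stopping values is an assumption about what counts as "invalid".

```python
class TreeSimulatorError(Exception):
    """Local stand-in for the library's exception of the same name."""


def check_arguments(num_extant=None, experiment_time=None,
                    mutation_distribution=None, fitness_distribution=None):
    """Hypothetical validator mirroring the documented error conditions."""
    if num_extant is None and experiment_time is None:
        raise TreeSimulatorError("specify at least one stopping condition")
    # Assumption: non-positive stopping values are treated as invalid.
    if num_extant is not None and num_extant <= 0:
        raise TreeSimulatorError("num_extant must be positive")
    if experiment_time is not None and experiment_time <= 0:
        raise TreeSimulatorError("experiment_time must be positive")
    if mutation_distribution is not None and fitness_distribution is None:
        raise TreeSimulatorError(
            "a fitness distribution is required when mutations are sampled"
        )


check_arguments(num_extant=8)  # passes silently
```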

Methods

initialize_tree(names)

Initializes a tree (an nx.DiGraph() object with one node).

Return type:

DiGraph

update_fitness(ecdna_array)

Updates a lineage birth scale, representing its (Malthusian) fitness.

Fitness is computed as a function of copy number, using the fitness_array (which defines fitness for CN=0 or CN >0 for each species, with epistasis).

Parameters:
ecdna_array array

The ecDNA copy-number array of the lineage whose birth_scale is to be updated

Return type:

float

Returns:

The updated birth_scale

Raises:

TreeSimulatorError if a negative number of mutations is sampled
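One way to read the fitness_array description is as a lookup table indexed by the presence or absence of each species (index 0 for CN=0, index 1 for CN>0), so that a table with one axis per species can encode epistasis. The sketch below shows only that lookup with nested lists; `selection_coefficient` is a hypothetical name, and how the library turns the looked-up value into a birth scale is not shown here.

```python
def selection_coefficient(ecdna_array, fitness_array):
    """Hypothetical lookup: index a nested 2 x 2 x ... table by presence/absence."""
    value = fitness_array
    for copy_number in ecdna_array:
        # Index 0 when the species is absent, index 1 when at least one copy exists.
        value = value[1 if copy_number > 0 else 0]
    return value


# One species, using the documented default table [0, 1].
single = selection_coefficient([3], [0, 1])

# Two species with an illustrative epistatic table: rows index species 1,
# columns index species 2.
table = [[0.0, 0.1],
         [0.2, 0.5]]
both_present = selection_coefficient([2, 5], table)
```

In the two-species table, carrying both species (0.5) is worth more than the sum of carrying each alone (0.1 + 0.2), which is the kind of interaction a per-combination table can express.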

sample_lineage_event(lineage, current_lineages, tree, names, observed_nodes)

A helper function that samples an event for a lineage. Takes a lineage and determines the next event in that lineage’s future by simulating the lifespan of a new descendant. Birth and death waiting times are sampled, representing how long the descendant lives.

If a death event occurs first, the lineage with the new descendant is added to the queue of currently alive lineages, but its status is marked as inactive and it will be removed at the time the lineage dies. If a birth event occurs first, the lineage with the new descendant is added to the queue with its status marked as active, and further events will be sampled at the time the lineage divides. Additionally, its fitness will be updated by altering its birth rate.

The descendant node is added to the tree object, with the edge weight between the current node and the descendant representing the lifespan of the descendant. If the descendant would live past the end of the experiment (both birth and death times extend past the end of the experiment), the lifespan is cut off at the experiment time and a final observed sample is added to the tree. In this case the lineage is marked as inactive as well.

Parameters:
lineage Dict[str, Union[int, float]]

The current extant lineage to extend. Contains the ID of the internal node to attach the descendant to, the current birth scale parameter of the lineage, the current total lived time of the lineage, and the status of whether the lineage is still dividing

current_lineages PriorityQueue

The queue containing currently alive lineages

tree DiGraph

The tree object being constructed by the simulator representing the birth death process

names Generator

A generator providing unique names for tree nodes

observed_nodes List[str]

A list of nodes that are observed at the end of the experiment

Raises:
  • TreeSimulatorError if a negative waiting time is sampled or a non-active lineage is passed in

Return type:

None
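The decision made for each descendant can be sketched as a comparison of the two sampled waiting times against the time remaining in the experiment. This is a simplified illustration that leaves out the queue, the tree, and fitness updates; `next_event` is a hypothetical name, and sending ties between birth and death to the death branch is an assumption.

```python
def next_event(total_time, birth_scale, birth_waiting_distribution,
               death_waiting_distribution, experiment_time=None):
    """Hypothetical helper: return (event, lifespan) for a new descendant."""
    birth_wait = birth_waiting_distribution(birth_scale)
    death_wait = death_waiting_distribution()
    if birth_wait < 0 or death_wait < 0:
        raise ValueError("negative waiting time sampled")
    lifespan = min(birth_wait, death_wait)
    # Both events fall past the end of the experiment: truncate and observe.
    if experiment_time is not None and total_time + lifespan >= experiment_time:
        return "observed", experiment_time - total_time
    if birth_wait < death_wait:
        return "birth", birth_wait  # lineage stays active and divides later
    return "death", death_wait  # lineage becomes inactive


# A death distribution that always returns infinity models the no-death case.
event, lifespan = next_event(0.0, 1.0, lambda scale: scale, lambda: float("inf"))
```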

get_ecdna_array(parent_id, tree)

Generates an ecDNA array for a child given its parent and sisters.

Parameters:
parent_id str

ID of parent in the generated tree.

tree DiGraph

The in-progress tree.

Return type:

array

Returns:

Numpy array corresponding to the ecDNA copy numbers for the child.
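A possible reading of "given its parent and sisters" is that the first daughter draws its copies at random and a later sister receives whatever remains. The sketch below makes that concrete under stated assumptions: each ecDNA copy is replicated once before division, copies segregate independently with probability 1/2, and cosegregation is ignored. All names are hypothetical, and the splitting helper here takes an extra `rng` argument that the documented `Callable[[int], int]` does not.

```python
import random


def split_copies(total_copies, rng):
    """Hypothetical splitting function: each copy goes to this daughter with p = 1/2."""
    return sum(rng.random() < 0.5 for _ in range(total_copies))


def daughter_copy_numbers(parent_array, sister_array, rng, splitting_function=split_copies):
    """Hypothetical helper: copy numbers for one daughter of a dividing cell."""
    # Assumption: every copy is replicated once before the cell divides.
    replicated = [2 * n for n in parent_array]
    if sister_array is None:
        # First daughter: sample its share of each species.
        return [splitting_function(total, rng) for total in replicated]
    # Second daughter: receives the copies its sister did not take.
    return [total - taken for total, taken in zip(replicated, sister_array)]


rng = random.Random(0)
first = daughter_copy_numbers([3, 1], None, rng)
second = daughter_copy_numbers([3, 1], first, rng)
```

By construction the two daughters together carry exactly twice the parent's copy number for every species, so copies are neither created nor lost at division.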

populate_tree_from_simulation(tree, observed_nodes)

Populates tree with appropriate metadata.

Parameters:
tree DiGraph

The tree simulated with ecDNA and fitness values populated as attributes.

observed_nodes List[str]

The observed leaves of the tree.

Return type:

CassiopeiaTree

Returns:

A CassiopeiaTree with relevant node attributes filled in.