cassiopeia.pp.compute_empirical_indel_priors

cassiopeia.pp.compute_empirical_indel_priors(allele_table, grouping_variables=['intBC'], cut_sites=None)

Computes indel prior probabilities.

Generates indel prior probabilities from the input allele table. The general idea behind this procedure is to count the number of times an indel occurs independently. By default, each intBC is treated as an independent unit, which holds if the input allele table describes a clonal population. In this case, the procedure counts the number of intBCs that contain a particular indel and divides by the total number of intBCs in the allele table. However, a user can be more nuanced in their analysis and group intBCs by other variables, such as lineage group (this is especially useful if the same intBC might occur in several clonal populations). The procedure then counts the number of unique lineage-group/intBC combinations in which an indel occurs and normalizes by the number of such combinations.
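The counting idea above can be sketched in pandas as follows. This is a minimal illustration under stated assumptions, not Cassiopeia's implementation: the function name `sketch_indel_priors`, the output columns `count` and `freq`, and the convention that uncut sites are encoded as strings containing "None" are all assumptions made for this example. Each distinct indel is counted at most once per independent group, and its prior is that count divided by the total number of groups.

```python
import re
from collections import defaultdict
from typing import List, Optional, Sequence

import pandas as pd


def sketch_indel_priors(
    allele_table: pd.DataFrame,
    grouping_variables: Sequence[str] = ("intBC",),
    cut_sites: Optional[List[str]] = None,
) -> pd.DataFrame:
    """Hypothetical re-creation of the counting idea; not Cassiopeia's code."""
    if cut_sites is None:
        # Assumed default: cut-site columns are named "r1", "r2", ...
        cut_sites = [c for c in allele_table.columns if re.fullmatch(r"r\d+", str(c))]

    groups = allele_table.groupby(list(grouping_variables))
    counts = defaultdict(int)
    for _, group in groups:
        # Collect every distinct state seen at any cut site in this group,
        # so each indel contributes at most once per independent group.
        observed = set(group[cut_sites].to_numpy().ravel())
        for indel in observed:
            # Assumption: uncut sites are strings containing "None"; skip them.
            if isinstance(indel, str) and "none" not in indel.lower():
                counts[indel] += 1

    priors = pd.DataFrame({"count": pd.Series(counts, dtype="int64")})
    # Prior = number of groups containing the indel / total number of groups.
    priors["freq"] = priors["count"] / groups.ngroups
    priors.index.name = "indel"
    return priors


# Toy allele table: intBC "A" is observed in two different lineage groups.
table = pd.DataFrame(
    {
        "intBC": ["A", "A", "B", "C"],
        "lineageGrp": [1, 2, 1, 2],
        "r1": ["1D", "1D", "1D", "None"],
        "r2": ["None", "2I", "None", "2I"],
    }
)
by_intbc = sketch_indel_priors(table)
by_lineage = sketch_indel_priors(table, grouping_variables=["intBC", "lineageGrp"])
```

Grouping by intBC alone, "1D" appears in 2 of 3 intBCs (prior 2/3). Adding lineageGrp splits intBC A into two independent groups, so "1D" now appears in 3 of 4 groups (prior 0.75).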

Parameters:
allele_table DataFrame

AlleleTable from which to compute indel priors.

grouping_variables List[str] (default: ['intBC'])

Variables by which to stratify the data, treating each resulting group as independent when counting indel occurrences. These must be columns in the allele table.

cut_sites Optional[List[str]] (default: None)

Columns in the AlleleTable to treat as cut sites. If None, the cut sites are assumed to be the columns named in the form “r{int}” (e.g. “r1”).
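The “r{int}” naming convention can be illustrated with a small regular-expression filter. This is a sketch under an assumption: `default_cut_site_columns` is a hypothetical helper, and whether Cassiopeia matches column names exactly this way is not stated here. Using a full match keeps columns such as “readCount” from being mistaken for cut sites.

```python
import re
from typing import List

import pandas as pd


def default_cut_site_columns(allele_table: pd.DataFrame) -> List[str]:
    """Hypothetical helper: return columns named like "r1", "r2", ..."""
    # fullmatch requires the whole name to be "r" followed only by digits,
    # so names like "readCount" or "r1_quality" are excluded.
    return [c for c in allele_table.columns if re.fullmatch(r"r\d+", str(c))]


table = pd.DataFrame(columns=["cellBC", "intBC", "r1", "r2", "r3", "readCount", "UMI"])
cut_site_columns = default_cut_site_columns(table)
print(cut_site_columns)  # ['r1', 'r2', 'r3']
```

If an allele table uses a different naming scheme, passing the column names explicitly through `cut_sites` avoids relying on this convention.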

Return type:

DataFrame

Returns:

A DataFrame mapping indel identities to their prior probabilities.