cassiopeia.pp.convert_alleletable_to_character_matrix

cassiopeia.pp.convert_alleletable_to_character_matrix(alleletable, ignore_intbcs=[], allele_rep_thresh=1.0, missing_data_state=-1, mutation_priors=None, cut_sites=None, collapse_duplicates=True)

Converts an AlleleTable into a character matrix.

Given an AlleleTable storing the observed mutations for each intBC / cellBC combination, create a character matrix for input into a CassiopeiaSolver object. By default, uncut sites are encoded as ‘0’ and missing data as ‘-1’. The function can also ignore specified intBCs and drop cut sites with too little diversity.
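As a rough illustration of the encoding described above, the following sketch maps the allele strings observed at one cut site to integer states using plain Python. It is not Cassiopeia's implementation: the helper `encode_states`, the convention that an uncut allele contains the text "None", the toy allele strings, and the use of Python `None` for a missing observation are all assumptions made for this example.

```python
from typing import Dict, List, Optional, Tuple


def encode_states(
    alleles: List[Optional[str]], missing_data_state: int = -1
) -> Tuple[List[int], Dict[int, str]]:
    """Toy encoder for a single character (hypothetical helper).

    Uncut alleles become 0, missing observations become
    `missing_data_state`, and each distinct edited allele receives the
    next unused positive integer in order of first appearance.
    """
    allele_to_state: Dict[str, int] = {}
    encoded: List[int] = []
    for allele in alleles:
        if allele is None:
            # No observation for this cell at this site.
            encoded.append(missing_data_state)
        elif "None" in allele:
            # Assumed marker for an uncut site.
            encoded.append(0)
        else:
            if allele not in allele_to_state:
                allele_to_state[allele] = len(allele_to_state) + 1
            encoded.append(allele_to_state[allele])
    # Invert the lookup so states can be mapped back to the original allele.
    state_to_allele = {s: a for a, s in allele_to_state.items()}
    return encoded, state_to_allele


# Made-up observations for five cells at one cut site.
observed = [
    "ACGT[None]TTGA",
    "ACGT[3:2D]TTGA",
    None,
    "ACGT[5:1I]TTGA",
    "ACGT[3:2D]TTGA",
]
states, state_to_indel = encode_states(observed)
print(states)          # [0, 1, -1, 2, 1]
print(state_to_indel)  # {1: 'ACGT[3:2D]TTGA', 2: 'ACGT[5:1I]TTGA'}
```

The inverted lookup plays the same role as the third return value documented for this function: it lets an integer state be traced back to the mutation it stands for.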

Parameters
alleletable : DataFrame

Allele Table to be converted into a character matrix

ignore_intbcs : List[str] (default: [])

A set of intBCs to ignore

allele_rep_thresh : float (default: 1.0)

A threshold for removing low-diversity target sites: sites in which a single allele is represented at this proportion of observations are removed.
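One way to picture this filter is the sketch below, which computes the share of the most common allele at each site and flags sites above the threshold. The helper `low_diversity_sites`, the site names, and the strict greater-than comparison are assumptions for illustration; the library's exact comparison and bookkeeping may differ.

```python
from collections import Counter
from typing import Dict, List


def low_diversity_sites(
    alleles_by_site: Dict[str, List[str]], allele_rep_thresh: float = 1.0
) -> List[str]:
    """Return sites whose most common allele exceeds the threshold (toy helper)."""
    dropped = []
    for site, alleles in alleles_by_site.items():
        if not alleles:
            continue  # nothing observed, nothing to measure
        top_count = Counter(alleles).most_common(1)[0][1]
        # Assumed strict comparison: a proportion can never exceed 1.0,
        # so the default threshold drops no site.
        if top_count / len(alleles) > allele_rep_thresh:
            dropped.append(site)
    return dropped


# Made-up observations for two sites.
observations = {
    "intBC_A_r1": ["a", "a", "a", "b"],  # most common allele at 0.75
    "intBC_A_r2": ["a", "b", "c", "d"],  # most common allele at 0.25
}
dropped_default = low_diversity_sites(observations)
dropped_strict = low_diversity_sites(observations, allele_rep_thresh=0.5)
print(dropped_default)  # []
print(dropped_strict)   # ['intBC_A_r1']
```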

missing_data_state : int (default: -1)

A state to use for missing data.

mutation_priors : Optional[DataFrame] (default: None)

A table storing the prior probability of a mutation occurring. This table is used to create a character matrix-specific probability dictionary for reconstruction.

cut_sites : Optional[List[str]] (default: None)

Columns in the AlleleTable to treat as cut sites. If None, the cut sites are assumed to be the columns named in the form “r{int}” (e.g. “r1”).
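The default column convention can be sketched with a regular expression. The helper `find_cut_site_columns` and the column names below are hypothetical, and reading “r{int}” as the letter r followed by one or more digits is an assumption, not the library's own detection code.

```python
import re
from typing import List

# Assumed reading of "r{int}": the letter r followed by one or more digits.
CUT_SITE_PATTERN = re.compile(r"r\d+")


def find_cut_site_columns(columns: List[str]) -> List[str]:
    """Return the column names that fully match the assumed pattern."""
    return [c for c in columns if CUT_SITE_PATTERN.fullmatch(c)]


# Made-up column names for an allele table.
columns = ["cellBC", "intBC", "allele", "r1", "r2", "r3", "readCount", "r_extra"]
detected = find_cut_site_columns(columns)
print(detected)  # ['r1', 'r2', 'r3']
```

Using `fullmatch` rather than `match` keeps columns such as `readCount` from being picked up by accident.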

collapse_duplicates : bool (default: True)

Whether or not to collapse duplicate character states present for a single cellBC-intBC pair. This option has no effect if there are no allele conflicts. Defaults to True.

Returns

A character matrix, a probability dictionary, and a dictionary mapping states to the original mutation.

Return type

Tuple[DataFrame, Dict[int, Dict[int, float]], Dict[int, Dict[int, str]]]
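A call following the signature and return type above might look like the sketch below. The toy records, their allele strings, and the wrapper `build_character_matrix` are assumptions made for illustration; only the function name, its keyword arguments, and the three-element return tuple come from this page. The imports are deferred into the wrapper so the block can be read and loaded without Cassiopeia installed.

```python
from typing import Any, Dict, List

# Made-up allele table rows: one row per cellBC / intBC combination,
# with one column per cut site (assumed layout).
records: List[Dict[str, Any]] = [
    {"cellBC": "cell_1", "intBC": "intBC_A", "r1": "ACGT[None]TT", "r2": "GG[3:2D]CA"},
    {"cellBC": "cell_2", "intBC": "intBC_A", "r1": "ACGT[5:1I]TT", "r2": "GG[None]CA"},
    {"cellBC": "cell_3", "intBC": "intBC_A", "r1": "ACGT[5:1I]TT", "r2": "GG[3:2D]CA"},
]


def build_character_matrix(rows: List[Dict[str, Any]]):
    """Hypothetical wrapper around the documented function."""
    # Deferred imports: both packages must be installed to call this.
    import pandas as pd
    import cassiopeia as cas

    alleletable = pd.DataFrame(rows)
    # Unpack the documented return tuple.
    character_matrix, priors, state_to_indel = (
        cas.pp.convert_alleletable_to_character_matrix(
            alleletable,
            ignore_intbcs=[],
            allele_rep_thresh=1.0,
            missing_data_state=-1,
            mutation_priors=None,
            cut_sites=["r1", "r2"],  # named explicitly instead of relying on detection
            collapse_duplicates=True,
        )
    )
    return character_matrix, priors, state_to_indel
```

Passing `cut_sites` explicitly avoids depending on the default column-name detection when an allele table uses other column names.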