cassiopeia.pp.convert_alleletable_to_character_matrix
- cassiopeia.pp.convert_alleletable_to_character_matrix(alleletable, ignore_intbcs=[], allele_rep_thresh=1.0, missing_data_allele=None, missing_data_state=-1, mutation_priors=None, cut_sites=None, collapse_duplicates=True)
Converts an AlleleTable into a character matrix.
Given an AlleleTable storing the observed mutations for each intBC / cellBC combination, create a character matrix for input into a CassiopeiaSolver object. By default, uncut sites are encoded as ‘0’ and missing data as ‘-1’. The function can also ignore specified intBCs and remove cut sites with too little allelic diversity.
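The encoding described above can be illustrated with a small, dependency-free sketch. It is not Cassiopeia's implementation: the function name `toy_character_matrix`, the row layout `(cellBC, intBC, {cut_site: allele})`, and the use of the string "NONE" for an uncut site are assumptions made for illustration. Each intBC / cut-site pair becomes one character, uncut sites map to 0, missing entries map to -1, and each distinct indel at a character receives its own positive integer state.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical row layout: (cellBC, intBC, {cut_site_name: allele})
Row = Tuple[str, str, Dict[str, str]]


def toy_character_matrix(
    rows: List[Row],
    ignore_intbcs: Optional[List[str]] = None,
    missing_data_allele: Optional[str] = None,
    missing_data_state: int = -1,
    uncut_allele: str = "NONE",  # assumption: how an uncut site is written
) -> Tuple[Dict[str, List[int]], List[Tuple[str, str]], Dict[int, Dict[int, str]]]:
    """Return (matrix keyed by cellBC, column labels, per-character state -> allele map)."""
    ignore = set(ignore_intbcs or [])
    kept = [r for r in rows if r[1] not in ignore]

    # One character per (intBC, cut site) pair, in a stable sorted order.
    columns = sorted({(intbc, site) for _, intbc, sites in kept for site in sites})
    col_index = {col: i for i, col in enumerate(columns)}

    # Cells start fully missing; (intBC, site) pairs never observed stay missing.
    cells = sorted({cell for cell, _, _ in kept})
    matrix = {cell: [missing_data_state] * len(columns) for cell in cells}

    # Per character: allele string -> integer state (1, 2, ... in order first seen).
    allele_to_state: List[Dict[str, int]] = [{} for _ in columns]

    for cell, intbc, sites in kept:
        for site, allele in sites.items():
            j = col_index[(intbc, site)]
            if missing_data_allele is not None and allele == missing_data_allele:
                state = missing_data_state
            elif allele == uncut_allele:
                state = 0
            else:
                mapping = allele_to_state[j]
                state = mapping.setdefault(allele, len(mapping) + 1)
            matrix[cell][j] = state

    # Invert the per-character mappings: character index -> {state: allele}.
    state_to_allele = {
        j: {s: a for a, s in mapping.items()} for j, mapping in enumerate(allele_to_state)
    }
    return matrix, columns, state_to_allele


rows = [
    ("cellA", "intBC1", {"r1": "NONE", "r2": "5D"}),
    ("cellB", "intBC1", {"r1": "2I", "r2": "5D"}),
    ("cellB", "intBC2", {"r1": "NONE", "r2": "NONE"}),
]
matrix, columns, states = toy_character_matrix(rows)
# matrix == {"cellA": [0, 1, -1, -1], "cellB": [1, 1, 0, 0]}
```

Unlike the real function's `collapse_duplicates` handling, conflicting observations for the same cell and character simply overwrite each other in this sketch.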
- Parameters:
- alleletable
DataFrame The AlleleTable to be converted into a character matrix.
- ignore_intbcs
List[str] (default: []) A list of intBCs to ignore.
- allele_rep_thresh
float (default: 1.0) A threshold for removing low-diversity target sites: a site is removed when a single allele is represented in this proportion of its observations.
- missing_data_allele
Optional[str] (default: None) Value in the allele table that indicates that the cut site is missing. This will be converted into missing_data_state.
- missing_data_state
int (default: -1) A state to use for missing data.
- mutation_priors
Optional[DataFrame] (default: None) A table storing the prior probability of a mutation occurring. This table is used to create a character-matrix-specific probability dictionary for reconstruction.
- cut_sites
Optional[List[str]] (default: None) Columns in the AlleleTable to treat as cut sites. If None, we assume that the cut sites are denoted by columns of the form “r{int}” (e.g. “r1”).
- collapse_duplicates
bool (default: True) Whether to collapse duplicate character states present for a single cellBC-intBC pair. This option has no effect if there are no allele conflicts.
- Return type:
Tuple[DataFrame, Dict[int, Dict[int, float]], Dict[int, Dict[int, str]]]
- Returns:
A character matrix, a probability dictionary, and a dictionary mapping states to the original mutation.
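The two auxiliary steps behind `allele_rep_thresh` and `mutation_priors` can be sketched in the same hypothetical, dependency-free style. The helper names `drop_low_diversity` and `priors_to_probability_dict`, the plain-dict inputs, and the exact comparison (here a character is dropped when its most common allele makes up more than `allele_rep_thresh` of its observations, so the default of 1.0 drops nothing) are assumptions for illustration, not Cassiopeia's code. The second helper shows how per-allele priors could be re-keyed into the `Dict[int, Dict[int, float]]` shape of the returned probability dictionary, using a state-to-allele mapping shaped like the third return value.

```python
from typing import Dict, List


def drop_low_diversity(
    observations: Dict[str, List[str]], allele_rep_thresh: float = 1.0
) -> Dict[str, List[str]]:
    """Keep characters whose most common allele does not exceed the threshold.

    `observations` maps a character name (hypothetically intBC + cut site)
    to the alleles observed for it across cells.
    """
    kept = {}
    for character, alleles in observations.items():
        if not alleles:
            continue  # assumption: a character with no observations is dropped
        top = max(alleles.count(a) for a in set(alleles))
        if top / len(alleles) <= allele_rep_thresh:
            kept[character] = alleles
    return kept


def priors_to_probability_dict(
    state_to_allele: Dict[int, Dict[int, str]], priors: Dict[str, float]
) -> Dict[int, Dict[int, float]]:
    """Re-key per-allele priors by (character index, integer state).

    Raises KeyError if an observed allele has no prior in `priors`.
    """
    return {
        char: {state: priors[allele] for state, allele in states.items()}
        for char, states in state_to_allele.items()
    }


obs = {"intBC1r1": ["NONE", "NONE", "NONE", "2I"], "intBC1r2": ["5D"] * 4}
filtered = drop_low_diversity(obs, allele_rep_thresh=0.8)
# "intBC1r2" is dropped (top allele covers 1.0 > 0.8); "intBC1r1" is kept (0.75)
```

Keying the probability dictionary by integer state rather than by allele string lets a solver look up priors directly from character matrix entries.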