cassiopeia.pp.convert_alleletable_to_character_matrix
- cassiopeia.pp.convert_alleletable_to_character_matrix(alleletable, ignore_intbcs=[], allele_rep_thresh=1.0, missing_data_allele=None, missing_data_state=-1, mutation_priors=None, cut_sites=None, collapse_duplicates=True)
Converts an AlleleTable into a character matrix.
Given an AlleleTable storing the observed mutations for each intBC / cellBC combination, create a character matrix for input into a CassiopeiaSolver object. By default, uncut sites are encoded as 0 and missing data as -1. The function can also ignore specified intBCs and drop cut sites with too little allele diversity.
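To make the encoding concrete, here is a minimal, self-contained sketch in plain Python. It is not Cassiopeia's implementation: the helper name toy_character_matrix, the sample rows, and the rule that an allele containing "None" counts as uncut are assumptions made for illustration only.

```python
from collections import defaultdict


def toy_character_matrix(rows, cut_sites, missing_data_state=-1):
    """Toy encoder: one character per (intBC, cut site) pair.

    Assumptions (not taken from the Cassiopeia source): alleles containing
    "None" are uncut and map to 0; every other distinct allele gets the
    next positive integer, numbered independently per character.
    """
    cells = sorted({row["cellBC"] for row in rows})
    characters = sorted(
        {(row["intBC"], site) for row in rows for site in cut_sites}
    )
    state_ids = defaultdict(dict)  # character -> {allele: state}
    observed = {}  # (cell, character) -> state

    for row in rows:
        for site in cut_sites:
            character = (row["intBC"], site)
            allele = row[site]
            if "None" in allele:
                state = 0  # uncut
            else:
                ids = state_ids[character]
                # Reuse the state if this allele was seen before for
                # this character, otherwise assign the next integer.
                state = ids.setdefault(allele, len(ids) + 1)
            observed[(row["cellBC"], character)] = state

    # Any (cell, character) pair that was never observed is missing data.
    matrix = {
        cell: [observed.get((cell, ch), missing_data_state) for ch in characters]
        for cell in cells
    }
    return characters, matrix


# Hypothetical allele-table rows; the allele strings are made up.
rows = [
    {"cellBC": "cellA", "intBC": "AAC", "r1": "ACGT[None]TT", "r2": "GG[3:2D]CA"},
    {"cellBC": "cellB", "intBC": "AAC", "r1": "ACGT[5:1I]TT", "r2": "GG[3:2D]CA"},
    {"cellBC": "cellB", "intBC": "GGT", "r1": "CC[None]AG", "r2": "TT[None]AC"},
]
characters, matrix = toy_character_matrix(rows, ["r1", "r2"])
```

In this toy data cellA has no row for intBC GGT, so both of that intBC's characters are filled with the missing-data state for cellA.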
- Parameters:
- alleletable (DataFrame): Allele Table to be converted into a character matrix
- ignore_intbcs (List[str], default: []): A set of intBCs to ignore
- allele_rep_thresh (float, default: 1.0): A threshold for removing target sites that have an allele represented by this proportion
- missing_data_allele (Optional[str], default: None): Value in the allele table that indicates that the cut site is missing. This will be converted into missing_data_state
- missing_data_state (int, default: -1): A state to use for missing data
- mutation_priors (Optional[DataFrame], default: None): A table storing the prior probability of a mutation occurring. This table is used to create a character matrix-specific probability dictionary for reconstruction
- cut_sites (Optional[List[str]], default: None): Columns in the AlleleTable to treat as cut sites. If None, we assume that the cut sites are denoted by columns of the form “r{int}” (e.g. “r1”)
- collapse_duplicates (bool, default: True): Whether or not to collapse duplicate character states present for a single cellBC-intBC pair. This option has no effect if there are no allele conflicts
- Return type:
Tuple[DataFrame, Dict[int, Dict[int, float]], Dict[int, Dict[int, str]]]
- Returns:
A character matrix, a probability dictionary, and a dictionary mapping states to the original mutation.
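
The following sketch suggests how the two returned dictionaries might be used to read a matrix entry back as a mutation. It uses hand-written stand-ins rather than real output. The nested-dict shapes follow the return type above, but reading the outer key as a character (column) index and the inner key as a state is an assumption, as are the helper name describe_state and every value shown.

```python
# Hand-written stand-ins shaped like the second and third return values.
# Assumed reading: outer key = character (column) index, inner key = state.
priors = {0: {1: 0.25, 2: 0.05}, 1: {1: 0.10}}
state_to_indel = {0: {1: "[5:1I]", 2: "[7:3D]"}, 1: {1: "[3:2D]"}}


def describe_state(character, state, missing_data_state=-1):
    """Translate one matrix entry back into a readable description."""
    if state == missing_data_state:
        return "missing"
    if state == 0:
        return "uncut"
    indel = state_to_indel[character][state]
    # A prior may be absent, e.g. when no mutation_priors table was given.
    prior = priors.get(character, {}).get(state)
    return f"{indel} (prior={prior})"


# One row of a character matrix: character 0 is in state 2,
# character 1 is missing.
row = [2, -1]
descriptions = [describe_state(i, s) for i, s in enumerate(row)]
```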