cassiopeia.pp.error_correct_umis
- cassiopeia.pp.error_correct_umis(input_df, max_umi_distance=2, allow_allele_conflicts=False, n_threads=1)
Within cellBC-intBC pairs, collapses UMIs that have close sequences.
Error-corrects UMIs within each cellBC-intBC pair. UMIs whose sequences are less than a threshold Hamming distance apart are corrected towards whichever UMI is more abundant. The allow_allele_conflicts option may be used to also group on the actual allele.
- Parameters
- input_df :
DataFrame. Input DataFrame of alignments.
- max_umi_distance :
int (default: 2). Maximum Hamming distance between two UMIs for one to be corrected to the other.
- allow_allele_conflicts :
bool (default: False). Whether to include the allele when grouping UMIs. When True, UMIs are grouped by cellBC-intBC-allele triplets. When False, UMIs are grouped by cellBC-intBC pairs. Use this option when a single cellBC-intBC pair can have more than one allele state, as with spatial data.
- n_threads :
int (default: 1). Number of threads to use.
- Return type
DataFrame
- Returns
A DataFrame of error corrected UMIs.
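- Example
The collapsing idea described above can be illustrated with a minimal, standalone sketch. This is not Cassiopeia's implementation: the helper names `hamming` and `collapse_umis` are hypothetical, the input is a plain list of (cellBC, intBC, UMI) tuples rather than an alignment DataFrame, and the threshold is treated as inclusive, which is an assumption based on the "maximum distance" wording of max_umi_distance.

```python
from collections import Counter, defaultdict


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def collapse_umis(reads, max_umi_distance=2):
    """Hypothetical helper: collapse close UMIs within each cellBC-intBC pair.

    reads: iterable of (cellBC, intBC, UMI) tuples, one tuple per read.
    Returns a dict mapping (cellBC, intBC) to a Counter of
    corrected UMI -> total read count.
    """
    # Count reads per UMI inside each cellBC-intBC group.
    groups = defaultdict(Counter)
    for cell_bc, int_bc, umi in reads:
        groups[(cell_bc, int_bc)][umi] += 1

    corrected = {}
    for key, counts in groups.items():
        # Visit UMIs from most to least abundant (ties broken alphabetically),
        # so rarer UMIs are merged into more abundant ones.
        ordered = sorted(counts, key=lambda u: (-counts[u], u))
        kept = Counter()
        for umi in ordered:
            # Assumption: a distance equal to the threshold still merges.
            target = next(
                (k for k in kept if hamming(umi, k) <= max_umi_distance),
                None,
            )
            if target is None:
                kept[umi] = counts[umi]
            else:
                kept[target] += counts[umi]
        corrected[key] = kept
    return corrected


# Toy input: in cell "c1", UMI "AAAT" is one mismatch away from the
# more abundant "AAAA"; "GGGG" is too far from both to be merged.
example_reads = (
    [("c1", "i1", "AAAA")] * 5
    + [("c1", "i1", "AAAT")]
    + [("c1", "i1", "GGGG")] * 2
    + [("c2", "i1", "AAAT")]
)
example_result = collapse_umis(example_reads, max_umi_distance=1)
```

Because grouping happens first, the "AAAT" read in cell "c2" is left alone even though the same sequence is corrected in cell "c1". Adding the allele to the grouping key would mimic, under the same assumptions, what allow_allele_conflicts=True is described as doing.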