cassiopeia.pp.error_correct_umis
- cassiopeia.pp.error_correct_umis(input_df, max_umi_distance=2, allow_allele_conflicts=False, n_threads=1)
Within cellBC-intBC pairs, collapses UMIs that have close sequences.
Error-corrects UMIs within each cellBC-intBC pair. A UMI whose sequence lies within the max_umi_distance Hamming-distance threshold of another UMI is corrected towards whichever of the two is more abundant. The allow_allele_conflicts option additionally groups on the allele, so that correction happens within cellBC-intBC-allele triplets.
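The correction idea described above can be sketched in plain Python. This is an illustration only, not the library's implementation: the helper names (hamming_distance, correct_umis, collapse_counts) and the toy counts are hypothetical, and the sketch assumes that the threshold is inclusive and that each rarer UMI is folded into the most abundant UMI within range.

```python
from collections import Counter
from typing import Dict, List


def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def correct_umis(umi_counts: Dict[str, int], max_umi_distance: int = 2) -> Dict[str, str]:
    """Map every UMI of one group to the UMI it is corrected to.

    Assumption (hypothetical helper): UMIs are visited from most to least
    abundant, and each one is merged into the most abundant already-kept
    UMI whose Hamming distance is <= max_umi_distance.
    """
    # Most abundant first; ties broken by sequence so the result is deterministic.
    ordered = sorted(umi_counts, key=lambda u: (-umi_counts[u], u))
    kept: List[str] = []
    mapping: Dict[str, str] = {}
    for umi in ordered:
        target = next(
            (k for k in kept if hamming_distance(umi, k) <= max_umi_distance),
            None,
        )
        if target is None:
            kept.append(umi)      # nothing close enough: keep this UMI as-is
            mapping[umi] = umi
        else:
            mapping[umi] = target  # fold into the more abundant UMI
    return mapping


def collapse_counts(umi_counts: Dict[str, int], mapping: Dict[str, str]) -> Dict[str, int]:
    """Add the count of each corrected UMI to the UMI it was merged into."""
    totals: Counter = Counter()
    for umi, n in umi_counts.items():
        totals[mapping[umi]] += n
    return dict(totals)


# Toy counts for the UMIs of a single cellBC-intBC pair (made-up data).
umi_counts = {"AAAAAAAA": 10, "AAAAAAAT": 2, "AAAAAATT": 1, "GGGGCCCC": 5}
mapping = correct_umis(umi_counts, max_umi_distance=2)
collapsed = collapse_counts(umi_counts, mapping)
```

In this toy group, the two rare UMIs differ from AAAAAAAA by one and two bases, so both are merged into it, while GGGGCCCC is too distant and stays separate.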
- Parameters:
- input_df
DataFrame
Input DataFrame of alignments.
- max_umi_distance
int (default: 2)
Maximum Hamming distance between two UMIs for one to be corrected to the other.
- allow_allele_conflicts
bool (default: False)
Whether to include the allele when splitting UMIs into groups. When True, UMIs are grouped by cellBC-intBC-allele triplets. When False, UMIs are grouped by cellBC-intBC pairs. Use this option when a cellBC-intBC pair can have more than one allele state, such as for spatial data.
- n_threads
int (default: 1)
Number of threads to use.
- Return type:
DataFrame
- Returns:
A DataFrame of error-corrected UMIs.
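The effect of allow_allele_conflicts on grouping can be illustrated with a small standalone sketch. It does not call the library: group_umis is a hypothetical helper, and the column names (cellBC, intBC, allele, UMI) and the rows are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_umis(rows: List[dict], allow_allele_conflicts: bool = False) -> Dict[Tuple[str, ...], List[str]]:
    """Collect UMIs per group key (hypothetical helper, for illustration).

    Groups by (cellBC, intBC) by default, or by (cellBC, intBC, allele)
    when allow_allele_conflicts is True.
    """
    key_columns = ["cellBC", "intBC"]
    if allow_allele_conflicts:
        key_columns.append("allele")
    groups: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for row in rows:
        key = tuple(row[c] for c in key_columns)
        groups[key].append(row["UMI"])
    return dict(groups)


# Made-up alignment rows: one cellBC-intBC pair carries two different alleles.
rows = [
    {"cellBC": "cell1", "intBC": "intA", "allele": "a1", "UMI": "AAAAAAAA"},
    {"cellBC": "cell1", "intBC": "intA", "allele": "a2", "UMI": "AAAAAAAT"},
    {"cellBC": "cell1", "intBC": "intB", "allele": "a1", "UMI": "CCCCCCCC"},
]

pairs = group_umis(rows)                                  # cellBC-intBC pairs
triplets = group_umis(rows, allow_allele_conflicts=True)  # cellBC-intBC-allele triplets
```

With pair grouping, the two similar UMIs of cell1/intA land in the same group and are candidates for correction into one another. With triplet grouping they sit in separate groups because their alleles differ, so neither would be corrected to the other.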