cassiopeia.pp.error_correct_umis

cassiopeia.pp.error_correct_umis(input_df, max_umi_distance=2, allow_allele_conflicts=False, n_threads=1)

Within cellBC-intBC pairs, collapses UMIs that have close sequences.

Error-corrects UMIs within each cellBC-intBC pair. When the sequences of two UMIs differ by at most max_umi_distance positions (Hamming distance), the less abundant UMI is corrected to the more abundant one. The allow_allele_conflicts option additionally groups UMIs by their allele.
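The correction step within a single group can be sketched in plain Python. This is a minimal greedy sketch, not Cassiopeia's implementation. It assumes that one group is given as a dict mapping each UMI sequence to its read count (used as abundance), that all UMIs have equal length, and that the threshold is inclusive. The helper names hamming, correct_umis_in_group and collapse_counts are hypothetical.

```python
from typing import Dict, List


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length UMIs differ."""
    if len(a) != len(b):
        raise ValueError("UMIs must have equal length")
    return sum(x != y for x, y in zip(a, b))


def correct_umis_in_group(
    umi_counts: Dict[str, int], max_umi_distance: int = 2
) -> Dict[str, str]:
    """Map every UMI in one group to the UMI it is corrected to.

    Assumption (not from the library): abundance is the read count, and a
    UMI within max_umi_distance (inclusive) of a kept UMI is merged into it.
    """
    # Visit UMIs from most to least abundant; ties are broken alphabetically
    # so the result is deterministic.
    order = sorted(umi_counts, key=lambda u: (-umi_counts[u], u))
    kept: List[str] = []  # surviving UMIs, in descending abundance
    mapping: Dict[str, str] = {}
    for umi in order:
        # The first kept UMI within the threshold is the most abundant one.
        target = next(
            (k for k in kept if hamming(umi, k) <= max_umi_distance), None
        )
        if target is None:
            kept.append(umi)
            mapping[umi] = umi
        else:
            mapping[umi] = target
    return mapping


def collapse_counts(
    umi_counts: Dict[str, int], mapping: Dict[str, str]
) -> Dict[str, int]:
    """Sum read counts onto the UMIs they were corrected to."""
    collapsed: Dict[str, int] = {}
    for umi, count in umi_counts.items():
        target = mapping[umi]
        collapsed[target] = collapsed.get(target, 0) + count
    return collapsed
```

For example, with counts {"AAAA": 10, "AAAT": 2, "TTTT": 5, "TTTA": 1} and max_umi_distance=2, AAAT is corrected to AAAA and TTTA to TTTT, giving collapsed counts of 12 and 6. Because UMIs are visited in descending abundance and only surviving UMIs serve as targets, each UMI is folded into one that is at least as abundant, and corrections do not chain.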

Parameters:
input_df DataFrame

Input DataFrame of alignments.

max_umi_distance int (default: 2)

The maximum Hamming distance between two UMIs for one to be corrected to the other.

allow_allele_conflicts bool (default: False)

Whether to include the allele when grouping UMIs. When True, UMIs are grouped by cellBC-intBC-allele triplets; when False, they are grouped by cellBC-intBC pairs. Set this to True when a single cellBC-intBC pair can have more than one allele state, such as in spatial data.
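The effect of this option on grouping can be illustrated with a small sketch. It is not Cassiopeia's code: it uses plain dict records instead of a DataFrame, and the field names cellBC, intBC, UMI, allele and readCount as well as the helper group_umis are assumptions made for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def group_umis(
    records: Iterable[dict], allow_allele_conflicts: bool = False
) -> Dict[Tuple[str, ...], Counter]:
    """Group UMI read counts by cellBC-intBC (optionally plus allele).

    Hypothetical record fields: cellBC, intBC, UMI, allele, readCount.
    """
    groups: Dict[Tuple[str, ...], Counter] = defaultdict(Counter)
    for rec in records:
        key: Tuple[str, ...] = (rec["cellBC"], rec["intBC"])
        if allow_allele_conflicts:
            # Split each cellBC-intBC pair further by allele state.
            key += (rec["allele"],)
        groups[key][rec["UMI"]] += rec["readCount"]
    return dict(groups)
```

With two records for the same cellBC-intBC pair, UMIs AAAA (allele x) and AAAT (allele y), the default call puts both UMIs in one group, where they are one mismatch apart and could be merged. With allow_allele_conflicts=True they fall into separate triplet groups, are never compared, and both survive.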

n_threads int (default: 1)

Number of threads to use.

Return type:

DataFrame

Returns:

A DataFrame of error-corrected UMIs.