cassiopeia.pp.resolve_umi_sequence

cassiopeia.pp.resolve_umi_sequence(molecule_table, output_directory, min_umi_per_cell=10, min_avg_reads_per_umi=2.0, plot=True)

Resolve a consensus sequence for each UMI.

This procedure filters UMIs and cell barcodes (cellBCs) on the basis of reads per UMI and UMIs per cell. Where a UMI is associated with a set of conflicting sequences, it is then assigned the most abundant one.
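The "most abundant sequence" step can be illustrated with a minimal pandas sketch. This is not Cassiopeia's implementation. The column names (`cellBC`, `UMI`, `seq`, `readCount`) and the helper `pick_most_abundant` are assumptions made for illustration, and "most abundant" is taken here to mean the highest read count.

```python
import pandas as pd

# Toy molecule table; the column names are assumptions for this sketch.
toy = pd.DataFrame(
    {
        "cellBC": ["A", "A", "A", "B"],
        "UMI": ["u1", "u1", "u2", "u1"],
        "seq": ["ACGT", "ACGA", "TTGC", "GGCA"],
        "readCount": [8, 3, 5, 4],
    }
)


def pick_most_abundant(table: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: keep the highest-read-count row per cellBC-UMI pair."""
    # Sort so the best-supported sequence for each pair comes first.
    ranked = table.sort_values("readCount", ascending=False, kind="stable")
    # Keep only the first (most abundant) row for every cellBC-UMI pair.
    resolved = ranked.drop_duplicates(subset=["cellBC", "UMI"], keep="first")
    return resolved.sort_values(["cellBC", "UMI"]).reset_index(drop=True)


resolved = pick_most_abundant(toy)
print(resolved)
```

In the toy table, the pair (A, u1) has two conflicting sequences, and only the one with more reads survives, so every cellBC-UMI pair ends up with exactly one row.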

Parameters:
molecule_table DataFrame

Molecule table to resolve

output_directory str

Directory to store results

min_umi_per_cell int (default: 10)

The minimum number of UMIs a cell must have to be retained during filtering

min_avg_reads_per_umi float (default: 2.0)

The minimum coverage (i.e. the average number of reads per UMI) a cell must have to be retained during filtering
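How the two thresholds interact can be seen in a small pandas sketch. It is an illustration only, not the library's code: the column names (`cellBC`, `UMI`, `readCount`), the helper `filter_cells`, and the choice of inclusive comparisons (`>=`) are all assumptions.

```python
import pandas as pd

# Toy molecule table; the column names are assumptions for this sketch.
toy = pd.DataFrame(
    {
        "cellBC": ["A", "A", "A", "B", "B", "C"],
        "UMI": ["u1", "u2", "u3", "u1", "u2", "u1"],
        "readCount": [4, 3, 5, 1, 1, 9],
    }
)


def filter_cells(
    table: pd.DataFrame, min_umi_per_cell: int, min_avg_reads_per_umi: float
) -> pd.DataFrame:
    """Hypothetical helper: keep cells that pass both thresholds."""
    # Per-cell summary: number of distinct UMIs and total reads.
    per_cell = table.groupby("cellBC").agg(
        n_umi=("UMI", "nunique"), n_reads=("readCount", "sum")
    )
    per_cell["avg_reads_per_umi"] = per_cell["n_reads"] / per_cell["n_umi"]
    # Inclusive comparisons are an assumption of this sketch.
    passing = per_cell[
        (per_cell["n_umi"] >= min_umi_per_cell)
        & (per_cell["avg_reads_per_umi"] >= min_avg_reads_per_umi)
    ].index
    return table[table["cellBC"].isin(passing)].reset_index(drop=True)


# Small thresholds so the toy data exercises both filters.
kept = filter_cells(toy, min_umi_per_cell=2, min_avg_reads_per_umi=2.0)
print(kept)
```

Here cell B has enough UMIs but too few reads per UMI, and cell C is well covered but has only one UMI, so only cell A is retained.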

Return type:

DataFrame

Returns:

A molecule table in which each cellBC-UMI pair maps to a single sequence.