Preprocess

Data Preprocessing

The following functions make up our pipeline for processing sequencing data from single-cell lineage tracing technologies:

`pp.align_sequences`(queries[, ref_filepath, …])	Align reads to the TargetSite reference.
`pp.call_alleles`(alignments[, ref_filepath, …])	Call indels from CIGAR strings.
`pp.call_lineage_groups`(input_df, …[, …])	Assigns cells to their clonal populations.
`pp.collapse_umis`(bam_fp, output_directory[, …])	Collapses close UMIs together from a BAM file.
`pp.convert_fastqs_to_unmapped_bam`(fastq_fps, …)	Converts FASTQs into an unmapped BAM based on a chemistry.
`pp.error_correct_cellbcs_to_whitelist`(…[, …])	Error-correct cell barcodes in the input BAM.
`pp.error_correct_intbcs_to_whitelist`(…[, …])	Corrects all intBCs to the provided whitelist.
`pp.error_correct_umis`(input_df[, …])	Within cellBC-intBC pairs, collapses UMIs that have close sequences.
`pp.filter_bam`(bam_fp, output_directory[, …])	Filter reads in a BAM that have low-quality barcodes or UMIs.
`pp.filter_molecule_table`(input_df, …[, …])	Filters and corrects a molecule table of cellBC-UMI pairs.
`pp.filter_cells`(molecule_table[, …])	Filter out cell barcodes that have too few UMIs or too few reads per UMI.
`pp.filter_umis`(moleculetable[, readCountThresh])	Filters out UMIs with too few reads.
`pp.resolve_umi_sequence`(molecule_table, …)	Resolve a consensus sequence for each UMI.
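
To make the idea behind UMI error correction concrete, here is a minimal pure-Python sketch of collapsing UMIs with close sequences within each cellBC-intBC pair. It is an illustration only and not the implementation behind `pp.error_correct_umis`: the function name `collapse_close_umis`, the tuple-based input, the `max_distance` parameter, and the greedy "merge into the most abundant neighbour" rule are all assumptions made for this example.

```python
from collections import defaultdict


def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def collapse_close_umis(records, max_distance=1):
    """Hypothetical helper: greedily merge close UMIs per (cellBC, intBC) pair.

    records: iterable of (cell_bc, int_bc, umi, read_count) tuples.
    Returns {(cell_bc, int_bc): {umi: total_read_count}}.
    """
    # Group read counts by (cellBC, intBC), summing repeated UMIs.
    groups = defaultdict(dict)
    for cell_bc, int_bc, umi, reads in records:
        counts = groups[(cell_bc, int_bc)]
        counts[umi] = counts.get(umi, 0) + reads

    collapsed = {}
    for key, umi_counts in groups.items():
        # Visit the most abundant UMIs first; break ties by sequence.
        ordered = sorted(umi_counts.items(), key=lambda kv: (-kv[1], kv[0]))
        kept = {}
        for umi, reads in ordered:
            for parent in kept:
                if hamming(umi, parent) <= max_distance:
                    kept[parent] += reads  # absorb into the abundant UMI
                    break
            else:
                kept[umi] = reads  # no close neighbour: keep as its own UMI
        collapsed[key] = kept
    return collapsed


# Toy data (made up for illustration).
records = [
    ("cellA", "intBC1", "AAAA", 10),
    ("cellA", "intBC1", "AAAT", 2),   # one mismatch from AAAA
    ("cellA", "intBC1", "GGGG", 5),
    ("cellB", "intBC1", "AAAT", 3),   # different cell, never merged with cellA
]
collapsed = collapse_close_umis(records)
```

Restricting the comparison to a single cellBC-intBC pair keeps identical or similar UMIs from different cells or integrations separate, which is why `cellB`'s `AAAT` is left untouched in the toy data.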

We also provide several functions for converting between data formats for downstream analyses:

`pp.compute_empirical_indel_priors`(allele_table)	Computes indel prior probabilities.
`pp.convert_alleletable_to_character_matrix`(…)	Converts an AlleleTable into a character matrix.
`pp.convert_alleletable_to_lineage_profile`(…)	Converts an AlleleTable to a lineage profile.
`pp.convert_lineage_profile_to_character_matrix`(…)	Converts a lineage profile to a character matrix.
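
As a rough picture of what an allele-table-to-character-matrix conversion involves, the sketch below encodes per-cell indel observations as integer states in plain Python. It does not call or reproduce `pp.convert_alleletable_to_character_matrix`; the function name `build_character_matrix`, the `(cell, character, indel)` tuple layout, the toy indel labels, and the conventions used here (0 for uncut, -1 for missing, states numbered from 1 in order of first appearance) are assumptions chosen for illustration.

```python
def build_character_matrix(allele_rows, uncut="None", missing_state=-1):
    """Hypothetical helper: encode indels as integer states per character.

    allele_rows: iterable of (cell, character, indel) tuples.
    Returns (characters, matrix, state_of), where matrix maps each cell to a
    list of states aligned with `characters`, and state_of maps each
    character to its {indel: state} dictionary.
    """
    rows = list(allele_rows)
    cells = sorted({cell for cell, _, _ in rows})
    characters = sorted({ch for _, ch, _ in rows})

    state_of = {ch: {} for ch in characters}
    observed = {}
    for cell, ch, indel in rows:
        if indel == uncut:
            state = 0  # assumed convention: uncut sites get state 0
        else:
            mapping = state_of[ch]
            # Reuse the state if this indel was seen before, else assign next.
            state = mapping.setdefault(indel, len(mapping) + 1)
        observed[(cell, ch)] = state

    # Cells with no observation for a character receive the missing state.
    matrix = {
        cell: [observed.get((cell, ch), missing_state) for ch in characters]
        for cell in cells
    }
    return characters, matrix, state_of


# Toy allele rows (labels are made up for illustration).
allele_rows = [
    ("cell1", "intBC1_r1", "2:1D"),
    ("cell1", "intBC1_r2", "None"),
    ("cell2", "intBC1_r1", "2:1D"),
    ("cell2", "intBC1_r2", "5:2I"),
    ("cell3", "intBC1_r1", "7:3D"),
]
characters, matrix, state_of = build_character_matrix(allele_rows)
```

Numbering states independently per character means the same integer in two different columns can stand for different indels, so the returned `state_of` mapping is needed to interpret the matrix.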
