cassiopeia.pp.call_alleles

cassiopeia.pp.call_alleles(alignments, ref_filepath=None, ref=None, barcode_interval=(20, 34), cutsite_locations=[112, 166, 220], cutsite_width=12, context=True, context_size=5)

Call indels from CIGAR strings.

Given a set of alignments, extract the indels in each one by comparing its CIGAR string to the reference sequence.
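To illustrate the general idea of reading indels out of a CIGAR string, here is a minimal sketch. It is not the library's implementation: the helper name `extract_indels`, its return format, and the `ref_start` parameter are assumptions made for this example. It relies only on the standard SAM convention that the operations `M`, `D`, `N`, `=` and `X` consume reference bases.

```python
import re
from typing import List, Tuple

# One CIGAR token is a run length followed by a single operation code.
CIGAR_TOKEN = re.compile(r"(\d+)([MIDNSHP=X])")


def extract_indels(cigar: str, ref_start: int = 0) -> List[Tuple[str, int, int]]:
    """Return (kind, reference_position, length) for each insertion or deletion.

    Hypothetical helper for illustration; not part of cassiopeia.
    """
    indels = []
    ref_pos = ref_start
    for length_text, op in CIGAR_TOKEN.findall(cigar):
        length = int(length_text)
        if op == "I":
            # Insertions sit between reference bases and do not advance ref_pos.
            indels.append(("insertion", ref_pos, length))
        elif op == "D":
            indels.append(("deletion", ref_pos, length))
        # Only these operations consume reference bases.
        if op in "MDN=X":
            ref_pos += length
    return indels


print(extract_indels("10M2D5M3I4M", ref_start=100))
# [('deletion', 110, 2), ('insertion', 117, 3)]
```

Here a 2-base deletion is reported at reference position 110 and a 3-base insertion at position 117, because the insertion does not move the reference coordinate.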

Parameters:
alignments DataFrame

Alignments, provided as a DataFrame

ref_filepath Optional[str] (default: None)

Filepath to the reference sequence

ref Optional[str] (default: None)

Nucleotide sequence of the reference

barcode_interval Tuple[int, int] (default: (20, 34))

Interval in the reference corresponding to the integration barcode

cutsite_locations List[int] (default: [112, 166, 220])

A list of all cutsite positions in the reference

cutsite_width int (default: 12)

Number of nucleotides to the left and right of each cutsite location within which indels can appear

context bool (default: True)

Include sequence context around indels

context_size int (default: 5)

Number of bases on each side of an indel to include as context
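The following sketch shows one way the `cutsite_locations`, `cutsite_width` and `context_size` parameters could be interpreted. It is an assumption-laden illustration, not the library's code: the helpers `assign_cutsite` and `with_context`, the inclusive overlap rule, and the clamping at the ends of the reference are all choices made for this example.

```python
from typing import List, Optional


def assign_cutsite(
    indel_start: int,
    indel_end: int,
    cutsite_locations: List[int],
    cutsite_width: int,
) -> Optional[int]:
    """Return the index of the first cutsite whose window overlaps the indel.

    Hypothetical helper: each window is assumed to span
    [location - cutsite_width, location + cutsite_width], inclusive.
    """
    for index, location in enumerate(cutsite_locations):
        window_start = location - cutsite_width
        window_end = location + cutsite_width
        if indel_start <= window_end and indel_end >= window_start:
            return index
    return None


def with_context(ref: str, start: int, end: int, context_size: int) -> str:
    """Return ref[start:end] plus up to context_size flanking bases per side.

    Hypothetical helper: flanks are clamped to the bounds of the reference.
    """
    left = max(0, start - context_size)
    right = min(len(ref), end + context_size)
    return ref[left:right]


cutsites = [112, 166, 220]
print(assign_cutsite(110, 112, cutsites, 12))  # 0
print(assign_cutsite(130, 135, cutsites, 12))  # None
print(with_context("AAAAACCGGTTTTT", 5, 9, 3))  # AAACCGGTTT
```

With the default values, the windows cover positions 100 to 124, 154 to 178, and 208 to 232, so an indel spanning positions 130 to 135 falls outside every window in this sketch.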

Return type:

DataFrame

Returns:

A DataFrame mapping each sequence alignment to the called indels.