
cassiopeia.pp.call_alleles(alignments, ref_filepath=None, ref=None, barcode_interval=(20, 34), cutsite_locations=[112, 166, 220], cutsite_width=12, context=True, context_size=5)

Call indels from CIGAR strings.

Given a set of alignments, the indels in each alignment are extracted by interpreting its CIGAR string against the reference sequence.
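To illustrate the general idea of reading indels off a CIGAR string, here is a minimal sketch. It is not Cassiopeia's implementation: the helper `indels_from_cigar`, its `ref_begin` argument, and the 0-based coordinate convention are all assumptions made for this example.

```python
import re
from typing import List, Tuple

# One CIGAR operation is a run length followed by a single operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def indels_from_cigar(cigar: str, ref_begin: int = 0) -> List[Tuple[str, int, int]]:
    """Return (kind, reference_position, length) for each insertion or deletion.

    Hypothetical helper: `ref_begin` is the 0-based reference position at
    which the alignment starts (an assumption of this sketch).
    """
    indels = []
    ref_pos = ref_begin
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op == "I":
            # Insertions consume query bases only, so the reference
            # position does not advance.
            indels.append(("I", ref_pos, n))
        elif op == "D":
            # Deletions consume reference bases only.
            indels.append(("D", ref_pos, n))
            ref_pos += n
        elif op in ("M", "=", "X", "N"):
            # Matches, mismatches, and skipped regions advance the reference.
            ref_pos += n
        # S, H, and P consume no reference bases and are ignored here.
    return indels


# 50 aligned bases, a 3-base deletion, 20 aligned bases, a 2-base insertion.
print(indels_from_cigar("50M3D20M2I10M"))
```

Each tuple reports where on the reference the event starts, which is the coordinate that would then be compared against the cutsite positions.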

Parameters:

alignments DataFrame

Alignments provided as a DataFrame

ref_filepath Optional[str] (default: None)

Filepath to the reference sequence

ref Optional[str] (default: None)

Nucleotide sequence of the reference

barcode_interval Tuple[int, int] (default: (20, 34))

Interval in reference corresponding to the integration barcode

cutsite_locations List[int] (default: [112, 166, 220])

A list of all cutsite positions in the reference

cutsite_width int (default: 12)

Number of nucleotides to the left and right of each cutsite location within which indels can appear

context bool (default: True)

Include sequence context around indels

context_size int (default: 5)

Number of bases to the right and left to include as context

Return type:

DataFrame

Returns:

A DataFrame mapping each sequence alignment to the called indels.
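The positional parameters can be pictured with a small sketch. It is one plausible reading of `cutsite_locations`, `cutsite_width`, `barcode_interval`, and `context_size`, not the library's code: the helpers `cutsite_index` and `flanking_context` are hypothetical, and the 0-based coordinates, the inclusive symmetric window, and the half-open barcode interval are assumptions.

```python
from typing import List, Optional, Tuple


def cutsite_index(
    position: int, cutsite_locations: List[int], cutsite_width: int
) -> Optional[int]:
    """Return the index of the cutsite whose window contains `position`.

    Assumes a symmetric, inclusive window of `cutsite_width` bases on each
    side of the cutsite. Returns None if no window contains the position.
    """
    for i, site in enumerate(cutsite_locations):
        if site - cutsite_width <= position <= site + cutsite_width:
            return i
    return None


def flanking_context(
    ref: str, start: int, end: int, context_size: int
) -> Tuple[str, str]:
    """Return the reference bases immediately left and right of [start, end)."""
    left = ref[max(0, start - context_size):start]
    right = ref[end:end + context_size]
    return left, right


# Toy reference, used only so the example runs end to end.
toy_ref = "ACGT" * 70

# Default values from the signature above.
cutsites = [112, 166, 220]
width = 12

print(cutsite_index(118, cutsites, width))  # falls in the first window
print(cutsite_index(140, cutsites, width))  # between windows
print(toy_ref[20:34])                       # barcode, assuming a half-open interval
print(flanking_context(toy_ref, 112, 115, 5))
```

With the default values the three windows do not overlap, so under these assumptions a position maps to at most one cutsite.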