cassiopeia.pp.call_alleles

cassiopeia.pp.call_alleles(alignments, ref_filepath=None, ref=None, barcode_interval=(20, 34), cutsite_locations=[112, 166, 220], cutsite_width=12, context=True, context_size=5)

Call indels from CIGAR strings.

Given many alignments, we extract the indels by comparing the CIGAR strings of each alignment to the reference sequence.
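
To illustrate the idea of reading indels off a CIGAR string, here is a minimal standalone sketch. It is not Cassiopeia's implementation: the helper name `indels_from_cigar`, its `ref_start` argument, and the tuple output format are all hypothetical and chosen only for this example. It walks the CIGAR operations, tracks the current reference position, and records each insertion (`I`) and deletion (`D`).

```python
import re
from typing import List, Tuple

# Matches one CIGAR operation: a run length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def indels_from_cigar(cigar: str, ref_start: int = 0) -> List[Tuple[str, int, int]]:
    """Return (kind, reference_position, length) for each indel in a CIGAR string.

    Hypothetical helper for illustration only; not part of the Cassiopeia API.
    """
    indels = []
    ref_pos = ref_start
    for length_str, op in CIGAR_OP.findall(cigar):
        length = int(length_str)
        if op == "I":
            indels.append(("insertion", ref_pos, length))
        elif op == "D":
            indels.append(("deletion", ref_pos, length))
        # M, D, N, = and X consume reference bases; I, S, H and P do not.
        if op in "MDN=X":
            ref_pos += length
    return indels


# 50 matches, a 3-base deletion, 20 matches, a 2-base insertion, 30 matches.
print(indels_from_cigar("50M3D20M2I30M"))
# → [('deletion', 50, 3), ('insertion', 73, 2)]
```

An alignment with no `I` or `D` operations, such as `100M`, yields an empty list under this sketch.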

Parameters
alignments : DataFrame

Alignments provided as a DataFrame

ref_filepath : Optional[str] (default: None)

Filepath to the reference sequence

ref : Optional[str] (default: None)

Nucleotide sequence of the reference

barcode_interval : Tuple[int, int] (default: (20, 34))

Interval in reference corresponding to the integration barcode

cutsite_locations : List[int] (default: [112, 166, 220])

A list of all cutsite positions in the reference

cutsite_width : int (default: 12)

Number of nucleotides to the left and right of each cutsite location in which indels can appear

context : bool (default: True)

Include sequence context around indels

context_size : int (default: 5)

Number of bases to the right and left to include as context
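
One way to picture how cutsite_locations, cutsite_width, and context_size interact is the sketch below. It is an illustration under stated assumptions, not the library's code: the helpers `assign_cutsite` and `with_context` are hypothetical, the window around each cutsite is assumed to be inclusive on both ends, and the bracketed output format is invented for readability.

```python
from typing import List, Optional


def assign_cutsite(
    position: int, cutsite_locations: List[int], cutsite_width: int
) -> Optional[int]:
    """Return the index of the first cutsite whose window contains `position`.

    Assumption: the window spans `cutsite_width` bases on each side of the
    cutsite location, inclusive. Returns None if no window contains it.
    """
    for index, location in enumerate(cutsite_locations):
        if location - cutsite_width <= position <= location + cutsite_width:
            return index
    return None


def with_context(ref: str, start: int, end: int, context_size: int = 5) -> str:
    """Show ref[start:end] in brackets, flanked by up to `context_size` bases.

    Hypothetical display format; the flanks are clipped at the sequence ends.
    """
    left = ref[max(0, start - context_size):start]
    right = ref[end:end + context_size]
    return f"{left}[{ref[start:end]}]{right}"


# With the documented defaults, the assumed windows are 100-124, 154-178, 208-232.
cutsites = [112, 166, 220]
print(assign_cutsite(120, cutsites, 12))  # → 0
print(assign_cutsite(140, cutsites, 12))  # → None

ref = "ACGTACGTACGTACGTACGT"
print(with_context(ref, 8, 11))  # → TACGT[ACG]TACGT
```

Near the start of the sequence the left flank is simply shorter, for example `with_context(ref, 2, 4)` gives `AC[GT]ACGTA`.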

Return type

DataFrame

Returns

A DataFrame mapping each sequence alignment to the called indels.