cassiopeia.pp.call_alleles
- cassiopeia.pp.call_alleles(alignments, ref_filepath=None, ref=None, barcode_interval=(20, 34), cutsite_locations=[112, 166, 220], cutsite_width=12, context=True, context_size=5)
Call indels from CIGAR strings.
Given many alignments, we extract the indels by comparing the CIGAR strings of each alignment to the reference sequence.
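To make the idea of reading indels off a CIGAR string concrete, here is a minimal sketch in plain Python. It is not Cassiopeia's implementation: the helper name `indels_from_cigar`, its `ref_start` argument, and the tuple layout of its result are assumptions made for illustration. It relies only on the standard CIGAR convention that `M`, `D`, `N`, `=` and `X` consume reference bases while `I`, `S`, `H` and `P` do not.

```python
import re

# One CIGAR operation: a run length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def indels_from_cigar(cigar, ref_start=0):
    """Hypothetical helper: list (kind, reference_position, length) per indel."""
    indels = []
    ref_pos = ref_start
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "I":
            # Insertions sit between reference bases and consume none of them.
            indels.append(("insertion", ref_pos, length))
        elif op == "D":
            # Deletions remove reference bases, so the position advances.
            indels.append(("deletion", ref_pos, length))
            ref_pos += length
        elif op in "MN=X":
            # Matches, mismatches and skips advance along the reference.
            ref_pos += length
        # S, H and P do not consume reference bases, so nothing to do.
    return indels


print(indels_from_cigar("110M5D50M2I60M"))
# [('deletion', 110, 5), ('insertion', 165, 2)]
```

Positions here are 0-based offsets into the reference; whether `call_alleles` reports positions the same way is not stated in this page.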
- Parameters
- alignments : DataFrame
Alignments provided in a DataFrame.
- ref_filepath : Optional[str] (default: None)
Filepath to the reference sequence.
- ref : Optional[str] (default: None)
Nucleotide sequence of the reference.
- barcode_interval : Tuple[int, int] (default: (20, 34))
Interval in the reference corresponding to the integration barcode.
- cutsite_locations : List[int] (default: [112, 166, 220])
A list of all cutsite positions in the reference.
- cutsite_width : int (default: 12)
Number of nucleotides left and right of a cutsite location in which indels can appear.
- context : bool (default: True)
Include sequence context around indels.
- context_size : int (default: 5)
Number of bases to the right and left to include as context.
- Return type
DataFrame
- Returns
A DataFrame mapping each sequence alignment to the called indels.
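The `cutsite_locations` and `cutsite_width` parameters together define a window around each cutsite. The sketch below shows one plausible reading of that description using the documented defaults. The function name `cutsite_for_position` is hypothetical, and treating both window edges as inclusive is an assumption; the page does not say how Cassiopeia handles boundary positions or indels that span more than one window.

```python
def cutsite_for_position(position, cutsite_locations=(112, 166, 220), cutsite_width=12):
    """Hypothetical helper: index of the cutsite whose window holds `position`.

    Returns None when the position falls outside every window.
    """
    for index, site in enumerate(cutsite_locations):
        # Assumed inclusive window: cutsite_width bases on either side.
        if site - cutsite_width <= position <= site + cutsite_width:
            return index
    return None


# With the defaults, the windows are 100-124, 154-178 and 208-232.
for pos in (110, 165, 140):
    print(pos, cutsite_for_position(pos))
```

With the default values the three windows do not overlap, so each position maps to at most one cutsite; custom values that produce overlapping windows would need a tie-breaking rule that this sketch does not attempt.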