cassiopeia.pp.call_alleles
- cassiopeia.pp.call_alleles(alignments, ref_filepath=None, ref=None, barcode_interval=(20, 34), cutsite_locations=[112, 166, 220], cutsite_width=12, context=True, context_size=5)
Call indels from CIGAR strings.
Given many alignments, we extract the indels by comparing the CIGAR strings of each alignment to the reference sequence.
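To make the idea of reading indels off a CIGAR string concrete, here is a minimal, self-contained sketch. It is not Cassiopeia's implementation: the helper name `extract_indels` and its return format are hypothetical, and only the standard SAM CIGAR operation semantics are assumed.

```python
import re

# CIGAR operations that advance the position on the reference (per the SAM spec).
_CONSUMES_REF = set("MDN=X")
_CIGAR_TOKEN = re.compile(r"(\d+)([MIDNSHP=X])")


def extract_indels(cigar, ref_start=0):
    """Hypothetical helper: list (kind, ref_pos, length) for each indel.

    kind is "I" (insertion) or "D" (deletion); ref_pos is the 0-based
    reference position at which the indel begins.
    """
    indels = []
    ref_pos = ref_start
    for length, op in _CIGAR_TOKEN.findall(cigar):
        length = int(length)
        if op in ("I", "D"):
            indels.append((op, ref_pos, length))
        if op in _CONSUMES_REF:
            ref_pos += length
    return indels


# 110 matched bases, a 5-base deletion, 50 matched bases, a 2-base insertion.
print(extract_indels("110M5D50M2I100M"))
```

Insertions do not advance the reference position, while deletions do, which is why the insertion in the example is reported at the position immediately after the preceding match block.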
- Parameters:
- alignments
DataFrame
Alignments provided as a DataFrame
- ref_filepath
Optional[str] (default: None)
Filepath to the reference sequence
- ref
Optional[str] (default: None)
Nucleotide sequence of the reference
- barcode_interval
Tuple[int, int] (default: (20, 34))
Interval in the reference corresponding to the integration barcode
- cutsite_locations
List[int] (default: [112, 166, 220])
A list of all cutsite positions in the reference
- cutsite_width
int (default: 12)
Number of nucleotides to the left and right of each cutsite location in which indels can appear
- context
bool (default: True)
Include sequence context around indels
- context_size
int (default: 5)
Number of bases to the right and left of each indel to include as context
- Return type:
DataFrame
- Returns:
A DataFrame mapping each sequence alignment to the called indels.
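The roles of cutsite_locations, cutsite_width, and context_size can be illustrated with a small sketch. This is one reading of the parameter descriptions above, not the library's code: the helpers `assign_to_cutsites` and `flanking_context` are hypothetical, and the rule that an indel belongs to a cutsite when it overlaps the window `[location - cutsite_width, location + cutsite_width]` is an assumption.

```python
def assign_to_cutsites(indel_start, indel_len,
                       cutsite_locations=(112, 166, 220), cutsite_width=12):
    """Hypothetical helper: indices of cutsites whose window the indel overlaps.

    Assumes the window around each cutsite is
    [location - cutsite_width, location + cutsite_width].
    """
    indel_end = indel_start + indel_len
    hits = []
    for i, loc in enumerate(cutsite_locations):
        window_start = loc - cutsite_width
        window_end = loc + cutsite_width
        if indel_start <= window_end and indel_end >= window_start:
            hits.append(i)
    return hits


def flanking_context(ref, start, length, context_size=5):
    """Hypothetical helper: reference bases left and right of an indel."""
    left = ref[max(0, start - context_size):start]
    right = ref[start + length:start + length + context_size]
    return left, right


# A 5-base deletion starting at reference position 110 falls in the first window only.
print(assign_to_cutsites(110, 5))
# A 60-base deletion starting at position 105 spans the first two windows.
print(assign_to_cutsites(105, 60))
```

Under these assumptions, a long deletion can be attributed to more than one cutsite, and the context strings are simply truncated at the ends of the reference.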