cassiopeia.pp.align_sequences
- cassiopeia.pp.align_sequences(queries, ref_filepath=None, ref=None, gap_open_penalty=20, gap_extend_penalty=1, n_threads=1)
Align reads to the TargetSite reference.
Takes a DataFrame of queries that maps each cellBC-UMI to a sequence of interest and aligns each sequence to a reference sequence. Alignment uses the Smith-Waterman local alignment algorithm. For each query, the output consists of the best alignment score and the CIGAR string recording the indel locations in the query sequence.
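To illustrate how the gap open and gap extension penalties enter a Smith-Waterman local alignment score, here is a minimal, score-only sketch in plain Python. It is not Cassiopeia's implementation. The function name `smith_waterman_score`, the `match` and `mismatch` values, and the convention that a gap of length k costs `gap_open_penalty + (k - 1) * gap_extend_penalty` are all assumptions made for the example.

```python
def smith_waterman_score(query, ref, match=5, mismatch=-4,
                         gap_open_penalty=20, gap_extend_penalty=1):
    """Best local alignment score with affine gap penalties (illustrative)."""
    neg_inf = float("-inf")
    n, m = len(query), len(ref)
    # H[i][j]: best score of a local alignment ending at query[i-1], ref[j-1]
    # E[i][j]: best score ending with a gap in the query (reference base skipped)
    # F[i][j]: best score ending with a gap in the reference (query base skipped)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    E = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    F = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Either extend an existing gap or open a new one.
            E[i][j] = max(E[i][j - 1] - gap_extend_penalty,
                          H[i][j - 1] - gap_open_penalty)
            F[i][j] = max(F[i - 1][j] - gap_extend_penalty,
                          H[i - 1][j] - gap_open_penalty)
            sub = match if query[i - 1] == ref[j - 1] else mismatch
            # The 0 lets a local alignment restart anywhere.
            H[i][j] = max(0, H[i - 1][j - 1] + sub, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best


# A cheap gap makes bridging the extra reference base worthwhile.
print(smith_waterman_score("AAAACCCC", "AAAATCCCC",
                           gap_open_penalty=3, gap_extend_penalty=1))
```

With a large gap open penalty such as the default of 20, the same pair of sequences scores better as an ungapped alignment with one mismatch, which is why the penalty values influence which indels are reported.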
- Parameters
- queries :
DataFrame storing a list of sequences to align.
- ref_filepath :
Optional[str] (default: None) Filepath to the reference FASTA.
- ref :
Optional[str] (default: None) Reference sequence.
- gap_open_penalty :
(default: 20) Gap open penalty.
- gap_extend_penalty :
(default: 1) Gap extension penalty.
- n_threads :
int (default: 1) Number of threads to use.
- Return type
DataFrame
- Returns
A DataFrame mapping each sequence name to the CIGAR string, quality, and original query sequence.
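
As a sketch of how the indel locations could be read back out of a returned CIGAR string, the following standalone helper walks the operations and records each insertion and deletion with its position in the query. It assumes the CIGAR strings follow the operation letters of the SAM specification. The helper name `indels_from_cigar` is hypothetical and not part of Cassiopeia.

```python
import re

# One CIGAR operation: a run length followed by a SAM operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def indels_from_cigar(cigar):
    """Return (op, query_position, length) for each I or D in a CIGAR string."""
    indels = []
    query_pos = 0  # 0-based offset into the query sequence
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op in "ID":
            indels.append((op, query_pos, length))
        # Only these operations consume bases of the query.
        if op in "MIS=X":
            query_pos += length
    return indels


print(indels_from_cigar("10M2D5M3I4M"))
```

A deletion is reported at the query offset where the missing reference bases would have appeared, since it consumes no query bases itself.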