cassiopeia.pp.align_sequences

cassiopeia.pp.align_sequences(queries, ref_filepath=None, ref=None, gap_open_penalty=20, gap_extend_penalty=1, n_threads=1)

Align reads to the TargetSite reference.

Take several queries, stored in a DataFrame that maps cellBC-UMIs to a sequence of interest, and align each one to a reference sequence using the Smith-Waterman local alignment algorithm. The output consists of the best alignment score and the CIGAR string recording the indel locations in the query sequence.
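To make the alignment step concrete, the following is a minimal pure-Python sketch of Smith-Waterman local alignment with separate gap open and gap extension penalties. It is an illustration only, not the implementation used by this function. The helper name smith_waterman, the match and mismatch scores, and the convention that the first gapped base costs gap_open and each further base costs gap_extend are all assumptions made for the example.

```python
import itertools

NEG = float("-inf")


def smith_waterman(query, ref, match=5, mismatch=-4, gap_open=20, gap_extend=1):
    """Return (score, ref_start, cigar) for the best local alignment.

    Hypothetical helper for illustration; scoring values are assumptions.
    """
    n, m = len(query), len(ref)
    # M: alignment ends in an aligned pair of bases.
    # I: alignment ends in a query-only base (insertion relative to ref).
    # D: alignment ends in a ref-only base (deletion relative to ref).
    M = [[NEG] * (m + 1) for _ in range(n + 1)]
    I = [[NEG] * (m + 1) for _ in range(n + 1)]
    D = [[NEG] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, None
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if query[i - 1] == ref[j - 1] else mismatch
            # The 0 option lets a local alignment start at this cell.
            M[i][j] = sub + max(0, M[i - 1][j - 1], I[i - 1][j - 1], D[i - 1][j - 1])
            I[i][j] = max(M[i - 1][j] - gap_open, I[i - 1][j] - gap_extend)
            D[i][j] = max(M[i][j - 1] - gap_open, D[i][j - 1] - gap_extend)
            if M[i][j] > best:
                best, best_pos = M[i][j], (i, j)
    if best_pos is None:
        return 0, 0, ""

    # Trace back from the best-scoring cell, recording one op per step.
    ops = []
    (i, j), state = best_pos, "M"
    while True:
        ops.append(state)
        if state == "M":
            prev = {"M": M[i - 1][j - 1], "I": I[i - 1][j - 1], "D": D[i - 1][j - 1]}
            i, j = i - 1, j - 1
            state = max(prev, key=prev.get)
            if prev[state] <= 0:  # the alignment started at this pair
                break
        elif state == "I":
            state = "M" if I[i][j] == M[i - 1][j] - gap_open else "I"
            i -= 1
        else:
            state = "M" if D[i][j] == M[i][j - 1] - gap_open else "D"
            j -= 1
    ops.reverse()
    # Run-length encode the ops into a CIGAR string (no soft clipping).
    cigar = "".join(f"{len(list(run))}{op}" for op, run in itertools.groupby(ops))
    return best, j, cigar


score, ref_start, cigar = smith_waterman("ACGTACGT", "TTACGTACGTTT")
print(score, ref_start, cigar)
```

In this sketch ref_start is the 0-based position in the reference where the local alignment begins. Real aligners typically also report soft-clipped query bases in the CIGAR string, which the sketch omits for brevity.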

Parameters
queries : DataFrame

DataFrame storing a list of sequences to align.

ref_filepath : Optional[str] (default: None)

Filepath to the reference FASTA.

ref : Optional[str] (default: None)

Reference sequence.

gap_open_penalty (default: 20)

Gap open penalty.

gap_extend_penalty (default: 1)

Gap extension penalty.

n_threads : int (default: 1)

Number of threads to use.

Return type

DataFrame

Returns

A DataFrame mapping each sequence name to the CIGAR string, quality, and original query sequence.
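Because the CIGAR string encodes where indels fall, a short sketch of decoding one may help when working with the returned DataFrame. The helper name indels_from_cigar and the ref_start argument are hypothetical and not part of cassiopeia; the operation semantics follow the standard SAM CIGAR conventions (M, =, and X consume both sequences, I and S consume only the query, D and N consume only the reference).

```python
import re

# One CIGAR token is a run length followed by a single operation code.
CIGAR_TOKEN = re.compile(r"(\d+)([MIDNSHP=X])")


def indels_from_cigar(cigar, ref_start=0):
    """Return (op, ref_pos, query_pos, length) for each insertion or deletion.

    Hypothetical helper; positions are 0-based offsets at which the run begins.
    """
    indels = []
    ref_pos, query_pos = ref_start, 0
    for length, op in CIGAR_TOKEN.findall(cigar):
        length = int(length)
        if op in "ID":
            indels.append((op, ref_pos, query_pos, length))
        if op in "MDN=X":  # operations that consume reference bases
            ref_pos += length
        if op in "MIS=X":  # operations that consume query bases
            query_pos += length
    return indels


print(indels_from_cigar("10M2D10M"))
```

For the CIGAR string above, the single deletion begins at reference offset 10 and spans two reference bases.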