The alignment module contains concepts, algorithms and classes that are related to the computation of pairwise and multiple sequence alignments.
Modules
Aligned Sequence | Provides seqan3::aligned_sequence, as well as various ranges that model it. |
Configuration | Provides configuration elements for the pairwise alignment configuration. |
Decorator | The decorator submodule contains special SeqAn decorators. |
Matrix | Provides data structures for representing alignment coordinates and alignments as a matrix. |
Pairwise | Provides the algorithmic components for the computation of pairwise alignments. |
Scoring | Provides the data structures used for scoring alphabets and sequences. |
Functions
template<typename reference_type, typename sequence_type>
auto seqan3::alignment_from_cigar(std::vector<cigar> const & cigar_vector, reference_type const & reference, uint32_t const reference_start_position, sequence_type const & sequence)
Construct an alignment from a cigar string and the corresponding sequences.
template<typename alignment_type>
auto seqan3::cigar_from_alignment(alignment_type const & alignment, uint32_t const soft_clipping_at_the_beginning, uint32_t const soft_clipping_at_the_end, bool const extended_cigar = false)
Creates a cigar string (SAM format) given a seqan3::detail::pairwise_alignment represented by two seqan3::aligned_sequence's.
There are several types of alignments. We support pairwise alignments so far, but we also plan to support multiple alignments in the future.
SeqAn offers a generic multi-purpose alignment library comprising all widely known alignment algorithms as well as many special algorithms. These algorithms are all accessible through an easy-to-use alignment interface, which is described in the Pairwise submodule.
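template<typename reference_type, typename sequence_type>
auto seqan3::alignment_from_cigar(std::vector<cigar> const & cigar_vector, reference_type const & reference, uint32_t const reference_start_position, sequence_type const & sequence) | inline |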
Construct an alignment from a cigar string and the corresponding sequences.
Template parameters
reference_type | The type of the reference sequence for a SAM record. |
sequence_type | The type of the read sequence for a SAM record. |
Parameters
[in] | cigar_vector | The cigar information to convert to an alignment. |
[in] | reference | The reference sequence for a SAM record, e.g. chr1. |
[in] | reference_start_position | The start position of the alignment in the reference sequence. |
[in] | sequence | The read sequence for a SAM record. |
Returns a std::tuple of size 2 holding two seqan3::gap_decorators. At position 0 is the aligned reference sequence and at position 1 the aligned read sequence.
The CIGAR string is a compact representation of an aligned read against a reference and was introduced by the SAM format. The SAM format stores the result of mapping short/long read sequences from a sequencing experiment (e.g. Illumina/Nanopore) against a reference (e.g. hg38).
You can reconstruct a full alignment from a CIGAR string, if you have the respective sequences at hand:
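As a rough illustration of what this reconstruction does, the following sketch expands a CIGAR vector into two gapped strings using only the C++ standard library. It is not SeqAn3's implementation: cigar_op and expand_cigar are hypothetical names, plain std::string stands in for the alphabet ranges, N is treated like D, and H and P are simply ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one CIGAR element; this is not seqan3::cigar.
struct cigar_op
{
    std::uint32_t count;
    char op; // one of M, =, X, I, D, N, S, H, P
};

// Expands a CIGAR vector into {aligned reference, aligned read}, using '-' for gaps.
// Only the part of the reference that is covered by the alignment is returned.
std::pair<std::string, std::string> expand_cigar(std::vector<cigar_op> const & cigar,
                                                 std::string const & reference,
                                                 std::uint32_t const reference_start_position,
                                                 std::string const & read)
{
    std::string aligned_reference;
    std::string aligned_read;
    std::size_t ref_pos = reference_start_position;
    std::size_t read_pos = 0;

    for (cigar_op const & element : cigar)
    {
        std::size_t const count = element.count;
        switch (element.op)
        {
        case 'M': case '=': case 'X': // consumes both sequences
            assert(ref_pos + count <= reference.size() && read_pos + count <= read.size());
            aligned_reference.append(reference, ref_pos, count);
            aligned_read.append(read, read_pos, count);
            ref_pos += count;
            read_pos += count;
            break;
        case 'D': case 'N': // consumes the reference only: gap in the read
            assert(ref_pos + count <= reference.size());
            aligned_reference.append(reference, ref_pos, count);
            aligned_read.append(count, '-');
            ref_pos += count;
            break;
        case 'I': // consumes the read only: gap in the reference
            assert(read_pos + count <= read.size());
            aligned_reference.append(count, '-');
            aligned_read.append(read, read_pos, count);
            read_pos += count;
            break;
        case 'S': // soft clipping: read positions that are not part of the alignment
            read_pos += count;
            break;
        default: // H and P consume neither sequence and are ignored in this sketch
            break;
        }
    }
    return {aligned_reference, aligned_read};
}
```

Called with the CIGAR 2M1D2M, start position 0, the reference ACTGATCGAGAGGATCTAGAGGAGATCGTAGGAC and the read ACGA, the sketch yields the pair (ACTGA, AC-GA).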
In SeqAn3, an alignment is represented by a std::tuple of size 2 that holds two sequences modelling seqan3::aligned_sequence.
The data structure that we use most often to model seqan3::aligned_sequence is the seqan3::gap_decorator. It is a lightweight data structure that only holds a view on the sequence (no copy is made) and can additionally hold seqan3::gap symbols.
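The idea of a non-owning view with separately stored gaps can be sketched with the standard library alone. The class below is a hypothetical illustration: gapped_view_sketch is not a SeqAn3 type and makes no claim about how seqan3::gap_decorator is implemented. It keeps a pointer to the ungapped sequence and a map from ungapped positions to gap counts.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical illustration, not seqan3::gap_decorator.
class gapped_view_sketch
{
public:
    // Stores only a pointer; the sequence must outlive this object.
    explicit gapped_view_sketch(std::string const & sequence) : sequence_{&sequence}
    {}

    // Records `count` gap symbols in front of the ungapped position `position`.
    void insert_gap(std::size_t const position, std::size_t const count)
    {
        assert(position <= sequence_->size());
        gaps_[position] += count;
    }

    // Materialises the gapped sequence as a string, using '-' for gaps.
    std::string to_string() const
    {
        std::string result;
        for (std::size_t i = 0; i <= sequence_->size(); ++i)
        {
            auto const it = gaps_.find(i);
            if (it != gaps_.end())
                result.append(it->second, '-');
            if (i < sequence_->size())
                result.push_back((*sequence_)[i]);
        }
        return result;
    }

private:
    std::string const * sequence_;            // non-owning, no copy of the sequence is made
    std::map<std::size_t, std::size_t> gaps_; // ungapped position -> number of gaps before it
};
```

For the read ACGA, inserting one gap before position 2 makes to_string() return AC-GA, while the underlying string itself stays unchanged.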
In the above example, the read sequence ACGA is aligned to the reference with one gap, indicated by 1D in the CIGAR string: AC-GA, where - represents a gap.
The full alignment consists of two aligned sequences (read and reference). In the above example, the alignment is represented by a tuple with the aligned reference at the first position and the aligned read at the second: (ACTGA,AC-GA).
A more realistic example is that you get the information directly from a SAM file:
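To illustrate which SAM columns carry this information, the following sketch pulls POS, CIGAR and SEQ out of a single tab-separated SAM record using the standard library only. It does not use SeqAn3's SAM file I/O: sam_alignment_fields, parse_cigar and parse_sam_record are hypothetical names, and the sketch assumes a mapped read with POS >= 1 and a CIGAR other than *.

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record type for this sketch; not a SeqAn3 type.
struct sam_alignment_fields
{
    std::uint32_t reference_start_position;            // 0-based (SAM stores POS 1-based)
    std::vector<std::pair<std::uint32_t, char>> cigar; // (count, operation)
    std::string sequence;                              // SEQ column
};

// Parses a CIGAR string such as "2M1D2M" into (count, operation) pairs.
std::vector<std::pair<std::uint32_t, char>> parse_cigar(std::string const & text)
{
    std::vector<std::pair<std::uint32_t, char>> result;
    std::uint32_t count = 0;
    for (char const c : text)
    {
        if (std::isdigit(static_cast<unsigned char>(c)))
        {
            count = count * 10 + static_cast<std::uint32_t>(c - '0');
        }
        else
        {
            result.emplace_back(count, c);
            count = 0;
        }
    }
    return result;
}

// Extracts POS (column 4), CIGAR (column 6) and SEQ (column 10) from one SAM record.
sam_alignment_fields parse_sam_record(std::string const & line)
{
    std::vector<std::string> columns;
    std::istringstream stream{line};
    for (std::string column; std::getline(stream, column, '\t');)
        columns.push_back(column);
    assert(columns.size() >= 11); // the SAM format has 11 mandatory columns

    sam_alignment_fields fields;
    // Assumption: the read is mapped, so POS >= 1 and the subtraction cannot underflow.
    fields.reference_start_position = static_cast<std::uint32_t>(std::stoul(columns[3])) - 1;
    fields.cigar = parse_cigar(columns[5]);
    fields.sequence = columns[9];
    return fields;
}
```

The three extracted fields correspond to the reference_start_position, cigar_vector and sequence arguments described above; the reference sequence itself is not stored in the record and has to come from elsewhere, identified by the RNAME column.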
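template<typename alignment_type>
auto seqan3::cigar_from_alignment(alignment_type const & alignment, uint32_t const soft_clipping_at_the_beginning, uint32_t const soft_clipping_at_the_end, bool const extended_cigar = false) | inline |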
Creates a cigar string (SAM format) given a seqan3::detail::pairwise_alignment represented by two seqan3::aligned_sequence's.
Template parameters
alignment_type | Must model seqan3::detail::pairwise_alignment. |
Parameters
alignment | The alignment, represented by a pair of aligned sequences, to be transformed into a cigar vector based on the second (query) sequence. |
soft_clipping_at_the_beginning | The number of positions at the beginning of the second (read/query) sequence that are soft-clipped, i.e. not part of the alignment. |
soft_clipping_at_the_end | The number of positions at the end of the second (read/query) sequence that are soft-clipped, i.e. not part of the alignment. |
extended_cigar | Whether to use the extended cigar alphabet (= and X instead of M). See cigar operation. |
alignment pair.
Given the following alignment, with the reference sequence on top and the query sequence at the bottom:
In this case, the function seqan3::detail::get_cigar_vector will return the following cigar vector:
[('M',4),('I',2),('M',5),('D',2),('M',1)].
The extended cigar string would look like this:
[('=',3),('X',1),('I',2),('=',3),('X',1),('=',1),('D',2),('=',1)].
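The conversion can be sketched in standard C++ as follows. This is a simplified illustration rather than SeqAn3's code: cigar_from_gapped_strings is a hypothetical helper that works on plain strings with - as the gap symbol and returns (operation, count) pairs instead of seqan3::cigar elements. It walks over the alignment columns, merges equal neighbouring operations, and treats the two soft clipping arguments as counts.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper, not part of SeqAn3.
std::vector<std::pair<char, std::uint32_t>>
cigar_from_gapped_strings(std::string const & aligned_reference,
                          std::string const & aligned_query,
                          std::uint32_t const soft_clipping_at_the_beginning,
                          std::uint32_t const soft_clipping_at_the_end,
                          bool const extended_cigar = false)
{
    assert(aligned_reference.size() == aligned_query.size());
    std::vector<std::pair<char, std::uint32_t>> result;

    // Appends an operation, merging it into the previous element if the operation is the same.
    auto push = [&result](char const op, std::uint32_t const count)
    {
        if (count == 0)
            return;
        if (!result.empty() && result.back().first == op)
            result.back().second += count;
        else
            result.emplace_back(op, count);
    };

    push('S', soft_clipping_at_the_beginning);
    for (std::size_t i = 0; i < aligned_reference.size(); ++i)
    {
        char const r = aligned_reference[i];
        char const q = aligned_query[i];
        if (r == '-' && q == '-')
            continue; // a column of two gaps is skipped in this sketch
        if (r == '-')
            push('I', 1); // gap in the reference: insertion
        else if (q == '-')
            push('D', 1); // gap in the query: deletion
        else if (!extended_cigar)
            push('M', 1); // match or mismatch
        else
            push(r == q ? '=' : 'X', 1);
    }
    push('S', soft_clipping_at_the_end);
    return result;
}
```

For the illustrative alignment ATGG--CGTAGAGC (reference) over ATGCCCCGTTG--C (query), chosen here because it is consistent with both cigar vectors quoted above and not necessarily the alignment from the original listing, the sketch produces 4M 2I 5M 2D 1M in the basic form and 3= 1X 2I 3= 1X 1= 2D 1= in the extended form.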