Alignment compression
Pairwise alignments between large sets of sequences can take up excessive amounts of storage space. However, if the sequences themselves are already stored elsewhere, the whole alignment does not need to be saved. It is enough to record where each ungapped segment of a sequence starts and ends, both in the sequence and in the alignment. Here is an alignment involving two sequences:
SEQ A: RQEEEEQARIAAERKQQEEEEARQAAEKKQQEEE--RQEAE
SEQ B: RREREEQLK---NRKIEEKLMREQTSQQLSQSTQAARIEGA
Here is the compressed version. For each ungapped segment of a sequence that appears in the alignment, it stores one entry in the format [sequence start],[alignment start]:[sequence segment length]; with positions counted from 1:
SEQ A: 1,1:34;35,37:5
SEQ B: 1,1:9;10,13:29
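To make the format concrete, the following minimal sketch parses a compressed string and reports the gap runs that lie between consecutive segments. The helper names `parse_segments` and `gaps_between` are hypothetical and chosen only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (illustration only): parse a compressed string into
# a list of [sequence start, alignment start, length] triples.
sub parse_segments {
    my ($compressed) = @_;
    my @segments;
    foreach my $field ( split /;/, $compressed ) {
        # Skip anything that does not look like "start,start:length"
        next unless $field =~ /^(\d+),(\d+):(\d+)$/;
        push @segments, [ $1, $2, $3 ];
    }
    return @segments;
}

# Alignment columns not covered by any segment are gaps. This returns the
# gap runs between consecutive segments as [first column, length] pairs.
sub gaps_between {
    my @segments = @_;
    my @gaps;
    my $next_free;    # first alignment column after the previous segment
    foreach my $seg (@segments) {
        my ( undef, $align_start, $length ) = @$seg;
        if ( defined $next_free && $align_start > $next_free ) {
            push @gaps, [ $next_free, $align_start - $next_free ];
        }
        $next_free = $align_start + $length;
    }
    return @gaps;
}

my @seq_a  = parse_segments("1,1:34;35,37:5");
my @gaps_a = gaps_between(@seq_a);
printf "SEQ A gap: columns %d-%d\n", $_->[0], $_->[0] + $_->[1] - 1 for @gaps_a;
# prints "SEQ A gap: columns 35-36"
```

For SEQ A, the first segment covers alignment columns 1 to 34 and the second starts at column 37, so the two columns in between must be gaps. This matches the "--" in the alignment shown earlier.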
Implementation in Perl
# Takes a sequence string and a string of sections made by
# align_compress, and recreates that sequence as it appears in the alignment.
# Gaps before the first section and after the last one are not restored.
sub align_construct {
    my ( $sequence, $sections ) = @_;
    my @sections = split( ";", $sections );
    my $alignment = "";
    my $last_end = 0;    # alignment column just past the previous section
    foreach my $segment ( @sections ) {
        # Each section is "sequence start,alignment start:length";
        # skip anything that does not match
        $segment =~ /(\d+),(\d+):(\d+)/ or next;
        my $seq_start = $1;
        my $align_start = $2;
        my $length = $3;
        if( $last_end ) {
            # Fill the columns between two sections with gap characters
            $alignment .= "-" x ($align_start - $last_end);
        }
        $alignment .= substr( $sequence, $seq_start-1, $length );
        $last_end = $align_start + $length;
    }
    return $alignment;
}

# Takes a sequence in alignment form and uses the start and end
# (1-based alignment columns) as a reference of where to start
# and end the compression
sub align_compress {
    my ( $alignment, $start, $end ) = @_;
    my @sections;
    my $position = 0;         # residues seen so far (position in the sequence)
    my $segment = 0;          # length of the current ungapped segment
    my $segment_start = 0;    # sequence position where the segment starts
    my $align_start = 0;      # alignment column where the segment starts
    for my $location ( 1 .. length( $alignment ) ) {
        my $aa = substr( $alignment, $location-1, 1 );
        $position++ if( $aa ne "-" );
        next if( $location < $start || $location > $end );
        if( $aa eq "-" ) {
            # A gap closes the current segment, if there is one
            if( $segment ) {
                push( @sections, "$segment_start,$align_start:$segment" );
                $segment = 0;
                $segment_start = 0;
                $align_start = 0;
            }
        }
        else {
            $segment++;
            if( $segment_start == 0 ) {
                $align_start = $location;
                $segment_start = $position;
            }
        }
    }
    # Flush the last segment if the window did not end on a gap
    if( $segment ) {
        push( @sections, "$segment_start,$align_start:$segment" );
    }
    return join(";", @sections );
}
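For the common case where the whole alignment row is compressed (no start and end window), the same section string can also be produced by matching runs of non-gap characters with a regular expression. The sketch below is an alternative written for illustration, not the implementation above, and the name `compress_whole` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical alternative to align_compress for the whole-alignment case
# (no start/end window): find each run of non-gap characters with a regex.
sub compress_whole {
    my ($alignment) = @_;
    my @sections;
    my $residues_seen = 0;    # residues consumed so far in the ungapped sequence
    while ( $alignment =~ /([^-]+)/g ) {
        my $length = length $1;
        # pos() is the 0-based offset just past the match, so this is the
        # 1-based alignment column of the first character of the run
        my $align_start = pos($alignment) - $length + 1;
        push @sections,
            sprintf( "%d,%d:%d", $residues_seen + 1, $align_start, $length );
        $residues_seen += $length;
    }
    return join ";", @sections;
}

print compress_whole("RQEEEEQARIAAERKQQEEEEARQAAEKKQQEEE--RQEAE"), "\n";
# prints "1,1:34;35,37:5"
```

The output matches the compressed form of SEQ A given earlier. The loop-based align_compress remains the better fit when only part of the alignment should be compressed, since it tracks the sequence position of residues that fall outside the window.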