Needleman-Wunsch Grid Generation in Python
The Needleman-Wunsch algorithm is a way to align sequences in a way that optimizes "similarity". Usually, a grid is generated and then you follow a path down the grid (based off the largest value) to compute the optimal alignment between two sequences. I have created a Python program, that given two strings, will create the resulting matrix for the Needleman-Wunsch algorithm. Currently, the program when ran will generate two random sequences of DNA and then print out the resulting Needleman-Wunsch matrix.
needlemanwunsch.py
import random
from tabulate import tabulate
class NeedlemanWunsch:
"""
Class used for generating Needleman-Wunsch matrices.
"""
def _compute_block(self, result, i, j):
"""
Given a block (corresponding to a 2 x 2 matrix), calculate the value o-
f the bottom right corner.
(Based on the equation:
M_{i,j} = max(M_{i-1,j-1} + S_{i,j},
M_{i,j-1} + W,
M_{i-1,j} + W)
Found here: https://vlab.amrita.edu/?sub=3&brch=274&sim=1431&cnt=1)
Args:
result : The current matrix that is being computed.
i : The right most part of the block being computed.
j : The bottom most part of the block being computed.
Returns:
The value for the right bottom corner of a particular block.
"""
return max(result[i-1][j-1] +
self._calc_weight(self._second_seq[i-1],
self._first_seq[j-1]),
result[i-1][j] + self.gap,
result[i][j-1] + self.gap)
def _calc_weight(self, first_char, second_char):
"""
Helper function, given two characters determines (based on the sc-
oring scheme) what the score for the particular characters can be.
Args:
first_char : A character to compare.
second_char : A character to compare.
Returns:
Either self.match or self.mismatch.
"""
if first_char == second_char:
return self.match
else:
return self.mismatch
def generate(self, first_seq, second_seq):
"""
Generates a matrix corresponding to the scores to the Needleman-Wu-
nsch algorithm.
Args:
first_seq : One of the sequences to be compared for similarity.
second_seq : One of the sequences to be compared for
similarity.
Returns:
A 2D list corresponding to the resulting matrix of the Needlem-
an-Wunsch algorithm.
"""
# Internally requies that the first sequence is longer.
if len(second_seq) > len(first_seq):
first_seq, second_seq = second_seq, first_seq
self._first_seq = first_seq
self._second_seq = second_seq
# Adjust sequence with "intial space"
# Initialize the resulting matrix with the initial row.
result = [list(range(0, -len(first_seq) - 1, -1))]
# Create initial columns.
for i in range(-1, -len(second_seq) - 1, -1):
row = [i]
row.extend([0]*len(first_seq))
result.append(row)
# Sweep through and compute each new cell row-wise.
for i in range(1, len(result)):
for j in range(1, len(result[0])):
result[i][j] = self._compute_block(result, i, j)
# Format for prettier printing.
for index, letter in enumerate(second_seq):
result[index + 1].insert(0, letter)
result[0].insert(0, ' ')
result.insert(0, list(" " + first_seq))
return result
def __init__(self, match=1, mismatch=-1, gap=-1):
"""
Initialize the Needleman-Wunsch class so that it provides weights for
match (default 1), mismatch (default -1), and gap (default -1).
"""
self.match = match
self.mismatch = mismatch
self.gap = gap
self._first_seq = ""
self._second_seq = ""
def deletion(seq, pos):
"""
Deletes a random base pair from a sequence at a specified position.
Args:
seq : Sequence to perform deletion on.
pos : Location of deletion.
Returns:
seq with character removed at pos.
"""
return seq[:pos] + seq[pos:]
def base_change(seq, pos):
"""
Changes a random base pair to another base pair at a specified position.
Args:
seq : Sequence to perform base change on.
pos : Locaion of base change.
Returns:
seq with character changed at pos.
"""
new_base = random.choice("ACTG".replace(seq[pos], ""))
return seq[:pos] + new_base + seq[pos:]
def mutate(seq, rounds=3):
"""
Mutates a piece of DNA by randomly applying a deletion or base change
Args:
seq : The sequence to be mutated.
rounds : Defaults to 3, the number of mutations to be made.
Returns:
A mutated sequence.
"""
mutations = (deletion, base_change)
for _ in range(rounds):
pos = random.randrange(len(seq))
seq = random.choice(mutations)(seq, pos)
return seq
def main():
"""
Creates a random couple of strings and creates the corresponding Needleman
-Wunsch matrix associated with them.
"""
needleman_wunsch = NeedlemanWunsch()
first_seq = ''.join(random.choices("ACTG", k=5))
second_seq = mutate(first_seq)
data = needleman_wunsch.generate(first_seq, second_seq)
print(tabulate(data, headers="firstrow"))
if __name__ == '__main__':
main()
I ended up using a NeedlemanWunsch
class, because using only function resulted in a lot DRY for the parameters match
, mismatch
, and gap
.
I am not particularly fond of the code. I haven't used numpy
or any related libraries because I couldn't see any way that it would significantly shorten the code, however, I would be willing to use numpy
if there is a significantly shorter way of expressing the matrix generation. However, the code seems frail, and very prone to off by one errors.
python algorithm python-3.x bioinformatics
add a comment |
The Needleman-Wunsch algorithm is a way to align sequences in a way that optimizes "similarity". Usually, a grid is generated and then you follow a path down the grid (based off the largest value) to compute the optimal alignment between two sequences. I have created a Python program, that given two strings, will create the resulting matrix for the Needleman-Wunsch algorithm. Currently, the program when ran will generate two random sequences of DNA and then print out the resulting Needleman-Wunsch matrix.
needlemanwunsch.py
import random
from tabulate import tabulate
class NeedlemanWunsch:
"""
Class used for generating Needleman-Wunsch matrices.
"""
def _compute_block(self, result, i, j):
"""
Given a block (corresponding to a 2 x 2 matrix), calculate the value o-
f the bottom right corner.
(Based on the equation:
M_{i,j} = max(M_{i-1,j-1} + S_{i,j},
M_{i,j-1} + W,
M_{i-1,j} + W)
Found here: https://vlab.amrita.edu/?sub=3&brch=274&sim=1431&cnt=1)
Args:
result : The current matrix that is being computed.
i : The right most part of the block being computed.
j : The bottom most part of the block being computed.
Returns:
The value for the right bottom corner of a particular block.
"""
return max(result[i-1][j-1] +
self._calc_weight(self._second_seq[i-1],
self._first_seq[j-1]),
result[i-1][j] + self.gap,
result[i][j-1] + self.gap)
def _calc_weight(self, first_char, second_char):
"""
Helper function, given two characters determines (based on the sc-
oring scheme) what the score for the particular characters can be.
Args:
first_char : A character to compare.
second_char : A character to compare.
Returns:
Either self.match or self.mismatch.
"""
if first_char == second_char:
return self.match
else:
return self.mismatch
def generate(self, first_seq, second_seq):
"""
Generates a matrix corresponding to the scores to the Needleman-Wu-
nsch algorithm.
Args:
first_seq : One of the sequences to be compared for similarity.
second_seq : One of the sequences to be compared for
similarity.
Returns:
A 2D list corresponding to the resulting matrix of the Needlem-
an-Wunsch algorithm.
"""
# Internally requies that the first sequence is longer.
if len(second_seq) > len(first_seq):
first_seq, second_seq = second_seq, first_seq
self._first_seq = first_seq
self._second_seq = second_seq
# Adjust sequence with "intial space"
# Initialize the resulting matrix with the initial row.
result = [list(range(0, -len(first_seq) - 1, -1))]
# Create initial columns.
for i in range(-1, -len(second_seq) - 1, -1):
row = [i]
row.extend([0]*len(first_seq))
result.append(row)
# Sweep through and compute each new cell row-wise.
for i in range(1, len(result)):
for j in range(1, len(result[0])):
result[i][j] = self._compute_block(result, i, j)
# Format for prettier printing.
for index, letter in enumerate(second_seq):
result[index + 1].insert(0, letter)
result[0].insert(0, ' ')
result.insert(0, list(" " + first_seq))
return result
def __init__(self, match=1, mismatch=-1, gap=-1):
"""
Initialize the Needleman-Wunsch class so that it provides weights for
match (default 1), mismatch (default -1), and gap (default -1).
"""
self.match = match
self.mismatch = mismatch
self.gap = gap
self._first_seq = ""
self._second_seq = ""
def deletion(seq, pos):
"""
Deletes a random base pair from a sequence at a specified position.
Args:
seq : Sequence to perform deletion on.
pos : Location of deletion.
Returns:
seq with character removed at pos.
"""
return seq[:pos] + seq[pos:]
def base_change(seq, pos):
"""
Changes a random base pair to another base pair at a specified position.
Args:
seq : Sequence to perform base change on.
pos : Locaion of base change.
Returns:
seq with character changed at pos.
"""
new_base = random.choice("ACTG".replace(seq[pos], ""))
return seq[:pos] + new_base + seq[pos:]
def mutate(seq, rounds=3):
"""
Mutates a piece of DNA by randomly applying a deletion or base change
Args:
seq : The sequence to be mutated.
rounds : Defaults to 3, the number of mutations to be made.
Returns:
A mutated sequence.
"""
mutations = (deletion, base_change)
for _ in range(rounds):
pos = random.randrange(len(seq))
seq = random.choice(mutations)(seq, pos)
return seq
def main():
"""
Creates a random couple of strings and creates the corresponding Needleman
-Wunsch matrix associated with them.
"""
needleman_wunsch = NeedlemanWunsch()
first_seq = ''.join(random.choices("ACTG", k=5))
second_seq = mutate(first_seq)
data = needleman_wunsch.generate(first_seq, second_seq)
print(tabulate(data, headers="firstrow"))
if __name__ == '__main__':
main()
I ended up using a NeedlemanWunsch
class, because using only function resulted in a lot DRY for the parameters match
, mismatch
, and gap
.
I am not particularly fond of the code. I haven't used numpy
or any related libraries because I couldn't see any way that it would significantly shorten the code, however, I would be willing to use numpy
if there is a significantly shorter way of expressing the matrix generation. However, the code seems frail, and very prone to off by one errors.
python algorithm python-3.x bioinformatics
add a comment |
The Needleman-Wunsch algorithm is a way to align sequences in a way that optimizes "similarity". Usually, a grid is generated and then you follow a path down the grid (based off the largest value) to compute the optimal alignment between two sequences. I have created a Python program, that given two strings, will create the resulting matrix for the Needleman-Wunsch algorithm. Currently, the program when ran will generate two random sequences of DNA and then print out the resulting Needleman-Wunsch matrix.
needlemanwunsch.py
import random
from tabulate import tabulate
class NeedlemanWunsch:
"""
Class used for generating Needleman-Wunsch matrices.
"""
def _compute_block(self, result, i, j):
"""
Given a block (corresponding to a 2 x 2 matrix), calculate the value o-
f the bottom right corner.
(Based on the equation:
M_{i,j} = max(M_{i-1,j-1} + S_{i,j},
M_{i,j-1} + W,
M_{i-1,j} + W)
Found here: https://vlab.amrita.edu/?sub=3&brch=274&sim=1431&cnt=1)
Args:
result : The current matrix that is being computed.
i : The right most part of the block being computed.
j : The bottom most part of the block being computed.
Returns:
The value for the right bottom corner of a particular block.
"""
return max(result[i-1][j-1] +
self._calc_weight(self._second_seq[i-1],
self._first_seq[j-1]),
result[i-1][j] + self.gap,
result[i][j-1] + self.gap)
def _calc_weight(self, first_char, second_char):
"""
Helper function, given two characters determines (based on the sc-
oring scheme) what the score for the particular characters can be.
Args:
first_char : A character to compare.
second_char : A character to compare.
Returns:
Either self.match or self.mismatch.
"""
if first_char == second_char:
return self.match
else:
return self.mismatch
def generate(self, first_seq, second_seq):
"""
Generates a matrix corresponding to the scores to the Needleman-Wu-
nsch algorithm.
Args:
first_seq : One of the sequences to be compared for similarity.
second_seq : One of the sequences to be compared for
similarity.
Returns:
A 2D list corresponding to the resulting matrix of the Needlem-
an-Wunsch algorithm.
"""
# Internally requies that the first sequence is longer.
if len(second_seq) > len(first_seq):
first_seq, second_seq = second_seq, first_seq
self._first_seq = first_seq
self._second_seq = second_seq
# Adjust sequence with "intial space"
# Initialize the resulting matrix with the initial row.
result = [list(range(0, -len(first_seq) - 1, -1))]
# Create initial columns.
for i in range(-1, -len(second_seq) - 1, -1):
row = [i]
row.extend([0]*len(first_seq))
result.append(row)
# Sweep through and compute each new cell row-wise.
for i in range(1, len(result)):
for j in range(1, len(result[0])):
result[i][j] = self._compute_block(result, i, j)
# Format for prettier printing.
for index, letter in enumerate(second_seq):
result[index + 1].insert(0, letter)
result[0].insert(0, ' ')
result.insert(0, list(" " + first_seq))
return result
def __init__(self, match=1, mismatch=-1, gap=-1):
"""
Initialize the Needleman-Wunsch class so that it provides weights for
match (default 1), mismatch (default -1), and gap (default -1).
"""
self.match = match
self.mismatch = mismatch
self.gap = gap
self._first_seq = ""
self._second_seq = ""
def deletion(seq, pos):
"""
Deletes a random base pair from a sequence at a specified position.
Args:
seq : Sequence to perform deletion on.
pos : Location of deletion.
Returns:
seq with character removed at pos.
"""
return seq[:pos] + seq[pos:]
def base_change(seq, pos):
"""
Changes a random base pair to another base pair at a specified position.
Args:
seq : Sequence to perform base change on.
pos : Locaion of base change.
Returns:
seq with character changed at pos.
"""
new_base = random.choice("ACTG".replace(seq[pos], ""))
return seq[:pos] + new_base + seq[pos:]
def mutate(seq, rounds=3):
"""
Mutates a piece of DNA by randomly applying a deletion or base change
Args:
seq : The sequence to be mutated.
rounds : Defaults to 3, the number of mutations to be made.
Returns:
A mutated sequence.
"""
mutations = (deletion, base_change)
for _ in range(rounds):
pos = random.randrange(len(seq))
seq = random.choice(mutations)(seq, pos)
return seq
def main():
"""
Creates a random couple of strings and creates the corresponding Needleman
-Wunsch matrix associated with them.
"""
needleman_wunsch = NeedlemanWunsch()
first_seq = ''.join(random.choices("ACTG", k=5))
second_seq = mutate(first_seq)
data = needleman_wunsch.generate(first_seq, second_seq)
print(tabulate(data, headers="firstrow"))
if __name__ == '__main__':
main()
I ended up using a NeedlemanWunsch
class, because using only function resulted in a lot DRY for the parameters match
, mismatch
, and gap
.
I am not particularly fond of the code. I haven't used numpy
or any related libraries because I couldn't see any way that it would significantly shorten the code, however, I would be willing to use numpy
if there is a significantly shorter way of expressing the matrix generation. However, the code seems frail, and very prone to off by one errors.
python algorithm python-3.x bioinformatics
The Needleman-Wunsch algorithm is a way to align sequences in a way that optimizes "similarity". Usually, a grid is generated and then you follow a path down the grid (based off the largest value) to compute the optimal alignment between two sequences. I have created a Python program, that given two strings, will create the resulting matrix for the Needleman-Wunsch algorithm. Currently, the program when ran will generate two random sequences of DNA and then print out the resulting Needleman-Wunsch matrix.
needlemanwunsch.py
import random
from tabulate import tabulate
class NeedlemanWunsch:
"""
Class used for generating Needleman-Wunsch matrices.
"""
def _compute_block(self, result, i, j):
"""
Given a block (corresponding to a 2 x 2 matrix), calculate the value o-
f the bottom right corner.
(Based on the equation:
M_{i,j} = max(M_{i-1,j-1} + S_{i,j},
M_{i,j-1} + W,
M_{i-1,j} + W)
Found here: https://vlab.amrita.edu/?sub=3&brch=274&sim=1431&cnt=1)
Args:
result : The current matrix that is being computed.
i : The right most part of the block being computed.
j : The bottom most part of the block being computed.
Returns:
The value for the right bottom corner of a particular block.
"""
return max(result[i-1][j-1] +
self._calc_weight(self._second_seq[i-1],
self._first_seq[j-1]),
result[i-1][j] + self.gap,
result[i][j-1] + self.gap)
def _calc_weight(self, first_char, second_char):
"""
Helper function, given two characters determines (based on the sc-
oring scheme) what the score for the particular characters can be.
Args:
first_char : A character to compare.
second_char : A character to compare.
Returns:
Either self.match or self.mismatch.
"""
if first_char == second_char:
return self.match
else:
return self.mismatch
def generate(self, first_seq, second_seq):
"""
Generates a matrix corresponding to the scores to the Needleman-Wu-
nsch algorithm.
Args:
first_seq : One of the sequences to be compared for similarity.
second_seq : One of the sequences to be compared for
similarity.
Returns:
A 2D list corresponding to the resulting matrix of the Needlem-
an-Wunsch algorithm.
"""
# Internally requies that the first sequence is longer.
if len(second_seq) > len(first_seq):
first_seq, second_seq = second_seq, first_seq
self._first_seq = first_seq
self._second_seq = second_seq
# Adjust sequence with "intial space"
# Initialize the resulting matrix with the initial row.
result = [list(range(0, -len(first_seq) - 1, -1))]
# Create initial columns.
for i in range(-1, -len(second_seq) - 1, -1):
row = [i]
row.extend([0]*len(first_seq))
result.append(row)
# Sweep through and compute each new cell row-wise.
for i in range(1, len(result)):
for j in range(1, len(result[0])):
result[i][j] = self._compute_block(result, i, j)
# Format for prettier printing.
for index, letter in enumerate(second_seq):
result[index + 1].insert(0, letter)
result[0].insert(0, ' ')
result.insert(0, list(" " + first_seq))
return result
def __init__(self, match=1, mismatch=-1, gap=-1):
"""
Initialize the Needleman-Wunsch class so that it provides weights for
match (default 1), mismatch (default -1), and gap (default -1).
"""
self.match = match
self.mismatch = mismatch
self.gap = gap
self._first_seq = ""
self._second_seq = ""
def deletion(seq, pos):
"""
Deletes a random base pair from a sequence at a specified position.
Args:
seq : Sequence to perform deletion on.
pos : Location of deletion.
Returns:
seq with character removed at pos.
"""
return seq[:pos] + seq[pos:]
def base_change(seq, pos):
"""
Changes a random base pair to another base pair at a specified position.
Args:
seq : Sequence to perform base change on.
pos : Locaion of base change.
Returns:
seq with character changed at pos.
"""
new_base = random.choice("ACTG".replace(seq[pos], ""))
return seq[:pos] + new_base + seq[pos:]
def mutate(seq, rounds=3):
"""
Mutates a piece of DNA by randomly applying a deletion or base change
Args:
seq : The sequence to be mutated.
rounds : Defaults to 3, the number of mutations to be made.
Returns:
A mutated sequence.
"""
mutations = (deletion, base_change)
for _ in range(rounds):
pos = random.randrange(len(seq))
seq = random.choice(mutations)(seq, pos)
return seq
def main():
"""
Creates a random couple of strings and creates the corresponding Needleman
-Wunsch matrix associated with them.
"""
needleman_wunsch = NeedlemanWunsch()
first_seq = ''.join(random.choices("ACTG", k=5))
second_seq = mutate(first_seq)
data = needleman_wunsch.generate(first_seq, second_seq)
print(tabulate(data, headers="firstrow"))
if __name__ == '__main__':
main()
I ended up using a NeedlemanWunsch
class, because using only function resulted in a lot DRY for the parameters match
, mismatch
, and gap
.
I am not particularly fond of the code. I haven't used numpy
or any related libraries because I couldn't see any way that it would significantly shorten the code, however, I would be willing to use numpy
if there is a significantly shorter way of expressing the matrix generation. However, the code seems frail, and very prone to off by one errors.
python algorithm python-3.x bioinformatics
python algorithm python-3.x bioinformatics
asked 10 mins ago
Dair
4,417729
4,417729
add a comment |
add a comment |
active
oldest
votes
Your Answer
StackExchange.ifUsing("editor", function () {
return StackExchange.using("mathjaxEditing", function () {
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
});
});
}, "mathjax-editing");
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "196"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f210099%2fneedleman-wunsch-grid-generation-in-python%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
active
oldest
votes
active
oldest
votes
active
oldest
votes
active
oldest
votes
Thanks for contributing an answer to Code Review Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
Use MathJax to format equations. MathJax reference.
To learn more, see our tips on writing great answers.
Some of your past answers have not been well-received, and you're in danger of being blocked from answering.
Please pay close attention to the following guidance:
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f210099%2fneedleman-wunsch-grid-generation-in-python%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown