Sentence prediction (whether it is English, French or German) based on Unigram model and Bigram model [on...
Given a string, for example "I hate AI", I need to find out whether the sentence is in English, German or French. The unigram model makes its prediction based on the frequency of each character in a training text, while the bigram model makes its prediction based on which character follows another character.
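To make the two models concrete, here is a minimal sketch of how character-level unigram and bigram probabilities could be estimated from a training text. The class `CharModelSketch` and its method names are hypothetical and are not the `Unigram` or `BiGramV2` classes used later; it also assumes plain relative frequencies with no smoothing, so an unseen character or pair gets probability 0 (and `Math.log10(0.0)` is negative infinity).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not the Unigram/BiGramV2 classes from the question.
final class CharModelSketch {
    private final Map<Character, Integer> charCounts = new HashMap<>();
    private final Map<Character, Integer> firstCounts = new HashMap<>();
    private final Map<String, Integer> pairCounts = new HashMap<>();
    private int totalChars = 0;

    CharModelSketch(String trainingText) {
        for (int i = 0; i < trainingText.length(); i++) {
            char c = trainingText.charAt(i);
            charCounts.merge(c, 1, Integer::sum);
            totalChars++;
            if (i + 1 < trainingText.length()) {
                // c is the first character of the pair (c, next)
                firstCounts.merge(c, 1, Integer::sum);
                pairCounts.merge("" + c + trainingText.charAt(i + 1), 1, Integer::sum);
            }
        }
    }

    // Unigram model: relative frequency of a single character.
    double probability(char c) {
        if (totalChars == 0) {
            return 0.0;
        }
        return charCounts.getOrDefault(c, 0) / (double) totalChars;
    }

    // Bigram model: P(second | first) = count(first, second) / count(first as a pair start).
    double conditionalProbability(char first, char second) {
        int starts = firstCounts.getOrDefault(first, 0);
        if (starts == 0) {
            return 0.0;
        }
        return pairCounts.getOrDefault("" + first + second, 0) / (double) starts;
    }

    public static void main(String[] args) {
        CharModelSketch model = new CharModelSketch("the cat sat on the mat");
        System.out.println(Math.log10(model.probability('t')));
        System.out.println(Math.log10(model.conditionalProbability('t', 'h')));
    }
}
```

One such model would be built per language, and a sentence would be scored by summing the log10 probabilities of its characters (unigram) or of its adjacent character pairs (bigram).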
The following code has two methods: getBigramResult() and getUnigramResult().
Both methods take an ArrayList<Character> as a parameter and return a HashMap<Language,Double> whose keys are the languages (French, English, German) and whose values are the accumulated log10 probability of the given character list for each language. The two methods are almost the same except for:
The for loop:
    for (int j = 0; j < textCharList.size() - 1; j++) // getBigramResult()
    for (int j = 0; j < textCharList.size(); j++) // getUnigramResult()
The if condition:
    if (textCharList.get(j) != '+' && textCharList.get(j + 1) != '+') // getBigramResult()
    if (textCharList.get(j) != '+') // getUnigramResult()
The probability-calculating function:
    getConditionalProbabilty(textCharList.get(j), textCharList.get(j + 1)) // getBigramResult()
    getProbabilty(textCharList.get(j)) // getUnigramResult()
getBigramResult() works on a class called BiGramV2 and getUnigramResult() works on a class called Unigram.
The code of the methods is as follows:
    public static HashMap<Language, Double> getBigramResult(ArrayList<Character> textCharList) {
        HashMap<Language, Double> totalProbabilities = new HashMap<Language, Double>();
        // Slide over every adjacent pair of characters.
        for (int j = 0; j < textCharList.size() - 1; j++) {
            // Skip any pair that contains the '+' marker.
            if (textCharList.get(j) != '+' && textCharList.get(j + 1) != '+') {
                FileHandler.writeSentences("BIGRAM :"+textCharList.get(j)+""+textCharList.get(j + 1),false);
                // Score the pair under each language's bigram model.
                for (int k = 0; k < biGramList.size(); k++) {
                    BiGramV2 temp = biGramList.get(k);
                    double conditionalProbability = Math.log10(temp.getConditionalProbabilty(textCharList.get(j),
                            textCharList.get(j + 1)));
                    updateTotalProbabilities(totalProbabilities,temp.getLanguage(),conditionalProbability);
                    FileHandler.writeSentences(temp.getLanguage().toString()+ ": p("+textCharList.get(j+1)+"|"+textCharList.get(j) +") ="+conditionalProbability+"==> log prob of sentence so far: " +totalProbabilities.get(temp.getLanguage()),false);
                }
                FileHandler.writeSentences("",false);
            }
        }
        return totalProbabilities;
    }
    public static HashMap<Language, Double> getUnigramResult(ArrayList<Character> textCharList) {
        HashMap<Language, Double> totalProbabilities = new HashMap<Language, Double>();
        // Visit every single character.
        for (int j = 0; j < textCharList.size(); j++) {
            // Skip the '+' marker.
            if (textCharList.get(j) != '+') {
                FileHandler.writeSentences("UNIGRAM :"+textCharList.get(j),false);
                // Score the character under each language's unigram model.
                for (int k = 0; k < uniGramList.size(); k++) {
                    Unigram temp = uniGramList.get(k);
                    double conditionalProbability = Math.log10(temp.getProbabilty(textCharList.get(j)));
                    updateTotalProbabilities(totalProbabilities,temp.getLanguage(),conditionalProbability);
                    FileHandler.writeSentences(temp.getLanguage().toString()+ ": p("+textCharList.get(j)+") ="+conditionalProbability+"==> log prob of sentence so far: " +totalProbabilities.get(temp.getLanguage()),false);
                }
                FileHandler.writeSentences("",false);
            }
        }
        return totalProbabilities;
    }
The two methods above, getBigramResult() and getUnigramResult(), are very similar, and I feel this is not good design, but I have not been able to refactor them because of the different outer for-loop, the different if block and the different probability-calculating methods. Any suggestions on my code would be appreciated.
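One possible direction, given only as a hedged sketch: the three differences listed above all follow from the n-gram size (1 or 2), so a single loop could take that size as a parameter and delegate the probability lookup to a common interface. Everything below is hypothetical: `NGramScoringSketch`, `NGramModel` and `score` are illustrative names, the language type is left generic instead of using the `Language` enum, the logging calls are omitted, and it is assumed that `updateTotalProbabilities` simply adds the log probability to the running total for that language.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these names come from the original code.
final class NGramScoringSketch {
    static final char SEPARATOR = '+';

    // Assumed common view of Unigram and BiGramV2.
    interface NGramModel<L> {
        L language();

        // Probability of the n-gram that starts at index `start` in `text`.
        double probability(List<Character> text, int start);
    }

    // `order` is 1 for the unigram loop and 2 for the bigram loop.
    static <L> Map<L, Double> score(List<Character> text,
                                    List<? extends NGramModel<L>> models,
                                    int order) {
        Map<L, Double> totals = new HashMap<>();
        // j + order <= size gives j < size (order 1) or j < size - 1 (order 2).
        for (int j = 0; j + order <= text.size(); j++) {
            if (containsSeparator(text, j, order)) {
                continue;
            }
            for (NGramModel<L> model : models) {
                double logProbability = Math.log10(model.probability(text, j));
                // Assumption: totals are accumulated by addition.
                totals.merge(model.language(), logProbability, Double::sum);
            }
        }
        return totals;
    }

    // True if any character in the window [start, start + order) is the '+' marker.
    private static boolean containsSeparator(List<Character> text, int start, int order) {
        for (int i = start; i < start + order; i++) {
            if (text.get(i) == SEPARATOR) {
                return true;
            }
        }
        return false;
    }
}
```

Under this sketch, a unigram model would implement `probability` by looking up `text.get(start)`, and a bigram model by looking up the pair `text.get(start)`, `text.get(start + 1)`; the two original methods would then reduce to calls with `order` 1 and 2.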
java design-patterns
put on hold as unclear what you're asking by Jamal♦ 1 hour ago
Please clarify your specific problem or add additional details to highlight exactly what you need. As it's currently written, it’s hard to tell exactly what you're asking. See the How to Ask page for help clarifying this question. If this question can be reworded to fit the rules in the help center, please edit the question.
2
Welcome to Code Review! What task does this code accomplish? Please tell us, and also make that the title of the question via edit. Maybe you missed the placeholder on the title element: "State the task that your code accomplishes. Make your title distinctive.". Also from How to Ask: "State what your code does in your title, not your main concerns about it.".
– Sᴀᴍ Onᴇᴌᴀ
4 hours ago
@SᴀᴍOnᴇᴌᴀ Do you think the edits I made are OK?
– dividedbyzero
28 mins ago
Please update the title to express what the code does, not your concerns for the code.
– bruglesco
18 mins ago
@bruglesco Do you think it's OK now?
– dividedbyzero
11 mins ago
edited 12 mins ago
asked 5 hours ago
dividedbyzero