Sentence language prediction (English, French or German) based on unigram and bigram models [on...























Given a string, for example "I hate AI", I need to find out whether the sentence is in English, German or French. The unigram model makes its prediction from the frequency of each character in a training text, while the bigram model makes its prediction from how often one character follows another.
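A minimal sketch of the two estimates, assuming plain relative-frequency counts with no smoothing. The class name CharModelSketch, its method names and the toy training text are hypothetical; they are not the Unigram or BiGramV2 classes used in the code below.

```java
import java.util.HashMap;
import java.util.Map;

class CharModelSketch {
    // Unigram estimate: p(c) = count(c) / number of characters in the training text.
    static double unigramProbability(String training, char c) {
        Map<Character, Integer> counts = new HashMap<>();
        for (char ch : training.toCharArray()) {
            counts.merge(ch, 1, Integer::sum);
        }
        return counts.getOrDefault(c, 0) / (double) training.length();
    }

    // Bigram estimate: p(next | prev) = count(prev followed by next) / count(prev followed by anything).
    static double bigramProbability(String training, char prev, char next) {
        int pairCount = 0;
        int prevCount = 0;
        for (int i = 0; i < training.length() - 1; i++) {
            if (training.charAt(i) == prev) {
                prevCount++;
                if (training.charAt(i + 1) == next) {
                    pairCount++;
                }
            }
        }
        // Assumption: an unseen context simply yields 0 (no smoothing).
        return prevCount == 0 ? 0.0 : pairCount / (double) prevCount;
    }
}
```

With the toy training text "abac", unigramProbability("abac", 'a') is 2/4 and bigramProbability("abac", 'a', 'b') is 1/2. Without smoothing, an unseen character or pair gets probability 0, and Math.log10(0) is negative infinity, so a real model would likely need some form of smoothing.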



The following code has two methods: getBigramResult() and getUnigramResult().



Both methods take an ArrayList<Character> as a parameter and return a HashMap<Language, Double> whose keys are the languages (French, English, German) and whose values are the log probabilities accumulated for each language over the given character list. The two methods are almost the same except for:





  1. The for loop:

    for (int j = 0; j < textCharList.size() - 1; j++) // getBigramResult()

    for (int j = 0; j < textCharList.size(); j++) // getUnigramResult()

  2. The if condition:

    if (textCharList.get(j) != '+' && textCharList.get(j + 1) != '+') // getBigramResult()

    if (textCharList.get(j) != '+') // getUnigramResult()

  3. The probability-calculating method:

    getConditionalProbabilty(textCharList.get(j), textCharList.get(j + 1)) // getBigramResult()

    getProbabilty(textCharList.get(j)) // getUnigramResult()

  4. getBigramResult() works on a class called BiGramV2 and getUnigramResult() works on a class called Unigram.



The code of the methods is as follows:



    public static HashMap<Language, Double> getBigramResult(ArrayList<Character> textCharList) {
        HashMap<Language, Double> totalProbabilities = new HashMap<Language, Double>();
        for (int j = 0; j < textCharList.size() - 1; j++) {
            if (textCharList.get(j) != '+' && textCharList.get(j + 1) != '+') {
                FileHandler.writeSentences("BIGRAM :" + textCharList.get(j) + "" + textCharList.get(j + 1), false);
                for (int k = 0; k < biGramList.size(); k++) {
                    BiGramV2 temp = biGramList.get(k);
                    double conditionalProbability = Math.log10(temp.getConditionalProbabilty(textCharList.get(j),
                            textCharList.get(j + 1)));
                    updateTotalProbabilities(totalProbabilities, temp.getLanguage(), conditionalProbability);
                    FileHandler.writeSentences(temp.getLanguage().toString() + ": p(" + textCharList.get(j + 1) + "|" + textCharList.get(j) + ") =" + conditionalProbability + "==> log prob of sentence so far: " + totalProbabilities.get(temp.getLanguage()), false);
                }
                FileHandler.writeSentences("", false);
            }
        }
        return totalProbabilities;
    }

    public static HashMap<Language, Double> getUnigramResult(ArrayList<Character> textCharList) {
        HashMap<Language, Double> totalProbabilities = new HashMap<Language, Double>();
        for (int j = 0; j < textCharList.size(); j++) {
            if (textCharList.get(j) != '+') {
                FileHandler.writeSentences("UNIGRAM :" + textCharList.get(j), false);
                for (int k = 0; k < uniGramList.size(); k++) {
                    Unigram temp = uniGramList.get(k);
                    double conditionalProbability = Math.log10(temp.getProbabilty(textCharList.get(j)));
                    updateTotalProbabilities(totalProbabilities, temp.getLanguage(), conditionalProbability);
                    FileHandler.writeSentences(temp.getLanguage().toString() + ": p(" + textCharList.get(j) + ") =" + conditionalProbability + "==> log prob of sentence so far: " + totalProbabilities.get(temp.getLanguage()), false);
                }
                FileHandler.writeSentences("", false);
            }
        }
        return totalProbabilities;
    }
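The helper updateTotalProbabilities() is not shown above. A minimal sketch of what it might do, assuming it simply adds each log probability to a running total per language, together with a hypothetical mostLikely() helper that picks the language with the highest total. The Language enum here is a stand-in for the real type.

```java
import java.util.Map;

class LogProbabilitySketch {
    // Stand-in for the Language type used above (assumed to be an enum).
    enum Language { ENGLISH, FRENCH, GERMAN }

    // Assumed behaviour: add the new log probability to the running total
    // for that language, starting from the value itself when no total exists yet.
    static void updateTotalProbabilities(Map<Language, Double> totals,
                                         Language language,
                                         double logProbability) {
        totals.merge(language, logProbability, Double::sum);
    }

    // Hypothetical helper: the predicted language is the one with the
    // highest (least negative) summed log probability.
    static Language mostLikely(Map<Language, Double> totals) {
        Language best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<Language, Double> entry : totals.entrySet()) {
            if (best == null || entry.getValue() > bestScore) {
                best = entry.getKey();
                bestScore = entry.getValue();
            }
        }
        return best;
    }
}
```

Summing log10 values rather than multiplying raw probabilities avoids underflow on long sentences, since log10(a * b) = log10(a) + log10(b).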


The two methods above, getBigramResult() and getUnigramResult(), are very similar, which feels like poor design, but I have not been able to refactor them because the outer for loop, the if block and the probability-calculating methods differ. Any suggestions on my code would be appreciated.
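One possible direction, sketched under assumptions rather than offered as a definitive design: if both model classes could share an interface, the n-gram length could become a parameter, which covers the loop bound, the '+' check and the probability call at once. The CharModel interface, the getResult() signature and the Language enum below are all hypothetical, and the FileHandler logging is left out for brevity.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

class NGramRefactorSketch {
    // Stand-in for the real Language type (assumed to be an enum).
    enum Language { ENGLISH, FRENCH, GERMAN }

    // Hypothetical interface that Unigram and BiGramV2 could both implement.
    interface CharModel {
        Language getLanguage();
        // The window holds 1 character for a unigram model and 2 for a bigram model.
        double probability(List<Character> window);
    }

    // order = 1 reproduces the unigram loop, order = 2 the bigram loop.
    static Map<Language, Double> getResult(List<Character> text,
                                           List<? extends CharModel> models,
                                           int order) {
        Map<Language, Double> totals = new EnumMap<>(Language.class);
        // j + order <= size() is j < size() for order 1 and j < size() - 1 for order 2.
        for (int j = 0; j + order <= text.size(); j++) {
            List<Character> window = text.subList(j, j + order);
            if (window.contains('+')) {
                continue; // skip any window that contains the '+' character
            }
            for (CharModel model : models) {
                double logProbability = Math.log10(model.probability(window));
                totals.merge(model.getLanguage(), logProbability, Double::sum);
            }
        }
        return totals;
    }
}
```

Under this sketch, getUnigramResult(text) would become getResult(text, uniGramList, 1) and getBigramResult(text) would become getResult(text, biGramList, 2), provided each class adapts its existing probability method to the window parameter.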





















put on hold as unclear what you're asking by Jamal 1 hour ago


Please clarify your specific problem or add additional details to highlight exactly what you need. As it's currently written, it’s hard to tell exactly what you're asking. See the How to Ask page for help clarifying this question. If this question can be reworded to fit the rules in the help center, please edit the question.











  • 2




    Welcome to Code Review! What task does this code accomplish? Please tell us, and also make that the title of the question via edit. Maybe you missed the placeholder on the title element: "State the task that your code accomplishes. Make your title distinctive.". Also from How to Ask: "State what your code does in your title, not your main concerns about it.".
    – Sᴀᴍ Onᴇᴌᴀ
    4 hours ago










  • @SᴀᴍOnᴇᴌᴀ Do you think the edits I made are OK?
    – dividedbyzero
    28 mins ago










  • Please update the title to express what the code does, not your concerns about the code.
    – bruglesco
    18 mins ago










  • @bruglesco Do you think it's OK now?
    – dividedbyzero
    11 mins ago















java design-patterns














edited 12 mins ago






























asked 5 hours ago









dividedbyzero
