Sentence prediction (whether it is English, French or German) based on Unigram model and Bigram model [on...

Given a string, for example "I hate AI", I need to determine whether the sentence is in English, German or French. The unigram model makes its prediction based on the frequency of each character in a training text, while the bigram model makes its prediction based on which character follows another character.
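To make the difference between the two models concrete, here is a minimal sketch of how character and character-pair probabilities might be estimated from a training text. The class `CharModelSketch` and its method names are hypothetical and are not the `Unigram` or `BiGramV2` classes used below; the sketch also applies no smoothing, which is an assumption rather than something the question states.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model (not the Unigram/BiGramV2 classes from the question):
// it counts single characters and adjacent character pairs in a training text.
class CharModelSketch {
    private final Map<Character, Integer> charCounts = new HashMap<>();
    private final Map<Character, Integer> pairStartCounts = new HashMap<>();
    private final Map<String, Integer> pairCounts = new HashMap<>();
    private final int totalChars;

    CharModelSketch(String trainingText) {
        totalChars = trainingText.length();
        for (int i = 0; i < trainingText.length(); i++) {
            char current = trainingText.charAt(i);
            charCounts.merge(current, 1, Integer::sum);
            // Every character except the last one starts a pair.
            if (i + 1 < trainingText.length()) {
                pairStartCounts.merge(current, 1, Integer::sum);
                pairCounts.merge(trainingText.substring(i, i + 2), 1, Integer::sum);
            }
        }
    }

    // Unigram: P(c) = count(c) / total number of characters.
    double unigramProbability(char c) {
        if (totalChars == 0) {
            return 0.0;
        }
        return charCounts.getOrDefault(c, 0) / (double) totalChars;
    }

    // Bigram: P(next | prev) = count(prev next) / count(pairs starting with prev).
    double bigramProbability(char prev, char next) {
        int starts = pairStartCounts.getOrDefault(prev, 0);
        if (starts == 0) {
            return 0.0;
        }
        String pair = "" + prev + next;
        return pairCounts.getOrDefault(pair, 0) / (double) starts;
    }

    public static void main(String[] args) {
        CharModelSketch model = new CharModelSketch("abab");
        System.out.println(model.unigramProbability('a'));
        System.out.println(model.bigramProbability('a', 'b'));
    }
}
```

Without smoothing, an unseen character or pair gets probability 0, and `Math.log10(0.0)` evaluates to negative infinity, so a real model would probably need some form of smoothing.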

The following code has two methods: getBigramResult() and getUnigramResult().

Both methods take an ArrayList<Character> as a parameter and return a HashMap<Language, Double> whose keys are the languages (French, English, German) and whose values are the accumulated log probabilities of each language for the given character list. The two methods are almost the same except for:

The for loop:

for (int j = 0; j < textCharList.size() - 1; j++) // getBigramResult()

for (int j = 0; j < textCharList.size(); j++) // getUnigramResult()

The if condition:

if (textCharList.get(j) != '+' && textCharList.get(j + 1) != '+') // getBigramResult()

if (textCharList.get(j) != '+') // getUnigramResult()

The probability-calculating call:

temp.getConditionalProbabilty(textCharList.get(j), textCharList.get(j + 1)) // getBigramResult()

temp.getProbabilty(textCharList.get(j)) // getUnigramResult()

getBigramResult() works on a class called BiGramV2 and getUnigramResult() works on a class called Unigram.
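For context on how the returned map is meant to be used, a caller could pick the predicted language by taking the entry with the highest total. This is only a sketch: the `Language` enum below is a stand-in with assumed constant names, and `mostLikely` is a hypothetical helper that is not part of my code.

```java
import java.util.EnumMap;
import java.util.Map;

class BestLanguageSketch {
    // Stand-in for the Language type used by the methods; constant names are assumed.
    enum Language { ENGLISH, FRENCH, GERMAN }

    // Returns the language with the highest total log probability, or null for an empty map.
    static Language mostLikely(Map<Language, Double> totalLogProbabilities) {
        Language best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<Language, Double> entry : totalLogProbabilities.entrySet()) {
            if (best == null || entry.getValue() > bestScore) {
                best = entry.getKey();
                bestScore = entry.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<Language, Double> totals = new EnumMap<>(Language.class);
        totals.put(Language.ENGLISH, -12.3);
        totals.put(Language.FRENCH, -15.0);
        totals.put(Language.GERMAN, -14.1);
        System.out.println(mostLikely(totals));
    }
}
```

Because the values are sums of base-10 logarithms of probabilities, they are negative or zero, and the value closest to zero wins.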

The code for both methods is as follows:

public static HashMap<Language, Double> getBigramResult(ArrayList<Character> textCharList) {
    HashMap<Language, Double> totalProbabilities = new HashMap<Language, Double>();
    // Walk over every adjacent character pair (hence size() - 1).
    for (int j = 0; j < textCharList.size() - 1; j++) {
        // Skip any pair that contains '+'.
        if (textCharList.get(j) != '+' && textCharList.get(j + 1) != '+') {
            FileHandler.writeSentences("BIGRAM :" + textCharList.get(j) + "" + textCharList.get(j + 1), false);
            // Score the pair under each language's bigram model.
            for (int k = 0; k < biGramList.size(); k++) {
                BiGramV2 temp = biGramList.get(k);
                // log10 of p(next | current) for this language.
                double conditionalProbability = Math.log10(temp.getConditionalProbabilty(textCharList.get(j),
                        textCharList.get(j + 1)));
                updateTotalProbabilities(totalProbabilities, temp.getLanguage(), conditionalProbability);
                FileHandler.writeSentences(temp.getLanguage().toString() + ": p(" + textCharList.get(j + 1) + "|" + textCharList.get(j) + ") =" + conditionalProbability + "==> log prob of sentence so far: " + totalProbabilities.get(temp.getLanguage()), false);
            }
            FileHandler.writeSentences("", false);
        }
    }
    return totalProbabilities;
}



public static HashMap<Language, Double> getUnigramResult(ArrayList<Character> textCharList) {
    HashMap<Language, Double> totalProbabilities = new HashMap<Language, Double>();
    // Walk over every single character.
    for (int j = 0; j < textCharList.size(); j++) {
        // Skip '+' characters.
        if (textCharList.get(j) != '+') {
            FileHandler.writeSentences("UNIGRAM :" + textCharList.get(j), false);
            // Score the character under each language's unigram model.
            for (int k = 0; k < uniGramList.size(); k++) {
                Unigram temp = uniGramList.get(k);
                // log10 of p(current) for this language (not conditional, despite the variable name).
                double conditionalProbability = Math.log10(temp.getProbabilty(textCharList.get(j)));
                updateTotalProbabilities(totalProbabilities, temp.getLanguage(), conditionalProbability);
                FileHandler.writeSentences(temp.getLanguage().toString() + ": p(" + textCharList.get(j) + ") =" + conditionalProbability + "==> log prob of sentence so far: " + totalProbabilities.get(temp.getLanguage()), false);
            }
            FileHandler.writeSentences("", false);
        }
    }
    return totalProbabilities;
}
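The helper updateTotalProbabilities() is not shown above. As a rough sketch of what such a helper could look like, assuming it simply adds the new log probability to the running total for that language, something along these lines would do. The class name `TotalsSketch` and the generic key type are illustrative assumptions and not my actual code.

```java
import java.util.HashMap;
import java.util.Map;

class TotalsSketch {
    // Assumed behaviour: add logProbability to the running total for the given key,
    // starting from logProbability itself if the key has no total yet.
    static <K> void updateTotalProbabilities(Map<K, Double> totals, K language, double logProbability) {
        totals.merge(language, logProbability, Double::sum);
    }

    public static void main(String[] args) {
        Map<String, Double> totals = new HashMap<>();
        updateTotalProbabilities(totals, "ENGLISH", -1.0);
        updateTotalProbabilities(totals, "ENGLISH", -2.5);
        System.out.println(totals.get("ENGLISH"));
    }
}
```

Summing logarithms corresponds to multiplying the underlying probabilities, which avoids underflow on long sentences.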

Both methods above, getBigramResult() and getUnigramResult(), are very similar, and I feel this is poor design, but I have not been able to refactor them because of the different outer for loops, if conditions and probability-calculating methods. Any suggestions on my code would be appreciated.

edited 12 mins ago

asked 5 hours ago

dividedbyzero

New contributor

put on hold as unclear what you're asking by Jamal♦ 1 hour ago

Please clarify your specific problem or add additional details to highlight exactly what you need. As it's currently written, it’s hard to tell exactly what you're asking. See the How to Ask page for help clarifying this question. If this question can be reworded to fit the rules in the help center, please edit the question.


Welcome to Code Review! What task does this code accomplish? Please tell us, and also make that the title of the question via edit. Maybe you missed the placeholder on the title element: "State the task that your code accomplishes. Make your title distinctive.". Also from How to Ask: "State what your code does in your title, not your main concerns about it.".
– Sᴀᴍ Onᴇᴌᴀ
4 hours ago

@SᴀᴍOnᴇᴌᴀ Do you think the edits I made are OK?
– dividedbyzero
28 mins ago

Please update the title to express what the code does, not your concerns about the code.
– bruglesco
18 mins ago

@bruglesco Do you think it's OK now?
– dividedbyzero
11 mins ago

java design-patterns
