Evaluating association strengths of items












0












$begingroup$


Say I have a list of animals with their counts:



import numpy as np
import pandas as pd
from random import randint

table = np.zeros((5,1), dtype=int)
for i in range(5):
table[i]=randint(10, 20)

df1 = pd.DataFrame(columns=['Animal', 'Count'])
df1['Animal'] = animal_list
df1['Count'] = table
df1


enter image description here



And I have a matrix of how many times they appear together:



table = np.zeros((5,5), dtype=int)
animal_list = ['Monkey', 'Tiger', 'Cat', 'Dog', 'Lion']

for i in range(5):
for j in range(5):
table[i][j]=randint(0, 9)

df2 = pd.DataFrame(table, columns=animal_list, index=animal_list)
df2


enter image description here



I want to find the animals' association strength, which is defined like so - if Lion and Cat appear together 5 times, and Lion's count is 10 and Cat's count is 15, then Lion -> Cat association strength is 5/10=0.5, and Cat -> Lion association strength is 5/15=0.33.



I do it like so:



assoc_df = pd.DataFrame(columns=['Animal 1', 'Animal 2', 'Association Strength'])
for row_word in df2:
for col_word in df2:
if row_word!=col_word:
assoc_df = assoc_df.append({'Animal 1': row_word, 'Animal 2': col_word,
'Association Strength': df2[col_word][row_word]/df1[df1.Animal==row_word]['Count'].values[0]}, ignore_index=True)

assoc_df


enter image description here



The problem is, since there are 2 for loops, the complexity is O(n^2). This is a problem when (in my real dataset) I have ~1000 animals to loop on, which takes hours to finish computing the association strength table.



So, how to do I better optimize the creation/generation process of this association table?









share









$endgroup$

















    0












    $begingroup$


    Say I have a list of animals with their counts:



    import numpy as np
    import pandas as pd
    from random import randint

    table = np.zeros((5,1), dtype=int)
    for i in range(5):
    table[i]=randint(10, 20)

    df1 = pd.DataFrame(columns=['Animal', 'Count'])
    df1['Animal'] = animal_list
    df1['Count'] = table
    df1


    enter image description here



    And I have a matrix of how many times they appear together:



    table = np.zeros((5,5), dtype=int)
    animal_list = ['Monkey', 'Tiger', 'Cat', 'Dog', 'Lion']

    for i in range(5):
    for j in range(5):
    table[i][j]=randint(0, 9)

    df2 = pd.DataFrame(table, columns=animal_list, index=animal_list)
    df2


    enter image description here



    I want to find the animals' association strength, which is defined like so - if Lion and Cat appear together 5 times, and Lion's count is 10 and Cat's count is 15, then Lion -> Cat association strength is 5/10=0.5, and Cat -> Lion association strength is 5/15=0.33.



    I do it like so:



    assoc_df = pd.DataFrame(columns=['Animal 1', 'Animal 2', 'Association Strength'])
    for row_word in df2:
    for col_word in df2:
    if row_word!=col_word:
    assoc_df = assoc_df.append({'Animal 1': row_word, 'Animal 2': col_word,
    'Association Strength': df2[col_word][row_word]/df1[df1.Animal==row_word]['Count'].values[0]}, ignore_index=True)

    assoc_df


    enter image description here



    The problem is, since there are 2 for loops, the complexity is O(n^2). This is a problem when (in my real dataset) I have ~1000 animals to loop on, which takes hours to finish computing the association strength table.



    So, how to do I better optimize the creation/generation process of this association table?









    share









    $endgroup$















      0












      0








      0





      $begingroup$


      Say I have a list of animals with their counts:



      import numpy as np
      import pandas as pd
      from random import randint

      table = np.zeros((5,1), dtype=int)
      for i in range(5):
      table[i]=randint(10, 20)

      df1 = pd.DataFrame(columns=['Animal', 'Count'])
      df1['Animal'] = animal_list
      df1['Count'] = table
      df1


      enter image description here



      And I have a matrix of how many times they appear together:



      table = np.zeros((5,5), dtype=int)
      animal_list = ['Monkey', 'Tiger', 'Cat', 'Dog', 'Lion']

      for i in range(5):
      for j in range(5):
      table[i][j]=randint(0, 9)

      df2 = pd.DataFrame(table, columns=animal_list, index=animal_list)
      df2


      enter image description here



      I want to find the animals' association strength, which is defined like so - if Lion and Cat appear together 5 times, and Lion's count is 10 and Cat's count is 15, then Lion -> Cat association strength is 5/10=0.5, and Cat -> Lion association strength is 5/15=0.33.



      I do it like so:



      assoc_df = pd.DataFrame(columns=['Animal 1', 'Animal 2', 'Association Strength'])
      for row_word in df2:
      for col_word in df2:
      if row_word!=col_word:
      assoc_df = assoc_df.append({'Animal 1': row_word, 'Animal 2': col_word,
      'Association Strength': df2[col_word][row_word]/df1[df1.Animal==row_word]['Count'].values[0]}, ignore_index=True)

      assoc_df


      enter image description here



      The problem is, since there are 2 for loops, the complexity is O(n^2). This is a problem when (in my real dataset) I have ~1000 animals to loop on, which takes hours to finish computing the association strength table.



      So, how to do I better optimize the creation/generation process of this association table?









      share









      $endgroup$




      Say I have a list of animals with their counts:



      import numpy as np
      import pandas as pd
      from random import randint

      table = np.zeros((5,1), dtype=int)
      for i in range(5):
      table[i]=randint(10, 20)

      df1 = pd.DataFrame(columns=['Animal', 'Count'])
      df1['Animal'] = animal_list
      df1['Count'] = table
      df1


      enter image description here



      And I have a matrix of how many times they appear together:



      table = np.zeros((5,5), dtype=int)
      animal_list = ['Monkey', 'Tiger', 'Cat', 'Dog', 'Lion']

      for i in range(5):
      for j in range(5):
      table[i][j]=randint(0, 9)

      df2 = pd.DataFrame(table, columns=animal_list, index=animal_list)
      df2


      enter image description here



      I want to find the animals' association strength, which is defined like so - if Lion and Cat appear together 5 times, and Lion's count is 10 and Cat's count is 15, then Lion -> Cat association strength is 5/10=0.5, and Cat -> Lion association strength is 5/15=0.33.



      I do it like so:



      assoc_df = pd.DataFrame(columns=['Animal 1', 'Animal 2', 'Association Strength'])
      for row_word in df2:
      for col_word in df2:
      if row_word!=col_word:
      assoc_df = assoc_df.append({'Animal 1': row_word, 'Animal 2': col_word,
      'Association Strength': df2[col_word][row_word]/df1[df1.Animal==row_word]['Count'].values[0]}, ignore_index=True)

      assoc_df


      enter image description here



      The problem is, since there are 2 for loops, the complexity is O(n^2). This is a problem when (in my real dataset) I have ~1000 animals to loop on, which takes hours to finish computing the association strength table.



      So, how to do I better optimize the creation/generation process of this association table?







      python performance python-3.x community-challenge data-mining





      share












      share










      share



      share










      asked 5 mins ago









      Kristada673Kristada673

      1463




      1463






















          0






          active

          oldest

          votes











          Your Answer





          StackExchange.ifUsing("editor", function () {
          return StackExchange.using("mathjaxEditing", function () {
          StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
          StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
          });
          });
          }, "mathjax-editing");

          StackExchange.ifUsing("editor", function () {
          StackExchange.using("externalEditor", function () {
          StackExchange.using("snippets", function () {
          StackExchange.snippets.init();
          });
          });
          }, "code-snippets");

          StackExchange.ready(function() {
          var channelOptions = {
          tags: "".split(" "),
          id: "196"
          };
          initTagRenderer("".split(" "), "".split(" "), channelOptions);

          StackExchange.using("externalEditor", function() {
          // Have to fire editor after snippets, if snippets enabled
          if (StackExchange.settings.snippets.snippetsEnabled) {
          StackExchange.using("snippets", function() {
          createEditor();
          });
          }
          else {
          createEditor();
          }
          });

          function createEditor() {
          StackExchange.prepareEditor({
          heartbeatType: 'answer',
          autoActivateHeartbeat: false,
          convertImagesToLinks: false,
          noModals: true,
          showLowRepImageUploadWarning: true,
          reputationToPostImages: null,
          bindNavPrevention: true,
          postfix: "",
          imageUploader: {
          brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
          contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
          allowUrls: true
          },
          onDemand: true,
          discardSelector: ".discard-answer"
          ,immediatelyShowMarkdownHelp:true
          });


          }
          });














          draft saved

          draft discarded


















          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f214381%2fevaluating-association-strengths-of-items%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown

























          0






          active

          oldest

          votes








          0






          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes
















          draft saved

          draft discarded




















































          Thanks for contributing an answer to Code Review Stack Exchange!


          • Please be sure to answer the question. Provide details and share your research!

          But avoid



          • Asking for help, clarification, or responding to other answers.

          • Making statements based on opinion; back them up with references or personal experience.


          Use MathJax to format equations. MathJax reference.


          To learn more, see our tips on writing great answers.




          draft saved


          draft discarded














          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f214381%2fevaluating-association-strengths-of-items%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown





















































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown

































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown







          Popular posts from this blog

          404 Error Contact Form 7 ajax form submitting

          How to know if a Active Directory user can login interactively

          TypeError: fit_transform() missing 1 required positional argument: 'X'