Efficient substring match

I have a list of about 2,500,000 (25,00,000) unique strings of different lengths, and I am trying to find whether any string occurs as a substring of a previous string in the list.



Here is my code:



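    def index_containing_substring(the_list, substring):
        """Return the index of the first string in the_list that contains substring, or -1."""
        for i, s in enumerate(the_list):
            if substring in s:
                return i
        return -1

    def string_match():
        test_list = ['foo bar abc xml', 'fdff gdnfgf gdkgf', 'foo bar', 'abc', 'xml', 'xyz']
        # The list must be in descending order of length, i.e. the longest
        # sentences are at the top, so every possible container of s comes before s.
        test_list.sort(key=len, reverse=True)
        # A string of maximum length cannot occur inside another unique string,
        # so it is kept without searching.
        max_len = max(len(s) for s in test_list)
        safe_to_add = []
        for s in test_list:
            if len(s) == max_len:
                safe_to_add.append(s)
            else:
                idx = index_containing_substring(safe_to_add, s)
                if idx == -1:
                    safe_to_add.append(s)
                else:
                    # process the substring (idx is a position in safe_to_add, not test_list)
                    print('match found {} for {}'.format(safe_to_add[idx], s))

    string_match()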


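This method works, but I think it is pretty slow: every string is compared against every string kept so far, so the number of substring checks can grow quadratically with the size of the list. Is there a better way to solve this problem, for example with a better data structure such as a trie or a suffix tree?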



Output:

    match found foo bar abc xml for foo bar
    match found foo bar abc xml for abc
    match found foo bar abc xml for xml
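
A minimal sketch of one cheaper alternative, assuming every string fits in memory and none contains a newline: store the kept strings back to back in a single bytearray separated by b"\n", search it with one bytearray.find call per string, and map the hit offset back to its string with bisect. The names find_contained and SEP are hypothetical. Each lookup still scans all kept text, so the worst case remains quadratic; any speed-up comes from moving the inner loop into C and has not been measured here.

```python
from bisect import bisect_right

SEP = b"\n"  # assumption: no input string contains a newline


def find_contained(strings):
    """Yield (container, s) for each s found inside an earlier kept string.

    Hypothetical helper. Assumes `strings` is already sorted longest first,
    like the original loop, and that no string contains a newline.
    """
    buffer = bytearray()  # all kept strings, each followed by SEP
    starts = []           # starts[i] is the byte offset of kept[i] in buffer
    kept = []             # strings not contained in any earlier kept string
    for s in strings:
        encoded = s.encode("utf-8")
        # One C-level scan over all kept strings instead of a Python loop.
        pos = buffer.find(encoded) if kept else -1
        if pos == -1:
            starts.append(len(buffer))
            kept.append(s)
            buffer += encoded + SEP
        else:
            # A match cannot cross a separator, so the container is the kept
            # string with the largest start offset that is <= pos.
            yield kept[bisect_right(starts, pos) - 1], s


if __name__ == "__main__":
    test_list = ['foo bar abc xml', 'fdff gdnfgf gdkgf', 'foo bar', 'abc', 'xml', 'xyz']
    test_list.sort(key=len, reverse=True)
    for container, s in find_contained(test_list):
        print('match found {} for {}'.format(container, s))
```

Searching UTF-8 bytes gives the same answer as the str-level `in` test, because a valid UTF-8 needle can only match a valid UTF-8 haystack at character boundaries. Since bytearray.find returns the lowest offset, the reported container is the first kept string, just as in index_containing_substring.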

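Since the question mentions a trie: a trie over all strings plus failure links is the Aho-Corasick automaton, which finds every string occurring inside a given text in one left-to-right pass over that text. The following is a minimal pure-Python sketch, assuming the strings are unique, non-empty and already sorted longest first, as in the original loop; build_automaton and find_contained are hypothetical names. Each contained string is reported once, with its earliest container, but the pairs may come out in a different order than the original loop prints them.

```python
from collections import deque


def build_automaton(patterns):
    """Aho-Corasick construction: a trie over all patterns plus failure links.

    goto[state] maps a character to the child state, fail[state] is the longest
    proper suffix of the state's string that is also a trie node, out[state] is
    the index of the pattern ending exactly at state (or -1), and dict_link[state]
    is the nearest state on the failure chain where some pattern ends (or -1).
    """
    goto, fail, out = [{}], [0], [-1]
    for idx, pattern in enumerate(patterns):
        state = 0
        for ch in pattern:
            if ch not in goto[state]:
                goto[state][ch] = len(goto)
                goto.append({})
                fail.append(0)
                out.append(-1)
            state = goto[state][ch]
        out[state] = idx

    dict_link = [-1] * len(goto)
    queue = deque(goto[0].values())  # depth-1 states keep fail = 0 (the root)
    while queue:  # breadth-first, so shorter states get their links first
        state = queue.popleft()
        for ch, child in goto[state].items():
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(ch, 0)
            link = fail[child]
            dict_link[child] = link if out[link] != -1 else dict_link[link]
            queue.append(child)
    return goto, fail, out, dict_link


def find_contained(strings):
    """Yield (container, s) for each s that occurs inside an earlier string."""
    goto, fail, out, dict_link = build_automaton(strings)
    reported = set()  # indices of strings already known to be substrings
    for t_idx, text in enumerate(strings):
        if t_idx in reported:
            # Anything inside text is also inside text's own earlier container.
            continue
        state = 0
        for ch in text:
            while state and ch not in goto[state]:
                state = fail[state]
            state = goto[state].get(ch, 0)
            # Visit every pattern that ends at this position of text.
            hit = state if out[state] != -1 else dict_link[state]
            while hit != -1:
                p_idx = out[hit]
                if p_idx > t_idx and p_idx not in reported:
                    reported.add(p_idx)
                    yield text, strings[p_idx]
                hit = dict_link[hit]


if __name__ == "__main__":
    test_list = ['foo bar abc xml', 'fdff gdnfgf gdkgf', 'foo bar', 'abc', 'xml', 'xyz']
    test_list.sort(key=len, reverse=True)
    for container, s in find_contained(test_list):
        print('match found {} for {}'.format(container, s))
```

Scanning then costs roughly the total length of the kept strings plus the number of pattern occurrences visited, instead of one substring search per pair. A dictionary per trie node is memory hungry, though, so at 2.5 million strings a C-backed implementation such as the third-party pyahocorasick package may be more practical; that is a suggestion, not something tested here.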

Tags: algorithm python-3.x strings

asked by Rohit