Random name generator from JSON file























Tired of thinking up random names when following tutorials involving people? So was I.



The code below is the first project I've written in Python. Put simply, it grabs lists of forenames and surnames from an online database, saves them in a dictionary, then writes that dictionary to a JSON file. The rest of the code reads the JSON file, chooses a forename and a surname matching any regex strings passed into the function, then returns them as a single string, separated by a space.



I've noticed, however, that this baby is sloooooow. I'm talking 150 to 200 ms to generate a name. That's up to 20 seconds to generate a list of 100 names. I know there's huge room for improvement here, but I'm not well versed enough in the standard libraries yet.



Here it is:



from codecs import decode
from json import dump, loads
from pathlib import Path
from random import randrange
from re import compile, findall, IGNORECASE
from timeit import default_timer as timer
from urllib import request


DB_FORE = 'https://raw.githubusercontent.com/smashew/NameDatabases/master/NamesDatabases/first%20names/us.txt'
DB_LAST = 'https://raw.githubusercontent.com/smashew/NameDatabases/master/NamesDatabases/surnames/us.txt'


def write_names(dest, url_forename = DB_FORE, url_surname = DB_LAST):
    """
    Writes the names from the database to the given destination.
    :param dest: the name of the destination file. Blank if you want the string returned
    :param url_forename: url containing plain text first names, entries separated by \r\n
    :param url_surname: url containing plain text last names, entries separated by \n
    """
    names = {
        # Last values tend to be buggy, so get rid of them
        'forenames': _get_names(url_forename).split("\r\n")[:-1],
        'surnames': _get_names(url_surname).split("\n")[:-1],
    }

    with open(dest, 'w') as json_file:
        dump(names, json_file, indent = 4, separators = (',', ': '))


def random_name(fmatches = '', smatches = ''):
    """
    Gets a random name in the form "Forename Lastname".
    :param fmatches: a regex string to try to match to a forename
    :param smatches: a regex string to try to match to a surname
    :return: a random name based on the regex inputs, or None if no names were found
    """
    file_name = 'names.json'
    names_file = Path(file_name)
    if not Path(names_file).is_file():
        print(f'{file_name} not found, writing from default database')
        write_names(file_name)
        print('Done!')

    with names_file.open('r') as json_file:
        names = loads(json_file.read())
        first = _choose_name(fmatches, names['forenames'])
        last = _choose_name(smatches, names['surnames'])

    if first is None or last is None:
        return None
    else:
        return first + " " + last


def _get_names(url):
    with request.urlopen(url) as site:
        return str(decode(site.read(), 'utf-8'))


def _choose_name(expr, data):
    if expr is None:
        return data[randrange(0, len(data))]
    else:
        return _random_match(compile(expr, flags = IGNORECASE), data)


def _match_names(expr, data):
    return [name for name in data if findall(expr, name)]


def _random_match(expr, data):
    matches = _match_names(expr, data)
    return matches[randrange(0, len(matches))] if len(matches) > 0 else None


if __name__ == '__main__':
    start = timer()
    print(random_name('rose', 'an'))
    end = timer()
    print(end - start, '\n')

    start = timer()
    print(random_name(fmatches = '^jo'))
    end = timer()
    print(end - start, '\n')

    start = timer()
    print(random_name(smatches = 'en$'))
    end = timer()
    print(end - start, '\n')

    start = timer()
    print(random_name(fmatches = 'grf'))
    end = timer()
    print(end - start, '\n')

    start = timer()
    print([random_name() for _ in range(10)])
    end = timer()
    print(end - start, '\n')


The JSON file (named 'names.json') is in the following format:



{
    "forenames": [
        "Aaron",
        "Abbey",
        "Abdul",
        ...
    ],
    "surnames": [
        "Aaberg",
        "Aaby",
        "Aadland",
        ...
    ]
}
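
As a small illustration of that layout, here is a minimal sketch that parses a document of the same shape and picks one name from each list. The `sample` string contains only the names shown above, and `pick_full_name` is a hypothetical helper, not part of the module.

```python
import json
import random

# A tiny document in the same shape as names.json (only the names shown above).
sample = """
{
    "forenames": ["Aaron", "Abbey", "Abdul"],
    "surnames": ["Aaberg", "Aaby", "Aadland"]
}
"""

names = json.loads(sample)


def pick_full_name(names, rng=random):
    """Pick one forename and one surname uniformly at random."""
    return f"{rng.choice(names['forenames'])} {rng.choice(names['surnames'])}"


print(pick_full_name(names))
```

Passing a seeded `random.Random` instance as `rng` makes the choice repeatable, which can be handy when comparing runs.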


For reference, there are about 5,000 forenames and 90,000 surnames... yeah, I'm thinking that has something to do with how slow it is (go figure). Obviously I could speed up the execution significantly by removing a lot of uncommon names, but I want to know if there is any way I can improve the current implementation with the current list of names.
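
One thing visible in the code above is that random_name opens and parses names.json on every call, so the full lists are rebuilt each time. The following is only a sketch of one possible way to parse the file once, using `functools.lru_cache`; `load_names` is a hypothetical helper, the throwaway file stands in for names.json, and no speed-up has been measured here.

```python
import json
import tempfile
from functools import lru_cache
from pathlib import Path


@lru_cache(maxsize=None)
def load_names(path):
    """Parse the JSON file at `path` once; later calls reuse the cached dict."""
    with open(path, encoding="utf-8") as json_file:
        return json.load(json_file)


# Demonstration with a throwaway file standing in for names.json.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = str(Path(tmp) / "names.json")
    Path(demo_path).write_text(
        json.dumps({"forenames": ["Aaron"], "surnames": ["Aaberg"]}),
        encoding="utf-8",
    )
    first = load_names(demo_path)
    second = load_names(demo_path)  # served from the cache, no file access

print(first is second)
```

Note that the cached dictionary is shared between callers, so mutating it would affect every later call; that trade-off is part of the assumption behind this sketch.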



The name generator supports entering regex strings, but I haven't seen any notable performance difference between generating a name with or without regex.
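
A possible explanation, offered as an observation about the code above rather than a measured result: the defaults for fmatches and smatches are empty strings, not None, so the `expr is None` branch is not taken with the defaults, and an empty pattern matches every name. The snippet below only demonstrates that behaviour of `re`; the `names` list is made-up sample data.

```python
import re

names = ["Aaron", "Abbey", "Abdul"]  # made-up sample data

empty = re.compile("", flags=re.IGNORECASE)

# findall returns one empty match per position, so the result is non-empty
# (truthy) for every name, and nothing is filtered out.
kept = [name for name in names if empty.findall(name)]
print(kept == names)

# The default value '' is not None, so an `is None` check is False for it.
default = ""
print(default is None)
```

If that reading is right, a call without regex arguments still scans the whole list with a compiled pattern, which would be consistent with the similar timings.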



Here's a sample output from running the module:



Roseanna Sodeman
0.1488200619749599

Joette Knorr
0.21161438351850587

Birdie Tohen
0.15252753028006855

None
0.21214806850333456

['Mohammed Koelzer', 'Luvenia Kovalcin', 'Danica Lehnen', 'Chad Mannan',
'Naomi Kilborne', 'Cami Lydecker', 'Amie Dearson', 'Seema Reiche', 'Ai
Bruhn', 'Laureen Gitthens']
2.1672272217311948
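
The timings above cover a whole call, so they do not show whether the time goes into parsing the JSON or into picking a name. A minimal sketch of timing the two steps separately follows. It is built on assumptions: the lists are synthetic stand-ins of roughly the sizes mentioned in this post rather than the real names.json, and `time_call` is a hypothetical helper.

```python
import json
import random
from timeit import default_timer as timer


def time_call(func, *args, repeat=5):
    """Return the best wall-clock time (seconds) of `repeat` calls to func."""
    best = float("inf")
    for _ in range(repeat):
        start = timer()
        func(*args)
        best = min(best, timer() - start)
    return best


# Synthetic stand-in for names.json; the sizes are placeholders, not real data.
fake_names = {
    "forenames": [f"Fore{i}" for i in range(5_000)],
    "surnames": [f"Sur{i}" for i in range(90_000)],
}
payload = json.dumps(fake_names, indent=4)

parse_time = time_call(json.loads, payload)
parsed = json.loads(payload)
pick_time = time_call(random.choice, parsed["surnames"])

print(f"parse: {parse_time:.6f}s, pick: {pick_time:.6f}s")
```

The actual numbers depend on the machine, so none are quoted here; the point is only that the two costs can be compared directly.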


Because I'm a beginner, I would appreciate reviews on both improving the performance of the code and my code style (e.g. any standard libraries I could be utilising, or any shortcuts such as list comprehensions that I'm missing).
















New contributor




Raddari is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
























      python performance python-3.x














asked 9 mins ago by Raddari
