Random name generator from JSON file
Tired of thinking up random names when following tutorials involving people? So was I.
The code below is the first project I've written in Python. Put simply, it grabs a list of forenames and surnames from an online database, saves them into a dictionary, then writes that dictionary to a JSON file. After that, the rest of the code reads the JSON file, chooses a first and last name that match any regex strings passed into the function, then returns them as a single string, separated by a space.
I've noticed, however, that this baby is sloooooow. I'm talking 150 - 200 ms to generate a name, which works out to 15 - 20 seconds for a list of 100 names. I know there's huge room for improvement here, but I'm not well versed enough in the standard libraries yet.
Here it is:
from codecs import decode
from json import dump, loads
from pathlib import Path
from random import randrange
from re import compile, findall, IGNORECASE
from timeit import default_timer as timer
from urllib import request

DB_FORE = 'https://raw.githubusercontent.com/smashew/NameDatabases/master/NamesDatabases/first%20names/us.txt'
DB_LAST = 'https://raw.githubusercontent.com/smashew/NameDatabases/master/NamesDatabases/surnames/us.txt'


def write_names(dest, url_forename = DB_FORE, url_surname = DB_LAST):
    """
    Writes the names from the database to the given destination.
    :param dest: the name of the destination file. Blank if you want the string returned
    :param url_forename: url containing plain text first names, entries separated by \r\n
    :param url_surname: url containing plain text last names, entries separated by \n
    """
    names = {
        # Last values tend to be buggy, so get rid of them
        'forenames': _get_names(url_forename).split("\r\n")[:-1],
        'surnames': _get_names(url_surname).split("\n")[:-1],
    }
    with open(dest, 'w') as json_file:
        dump(names, json_file, indent = 4, separators = (',', ': '))


def random_name(fmatches = '', smatches = ''):
    """
    Gets a random name in the form "Forename Lastname".
    :param fmatches: a regex string to try to match to a forename
    :param smatches: a regex string to try to match to a surname
    :return: a random name based on the regex inputs, or None if no names were found
    """
    file_name = 'names.json'
    names_file = Path(file_name)
    if not Path(names_file).is_file():
        print(f'{file_name} not found, writing from default database')
        write_names(file_name)
        print('Done!')
    with names_file.open('r') as json_file:
        names = loads(json_file.read())
    first = _choose_name(fmatches, names['forenames'])
    last = _choose_name(smatches, names['surnames'])
    if first is None or last is None:
        return None
    else:
        return first + " " + last


def _get_names(url):
    with request.urlopen(url) as site:
        return str(decode(site.read(), 'utf-8'))


def _choose_name(expr, data):
    if expr is None:
        return data[randrange(0, len(data))]
    else:
        return _random_match(compile(expr, flags = IGNORECASE), data)


def _match_names(expr, data):
    return [name for name in data if findall(expr, name)]


def _random_match(expr, data):
    matches = _match_names(expr, data)
    return matches[randrange(0, len(matches))] if len(matches) > 0 else None


if __name__ == '__main__':
    start = timer()
    print(random_name('rose', 'an'))
    end = timer()
    print(end - start, '\n')

    start = timer()
    print(random_name(fmatches = '^jo'))
    end = timer()
    print(end - start, '\n')

    start = timer()
    print(random_name(smatches = 'en$'))
    end = timer()
    print(end - start, '\n')

    start = timer()
    print(random_name(fmatches = 'grf'))
    end = timer()
    print(end - start, '\n')

    start = timer()
    print([random_name() for _ in range(10)])
    end = timer()
    print(end - start, '\n')
The JSON file (named 'names.json') is in the following format:
{
    "forenames": [
        "Aaron",
        "Abbey",
        "Abdul",
        ...
    ],
    "surnames": [
        "Aaberg",
        "Aaby",
        "Aadland",
        ...
    ]
}
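For illustration, here is a minimal sketch of reading a file in this format once and reusing the parsed lists for every name. It is not the code from the question: `load_names` and `pick_name` are hypothetical helper names, and the demo file holds only the three sample entries per list shown above.

```python
import json
import random
import tempfile
from functools import lru_cache
from pathlib import Path


@lru_cache(maxsize=None)
def load_names(path):
    """Parse the names file once per path; later calls reuse the cached dict."""
    with open(path, encoding="utf-8") as json_file:
        return json.load(json_file)


def pick_name(path):
    """Return "Forename Surname" chosen from the cached lists."""
    names = load_names(path)
    return f"{random.choice(names['forenames'])} {random.choice(names['surnames'])}"


# Demo on a throwaway file holding the three sample entries per list.
sample = {
    "forenames": ["Aaron", "Abbey", "Abdul"],
    "surnames": ["Aaberg", "Aaby", "Aadland"],
}
with tempfile.TemporaryDirectory() as tmp:
    demo_path = str(Path(tmp) / "names.json")
    Path(demo_path).write_text(json.dumps(sample, indent=4), encoding="utf-8")
    generated = [pick_name(demo_path) for _ in range(5)]
    cache_stats = load_names.cache_info()

print(generated)
print(cache_stats)
```

With this arrangement the file is opened and parsed only on the first call; the four later calls in the demo are served from the cache.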
For reference, there are about 5,000 forenames and 90,000 surnames... yeah, I'm thinking that has something to do with how slow it is (go figure). Obviously I could speed up execution significantly by removing a lot of uncommon names, but I want to know whether there is any way to improve the current implementation with the current list of names.
The name generator supports entering regex strings, but I haven't seen any notable performance difference between generating a name with or without regex.
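A rough way to see which step costs what is to time the steps separately. The sketch below is an assumption-laden stand-in rather than a measurement of the real module: it uses synthetic names (`Fore0`, `Sur0`, ...) at the list sizes quoted above, keeps the JSON text in memory so that disk reads are not included, and `time_call`, `parse_step`, `regex_step` and `pick_step` are hypothetical helper names.

```python
import json
import random
import re
from timeit import default_timer as timer

# Synthetic stand-ins for the real lists (sizes taken from the question).
FORENAMES = [f"Fore{i}" for i in range(5_000)]
SURNAMES = [f"Sur{i}" for i in range(90_000)]
JSON_TEXT = json.dumps({"forenames": FORENAMES, "surnames": SURNAMES}, indent=4)


def time_call(func, repeats=5):
    """Return the best wall-clock time in seconds over a few runs of func()."""
    best = float("inf")
    for _ in range(repeats):
        start = timer()
        func()
        best = min(best, timer() - start)
    return best


def parse_step():
    # Cost paid on every call when the JSON text is re-parsed each time.
    return json.loads(JSON_TEXT)


def regex_step(pattern=re.compile("9$", re.IGNORECASE)):
    # Cost of filtering all surnames with a compiled pattern.
    return [name for name in SURNAMES if pattern.search(name)]


def pick_step():
    # Cost of picking one surname with no filtering at all.
    return random.choice(SURNAMES)


timings = {
    "parse JSON": time_call(parse_step),
    "regex filter": time_call(regex_step),
    "random pick": time_call(pick_step),
}
for label, seconds in timings.items():
    print(f"{label:>12}: {seconds * 1000:.3f} ms")
```

The absolute numbers depend on the machine, so the sketch only shows how to compare the steps, not what the result will be.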
Here's a sample output from running the module:
Roseanna Sodeman
0.1488200619749599
Joette Knorr
0.21161438351850587
Birdie Tohen
0.15252753028006855
None
0.21214806850333456
['Mohammed Koelzer', 'Luvenia Kovalcin', 'Danica Lehnen', 'Chad Mannan',
'Naomi Kilborne', 'Cami Lydecker', 'Amie Dearson', 'Seema Reiche',
'Ai Bruhn', 'Laureen Gitthens']
2.1672272217311948
Because I'm a beginner, I would appreciate reviews on both the performance of the code and my code style (e.g. any standard libraries I could be utilising, or any shortcuts such as list comprehensions that I'm missing).
python performance python-3.x
asked 9 mins ago
Raddari