psycopg2: insert multiple rows with one query
I need to insert multiple rows with one query (the number of rows is not constant), so I need to execute a query like this one:
INSERT INTO t (a, b) VALUES (1, 2), (3, 4), (5, 6);
The only way I know is
args = [(1,2), (3,4), (5,6)]
args_str = ','.join(cursor.mogrify("%s", (x, )) for x in args)
cursor.execute("INSERT INTO t (a, b) VALUES "+args_str)
but I want some simpler way.
python postgresql psycopg2
asked Nov 15 '11 at 10:09 by Sergey Fedoseev; edited Nov 17 '11 at 14:28
13 Answers
I built a program that inserts multiple rows to a server located in another city.
I found out that using this method was about 10 times faster than executemany. In my case tup is a tuple containing about 2000 rows. It took about 10 seconds when using this method:
args_str = ','.join(cur.mogrify("(%s,%s,%s,%s,%s,%s,%s,%s,%s)", x) for x in tup)
cur.execute("INSERT INTO table VALUES " + args_str)
and 2 minutes when using this method:
cur.executemany("INSERT INTO table VALUES(%s,%s,%s,%s,%s,%s,%s,%s,%s)", tup)
– ant32, answered Apr 13 '12 at 19:53
Still very relevant almost two years later. My experience today suggests that the more rows you want to push, the better it is to use the execute strategy. I saw a speedup of around 100x thanks to this!
– Rob Watts
Jan 22 '14 at 21:16
Perhaps executemany runs a commit after each insert. If you instead wrap the whole thing in a transaction, maybe that would expedite things?
– Richard
Apr 12 '15 at 17:06
Just confirmed this improvement myself. From what I've read, psycopg2's executemany doesn't do anything optimal, it just loops and does many execute statements. Using this method, a 700 row insert to a remote server went from 60s to <2s.
– Nelson
Apr 27 '15 at 23:22
Maybe I'm being paranoid, but concatenating the query with a + seems like it could open it up to SQL injection; I feel like @Clodoaldo Neto's execute_values() solution is safer.
– Will Munn
Jan 17 '18 at 11:55
In case someone encounters the following error [TypeError: sequence item 0: expected str instance, bytes found], run this command instead: args_str = ','.join(cur.mogrify("(%s,%s)", x).decode("utf-8") for x in tup)
– mrt
Sep 3 '18 at 23:08
New execute_values method in Psycopg 2.7:
import psycopg2.extras

data = [(1,'x'), (2,'y')]
insert_query = 'insert into t (a, b) values %s'
psycopg2.extras.execute_values(
    cursor, insert_query, data, template=None, page_size=100
)
The pythonic way of doing it in Psycopg 2.6:
data = [(1,'x'), (2,'y')]
records_list_template = ','.join(['%s'] * len(data))
insert_query = 'insert into t (a, b) values {}'.format(records_list_template)
cursor.execute(insert_query, data)
Explanation: If the data to be inserted is given as a list of tuples like in
data = [(1,'x'), (2,'y')]
then it is already in the exact required format, as
the values syntax of the insert clause expects a list of records, as in
insert into t (a, b) values (1, 'x'),(2, 'y')
Psycopg adapts a Python tuple to a PostgreSQL record.
The only necessary work is to provide a records list template to be filled by psycopg
# We use the data list to be sure of the template length
records_list_template = ','.join(['%s'] * len(data))
and place it in the insert query
insert_query = 'insert into t (a, b) values {}'.format(records_list_template)
Printing the insert_query outputs
insert into t (a, b) values %s,%s
Now to the usual Psycopg arguments substitution
cursor.execute(insert_query, data)
Or just testing what will be sent to the server
print (cursor.mogrify(insert_query, data).decode('utf8'))
Output:
insert into t (a, b) values (1, 'x'),(2, 'y')
How does the performance of this method compare with cur.copy_from?
– Michael Goldshteyn
Mar 3 '16 at 22:05
Here's a gist with a benchmark. copy_from scales to about 6.5X faster on my machine with 10M records.
– Joseph Sheedy
Jul 13 '16 at 0:28
Looks nice - I think you have a stray , at the end of your initial definition of insert_query (unless you were trying to make it a tuple?) and are missing a s after the % for %s also in the initial definition of insert_query.
– deadcode
May 18 '17 at 1:22
@deadcode Fixed, Thanks.
– Clodoaldo Neto
May 18 '17 at 12:20
Using execute_values I was able to get my system from running at 1k records a minute up to 128k records a minute.
– Conrad.Dean
Nov 15 '18 at 15:17
Update with psycopg2 2.7:
The classic executemany() is about 60 times slower than @ant32's implementation (called "folded") as explained in this thread: https://www.postgresql.org/message-id/20170130215151.GA7081%40deb76.aryehleib.com
This implementation was added to psycopg2 in version 2.7 and is called execute_values():
from psycopg2.extras import execute_values
execute_values(cur,
"INSERT INTO test (id, v1, v2) VALUES %s",
[(1, 2, 3), (4, 5, 6), (7, 8, 9)])
Previous Answer:
To insert multiple rows, using the multirow VALUES syntax with execute() is about 10x faster than using psycopg2 executemany(). Indeed, executemany() just runs many individual INSERT statements.
@ant32's code works perfectly in Python 2. But in Python 3, cursor.mogrify() returns bytes, cursor.execute() takes either bytes or strings, and ','.join() expects a str instance.
So in Python 3 you may need to modify @ant32's code by adding .decode('utf-8'):
args_str = ','.join(cur.mogrify("(%s,%s,%s,%s,%s,%s,%s,%s,%s)", x).decode('utf-8') for x in tup)
cur.execute("INSERT INTO table VALUES " + args_str)
Or by using bytes (with b'' or b"") only:
args_bytes = b','.join(cur.mogrify("(%s,%s,%s,%s,%s,%s,%s,%s,%s)", x) for x in tup)
cur.execute(b"INSERT INTO table VALUES " + args_bytes)
A snippet from Psycopg2's tutorial page at Postgresql.org (see bottom):
A last item I would like to show you is how to insert multiple rows using a dictionary. If you had the following:
namedict = ({"first_name":"Joshua", "last_name":"Drake"},
{"first_name":"Steven", "last_name":"Foo"},
{"first_name":"David", "last_name":"Bar"})
You could easily insert all three rows within the dictionary by using:
cur = conn.cursor()
cur.executemany("""INSERT INTO bar(first_name,last_name) VALUES (%(first_name)s, %(last_name)s)""", namedict)
It doesn't save much code, but it definitely looks better.
This will run many individual INSERT statements. Useful, but not the same as a single multi-VALUEd insert.
– Craig Ringer
Apr 9 '13 at 2:26
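For reference, with psycopg2 2.7 or later the same dictionaries can also be folded into a single multi-VALUES statement by giving execute_values a named-placeholder template; a minimal sketch, assuming an open cursor cur and the same bar table as above:

from psycopg2.extras import execute_values

namedict = ({"first_name": "Joshua", "last_name": "Drake"},
            {"first_name": "Steven", "last_name": "Foo"},
            {"first_name": "David", "last_name": "Bar"})

# One INSERT with a multi-row VALUES list; the template maps each dict's keys.
execute_values(cur,
               "INSERT INTO bar (first_name, last_name) VALUES %s",
               namedict,
               template="(%(first_name)s, %(last_name)s)")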
cursor.copy_from is the fastest solution I've found for bulk inserts by far. Here's a gist I made containing a class named IteratorFile which allows an iterator yielding strings to be read like a file. We can convert each input record to a string using a generator expression. So the solution would be
args = [(1,2), (3,4), (5,6)]
f = IteratorFile(("{}\t{}".format(x[0], x[1]) for x in args))
cursor.copy_from(f, 'table_name', columns=('a', 'b'))
For this trivial size of args it won't make much of a speed difference, but I see big speedups when dealing with thousands+ of rows. It will also be more memory efficient than building a giant query string: an iterator only ever holds one input record in memory at a time, whereas building the query string can at some point exhaust memory in your Python process or in Postgres.
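The gist is not reproduced here, but a minimal sketch of such a file-like wrapper, assuming copy_from only needs a read() method (the gist's IteratorFile is more complete), could look like this:

class IteratorFile:
    """Minimal file-like wrapper exposing read() over an iterator of text lines (sketch)."""
    def __init__(self, iterable):
        self._iter = iter(iterable)
        self._buffer = ""

    def read(self, size=-1):
        # copy_from() calls read(size) repeatedly until it gets an empty string.
        while size < 0 or len(self._buffer) < size:
            try:
                self._buffer += next(self._iter) + "\n"
            except StopIteration:
                break
        if size < 0:
            result, self._buffer = self._buffer, ""
        else:
            result, self._buffer = self._buffer[:size], self._buffer[size:]
        return result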
Here is a benchmark comparing copy_from/IteratorFile with a query builder solution. copy_from scales to about 6.5X faster on my machine with 10M records.
– Joseph Sheedy
Sep 30 '16 at 17:49
do you have to dick around with escaping strings and timestamps etc?
– CpILL
Jun 13 '17 at 9:35
Yes, you'll have to make sure you have well-formed TSV records.
– Joseph Sheedy
Jun 14 '17 at 2:51
All of these techniques are called "extended inserts" in Postgres terminology, and as of the 24th of November 2016, this is still a ton faster than psycopg2's executemany() and all the other methods listed in this thread (which I tried before coming to this answer).
Here's some code which doesn't use cur.mogrify and is nice and simple to get your head around:
valueSQL = [ '%s', '%s', '%s', ... ]  # as many as you have columns.
sqlrows = []
rowsPerInsert = 3  # more means faster, but with diminishing returns..

for row in getSomeData:
    # row == [1, 'a', 'yolo', ... ]
    sqlrows += row
    if ( len(sqlrows)/len(valueSQL) ) % rowsPerInsert == 0:
        # sqlrows == [ 1, 'a', 'yolo', 2, 'b', 'swag', 3, 'c', 'selfie' ]
        insertSQL = 'INSERT INTO "twitter" VALUES ' + ','.join(['(' + ','.join(valueSQL) + ')']*rowsPerInsert)
        cur.execute(insertSQL, sqlrows)
        con.commit()
        sqlrows = []

# insert whatever rows are left over after the loop
if sqlrows:
    insertSQL = 'INSERT INTO "twitter" VALUES ' + ','.join(['(' + ','.join(valueSQL) + ')']*(len(sqlrows)//len(valueSQL)))
    cur.execute(insertSQL, sqlrows)
    con.commit()
But it should be noted that if you can use copy_from(), you should use copy_from ;)
Bringing up from the dead, but what happens in the situation of the last few rows? I assume you actually run that final clause again on the last remaining rows, in the case you have an even number of rows?
– mcpeterson
Nov 23 '16 at 23:03
Correct, sorry, I must have forgotten to do that when I wrote the example - that's pretty stupid of me. Not doing so wouldn't have given people an error, which makes me worried how many people copy/pasted the solution and went about their business..... Anyway, very grateful mcpeterson - thank you!
– J.J
Nov 24 '16 at 1:49
To insert a list of rows into the DB in batches of a user-given size, with psycopg2:
def get_batch(iterable, size=100):
    for i in range(0, len(iterable), size):
        yield iterable[i: i + size]

def insert_rows_batch(table, rows, batch_size=500, target_fields=None):
    """
    A utility method to insert a batch of tuples (rows) into a table.
    NOTE: Handle the data type for fields in rows yourself, as per your
    table columns' types.

    :param table: Name of the target table
    :type table: str
    :param rows: The rows to insert into the table
    :type rows: iterable of tuples
    :param batch_size: The size of the batch of rows to insert at a time
    :type batch_size: int
    :param target_fields: The names of the columns to fill in the table
    :type target_fields: iterable of strings
    """
    conn = cur = None
    if target_fields:
        target_fields = ", ".join(target_fields)
        target_fields = "({})".format(target_fields)
    else:
        target_fields = ''

    conn = get_conn()  # get connection using psycopg2
    if conn:
        cur = conn.cursor()
        count = 0
        for mini_batch in get_batch(rows, batch_size):
            mini_batch_size = len(mini_batch)
            count += mini_batch_size
            record_template = ','.join(["%s"] * mini_batch_size)
            sql = "INSERT INTO {0} {1} VALUES {2};".format(
                table,
                target_fields,
                record_template)
            cur.execute(sql, mini_batch)
            conn.commit()
            print("Loaded {} rows into {} so far".format(count, table))
        print("Done loading. Loaded a total of {} rows".format(count))
    if cur: cur.close()
    if conn: conn.close()
If you also want UPSERT (insert + update) in Postgres with batches, see postgres_utilities.
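A hypothetical usage example (the table name t, the columns a and b, and the sample rows are made up for illustration; get_conn() is assumed to return an open psycopg2 connection as in the code above):

# Insert six rows into t(a, b) in mini-batches of two rows each.
rows = [(1, 'x'), (2, 'y'), (3, 'z'), (4, 'p'), (5, 'q'), (6, 'r')]
insert_rows_batch('t', rows, batch_size=2, target_fields=('a', 'b'))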
Another nice and efficient approach is to pass the rows for insertion as one argument, which is an array of JSON objects.
E.g. you pass the argument:
[ {id: 18, score: 1}, { id: 19, score: 5} ]
It is an array, which may contain any number of objects inside.
Then your SQL looks like:
INSERT INTO links (parent_id, child_id, score)
SELECT 123, (r->>'id')::int, (r->>'score')::int
FROM json_array_elements($1::json) AS r
Notice: your Postgres must be new enough to support json.
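For completeness, a sketch of the psycopg2 side of this approach (the table and columns follow the example above, the connection string is a placeholder, and json_array_elements expands the array server-side):

import json
import psycopg2

rows = [{"id": 18, "score": 1}, {"id": 19, "score": 5}]

sql = """
    INSERT INTO links (parent_id, child_id, score)
    SELECT 123, (r->>'id')::int, (r->>'score')::int
    FROM json_array_elements(%s::json) AS r
"""

with psycopg2.connect("dbname=test") as conn:
    with conn.cursor() as cur:
        # The whole list travels as one JSON text parameter.
        cur.execute(sql, (json.dumps(rows),))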
I've been using ant32's answer above for several years. However, I've found that it throws an error in Python 3 because mogrify returns a byte string.
Converting explicitly to byte strings is a simple solution for making the code Python 3 compatible.
args_str = b','.join(cur.mogrify("(%s,%s,%s,%s,%s,%s,%s,%s,%s)", x) for x in tup)
cur.execute(b"INSERT INTO table VALUES " + args_str)
If you're using SQLAlchemy, you don't need to mess with hand-crafting the string because SQLAlchemy supports generating a multi-row VALUES clause for a single INSERT statement:
rows = []
for i, name in enumerate(rawdata):
    row = {
        'id': i,
        'name': name,
        'valid': True,
    }
    rows.append(row)

if len(rows) > 0:  # INSERT fails if no rows
    insert_query = SQLAlchemyModelName.__table__.insert().values(rows)
    session.execute(insert_query)
Under the hood SQLAlchemy uses psycopg2's executemany() for calls like this and therefore this answer will have severe performance issues for large queries. See the execute method docs.sqlalchemy.org/en/latest/orm/session_api.html.
– sage88
Jan 5 '17 at 22:48
I don't think that's the case. It's been a bit since I looked at this, but IIRC, this is actually building a single insertion statement in the insert_query line. Then, session.execute() is just calling psycopg2's execute() statement with a single massive string. So the "trick" is building the entire insertion statement object first. I am using this to insert 200,000 rows at a time and saw massive performance increases using this code compared to the normal executemany().
– Jeff Widman
Jan 6 '17 at 1:13
The SQLAlchemy doc you linked to has a section that shows exactly how this works and even says: "It is essential to note that passing multiple values is NOT the same as using traditional executemany() form". So it is explicitly calling out that this works.
– Jeff Widman
Jan 6 '17 at 1:23
I stand corrected. I didn't notice your usage of the values() method (without it SQLAlchemy just does executemany). I would say edit the answer to include a link to that doc so that I can change my vote, but obviously you've already included it. Perhaps mention that this is not the same thing as calling an insert() with execute() with a list of dicts?
– sage88
Jan 6 '17 at 6:15
If you want to insert multiple rows within one insert statement (assuming you are not using an ORM), the easiest way so far for me would be to use a list of dictionaries. Here is an example:
t = [{'id':1, 'start_date': '2015-07-19 00:00:00', 'end_date': '2015-07-20 00:00:00', 'campaignid': 6},
{'id':2, 'start_date': '2015-07-19 00:00:00', 'end_date': '2015-07-20 00:00:00', 'campaignid': 7},
{'id':3, 'start_date': '2015-07-19 00:00:00', 'end_date': '2015-07-20 00:00:00', 'campaignid': 8}]
conn.execute("""insert into campaign_dates
    (id, start_date, end_date, campaignid)
    values (%(id)s, %(start_date)s, %(end_date)s, %(campaignid)s);""",
    t)
As you can see only one query will be executed:
INFO sqlalchemy.engine.base.Engine insert into campaign_dates (id, start_date, end_date, campaignid) values (%(id)s, %(start_date)s, %(end_date)s, %(campaignid)s);
INFO sqlalchemy.engine.base.Engine [{'campaignid': 6, 'id': 1, 'end_date': '2015-07-20 00:00:00', 'start_date': '2015-07-19 00:00:00'}, {'campaignid': 7, 'id': 2, 'end_date': '2015-07-20 00:00:00', 'start_date': '2015-07-19 00:00:00'}, {'campaignid': 8, 'id': 3, 'end_date': '2015-07-20 00:00:00', 'start_date': '2015-07-19 00:00:00'}]
INFO sqlalchemy.engine.base.Engine COMMIT
Showing logging from the sqlalchemy engine is NOT a demonstration of only running a single query, it just means that the sqlalchemy engine ran one command. Under the hood this is using psycopg2's executemany, which is very inefficient. See the execute method docs.sqlalchemy.org/en/latest/orm/session_api.html.
– sage88
Jan 5 '17 at 23:06
Finally, in SQLAlchemy 1.2, a new implementation was added that uses psycopg2.extras.execute_batch() instead of executemany when you initialize your engine with use_batch_mode=True, like:
engine = create_engine(
"postgresql+psycopg2://scott:tiger@host/dbname",
use_batch_mode=True)
http://docs.sqlalchemy.org/en/latest/changelog/migration_12.html#change-4109
Then someone who has to use SQLAlchemy won't bother trying different combinations of SQLAlchemy, psycopg2 and direct SQL together.
Using aiopg - the snippet below works perfectly fine:
# items = [10, 11, 12, 13]
# gid = 1
tup = [(gid, pid) for pid in items]
args_str = ",".join([str(s) for s in tup])
# insert into group values (1, 10), (1, 11), (1, 12), (1, 13)
yield from cur.execute("INSERT INTO group VALUES " + args_str)
This method is not safe from SQL injections. As psycopg2 documentation states (that aiopg2 links to): 'Never, never, NEVER use Python string concatenation (+) or string parameters interpolation (%) to pass variables to a SQL query string. Not even at gunpoint.'
– Michał Pawłowski
Sep 10 '15 at 16:13
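Given that warning, a parameterized variant of the same insert is sketched below (assuming the same aiopg cursor cur, group id and items as above); only %s placeholders are concatenated into the SQL, so the driver handles the quoting of the values:

# items = [10, 11, 12, 13]
# gid = 1
tup = [(gid, pid) for pid in items]
placeholders = ",".join(["(%s, %s)"] * len(tup))        # "(%s, %s),(%s, %s),..."
flat_args = [value for pair in tup for value in pair]   # [1, 10, 1, 11, ...]
yield from cur.execute("INSERT INTO group VALUES " + placeholders, flat_args)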
I built a program that inserts multiple lines to a server that was located in another city.
I found out that using this method was about 10 times faster than executemany
. In my case tup
is a tuple containing about 2000 rows. It took about 10 seconds when using this method:
args_str = ','.join(cur.mogrify("(%s,%s,%s,%s,%s,%s,%s,%s,%s)", x) for x in tup)
cur.execute("INSERT INTO table VALUES " + args_str)
and 2 minutes when using this method:
cur.executemany("INSERT INTO table VALUES(%s,%s,%s,%s,%s,%s,%s,%s,%s)", tup)
10
Still very relevant almost two years later. An experience today suggests that as the number of rows you want to push increases, the better it is to use theexecute
strategy. I saw speedup of around 100x thanks to this!
– Rob Watts
Jan 22 '14 at 21:16
3
Perhapsexecutemany
runs a commit after each insert. If you instead wrap the whole thing in a transaction, maybe that would expedite things?
– Richard
Apr 12 '15 at 17:06
4
Just confirmed this improvement myself. From what I've read psycopg2'sexecutemany
doesn't do anything optimal, just loops and does manyexecute
statements. Using this method, a 700 row insert to a remote server went from 60s to <2s.
– Nelson
Apr 27 '15 at 23:22
2
Maybe I'm being paranoid, but concatenating the query with a+
seems like it could open up to sql injection, I feel like @Clodoaldo Netoexecute_values()
solution is safer.
– Will Munn
Jan 17 '18 at 11:55
4
in case someone encounters the following error: [TypeError: sequence item 0: expected str instance, bytes found] run this command instead [args_str = ','.join(cur.mogrify("(%s,%s)", x).decode("utf-8") for x in tup)]
– mrt
Sep 3 '18 at 23:08
|
show 6 more comments
I built a program that inserts multiple lines to a server that was located in another city.
I found out that using this method was about 10 times faster than executemany
. In my case tup
is a tuple containing about 2000 rows. It took about 10 seconds when using this method:
args_str = ','.join(cur.mogrify("(%s,%s,%s,%s,%s,%s,%s,%s,%s)", x) for x in tup)
cur.execute("INSERT INTO table VALUES " + args_str)
and 2 minutes when using this method:
cur.executemany("INSERT INTO table VALUES(%s,%s,%s,%s,%s,%s,%s,%s,%s)", tup)
10
Still very relevant almost two years later. An experience today suggests that as the number of rows you want to push increases, the better it is to use theexecute
strategy. I saw speedup of around 100x thanks to this!
– Rob Watts
Jan 22 '14 at 21:16
3
Perhapsexecutemany
runs a commit after each insert. If you instead wrap the whole thing in a transaction, maybe that would expedite things?
– Richard
Apr 12 '15 at 17:06
4
Just confirmed this improvement myself. From what I've read psycopg2'sexecutemany
doesn't do anything optimal, just loops and does manyexecute
statements. Using this method, a 700 row insert to a remote server went from 60s to <2s.
– Nelson
Apr 27 '15 at 23:22
2
Maybe I'm being paranoid, but concatenating the query with a+
seems like it could open up to sql injection, I feel like @Clodoaldo Netoexecute_values()
solution is safer.
– Will Munn
Jan 17 '18 at 11:55
4
in case someone encounters the following error: [TypeError: sequence item 0: expected str instance, bytes found] run this command instead [args_str = ','.join(cur.mogrify("(%s,%s)", x).decode("utf-8") for x in tup)]
– mrt
Sep 3 '18 at 23:08
|
show 6 more comments
I built a program that inserts multiple lines to a server that was located in another city.
I found out that using this method was about 10 times faster than executemany
. In my case tup
is a tuple containing about 2000 rows. It took about 10 seconds when using this method:
args_str = ','.join(cur.mogrify("(%s,%s,%s,%s,%s,%s,%s,%s,%s)", x) for x in tup)
cur.execute("INSERT INTO table VALUES " + args_str)
and 2 minutes when using this method:
cur.executemany("INSERT INTO table VALUES(%s,%s,%s,%s,%s,%s,%s,%s,%s)", tup)
I built a program that inserts multiple lines to a server that was located in another city.
I found out that using this method was about 10 times faster than executemany
. In my case tup
is a tuple containing about 2000 rows. It took about 10 seconds when using this method:
args_str = ','.join(cur.mogrify("(%s,%s,%s,%s,%s,%s,%s,%s,%s)", x) for x in tup)
cur.execute("INSERT INTO table VALUES " + args_str)
and 2 minutes when using this method:
cur.executemany("INSERT INTO table VALUES(%s,%s,%s,%s,%s,%s,%s,%s,%s)", tup)
edited Mar 9 '17 at 11:40
Alex Riley
80k26160163
80k26160163
answered Apr 13 '12 at 19:53
ant32ant32
1,949184
1,949184
10
Still very relevant almost two years later. An experience today suggests that as the number of rows you want to push increases, the better it is to use theexecute
strategy. I saw speedup of around 100x thanks to this!
– Rob Watts
Jan 22 '14 at 21:16
3
Perhapsexecutemany
runs a commit after each insert. If you instead wrap the whole thing in a transaction, maybe that would expedite things?
– Richard
Apr 12 '15 at 17:06
4
Just confirmed this improvement myself. From what I've read psycopg2'sexecutemany
doesn't do anything optimal, just loops and does manyexecute
statements. Using this method, a 700 row insert to a remote server went from 60s to <2s.
– Nelson
Apr 27 '15 at 23:22
2
Maybe I'm being paranoid, but concatenating the query with a+
seems like it could open up to sql injection, I feel like @Clodoaldo Netoexecute_values()
solution is safer.
– Will Munn
Jan 17 '18 at 11:55
4
in case someone encounters the following error: [TypeError: sequence item 0: expected str instance, bytes found] run this command instead [args_str = ','.join(cur.mogrify("(%s,%s)", x).decode("utf-8") for x in tup)]
– mrt
Sep 3 '18 at 23:08
|
show 6 more comments
10
Still very relevant almost two years later. An experience today suggests that as the number of rows you want to push increases, the better it is to use theexecute
strategy. I saw speedup of around 100x thanks to this!
– Rob Watts
Jan 22 '14 at 21:16
3
Perhapsexecutemany
runs a commit after each insert. If you instead wrap the whole thing in a transaction, maybe that would expedite things?
– Richard
Apr 12 '15 at 17:06
4
Just confirmed this improvement myself. From what I've read psycopg2'sexecutemany
doesn't do anything optimal, just loops and does manyexecute
statements. Using this method, a 700 row insert to a remote server went from 60s to <2s.
– Nelson
Apr 27 '15 at 23:22
2
Maybe I'm being paranoid, but concatenating the query with a+
seems like it could open up to sql injection, I feel like @Clodoaldo Netoexecute_values()
solution is safer.
– Will Munn
Jan 17 '18 at 11:55
4
in case someone encounters the following error: [TypeError: sequence item 0: expected str instance, bytes found] run this command instead [args_str = ','.join(cur.mogrify("(%s,%s)", x).decode("utf-8") for x in tup)]
– mrt
Sep 3 '18 at 23:08
10
10
Still very relevant almost two years later. An experience today suggests that as the number of rows you want to push increases, the better it is to use the
execute
strategy. I saw speedup of around 100x thanks to this!– Rob Watts
Jan 22 '14 at 21:16
Still very relevant almost two years later. An experience today suggests that as the number of rows you want to push increases, the better it is to use the
execute
strategy. I saw speedup of around 100x thanks to this!– Rob Watts
Jan 22 '14 at 21:16
3
3
Perhaps
executemany
runs a commit after each insert. If you instead wrap the whole thing in a transaction, maybe that would expedite things?– Richard
Apr 12 '15 at 17:06
Perhaps
executemany
runs a commit after each insert. If you instead wrap the whole thing in a transaction, maybe that would expedite things?– Richard
Apr 12 '15 at 17:06
4
4
Just confirmed this improvement myself. From what I've read psycopg2's
executemany
doesn't do anything optimal, just loops and does many execute
statements. Using this method, a 700 row insert to a remote server went from 60s to <2s.– Nelson
Apr 27 '15 at 23:22
Just confirmed this improvement myself. From what I've read psycopg2's
executemany
doesn't do anything optimal, just loops and does many execute
statements. Using this method, a 700 row insert to a remote server went from 60s to <2s.– Nelson
Apr 27 '15 at 23:22
2
2
Maybe I'm being paranoid, but concatenating the query with a
+
seems like it could open up to sql injection, I feel like @Clodoaldo Neto execute_values()
solution is safer.– Will Munn
Jan 17 '18 at 11:55
Maybe I'm being paranoid, but concatenating the query with a
+
seems like it could open up to sql injection, I feel like @Clodoaldo Neto execute_values()
solution is safer.– Will Munn
Jan 17 '18 at 11:55
4
4
in case someone encounters the following error: [TypeError: sequence item 0: expected str instance, bytes found] run this command instead [args_str = ','.join(cur.mogrify("(%s,%s)", x).decode("utf-8") for x in tup)]
– mrt
Sep 3 '18 at 23:08
in case someone encounters the following error: [TypeError: sequence item 0: expected str instance, bytes found] run this command instead [args_str = ','.join(cur.mogrify("(%s,%s)", x).decode("utf-8") for x in tup)]
– mrt
Sep 3 '18 at 23:08
|
show 6 more comments
New execute_values
method in Psycopg 2.7:
data = [(1,'x'), (2,'y')]
insert_query = 'insert into t (a, b) values %s'
psycopg2.extras.execute_values (
cursor, insert_query, data, template=None, page_size=100
)
The pythonic way of doing it in Psycopg 2.6:
data = [(1,'x'), (2,'y')]
records_list_template = ','.join(['%s'] * len(data))
insert_query = 'insert into t (a, b) values {}'.format(records_list_template)
cursor.execute(insert_query, data)
Explanation: If the data to be inserted is given as a list of tuples like in
data = [(1,'x'), (2,'y')]
then it is already in the exact required format as
the
values
syntax of theinsert
clause expects a list of records as in
insert into t (a, b) values (1, 'x'),(2, 'y')
Psycopg
adapts a Pythontuple
to a Postgresqlrecord
.
The only necessary work is to provide a records list template to be filled by psycopg
# We use the data list to be sure of the template length
records_list_template = ','.join(['%s'] * len(data))
and place it in the insert
query
insert_query = 'insert into t (a, b) values {}'.format(records_list_template)
Printing the insert_query
outputs
insert into t (a, b) values %s,%s
Now to the usual Psycopg
arguments substitution
cursor.execute(insert_query, data)
Or just testing what will be sent to the server
print (cursor.mogrify(insert_query, data).decode('utf8'))
Output:
insert into t (a, b) values (1, 'x'),(2, 'y')
How does the performance of this method compare with cur.copy_from?
– Michael Goldshteyn
Mar 3 '16 at 22:05
Here's a gist with a benchmark. copy_from scales to about 6.5X faster on my machine with 10M records.
– Joseph Sheedy
Jul 13 '16 at 0:28
Looks nice - I think you have a stray , at the end of your initial definition of insert_query (unless you were trying to make it a tuple?) and are missing a s after the % for %s also in the initial definition of insert_query.
– deadcode
May 18 '17 at 1:22
@deadcode Fixed, Thanks.
– Clodoaldo Neto
May 18 '17 at 12:20
1
usingexecute_values
I was able to get my system running at 1k records a minute up to 128k records a minute
– Conrad.Dean
Nov 15 '18 at 15:17
add a comment |
New execute_values
method in Psycopg 2.7:
data = [(1,'x'), (2,'y')]
insert_query = 'insert into t (a, b) values %s'
psycopg2.extras.execute_values (
cursor, insert_query, data, template=None, page_size=100
)
The pythonic way of doing it in Psycopg 2.6:
data = [(1,'x'), (2,'y')]
records_list_template = ','.join(['%s'] * len(data))
insert_query = 'insert into t (a, b) values {}'.format(records_list_template)
cursor.execute(insert_query, data)
Explanation: If the data to be inserted is given as a list of tuples like in
data = [(1,'x'), (2,'y')]
then it is already in the exact required format as
the
values
syntax of theinsert
clause expects a list of records as in
insert into t (a, b) values (1, 'x'),(2, 'y')
Psycopg
adapts a Pythontuple
to a Postgresqlrecord
.
The only necessary work is to provide a records list template to be filled by psycopg
# We use the data list to be sure of the template length
records_list_template = ','.join(['%s'] * len(data))
and place it in the insert
query
insert_query = 'insert into t (a, b) values {}'.format(records_list_template)
Printing the insert_query
outputs
insert into t (a, b) values %s,%s
Now to the usual Psycopg
arguments substitution
cursor.execute(insert_query, data)
Or just testing what will be sent to the server
print (cursor.mogrify(insert_query, data).decode('utf8'))
Output:
insert into t (a, b) values (1, 'x'),(2, 'y')
How does the performance of this method compare with cur.copy_from?
– Michael Goldshteyn
Mar 3 '16 at 22:05
Here's a gist with a benchmark. copy_from scales to about 6.5X faster on my machine with 10M records.
– Joseph Sheedy
Jul 13 '16 at 0:28
Looks nice - I think you have a stray , at the end of your initial definition of insert_query (unless you were trying to make it a tuple?) and are missing a s after the % for %s also in the initial definition of insert_query.
– deadcode
May 18 '17 at 1:22
@deadcode Fixed, Thanks.
– Clodoaldo Neto
May 18 '17 at 12:20
1
usingexecute_values
I was able to get my system running at 1k records a minute up to 128k records a minute
– Conrad.Dean
Nov 15 '18 at 15:17
add a comment |
New execute_values
method in Psycopg 2.7:
data = [(1,'x'), (2,'y')]
insert_query = 'insert into t (a, b) values %s'
psycopg2.extras.execute_values (
cursor, insert_query, data, template=None, page_size=100
)
The pythonic way of doing it in Psycopg 2.6:
data = [(1,'x'), (2,'y')]
records_list_template = ','.join(['%s'] * len(data))
insert_query = 'insert into t (a, b) values {}'.format(records_list_template)
cursor.execute(insert_query, data)
Explanation: If the data to be inserted is given as a list of tuples like in
data = [(1,'x'), (2,'y')]
then it is already in the exact required format as
the
values
syntax of theinsert
clause expects a list of records as in
insert into t (a, b) values (1, 'x'),(2, 'y')
Psycopg
adapts a Pythontuple
to a Postgresqlrecord
.
The only necessary work is to provide a records list template to be filled by psycopg
# We use the data list to be sure of the template length
records_list_template = ','.join(['%s'] * len(data))
and place it in the insert
query
insert_query = 'insert into t (a, b) values {}'.format(records_list_template)
Printing the insert_query
outputs
insert into t (a, b) values %s,%s
Now to the usual Psycopg
arguments substitution
cursor.execute(insert_query, data)
Or just testing what will be sent to the server
print (cursor.mogrify(insert_query, data).decode('utf8'))
Output:
insert into t (a, b) values (1, 'x'),(2, 'y')
New execute_values
method in Psycopg 2.7:
data = [(1,'x'), (2,'y')]
insert_query = 'insert into t (a, b) values %s'
psycopg2.extras.execute_values (
cursor, insert_query, data, template=None, page_size=100
)
The pythonic way of doing it in Psycopg 2.6:
data = [(1,'x'), (2,'y')]
records_list_template = ','.join(['%s'] * len(data))
insert_query = 'insert into t (a, b) values {}'.format(records_list_template)
cursor.execute(insert_query, data)
Explanation: If the data to be inserted is given as a list of tuples like in
data = [(1,'x'), (2,'y')]
then it is already in the exact required format as
the
values
syntax of theinsert
clause expects a list of records as in
insert into t (a, b) values (1, 'x'),(2, 'y')
Psycopg
adapts a Pythontuple
to a Postgresqlrecord
.
The only necessary work is to provide a records list template to be filled by psycopg
# We use the data list to be sure of the template length
records_list_template = ','.join(['%s'] * len(data))
and place it in the insert
query
insert_query = 'insert into t (a, b) values {}'.format(records_list_template)
Printing the insert_query
outputs
insert into t (a, b) values %s,%s
Now to the usual Psycopg
arguments substitution
cursor.execute(insert_query, data)
Or just testing what will be sent to the server
print (cursor.mogrify(insert_query, data).decode('utf8'))
Output:
insert into t (a, b) values (1, 'x'),(2, 'y')
edited May 18 '17 at 12:20
answered Jun 22 '15 at 16:50
Clodoaldo NetoClodoaldo Neto
73.3k14129184
73.3k14129184
How does the performance of this method compare with cur.copy_from?
– Michael Goldshteyn
Mar 3 '16 at 22:05
Here's a gist with a benchmark. copy_from scales to about 6.5X faster on my machine with 10M records.
– Joseph Sheedy
Jul 13 '16 at 0:28
Looks nice - I think you have a stray , at the end of your initial definition of insert_query (unless you were trying to make it a tuple?) and are missing a s after the % for %s also in the initial definition of insert_query.
– deadcode
May 18 '17 at 1:22
@deadcode Fixed, Thanks.
– Clodoaldo Neto
May 18 '17 at 12:20
1
usingexecute_values
I was able to get my system running at 1k records a minute up to 128k records a minute
– Conrad.Dean
Nov 15 '18 at 15:17
add a comment |
How does the performance of this method compare with cur.copy_from?
– Michael Goldshteyn
Mar 3 '16 at 22:05
Here's a gist with a benchmark. copy_from scales to about 6.5X faster on my machine with 10M records.
– Joseph Sheedy
Jul 13 '16 at 0:28
Looks nice - I think you have a stray , at the end of your initial definition of insert_query (unless you were trying to make it a tuple?) and are missing a s after the % for %s also in the initial definition of insert_query.
– deadcode
May 18 '17 at 1:22
@deadcode Fixed, Thanks.
– Clodoaldo Neto
May 18 '17 at 12:20
1
usingexecute_values
I was able to get my system running at 1k records a minute up to 128k records a minute
– Conrad.Dean
Nov 15 '18 at 15:17
How does the performance of this method compare with cur.copy_from?
– Michael Goldshteyn
Mar 3 '16 at 22:05
How does the performance of this method compare with cur.copy_from?
– Michael Goldshteyn
Mar 3 '16 at 22:05
Here's a gist with a benchmark. copy_from scales to about 6.5X faster on my machine with 10M records.
– Joseph Sheedy
Jul 13 '16 at 0:28
Here's a gist with a benchmark. copy_from scales to about 6.5X faster on my machine with 10M records.
– Joseph Sheedy
Jul 13 '16 at 0:28
Looks nice - I think you have a stray , at the end of your initial definition of insert_query (unless you were trying to make it a tuple?) and are missing a s after the % for %s also in the initial definition of insert_query.
– deadcode
May 18 '17 at 1:22
Looks nice - I think you have a stray , at the end of your initial definition of insert_query (unless you were trying to make it a tuple?) and are missing a s after the % for %s also in the initial definition of insert_query.
– deadcode
May 18 '17 at 1:22
@deadcode Fixed, Thanks.
– Clodoaldo Neto
May 18 '17 at 12:20
@deadcode Fixed, Thanks.
– Clodoaldo Neto
May 18 '17 at 12:20
1
1
using
execute_values
I was able to get my system running at 1k records a minute up to 128k records a minute– Conrad.Dean
Nov 15 '18 at 15:17
using
execute_values
I was able to get my system running at 1k records a minute up to 128k records a minute– Conrad.Dean
Nov 15 '18 at 15:17
add a comment |
Update with psycopg2 2.7:
The classic executemany()
is about 60 times slower than @ant32 's implementation (called "folded") as explained in this thread: https://www.postgresql.org/message-id/20170130215151.GA7081%40deb76.aryehleib.com
This implementation was added to psycopg2 in version 2.7 and is called execute_values()
:
from psycopg2.extras import execute_values
execute_values(cur,
"INSERT INTO test (id, v1, v2) VALUES %s",
[(1, 2, 3), (4, 5, 6), (7, 8, 9)])
Previous Answer:
To insert multiple rows, using the multirow VALUES
syntax with execute()
is about 10x faster than using psycopg2 executemany()
. Indeed, executemany()
just runs many individual INSERT
statements.
@ant32 's code works perfectly in Python 2. But in Python 3, cursor.mogrify()
returns bytes, cursor.execute()
takes either bytes or strings, and ','.join()
expects str
instance.
So in Python 3 you may need to modify @ant32 's code, by adding .decode('utf-8')
:
args_str = ','.join(cur.mogrify("(%s,%s,%s,%s,%s,%s,%s,%s,%s)", x).decode('utf-8') for x in tup)
cur.execute("INSERT INTO table VALUES " + args_str)
Or by using bytes (with b''
or b""
) only:
args_bytes = b','.join(cur.mogrify("(%s,%s,%s,%s,%s,%s,%s,%s,%s)", x) for x in tup)
cur.execute(b"INSERT INTO table VALUES " + args_bytes)
add a comment |
Update with psycopg2 2.7:
The classic executemany()
is about 60 times slower than @ant32 's implementation (called "folded") as explained in this thread: https://www.postgresql.org/message-id/20170130215151.GA7081%40deb76.aryehleib.com
This implementation was added to psycopg2 in version 2.7 and is called execute_values()
:
from psycopg2.extras import execute_values
execute_values(cur,
"INSERT INTO test (id, v1, v2) VALUES %s",
[(1, 2, 3), (4, 5, 6), (7, 8, 9)])
Previous Answer:
To insert multiple rows, using the multirow VALUES
syntax with execute()
is about 10x faster than using psycopg2 executemany()
. Indeed, executemany()
just runs many individual INSERT
statements.
@ant32 's code works perfectly in Python 2. But in Python 3, cursor.mogrify()
returns bytes, cursor.execute()
takes either bytes or strings, and ','.join()
expects str
instance.
So in Python 3 you may need to modify @ant32 's code, by adding .decode('utf-8')
:
args_str = ','.join(cur.mogrify("(%s,%s,%s,%s,%s,%s,%s,%s,%s)", x).decode('utf-8') for x in tup)
cur.execute("INSERT INTO table VALUES " + args_str)
Or by using bytes (with b''
or b""
) only:
args_bytes = b','.join(cur.mogrify("(%s,%s,%s,%s,%s,%s,%s,%s,%s)", x) for x in tup)
cur.execute(b"INSERT INTO table VALUES " + args_bytes)
add a comment |
Update with psycopg2 2.7:
The classic executemany()
is about 60 times slower than @ant32 's implementation (called "folded") as explained in this thread: https://www.postgresql.org/message-id/20170130215151.GA7081%40deb76.aryehleib.com
This implementation was added to psycopg2 in version 2.7 and is called execute_values()
:
from psycopg2.extras import execute_values
execute_values(cur,
"INSERT INTO test (id, v1, v2) VALUES %s",
[(1, 2, 3), (4, 5, 6), (7, 8, 9)])
Previous Answer:
To insert multiple rows, using the multirow VALUES
syntax with execute()
is about 10x faster than using psycopg2 executemany()
. Indeed, executemany()
just runs many individual INSERT
statements.
@ant32 's code works perfectly in Python 2. But in Python 3, cursor.mogrify()
returns bytes, cursor.execute()
takes either bytes or strings, and ','.join()
expects str
instance.
So in Python 3 you may need to modify @ant32 's code, by adding .decode('utf-8')
:
args_str = ','.join(cur.mogrify("(%s,%s,%s,%s,%s,%s,%s,%s,%s)", x).decode('utf-8') for x in tup)
cur.execute("INSERT INTO table VALUES " + args_str)
Or by using bytes (with b''
or b""
) only:
args_bytes = b','.join(cur.mogrify("(%s,%s,%s,%s,%s,%s,%s,%s,%s)", x) for x in tup)
cur.execute(b"INSERT INTO table VALUES " + args_bytes)
Update with psycopg2 2.7:
The classic executemany()
is about 60 times slower than @ant32 's implementation (called "folded") as explained in this thread: https://www.postgresql.org/message-id/20170130215151.GA7081%40deb76.aryehleib.com
This implementation was added to psycopg2 in version 2.7 and is called execute_values()
:
from psycopg2.extras import execute_values
execute_values(cur,
"INSERT INTO test (id, v1, v2) VALUES %s",
[(1, 2, 3), (4, 5, 6), (7, 8, 9)])
Previous Answer:
To insert multiple rows, using the multirow VALUES
syntax with execute()
is about 10x faster than using psycopg2 executemany()
. Indeed, executemany()
just runs many individual INSERT
statements.
@ant32 's code works perfectly in Python 2. But in Python 3, cursor.mogrify()
returns bytes, cursor.execute()
takes either bytes or strings, and ','.join()
expects str
instance.
So in Python 3 you may need to modify @ant32 's code, by adding .decode('utf-8')
:
args_str = ','.join(cur.mogrify("(%s,%s,%s,%s,%s,%s,%s,%s,%s)", x).decode('utf-8') for x in tup)
cur.execute("INSERT INTO table VALUES " + args_str)
Or by using bytes (with b''
or b""
) only:
args_bytes = b','.join(cur.mogrify("(%s,%s,%s,%s,%s,%s,%s,%s,%s)", x) for x in tup)
cur.execute(b"INSERT INTO table VALUES " + args_bytes)
edited Nov 24 '18 at 9:12
answered Aug 19 '16 at 8:48
Antoine DusséauxAntoine Dusséaux
1,70511126
1,70511126
add a comment |
add a comment |
A snippet from Psycopg2's tutorial page at Postgresql.org (see bottom):
A last item I would like to show you is how to insert multiple rows using a dictionary. If you had the following:
namedict = ({"first_name":"Joshua", "last_name":"Drake"},
{"first_name":"Steven", "last_name":"Foo"},
{"first_name":"David", "last_name":"Bar"})
You could easily insert all three rows within the dictionary by using:
cur = conn.cursor()
cur.executemany("""INSERT INTO bar(first_name,last_name) VALUES (%(first_name)s, %(last_name)s)""", namedict)
It doesn't save much code, but it definitively looks better.
29
This will run many individualINSERT
statements. Useful, but not the same as a single multi-VALUE
d insert.
– Craig Ringer
Apr 9 '13 at 2:26
add a comment |
A snippet from Psycopg2's tutorial page at Postgresql.org (see bottom):
A last item I would like to show you is how to insert multiple rows using a dictionary. If you had the following:
namedict = ({"first_name":"Joshua", "last_name":"Drake"},
{"first_name":"Steven", "last_name":"Foo"},
{"first_name":"David", "last_name":"Bar"})
You could easily insert all three rows within the dictionary by using:
cur = conn.cursor()
cur.executemany("""INSERT INTO bar(first_name,last_name) VALUES (%(first_name)s, %(last_name)s)""", namedict)
It doesn't save much code, but it definitively looks better.
29
This will run many individualINSERT
statements. Useful, but not the same as a single multi-VALUE
d insert.
– Craig Ringer
Apr 9 '13 at 2:26
add a comment |
A snippet from Psycopg2's tutorial page at Postgresql.org (see bottom):
A last item I would like to show you is how to insert multiple rows using a dictionary. If you had the following:
namedict = ({"first_name":"Joshua", "last_name":"Drake"},
{"first_name":"Steven", "last_name":"Foo"},
{"first_name":"David", "last_name":"Bar"})
You could easily insert all three rows within the dictionary by using:
cur = conn.cursor()
cur.executemany("""INSERT INTO bar(first_name,last_name) VALUES (%(first_name)s, %(last_name)s)""", namedict)
It doesn't save much code, but it definitively looks better.
A snippet from Psycopg2's tutorial page at Postgresql.org (see bottom):
A last item I would like to show you is how to insert multiple rows using a dictionary. If you had the following:
namedict = ({"first_name":"Joshua", "last_name":"Drake"},
{"first_name":"Steven", "last_name":"Foo"},
{"first_name":"David", "last_name":"Bar"})
You could easily insert all three rows within the dictionary by using:
cur = conn.cursor()
cur.executemany("""INSERT INTO bar(first_name,last_name) VALUES (%(first_name)s, %(last_name)s)""", namedict)
It doesn't save much code, but it definitively looks better.
edited Oct 2 '15 at 1:22
Eric Leschinski
87.6k38323274
87.6k38323274
answered Dec 14 '11 at 11:06
ptrnptrn
1,75032228
1,75032228
29
This will run many individualINSERT
statements. Useful, but not the same as a single multi-VALUE
d insert.
– Craig Ringer
Apr 9 '13 at 2:26
add a comment |
29
This will run many individualINSERT
statements. Useful, but not the same as a single multi-VALUE
d insert.
– Craig Ringer
Apr 9 '13 at 2:26
29
29
This will run many individual
INSERT
statements. Useful, but not the same as a single multi-VALUE
d insert.– Craig Ringer
Apr 9 '13 at 2:26
This will run many individual
INSERT
statements. Useful, but not the same as a single multi-VALUE
d insert.– Craig Ringer
Apr 9 '13 at 2:26
add a comment |
cursor.copy_from is the fastest solution I've found for bulk inserts by far. Here's a gist I made containing a class named IteratorFile which allows an iterator yielding strings to be read like a file. We can convert each input record to a string using a generator expression. So the solution would be
args = [(1,2), (3,4), (5,6)]
f = IteratorFile(("{}t{}".format(x[0], x[1]) for x in args))
cursor.copy_from(f, 'table_name', columns=('a', 'b'))
For this trivial size of args it won't make much of a speed difference, but I see big speedups when dealing with thousands+ of rows. It will also be more memory efficient than building a giant query string. An iterator would only ever hold one input record in memory at a time, where at some point you'll run out of memory in your Python process or in Postgres by building the query string.
2
Here is a benchmark comparing copy_from/IteratorFile with a query builder solution. copy_from scales to about 6.5X faster on my machine with 10M records.
– Joseph Sheedy
Sep 30 '16 at 17:49
3
do you have to dick around with escaping strings and timestamps etc?
– CpILL
Jun 13 '17 at 9:35
Yes, you'll have to make sure you have a well formed TSV records.
– Joseph Sheedy
Jun 14 '17 at 2:51
add a comment |
cursor.copy_from is the fastest solution I've found for bulk inserts by far. Here's a gist I made containing a class named IteratorFile which allows an iterator yielding strings to be read like a file. We can convert each input record to a string using a generator expression. So the solution would be
args = [(1,2), (3,4), (5,6)]
f = IteratorFile(("{}t{}".format(x[0], x[1]) for x in args))
cursor.copy_from(f, 'table_name', columns=('a', 'b'))
For this trivial size of args it won't make much of a speed difference, but I see big speedups when dealing with thousands+ of rows. It will also be more memory efficient than building a giant query string. An iterator would only ever hold one input record in memory at a time, where at some point you'll run out of memory in your Python process or in Postgres by building the query string.
2
Here is a benchmark comparing copy_from/IteratorFile with a query builder solution. copy_from scales to about 6.5X faster on my machine with 10M records.
– Joseph Sheedy
Sep 30 '16 at 17:49
3
do you have to dick around with escaping strings and timestamps etc?
– CpILL
Jun 13 '17 at 9:35
Yes, you'll have to make sure you have a well formed TSV records.
– Joseph Sheedy
Jun 14 '17 at 2:51
add a comment |
cursor.copy_from is the fastest solution I've found for bulk inserts by far. Here's a gist I made containing a class named IteratorFile which allows an iterator yielding strings to be read like a file. We can convert each input record to a string using a generator expression. So the solution would be
args = [(1,2), (3,4), (5,6)]
f = IteratorFile(("{}t{}".format(x[0], x[1]) for x in args))
cursor.copy_from(f, 'table_name', columns=('a', 'b'))
For this trivial size of args it won't make much of a speed difference, but I see big speedups when dealing with thousands+ of rows. It will also be more memory efficient than building a giant query string. An iterator would only ever hold one input record in memory at a time, where at some point you'll run out of memory in your Python process or in Postgres by building the query string.
cursor.copy_from is the fastest solution I've found for bulk inserts by far. Here's a gist I made containing a class named IteratorFile which allows an iterator yielding strings to be read like a file. We can convert each input record to a string using a generator expression. So the solution would be
args = [(1,2), (3,4), (5,6)]
f = IteratorFile(("{}t{}".format(x[0], x[1]) for x in args))
cursor.copy_from(f, 'table_name', columns=('a', 'b'))
For this trivial size of args it won't make much of a speed difference, but I see big speedups when dealing with thousands+ of rows. It will also be more memory efficient than building a giant query string. An iterator would only ever hold one input record in memory at a time, where at some point you'll run out of memory in your Python process or in Postgres by building the query string.
edited Sep 30 '16 at 20:25
answered Jun 9 '15 at 1:06
Joseph SheedyJoseph Sheedy
3,58121726
3,58121726
2
Here is a benchmark comparing copy_from/IteratorFile with a query builder solution. copy_from scales to about 6.5X faster on my machine with 10M records.
– Joseph Sheedy
Sep 30 '16 at 17:49
3
do you have to dick around with escaping strings and timestamps etc?
– CpILL
Jun 13 '17 at 9:35
Yes, you'll have to make sure you have a well formed TSV records.
– Joseph Sheedy
Jun 14 '17 at 2:51
add a comment |
2
Here is a benchmark comparing copy_from/IteratorFile with a query builder solution. copy_from scales to about 6.5X faster on my machine with 10M records.
– Joseph Sheedy
Sep 30 '16 at 17:49
3
do you have to dick around with escaping strings and timestamps etc?
– CpILL
Jun 13 '17 at 9:35
Yes, you'll have to make sure you have a well formed TSV records.
– Joseph Sheedy
Jun 14 '17 at 2:51
2
2
Here is a benchmark comparing copy_from/IteratorFile with a query builder solution. copy_from scales to about 6.5X faster on my machine with 10M records.
– Joseph Sheedy
Sep 30 '16 at 17:49
Here is a benchmark comparing copy_from/IteratorFile with a query builder solution. copy_from scales to about 6.5X faster on my machine with 10M records.
– Joseph Sheedy
Sep 30 '16 at 17:49
3
3
do you have to dick around with escaping strings and timestamps etc?
– CpILL
Jun 13 '17 at 9:35
do you have to dick around with escaping strings and timestamps etc?
– CpILL
Jun 13 '17 at 9:35
Yes, you'll have to make sure you have a well formed TSV records.
– Joseph Sheedy
Jun 14 '17 at 2:51
Yes, you'll have to make sure you have a well formed TSV records.
– Joseph Sheedy
Jun 14 '17 at 2:51
add a comment |
All of these techniques are called 'Extended Inserts" in Postgres terminology, and as of the 24th of November 2016, it's still a ton faster than psychopg2's executemany() and all the other methods listed in this thread (which i tried before coming to this answer).
Here's some code which doesnt use cur.mogrify and is nice and simply to get your head around:
valueSQL = [ '%s', '%s', '%s', ... ] # as many as you have columns.
sqlrows =
rowsPerInsert = 3 # more means faster, but with diminishing returns..
for row in getSomeData:
# row == [1, 'a', 'yolo', ... ]
sqlrows += row
if ( len(sqlrows)/len(valueSQL) ) % rowsPerInsert == 0:
# sqlrows == [ 1, 'a', 'yolo', 2, 'b', 'swag', 3, 'c', 'selfie' ]
insertSQL = 'INSERT INTO "twitter" VALUES ' + ','.join(['(' + ','.join(valueSQL) + ')']*rowsPerInsert)
cur.execute(insertSQL, sqlrows)
con.commit()
sqlrows =
insertSQL = 'INSERT INTO "twitter" VALUES ' + ','.join(['(' + ','.join(valueSQL) + ')']*len(sqlrows))
cur.execute(insertSQL, sqlrows)
con.commit()
But it should be noted that if you can use copy_from(), you should use copy_from ;)
Bringing up from the dead, but what happens in the situation of the last few rows? I assume you actually run that final clause again on the last remaining rows, in the case you have an even number of rows?
– mcpeterson
Nov 23 '16 at 23:03
Correct, sorry I must have forgotten to do that when i wrote the example - thats pretty stupid of me. Not doing so wouldn't have given people an error, which makes me worried how many people copy/pasted the solution and went about their business..... Anyway, very grateful mcpeterson - thank you!
– J.J
Nov 24 '16 at 1:49
edited Nov 24 '16 at 1:47
answered May 4 '15 at 15:28
J.J
To insert a list of rows into the database nicely, using a user-given batch size and psycopg2:
def get_batch(iterable, size=100):
    for i in range(0, len(iterable), size):
        yield iterable[i: i + size]

def insert_rows_batch(table, rows, batch_size=500, target_fields=None):
    """
    A utility method to insert a batch of tuples (rows) into a table.
    NOTE: Handle the data type for fields in rows yourself, as per your
    table columns' types.

    :param table: Name of the target table
    :type table: str
    :param rows: The rows to insert into the table
    :type rows: iterable of tuples
    :param batch_size: The number of rows to insert at a time
    :type batch_size: int
    :param target_fields: The names of the columns to fill in the table
    :type target_fields: iterable of strings
    """
    conn = cur = None
    if target_fields:
        target_fields = ", ".join(target_fields)
        target_fields = "({})".format(target_fields)
    else:
        target_fields = ''

    conn = get_conn()  # get connection using psycopg2
    if conn:
        cur = conn.cursor()
        count = 0

        for mini_batch in get_batch(rows, batch_size):
            mini_batch_size = len(mini_batch)
            count += mini_batch_size
            record_template = ','.join(["%s"] * mini_batch_size)
            sql = "INSERT INTO {0} {1} VALUES {2};".format(
                table,
                target_fields,
                record_template)
            cur.execute(sql, mini_batch)
            conn.commit()
            print("Loaded {} rows into {} so far".format(count, table))
        print("Done loading. Loaded a total of {} rows".format(count))

    if cur: cur.close()
    if conn: conn.close()
If you also want UPSERT (insert + update) in Postgres with batches: postgres_utilities
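A hypothetical call (the table and column names here are made up, and get_conn() is assumed to return a working psycopg2 connection) would look like:
rows = [(1, 'alice'), (2, 'bob'), (3, 'carol')]
insert_rows_batch('users', rows, batch_size=2, target_fields=['id', 'name'])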
edited Jan 7 at 5:42
answered Apr 6 '18 at 11:03
MANU
Another nice and efficient approach is to pass the rows for insertion as a single argument, which is an array of JSON objects.
E.g. you pass the argument:
[ {"id": 18, "score": 1}, {"id": 19, "score": 5} ]
It is an array, which may contain any number of objects inside.
Then your SQL looks like:
INSERT INTO links (parent_id, child_id, score)
SELECT 123, (r->>'id')::int, (r->>'score')::int
FROM unnest($1::json) as r
Notice: your Postgres must be new enough to support json.
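Since this question is about psycopg2, here is a rough sketch of the same idea with psycopg2-style %s placeholders; the links table comes from the SQL above, the connection string is made up, and in plain PostgreSQL the set-returning function for a JSON array is json_array_elements:
import json
import psycopg2

conn = psycopg2.connect("dbname=test")  # hypothetical connection
cur = conn.cursor()

rows = [{"id": 18, "score": 1}, {"id": 19, "score": 5}]

# Serialize the whole list as one JSON argument and unpack it server-side.
cur.execute("""
    INSERT INTO links (parent_id, child_id, score)
    SELECT 123, (r->>'id')::int, (r->>'score')::int
    FROM json_array_elements(%s::json) AS r
""", (json.dumps(rows),))
conn.commit()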
answered Oct 18 '15 at 11:04
Dan Key
I've been using ant32's answer above for several years. However, I've found that it throws an error in Python 3 because mogrify returns a byte string.
Converting explicitly to byte strings is a simple solution for making the code Python 3 compatible:
args_str = b','.join(cur.mogrify("(%s,%s,%s,%s,%s,%s,%s,%s,%s)", x) for x in tup)
cur.execute(b"INSERT INTO table VALUES " + args_str)
answered Jul 27 '17 at 1:29
jprockbelly
If you're using SQLAlchemy, you don't need to mess with hand-crafting the string because SQLAlchemy supports generating a multi-row VALUES clause for a single INSERT statement:
rows = []
for i, name in enumerate(rawdata):
    row = {
        'id': i,
        'name': name,
        'valid': True,
    }
    rows.append(row)

if len(rows) > 0:  # INSERT fails if no rows
    insert_query = SQLAlchemyModelName.__table__.insert().values(rows)
    session.execute(insert_query)
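For intuition, what gets emitted is a single multi-row INSERT of roughly this shape (schematic only; the table name and placeholder style depend on your model and DBAPI):
INSERT INTO some_table (id, name, valid)
VALUES (%s, %s, %s), (%s, %s, %s), (%s, %s, %s), ...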
Under the hood SQLAlchemy uses psychopg2's executemany() for calls like this and therefore this answer will have severe performance issues for large queries. See the execute method docs.sqlalchemy.org/en/latest/orm/session_api.html.
– sage88
Jan 5 '17 at 22:48
I don't think that's the case. It's been a bit since I looked at this, but IIRC, this is actually building a single insertion statement in the insert_query line. Then, session.execute() is just calling psycopg2's execute() statement with a single massive string. So the "trick" is building the entire insertion statement object first. I am using this to insert 200,000 rows at a time and saw massive performance increases using this code compared to the normal executemany().
– Jeff Widman
Jan 6 '17 at 1:13
The SQLAlchemy doc you linked to has a section that shows exactly how this works and even says: "It is essential to note that passing multiple values is NOT the same as using traditional executemany() form". So it is explicitly calling out that this works.
– Jeff Widman
Jan 6 '17 at 1:23
I stand corrected. I didn't notice your usage of the values() method (without it SQLAlchemy just does executemany). I would say edit the answer to include a link to that doc so that I can change my vote, but obviously you've already included it. Perhaps mention that this is not the same thing as calling an insert() with execute() with a list of dicts?
– sage88
Jan 6 '17 at 6:15
edited Oct 19 '17 at 2:47
answered Apr 27 '16 at 2:48
Jeff Widman
If you want to insert multiple rows within one insert statement (assuming you are not using an ORM), the easiest way so far for me is to use a list of dictionaries. Here is an example:
t = [{'id': 1, 'start_date': '2015-07-19 00:00:00', 'end_date': '2015-07-20 00:00:00', 'campaignid': 6},
     {'id': 2, 'start_date': '2015-07-19 00:00:00', 'end_date': '2015-07-20 00:00:00', 'campaignid': 7},
     {'id': 3, 'start_date': '2015-07-19 00:00:00', 'end_date': '2015-07-20 00:00:00', 'campaignid': 8}]

conn.execute("""insert into campaign_dates
                (id, start_date, end_date, campaignid)
                values (%(id)s, %(start_date)s, %(end_date)s, %(campaignid)s);""",
             t)
As you can see only one query will be executed:
INFO sqlalchemy.engine.base.Engine insert into campaign_dates (id, start_date, end_date, campaignid) values (%(id)s, %(start_date)s, %(end_date)s, %(campaignid)s);
INFO sqlalchemy.engine.base.Engine [{'campaignid': 6, 'id': 1, 'end_date': '2015-07-20 00:00:00', 'start_date': '2015-07-19 00:00:00'}, {'campaignid': 7, 'id': 2, 'end_date': '2015-07-20 00:00:00', 'start_date': '2015-07-19 00:00:00'}, {'campaignid': 8, 'id': 3, 'end_date': '2015-07-20 00:00:00', 'start_date': '2015-07-19 00:00:00'}]
INFO sqlalchemy.engine.base.Engine COMMIT
Showing logging from the sqlalchemy engine is NOT a demonstration of only running a single query, it just means that the sqlalchemy engine ran one command. Under the hood this is using psychopg2's executemany which is very inefficient. See the execute method docs.sqlalchemy.org/en/latest/orm/session_api.html.
– sage88
Jan 5 '17 at 23:06
answered Nov 16 '15 at 15:21
Alex
Finally, in SQLAlchemy 1.2 a new implementation was added that uses psycopg2.extras.execute_batch() instead of executemany() when you initialize your engine with use_batch_mode=True, like:
engine = create_engine(
"postgresql+psycopg2://scott:tiger@host/dbname",
use_batch_mode=True)
http://docs.sqlalchemy.org/en/latest/changelog/migration_12.html#change-4109
Then anyone who has to use SQLAlchemy won't need to try different combinations of SQLAlchemy, psycopg2 and direct SQL together.
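A hypothetical sketch of what that buys you: with such an engine, an ordinary executemany-style call through the Core is routed to execute_batch() under the hood (the table object and values here are made up):
with engine.connect() as conn:
    conn.execute(
        my_table.insert(),
        [{"a": 1, "b": 2}, {"a": 3, "b": 4}, {"a": 5, "b": 6}],
    )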
answered Jan 2 '18 at 2:01
user2189731
Using aiopg, the snippet below works perfectly fine:
# items = [10, 11, 12, 13]
# gid = 1
tup = [(gid, pid) for pid in items]
args_str = ",".join([str(s) for s in tup])
# insert into group values (1, 10), (1, 11), (1, 12), (1, 13)
yield from cur.execute("INSERT INTO group VALUES " + args_str)
This method is not safe from SQL injections. As psycopg2 documentation states (that aiopg2 links to): 'Never, never, NEVER use Python string concatenation (+) or string parameters interpolation (%) to pass variables to a SQL query string. Not even at gunpoint.'
– Michał Pawłowski
Sep 10 '15 at 16:13
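A safer variant of the same snippet (a sketch only, mirroring the table and variables above) lets the driver do the quoting, by building only the placeholders by hand and passing the flattened values as parameters:
# items = [10, 11, 12, 13]
# gid = 1
placeholders = ",".join(["(%s, %s)"] * len(items))
params = [v for pid in items for v in (gid, pid)]
yield from cur.execute("INSERT INTO group VALUES " + placeholders, params)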
answered Jul 16 '15 at 10:41
Nihal Sharma