python - PostgreSQL connection closes unexpectedly when doing a large insert -
i'm populating postgresql table ~11.000.000 rows have been selected before database. i'm using python , psycopg2. whole process takes estimated 1.5 hours complete. however, after ~30 minutes "connection closed unexpectedly" exception. source code looks this:
incursor = indb.cursor() incursor.execute("select ...") indb.commit() # (1) close transaction outcursor = outdb.cursor() rows = 0 (col1, col2, col3) in incursor: # incursor contains ~11.000.000 rows outcursor.execute("insert ...", (col1, col2, col3)) # fails after ~30 minutes row += 1 if row % 100 == 0: # (2) write data every 100 rows outcursor.close() outdb.commit() outcursor = outdb.cursor() incursor.close() outcursor.close() outdb.commit()
i inserted (1)
, (2)
after first tries failed, assuming open transaction has upper time limit of ~30 minutes or cursor has upper limit of pending inserts. seems none of assumptions true , error lies somewhere else.
both databases stored on virtualbox machine connect via port forwarding host. run program on host machine.
both database testing purposes , have no other connections manage. maybe have rewrite problem around this, need time-consuming inserts elsewhere (running approx. days) i'm concerned hidden time limits in psycopg2
or postgresql.
i don't know of such "hidden" timeout in postgresql itself. postgresql have statement_timeout
, if hit should error: canceling statement due statement timeout
in server log (and log cancelled statement too). can't speak psycopg2. check server log looks relevant.
maybe it's networking issue? long-running statement tcp connection stays idle long time. perhaps port forwarding purges connections idle more 30 minutes? maybe tcp connections aren't using keepalive. postgresql has settings tuning tcp keepalive (tcp_keepalives_interval etc.) , may need kernel/networking configuration make sure enabled.
e.g. tried connecting own machine here , tcp_keepalives_interval
defaults 7200, 2 hours. if port forwarding cuts off after 30 minutes, default won't do. can override setting used in client connection string (assuming can twiddle conninfo string directly), or set guc variable in user/database properties or postgresql.conf.
see:
- http://www.postgresql.org/docs/9.0/static/runtime-config-connection.html - server settings
- http://www.postgresql.org/docs/9.0/static/libpq-connect.html - client connection string settings
Comments
Post a Comment