SQL databases¶
Note
If you need to create temporary tables, use CREATE TEMPORARY TABLE
. If you need to create persistent tables, create a new schema first; do not create tables in the public
schema.
Connect to a database¶
Note
To query the database directly from your personal computer, request a personal SQL user account from James or Yohanna, and configure psql, Beekeeper Studio and/or pgAdmin to use it.
OCP has a main database on the postgres.kingfisher.open-contracting.org
server.
psql¶
If PostgreSQL is installed, you can use psql, PostgreSQL’s interactive terminal, from the command-line.
For security, remember to set sslmode
to require
.
psql "dbname=DBNAME user=USERNAME host=HOST sslmode=require"
For example:
psql "dbname=kingfisher_process user=jmckinney host=postgres.kingfisher.open-contracting.org sslmode=require"
Instead of entering your password each time, you can add your credentials to the PostgreSQL Password File, replacing USER
and PASS
:
echo 'postgres.kingfisher.open-contracting.org:5432:kingfisher_process:USERNAME:PASSWORD' >> ~/.pgpass
Then, set the permissions of the ~/.pgpass
file:
chmod 600 ~/.pgpass
Tip
If you are logged into the postgres.kingfisher.open-contracting.org
server, you can also run:
psql kingfisher_process
Beekeeper Studio¶
Beekeeper Studio is a cross-platform app for querying databases. Use the open-source Community Edition.
For security, remember to check Enable SSL.
Select “Postgres” from Connection Type
Set the Host, e.g. “postgres.kingfisher.open-contracting.org”
Check Enable SSL
Set the User
Set the Password
Set the Default Database, e.g. “kingfisher_process”
Click the Test button
Then, either click the Connect button or set the Connection Name and click Save.
pgAdmin¶
pgAdmin is a locally hosted web interface for querying databases.
For security, remember to set SSL mode to “Require”.
Open the Object > Create > Server… menu item
Set the Name, e.g. “Kingfisher”
Click the Connection tab
Set the Host name/address, e.g. “postgres.kingfisher.open-contracting.org”
Set the Username
Set the Password
Check Save password?
Click the SSL tab
Set SSL mode to “Require”
Click the Save button
To avoid unnecessary queries to the database, please make these one-time configuration changes:
Open the File > Preferences menu item
Click Display under Dashboards in the sidebar
Uncheck Show activity?
Uncheck Show graphs?
Click the Save button
Google Colaboratory¶
Google Colaboratory is an executable document to write, run and share code in Google Drive, similar to Jupyter Notebook.
Install the ocdskingfishercolab Python package, which installs the ipython-sql Python package.
For security, remember to set sslmode
to 'require'
.
%sql postgresql://USERNAME:PASSWORD@postgres.kingfisher.open-contracting.org/kingfisher_process?sslmode=require
Note
There is an open issue to use Colaboratory Forms to store credentials.
Python¶
Python is the programming language in which many OCDS tools are written.
Install the psycopg2 Python package.
For security, remember to set sslmode
to 'require'
.
import psycopg2
conn = psycopg2.connect(
dbname='kingfisher_process',
user='USER',
password='PASSWORD',
host='postgres.kingfisher.open-contracting.org',
sslmode='require')
Improve slow queries¶
See also
Define tables, Define indexes and Paginate rows under PostgreSQL in the Software Development Handbook
To see the queries running under your user account, run:
SELECT pid, client_addr, usename, state, wait_event_type, NOW() - query_start AS time, query
FROM pg_stat_activity
WHERE query <> ''
ORDER BY time DESC;
Find your username in the usename
column. The time
column indicates how long the query has run for. If it is longer than one minute, it is slow.
In general, a query is slow due to not using indexes for its JOIN
and/or WHERE
clauses. Using indexes can decrease the running time from hours/days to seconds.
Tip
For tables created by Kingfisher Summarize, always JOIN
on the id
column, which has an index, and never on the ocid
column, which has no index.
To see the indexes on a table, run \d TABLE_NAME
. A view cannot have indexes; instead, check the indexes on the tables it queries. To see a view’s query, run \d+ VIEW_NAME
.
If the columns in the query’s JOIN
and/or WHERE
clauses aren’t indexed, and you can’t change the clauses to use columns that are indexed, open an issue on GitHub about adding an index to the table.
To stop a query, run, replacing PID
with the appropriate value from the pid
column:
SELECT pg_cancel_backend(PID)