SQL databases
Note
If you need to create temporary tables, use CREATE TEMPORARY TABLE. If you need to create persistent tables, create a new schema first; do not create tables in the public schema.
Connect to a database
Note
To query the database directly from your personal computer, request a personal SQL user account from James or Yohanna, and configure psql, Beekeeper Studio and/or pgAdmin to use it.
For most use cases, you can instead query the database from Redash. To request an account, email data@open-contracting.org.
OCP has a main database on the postgres.kingfisher.open-contracting.org server.
psql
If PostgreSQL is installed, you can use psql, PostgreSQL’s interactive terminal, from the command line.
For security, remember to set sslmode to require.
psql "dbname=DBNAME user=USERNAME host=HOST sslmode=require"
For example:
psql "dbname=kingfisher_process user=jmckinney host=postgres.kingfisher.open-contracting.org sslmode=require"
Instead of entering your password each time, you can add your credentials to the PostgreSQL Password File, replacing USER and PASS:
echo 'postgres.kingfisher.open-contracting.org:5432:kingfisher_process:USER:PASS' >> ~/.pgpass
Then, set the permissions of the ~/.pgpass file:
chmod 600 ~/.pgpass
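The two shell commands above can also be sketched in Python, which may be convenient when a password contains characters that are awkward to quote in a shell. This is a minimal sketch, not an official tool: the function name add_pgpass_entry and its path parameter are hypothetical. It relies on the password file format, in which a colon or backslash inside a field is escaped with a backslash.

```python
import os
from pathlib import Path


def add_pgpass_entry(host, port, database, user, password, path=None):
    """Append one host:port:database:user:password line, then restrict the file to its owner.

    Hypothetical helper; `path` defaults to ~/.pgpass.
    """
    pgpass = Path(path) if path else Path.home() / ".pgpass"
    # The password file format requires ':' and '\' inside a field to be backslash-escaped.
    fields = [
        str(field).replace("\\", "\\\\").replace(":", "\\:")
        for field in (host, port, database, user, password)
    ]
    with open(pgpass, "a", encoding="utf-8") as f:
        f.write(":".join(fields) + "\n")
    # Equivalent to `chmod 600`: readable and writable by the owner only.
    os.chmod(pgpass, 0o600)
    return pgpass
```

For example, add_pgpass_entry('postgres.kingfisher.open-contracting.org', 5432, 'kingfisher_process', 'USER', 'PASS') appends the same line as the echo command above.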
Tip
If you are logged into the postgres.kingfisher.open-contracting.org server, you can also run:
psql kingfisher_process
Beekeeper Studio
Beekeeper Studio is a cross-platform app for querying databases.
For security, remember to check Enable SSL.
Select “Postgres” from Connection Type
Set the Host, e.g. “postgres.kingfisher.open-contracting.org”
Check Enable SSL
Set the User
Set the Password
Set the Default Database, e.g. “kingfisher_process”
Click the Test button
Then, either click the Connect button or set the Connection Name and click Save.
pgAdmin
pgAdmin is a locally hosted web interface for querying databases.
For security, remember to set SSL mode to “Require”.
Open the Object > Create > Server… menu item
Set the Name, e.g. “Kingfisher”
Click the Connection tab
Set the Host name/address, e.g. “postgres.kingfisher.open-contracting.org”
Set the Username
Set the Password
Check Save password?
Click the SSL tab
Set SSL mode to “Require”
Click the Save button
To avoid unnecessary queries to the database, please make these one-time configuration changes:
Open the File > Preferences menu item
Click Display under Dashboards in the sidebar
Uncheck Show activity?
Uncheck Show graphs?
Click the Save button
Google Colaboratory
Google Colaboratory provides executable documents in which you can write, run and share code in Google Drive, similar to Jupyter Notebook.
Install the ocdskingfishercolab Python package, which installs the ipython-sql Python package.
For security, remember to set sslmode to 'require'.
%sql postgresql://USER:PASSWORD@postgres.kingfisher.open-contracting.org/kingfisher_process?sslmode=require
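If a password contains characters such as @, : or /, it needs to be percent-encoded before it is placed in a connection URL like the one above. The following is a minimal sketch using only the standard library; the function name connection_url is hypothetical, and the default host and database are simply the ones used in this page's examples.

```python
from urllib.parse import quote


def connection_url(user, password,
                   host="postgres.kingfisher.open-contracting.org",
                   database="kingfisher_process"):
    """Build a postgresql:// URL with percent-encoded credentials and sslmode=require.

    Hypothetical helper, for illustration only.
    """
    # safe="" makes quote() encode "/" as well, which it leaves untouched by default.
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}/{database}?sslmode=require"
    )
```

The returned string can be used wherever the URL above is expected.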
Note
There is an open issue to use Colaboratory Forms to store credentials.
Python
Python is the programming language in which many OCDS tools are written.
Install the psycopg2 Python package.
For security, remember to set sslmode to 'require'.
import psycopg2

conn = psycopg2.connect(
    dbname='kingfisher_process',
    user='USER',
    password='PASSWORD',
    host='postgres.kingfisher.open-contracting.org',
    sslmode='require')
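Once connected, a query can be run through a cursor. The following is a minimal sketch that works with any DB-API connection, including the conn object created above; the helper name fetch_all is hypothetical and is not part of psycopg2.

```python
from contextlib import closing


def fetch_all(conn, sql, params=None):
    """Run one query on an open DB-API connection and return all rows as a list.

    Hypothetical helper. With psycopg2, pass values through `params`
    using %s placeholders instead of formatting them into the SQL string.
    """
    # closing() guarantees the cursor is closed, even if the query raises an error.
    with closing(conn.cursor()) as cur:
        if params is None:
            cur.execute(sql)
        else:
            cur.execute(sql, params)
        return cur.fetchall()
```

For example, fetch_all(conn, 'SELECT 1') returns a list containing a single one-column row.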
Improve slow queries
If a query is slow (more than 1 minute), it most likely is not using an index for its JOIN and WHERE clauses. In practice, using indexes can decrease the running time from hours/days to seconds.
Note
In a given clause, all columns from the same table must be in the same index. To see a table’s indexes, run \d TABLE_NAME. A view cannot have indexes; you must instead check the tables it queries. To see a view’s query, run \d+ VIEW_NAME.
Tip
For tables created by Kingfisher Summarize, always JOIN on the id column, which has an index, and never on the ocid column, which has no index.
To see the queries running under your user account, run:
SELECT pid, client_addr, usename, state, wait_event_type, NOW() - query_start AS time, query
FROM pg_stat_activity
WHERE query <> ''
ORDER BY time DESC;
Find your username in the usename column. The time column indicates how long the query has run for. If it is longer than one minute, consider using EXPLAIN to figure out why.
Note
When using a tool like pgMustard or Dalibo, follow these instructions to get the query plan. For tools other than pgMustard, if you don’t know how slow your query is, omit ANALYZE and BUFFERS from the EXPLAIN parameters.
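The reason for omitting ANALYZE is that EXPLAIN with ANALYZE actually executes the query, so a query of unknown cost can run for as long as the original. The choice can be sketched as a small helper that builds the statement. The function name explain_statement and its measure flag are hypothetical, and FORMAT JSON is an assumption; check which options and output format your tool expects.

```python
def explain_statement(query, measure=False):
    """Wrap a query in EXPLAIN and return the statement as a string.

    Hypothetical helper. measure=True adds ANALYZE and BUFFERS, which
    makes PostgreSQL execute the query in order to report real timings.
    """
    options = ["ANALYZE", "BUFFERS"] if measure else []
    # Assumption: the plan visualizer accepts JSON output.
    options.append("FORMAT JSON")
    # Drop surrounding whitespace and any trailing semicolon before wrapping.
    body = query.strip().rstrip(";").rstrip()
    return f"EXPLAIN ({', '.join(options)}) {body};"
```

With measure left as False, the statement only asks the planner for its estimated plan and does not run the query.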
If you frequently filter on the same columns in ON or WHERE clauses, open an issue on GitHub to add an index to the table. (In most cases, this should be a multi-column index, with the most common column as the index’s first column.)
To stop a query, run the following, replacing PID with the appropriate value from the pid column:
SELECT pg_cancel_backend(PID);
Note
If you are running a query via Redash, it will not appear in the results.
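When working from Python, the cancellation call shown earlier can be sketched as a function over an open psycopg2 connection. The name cancel_query is hypothetical. pg_cancel_backend returns a boolean indicating whether the cancel signal was sent.

```python
from contextlib import closing


def cancel_query(conn, pid):
    """Ask PostgreSQL to cancel the query running in backend `pid`.

    Hypothetical helper; expects a connection that uses %s placeholders,
    as psycopg2 does. Returns True if the cancel signal was sent.
    """
    if not isinstance(pid, int):
        raise TypeError("pid must be an integer from the pid column")
    with closing(conn.cursor()) as cur:
        # The pid is passed as a parameter rather than formatted into the SQL.
        cur.execute("SELECT pg_cancel_backend(%s)", (pid,))
        return bool(cur.fetchone()[0])
```

The pid argument is the value from the pid column of the pg_stat_activity query above.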