SQL databases

Note

If you need to create temporary tables, use CREATE TEMPORARY TABLE. If you need to create persistent tables, create a new schema first; do not create tables in the public schema.
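The two cases above can be sketched as the SQL statements they produce. This is a minimal, stdlib-only sketch. The helper names and the schema and table names (`my_analysis`, `recent`) are hypothetical. In real code, psycopg2's `sql.Identifier` is the usual way to quote identifiers. The `select` argument is inserted as-is, so pass only trusted SQL.

```python
def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier: wrap it in double quotes, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def temporary_table_sql(table: str, select: str) -> str:
    """A temporary table is dropped automatically at the end of the session."""
    return f"CREATE TEMPORARY TABLE {quote_ident(table)} AS {select};"


def persistent_table_sql(schema: str, table: str, select: str) -> list:
    """Create your own schema first, then create the table inside it, not in the public schema."""
    return [
        f"CREATE SCHEMA IF NOT EXISTS {quote_ident(schema)};",
        f"CREATE TABLE {quote_ident(schema)}.{quote_ident(table)} AS {select};",
    ]


# Hypothetical names, for illustration only.
print(temporary_table_sql("recent", "SELECT 1 AS x"))
for statement in persistent_table_sql("my_analysis", "recent", "SELECT 1 AS x"):
    print(statement)
```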

Connect to a database

Note

To query the database directly from your personal computer, request a personal SQL user account from James or Yohanna, and configure psql, Beekeeper Studio and/or pgAdmin to use it.

For most use cases, you can instead query the database from Redash. To request an account, email data@open-contracting.org.

OCP has a main database on the postgres.kingfisher.open-contracting.org server.

psql

If PostgreSQL is installed, you can use psql, PostgreSQL’s interactive terminal, from the command line.

For security, remember to set sslmode to require.

psql "dbname=DBNAME user=USERNAME host=HOST sslmode=require"

For example:

psql "dbname=kingfisher_process user=jmckinney host=postgres.kingfisher.open-contracting.org sslmode=require"
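If you build this connection string in a script, the values must follow libpq's keyword/value rules. An empty value, or a value containing spaces, must be wrapped in single quotes, and single quotes and backslashes inside a value must be escaped with a backslash. This is a minimal sketch of those rules. The function names are hypothetical.

```python
def quote_value(value: str) -> str:
    """Quote a value for a libpq keyword/value connection string, if needed."""
    # Leave simple values bare; wrap empty values, or values containing
    # whitespace, single quotes or backslashes, in single quotes.
    if value and not any(ch.isspace() or ch in "'\\" for ch in value):
        return value
    # Escape backslashes first, then single quotes.
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def conninfo(**params: str) -> str:
    """Join keyword=value pairs into a single connection string."""
    return " ".join(f"{key}={quote_value(value)}" for key, value in params.items())


print(conninfo(
    dbname="kingfisher_process",
    user="jmckinney",
    host="postgres.kingfisher.open-contracting.org",
    sslmode="require",
))
```

To launch psql from Python, pass the result as a single argument, for example `subprocess.run(["psql", conninfo(...)])`. This avoids shell quoting entirely. Keep the password out of the string and use the password file instead.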

Instead of entering your password each time, you can add your credentials to the PostgreSQL Password File, replacing USER and PASS:

echo 'postgres.kingfisher.open-contracting.org:5432:kingfisher_process:USER:PASS' >> ~/.pgpass

Then, restrict the permissions of the ~/.pgpass file. Otherwise, PostgreSQL ignores the file if group or other users can access it:

chmod 600 ~/.pgpass
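The two shell commands above can also be done from Python. In the password file, each line is hostname:port:database:username:password, and a literal ":" or "\" inside a field must be escaped with "\". This is a minimal sketch. The function names are hypothetical, and the mode handling assumes a Unix-like system.

```python
import os
from pathlib import Path


def pgpass_escape(field: str) -> str:
    # Fields are separated by ":", so a literal ":" or "\" is escaped with "\".
    return field.replace("\\", "\\\\").replace(":", "\\:")


def add_pgpass_entry(path: Path, host: str, port: int, dbname: str,
                     user: str, password: str) -> None:
    """Append one hostname:port:database:username:password line to a password file."""
    line = ":".join(pgpass_escape(str(field)) for field in (host, port, dbname, user, password))
    # Start on a new line if the existing file doesn't end with one.
    existing = path.read_text(encoding="utf-8") if path.exists() else ""
    prefix = "\n" if existing and not existing.endswith("\n") else ""
    # Create the file as owner-only (0600), so the password is never readable by others.
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o600)
    with os.fdopen(fd, "a", encoding="utf-8") as f:
        f.write(prefix + line + "\n")
    # Also tighten a file that already existed with looser permissions.
    os.chmod(path, 0o600)
```

For example: `add_pgpass_entry(Path.home() / ".pgpass", "postgres.kingfisher.open-contracting.org", 5432, "kingfisher_process", "USER", "PASS")`.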

Tip

If you are logged into the postgres.kingfisher.open-contracting.org server, you can also run:

psql kingfisher_process

Beekeeper Studio

Beekeeper Studio is a cross-platform app for querying databases.

For security, remember to check Enable SSL.

  1. Select “Postgres” from Connection Type

  2. Set the Host, e.g. “postgres.kingfisher.open-contracting.org”

  3. Check Enable SSL

  4. Set the User

  5. Set the Password

  6. Set the Default Database, e.g. “kingfisher_process”

  7. Click the Test button

Then, either click the Connect button or set the Connection Name and click Save.

pgAdmin

pgAdmin is a locally hosted web interface for querying databases.

For security, remember to set SSL mode to “Require”.

  1. Open the Object > Create > Server… menu item

  2. Set the Name, e.g. “Kingfisher”

  3. Click the Connection tab

  4. Set the Host name/address, e.g. “postgres.kingfisher.open-contracting.org”

  5. Set the Username

  6. Set the Password

  7. Check Save password?

  8. Click the SSL tab

  9. Set SSL mode to “Require”

  10. Click the Save button

To avoid unnecessary queries to the database, please make these one-time configuration changes:

  1. Open the File > Preferences menu item

  2. Click Display under Dashboards in the sidebar

  3. Uncheck Show activity?

  4. Uncheck Show graphs?

  5. Click the Save button

Google Colaboratory

Google Colaboratory is a service for writing, running and sharing executable documents (notebooks) in Google Drive, similar to Jupyter Notebook.

Install the ocdskingfishercolab Python package, which installs the ipython-sql Python package.

For security, remember to set sslmode to 'require'.

%sql postgresql://USER:PASSWORD@postgres.kingfisher.open-contracting.org/kingfisher_process?sslmode=require
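Because the credentials are part of a URL, a password containing characters like "@", ":" or "/" breaks the line above unless those characters are percent-encoded. This is a minimal sketch using the standard library. The function name is hypothetical. How to pass a Python variable to `%sql` depends on your ipython-sql version, so check its documentation.

```python
from urllib.parse import quote


def kingfisher_url(user: str, password: str,
                   host: str = "postgres.kingfisher.open-contracting.org",
                   dbname: str = "kingfisher_process") -> str:
    """Build a postgresql:// URL with percent-encoded credentials and SSL required."""
    # safe="" makes quote() encode "/" as well as "@" and ":".
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}/{dbname}?sslmode=require"
    )
```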

Note

There is an open issue to use Colaboratory Forms to store credentials.

Python

Python is the programming language in which many OCDS tools are written.

Install the psycopg2 Python package.

For security, remember to set sslmode to 'require'.

import psycopg2

# Replace USER and PASSWORD with your credentials.
conn = psycopg2.connect(
    dbname='kingfisher_process',
    user='USER',
    password='PASSWORD',
    host='postgres.kingfisher.open-contracting.org',
    sslmode='require',  # always encrypt the connection
)
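To avoid hard-coding a password in a script or notebook, you can read the connection parameters from environment variables. libpq, which psycopg2 uses, already recognizes PGHOST, PGDATABASE, PGUSER and PGPASSWORD, and falls back to ~/.pgpass if no password is given. This minimal sketch only makes the defaults explicit and always forces `sslmode='require'`. The function name is hypothetical.

```python
import os


def connection_params() -> dict:
    """Collect psycopg2.connect() keyword arguments without hard-coding credentials."""
    params = {
        "dbname": os.environ.get("PGDATABASE", "kingfisher_process"),
        "user": os.environ["PGUSER"],  # raises KeyError if unset
        "host": os.environ.get("PGHOST", "postgres.kingfisher.open-contracting.org"),
        "sslmode": "require",  # explicit parameters take precedence over PGSSLMODE
    }
    # Pass a password only if one is set; otherwise, libpq can look it up in ~/.pgpass.
    if "PGPASSWORD" in os.environ:
        params["password"] = os.environ["PGPASSWORD"]
    return params
```

Then connect with `conn = psycopg2.connect(**connection_params())`.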

Improve slow queries

If a query is slow (more than one minute), it is most likely not using an index for its JOIN and WHERE clauses. In practice, using indexes can reduce the running time from hours or days to seconds.

Note

In a given clause, all columns from the same table must be in the same index. To see a table’s indices, run \d TABLE_NAME in psql. A view cannot have indices; instead, check the indices of the tables it queries. To see a view’s query, run \d+ VIEW_NAME.

Tip

For tables created by Kingfisher Summarize, always JOIN on the id column, which has an index, and never on the ocid column, which has no index.

To see the queries running on the server, including your own, run:

SELECT pid, client_addr, usename, state, wait_event_type, NOW() - query_start AS time, query
FROM pg_stat_activity
WHERE query <> ''
ORDER BY time DESC;

Find your username in the usename column. The time column indicates how long the query has been running. If it is longer than one minute, consider using EXPLAIN to find out why.
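If you run the pg_stat_activity query from Python, you can filter its rows in the same way. For rows whose state is not "active", PostgreSQL reports the last query the connection ran, and query_start is when that last query started. So, only active rows show a query that is still running. This is a minimal sketch. The function and class names are hypothetical. It assumes the rows come back in the column order of the query above, with the time column as a `datetime.timedelta`, which is how psycopg2 returns intervals.

```python
from datetime import timedelta
from typing import Iterable, List, NamedTuple, Optional


class Activity(NamedTuple):
    """One row of the pg_stat_activity query, in the same column order."""
    pid: int
    client_addr: Optional[str]
    usename: str
    state: Optional[str]
    wait_event_type: Optional[str]
    time: Optional[timedelta]
    query: str


def slow_queries(rows: Iterable[tuple], username: str,
                 threshold: timedelta = timedelta(minutes=1)) -> List[Activity]:
    """Return the user's active queries that have run longer than threshold, longest first."""
    activities = (Activity(*row) for row in rows)
    slow = [
        a for a in activities
        # Skip idle rows: their "time" counts from the start of a query that already finished.
        if a.usename == username and a.state == "active" and a.time is not None and a.time > threshold
    ]
    return sorted(slow, key=lambda a: a.time, reverse=True)
```

For example, after `cur.execute(...)` with the query above, call `slow_queries(cur.fetchall(), "jmckinney")`.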

Note

When using a tool like pgMustard or Dalibo, follow these instructions to get the query plan. For tools other than pgMustard, if you don’t know how slow your query is, omit ANALYZE and BUFFERS from the EXPLAIN parameters, because EXPLAIN ANALYZE actually runs the query.
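The difference between the two forms of EXPLAIN can be sketched as a small string builder. Without ANALYZE, PostgreSQL only plans the query and returns immediately. With ANALYZE, it executes the query, so a slow query takes just as long. This is a minimal sketch. The function name is hypothetical, and the exact option list (COSTS, VERBOSE, FORMAT JSON) is an assumption. Follow the instructions for your tool. ANALYZE and BUFFERS are toggled together, because older PostgreSQL versions reject BUFFERS without ANALYZE.

```python
def explain(query: str, analyze: bool = False) -> str:
    """Prefix a query with EXPLAIN options, adding ANALYZE and BUFFERS only if requested."""
    options = []
    if analyze:
        options.append("ANALYZE")  # actually executes the query
    options += ["COSTS", "VERBOSE"]
    if analyze:
        options.append("BUFFERS")
    options.append("FORMAT JSON")
    return f"EXPLAIN ({', '.join(options)}) {query}"
```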

If you frequently filter on the same columns in ON or WHERE clauses, open an issue on GitHub to add an index to the table. (In most cases, this should be a multi-column index, with the most common column as the index’s first column.)

To stop a query, run the following, replacing PID with the appropriate value from the pid column:

SELECT pg_cancel_backend(PID);

Note

If you are running a query via Redash, it will not appear in the results.