SQL databases

Note

If you need to create temporary tables, use CREATE TEMPORARY TABLE. If you need to create persistent tables, create a new schema first; do not create tables in the public schema.

Connect to a database

Note

To query the database directly from your personal computer, please request a personal SQL user account, and configure psql, Beekeeper Studio and/or pgAdmin to use it.

For most use cases, you can instead query the database from Redash. To request an account, email data@open-contracting.org.

OCP has a main database on the postgres.kingfisher.open-contracting.org server, and provides a replica database on the postgres-readonly.kingfisher.open-contracting.org server, in order to ease the load on the main server. Please always use the replica database. If that server goes down, use the main database until the server is restored.

psql

If PostgreSQL is installed, you can use psql, PostgreSQL’s interactive terminal, from the command-line.

For security, remember to set sslmode to require.

psql "dbname=DBNAME user=USERNAME host=HOST sslmode=require"

For example:

psql "dbname=ocdskingfisherprocess user=jmckinney host=postgres-readonly.kingfisher.open-contracting.org sslmode=require"

Instead of entering your password each time, you can add your credentials to the PostgreSQL Password File, replacing USER and PASS:

echo 'postgres-readonly.kingfisher.open-contracting.org:5432:ocdskingfisherprocess:USER:PASS' >> ~/.pgpass

Then, set the permissions of the ~/.pgpass file:

chmod 600 ~/.pgpass

Beekeeper Studio

Beekeeper Studio is a cross-platform app for querying databases.

For security, remember to check Enable SSL.

  1. Select “Postgres” from Connection Type
  2. Set the Host, e.g. “postgres-readonly.kingfisher.open-contracting.org”
  3. Check Enable SSL
  4. Set the User
  5. Set the Password
  6. Set the Default Database, e.g. “ocdskingfisherprocess”
  7. Click the Test button

Then, either click the Connect button or set the Connection Name and click Save.

pgAdmin

pgAdmin is a locally hosted web interface for querying databases.

For security, remember to set SSL mode to “Require”.

  1. Open the Object > Create > Server… menu item
  2. Set the Name, e.g. “Kingfisher”
  3. Click the Connection tab
  4. Set the Host name/address, e.g. “postgres-readonly.kingfisher.open-contracting.org”
  5. Set the Username
  6. Set the Password
  7. Check Save password?
  8. Click the SSL tab
  9. Set SSL mode to “Require”
  10. Click the Save button

To avoid unnecessary queries to the database, please make these one-time configuration changes:

  1. Open the File > Preferences menu item
  2. Click Display under Dashboards in the sidebar
  3. Uncheck Show activity?
  4. Uncheck Show graphs?
  5. Click the Save button

Google Colaboratory

Google Colaboratory is an executable document to write, run and share code in Google Drive, similar to Jupyter Notebook.

Install the ocdskingfishercolab Python package, which installs the ipython-sql Python package.

For security, remember to set sslmode to 'require'.

%sql postgresql://USER:PASSWORD@postgres-readonly.kingfisher.open-contracting.org/ocdskingfisherprocess?sslmode=require

Note

There is an open issue to use Colaboratory Forms to store credentials.

Python

Python is the programming language in which many OCDS tools are written.

Install the psycopg2 Python package.

For security, remember to set sslmode to 'require'.

import psycopg2

conn = psycopg2.connect(
    dbname='ocdskingfisherprocess',
    user='USER',
    password='PASSWORD',
    host='postgres-readonly.kingfisher.open-contracting.org',
    sslmode='require')

Improve slow queries

If a query is slow (more than 1 minute), it most likely is not using an index for its JOIN and WHERE clauses. In practice, using indexes can decrease the running time from hours/days to seconds.

Tip

For tables created by Kingfisher Summarize, always JOIN on the id column, which has an index, and never on the ocid column, which has no index.

To see the queries running under your user account, run:

SELECT pid, client_addr, usename, state, wait_event_type, NOW() - query_start AS time, query
FROM pg_stat_activity
WHERE query <> ''
ORDER BY time DESC;

Find your username in the usename column. The time column indicates how long the query has run for. If it is longer than one minute, consider using EXPLAIN to figure out why.

To stop a query, run, replacing PID with the appropriate value from the pid column:

SELECT pg_cancel_backend(PID)

Note

If you are running a query via Redash, it will not appear in the results.