SQL databases¶
Note
If you need to create temporary tables, use CREATE TEMPORARY TABLE. If you need to create persistent tables, create a new schema first; do not create tables in the public schema.
Connect to a database¶
Note
To query the database directly from your personal computer, request a personal SQL user account from James or Yohanna, and configure psql, Beekeeper Studio and/or pgAdmin to use it.
OCP has a main database on the postgres.kingfisher.open-contracting.org server.
psql¶
If PostgreSQL is installed, you can use psql, PostgreSQL’s interactive terminal, from the command-line.
For security, remember to set sslmode to require.
psql "dbname=DBNAME user=USERNAME host=HOST sslmode=require"
For example:
psql "dbname=kingfisher_process user=jmckinney host=postgres.kingfisher.open-contracting.org sslmode=require"
Instead of entering your password each time, you can add your credentials to the PostgreSQL Password File, replacing USER and PASS:
echo 'postgres.kingfisher.open-contracting.org:5432:kingfisher_process:USERNAME:PASSWORD' >> ~/.pgpass
Then, set the permissions of the ~/.pgpass file:
chmod 600 ~/.pgpass
Tip
If you are logged into the postgres.kingfisher.open-contracting.org server, you can also run:
psql kingfisher_process
Beekeeper Studio¶
Beekeeper Studio is a cross-platform app for querying databases. Use the open-source Community Edition.
For security, remember to check Enable SSL.
Select “Postgres” from Connection Type
Set the Host, e.g. “postgres.kingfisher.open-contracting.org”
Check Enable SSL
Set the User
Set the Password
Set the Default Database, e.g. “kingfisher_process”
Click the Test button
Then, either click the Connect button or set the Connection Name and click Save.
pgAdmin¶
pgAdmin is a locally hosted web interface for querying databases.
For security, remember to set SSL mode to “Require”.
Open the Object > Create > Server… menu item
Set the Name, e.g. “Kingfisher”
Click the Connection tab
Set the Host name/address, e.g. “postgres.kingfisher.open-contracting.org”
Set the Username
Set the Password
Check Save password?
Click the SSL tab
Set SSL mode to “Require”
Click the Save button
To avoid unnecessary queries to the database, please make these one-time configuration changes:
Open the File > Preferences menu item
Click Display under Dashboards in the sidebar
Uncheck Show activity?
Uncheck Show graphs?
Click the Save button
Google Colaboratory¶
Google Colaboratory is an executable document to write, run and share code in Google Drive, similar to Jupyter Notebook.
Install the ocdskingfishercolab Python package, which installs the ipython-sql Python package.
For security, remember to set sslmode to 'require'.
%sql postgresql://USERNAME:PASSWORD@postgres.kingfisher.open-contracting.org/kingfisher_process?sslmode=require
Note
There is an open issue to use Colaboratory Forms to store credentials.
Python¶
Python is the programming language in which many OCDS tools are written.
Install the psycopg2 Python package.
For security, remember to set sslmode to 'require'.
import psycopg2
conn = psycopg2.connect(
dbname='kingfisher_process',
user='USER',
password='PASSWORD',
host='postgres.kingfisher.open-contracting.org',
sslmode='require')
Improve slow queries¶
See also
Define tables, Define indexes and Paginate rows under PostgreSQL in the Software Development Handbook
To see the queries running under your user account, run:
SELECT pid, client_addr, usename, state, wait_event_type, NOW() - query_start AS time, query
FROM pg_stat_activity
WHERE query <> ''
ORDER BY time DESC;
Find your username in the usename column. The time column indicates how long the query has run for. If it is longer than one minute, it is slow.
In general, a query is slow due to not using indexes for its JOIN and/or WHERE clauses. Using indexes can decrease the running time from hours/days to seconds.
Tip
For tables created by Kingfisher Summarize, always JOIN on the id column, which has an index, and never on the ocid column, which has no index.
To see the indexes on a table, run \d TABLE_NAME. A view cannot have indexes; instead, check the indexes on the tables it queries. To see a view’s query, run \d+ VIEW_NAME.
If the columns in the query’s JOIN and/or WHERE clauses aren’t indexed, and you can’t change the clauses to use columns that are indexed, open an issue on GitHub about adding an index to the table.
To stop a query, run, replacing PID with the appropriate value from the pid column:
SELECT pg_cancel_backend(PID)