Kingfisher Process#

Read the Kingfisher Process documentation, which covers general usage.

Note

Is the service unresponsive or returning errors? Follow these instructions.

Review log files#

Kingfisher Process writes log messages to the /var/log/kingfisher.log file. The log file is rotated weekly; last week’s log file is at /var/log/kingfisher.log.1, and earlier log files are compressed at /var/log/kingfisher.log.2.gz, etc.
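
To read the rotated, compressed files without decompressing them first, use zless or zgrep (both ship with gzip). For example:

zless /var/log/kingfisher.log.2.gz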

The log files can be read by the ocdskfp user, after connecting to the server.

Log messages are formatted as:

[date] [hostname] %(asctime)s - %(process)d - %(name)s - %(levelname)s - %(message)s

You can filter messages by topic, that is, by logger name (the %(name)s field in the format above). For example, replacing NAME with a logger name:

grep NAME /var/log/kingfisher.log | less
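
You can also filter by log level (the %(levelname)s field, which uses Python’s standard level names). For example, to show only errors:

grep -E 'ERROR|CRITICAL' /var/log/kingfisher.log | less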

For more information, read Kingfisher Process’ logging documentation.

Load local data#

  1. Connect to the main server as the ocdskfp user

  2. Change into the local-load directory:

    cd ~/local-load
    
  3. Create a data directory following the pattern source-YYYY-MM-DD-analyst. For example: moldova-2020-04-07-romina

    • If the data source is the same as for an existing spider, use the same source ID, for example: moldova. Otherwise, use a different source ID that follows our regular pattern country[_region][_label], for example: moldova_covid19.

  4. If you need to download an archive file (e.g. ZIP) from a remote URL, prefer curl to wget, because wget sometimes writes unwanted files like wget-log.

  5. If you need to copy a file from your local machine, you can use scp. For example, on your local machine:

scp file.json ocdskfp@process.kingfisher.open-contracting.org:~/local-load/moldova-2020-04-07-romina
  6. Load the data, using the local-load command.

  7. Delete the data directory once you’re satisfied that it loaded correctly. (A sketch of the whole flow follows this list.)
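
As a rough end-to-end sketch of steps 3, 4 and 7 (the archive URL is a placeholder, and the local-load invocation from step 6 is omitted because its options depend on the server setup):

# Create the data directory, following the source-YYYY-MM-DD-analyst pattern (step 3).
mkdir ~/local-load/moldova-2020-04-07-romina
cd ~/local-load/moldova-2020-04-07-romina
# Download with curl rather than wget (step 4); the URL is a placeholder.
curl -sSL -o data.zip https://example.com/ocds.zip
unzip data.zip && rm data.zip
# After loading (step 6) and checking the collection, delete the directory (step 7).
cd ~
rm -rf ~/local-load/moldova-2020-04-07-romina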

Export compiled releases from the database as record packages#

Check the number of compiled releases to be exported. For example:

SELECT cached_compiled_releases_count FROM collection WHERE id = 123;
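
If you don’t know the collection’s id, you can look it up by source. A sketch, assuming the collection table also has source_id and data_version columns (check the schema if unsure):

psql "connection string" -c "SELECT id, source_id, data_version, cached_compiled_releases_count FROM collection WHERE source_id = 'moldova' ORDER BY data_version DESC"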

Large collections will take time to export, so run the commands below in a tmux session.
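
For example, to create a tmux session, or reattach to it if it already exists (the session name is arbitrary):

tmux new-session -A -s export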

Change to the directory in which you want to write the files.

To export the compiled releases to a single JSONL file, run, for example:

psql "connection string" -c '\t' \
-c 'SELECT data FROM data INNER JOIN compiled_release r ON r.data_id = data.id WHERE collection_id = 123' \
-o myfilename.jsonl
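
Each row is written as one line, so you can check that the export is complete by comparing the line count to the cached count from earlier:

wc -l myfilename.jsonl  # should equal cached_compiled_releases_count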

To export the compiled releases to individual files, run, for example:

psql "connection string" -c '\t' \
-c 'SELECT data FROM data INNER JOIN compiled_release r ON r.data_id = data.id WHERE collection_id = 123' \
| split -l 1 -a 5 --additional-suffix=.json

The files will be named xaaaaa.json, xaaaab.json, etc. The -a 5 option generates 5-character suffixes, enough for 26^5 (about 11.9 million) files.

If you need to wrap each compiled release in a record package, modify the files in-place. For example:

echo *.json | xargs sed -i '1i {"records":[{"compiledRelease":'
for filename in *.json; do echo "}]}" >> "$filename"; done
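
To spot-check that a wrapped file is still valid JSON (assuming jq is installed):

jq empty xaaaaa.json && echo valid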

Data retention policy#

On the first day of each month, the following are deleted:

  • Collections that ended over a year ago, while retaining one set of collections per source from over a year ago

  • Collections that never ended and started over 2 months ago

  • Collections that ended over 2 months ago and have no data
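
To preview the collections matching, for example, the second criterion, you can query the database. A sketch, assuming the collection table’s store_start_at and store_end_at columns record when a collection started and ended:

psql "connection string" -c "SELECT id, source_id, store_start_at FROM collection WHERE store_end_at IS NULL AND store_start_at < now() - interval '2 months'"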