Pelican

Pelican is composed of:

Measure a collection

One-time setup

Create a ~/.netrc file with an entry for the pelican.open-contracting.org service, using the same credentials as described in “Access Scrapyd’s web interface”.
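As a minimal sketch of what that entry looks like, the helper below appends a single-line netrc entry and restricts the file's permissions. The function name add_pelican_netrc and the placeholder credentials are illustrative only and are not part of Pelican. It assumes the username and password contain no whitespace.

```shell
# Append a netrc entry for Pelican and make the file readable by its owner only.
# Arguments: path to the netrc file, username, password.
# The function name and the credentials in the usage line are placeholders.
add_pelican_netrc() {
  netrc_file=$1
  printf 'machine pelican.open-contracting.org login %s password %s\n' "$2" "$3" >> "$netrc_file"
  chmod 600 "$netrc_file"
}

# Usage (hypothetical credentials):
# add_pelican_netrc ~/.netrc YOUR_USERNAME YOUR_PASSWORD
```

With this entry in place, the -n option in the curl commands on this page makes curl read the credentials from ~/.netrc.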

To create a report, submit a POST request to the /api/datasets/ endpoint. Set name to the spider’s name and the collection date (a.k.a. data version) for easy reference, and set collection_id to the collection ID for the compiled releases. For example:

curl -n --json '{"name":"spider_name_2020-01-01","collection_id":123}' https://pelican.open-contracting.org/api/datasets/

After a few seconds, you should see your report being processed at https://pelican.open-contracting.org.
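If you create reports often, the request body can be assembled from its parts rather than typed by hand. The sketch below is only a convenience: the function name dataset_payload is hypothetical, and it assumes the spider name needs no JSON escaping (letters, digits and underscores only).

```shell
# Build the JSON body for a POST request to /api/datasets/.
# Arguments: spider name, collection date (YYYY-MM-DD), collection ID.
# No JSON escaping is performed, so pass plain identifiers only.
dataset_payload() {
  printf '{"name":"%s_%s","collection_id":%d}' "$1" "$2" "$3"
}

# Usage (placeholder values; needs network access and the ~/.netrc entry):
# curl -n --json "$(dataset_payload spider_name 2020-01-01 123)" https://pelican.open-contracting.org/api/datasets/
```

The --json option requires curl 7.82.0 or later. On older versions, the equivalent is -d with the same body plus -H 'Content-Type: application/json' and -H 'Accept: application/json'.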

Note

Pelican is more robust to structural errors in OCDS data than it was in 2021, but it can still fail (stall) on them. If it does, Sentry will notify James and Yohanna.

Measure time-based checks

If a report exists for an old collection, and Kingfisher Process has a new collection of the same dataset, you can create a report for that new collection that calculates time-based checks between the two collections. Set ancestor_id to the ID of the previous report in Pelican:

curl -n --json '{"name":"spider_name_2021-02-03","collection_id":456,"ancestor_id":1}' https://pelican.open-contracting.org/api/datasets/
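The same request body can be generated from its parts, which makes it harder to mix up the collection ID (from Kingfisher Process) and the ancestor ID (from Pelican). This is a sketch only: the function name followup_payload is hypothetical, and no JSON escaping is performed.

```shell
# Build the JSON body for a report that is compared against an earlier report.
# Arguments: spider name, collection date (YYYY-MM-DD),
#            collection ID of the new collection (Kingfisher Process),
#            ID of the previous report (Pelican).
followup_payload() {
  printf '{"name":"%s_%s","collection_id":%d,"ancestor_id":%d}' "$1" "$2" "$3" "$4"
}

# Usage (placeholder values; needs network access and the ~/.netrc entry):
# curl -n --json "$(followup_payload spider_name 2021-02-03 456 1)" https://pelican.open-contracting.org/api/datasets/
```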

Check on progress

https://pelican.open-contracting.org indicates the status of reports. In general, this is sufficient. However, as with Kingfisher Process, you can use the RabbitMQ management interface to check that work isn’t stuck. For Pelican, read the rows whose names start with pelican_backend_.
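If you have shell access to the RabbitMQ server (an assumption; this page only describes the web-based management interface), a similar check can be sketched on the command line. The filter below reads tab-separated "name, messages" lines, such as those printed by rabbitmqctl -q list_queues name messages, and keeps the pelican_backend_ queues that still hold messages. The function name pelican_backlog is hypothetical.

```shell
# Read tab-separated "queue name<TAB>message count" lines on standard input
# and print the pelican_backend_ queues whose count is greater than zero.
pelican_backlog() {
  awk -F '\t' '$1 ~ /^pelican_backend_/ && $2 > 0 { print $1 ": " $2 }'
}

# Usage (assumes rabbitmqctl is available and you may run it):
# rabbitmqctl -q list_queues name messages | pelican_backlog
```

A non-empty queue is not necessarily stuck. Run the check twice, a minute or so apart: counts that do not decrease suggest that work has stalled.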

Read and export a report

Open https://pelican.open-contracting.org. Your username and password are the same as for Kingfisher Collect.

To export a report, click the report’s document icon on the homepage, and fill in the short form.

Delete a report

Once you no longer need a report, remember to delete it, replacing 1 with its ID:

curl -n -X DELETE https://pelican.open-contracting.org/api/datasets/1/
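Because a DELETE request is destructive, it can help to validate the ID before building the URL. The sketch below rejects an empty or non-numeric ID, so that a typo or an unset variable cannot produce a request to a different path. The function name dataset_url is hypothetical.

```shell
# Print the URL of a single report, or fail if the ID is not a whole number.
# Argument: the report's ID in Pelican.
dataset_url() {
  case $1 in
    ''|*[!0-9]*)
      echo "dataset ID must be a whole number" >&2
      return 1
      ;;
  esac
  printf 'https://pelican.open-contracting.org/api/datasets/%s/\n' "$1"
}

# Usage (placeholder ID; needs network access and the ~/.netrc entry):
# url=$(dataset_url 1) && curl -n -X DELETE "$url"
```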