Data support

Set up incremental updates

This setup creates a cron job that runs a scrapy crawl command. The DatabaseStore extension implements the incremental updates.

  1. Choose a spider that collects the desired data. Prefer a spider that:

    • Accepts a from_date spider argument, preferably at the same granularity as the cron schedule

    • Is fastest: for example, _bulk, instead of _api

    • Reduces processing: for example, a spider that yields compiled releases

    If needed, improve the spider in Kingfisher Collect.

  2. Add an entry to the python_apps.kingfisher_collect.crawls section of the pillar/kingfisher_main.sls file:

    identifier

    An uppercase, underscore-separated name, like DOMINICAN_REPUBLIC.

    spider

    The spider’s name, like dominican_republic_api.

    crawl_time

    The current date, like '2025-05-06' (though any date works).

    spider_arguments (optional)

    Any spider arguments.

    If the spider doesn’t yield compiled releases, add -a compile_releases=true.

    cardinal (optional)

    True, to enable a pipeline involving Cardinal.

    users (optional)

    A list of additional PostgreSQL users that need read access to the database.

    day (optional)

    The day of the month on which to run the cron job.

    Required if an incremental update takes longer than a day.

  3. If an initial crawl would take longer than a day, run the scrapy crawl command manually.

  4. Deploy the server.
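Putting step 2 together, an entry in the pillar/kingfisher_main.sls file might look like the following sketch (treating the identifier as the mapping key; the spider name, date, and arguments are illustrative assumptions):

```yaml
python_apps:
  kingfisher_collect:
    crawls:
      DOMINICAN_REPUBLIC:                 # identifier
        spider: dominican_republic_api
        crawl_time: '2025-05-06'
        spider_arguments: -a compile_releases=true  # if the spider doesn't yield compiled releases
```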

Create a data support main server

Dependencies

Tinyproxy
  1. Update the allowed IP addresses in the pillar/tinyproxy.sls file.

  2. Deploy the docs service, when ready.

Replica, if applicable
  1. Update the allowed IP addresses and hostname in the pillar/kingfisher_replica.sls file.

  2. Deploy the kingfisher-replica service, when ready.

Dependents

  1. Notify RBC Group of the new domain name for the new PostgreSQL server.

Update Salt configuration and halt jobs

  1. Check that docker.uid in the server’s Pillar file matches the UID of the docker.user (deployer) entry in the /etc/passwd file.

  2. Change cron.present to cron.absent in the salt/pelican/backend/init.sls file.

  3. Comment out the postgres.backup section of the Pillar file.

  4. Deploy the old server and the new server.

  5. On the old server:

    1. Delete the /etc/cron.d/postgres_backups file.

    2. Run docker compose down in each Docker app’s directory to stop all containers.

  6. Check that no crawls are running at https://collect.kingfisher.open-contracting.org/jobs.

    If a crawl is running, job owners can cancel jobs.

  7. Check that no messages are enqueued at https://rabbitmq.kingfisher.open-contracting.org.

    If a job is running in Kingfisher Process, job owners can cancel jobs.
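The UID check in step 1 can be scripted. A minimal sketch (the uid_of helper is hypothetical; getent is assumed to be available on the server):

```shell
# Hypothetical helper: print a user's UID from the passwd database,
# to compare against docker.uid in the server's Pillar file.
uid_of() { getent passwd "$1" | cut -d: -f3; }

uid_of deployer  # prints the deployer user's UID
```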

Docker apps

  1. Run migrations for Docker apps as the deployer user:

    su - deployer
    
    cd /data/deploy/kingfisher-process/
    docker compose run --rm --name django-migrate cron python manage.py migrate
    
    cd /data/deploy/pelican-frontend/
    docker compose run --rm --name django-migrate web python manage.py migrate
    
  2. Pull new images and start new containers for each Docker app.

Kingfisher Collect

Once DNS has propagated, update spiders in Kingfisher Collect.

Copy incremental data

  1. SSH into the new server as the incremental user:

    1. Generate an SSH key pair:

      ssh-keygen -t rsa -b 4096 -C "incremental"
      
    2. Get the public SSH key:

      cat ~/.ssh/id_rsa.pub
      
  2. Add the public SSH key to the ssh.incremental list in the pillar/kingfisher_main.sls file:

    ssh:
      incremental:
        - ssh-rsa AAAB3N...
    
  3. Change cron.present to cron.absent in the salt/kingfisher/collect/incremental.sls file.

  4. Deploy the old server and the new server.

  5. SSH into the old server as the incremental user:

    1. Stop any processes started by the cron jobs.

    2. Dump the kingfisher_collect database:

      pg_dump -U kingfisher_collect -h localhost -f kingfisher_collect.sql kingfisher_collect
      
  6. SSH into the new server as the incremental user:

    1. Copy the database dump from the old server. For example:

      rsync -avz incremental@ocp04.open-contracting.org:~/kingfisher_collect.sql .
      
    2. Load the database dump:

      psql -U kingfisher_collect -h localhost -f kingfisher_collect.sql kingfisher_collect
      
    3. Copy the data directory from the old server. For example:

      rsync -avz incremental@ocp04.open-contracting.org:/home/incremental/data/ /home/incremental/data/
      
    4. Copy the logs directory from the old server. For example:

      rsync -avz incremental@ocp04.open-contracting.org:/home/incremental/logs/ /home/incremental/logs/
      
  7. Remove the public SSH key from the ssh.incremental list in the pillar/kingfisher_main.sls file.

  8. Change cron.absent to cron.present in the salt/kingfisher/collect/incremental.sls file.

  9. Deploy the new server.

Pelican backend

The initial migrations for Pelican backend, which create the exchange_rates table, are run by Salt.

  1. Connect to the old server, and dump the exchange_rates table:

    sudo -i -u postgres psql -c '\copy exchange_rates (valid_on, rates, created, modified) to stdout' pelican_backend > exchange_rates.csv
    
  2. Copy the database dump to your local machine. For example:

    rsync -avz root@ocp13.open-contracting.org:~/exchange_rates.csv .
    
  3. Copy the database dump to the new server. For example:

    rsync -avz exchange_rates.csv root@ocp23.open-contracting.org:~/
    
  4. Populate the exchange_rates table:

    psql -U pelican_backend -h localhost -c "\copy exchange_rates (valid_on, rates, created, modified) from 'exchange_rates.csv';" pelican_backend
    

Restore Salt configuration and start jobs

  1. Change cron.absent to cron.present in the salt/pelican/backend/init.sls file.

  2. Uncomment the postgres.backup section of the Pillar file.

  3. Deploy the new server.

Create a data support replica server

  1. Update postgres.replica_ipv4 (and postgres.replica_ipv6, if applicable) in the pillar/kingfisher_main.sls file.
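The keys above might be set as in this sketch, using documentation-reserved example addresses (all values are placeholders):

```yaml
postgres:
  replica_ipv4: 192.0.2.10      # example address (RFC 5737 documentation range)
  replica_ipv6: '2001:db8::10'  # example address (RFC 3849 documentation range); omit if not applicable
```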