Data support¶
Set up incremental updates¶
This creates a cron job to run a scrapy crawl command. The DatabaseStore extension implements the incremental updates.
Choose a spider that collects the desired data. Prefer the spider that:
- Accepts a from_date spider argument, preferably at the same granularity as the cron schedule
- Is fastest: for example, _bulk, instead of _api
- Reduces processing: for example, a spider that yields compiled releases
If needed, improve the spider in Kingfisher Collect.
Add an entry to the python_apps.kingfisher_collect.crawls section of the pillar/kingfisher_main.sls file:
- identifier: An uppercase, underscore-separated name, like DOMINICAN_REPUBLIC.
- spider: The spider’s name, like dominican_republic_api.
- crawl_time: The current date, like '2025-05-06' (though any date works).
- spider_arguments (optional): Any spider arguments. If the spider doesn’t yield compiled releases, add -a compile_releases=true.
- cardinal (optional): True, to enable a pipeline involving Cardinal.
- users (optional): A list of additional PostgreSQL users that need read access to the database.
- day (optional): The day of the month on which to run the cron job. Required if an incremental update takes longer than a day.
If an initial crawl would take longer than a day, run the scrapy crawl command manually.
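Before committing a new crawls entry, the identifier and crawl_time values can be sanity-checked. The following is a minimal sketch, not part of the deploy repository: the function names are hypothetical, and the accepted patterns (uppercase letters and digits separated by single underscores, and a YYYY-MM-DD date) are assumptions inferred from the examples above.

```shell
#!/bin/sh
# Hypothetical helpers. The patterns are assumptions based on the examples
# DOMINICAN_REPUBLIC and '2025-05-06' in this section.

# Succeeds if the identifier is uppercase and underscore-separated.
is_valid_identifier() {
    printf '%s\n' "$1" | grep -Eq '^[A-Z0-9]+(_[A-Z0-9]+)*$'
}

# Succeeds if the crawl time looks like a YYYY-MM-DD date.
# This checks the shape only, not whether the date exists.
is_valid_crawl_time() {
    printf '%s\n' "$1" | grep -Eq '^[0-9]{4}-[0-9]{2}-[0-9]{2}$'
}
```

For example, is_valid_identifier DOMINICAN_REPUBLIC succeeds, and is_valid_identifier dominican_republic fails.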
Create a data support main server¶
Dependencies¶
- Tinyproxy
Update the allowed IP addresses in the pillar/tinyproxy.sls file.
Deploy the docs and cove services, when ready.
From the new server, test the proxy. For example:
env http_proxy=ocp99.open-contracting.org:8888 curl example.com
- OCDS APIs
Ask Yohanna to request that the DGCP add the new server’s IPv4 to their API’s allowlist.
- Replica, if applicable
Update the allowed IP addresses and hostname in the pillar/kingfisher_replica.sls file.
Deploy the kingfisher-replica service, when ready.
Dependents¶
Notify RBC Group of the new domain name for the new PostgreSQL server.
Update Salt configuration and halt jobs¶
Check that docker.uid in the server’s Pillar file matches the entry in the /etc/passwd file for the docker.user (deployer).
Change cron.present to cron.absent in the salt/pelican/backend/init.sls file.
Comment out the postgres.backup section of the Pillar file.
On the old server:
- Delete the /etc/cron.d/postgres_backups file.
- Run docker compose down for all containers.
Check that no crawls are running at https://collect.kingfisher.open-contracting.org/jobs.
If a crawl is running, job owners can cancel jobs.
Check that no messages are enqueued at https://rabbitmq.kingfisher.open-contracting.org.
If a job is running in Kingfisher Process, job owners can cancel jobs.
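As an alternative to checking the RabbitMQ management page by eye, the queue check can be scripted. The following is a minimal sketch, assuming shell access to the host (or container) where the broker runs. The function name is hypothetical. It expects the tab-separated name and messages columns that rabbitmqctl list_queues prints, and skips any header or informational line whose second column isn't a number.

```shell
#!/bin/sh
# Hypothetical helper: reads "name<TAB>messages" rows on standard input,
# prints the queues that still hold messages, and fails if there are any.
queues_with_messages() {
    awk -F '\t' '
        $2 ~ /^[0-9]+$/ && $2 + 0 > 0 { print $1 ": " $2; found = 1 }
        END { exit (found ? 1 : 0) }
    '
}

# Assumed usage, on the broker host:
#   rabbitmqctl -q list_queues name messages | queues_with_messages
```

The function exits with a non-zero status while any queue is non-empty, so it can gate later steps in a script.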
Docker apps¶
Run migrations for Docker apps as the deployer user:
su - deployer
cd /data/deploy/kingfisher-process/
docker compose run --rm --name django-migrate cron python manage.py migrate
cd /data/deploy/pelican-frontend/
docker compose run --rm --name django-migrate web python manage.py migrate
Pull new images and start new containers for each Docker app.
Kingfisher Collect¶
Once DNS has propagated, update spiders in Kingfisher Collect.
Copy incremental data¶
SSH into the new server as the incremental user:
Generate an SSH key pair:
ssh-keygen -t rsa -b 4096 -C "incremental"
Get the public SSH key:
cat ~/.ssh/id_rsa.pub
Add the public SSH key to the ssh.incremental list in the pillar/kingfisher_main.sls file:
ssh:
  incremental:
    - ssh-rsa AAAB3N...
Change cron.present to cron.absent in the salt/kingfisher/collect/incremental.sls file.
SSH into the old server as the incremental user:
Stop any processes started by the cron jobs.
Dump the kingfisher_collect database:
pg_dump -U kingfisher_collect -h localhost -f kingfisher_collect.sql kingfisher_collect
SSH into the new server as the incremental user.
Copy the database dump from the old server. For example:
rsync -avz incremental@ocp99.open-contracting.org:~/kingfisher_collect.sql .
Load the database dump:
psql -U kingfisher_collect -h localhost -f kingfisher_collect.sql kingfisher_collect
Copy the data directory from the old server. For example:
rsync -avz incremental@ocp99.open-contracting.org:/home/incremental/data/ /home/incremental/data/
Copy the logs directory from the old server. For example:
rsync -avz incremental@ocp99.open-contracting.org:/home/incremental/logs/ /home/incremental/logs/
Remove the public SSH key from the ssh.incremental list in the pillar/kingfisher_main.sls file.
Change cron.absent to cron.present in the salt/kingfisher/collect/incremental.sls file.
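To check that the data and logs directories were copied in full, the number of files on each server can be compared. This needs to be done while the public SSH key is still in the pillar file. The following is a minimal sketch; the function name is hypothetical, and the count assumes that no file name contains a newline.

```shell
#!/bin/sh
# Hypothetical helper: prints the number of regular files under a directory.
count_files() {
    find "$1" -type f | wc -l | tr -d ' '
}

# Assumed usage, from the new server (same example hostname as above):
#   count_files /home/incremental/data
#   ssh incremental@ocp99.open-contracting.org 'find /home/incremental/data -type f | wc -l'
```

If the two counts differ, re-running the same rsync command transfers only the missing or changed files.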
Pelican backend¶
The initial migrations for Pelican backend, which create the exchange_rates table, are run by Salt.
Connect to the old server, and dump the exchange_rates table:
sudo -i -u postgres psql -c '\copy exchange_rates (valid_on, rates, created, modified) to stdout' pelican_backend > exchange_rates.csv
Copy the database dump to your local machine. For example:
rsync -avz root@ocp99.open-contracting.org:~/exchange_rates.csv .
Copy the database dump to the new server. For example:
rsync -avz exchange_rates.csv root@ocp99.open-contracting.org:~/
Populate the exchange_rates table:
psql -U pelican_backend -h localhost -c "\copy exchange_rates (valid_on, rates, created, modified) from 'exchange_rates.csv';" pelican_backend
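To confirm that every row was loaded, the number of lines in the dump can be compared with the number of rows in the table. The following is a minimal sketch; the function names are hypothetical. It relies on the \copy commands above using PostgreSQL's default text format, which writes one row per line, and it assumes the table was empty before loading.

```shell
#!/bin/sh
# Hypothetical helpers for comparing the dump with the loaded table.

# Prints the number of rows in a file written by \copy in text format.
count_copy_rows() {
    wc -l < "$1" | tr -d ' '
}

# Prints the number of rows in the exchange_rates table, using the same
# connection options as the load command. -A and -t print the bare value.
count_table_rows() {
    psql -U pelican_backend -h localhost -At \
        -c 'SELECT count(*) FROM exchange_rates' pelican_backend
}

# Assumed usage, on the new server:
#   [ "$(count_copy_rows exchange_rates.csv)" = "$(count_table_rows)" ] && echo "row counts match"
```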
Restore Salt configuration and start jobs¶
Change cron.absent to cron.present in the salt/pelican/backend/init.sls file.
Uncomment the postgres.backup section of the Pillar file.
Create a data support replica server¶
Update postgres.replica_ipv4 (and postgres.replica_ipv6, if applicable) in the pillar/kingfisher_main.sls file.