Data support

Set up incremental updates

This setup creates a cron job that runs a scrapy crawl command. The DatabaseStore extension implements the incremental updates.

  1. Choose a spider that collects the desired data. Prefer a spider that:

    • Accepts a from_date spider argument, preferably at the same granularity as the cron schedule

    • Is fastest: for example, _bulk, instead of _api

    • Reduces processing: for example, a spider that yields compiled releases

    If needed, improve the spider in Kingfisher Collect.

  2. Add an entry to the python_apps.kingfisher_collect.crawls section of the pillar/kingfisher_main.sls file:

    identifier

    An uppercase, underscore-separated name, like DOMINICAN_REPUBLIC.

    spider

    The spider’s name, like dominican_republic_api.

    crawl_time

    The current date, like '2025-05-06' (though any date works).

    spider_arguments (optional)

    Any spider arguments.

    If the spider doesn’t yield compiled releases, add -a compile_releases=true.

    cardinal (optional)

    True, to enable a pipeline involving Cardinal.

    users (optional)

    A list of additional PostgreSQL users that need read access to the database.

    day (optional)

    The day of the month on which to run the cron job.

    Required if an incremental update takes longer than a day.

  3. If an initial crawl would take longer than a day, run the scrapy crawl command manually.

  4. Deploy the server.
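Putting step 2 together, an entry in the pillar/kingfisher_main.sls file might look like the following sketch (treating the identifier as the mapping key; the spider name, date, and arguments are illustrative assumptions):

```yaml
python_apps:
  kingfisher_collect:
    crawls:
      DOMINICAN_REPUBLIC:                 # identifier
        spider: dominican_republic_api
        crawl_time: '2025-05-06'
        spider_arguments: -a compile_releases=true  # if the spider doesn't yield compiled releases
```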

Create a data support main server

Dependencies

Tinyproxy
  1. Update the allowed IP addresses in the pillar/tinyproxy.sls file.

  2. Deploy the docs service, when ready.

Replica, if applicable
  1. Update the allowed IP addresses and hostname in the pillar/kingfisher_replica.sls file.

  2. Deploy the kingfisher-replica service, when ready.

Dependents

  1. Notify RBC Group of the new domain name for the new PostgreSQL server.

Update Salt configuration and halt jobs

  1. Check that docker.uid in the server’s Pillar file matches the UID of the docker.user (deployer) entry in the /etc/passwd file.

  2. Change cron.present to cron.absent in the salt/pelican/backend/init.sls file.

  3. Comment out the postgres.backup section of the Pillar file.

  4. Deploy the old server and the new server.

  5. On the old server:

    1. Delete the /etc/cron.d/postgres_backups file.

    2. Run docker compose down in each Docker app’s directory to stop all containers.

  6. Check that no crawls are running at https://collect.kingfisher.open-contracting.org/jobs.

    If a crawl is running, job owners can cancel jobs.

  7. Check that no messages are enqueued at https://rabbitmq.kingfisher.open-contracting.org.

    If a job is running in Kingfisher Process, job owners can cancel jobs.
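The UID check in step 1 can be scripted. A minimal sketch (the uid_of helper is hypothetical; getent is assumed to be available on the server):

```shell
# Hypothetical helper: print a user's UID from the passwd database,
# to compare against docker.uid in the server's Pillar file.
uid_of() { getent passwd "$1" | cut -d: -f3; }

uid_of deployer  # prints the deployer user's UID
```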

Docker apps

  1. Run migrations for Docker apps as the deployer user:

    su - deployer
    
    cd /data/deploy/kingfisher-process/
    docker compose run --rm --name django-migrate cron python manage.py migrate
    
    cd /data/deploy/pelican-frontend/
    docker compose run --rm --name django-migrate web python manage.py migrate
    
  2. Pull new images and start new containers for each Docker app.

Kingfisher Collect

Once DNS has propagated, update spiders in Kingfisher Collect.

Copy incremental data

  1. SSH into the new server as the incremental user:

    1. Generate an SSH key pair:

      ssh-keygen -t rsa -b 4096 -C "incremental"
      
    2. Get the public SSH key:

      cat ~/.ssh/id_rsa.pub
      
  2. Add the public SSH key to the ssh.incremental list in the pillar/kingfisher_main.sls file:

    ssh:
      incremental:
        - ssh-rsa AAAB3N...
    
  3. Change cron.present to cron.absent in the salt/kingfisher/collect/incremental.sls file.

  4. Deploy the old server and the new server.

  5. SSH into the old server as the incremental user:

    1. Stop any processes started by the cron jobs.

    2. Dump the kingfisher_collect database:

      pg_dump -U kingfisher_collect -h localhost -f kingfisher_collect.sql kingfisher_collect
      
  6. SSH into the new server as the incremental user:

    1. Copy the database dump from the old server. For example:

      rsync -avz incremental@ocp04.open-contracting.org:~/kingfisher_collect.sql .
      
    2. Load the database dump:

      psql -U kingfisher_collect -h localhost -f kingfisher_collect.sql kingfisher_collect
      
    3. Copy the data directory from the old server. For example:

      rsync -avz incremental@ocp04.open-contracting.org:/home/incremental/data/ /home/incremental/data/
      
    4. Copy the logs directory from the old server. For example:

      rsync -avz incremental@ocp04.open-contracting.org:/home/incremental/logs/ /home/incremental/logs/
      
  7. Remove the public SSH key from the ssh.incremental list in the pillar/kingfisher_main.sls file.

  8. Change cron.absent to cron.present in the salt/kingfisher/collect/incremental.sls file.

  9. Deploy the new server.

Pelican backend

The initial migrations for Pelican backend, which create the exchange_rates table, are run by Salt.

  1. Connect to the old server, and dump the exchange_rates table:

    sudo -i -u postgres psql -c '\copy exchange_rates (valid_on, rates, created, modified) to stdout' pelican_backend > exchange_rates.csv
    
  2. Copy the database dump to your local machine. For example:

    rsync -avz root@ocp13.open-contracting.org:~/exchange_rates.csv .
    
  3. Copy the database dump to the new server. For example:

    rsync -avz exchange_rates.csv root@ocp23.open-contracting.org:~/
    
  4. Populate the exchange_rates table:

    psql -U pelican_backend -h localhost -c "\copy exchange_rates (valid_on, rates, created, modified) from 'exchange_rates.csv';" pelican_backend
    

Restore Salt configuration and start jobs

  1. Change cron.absent to cron.present in the salt/pelican/backend/init.sls file.

  2. Uncomment the postgres.backup section of the Pillar file.

  3. Deploy the new server.

Create a data support replica server

  1. Update postgres.replica_ipv4 (and postgres.replica_ipv6, if applicable) in the pillar/kingfisher_main.sls file.
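The keys above might be set as in this sketch, using documentation-reserved example addresses (all values are placeholders):

```yaml
postgres:
  replica_ipv4: 192.0.2.10      # example address (RFC 5737 documentation range)
  replica_ipv6: '2001:db8::10'  # example address (RFC 3849 documentation range); omit if not applicable
```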