Kingfisher tasks

Deploy Kingfisher Process without losing Scrapy requests

Note

If spiders are running, use this process or deploy specific state files. Otherwise, deploy as usual.

This procedure should match salt/kingfisher/process/init.sls (up-to-date as of 2019-12-19). Run git log salt/kingfisher/process/init.sls to check for relevant changes since then, and update this page accordingly.

This assumes that there have been no changes to requirements.txt. If you are adding an index, altering a column, updating many rows, or performing another operation that locks tables or rows for longer than uWSGI’s harakiri setting, this might interfere with an ongoing collection (until queues are fully implemented).

The two key operations below are reloading uWSGI with the new application code and migrating the database.

It’s possible for requests to arrive after uWSGI reloads and before the database migrates. If the new application code is not backwards-compatible with the old database schema, those requests might fail. If, on the other hand, the old application code is forwards-compatible with the new database schema, then migrate the database first and reload uWSGI afterwards, instead of before.
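The ordering decision above can be sketched as a small dry-run helper. This is only an illustration: the function name deploy_order and its two argument values are hypothetical, and it prints the two commands used on this page rather than running them.

```shell
# Hypothetical dry-run helper: prints the two key operations in a safe order.
# Argument (names are illustrative, not part of any tool):
#   new-code-ok-with-old-schema  -> reload uWSGI first, then migrate
#   old-code-ok-with-new-schema  -> migrate first, then reload uWSGI
deploy_order() {
    case "$1" in
        new-code-ok-with-old-schema)
            echo "service uwsgi reload"
            echo "python ocdskingfisher-process-cli upgrade-database"
            ;;
        old-code-ok-with-new-schema)
            echo "python ocdskingfisher-process-cli upgrade-database"
            echo "service uwsgi reload"
            ;;
        *)
            echo "usage: deploy_order new-code-ok-with-old-schema|old-code-ok-with-new-schema" >&2
            return 1
            ;;
    esac
}
```

For example, deploy_order old-code-ok-with-new-schema prints the migration command before the reload command.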

service uwsgi reload runs /etc/init.d/uwsgi reload, which sends the SIGHUP signal to the main uWSGI process. uWSGI then reloads gracefully, without losing any requests from Scrapy.
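To illustrate the mechanism only, the sketch below uses a toy background process as a stand-in for the main uWSGI process (it is not uWSGI, and the variable names are made up for this example). The process installs a SIGHUP handler, so the signal triggers a "reload" action instead of terminating it:

```shell
# Toy stand-in for a main process (NOT uWSGI): on SIGHUP it "reloads"
# by logging a line, and keeps running instead of exiting.
log="$(mktemp)"
(
    trap 'echo reloaded >> "$log"' HUP
    echo ready >> "$log"
    while :; do sleep 1; done
) &
master=$!

# Wait until the handler is installed, then send the signal.
until grep -q ready "$log"; do sleep 1; done
kill -HUP "$master"

# Wait for the handler to run, then check that the process survived.
until grep -q reloaded "$log"; do sleep 1; done
if kill -0 "$master" 2>/dev/null; then
    status="still running"
else
    status="exited"
fi
echo "$status"

# Clean up the toy process and its log file.
kill "$master"
wait "$master" 2>/dev/null || true
rm -f "$log"
```

A process without such a handler would be terminated by SIGHUP, which is why the graceful behaviour depends on how uWSGI itself handles the signal.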

As with other deployment tasks, do the setup tasks before (and the cleanup tasks after) the steps below.

  1. Connect to the server as the ocdskfp user and change to the working directory:

    curl --silent --connect-timeout 1 process.kingfisher.open-contracting.org:8255 || true
    ssh ocdskfp@process.kingfisher.open-contracting.org
    cd ocdskingfisherprocess
    
  2. Check that you won’t deploy more commits than you intend, for example:

    git fetch
    # From https://github.com/open-contracting/kingfisher-process
    #    d8736f4..173dcf2  main                                    -> origin/main
    git log d8736f4..173dcf2
    
  3. Update the code:

    git pull --rebase
    
  4. In a new terminal, connect to the server as the root user, reload uWSGI, then close your connection to the server:

    curl --silent --connect-timeout 1 process.kingfisher.open-contracting.org:8255 || true
    ssh root@process.kingfisher.open-contracting.org
    service uwsgi reload
    
  5. In the original terminal, open a terminal multiplexer session named deploy, in case you lose your connection while migrating the database (you can re-attach to it with tmux attach-session -t deploy):

    tmux new -s deploy
    
  6. If workers are likely to interfere with a migration (e.g. by inserting new rows that meet the criteria for an update), comment out the lines that start the workers in the cron table, then kill the running workers:

    crontab -e
    pkill -f ocdskingfisher-process-cli
    
  7. Migrate the database (log the time, in case you need to retry):

    . .ve/bin/activate
    date
    python ocdskingfisher-process-cli upgrade-database
    date
    

    Alembic has no verbose mode for upgrades. To see the queries that are currently running, open another terminal, open a PostgreSQL shell, and run: SELECT pid, state, wait_event_type, query FROM pg_stat_activity; If a migration query has a wait_event_type of Lock, look for queries that block it (for example, long-running DELETE queries). To cancel a query, run SELECT pg_cancel_backend(PID); where PID is the pid shown for that query.

  8. Uncomment the lines that start the workers in the cron table:

    crontab -e
    
  9. Close the session with Ctrl-D and close your connection to the server.
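For the lock check in step 7, PostgreSQL 9.6 and later provide pg_blocking_pids(), which returns the pids of the backends blocking a given pid. A minimal sketch, assuming such a version; the function name blocked_queries_sql and the database name in the usage comment are hypothetical:

```shell
# Hypothetical helper: prints a query that lists each blocked backend
# together with the pids blocking it. Assumes PostgreSQL 9.6 or later.
blocked_queries_sql() {
    cat <<'SQL'
SELECT pid, pg_blocking_pids(pid) AS blocked_by, state, wait_event_type, query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;
SQL
}

# Usage (DATABASE_NAME is a placeholder for the actual database):
#   blocked_queries_sql | psql DATABASE_NAME
```

Any pid listed under blocked_by can then be inspected in pg_stat_activity and, if appropriate, cancelled with pg_cancel_backend as described in step 7.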