Maintain PostgreSQL =================== Troubleshoot performance ------------------------ Check the log file – for example, ``/var/log/postgresql/postgresql-15-main.log`` – if debugging an unscheduled restart of the ``postgres`` service, for example. Find slow queries ~~~~~~~~~~~~~~~~~ .. seealso:: :ref:`improve-slow-queries` Use the `pg_stat_statements `__ extension. For example: .. code-block:: sql SELECT usename, substring(query, 1, 80) AS short_query, round(total_time::numeric, 2) AS total_time, calls, round(mean_time::numeric, 2) AS mean, round((100 * total_time / sum(total_time::numeric) OVER ())::numeric, 2) AS percentage_cpu FROM pg_stat_statements s INNER JOIN pg_user u ON s.userid = u.usesysid ORDER BY total_time DESC LIMIT 20; To display the full query, you might prefer to switch to unaligned output mode (``\a``): .. code-block:: sql SELECT usename, query, round(total_time::numeric, 2) AS total_time, calls, round(mean_time::numeric, 2) AS mean, round((100 * total_time / sum(total_time::numeric) OVER ())::numeric, 2) AS percentage_cpu FROM pg_stat_statements s INNER JOIN pg_user u ON s.userid = u.usesysid ORDER BY total_time DESC LIMIT 20; To reset the statistics: .. code-block:: sql SELECT pg_stat_statements_reset(); .. tip:: `pgBadger `__ is also available. Reference: `Tracking down slow queries in PostgreSQL `__ Tune server settings ~~~~~~~~~~~~~~~~~~~~ #. :doc:`SSH<../use/ssh>` into a PostgreSQL server as the ``postgres`` user. #. Run the ``postgresqltuner.pl`` file: .. code-block:: bash /var/lib/postgresql/postgresqltuner.pl --ssd #. Under *Configuration advice*, address *HIGH* and *MEDIUM* recommendations. Reference: `Tuning Your PostgreSQL Server `__, `Slow Query Questions `__ .. _pg-stat-all-tables: Check autovacuum statistics ~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. code-block:: sql SELECT nspname, s.relname, reltuples, n_live_tup::real, n_dead_tup::real, TRUNC(n_dead_tup / GREATEST(reltuples::numeric, 1) * 100, 2) AS percent, last_autovacuum, last_autoanalyze FROM pg_stat_all_tables s JOIN pg_class c ON relid = c.oid JOIN pg_namespace ON relnamespace = pg_namespace.oid ORDER BY percent DESC, last_autovacuum; See the `pg_stat_all_tables `__ table's documentation. To get the table related to a TOAST table, take the number after ``pg_toast_``, and run, for example: .. code-block:: sql SELECT '16712'::regclass; Troubleshoot backups -------------------- If ``cron`` sends an email like: .. code-block:: none ERROR: [082]: WAL segment 00000001000009FD0000000D was not archived before the 60000ms timeout HINT: check the archive_command to ensure that all options are correct (especially --stanza). HINT: check the PostgreSQL server log for errors. HINT: run the 'start' command if the stanza was previously stopped. (The same message will appear in a log file like ``/var/log/pgbackrest/STANZA-backup.log``.) This error occurs occasionally (for example, due to a network issue), and is not a concern. To confirm, after 24 hours, view recent backups with: .. code-block:: bash sudo -u postgres pgbackrest info --stanza=kingfisher-2023 You should see a "full backup" within one week of the error, an "incr backup" the day before the error, and another the day after. (In other words, an incremental backup failed.) If a full backup failed, re-run the weekly full backup command listed in the ``/etc/cron.d/postgres_backups`` file. Check usage ----------- Explore databases ~~~~~~~~~~~~~~~~~ List databases: .. code-block:: none \l List schemas: .. code-block:: none \dn List tables, views and sequences in the ``public`` schema: .. code-block:: none \d List tables, indexes, views and sequences in the ``public`` schema: .. code-block:: none \dtivs To list tables, views and/or sequences in a specific schema, append, for example, ``reference.*`` – or ``*.*`` for all schema. You can use the ``psql`` command's ``-E`` (``--echo-hidden``) `flag `__ to echo the queries generated by the backslash commands. Check drive usage ~~~~~~~~~~~~~~~~~ Get all database sizes: .. code-block:: none \l+ Get all schema sizes: .. code-block:: sql SELECT schema_name, schema_size, pg_size_pretty(schema_size), TRUNC(schema_size::numeric / pg_database_size(current_database()) * 100, 2) AS percent FROM ( SELECT nspname AS schema_name, SUM(pg_relation_size(c.oid))::bigint AS schema_size FROM pg_class c JOIN pg_namespace n ON c.relnamespace = n.oid GROUP BY schema_name ) t ORDER BY schema_size DESC; Get relation sizes in the ``public`` schema: .. code-block:: none \dtis+ To get relation sizes in a specific schema, append, for example, ``reference.*`` – or ``*.*`` for all schema. See the `Database Object Size Functions `__ documentation. .. _pg-stat-activity: Show active connections and running queries ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Show running queries: .. code-block:: sql SELECT pid, client_addr, usename, application_name, state, wait_event_type, NOW() - query_start AS time, query FROM pg_stat_activity WHERE query <> '' ORDER BY time DESC; See the `pg_stat_activity `__ table's documentation. .. note:: A ``client_addr`` in the range 172.16.0.0 to 172.31.255.255 (`private address space `__) is Docker. ``application_name`` can suggest an administrator's manual connection. For example, "psql" or "`DataGrip `__". .. tip:: To see all connections, remove the ``WHERE`` clause and add ``backend_type`` to the ``SELECT`` clause. Typically, the ``backend_type`` with no ``query`` are: - autovacuum launcher - logical replication launcher - background writer - archiver - checkpointer - walwriter Stop a query, replacing ``PID`` with the query's ``pid``: .. code-block:: sql SELECT pg_cancel_backend(PID) Find unexpected schema or tables ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Find unexpected schema: .. code-block:: sql SELECT nspname FROM pg_namespace WHERE nspname NOT LIKE 'pg_temp_%' AND nspname NOT LIKE 'pg_toast_temp_%' AND nspname NOT LIKE 'summary_%' AND nspname NOT IN ( 'information_schema', 'pg_catalog', 'pg_toast', 'public', 'reference', 'summaries' ); Find unexpected tables in the public schema: .. code-block:: sql SELECT relname FROM pg_class c JOIN pg_namespace n ON c.relnamespace = n.oid WHERE nspname = 'public' AND -- Ignore sequences and indexes relkind NOT IN ('S', 'i') AND relname NOT IN ( -- Kingfisher Process tables 'collection', 'collection_file', 'collection_note', 'compiled_release', 'data', 'package_data', 'record', 'record_check', 'release', 'release_check', -- To be removed in future versions 'alembic_version', 'record_check_error', 'release_check_error', 'transform_upgrade_1_0_to_1_1_status_record', 'transform_upgrade_1_0_to_1_1_status_release', -- https://www.postgresql.org/docs/current/pgstatstatements.html 'pg_stat_statements', -- https://www.postgresql.org/docs/current/tablefunc.html 'tablefunc_crosstab_2', 'tablefunc_crosstab_3', 'tablefunc_crosstab_4' ); .. _pg-control-access: Control access -------------- Each individual should have a personal account. Add a user ~~~~~~~~~~ #. Add, in a private Pillar file, replacing ``PASSWORD`` with a `strong password `__ and ``USERNAME`` with a recognizable username (for people, use the lowercase first initial and the family name, like ``jdoe``): .. code-block:: yaml postgres: users: # me@example.com USERNAME: password: "PASSWORD" #. Assign the user to groups. For example, the ``kingfisher-main`` target includes the groups: kingfisher_process_read ``SELECT`` on all tables in schema ``public`` kingfisher_summarize_read ``SELECT`` on all tables in schema created by Kingfisher Summarize .. code-block:: yaml :emphasize-lines: 6-8 postgres: users: # me@example.com USERNAME: password: "PASSWORD" groups: - kingfisher_process_read - kingfisher_summarize_read #. :doc:`Deploy the server<../deploy/deploy>` Update a password ~~~~~~~~~~~~~~~~~ #. Update the private Pillar file, for example: .. code-block:: yaml :emphasize-lines: 5 postgres: users: # me@example.com USERNAME: password: "PASSWORD" #. :doc:`Deploy the server<../deploy/deploy>` #. Notify the contact at the email address in the comment .. _pg-delete-user: Delete a user ~~~~~~~~~~~~~ #. Delete the user from the private Pillar file. #. :doc:`SSH<../use/ssh>` into the main server as the ``root`` user. #. Attempt to drop the user as the ``postgres`` user, for example: .. code-block:: bash su - postgres -c 'psql postgres -c "DROP ROLE role2drop;"' #. If you see a message like: .. code-block:: none ERROR: role "role2drop" cannot be dropped because some objects depend on it DETAIL: privileges for table … … and 1234 other objects (see server log for list) #. Open the server log, and search for the relevant ``DROP ROLE`` statement (after running the command below, press ``/``, type ``DROP ROLE``, press Enter, and press ``n`` until you find it). For example: .. code-block:: bash less /var/log/postgresql/postgresql-15-main.log #. If all the objects listed after ``DETAIL:`` in the server log can be dropped (press Space to scroll forward), then press ``q`` to quit ``less`` and open a SQL terminal as the ``postgres`` user: .. code-block:: bash su - postgres -c 'psql kingfisher_process' #. Finally, drop the user: .. code-block:: sql REASSIGN OWNED BY role2drop TO anotheruser; DROP OWNED BY role2drop; DROP ROLE role2drop; Check privileges ~~~~~~~~~~~~~~~~ List users and groups: .. code-block:: none \du Find unexpected database ``CREATE`` privileges: .. code-block:: sql SELECT usename, string_agg(datname, ', ' ORDER BY datname) FROM pg_user CROSS JOIN pg_database WHERE usename NOT IN ('postgres') AND has_database_privilege(usename, datname, 'CREATE') AND NOT (usename = 'kingfisher_summarize' AND datname = 'kingfisher_process') GROUP BY usename ORDER BY usename; Find unexpected schema ``CREATE`` privileges: .. code-block:: sql SELECT usename, string_agg(nspname, ', ' ORDER BY nspname) FROM pg_user CROSS JOIN pg_namespace WHERE usename NOT IN ('postgres') AND has_schema_privilege(usename, nspname, 'CREATE') AND NOT (usename = 'kingfisher_process' AND nspname = 'public') AND NOT (usename = 'kingfisher_summarize' AND nspname LIKE 'summary_%') GROUP BY usename ORDER BY usename; Find unexpected schema ``USAGE`` privileges: .. code-block:: sql SELECT usename, string_agg(nspname, ', ' ORDER BY nspname) FROM pg_user CROSS JOIN pg_namespace WHERE usename NOT IN ('postgres') AND nspname NOT IN ('information_schema', 'pg_catalog', 'reference', 'summaries') AND has_schema_privilege(usename, nspname, 'USAGE') AND NOT (usename = 'kingfisher_summarize' AND nspname LIKE 'summary_%') AND NOT (pg_has_role(usename, 'kingfisher_process_read', 'MEMBER') AND nspname = 'public') AND NOT (pg_has_role(usename, 'kingfisher_summarize_read', 'MEMBER') AND nspname LIKE 'summary_%') GROUP BY usename ORDER BY usename; Find unexpected table non ``SELECT`` privileges: .. code-block:: sql SELECT usename, nspname, string_agg(relname, ', ' ORDER BY relname) FROM pg_user CROSS JOIN pg_class c JOIN pg_namespace n ON c.relnamespace = n.oid WHERE usename NOT IN ('postgres') AND nspname NOT IN ('pg_toast') AND relname NOT IN ('pg_settings') AND has_table_privilege(usename, c.oid, 'INSERT,UPDATE,DELETE,TRUNCATE,REFERENCES,TRIGGER') AND NOT (usename = 'kingfisher_process' AND nspname = 'public') AND NOT (usename = 'kingfisher_summarize' AND nspname LIKE 'summary_%') GROUP BY usename, nspname ORDER BY usename, nspname; Reference: `System Information Functions `__ for functions like ``has_schema_privilege`` .. _pg-recover-backup: Restore from backup ------------------- pgBackRest ~~~~~~~~~~ .. seealso:: :ref:`pg-setup-backups` Get the stanza name from the ``postgres:backup:stanza`` key in the Pillar data, or from the ``/etc/pgbackrest/pgbackrest.conf`` file on the server. View current backups: .. code-block:: bash pgbackrest info --stanza=example #. Stop PostgreSQL: .. code-block:: bash systemctl stop postgresql.service #. In most cases, restore using the ``--delta`` option, as it saves time by using hashes to restore only files that differ: .. code-block:: bash sudo -u postgres pgbackrest restore --stanza=example --delta To restore from a backup other than the latest, for example: .. code-block:: bash sudo -u postgres pgbackrest restore --stanza=example --delta --set=20210315-145357F_20210315-145459I To restore every file, for example: .. code-block:: bash rm -rf /var/lib/postgresql/15/main mkdir /var/lib/postgresql/15/main sudo -u postgres pgbackrest restore --stanza=example #. Start PostgreSQL: .. code-block:: bash systemctl start postgresql.service .. seealso:: - `Restore `__ - `Delta Option `__ - `Restore a Backup `__ .. _pg-recover-backup-universal: pg_dump and pg_restore ~~~~~~~~~~~~~~~~~~~~~~ If pgBackRest isn't an option, use ``pg_dump`` and ``pg_restore``. #. Connect to the old server, and export the existing database. For example, to export ``spoonbill_web``: .. code-block:: bash su - postgres -c "/usr/bin/pg_dump -Ft 'spoonbill_web' | gzip > ~/spoonbill_web.tar.gz" #. Copy the database dump to your local machine. For example: .. code-block:: bash rsync -avz root@ocp13.open-contracting.org:/var/lib/postgresql/spoonbill_web.tar.gz . #. Copy the database dump to the new server. For example: .. code-block:: bash rsync -avz spoonbill_web.tar.gz root@ocp27.open-contracting.org:~/ #. Restore the database. For example: .. code-block:: bash gunzip ~/spoonbill_web.tar.gz sudo -u postgres pg_restore -cC --if-exists -v -U postgres -d spoonbill_web ~/spoonbill_web.tar .. _pg-recover-replica: Recover the replica ------------------- If replication breaks or the replica server goes offline, you must recover the replica, in two stages: mitigate the downtime, and fix the replication. Mitigate downtime ~~~~~~~~~~~~~~~~~ #. :ref:`Enable public access` to the PostgreSQL service on the main server, by modifying its Pillar file: .. code-block:: yaml postgres: public_access: True For example, for the ``kingfisher-main`` target, modify the ``pillar/kingfisher_main.sls`` file. #. :doc:`Deploy the main server<../../deploy/deploy>` #. Update DNS records: #. Login to `GoDaddy `__ #. If access was delegated, open `Delegate Access `__ and click the *Access Now* button #. Open `DNS Management `__ for open-contracting.org #. Update the replica's CNAME record to point to the main server's A record #. Wait for the changes to propagate, which depends on the original TTL value Fix replication ~~~~~~~~~~~~~~~ #. :doc:`SSH<../use/ssh>` into the replica server as the ``root`` user #. Stop PostgreSQL: .. code-block:: bash systemctl stop postgresql.service #. Download the latest database or a backup from a specific point in time, replacing ``kingfisher`` with the value set in pillar ``postgres:backup:stanza``. pgBackRest is configured to restore the replication configuration (see, for example, ``/var/lib/postgresql/15/main/postgresql.conf``). .. code-block:: bash pgbackrest --stanza=kingfisher --type=standby --delta restore .. note:: See :ref:`pg-recover-backup` for more information on the pgBackRest restore function. #. Start PostgreSQL and monitor: .. code-block:: bash systemctl start postgresql.service tail -f /var/log/postgresql/postgresql-15-main.log You should see messages about recovering from WAL files in the logs. If all else fails, :ref:`rebuild the replica`.