Postgres RDS Operator Handbook¶
Create cluster for a new team/problem domain¶
- create new VM instances and install postgres as described in getting-started
- configure the fresh cluster with
invoke configure_cluster ..., seeinvoke --help configure_cluster
(Rolling) upgrade of a cluster¶
To increase storage, or to upgrade to a new postgres minor version:
- create new VM instances with preinstalled postgres or create, then install
invoke migrate-to-master ...
Ubuntu upgrade (security etc.)¶
Once monitoring detects that VMs need a reboot after an unattended upgrade:
- choose a maintenance window
- reboot the VMs. The order (master first or slave first) does not matter
- for the duration of a reboot the postgres service is unavailable
Point-in-time restore if an application mangled the data¶
A restore should always be made to a new cluster (different name).
So create new VM instances as described above.
Restore using your configure-cluster playbook
developers can now:
- switch their application to restored DB (via application config, use new services url)
- copy data from one or multiple tables over
- or just compare the current and previous state
No work for the infrastructure team.
Developers forgotten their admin password¶
The admin password is generated by the ansible playbook only once and printed at the end of the playbook execution.
To manually reset the admin password:
- ssh to the postgres master instance
sudo -u postgres psql- Now you can change the password with
\password adminQuit with Ctrl+D
Note: do not use ALTER USER 'admin' WITH PASSWORD '... in psql since it
will save clear text password in the history file.
Application can not reach the postgres¶
Note about credentials: those are managed by developers, applications must not
use admin account.
Check, if postgres service url is set and VM is available. E.g.:
dig +short @your-dns-server <the-service-url>
TODO Link to glossary about the-service-url
ssh to the server and check if service is running or not and why. Some ideas:
netstat -tulpen | grep 5432
systemctl status postgresql.service
systemctl start postgresql.service
journalctl -xe
Accessing data:
sudo -u postgres psql
\d
select * from my_table;
Credentials for replication and AWS backup:
cd /var/local/postgresql; ls
Check postgres configuration, data, replication configuration
cd /var/local/postgresql/data; ls. Interesting files:
- recovery.conf, recovery.done for slave replication, promotion
- postgresql.conf for WAL, checkpoint, replication settings
- pg_hba.conf for access/connection rights
Promote slave to master¶
Once you found out, that master is broken or missing and can not be restarted, you can quickly promote the existing slave to master to enable operations again.
1. Ensure the master is really gone and will not come back, e.g. VM deleted in openstack. Or ensure it is really deactivated by renaming /var/local/postgresql/data/postgresql.conf to postgresql.conf.deactivated
2. Run invoke configure-cluster ... It will automatically promote the existing hot-standby slave to master and register it under the service url.
3. Build addional slaves if needed: add VM, configure cluster as described above
Is there data in the database? I can not find any tables¶
Maybe the developers put their tables inside schemas?
Log in to server and run psql there:
sudo -u postgres psql
(Alternatively run psql with a service url - you will need the password)
- List databases with
\l. Choose the desired one, connect with \c database-name-from-the-list
- List databases with
List relations with
\d. If the list is empty, try\dn(list schemas). If you see entries beyondpublic, use the schema name\dt schema-name-from-previous-list.*and finallyselect count(*) from schema-name.table-name;Of list the biggest tables:
# relpages is the number of 8 Kilobyte pages used by each table. SELECT relname, relpages FROM pg_class ORDER BY 2 DESC; # via https://www.chrisnewland.com/find-biggest-tables-in-postgres-91
How do I get an overview, which servers already installed, which is master and which is slave etc.¶
invoke info-list
Example output:
10.31.35.109 performance1-2 9.6.2 CONFIGURED_SLAVE int-postgres-test-1.example.com
10.31.35.180 performance1-2 REBOOT! 9.6.2 CONFIGURED_MASTER int-postgres-test-2.example.com
Shows:
- ip address of the server
- openstack flavor
- if a reboot is required due to OS update
- which postgresql server version is installed
- the server state: master/slave/not configured/not reachable etc.
- the openstack name of the server
Migrating to a different data center¶
What if I want to change my hosting provider or move to a different data center due some other reason?
Solution A, with downtime for duration of restore, little implementation effort:
- Preparation: instantiate postgres servers in the new data center
- Shut down postgres servers gracefully in the old datacenter (downtime begins)
- Restore in the new datacenter from existing backup, e.g. 30 minutes from S3 bucket for a 27 GB database (depends on network connection and server power)
Solution B, less downtime, more implementation effort:
- Instantiate postgres servers in the new data center
- Set up firewall for postgres communication between data centers, port 5432
- Set up replication from old data center to the new - manually create recovery.conf
- Shut down postgres servers gracefully in the old datacenter (downtime begins)
- Promote existing slave in the new data center to master