Postgres RDS Operator Handbook¶

Create cluster for a new team/problem domain¶

create new VM instances and install postgres as described in getting-started
configure the fresh cluster with invoke configure_cluster ..., see invoke --help configure_cluster

(Rolling) upgrade of a cluster¶

To increase storage, or to upgrade to a new postgres minor version:

create new VM instances with preinstalled postgres or create, then install
invoke migrate-to-master ...

Ubuntu upgrade (security etc.)¶

Once monitoring detects that VMs need a reboot after an unattended upgrade:

choose a maintenance window
reboot the VMs. The order (master first or slave first) does not matter
for the duration of a reboot the postgres service is unavailable

Point-in-time restore if an application mangled the data¶

A restore should always be made to a new cluster (different name).

So create new VM instances as described above.
Restore using your configure-cluster playbook
developers can now:
- switch their application to restored DB (via application config, use new services url)
- copy data from one or multiple tables over
- or just compare the current and previous state
No work for the infrastructure team.

Developers forgotten their admin password¶

The admin password is generated by the ansible playbook only once and printed at the end of the playbook execution.

To manually reset the admin password:

ssh to the postgres master instance
sudo -u postgres psql
Now you can change the password with \password admin Quit with Ctrl+D

Note: do not use ALTER USER 'admin' WITH PASSWORD '... in psql since it will save clear text password in the history file.

Application can not reach the postgres¶

Note about credentials: those are managed by developers, applications must not use admin account.

Check, if postgres service url is set and VM is available. E.g.:

dig +short @your-dns-server <the-service-url>

TODO Link to glossary about the-service-url

ssh to the server and check if service is running or not and why. Some ideas:

netstat -tulpen | grep 5432
systemctl status postgresql.service
systemctl start postgresql.service
journalctl -xe

Accessing data:

sudo -u postgres psql
  \d
  select * from my_table;

Credentials for replication and AWS backup:

cd /var/local/postgresql; ls

Check postgres configuration, data, replication configuration cd /var/local/postgresql/data; ls. Interesting files:

recovery.conf, recovery.done for slave replication, promotion
postgresql.conf for WAL, checkpoint, replication settings
pg_hba.conf for access/connection rights

Promote slave to master¶

Once you found out, that master is broken or missing and can not be restarted, you can quickly promote the existing slave to master to enable operations again.

1. Ensure the master is really gone and will not come back, e.g. VM deleted in openstack. Or ensure it is really deactivated by renaming /var/local/postgresql/data/postgresql.conf to postgresql.conf.deactivated

2. Run invoke configure-cluster ... It will automatically promote the existing hot-standby slave to master and register it under the service url.

3. Build addional slaves if needed: add VM, configure cluster as described above

Is there data in the database? I can not find any tables¶

Maybe the developers put their tables inside schemas?

Log in to server and run psql there:
```
sudo -u postgres psql
```

(Alternatively run psql with a service url - you will need the password)

List databases with \l. Choose the desired one, connect with

\c database-name-from-the-list
List relations with \d. If the list is empty, try \dn (list schemas). If you see entries beyond public, use the schema name \dt schema-name-from-previous-list.* and finally select count(*) from schema-name.table-name;

Of list the biggest tables:

# relpages is the number of 8 Kilobyte pages used by each table.
SELECT relname, relpages FROM pg_class ORDER BY 2 DESC;
# via https://www.chrisnewland.com/find-biggest-tables-in-postgres-91

How do I get an overview, which servers already installed, which is master and which is slave etc.¶

invoke info-list

Example output:

10.31.35.109    performance1-2            9.6.2 CONFIGURED_SLAVE     int-postgres-test-1.example.com
10.31.35.180    performance1-2    REBOOT! 9.6.2 CONFIGURED_MASTER    int-postgres-test-2.example.com

Shows:

ip address of the server
openstack flavor
if a reboot is required due to OS update
which postgresql server version is installed
the server state: master/slave/not configured/not reachable etc.
the openstack name of the server

Migrating to a different data center¶

What if I want to change my hosting provider or move to a different data center due some other reason?

Solution A, with downtime for duration of restore, little implementation effort:

Preparation: instantiate postgres servers in the new data center
Shut down postgres servers gracefully in the old datacenter (downtime begins)
Restore in the new datacenter from existing backup, e.g. 30 minutes from S3 bucket for a 27 GB database (depends on network connection and server power)

Solution B, less downtime, more implementation effort:

Instantiate postgres servers in the new data center
Set up firewall for postgres communication between data centers, port 5432
Set up replication from old data center to the new - manually create recovery.conf
Shut down postgres servers gracefully in the old datacenter (downtime begins)
Promote existing slave in the new data center to master