During last year, we got some really good experience with PostgreSQL so we decided to build a cluster solution to improve performance, reliability and availability.
We work here with Postgresql 9.2 and Pgpool2 3.2.4. Sometimes you'd need to change text strings according to the version you use.
Assume we have three machines with Debian Linux (tested with squeeze, should work also with wheezy) on the same network:
192.168.0.1 - for pgpool2
192.168.0.2 - for first postgresql server
192.168.0.3 - for second postgresql server
Of course you can have as many postgresql servers as you like or run pgpool2 on one of them (not recommended but for testing purposes it can cave you some resources).
As the pgpool uses the same protocol as postgresql, we can connect the application to pgpool the same way as to the database. What's going on behind is hidden for the application so it thinks that pgpool is the database.
Step 1 - Add the PostgreSQL repository
Do this on all three machines:
Add the postgresql repository into /etc/apt/sources.list
deb http://apt.postgresql.org/pub/repos/apt/ squeeze-pgdg main
Import the repository key and refresh the package database:
wget --quiet -O - http://apt.postgresql.org/pub/repos/apt/ACCC4CF8.asc | sudo apt-key add -
sudo apt-get update
Step 2 - Install the PostgreSQL database
Ssh to 192.168.0.2 and Install the postgresql package:
apt-get install postgresql
(more detailed info here: https://wiki.postgresql.org/wiki/Apt)
Change the default postges password:
sudo su postgres
alter user postgres with password ‘newpassword’;
Repeat this step also on the second database machine 192.168.0.3.
Step 3 - Set permissions to access databases from pgpool2
Open /etc/postgresql/9.2/main/pg_hba.conf, scroll to the bottom and add the following lines (you can replace the postgres username with any other user you need or add another line with another user):
host all postgres 192.168.0.0/24 md5
Restart postgresql afterwards and repeat this step also on the second database machine 192.168.0.3.
Step 4 - Prepare databases to be accessed by pgpool2
Pgpool uses a few special functions which are not in postgresql by default so we have to install them. Ssh to the postgresql machine 192.168.0.2 and do the following:
apt-get install postgresql-9.2-pgpool2
This installed for us some shared object libraries with compiled PostgreSQL C functions. We'll focus on those three which have to be imported into PostgreSQL databases which will be accessed with pgpool2:
Download the pgpool2 sources from here:
and find those files:
and make their copies as follows:
now open them all and replace "MODULE PATHNAME" with the path to the shared object. The files should then look like this:
CREATE OR REPLACE FUNCTION pgpool_recovery(text, text, text) RETURNS bool AS '/usr/lib/postgresql/9.2/lib/pgpool-recovery', 'pgpool_recovery' LANGUAGE C STRICT;
CREATE OR REPLACE FUNCTION pgpool_remote_start(text, text) RETURNS bool AS '/usr/lib/postgresql/9.2/lib/pgpool-recovery', 'pgpool_remote_start' LANGUAGE C STRICT;
CREATE OR REPLACE FUNCTION pgpool_switch_xlog(text) RETURNS text AS '/usr/lib/postgresql/9.2/lib/pgpool-recovery', 'pgpool_switch_xlog' LANGUAGE C STRICT;
CREATE OR REPLACE FUNCTION pgpool_regclass(cstring) RETURNS oid AS '/usr/lib/postgresql/9.2/lib/pgpool-regclass', 'pgpool_regclass' LANGUAGE C STRICT;
CREATE OR REPLACE FUNCTION pgpool_walrecrunning() RETURNS bool AS '/usr/lib/postgresql/9.2/lib/pgpool-walrecrunning', 'pgpool_walrecrunning' LANGUAGE C VOLATILE STRICT;
Now you can execute them one by one like this:
psql -h localhost -U postgres -W -f pgpool-recovery.sql dbname psql -h localhost -U postgres -W -f pgpool-regclass.sql dbname psql -h localhost -U postgres -W -f pgpool-walrecrunning.sql dbname
This step needs to be done also on the second database server 192.168.0.3.
That's it. The PostgeSQL databases are now ready to be accessed by pgpool2.
Step 5 - Install pgpool2
Ssh to 192.168.0.1 and install pgpool2:
apt-get install pgpool2
Ge the md5 of postges user password (and/or also the user you want to access the databases with). The pg_md5 utility which is shipped together with pgpool generates a wrong md5 (thanks to ftraian). So the safest method how to get the right md5 is this (execute on one of the the postgresql databases):
select passwd from pg_shadow where usename = 'username';
Open the /etc/pgpool2/pcp.conf file and add the following line:
so it will look similar to this:
Then, configure /etc/pgpool2/pgpool.conf. Your node configuration will look like this:
backend_hostname0 = '192.168.0.2' backend_port0 = 5432 backend_weight0 = 1 backend_data_directory0 = '/var/lib/postgresql/9.2/main/' backend_flag0 = 'ALLOW_TO_FAILOVER' backend_hostname1 = '192.168.0.3' backend_port1 = 5432 backend_weight1 = 1 backend_data_directory1 = '/var/lib/postgresql/9.2/main/' backend_flag1 = 'ALLOW_TO_FAILOVER'
The other pgpool2 setup options are described here:
and nicely explained options of the configuration file here:
We used replication and load balancing but there are also other options as paralel query etc. which are beyond the scope of this tutorial.
If you'll decide to use master-slave setup, then the hostname with ID 0 (e.g. backend_hostname0) is the master, others are slaves.
Step 6 - Running pgpool2
Run pgpool with simply writing
into the terminal. Shut it down with
It waits for the clients to disconnect so when you need quick shutdown, you can use
pgpool -m fast stop
(look to pgpool --help to see more options).
Sometimes you need to see what's going on so let's tell pgpool to log the activity:
pgpool -n -d > /tmp/pgpool.log 2>&1 &
With this command, you are telling pgpool to start in the debug mode and log the activity into /tmp/pgpool.log. Be careful when doing this because the log size increases very quickly. Some peoplerecommend using apache rotatelogs which is of course an option:
pgpool -n 2>&1 | /usr/local/apache2/bin/rotatelogs /tmp/pgpool.log 86400
If you have troubles with connecting, check the log file which will tell you what you're missing - most common issues are caused by wrong permissions or missing C functions in the backend databases.
Step 7 - What next?
You may want to install pgpooladmin which is the graphical to pgpool interface running under apache (or any other web server) where you can see the node status, connections etc.
Next helpful utility is monit which has a bunch of plugins for monitoring PostgreSQL. This tool is easy to install with:
apt-get install monit