Replication and Load Balancing with PostgreSQL and Pgpool2

19. 6. 2013 Johan Hornof

During last year, we got some really good experience with PostgreSQL so we decided to build a cluster solution to improve performance, reliability and availability.

We work here with Postgresql 9.2 and Pgpool2 3.2.4. Sometimes you'd need to change text strings according to the version you use.

Assume we have three machines with Debian Linux (tested with squeeze, should work also with wheezy) on the same network:

192.168.0.1 - for pgpool2
192.168.0.2 - for first postgresql server
192.168.0.3 - for second postgresql server

Of course you can have as many postgresql servers as you like or run pgpool2 on one of them (not recommended but for testing purposes it can cave you some resources).

As the pgpool uses the same protocol as postgresql, we can connect the application to pgpool the same way as to the database. What's going on behind is hidden for the application so it thinks that pgpool is the database.

Step 1 - Add the PostgreSQL repository

Do this on all three machines:

Add the postgresql repository into /etc/apt/sources.list

deb http://apt.postgresql.org/pub/repos/apt/ squeeze-pgdg main

Import the repository key and refresh the package database:

wget --quiet -O - http://apt.postgresql.org/pub/repos/apt/ACCC4CF8.asc | sudo apt-key add -
sudo apt-get update

Step 2 - Install the PostgreSQL database

Ssh to 192.168.0.2 and Install the postgresql package:

apt-get install postgresql

(more detailed info here: https://wiki.postgresql.org/wiki/Apt)

Change the default postges password:

sudo su postgres
psql
alter user postgres with password ‘newpassword’;
\q

Repeat this step also on the second database machine 192.168.0.3.

Step 3 - Set permissions to access databases from pgpool2

Open /etc/postgresql/9.2/main/pg_hba.conf, scroll to the bottom and add the following lines (you can replace the postgres username with any other user you need or add another line with another user):

host    all             postgres        192.168.0.0/24          md5

Restart postgresql afterwards and repeat this step also on the second database machine 192.168.0.3.

Step 4 - Prepare databases to be accessed by pgpool2

Pgpool uses a few special functions which are not in postgresql by default so we have to install them. Ssh to the postgresql machine 192.168.0.2 and do the following:

apt-get install postgresql-9.2-pgpool2

This installed for us some shared object libraries with compiled PostgreSQL C functions. We'll focus on those three which have to be imported into PostgreSQL databases which will be accessed with pgpool2:

/usr/lib/postgresql/9.2/lib/pgpool-recovery.so
/usr/lib/postgresql/9.2/lib/pgpool-regclass.so
/usr/lib/postgresql/9.2/lib/pgpool-walrecrunning.so

Download the pgpool2 sources from here:

http://www.pgpool.net/mediawiki/index.php/Downloads

and find those files:

sql/pgpool-recovery/pgpool-recovery.sql.in
sql/pgpool-regclass/pgpool-regclass.sql.in
sql/pgpool-regclass/pgpool-walrecrunning.sql.in

and make their copies as follows:

sql/pgpool-recovery/pgpool-recovery.sql
sql/pgpool-regclass/pgpool-regclass.sql
sql/pgpool-regclass/pgpool-walrecrunning.sql 

now open them all and replace "MODULE PATHNAME" with the path to the shared object. The files should then look like this:

CREATE OR REPLACE FUNCTION pgpool_recovery(text, text, text)
RETURNS bool
AS '/usr/lib/postgresql/9.2/lib/pgpool-recovery', 'pgpool_recovery'
LANGUAGE C STRICT;
CREATE OR REPLACE FUNCTION pgpool_remote_start(text, text)
RETURNS bool
AS '/usr/lib/postgresql/9.2/lib/pgpool-recovery', 'pgpool_remote_start'
LANGUAGE C STRICT;
CREATE OR REPLACE FUNCTION pgpool_switch_xlog(text)
RETURNS text
AS '/usr/lib/postgresql/9.2/lib/pgpool-recovery', 'pgpool_switch_xlog'
LANGUAGE C STRICT;

CREATE OR REPLACE FUNCTION pgpool_regclass(cstring)
RETURNS oid
AS '/usr/lib/postgresql/9.2/lib/pgpool-regclass', 'pgpool_regclass'
LANGUAGE C STRICT;

CREATE OR REPLACE FUNCTION pgpool_walrecrunning()
RETURNS bool
AS '/usr/lib/postgresql/9.2/lib/pgpool-walrecrunning', 'pgpool_walrecrunning'
LANGUAGE C VOLATILE STRICT;

Now you can execute them one by one like this:

psql -h localhost -U postgres -W -f pgpool-recovery.sql dbname
psql -h localhost -U postgres -W -f pgpool-regclass.sql dbname
psql -h localhost -U postgres -W -f pgpool-walrecrunning.sql dbname

This step needs to be done also on the second database server 192.168.0.3.

That's it. The PostgeSQL databases are now ready to be accessed by pgpool2.

Step 5 - Install pgpool2

Ssh to 192.168.0.1 and install pgpool2:

apt-get install pgpool2

Ge the md5 of postges user password (and/or also the user you want to access the databases with). The pg_md5 utility which is shipped together with pgpool generates a wrong md5 (thanks to ftraian). So the safest method how to get the right md5 is this (execute on one of the the postgresql databases):

select passwd from pg_shadow where usename = 'username';

Open the /etc/pgpool2/pcp.conf file and add the following line:

user:md5password

so it will look similar to this:

postgres:ahb52d7w3o4n2fg8x8wfd4rdbctbjfkq

Then, configure /etc/pgpool2/pgpool.conf. Your node configuration will look like this:

backend_hostname0 = '192.168.0.2'
backend_port0 = 5432
backend_weight0 = 1
backend_data_directory0 = '/var/lib/postgresql/9.2/main/'
backend_flag0 = 'ALLOW_TO_FAILOVER'
backend_hostname1 = '192.168.0.3'
backend_port1 = 5432
backend_weight1 = 1
backend_data_directory1 = '/var/lib/postgresql/9.2/main/'
backend_flag1 = 'ALLOW_TO_FAILOVER'

The other pgpool2 setup options are described here:

http://pgpool.projects.pgfoundry.org/pgpool-II/doc/tutorial-en.html

and nicely explained options of the configuration file here:

http://pgpool.projects.pgfoundry.org/pgpool-II/doc/pgpool-en.html

We used replication and load balancing but there are also other options as paralel query etc. which are beyond the scope of this tutorial.

If you'll decide to use master-slave setup, then the hostname with ID 0 (e.g. backend_hostname0) is the master, others are slaves.

Step 6 - Running pgpool2

Run pgpool with simply writing

pgpool 

into the terminal. Shut it down with

pgpool stop

It waits for the clients to disconnect so when you need quick shutdown, you can use 

pgpool -m fast stop

(look to pgpool --help to see more options).

Sometimes you need to see what's going on so let's tell pgpool to log the activity:

pgpool -n -d > /tmp/pgpool.log 2>&1 &

With this command, you are telling pgpool to start in the debug mode and log the activity into /tmp/pgpool.log. Be careful when doing this because the log size increases very quickly. Some peoplerecommend using apache rotatelogs which is of course an option:

pgpool -n 2>&1 | /usr/local/apache2/bin/rotatelogs /tmp/pgpool.log 86400

If you have troubles with connecting, check the log file which will tell you what you're missing - most common issues are caused by wrong permissions or missing C functions in the backend databases.

Step 7 - What next?

You may want to install pgpooladmin which is the graphical to pgpool interface running under apache (or any other web server) where you can see the node status, connections etc.

Next helpful utility is monit which has a bunch of plugins for monitoring PostgreSQL. This tool is easy to install with:

apt-get install monit