The nodes must all be able to communicate with a single PostgreSQL database and a single RabbitMQ server. One common configuration is to install and deploy the two services on a single "master" node, but other configurations are possible. It is not necessary, and not normally recommended, to install PostgreSQL or RabbitMQ on a worker node.
On all nodes, install the python-oq-engine package as described in OpenQuake Engine installation.
The default Postgres configuration does not permit access from other machines. The file /var/lib/pgsql/data/pg_hba.conf should be modified to allow access to the "openquake2" database from the worker nodes. An example excerpt follows:
host openquake2 oq_admin 192.168.10.0/8 md5
host openquake2 oq_job_init 192.168.10.0/8 md5
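Before restarting PostgreSQL it can help to confirm that an address/mask pair really covers the worker addresses, and nothing more. The sketch below uses Python's standard `ipaddress` module. The worker addresses are hypothetical, and `strict=False` is needed because `192.168.10.0/8` has host bits set. With a `/8` mask only the first octet is compared, so the example rule matches any address starting with `192.`; a `/24` mask would restrict the match to `192.168.10.x`.

```python
import ipaddress


def covered_by(rule_cidr, addresses):
    """Return {address: bool} telling whether each address falls inside rule_cidr."""
    # strict=False tolerates host bits in the network part, e.g. 192.168.10.0/8
    network = ipaddress.ip_network(rule_cidr, strict=False)
    return {addr: ipaddress.ip_address(addr) in network for addr in addresses}


# Hypothetical worker addresses, for illustration only
workers = ["192.168.10.11", "192.168.10.12", "10.0.0.5"]
print(covered_by("192.168.10.0/8", workers))
print(covered_by("192.168.10.0/24", workers))
```

If an address you expect to be excluded shows up as covered, the mask is probably wider than intended.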
The Postgres manual describes a number of runtime configuration parameters that may need to be adjusted depending on your cluster configuration:
- See http://www.postgresql.org/docs/current/static/runtime-config-connection.html#GUC-LISTEN-ADDRESSES
By default Postgres allows connections only from localhost. Since celery workers need to push data back to Postgres, it should be exposed to the cluster network:
/var/lib/pgsql/data/postgresql.conf
# Listen on all interfaces so that the worker nodes can connect
listen_addresses = '*'
- See http://www.postgresql.org/docs/current/static/runtime-config-connection.html#GUC-MAX-CONNECTIONS
By default Postgres allows a maximum of 100 simultaneous connections. By default celery will create a worker process for each available core, and the OpenQuake Engine uses two connections per worker, so max_connections should be at least twice the number of available worker cores (2 * the number of CPU cores in the cluster).
Note that changing max_connections may also imply operating-system level changes; please see http://www.postgresql.org/docs/current/static/kernel-resources.html for details.
/var/lib/pgsql/data/postgresql.conf
# This value should be at least twice the number of worker cores
max_connections = 100
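The sizing rule above is simple arithmetic, sketched here in Python. The function name and the core counts are hypothetical; the factor of two is the connections-per-worker figure stated in the text. The result is a lower bound only, since it does not count connections opened by the master node or by other clients.

```python
def min_max_connections(cores_per_worker_node, connections_per_worker=2):
    """Lower bound for max_connections: connections per worker times total cores."""
    total_cores = sum(cores_per_worker_node)
    return connections_per_worker * total_cores


# Hypothetical cluster: three worker nodes with 16, 16 and 32 cores
print(min_max_connections([16, 16, 32]))  # 128
```

In this example the bound (128) already exceeds the default of 100, so max_connections would need to be raised.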
Note: you have to restart every celery node after a PostgreSQL configuration change or a restart.
On the master node, the following file should be modified to enable Celery support:
/etc/openquake/openquake.cfg:
[celery]
# enable celery only if you have a cluster
use_celery = true
Please NOTE: the /etc/openquake/openquake.cfg file can be overridden by an openquake.cfg file in the current working directory.
On all worker nodes, the following file should be modified to refer to the Postgres and RabbitMQ servers:
/etc/openquake/openquake.cfg (see /etc/openquake/openquake_workers.cfg as reference):
[amqp]
# Replace localhost with hostname for RabbitMQ
host = localhost
port = 5672
# See the RabbitMQ "Access Control" page for details of the vhost parameter
# http://www.rabbitmq.com/access-control.html
vhost = /
[database]
name = openquake2
# replace localhost with the hostname for the Postgres DB
host = localhost
port = 5432
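A common mistake on worker nodes is leaving one of the host entries set to localhost. The following sketch, using Python's standard `configparser`, reports which of the two sections still point at the local machine. The function name and the hostname `master.example.org` are placeholders, and this is not how the OpenQuake Engine itself reads its configuration.

```python
import configparser


def hosts_still_local(cfg_text):
    """Return the sections whose 'host' option still points at the local machine."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    local_names = {"localhost", "127.0.0.1", "::1"}
    return [
        section
        for section in ("amqp", "database")
        # A missing section or option is treated as "localhost"
        if parser.get(section, "host", fallback="localhost") in local_names
    ]


# Hypothetical worker configuration; "master.example.org" is a placeholder hostname
sample = """
[amqp]
host = master.example.org
port = 5672
vhost = /

[database]
host = localhost
port = 5432
"""
print(hosts_still_local(sample))  # ['database']
```

To check a real file, pass its contents, for example `hosts_still_local(open("/etc/openquake/openquake.cfg").read())`.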
Jobs can be submitted through the master node using the oq-engine command line interface.
celeryd must run on all of the worker nodes. It can be started using the following command:
cd /usr/share/openquake/engine && celery worker --purge -Ofair &
but we strongly suggest using supervisord to manage the celery daemons.
supervisord can be installed with:
sudo yum install supervisor
An example configuration for the OpenQuake Engine is available in supervisor.md.
The worker nodes should be isolated from the external network using either a dedicated internal network or a firewall. Additionally, access to the RabbitMQ and PostgreSQL ports should be limited (again by internal LAN or firewall) so that external traffic is excluded.
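One way to check the isolation is to try a TCP connection to the two service ports from a machine outside the cluster network. The sketch below uses only Python's standard `socket` module. The function names are hypothetical, and the default ports are the 5672 and 5432 values shown in the configuration above. A successful connection only shows that the port is open; it says nothing about authentication.

```python
import socket


def port_is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures
        return False


def exposed_services(host, ports=(5672, 5432)):
    """List which of the given ports accept connections on host."""
    return [port for port in ports if port_is_reachable(host, port)]
```

Run from an external machine, `exposed_services("master.example.org")` (a placeholder hostname) should return an empty list if the firewall is doing its job.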
It is not recommended to run the Celery daemon as root.
If using supervisord (or similar) to execute celeryd at boot time, please ensure that celery is not run as the root user.
Storage requirements depend heavily on the type of calculations run. On a worker node you need only the space for the operating system, the logs and the OpenQuake installation: less than 20 GB is usually enough. Workers can also be diskless (using iSCSI or NFS, for example).
On the master node you will also need space for:
- the users' home directory (usually located under /home): it contains the calculations datastore (hdf5 files located in the oqdata folder)
- the PostgreSQL openquake2 database (on RHEL it is usually located under /var/lib/pgsql)
- the RabbitMQ mnesia dir (on RHEL usually located under /var/lib/rabbitmq)
On large installations we strongly suggest creating separate partitions for /home, /var, PostgreSQL (/var/lib/pgsql) and RabbitMQ (/var/lib/rabbitmq).
Those partitions should be stored on fast local disks or on a high performance SAN (e.g. using an FC or a 10 Gbps link).
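To see whether the directories named above actually sit on separate filesystems, their device IDs can be compared. This is a minimal sketch using `os.stat`; the function name is hypothetical, and paths that do not exist on the machine are silently skipped.

```python
import os
from collections import defaultdict


def group_by_device(paths):
    """Group existing paths by the device ID of the filesystem they live on."""
    groups = defaultdict(list)
    for path in paths:
        if os.path.exists(path):
            groups[os.stat(path).st_dev].append(path)
    return dict(groups)


# Directories named in the text above; paths missing on this machine are skipped
for device, paths in group_by_device(
    ["/home", "/var", "/var/lib/pgsql", "/var/lib/rabbitmq"]
).items():
    print(device, paths)
```

Directories that print under the same device ID share a filesystem; on a master node partitioned as suggested, each would appear under its own ID.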