Configuring Load Balancing and High Availability with HAProxy

HAProxy is an open source utility that is available on most Linux distributions and cloud platforms for load-balancing TCP and HTTP requests. Users can leverage this tool to distribute incoming client requests among the region server nodes on which Splice Machine instances are running.

The advantages of using HAProxy with Splice Machine clusters are:

  • Users need to point to only one JDBC host and port for a Splice Machine cluster, which may have hundreds of nodes.
  • The HAProxy service should ideally run on a separate node that directs traffic to the region server nodes; if one region server node goes down, users can still access their data through another region server node.
  • The load-balancing mechanism in HAProxy distributes the workload evenly among the set of nodes; you can select the balancing algorithm in your configuration, which can help increase throughput.

The remainder of this topic walks you through configuring HAProxy with Splice Machine, and then through using HAProxy with Splice Machine on a Kerberos-enabled cluster.

Configuring HAProxy with Splice Machine

The following example shows you how to configure an HAProxy load balancer on a non-Splice Machine node running Red Hat Enterprise Linux. Follow these steps:

  1. Install HAProxy as superuser:

    # yum install haproxy
    
  2. Configure the /etc/haproxy/haproxy.cfg file, following the comments in the sample file below:

    In this example, incoming requests to haproxy_host:1527 are distributed among the nodes srv127, srv128, srv129, and srv130 using the least-connections balancing algorithm: each incoming connection is routed to the region server with the fewest active connections. The client JDBC URL should therefore point to <haproxy_host>:1527.

    The HAProxy manual describes other balancing algorithms that you can use.
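    For instance, to rotate connections evenly across the servers rather than tracking connection counts, you could swap the balance directive in the listen section; the directives below are standard HAProxy options, shown here only as an illustrative sketch:

```
# Alternatives to "balance leastconn" in the listen section:
balance roundrobin    # rotate across servers in turn (respects server weights)
# balance source      # hash the client IP so a given client sticks to one server
```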

    Here is the haproxy.cfg file for this example:

    #---------------------------------------------------------------------
    # Global settings
    #---------------------------------------------------------------------
    global
        # to have these messages end up in /var/log/haproxy.log you will
        # need to:
        #
        # 1) configure syslog to accept network log events.  This is done
        #    by adding the '-r' option to the SYSLOGD_OPTIONS in
        #    /etc/sysconfig/syslog
        #
        # 2) configure local2 events to go to the /var/log/haproxy.log
        #   file. A line like the following can be added to
        #   /etc/sysconfig/syslog
        #
        #    local2.*                       /var/log/haproxy.log
        #
        maxconn 4000
        log 127.0.0.1 local2
        user haproxy
        group haproxy
    
    
    #---------------------------------------------------------------------
    # common defaults that all the 'listen' and 'backend' sections will
    # use if not designated in their block
    #---------------------------------------------------------------------
    defaults
        log global
        retries 2
        timeout connect 30000
        timeout server 50000
        timeout client 50000
    
    #----------------------------------------------------------------------
    # This enables jdbc/odbc applications to connect to HAProxy_host:1527 port
    # so that HAProxy can balance between the splice engine cluster nodes
    # where each node's splice engine instance is listening on port 1527
    #----------------------------------------------------------------------
    listen splice-cluster
        bind *:1527
        log global
        mode tcp
        option tcplog
        option tcp-check
        option log-health-checks
        timeout client 3600s
        timeout server 3600s
        balance leastconn
        server srv127 10.1.1.227:1527 check
        server srv128 10.1.1.228:1527 check
        server srv129 10.1.1.229:1527 check
        server srv130 10.1.1.230:1527 check
    
    #--------------------------------------------------------
    # (Optional) set up the stats admin page at port 1936
    #--------------------------------------------------------
    listen   stats :1936
        mode http
        stats enable
        stats hide-version
        stats show-node
        stats auth admin:password
        stats uri  /haproxy?stats
    

    Note that some of the parameters may need tuning, based on cluster sizing and the nature of the workload:

    • The maxconn parameter indicates how many concurrent connections are served at any given time; you may need to adjust it based on the size of the cluster and the expected volume of inbound requests.
    • Similarly, the timeout values, which are in milliseconds by default, should be tuned so that connections are not terminated while a long-running query is executing.
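    As a sketch of such tuning, a cluster expecting many concurrent clients and long-running analytical queries might raise these values; the numbers below are placeholders, not recommendations:

```
global
    maxconn 10000            # allow more concurrent connections cluster-wide

listen splice-cluster
    timeout client 7200s     # keep client connections open for long-running queries
    timeout server 7200s
```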
  3. Start the HAProxy service:

    As superuser, follow these steps to enable the HAProxy service:

    The commands depend on your distribution's init system.

    On RHEL/CentOS 6 (SysV init):

    # chkconfig haproxy on
    # service haproxy start

    If you change the configuration file, reload it with this command:

    # service haproxy reload

    On RHEL/CentOS 7 (systemd):

    # systemctl enable haproxy
    ln -s '/usr/lib/systemd/system/haproxy.service' '/etc/systemd/system/multi-user.target.wants/haproxy.service'
    # systemctl start haproxy

    If you change the configuration file, reload it with this command:

    # systemctl reload haproxy

    You can find the HAProxy process ID in /var/run/haproxy.pid. If you encounter any issues starting the service, check whether SELinux is enabled; you may want to disable it initially.
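    Before starting the service, or after editing the configuration, you can have HAProxy validate the file and then confirm that the listeners are up; the commands below use standard haproxy and ss options, with the ports matching this example's configuration:

```
# haproxy -c -f /etc/haproxy/haproxy.cfg    # validate configuration syntax
# ss -ltn | grep -E ':1527|:1936'           # confirm the listeners are up
```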

  4. Connect:

    You can now connect JDBC clients, including the Splice Machine command line interpreter, sqlshell.sh. Use the following JDBC URL:

    jdbc:splice://<haproxy_host>:1527/splicedb;user=splice;password=admin
    

    For ODBC clients to connect through HAProxy, ensure that the DSN entry in the .odbc.ini file points to the HAProxy host.
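    The exact DSN keys depend on the Splice Machine ODBC driver, so the key names below are hypothetical placeholders for illustration; the point is only that the host entry should name the HAProxy node, not an individual region server:

```
[SpliceODBC]
# Key names here are hypothetical; consult the Splice Machine ODBC driver docs
HOST=<haproxy_host>
PORT=1527
```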

  5. Verify that inbound requests are being routed correctly:

    You can check the logs at /var/log/haproxy.log to make sure that inbound requests on port 1527 are being routed to the Splice Machine region servers.
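    For example, you can watch connections being dispatched to the back-end servers as they arrive; the log path and backend name below match this example's configuration:

```
# tail -f /var/log/haproxy.log | grep splice-cluster
```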

  6. View traffic statistics:

    If you have enabled HAProxy stats, as in our example, you can view the overall traffic statistics in a browser at:

    http://<haproxy_host>:1936/haproxy?stats
    

    You’ll see a report that looks similar to this:

    [Screenshot: HAProxy statistics page]

Using HAProxy with Splice Machine on a Kerberos-Enabled Cluster

Your JDBC and ODBC applications can authenticate to the back-end region servers through HAProxy on a Splice Machine cluster that has Kerberos enabled.

You can enable Kerberos mode on a CDH 5.8.x or later cluster using the configuration wizard described here:

https://www.cloudera.com/documentation/enterprise/5-8-x/topics/cm_sg_intro_kerb.html.

Kerberos and Application Access

As a Kerberos pre-requisite for Splice Machine JDBC and ODBC access:

  • Database users must be added to the Kerberos realm as principals.
  • Keytab entries must be generated and deployed to the remote clients from which the applications will connect.

HAProxy then transparently forwards the connections to the back-end cluster in a Kerberos setup.

Kerberos and ODBC Access

To connect to a Kerberos-enabled cluster with ODBC, follow these steps:

  1. Verify that the odbc.ini configuration file for the DSN you’re connecting to includes this setting:

     USE_KERBEROS=1

    See our Using the Splice Machine ODBC Driver for more information.

  2. A default security principal user must be established with a TGT in the ticket cache prior to invoking the driver. You can use the following command to establish the principal user:

     kinit principal

    Where principal is the name of the user who will be accessing Splice Machine. Enter the password for this user when prompted.

  3. Launch the application that will connect using ODBC. The ODBC driver will use that default Kerberos principal when authenticating with Splice Machine.
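The ticket-cache setup in steps 2 and 3 can be sketched as follows; the realm name is a placeholder for your own:

```
$ kinit splice@EXAMPLE.COM    # obtain a TGT for the principal (placeholder realm)
$ klist                       # verify the TGT is now in the ticket cache
```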

Example

This example assumes that you are using the default user name splice. Follow these steps to connect through HAProxy:

  1. Create the principal in the Kerberos Key Distribution Center

    Create the principal splice@kerberos_realm_name in the Kerberos Key Distribution Center (KDC) and generate a keytab file named splice.keytab.
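    With MIT Kerberos, one way to create the principal and export its keytab is shown below; the realm name and keytab path are placeholders for your own:

```
# kadmin.local addprinc splice@EXAMPLE.COM
# kadmin.local ktadd -k /etc/security/keytabs/splice.keytab splice@EXAMPLE.COM
```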

  2. Copy the generated keytab file

    Copy the splice.keytab file to all client systems.

  3. Connect:

    You can now connect to the Kerberos-enabled Splice Machine cluster with JDBC through HAProxy, using the following URL:

    jdbc:splice://<haproxy_host>:1527/splicedb;principal=splice@<realm_name>;keytab=/<path>/splice.keytab
    

Use the same steps to allow other Splice Machine users to connect: add them to the Kerberos realm and copy the keytab files to their client systems. This example sets up access for a new user named jdoe.

  1. Create the user in your Splice Machine database:

    call syscs_util.syscs_create_user( 'jdoe', 'jdoe' );
    
  2. Grant privileges to the user

    For this example, we are granting all privileges on a table named myTable to the new user:

    grant all privileges on splice.myTable to jdoe;
    
  3. Use KDC to create a new principal and generate a keytab file. For example:

    # kadmin.local addprinc -randkey jdoe@SPLICEMACHINE.COLO
    
  4. Set the password for the new principal:

    # kadmin.local cpw jdoe
    Enter password for principal "jdoe@SPLICEMACHINE.COLO":
    
  5. Create keytab file jdoe.keytab

    kadmin: xst -k jdoe.keytab jdoe@SPLICEMACHINE.COLO
    
  6. Copy the generated keytab file to the client system

  7. Connect through HAProxy with the following URL:

    jdbc:splice://<haproxy_host>:1527/splicedb;principal=jdoe@SPLICEMACHINE.COLO;keytab=/home/splice/jdoe.keytab