Thanks Gary. I tried that and got very erratic results; certainly nothing that would give me any confidence about using it in a production system. The heartbeat errors show up once the configured heartbeat interval has elapsed, and the consumer then reconnects to a different server via the LB. After that, though, the consumer receives messages at a much slower rate than they are being produced, and only receives every second message. This is with a producer that has a stable, unbroken connection to the cluster and sends one message every 250 ms in my test.
Only after the failed server machine is restored to the cluster does the consumer catch up, both on the messages that appeared to go missing and on the production rate. Very odd, but it is repeatable every time I run the tests, even with a completely rebuilt cluster.
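For anyone trying to reproduce this, one way to put numbers on the "every second message" behaviour is to compare what the consumer saw against what the producer sent. The sketch below is only an illustration and assumes each test message carries an incrementing integer sequence number, which is not something the test described above necessarily does. The function names `find_missing` and `delivery_ratio` are made up for this example and it uses no broker client library at all.

```python
from typing import Iterable, List


def find_missing(received: Iterable[int], first: int, last: int) -> List[int]:
    """Return the sequence numbers in [first, last] that were never received."""
    seen = set(received)
    return [n for n in range(first, last + 1) if n not in seen]


def delivery_ratio(received: Iterable[int], first: int, last: int) -> float:
    """Fraction of the messages in [first, last] that the consumer received."""
    expected = last - first + 1
    if expected <= 0:
        raise ValueError("last must be greater than or equal to first")
    # Duplicates and out-of-range numbers are ignored.
    seen = {n for n in received if first <= n <= last}
    return len(seen) / expected


if __name__ == "__main__":
    # Hypothetical data: the producer sent 0..7, the consumer saw every second one.
    got = [0, 2, 4, 6]
    print("missing:", find_missing(got, 0, 7))
    print("ratio:", delivery_ratio(got, 0, 7))
```

Running the same check again once the failed server is back in the cluster would show whether the missing numbers eventually arrive, as opposed to being lost outright.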
Last edited by davison; Oct 27th, 2012 at 02:54 PM.
Public Key: 0xE855B3EA