Dec 22nd, 2009, 02:29 PM
metrics and error reporting
Does anyone have suggestions for best practices for gathering metrics throughout a Spring Integration (SI) flow?
I would like to track the following:
received message counts for each step in the flow
passed message counts for each step in the flow
failed message counts for each step in the flow
I need something a bit more sophisticated than a simple CountingChannelInterceptor. In some cases I need to inspect message headers, and in other cases I need to inspect the message itself, to get counts for specific categories. For example, I have a messageType property and I need to report counts per messageType value. I also have info in the message itself, which means running XPath (or something similar) against it to get the category for the metrics.
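To make the requirement concrete, here is a rough sketch of category-based counting in plain Java. It is deliberately framework-independent: `SimpleMessage`, `CategoryCounter`, `byHeader` and `byXPath` are hypothetical names invented for illustration, not Spring Integration API, and the XML payload shape is an assumption. In a real flow the `record` call would presumably sit inside a channel interceptor or similar hook.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Function;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

/** Hypothetical stand-in for a framework message: string headers plus an XML payload. */
final class SimpleMessage {
    final Map<String, String> headers;
    final String payload;

    SimpleMessage(Map<String, String> headers, String payload) {
        this.headers = headers;
        this.payload = payload;
    }
}

/** Counts messages per category; the category comes from a pluggable function. */
final class CategoryCounter {
    private final Function<SimpleMessage, String> categorizer;
    private final Map<String, LongAdder> counts = new ConcurrentHashMap<>();

    CategoryCounter(Function<SimpleMessage, String> categorizer) {
        this.categorizer = categorizer;
    }

    /** Increments the counter for the message's category ("unknown" if none is found). */
    void record(SimpleMessage message) {
        String category = categorizer.apply(message);
        String key = (category == null) ? "unknown" : category;
        counts.computeIfAbsent(key, k -> new LongAdder()).increment();
    }

    long count(String category) {
        LongAdder adder = counts.get(category);
        return (adder == null) ? 0L : adder.sum();
    }

    /** Category taken from a header value, e.g. the messageType property. */
    static Function<SimpleMessage, String> byHeader(String headerName) {
        return message -> message.headers.get(headerName);
    }

    /** Category taken from the payload by evaluating an XPath expression. */
    static Function<SimpleMessage, String> byXPath(String expression) {
        return message -> {
            try {
                XPath xpath = XPathFactory.newInstance().newXPath();
                String value = xpath.evaluate(
                        expression, new InputSource(new StringReader(message.payload)));
                return value.isEmpty() ? null : value;
            } catch (XPathExpressionException e) {
                // Unparseable payload or bad expression: fall back to "unknown".
                return null;
            }
        };
    }
}
```

One counter per step and per dimension (for example `new CategoryCounter(CategoryCounter.byHeader("messageType"))`) keeps the counting logic out of the service code, at the cost of parsing the payload again for every XPath-based counter.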
I would also like to provide some mechanism that gives visibility into failures.
I don't think I can simply send an email for "data failures" because I will get too many. I would like a generic interface so I can change the actual reporting implementation. At first I might simply log the failures. As I do more research, I might end up storing the failures in a database or something along those lines.
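Roughly what I have in mind for the generic interface, as a minimal sketch. All names here (`FailureReporter`, `LoggingFailureReporter`, `StoringFailureReporter`) are hypothetical, and the "storing" variant only keeps entries in memory where a database write would eventually go.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical reporting contract; the flow depends only on this interface. */
interface FailureReporter {
    void report(String step, String reason, Throwable cause);
}

/** First implementation: simply log the failure. */
final class LoggingFailureReporter implements FailureReporter {
    private static final Logger LOG = Logger.getLogger("flow.failures");

    @Override
    public void report(String step, String reason, Throwable cause) {
        LOG.log(Level.WARNING, "step=" + step + " reason=" + reason, cause);
    }
}

/** Later implementation: placeholder that stores failures in memory instead of a database. */
final class StoringFailureReporter implements FailureReporter {
    private final List<String> stored = new ArrayList<>();

    @Override
    public synchronized void report(String step, String reason, Throwable cause) {
        stored.add(step + ": " + reason);
    }

    /** Returns a copy so callers cannot modify the internal list. */
    synchronized List<String> stored() {
        return new ArrayList<>(stored);
    }
}
```

Because callers only see `FailureReporter`, swapping logging for storage later would be a wiring change rather than a code change in each step.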
Dec 26th, 2009, 04:48 AM
Instead of cluttering the config with interceptors, you can also analyze your logs (dare I say awk?). Now that MessageHistory is implemented, that gives quite nice results.
The advantage of this is that you will spend most of your time making your logs better, which is never a waste of time. If you clutter the config, it becomes harder to debug when things go wrong.
Of course there are all kinds of hooks in the framework that can help you do it in Java if that's what you want. Just don't forget to look a bit further if all you need is to gather some statistics.
The important thing is to add appropriate logging to your error channels; that's an investment that will pay off quite quickly.
Dec 29th, 2009, 02:41 PM
Our current solution is analyzing the logs with various scripts like you suggest.
However, there are a couple of annoying things with that:
We have multiple instances of SI running on multiple hosts, so combining the results from all the logs is tedious.
We don't have enough unit tests at the moment to ensure that an innocent change to a log message won't result in a broken log analysis script.
Our customer makes up metrics requirements as we go so we need as much power and flexibility as we can get. For example, the customer may ask, "how many messages from W got filtered between X date and Y date for Z reason?"
The other problem I have right now is that the error logging is pretty embedded in the service activators. I'm looking at making the error handling framework more generic so that all error messages go through an error channel which has error handler subscribers. One subscriber might do the logging. Another subscriber might store info in a database or something like that. We are still trying to work through the pros and cons of different approaches.
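The fan-out idea, sketched in plain Java so it can be read without the framework. `ErrorEvent` and `ErrorChannel` are hypothetical names and are not the Spring Integration classes; the real thing would presumably be a publish-subscribe channel with handler endpoints subscribed to it.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

/** Hypothetical error event; the fields are illustrative only. */
final class ErrorEvent {
    final String step;
    final String messageId;
    final Throwable cause;

    ErrorEvent(String step, String messageId, Throwable cause) {
        this.step = step;
        this.messageId = messageId;
        this.cause = cause;
    }
}

/** Minimal publish-subscribe error channel: every subscriber sees every event. */
final class ErrorChannel {
    private final List<Consumer<ErrorEvent>> subscribers = new CopyOnWriteArrayList<>();

    void subscribe(Consumer<ErrorEvent> handler) {
        subscribers.add(handler);
    }

    void send(ErrorEvent event) {
        for (Consumer<ErrorEvent> handler : subscribers) {
            try {
                handler.accept(event);
            } catch (RuntimeException e) {
                // A failing handler must not stop the remaining handlers.
                System.err.println("error handler failed: " + e);
            }
        }
    }
}
```

With this shape the service activators only need to get the error onto the channel; whether it is logged, stored, or both is decided by which subscribers are registered.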
Of course, given the types of requests our customer makes, we really need to store the outcome of every routing, filtering, splitting and aggregating decision point in the system, along with validation errors, exceptions, etc.
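As a rough illustration of what storing every decision could buy us, here is an in-memory sketch that answers a "how many messages from W got filtered between X and Y for Z reason?" style question. `Outcome`, `DecisionRecord` and `DecisionLog` are hypothetical names, the field set is an assumption, and a real version would sit on a database rather than a list.

```java
import java.time.Instant;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical outcomes of a decision point in the flow. */
enum Outcome { PASSED, FILTERED, FAILED }

/** One stored decision: who sent the message, which step decided, what happened and why. */
final class DecisionRecord {
    final String source;
    final String step;
    final Outcome outcome;
    final String reason;
    final Instant at;

    DecisionRecord(String source, String step, Outcome outcome, String reason, Instant at) {
        this.source = source;
        this.step = step;
        this.outcome = outcome;
        this.reason = reason;
        this.at = at;
    }
}

/** In-memory stand-in for a decision store. */
final class DecisionLog {
    private final List<DecisionRecord> records = new CopyOnWriteArrayList<>();

    void add(DecisionRecord record) {
        records.add(record);
    }

    /** Counts matching records with a timestamp in the half-open range [from, to). */
    long count(String source, Outcome outcome, String reason, Instant from, Instant to) {
        return records.stream()
                .filter(r -> r.source.equals(source))
                .filter(r -> r.outcome == outcome)
                .filter(r -> r.reason.equals(reason))
                .filter(r -> !r.at.isBefore(from) && r.at.isBefore(to))
                .count();
    }
}
```

The point of the sketch is that new customer questions become new queries over the same records instead of new log-parsing scripts.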
Dec 29th, 2009, 03:14 PM
You may want to consider Splunk... and no, I'm not affiliated; it's just something we've used in the past.
Dec 30th, 2009, 02:02 AM
Sounds like you might come up with some interesting ideas as to how you would like to hook into our message history mechanism.