Mar 17th, 2010, 09:57 AM
Problem with monitoring - frequent alerts
I'm running a single-instance CF configuration with one web application. I turned on monitoring notifications (Hyperic) for web app unavailability.
Now I randomly receive alert emails (subject: "An alert has been triggered - Deployment myapp - context unavailable") saying that the application is not running, even though it is obviously running fine.
In the Apache access log I see two requests every 15 seconds:
127.0.0.1 - - [17/Mar/2010:15:37:33 +0100] "GET /server-status?auto HTTP/1.1" 200 438 "-" "Jakarta Commons-HttpClient/3.1"
127.0.0.1 - - [17/Mar/2010:15:37:33 +0100] "GET /myapp HTTP/1.1" 200 - "-" "Jakarta Commons-HttpClient/3.1"
At the time I get the alert emails, everything in the log still looks fine: both requests are there.
Do you have any idea what could be wrong? Has anybody else had this kind of problem?
Mar 17th, 2010, 03:12 PM
A few questions.
1. Do you ever see a statusCode != 200 for GET /myapp in the Apache log?
2. The management agent also does an HTTP GET /myapp on Tomcat/tc Server port 8080. It is possible that this GET is the one failing (although the Apache one should then fail similarly). Are there any errors in the Tomcat log?
3. When you log into the CF.com console and go to the deployment details page - what do you see? Are there error icons for the app server tier, web server tier or both (or neither)?
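For question 1, a quick way to check the Apache log is a small script like the one below. This is only a sketch: it assumes the access log uses the same layout as the lines quoted above, and the names LINE_RE and failed_checks are made up for illustration.

```python
import re

# Matches lines laid out like the access log entries quoted in this thread:
# host ident user [time] "METHOD path protocol" status ...
LINE_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) '
)


def failed_checks(lines, path="/myapp"):
    """Return (timestamp, status) for every GET of `path` that was not a 200."""
    failures = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip lines in a different format
        if m.group("method") != "GET" or m.group("path") != path:
            continue
        status = int(m.group("status"))
        if status != 200:
            failures.append((m.group("time"), status))
    return failures
```

To use it, pass an open file, e.g. failed_checks(open(path_to_access_log)). An empty result means every logged GET /myapp returned 200.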
Mar 17th, 2010, 05:15 PM
3) No errors, everything fine (Apache, TC, DB) - all green icons.
A couple of times I was even working with the app when the alert fired, and I had no problems at all.
Every 15 seconds there is a check request in the Apache log, which means 4 cycles per minute. I checked all requests around the time of the alert (plus or minus some delta) and all of them returned 200, yet I still received the alert email.
Could you please tell me how exactly the monitoring engine decides that the app is not running? Thanks.
Mar 18th, 2010, 03:45 PM
Any help, please?
Maybe the monitoring is doing something wrong with my configuration?
Mar 18th, 2010, 04:44 PM
We're still looking into this. Hopefully, we'll have more information in a couple of days.
Mar 18th, 2010, 05:01 PM
If you need any additional information from me, let me know.
Mar 19th, 2010, 10:52 AM
To answer your previous question: a context is considered unavailable if GET /context fails to connect or returns a status other than 200 (after following redirects). This check is done on both Apache port 80 and Tomcat port 8080.
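Roughly, the rule described above could be sketched as follows. This is not the actual agent code, just an illustration in Python; the function names, the timeout value and the opener parameter are all assumptions made for the example.

```python
import http.client
import urllib.error
import urllib.request


def context_available(url, timeout=5.0, opener=urllib.request.urlopen):
    """True only if GET `url` ends in HTTP 200.

    urllib follows redirects by default, so the status checked here is the
    final one. Connection failures, timeouts and HTTP error statuses all
    count as "unavailable". The timeout is an assumed value.
    """
    try:
        with opener(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        return False


def deployment_available(host, context, ports=(80, 8080),
                         opener=urllib.request.urlopen):
    """The context counts as available only if the check passes on every port."""
    return all(
        context_available(f"http://{host}:{port}{context}", opener=opener)
        for port in ports
    )
```

For example, deployment_available("127.0.0.1", "/myapp") would return False if either the port 80 or the port 8080 request fails, even when the other one returns 200.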
After looking through the logs, the most likely explanation is that GET /context to Tomcat port 8080 is sometimes failing. I'm looking into changing the monitoring code to generate an alert only after multiple failures.
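The "alert only after multiple failures" idea could look something like the sketch below. It is an illustration only, not the change that will actually ship; the class name and the threshold of 3 are assumptions.

```python
class FailureDebouncer:
    """Suppress alerts until `threshold` checks in a row have failed."""

    def __init__(self, threshold=3):
        if threshold < 1:
            raise ValueError("threshold must be at least 1")
        self.threshold = threshold
        self.consecutive_failures = 0
        self.alerting = False  # True once an alert was sent for this outage

    def record(self, available):
        """Feed one check result; return True if an alert should be sent now."""
        if available:
            # Any successful check ends the outage and resets the counter.
            self.consecutive_failures = 0
            self.alerting = False
            return False
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.threshold and not self.alerting:
            self.alerting = True
            return True
        return False
```

With checks every 15 seconds and a threshold of 3, a single failed request would no longer trigger an email, while a real outage would still be reported after about 45 seconds, and only once per outage.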
Sorry for the inconvenience.
Mar 19th, 2010, 11:19 AM
Regarding my question: so this means that if I turn on notifications for application unavailability (during deployment creation), I will get an email alert whenever anything goes wrong with EITHER the tc call on :8080 OR the Apache call on :80, right?
I checked the tc logs again (/var/log/tcserver-catalina.out) and also my application's log4j output, and there is no error at the time of the alert emails. If you have any idea how I can help with this issue, please let me know.
Mar 22nd, 2010, 12:28 AM
I've been getting similar false alerts from our two clusters.
The logs show no errors, and the clusters do not appear to be under load when the alerts fire.
Is there some kind of timeout parameter that needs to be increased?
Mar 24th, 2010, 11:30 PM
CF.com has been updated with some fixes that should eliminate spurious context unavailable alerts. Please let us know if you continue to have problems.