Hi,

The 11g engine is more complex than ever, which leads to "better bugs"
that are more difficult to solve, an enlarged attack surface for hacking,
more frequent unexpected side effects of so-called fixes, and more
difficulty in finding ways to turn off unwanted features.

As Dilbert said: "if it ain't broke it doesn't have enough features yet".
Does 11g have enough features now?

Regards,
Andre

2007/10/3, Jeremiah Wilton <jeremiah@xxxxxxxxxxx>:
>
> Am I the only one who has been unable to do much with this feature due
> to the woefully absent documentation? Three components of "fault
> diagnosability" in particular seem very interesting:
>
> - automatic hang detection
> - automatic reactive "health checks"
> - incident packages as a replacement for RDA
>
> Hang detection seems like a great idea, but there is no information on
> precisely what constitutes a "hang" according to DIAG and DIA0. These
> processes seem never to wake up, even in the most dire of hanging
> situations. I did find that by default in single-instance databases,
> the _hang_resolution, _hm_analysis_output_disk and _hm_log_incidents
> parameters are set to FALSE, which I take to mean the feature is turned
> off. Even turned on, long hangs involving chains of waiters visible in
> hanganalyze output do not trigger any actions that I can discern. This
> is slightly complicated by the fact that two components of "fault
> diagnosability" share the initials HM, and packages, parameters and
> views use HM interchangeably to mean "hang manager" and "health monitor".
>
> As for Health Checks, there is no documentation indicating what kinds of
> events or incidents might result in a "reactive" health check. The
> existence of reactive health checks is repeatedly asserted in the
> documentation, and there is even a parameter called _diag_hm_rc_enabled
> with the description "Parameter to enable/disable Diag HM Reactive
> Checks".
> Set to FALSE by default, this parameter does nothing in the
> event of a badly degraded and hanging system either. We are left to
> wonder what "reactive" health checks react to!
>
> Finally, the incident packaging service works well enough, but is
> predicated completely upon the notion that any and all problems will be
> associated with a fatal error of some kind. Anything that does not dump
> ORA-600 or another fatal error will not result in an "incident" and thus
> there is nothing to package. There is apparently no provision for
> problems that do not dump on an error. So, an on-demand incident package
> apparently cannot be created. Thus, despite the incident payloads
> having many of the same contents as the horrid RDA of yore, you cannot
> generate one on demand in a supported way. You can shoot a server
> process with a SIGSEGV, but I cannot imagine that is how Oracle intends
> us to get diagnostic data for opening an SR.
>
> You can probably detect that I am frustrated but I have been playing
> with this feature set for weeks and it is a frustrating morass of
> nonworking undocumented wastes of server memory. Remember, we are all
> now running two extra background processes, DIAG and DIA0, just for this
> feature. They are up and running and using memory on all of our 11g
> systems even if they do nothing and are turned off at the parameter
> level by default.
>
> I am ranting here in hopes that someone else has gotten further than I
> have or knows someone on the inside who can shed some light on these
> concerns.
>
> Thanks,
>
> Jeremiah Wilton
> ORA-600 Consulting
> --
> //www.freelists.org/webpage/oracle-l