Re: Oracle home headscratcher

  • From: Tim Gorman <tim.evdbt@xxxxxxxxx>
  • To: cjnewman@xxxxxxxxxxxxx, Iggy Fernandez <iggy_fernandez@xxxxxxxxxxx>, William Beldman <wbeldma@xxxxxx>, "oracle-l@xxxxxxxxxxxxx" <oracle-l@xxxxxxxxxxxxx>
  • Date: Wed, 19 Feb 2020 14:26:27 -0800

For future reference, either the "fuser" or the "lsof" utility would have listed the open files under a directory or filesystem, along with the processes that opened them.

Solaris typically includes "fuser" on install, but "lsof" can be downloaded from SunFreeware <http://www.sunfreeware.com/programlistsparc10.html#lsof>.
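A minimal sketch of how either utility might be pointed at the busy /u01 mount point. The function name who_has_open is hypothetical, and the flags shown (fuser -c and -u, lsof +D) are the commonly documented ones; check the local man pages before relying on them.

```shell
# Report which processes hold files open under a mount point or directory.
# Prefers fuser (shipped with Solaris) and falls back to lsof if it is installed.
who_has_open() {
    target=$1
    if command -v fuser >/dev/null 2>&1; then
        # -c: treat the target as a mount point and cover every file on it
        # -u: append the login name to each reported PID
        fuser -cu "$target"
    elif command -v lsof >/dev/null 2>&1; then
        # +D: descend the directory tree and list each open file with its PID
        lsof +D "$target"
    else
        echo "neither fuser nor lsof found in PATH" >&2
        return 127
    fi
}

# Example: who_has_open /u01
```

Running it as root is usually necessary to see processes owned by other users.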



On 2/19/2020 11:09 AM, Newman, Christopher wrote:


Thanks Chris, Rajeev, J, William and Iggy. It turns out that the sticky bit had somehow been flipped on one of the files.  We copied a fresh home over and are in good shape.  How it got flipped is something we’ll continue to pursue.

Thanks again! - Chris
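One way to look for this kind of change, sketched under assumptions: list every regular file in a home that carries a setuid, setgid, or sticky bit, then diff the result against a known-good home. The function name list_special_bits and the paths in the usage comments are hypothetical, not taken from the thread.

```shell
# List regular files under a directory that carry setuid, setgid, or sticky bits.
list_special_bits() {
    dir=$1
    # -perm -4000 / -2000 / -1000 match setuid, setgid, and sticky respectively
    find "$dir" -type f \( -perm -4000 -o -perm -2000 -o -perm -1000 \) -exec ls -ld {} +
}

# Example (placeholder paths): capture both homes, then compare by eye or with diff.
# list_special_bits /path/to/good_home > /tmp/good_bits.txt
# list_special_bits /path/to/problem_home > /tmp/problem_bits.txt
```

Because ls -ld prints full paths, the two listings differ in their path prefixes, so a direct diff needs the prefix stripped first.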

*From:*Iggy Fernandez <iggy_fernandez@xxxxxxxxxxx>
*Sent:* Wednesday, February 19, 2020 12:21 PM
*To:* William Beldman <wbeldma@xxxxxx>; oracle-l@xxxxxxxxxxxxx; Newman, Christopher <cjnewman@xxxxxxxxxxxxx>
*Subject:* Re: Oracle home headscratcher

truss would have diagnosed the issue. sqlplus is a frontend, so you would either have to run truss directly against the child oracle process or use "truss -f sqlplus ..." to trace child processes. The -c option produces a summary.

*-c*

Counts traced system calls, faults, and signals rather than displaying the trace line-by-line. A summary report is produced after the traced command terminates or when truss is interrupted. If -f is also specified, the counts include all traced system calls, faults, and signals for child processes.
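The two truss invocations described above could be wrapped roughly as follows. This is a sketch only: the function names are hypothetical, the -o option (write the trace to a file) is not mentioned in the thread, and the sqlplus arguments in the usage comment are placeholders.

```shell
# Run a command under truss and follow the processes it forks.
# Usage: trace_with_children OUTFILE COMMAND [ARGS...]
trace_with_children() {
    out=$1
    shift
    # -f: follow child processes; -o: write the trace to a file instead of stderr
    truss -f -o "$out" "$@"
}

# Same idea, but print per-call counts instead of a line-by-line trace.
summarize_with_children() {
    # -c: count system calls, faults, and signals; with -f the counts include children
    truss -f -c "$@"
}

# Example (placeholder file name and arguments):
# trace_with_children /tmp/sqlplus.truss sqlplus / as sysdba
```

The line-by-line trace is the one more likely to show which file or call the process is stuck on; the -c summary is better for seeing where the call volume goes.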

*The Northern California Oracle Users Group is a volunteer-run 501(c)(3) organization that has been serving the Oracle Database community of Northern California for more than thirty years by organizing four conferences a year and publishing a quarterly journal. Download the complete digital archive of the NoCOUG Journal using the Linux command: “wget www.nocoug.org/Journal/NoCOUG_Journal_{2001..2019}{02..12..3}.pdf”.*

------------------------------------------------------------------------

*From:*oracle-l-bounce@xxxxxxxxxxxxx on behalf of Newman, Christopher <cjnewman@xxxxxxxxxxxxx>
*Sent:* Tuesday, February 18, 2020 6:40 PM
*To:* William Beldman <wbeldma@xxxxxx>; oracle-l@xxxxxxxxxxxxx
*Subject:* RE: Oracle home headscratcher

Yes, that didn’t turn up much.  Unfortunately we’ve rebooted the server (thankfully DEV) and the problem has gone away.

What we did notice is that the shutdown scripts, which include sqlplus calls to shut down each database, worked fine.  That script was run by root, of course, so now we’re thinking it’s something to do with the oracle user and either a permission or a resource issue.

*From:*William Beldman <wbeldma@xxxxxx>
*Sent:* Tuesday, February 18, 2020 8:17 PM
*To:* Newman, Christopher <cjnewman@xxxxxxxxxxxxx>; oracle-l@xxxxxxxxxxxxx
*Subject:* RE: Oracle home headscratcher

Can you run truss against sqlplus/tnsping/etc. to figure out what it’s doing over the course of those 10 minutes?

*From:*oracle-l-bounce@xxxxxxxxxxxxx *On Behalf Of *Newman, Christopher
*Sent:* February 18, 2020 6:38 PM
*To:* oracle-l@xxxxxxxxxxxxx
*Subject:* Oracle home headscratcher

Hi All,

We’ve got multiple Oracle homes on a Solaris 11.4 server (T8 SPARC).  We are having issues with a single home (12.2.0.1), while others are fine (19.5, a different 12.2.0.1 home).  We haven’t seen this problem on any other hosts, and no known modifications to the environment happened prior to the behavior we’re seeing.

Sqlplus appears to hang, but does eventually connect (by eventually, I’m talking 10+ minutes, and a local connection).

This behavior extends to tnsping (it times out; we traced it but didn’t learn much), but running opatch, for example, is not affected.

Standby databases on the system fall behind.

External connections to databases are not impacted; only running the binaries locally from the problematic home exhibits the symptoms.

Our only clue on the host is very high utilization of our /u01 mount point, but so far our Unix crew hasn’t been able to isolate which process is driving the IO.

Yesterday, on a whim we switched the problematic Oracle home permissions to 755 (from 700), and things “magically” worked and IO plummeted instantly.

Today, we switched back to 700 to see if we could break things again; we did.  However, in this second case, chmod’ing the problematic home back to 755 had zero effect and the hanging behavior persists.

Any thoughts on what to look at next?  Again, the problem is isolated to just this single home.

Thanks- Chris

