[Spacewalk-list] Spacewalk 1.5 Monitoring issue

David Nutter davidn at bioss.ac.uk
Thu Aug 4 11:41:13 UTC 2011


On Tue, Aug 02, 2011 at 07:34:46PM +0100, David Nutter wrote:
> On Tue, Jul 26, 2011 at 11:27:19AM +0530, trm asn wrote:
> 
> *snip*
>  
> > 'ProbeRecord', 'get_hostAddress') ; that error I have resolved by editing
> > "NPRecords.pm" & changed the parameter "hostAddress" to "HOSTADDRESS" .
> > 
> > now my "scout config push" are getting success, but unfortunately still the
> > monitoring status is "PENDING, Awaiting Update" .
> > 
> > 
> > -bash-3.2$ rhn-runprobe --probe=261 --log=all=4
> > Can't locate object method "hostname" via package
> > "NOCpulse::Probe::Config::ProbeRecord" at
> > blib/lib/Class/MethodMaker/Engine.pm (autosplit into
> > blib/lib/auto/Class/MethodMaker/Engine/new.al) line 941.
> > 
> > Any idea...
> 
> Hi, 
> 

*snip*

After a bit more digging I've narrowed down the problem area. 

The method NOCpulse::Probe::Shell::Unix::connect seems to be failing
when running under the scheduler but not when running under
rhn-runprobe.

I jacked up logging for NOCpulse::Probe::Shell::Unix and this is what
I see in kernel-err.log when monitoring is running (sorry for the
mess, there seem to be two different logging frameworks in use which
causes weird things to happen like multiple timestamps):

2011-08-04 12:31:44  2011-08-04 12:31:43 NOCpulse::Probe::Shell::Unix::connect Execute '/usr/bin/ssh -l nocpulse -p 4545 -i /var/lib/nocpulse/.ssh/nocpulse-
identity -o StrictHostKeyChecking=no -o BatchMode=yes 212.219.57.230 /bin/sh -s'
2011-08-04 12:31:44  2011-08-04 12:31:43 NOCpulse::Probe::Shell::Unix::connect Opened pipe to 21109
2011-08-04 12:31:44  2011-08-04 12:31:43 (eval) Sending >>>echo `uname -s`#`uname -r`#$$
2011-08-04 12:31:44  echo NOCPULSE-1312457503-STATUS $?
2011-08-04 12:31:44  <<<
2011-08-04 12:31:44  2011-08-04 12:31:43 (eval) Done
2011-08-04 12:31:44  2011-08-04 12:31:43 NOCpulse::Probe::Shell::Unix::read_result stdout: >>><<<
2011-08-04 12:31:44  2011-08-04 12:31:43 NOCpulse::Probe::Shell::Unix::read_result stderr: >>><<<
2011-08-04 12:31:44  2011-08-04 12:31:43 NOCpulse::Probe::Shell::Unix::read_result status: >>><<<
2011-08-04 12:31:44  2011-08-04 12:31:44 NOCpulse::Probe::Shell::Unix::connect Opened pipe to 21111
2011-08-04 12:31:44  2011-08-04 12:31:44 (eval) Sending >>>echo `uname -s`#`uname -r`#$$
2011-08-04 12:31:44  echo NOCPULSE-1312457504-STATUS $?
2011-08-04 12:31:44  <<<
2011-08-04 12:31:44  2011-08-04 12:31:44 (eval) Done
2011-08-04 12:31:44  2011-08-04 12:31:44 NOCpulse::Probe::Shell::Unix::read_result stdout: >>><<<
2011-08-04 12:31:44  2011-08-04 12:31:44 NOCpulse::Probe::Shell::Unix::read_result stderr: >>><<<
2011-08-04 12:31:44  2011-08-04 12:31:44 NOCpulse::Probe::Shell::Unix::read_result status: >>><<<
2011-08-04 12:31:44  Linux#2.6.18-238.12.1.el5xen#2834
2011-08-04 12:31:44  NOCPULSE-1312457503-STATUS 0


However, when run under rhn-runprobe --log=all=4 I get the expected output of:

2011-08-04 12:36:40 NOCpulse::Probe::Shell::Unix::connect Execute '/usr/bin/ssh -l nocpulse -p 4545 -i /var/lib/nocpulse/.ssh/nocpulse-identity -o StrictHostKeyChecking=no -o BatchMode=yes 212.219.57.233 /bin/sh -s'
2011-08-04 12:36:41 NOCpulse::Probe::Shell::Unix::connect Opened pipe to 22504
2011-08-04 12:36:41 (eval) Sending >>>echo `uname -s`#`uname -r`#$$
echo NOCPULSE-1312457801-STATUS $?
<<<
2011-08-04 12:36:41 (eval) Done
2011-08-04 12:36:41 NOCpulse::Probe::Shell::Unix::read_result Read 32 bytes from 6: >>>Linux#2.6.18-238.12.1.el5#16885
<<<
2011-08-04 12:36:41 NOCpulse::Probe::Shell::Unix::read_result Read 29 bytes from 6: >>>NOCPULSE-1312457801-STATUS 0
<<<
2011-08-04 12:36:41 NOCpulse::Probe::Shell::Unix::read_result stdout: >>>Linux#2.6.18-238.12.1.el5#16885
<<<
2011-08-04 12:36:41 NOCpulse::Probe::Shell::Unix::read_result stderr: >>><<<
2011-08-04 12:36:41 NOCpulse::Probe::Shell::Unix::read_result status: >>>0<<<
2011-08-04 12:36:41 NOCpulse::Probe::Shell::Unix::connect OS Linux 2.6.18-238.12.1.el5, shell pid 16885
2011-08-04 12:36:41 NOCpulse::Probe::Shell::Unix::connect OK
...


What seems to be happening on the scheduler is the command's STDOUT is
getting written to the logfile and not read by Unix::read_result. In
the first trace you can see this in the last two log lines:

2011-08-04 12:31:44  Linux#2.6.18-238.12.1.el5xen#2834
2011-08-04 12:31:44  NOCPULSE-1312457503-STATUS 0

So, somewhere STDOUT is being redirected in such a way that it's
interfering with the STDOUT of all the children. I'll see if I can
find out why.

Incidentally, the files in the TSDBLocalQueue I mentioned earlier are a
red herring - they seem to contain the output from when the probe last
worked properly but with the current time, for some reason.

Nevertheless, they do actually get loaded into the DB (verified with
SQLplus) so I think it's just the problems with output streams that are causing monitoring not to work.

Regards,

-- 
David Nutter  				Tel: +44 (0)131 650 4888
BioSS, JCMB, King's Buildings, Mayfield Rd, EH9 3JZ. Scotland, UK 

Biomathematics and Statistics Scotland (BioSS) is formally part of The
James Hutton Institute (JHI), a registered Scottish charity No. SC041796
and a company limited by guarantee No. SC374831




More information about the Spacewalk-list mailing list