[Spacewalk-list] Spacewalk 1.7 w/ postgresql crashed

Wolfgang Neudorfer mlist at woifi.at
Mon Nov 19 16:47:15 UTC 2012


Hi all,

I had overlooked it, but as I suspected, there really was a memory issue and Java invoked the OOM killer:

-------------------------------------------------
Nov 17 01:02:51 spacewalk1 kernel: java invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
Nov 17 01:02:51 spacewalk1 kernel: java cpuset=/ mems_allowed=0
Nov 17 01:02:51 spacewalk1 kernel: Pid: 2823, comm: java Not tainted 2.6.32-279.9.1.el6.x86_64 #1
Nov 17 01:02:51 spacewalk1 kernel: Call Trace:
Nov 17 01:02:51 spacewalk1 kernel: [<ffffffff810c4c71>] ? cpuset_print_task_mems_allowed+0x91/0xb0
Nov 17 01:02:51 spacewalk1 kernel: [<ffffffff811173e0>] ? dump_header+0x90/0x1b0
Nov 17 01:02:51 spacewalk1 kernel: [<ffffffff81214a0c>] ? security_real_capable_noaudit+0x3c/0x70
Nov 17 01:02:51 spacewalk1 kernel: [<ffffffff81117862>] ? oom_kill_process+0x82/0x2a0
Nov 17 01:02:51 spacewalk1 kernel: [<ffffffff811177a1>] ? select_bad_process+0xe1/0x120
Nov 17 01:02:51 spacewalk1 kernel: [<ffffffff81117ca0>] ? out_of_memory+0x220/0x3c0
Nov 17 01:02:51 spacewalk1 kernel: [<ffffffff811279be>] ? __alloc_pages_nodemask+0x89e/0x940
Nov 17 01:02:51 spacewalk1 kernel: [<ffffffff8115c51a>] ? alloc_pages_current+0xaa/0x110
Nov 17 01:02:51 spacewalk1 kernel: [<ffffffff811147e7>] ? __page_cache_alloc+0x87/0x90
Nov 17 01:02:51 spacewalk1 kernel: [<ffffffff8112a40b>] ? __do_page_cache_readahead+0xdb/0x210
Nov 17 01:02:51 spacewalk1 kernel: [<ffffffff8112a561>] ? ra_submit+0x21/0x30
Nov 17 01:02:51 spacewalk1 kernel: [<ffffffff81115b13>] ? filemap_fault+0x4c3/0x500
Nov 17 01:02:51 spacewalk1 kernel: [<ffffffff8113ef14>] ? __do_fault+0x54/0x510
Nov 17 01:02:51 spacewalk1 kernel: [<ffffffff8113f4c7>] ? handle_pte_fault+0xf7/0xb50
Nov 17 01:02:51 spacewalk1 kernel: [<ffffffff810a467e>] ? futex_wake+0x10e/0x120
Nov 17 01:02:51 spacewalk1 kernel: [<ffffffff81140104>] ? handle_mm_fault+0x1e4/0x2b0
Nov 17 01:02:51 spacewalk1 kernel: [<ffffffff810a65e0>] ? do_futex+0x100/0xb60
Nov 17 01:02:51 spacewalk1 kernel: [<ffffffff810444c9>] ? __do_page_fault+0x139/0x480
Nov 17 01:02:51 spacewalk1 kernel: [<ffffffff81278bec>] ? rb_erase+0x1bc/0x310
Nov 17 01:02:51 spacewalk1 kernel: [<ffffffff810097cc>] ? __switch_to+0x1ac/0x320
Nov 17 01:02:51 spacewalk1 kernel: [<ffffffff814fddd0>] ? thread_return+0x4e/0x76e
Nov 17 01:02:51 spacewalk1 kernel: [<ffffffff8150380e>] ? do_page_fault+0x3e/0xa0
Nov 17 01:02:51 spacewalk1 kernel: [<ffffffff81500bc5>] ? page_fault+0x25/0x30
-------------------------------------------------
Further:
-------------------------------------------------
Nov 17 01:02:51 spacewalk1 kernel: Out of memory: Kill process 2934 (java) score 118 or sacrifice child
Nov 17 01:02:51 spacewalk1 kernel: Killed process 2934, UID 0, (java) total-vm:1889112kB, anon-rss:193328kB, file-rss:228kB
Nov 17 01:02:51 spacewalk1 wrapper[2909]: JVM exited unexpectedly.
Nov 17 01:02:51 spacewalk1 wrapper[2909]: JVM exited in response to signal SIGKILL (9).
Nov 17 01:02:55 spacewalk1 wrapper[2909]: Launching a JVM...
Nov 17 01:03:26 spacewalk1 wrapper[2909]: Startup failed: Timed out waiting for signal from JVM.
Nov 17 01:03:26 spacewalk1 wrapper[2909]: JVM did not exit on request, terminated
Nov 17 01:03:26 spacewalk1 wrapper[2909]: JVM exited in response to signal SIGKILL (9).
Nov 17 01:03:31 spacewalk1 wrapper[2909]: Launching a JVM...
Nov 17 01:04:00 spacewalk1 wrapper[2909]: Startup failed: Timed out waiting for signal from JVM.
Nov 17 01:04:00 spacewalk1 wrapper[2909]: JVM did not exit on request, terminated
Nov 17 01:04:00 spacewalk1 wrapper[2909]: JVM exited in response to signal SIGKILL (9).
Nov 17 01:04:04 spacewalk1 wrapper[2909]: Launching a JVM...
Nov 17 01:04:34 spacewalk1 wrapper[2909]: Startup failed: Timed out waiting for signal from JVM.
Nov 17 01:04:34 spacewalk1 wrapper[2909]: JVM did not exit on request, terminated
Nov 17 01:04:34 spacewalk1 wrapper[2909]: JVM exited in response to signal SIGKILL (9).
Nov 17 01:04:38 spacewalk1 wrapper[2909]: Launching a JVM...
Nov 17 01:05:07 spacewalk1 wrapper[2909]: Startup failed: Timed out waiting for signal from JVM.
Nov 17 01:05:08 spacewalk1 wrapper[2909]: JVM did not exit on request, terminated
Nov 17 01:05:08 spacewalk1 wrapper[2909]: JVM exited in response to signal SIGKILL (9).
Nov 17 01:05:12 spacewalk1 wrapper[2909]: Launching a JVM...
Nov 17 01:05:41 spacewalk1 wrapper[2909]: Startup failed: Timed out waiting for signal from JVM.
Nov 17 01:05:41 spacewalk1 wrapper[2909]: JVM did not exit on request, terminated
Nov 17 01:05:41 spacewalk1 wrapper[2909]: JVM exited in response to signal SIGKILL (9).
Nov 17 01:05:41 spacewalk1 wrapper[2909]: There were 5 failed launches in a row, each lasting less than 300 seconds.  Giving up.
Nov 17 01:05:41 spacewalk1 wrapper[2909]:   There may be a configuration problem: please check the logs.
Nov 17 01:05:41 spacewalk1 wrapper[2909]: <-- Wrapper Stopped
-------------------------------------------------

The box has 2GB RAM (which is the minimum requirement according to https://fedorahosted.org/spacewalk/wiki/HowToInstall) and is currently only managing ~10 hosts.
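
For anyone who wants to check their own logs for the same thing, here is a minimal Python sketch that pulls the "Killed process" lines out of syslog output and converts the sizes to MiB. It assumes the exact line format shown above (kernel 2.6.32 on EL6; other kernels word this line differently), and the names KILLED_RE and find_oom_kills are made up for illustration; they are not part of Spacewalk or any other tool:

```python
import re

# Pattern for the kernel's "Killed process" line, modelled on the log excerpt
# above. Assumption: the EL6 (2.6.32) wording; adjust for other kernels.
KILLED_RE = re.compile(
    r"Killed process (?P<pid>\d+), UID (?P<uid>\d+), \((?P<comm>[^)]+)\) "
    r"total-vm:(?P<total_vm>\d+)kB, anon-rss:(?P<anon_rss>\d+)kB, "
    r"file-rss:(?P<file_rss>\d+)kB"
)


def find_oom_kills(lines):
    """Return one dict per 'Killed process' line, with sizes in MiB."""
    kills = []
    for line in lines:
        match = KILLED_RE.search(line)
        if match is None:
            continue  # not an OOM kill record
        kills.append({
            "pid": int(match.group("pid")),
            "comm": match.group("comm"),
            "total_vm_mib": int(match.group("total_vm")) / 1024,
            "anon_rss_mib": int(match.group("anon_rss")) / 1024,
        })
    return kills


# Sample input: the "Killed process" line from the log above.
SAMPLE = [
    "Nov 17 01:02:51 spacewalk1 kernel: Killed process 2934, UID 0, (java) "
    "total-vm:1889112kB, anon-rss:193328kB, file-rss:228kB",
]

for kill in find_oom_kills(SAMPLE):
    print("pid %d (%s): anon-rss %.1f MiB, total-vm %.1f MiB" % (
        kill["pid"], kill["comm"], kill["anon_rss_mib"], kill["total_vm_mib"]))
```

For the line above this reports PID 2934 (java) with an anon-rss of roughly 189 MiB. To scan a real log, pass an open file object for /var/log/messages instead of the sample list.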

So after all, maybe this is a Spacewalk issue.

Regards,

Wolfgang


----- Original Message -----
From: "Paul Robert Marino" <prmarino1 at gmail.com>
To: spacewalk-list at redhat.com
Sent: Monday, 19 November, 2012 5:05:04 PM
Subject: Re: [Spacewalk-list] Spacewalk 1.7 w/ postgresql crashed

Well, here is the thing:
someone restarted the database after it was killed by a SIGKILL (9); that's
not something that happens on its own.
So it was either an admin or a rogue app; either way it wasn't
Spacewalk. I am curious, however, whether it was on Fedora 17; there is a
chance systemd may have respawned it, but I'm not sure.

On Mon, Nov 19, 2012 at 10:26 AM, Wolfgang Neudorfer <mlist at woifi.at> wrote:
> Hello Paul,
>
> nobody was logged in and the host is only reachable from a very small network range. I think I can say that nobody did "anything naughty".
>
> I cannot rule out that there was a memory issue and the OOM killer started its madness, but I don't see anything related to this in /var/log/messages.
>
> Any other ideas?
>
> Regards,
>
> Wolfgang
>
> ----- Original Message -----
> From: "Paul Robert Marino" <prmarino1 at gmail.com>
> To: spacewalk-list at redhat.com
> Sent: Monday, 19 November, 2012 3:35:56 PM
> Subject: Re: [Spacewalk-list] Spacewalk 1.7 w/ postgresql crashed
>
>
>
>
> PostgreSQL was killed with a -9, which means someone hard-killed the process and then restarted it. Looks like someone was doing something naughty on your box.
> This is not a Spacewalk problem; this is a sysadmin who made a mistake and then didn't fess up to it.
> On Nov 19, 2012 4:18 AM, "Wolfgang Neudorfer" < mlist at woifi.at > wrote:
>
>
> Hi,
>
> Starting Saturday 17/11/2012 at 01:46, our Spacewalk server began sending out multiple mails per minute (probably on each connection attempt by a client?) like this:
>
> -------------------------------------------------
> RHN TRACEBACK from spacewalk1:
>
> Exception reported from spacewalk1
> Time: Sat Nov 17 01:45:30 2012
> Exception type <class 'spacewalk.server.rhnSQL.sql_base.SQLConnectError'>
> Request object information:
> URI: /XMLRPC
> Remote Host: 192.168.254.xxx
> Server Name: spacewalk1:443
> Headers passed in:
> Accept-Encoding: identity
> CONTENT_LENGTH: 2325
> CONTENT_TYPE: text/xml
> DOCUMENT_ROOT: /var/www/html
> GATEWAY_INTERFACE: CGI/1.1
> HTTPS: 1
> HTTP_ACCEPT_ENCODING: identity
> HTTP_HOST: spacewalk1
> HTTP_USER_AGENT: rhn.rpclib.py/$Revision$
> HTTP_X_CLIENT_VERSION: 1
> HTTP_X_INFO: RPC Processor (C) Red Hat, Inc (version $Revision$)
> HTTP_X_RHN_TRANSPORT_CAPABILITY: follow-redirects=3
> HTTP_X_TRANSPORT_INFO: Extended Capabilities Transport (C) Red Hat, Inc (version $Revision$)
> Host: tsasecspacewalk1.sec
> PATH_INFO:
> QUERY_STRING:
> REMOTE_ADDR: 192.168.254.xxx
> REMOTE_PORT: 59649
> REQUEST_METHOD: POST
> REQUEST_URI: /XMLRPC
> SCRIPT_FILENAME: /usr/share/rhn/wsgi/xmlrpc.py
> SCRIPT_NAME: /XMLRPC
> SCRIPT_URI: https://tsasecspacewalk1.sec/XMLRPC
> SCRIPT_URL: /XMLRPC
> SERVER_ADDR: 192.168.254.xxx
> SERVER_ADMIN: root at localhost
> SERVER_NAME: spacewalk1
> SERVER_PORT: 443
> SERVER_PROTOCOL: HTTP/1.1
> SERVER_SIGNATURE: <address>Apache Server at spacewalk1 Port 443</address>
>
> SERVER_SOFTWARE: Apache
> User-Agent: rhn.rpclib.py/$Revision$
> X-Client-Version: 1
> X-Info: RPC Processor (C) Red Hat, Inc (version $Revision$)
> X-RHN-Transport-Capability: follow-redirects=3
> X-Transport-Info: Extended Capabilities Transport (C) Red Hat, Inc (version $Revision$)
> mod_wsgi.application_group: tsasecspacewalk1.sec|/xmlrpc
> mod_wsgi.callable_object: application
> mod_wsgi.handler_script:
> mod_wsgi.input_chunked: 0
> mod_wsgi.listener_host:
> mod_wsgi.listener_port: 443
> mod_wsgi.process_group:
> mod_wsgi.request_handler: wsgi-script
> mod_wsgi.script_reloading: 1
> mod_wsgi.version: (3, 2)
> wsgi.errors: <mod_wsgi.Log object at 0x7f8e4a83d370>
> wsgi.file_wrapper: <built-in method file_wrapper of mod_wsgi.Adapter object at 0x7f8e4a83c300>
> wsgi.input: <mod_wsgi.Input object at 0x7f8e4a83d330>
> wsgi.multiprocess: True
> wsgi.multithread: False
> wsgi.run_once: False
> wsgi.url_scheme: https
> wsgi.version: (1, 1)
> -------------------------------------------------
>
> Apparently, something happened to the postgres server. In the log I see:
>
> -------------------------------------------------
> LOG: server process (PID 31999) was terminated by signal 9: Killed
> LOG: terminating any other active server processes
> WARNING: terminating connection because of crash of another server process
> DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
>
> ... (the last 2 lines appear multiple times)
>
> FATAL: the database system is in recovery mode
> FATAL: the database system is in recovery mode
> FATAL: the database system is in recovery mode
> FATAL: the database system is in recovery mode
>
> ... (this line appears multiple times)
> -------------------------------------------------
>
> The hard disk was not full, and RAM was also OK. I restarted the host and Spacewalk seems to be fine. I can log in and all hosts are there.
>
> Any hints? I am running Spacewalk 1.7 on CentOS 6.3 x64 with PostgreSQL 8.4.13.
>
> Thanks,
>
> Wolfgang
>
> _______________________________________________
> Spacewalk-list mailing list
> Spacewalk-list at redhat.com
> https://www.redhat.com/mailman/listinfo/spacewalk-list
>
