[Spacewalk-list] Spacewalk 1.7 w/ postgresql crashed

Paul Robert Marino prmarino1 at gmail.com
Mon Nov 19 17:05:59 UTC 2012


Well, it looks like the kernel started killing processes, which would
explain why it died, but not the restart.
I'm not sure whether it's Java or PostgreSQL that used too much memory, but
I can tell you I never run PostgreSQL on a server with only 2GB of RAM
if I can avoid it.
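
To see which processes the kernel actually killed and how much memory each
one held, the "Killed process" lines in /var/log/messages can be pulled out
with a few lines of Python. This is only a rough sketch: it assumes the line
format shown in the quoted log (CentOS 6, kernel 2.6.32), and the name
oom_kills is made up for illustration.

```python
import re

# Matches lines like the one in the quoted log:
#   kernel: Killed process 2934, UID 0, (java) total-vm:1889112kB, anon-rss:193328kB, ...
# Assumption: other kernel versions may word this line differently.
KILLED_RE = re.compile(
    r"Killed process (?P<pid>\d+),? .*?\((?P<comm>[^)]+)\)"
    r".*?total-vm:(?P<total_vm>\d+)kB, anon-rss:(?P<anon_rss>\d+)kB"
)


def oom_kills(lines):
    """Return one dict per process the OOM killer reported killing."""
    kills = []
    for line in lines:
        match = KILLED_RE.search(line)
        if match:
            kills.append({
                "pid": int(match.group("pid")),
                "comm": match.group("comm"),
                "total_vm_kb": int(match.group("total_vm")),
                "anon_rss_kb": int(match.group("anon_rss")),
            })
    return kills
```

Passing it an open file works, since it only iterates over lines, e.g.
oom_kills(open("/var/log/messages")). A postgres entry in the result would
point at the database backend rather than Java as the victim.
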
There is probably some tuning required to make it work correctly.
Also, 1.7 has some known issues with leaving connections to
PostgreSQL open in transactions when it doesn't need them any more,
and each one of those connections uses RAM. Upgrading to 1.8 might
help.
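
On the lingering connections: a rough sketch of how one could spot them. It
assumes the PostgreSQL 8.4 column names in pg_stat_activity (procpid,
current_query); 9.2 and later renamed procpid to pid and report the idle
marker in a separate state column. The helper name, the 10-minute threshold
and the way the query gets run (any Python database driver will do) are my
assumptions, not something Spacewalk ships.

```python
from datetime import datetime, timedelta

# Query for PostgreSQL 8.4, where idle-in-transaction sessions show up
# with this literal marker in current_query.
IDLE_IN_XACT_SQL = """
SELECT procpid, usename, xact_start
  FROM pg_stat_activity
 WHERE current_query = '<IDLE> in transaction'
 ORDER BY xact_start
"""


def stale_transactions(rows, now, max_age=timedelta(minutes=10)):
    """Filter (procpid, usename, xact_start) rows down to old transactions.

    rows is whatever the driver returned for IDLE_IN_XACT_SQL; now must be
    comparable with xact_start (both naive or both timezone-aware).
    Returns (procpid, usename, age) tuples for sessions older than max_age.
    """
    stale = []
    for procpid, usename, xact_start in rows:
        age = now - xact_start
        if age > max_age:
            stale.append((procpid, usename, age))
    return stale
```

If the list keeps growing between checks, that would be consistent with the
connection issue described above, and each of those backends is holding
memory the box does not have to spare.
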


On Mon, Nov 19, 2012 at 11:47 AM, Wolfgang Neudorfer <mlist at woifi.at> wrote:
> Hi all,
>
> I overlooked it, but as I suspected there really was a memory issue, and Java invoked the OOM killer:
>
> -------------------------------------------------
> Nov 17 01:02:51 spacewalk1 kernel: java invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
> Nov 17 01:02:51 spacewalk1 kernel: java cpuset=/ mems_allowed=0
> Nov 17 01:02:51 spacewalk1 kernel: Pid: 2823, comm: java Not tainted 2.6.32-279.9.1.el6.x86_64 #1
> Nov 17 01:02:51 spacewalk1 kernel: Call Trace:
> Nov 17 01:02:51 spacewalk1 kernel: [<ffffffff810c4c71>] ? cpuset_print_task_mems_allowed+0x91/0xb0
> Nov 17 01:02:51 spacewalk1 kernel: [<ffffffff811173e0>] ? dump_header+0x90/0x1b0
> Nov 17 01:02:51 spacewalk1 kernel: [<ffffffff81214a0c>] ? security_real_capable_noaudit+0x3c/0x70
> Nov 17 01:02:51 spacewalk1 kernel: [<ffffffff81117862>] ? oom_kill_process+0x82/0x2a0
> Nov 17 01:02:51 spacewalk1 kernel: [<ffffffff811177a1>] ? select_bad_process+0xe1/0x120
> Nov 17 01:02:51 spacewalk1 kernel: [<ffffffff81117ca0>] ? out_of_memory+0x220/0x3c0
> Nov 17 01:02:51 spacewalk1 kernel: [<ffffffff811279be>] ? __alloc_pages_nodemask+0x89e/0x940
> Nov 17 01:02:51 spacewalk1 kernel: [<ffffffff8115c51a>] ? alloc_pages_current+0xaa/0x110
> Nov 17 01:02:51 spacewalk1 kernel: [<ffffffff811147e7>] ? __page_cache_alloc+0x87/0x90
> Nov 17 01:02:51 spacewalk1 kernel: [<ffffffff8112a40b>] ? __do_page_cache_readahead+0xdb/0x210
> Nov 17 01:02:51 spacewalk1 kernel: [<ffffffff8112a561>] ? ra_submit+0x21/0x30
> Nov 17 01:02:51 spacewalk1 kernel: [<ffffffff81115b13>] ? filemap_fault+0x4c3/0x500
> Nov 17 01:02:51 spacewalk1 kernel: [<ffffffff8113ef14>] ? __do_fault+0x54/0x510
> Nov 17 01:02:51 spacewalk1 kernel: [<ffffffff8113f4c7>] ? handle_pte_fault+0xf7/0xb50
> Nov 17 01:02:51 spacewalk1 kernel: [<ffffffff810a467e>] ? futex_wake+0x10e/0x120
> Nov 17 01:02:51 spacewalk1 kernel: [<ffffffff81140104>] ? handle_mm_fault+0x1e4/0x2b0
> Nov 17 01:02:51 spacewalk1 kernel: [<ffffffff810a65e0>] ? do_futex+0x100/0xb60
> Nov 17 01:02:51 spacewalk1 kernel: [<ffffffff810444c9>] ? __do_page_fault+0x139/0x480
> Nov 17 01:02:51 spacewalk1 kernel: [<ffffffff81278bec>] ? rb_erase+0x1bc/0x310
> Nov 17 01:02:51 spacewalk1 kernel: [<ffffffff810097cc>] ? __switch_to+0x1ac/0x320
> Nov 17 01:02:51 spacewalk1 kernel: [<ffffffff814fddd0>] ? thread_return+0x4e/0x76e
> Nov 17 01:02:51 spacewalk1 kernel: [<ffffffff8150380e>] ? do_page_fault+0x3e/0xa0
> Nov 17 01:02:51 spacewalk1 kernel: [<ffffffff81500bc5>] ? page_fault+0x25/0x30
> -------------------------------------------------
> Further:
> -------------------------------------------------
> Nov 17 01:02:51 spacewalk1 kernel: Out of memory: Kill process 2934 (java) score 118 or sacrifice child
> Nov 17 01:02:51 spacewalk1 kernel: Killed process 2934, UID 0, (java) total-vm:1889112kB, anon-rss:193328kB, file-rss:228kB
> Nov 17 01:02:51 spacewalk1 wrapper[2909]: JVM exited unexpectedly.
> Nov 17 01:02:51 spacewalk1 wrapper[2909]: JVM exited in response to signal SIGKILL (9).
> Nov 17 01:02:55 spacewalk1 wrapper[2909]: Launching a JVM...
> Nov 17 01:03:26 spacewalk1 wrapper[2909]: Startup failed: Timed out waiting for signal from JVM.
> Nov 17 01:03:26 spacewalk1 wrapper[2909]: JVM did not exit on request, terminated
> Nov 17 01:03:26 spacewalk1 wrapper[2909]: JVM exited in response to signal SIGKILL (9).
> Nov 17 01:03:31 spacewalk1 wrapper[2909]: Launching a JVM...
> Nov 17 01:04:00 spacewalk1 wrapper[2909]: Startup failed: Timed out waiting for signal from JVM.
> Nov 17 01:04:00 spacewalk1 wrapper[2909]: JVM did not exit on request, terminated
> Nov 17 01:04:00 spacewalk1 wrapper[2909]: JVM exited in response to signal SIGKILL (9).
> Nov 17 01:04:04 spacewalk1 wrapper[2909]: Launching a JVM...
> Nov 17 01:04:34 spacewalk1 wrapper[2909]: Startup failed: Timed out waiting for signal from JVM.
> Nov 17 01:04:34 spacewalk1 wrapper[2909]: JVM did not exit on request, terminated
> Nov 17 01:04:34 spacewalk1 wrapper[2909]: JVM exited in response to signal SIGKILL (9).
> Nov 17 01:04:38 spacewalk1 wrapper[2909]: Launching a JVM...
> Nov 17 01:05:07 spacewalk1 wrapper[2909]: Startup failed: Timed out waiting for signal from JVM.
> Nov 17 01:05:08 spacewalk1 wrapper[2909]: JVM did not exit on request, terminated
> Nov 17 01:05:08 spacewalk1 wrapper[2909]: JVM exited in response to signal SIGKILL (9).
> Nov 17 01:05:12 spacewalk1 wrapper[2909]: Launching a JVM...
> Nov 17 01:05:41 spacewalk1 wrapper[2909]: Startup failed: Timed out waiting for signal from JVM.
> Nov 17 01:05:41 spacewalk1 wrapper[2909]: JVM did not exit on request, terminated
> Nov 17 01:05:41 spacewalk1 wrapper[2909]: JVM exited in response to signal SIGKILL (9).
> Nov 17 01:05:41 spacewalk1 wrapper[2909]: There were 5 failed launches in a row, each lasting less than 300 seconds.  Giving up.
> Nov 17 01:05:41 spacewalk1 wrapper[2909]:   There may be a configuration problem: please check the logs.
> Nov 17 01:05:41 spacewalk1 wrapper[2909]: <-- Wrapper Stopped
> -------------------------------------------------
>
> The box has 2GB RAM (which is the minimum requirement according to https://fedorahosted.org/spacewalk/wiki/HowToInstall) and is currently only managing ~10 hosts.
>
> So after all, maybe this is a Spacewalk issue.
>
> Regards,
>
> Wolfgang
>
>
> ----- Original Message -----
> From: "Paul Robert Marino" <prmarino1 at gmail.com>
> To: spacewalk-list at redhat.com
> Sent: Monday, 19 November, 2012 5:05:04 PM
> Subject: Re: [Spacewalk-list] Spacewalk 1.7 w/ postgresql crashed
>
> Well, here is the thing:
> someone restarted the database after it was killed by a SIGKILL (9). That's
> not something that happens on its own.
> So it was either an admin or a rogue app; either way it wasn't
> Spacewalk. I am curious, however: if it was on Fedora 17 there is a
> chance systemd may have respawned it, but I'm not sure.
>
> On Mon, Nov 19, 2012 at 10:26 AM, Wolfgang Neudorfer <mlist at woifi.at> wrote:
>> Hello Paul,
>>
>> nobody was logged in and the host is only reachable from a very small network range. I think I can say that nobody did "anything naughty".
>>
>> I cannot rule out that there was a memory issue and the OOM killer started its madness, but I don't see anything related to this in /var/log/messages.
>>
>> Any other ideas?
>>
>> Regards,
>>
>> Wolfgang
>>
>> ----- Original Message -----
>> From: "Paul Robert Marino" <prmarino1 at gmail.com>
>> To: spacewalk-list at redhat.com
>> Sent: Monday, 19 November, 2012 3:35:56 PM
>> Subject: Re: [Spacewalk-list] Spacewalk 1.7 w/ postgresql crashed
>>
>>
>>
>>
>> PostgreSQL was killed with a -9, which means someone hard-killed the process and then restarted it. Looks like someone was doing something naughty on your box.
>> This is not a Spacewalk problem; this is a sysadmin who made a mistake and then didn't fess up to it.
>> On Nov 19, 2012 4:18 AM, "Wolfgang Neudorfer" < mlist at woifi.at > wrote:
>>
>>
>> Hi,
>>
>> starting Saturday 17/11/2012 01:46, our Spacewalk server started to send out multiple mails per minute (probably on each connection attempt of a client?) like this:
>>
>> -------------------------------------------------
>> RHN TRACEBACK from spacewalk1:
>>
>> Exception reported from spacewalk1
>> Time: Sat Nov 17 01:45:30 2012
>> Exception type <class 'spacewalk.server.rhnSQL.sql_base.SQLConnectError'>
>> Request object information:
>> URI: /XMLRPC
>> Remote Host: 192.168.254.xxx
>> Server Name: spacewalk1:443
>> Headers passed in:
>> Accept-Encoding: identity
>> CONTENT_LENGTH: 2325
>> CONTENT_TYPE: text/xml
>> DOCUMENT_ROOT: /var/www/html
>> GATEWAY_INTERFACE: CGI/1.1
>> HTTPS: 1
>> HTTP_ACCEPT_ENCODING: identity
>> HTTP_HOST: spacewalk1
>> HTTP_USER_AGENT: rhn.rpclib.py/$Revision$
>> HTTP_X_CLIENT_VERSION: 1
>> HTTP_X_INFO: RPC Processor (C) Red Hat, Inc (version $Revision$)
>> HTTP_X_RHN_TRANSPORT_CAPABILITY: follow-redirects=3
>> HTTP_X_TRANSPORT_INFO: Extended Capabilities Transport (C) Red Hat, Inc (version $Revision$)
>> Host: tsasecspacewalk1.sec
>> PATH_INFO:
>> QUERY_STRING:
>> REMOTE_ADDR: 192.168.254.xxx
>> REMOTE_PORT: 59649
>> REQUEST_METHOD: POST
>> REQUEST_URI: /XMLRPC
>> SCRIPT_FILENAME: /usr/share/rhn/wsgi/xmlrpc.py
>> SCRIPT_NAME: /XMLRPC
>> SCRIPT_URI: https://tsasecspacewalk1.sec/XMLRPC
>> SCRIPT_URL: /XMLRPC
>> SERVER_ADDR: 192.168.254.xxx
>> SERVER_ADMIN: root at localhost
>> SERVER_NAME: spacewalk1
>> SERVER_PORT: 443
>> SERVER_PROTOCOL: HTTP/1.1
>> SERVER_SIGNATURE: <address>Apache Server at spacewalk1 Port 443</address>
>>
>> SERVER_SOFTWARE: Apache
>> User-Agent: rhn.rpclib.py/$Revision$
>> X-Client-Version: 1
>> X-Info: RPC Processor (C) Red Hat, Inc (version $Revision$)
>> X-RHN-Transport-Capability: follow-redirects=3
>> X-Transport-Info: Extended Capabilities Transport (C) Red Hat, Inc (version $Revision$)
>> mod_wsgi.application_group: tsasecspacewalk1.sec|/xmlrpc
>> mod_wsgi.callable_object: application
>> mod_wsgi.handler_script:
>> mod_wsgi.input_chunked: 0
>> mod_wsgi.listener_host:
>> mod_wsgi.listener_port: 443
>> mod_wsgi.process_group:
>> mod_wsgi.request_handler: wsgi-script
>> mod_wsgi.script_reloading: 1
>> mod_wsgi.version: (3, 2)
>> wsgi.errors: <mod_wsgi.Log object at 0x7f8e4a83d370>
>> wsgi.file_wrapper: <built-in method file_wrapper of mod_wsgi.Adapter object at 0x7f8e4a83c300>
>> wsgi.input: <mod_wsgi.Input object at 0x7f8e4a83d330>
>> wsgi.multiprocess: True
>> wsgi.multithread: False
>> wsgi.run_once: False
>> wsgi.url_scheme: https
>> wsgi.version: (1, 1)
>> -------------------------------------------------
>>
>> Apparently, something happened to the postgres server. In the log I see:
>>
>> -------------------------------------------------
>> LOG: server process (PID 31999) was terminated by signal 9: Killed
>> LOG: terminating any other active server processes
>> WARNING: terminating connection because of crash of another server process
>> DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
>>
>> ... (the last 2 lines appear multiple times)
>>
>> FATAL: the database system is in recovery mode
>> FATAL: the database system is in recovery mode
>> FATAL: the database system is in recovery mode
>> FATAL: the database system is in recovery mode
>>
>> ... (this line appears multiple times)
>> -------------------------------------------------
>>
>> The hard disk was not full, and RAM was OK. I restarted the host and Spacewalk seems to be fine. I can log in and all hosts are there.
>>
>> Any hints? I am running Spacewalk 1.7 on CentOS x64 6.3 with PostgreSQL 8.4.13.
>>
>> Thanks,
>>
>> Wolfgang
>>
>> _______________________________________________
>> Spacewalk-list mailing list
>> Spacewalk-list at redhat.com
>> https://www.redhat.com/mailman/listinfo/spacewalk-list



