[389-users] 389 unusable on F11?

Rich Megginson rmeggins at redhat.com
Thu Oct 1 14:10:56 UTC 2009


Kevin Bowling wrote:
> On 9/11/2009 12:43 PM, Noriko Hosoi wrote:
>> On 09/10/2009 07:46 PM, Kevin Bowling wrote:
>>> Hi,
>>>
>>> I have been running FDS/389 on a F11 xen DomU for several months.  I 
>>> use it as the backend for UNIX username/passwords and also for 
>>> redMine (a Ruby on Rails bug tracker) for http://www.gnucapplus.org/.
>>>
>>> This VM would regularly lock up every week or so when 389 was still 
>>> called FDS.  I've since upgraded to 389 by issuing 'yum upgrade' as 
>>> well as running the 'setup-...-.pl -u' script and now it barely goes 
>>> a day before crashing.  When ldap crashes, the whole box basically 
>>> becomes unresponsive.
>>>
>>> I left the Xen hardware console open to see what was up and the only 
>>> thing I could conclude was that 389 was crashing (if I issued a 
>>> service start it came back to life).  Doing anything like a top or 
>>> ls will completely kill the box.  Likewise, the logs show nothing at 
>>> or before the time of crash.  I suspected too few file descriptors 
>>> but changing that to a very high number had no impact.
>>>
>>> I was about to do a rip and replace with OpenLDAP which I use very 
>>> successfully for our corporate systems but figured I ought to see 
>>> if anyone here can help or if I can submit any kind of meaningful 
>>> bug report first.  I assume I will need to run 389's slapd without 
>>> daemonizing it and hope it spits something useful out to stderr.  
>>> Any advice here would be greatly appreciated, as would any success 
>>> stories of using 389 on F11.
>> Hello Kevin,
>>
>> You specified the platform "F11 xen DomU".  Did you have a chance to 
>> run the 389 server on any other platform?  I'm wondering whether the 
>> crash is observed only on that specific platform.  Is the server 
>> running on a 64-bit or a 32-bit machine?
>>
>> If you start the server with the "-d 1" option, the server will run 
>> in trace mode.  (E.g., /usr/lib[64]/dirsrv/slapd-YOURID/start-slapd 
>> -d 1)
>>
>> I'm afraid it might be a memory leak.  When you restart the 389 
>> server, could you check the size of ns-slapd periodically, say every 
>> hour, and see whether it keeps growing or levels off?  Also, the 
>> server quits if it fails to write to the errors log.  If that 
>> happens, it's logged in the system log.  Does the messages file on 
>> the system happen to have any entries related to the 389 server?
>>
>> Thanks,
>> --noriko
>>>
>>> I'm not subscribed to the list so please CC.
>>>
>>> Regards,
>>>
>>> Kevin Bowling
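
One rough way to do the hourly size check suggested above is to record the
resident set size of ns-slapd from /proc.  This is only a sketch, not
something shipped with 389: the function names and the cron schedule are
made up for illustration, and it assumes a Linux /proc filesystem.  It takes
one sample per run, so it can be driven from cron.

```python
#!/usr/bin/env python3
"""Print one resident-memory sample for every running ns-slapd process."""
import os
import time


def parse_status(text):
    """Return (process name, VmRSS in kB or None) from /proc/<pid>/status text."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    rss = fields.get("VmRSS")  # looks like "123456 kB"; absent for kernel threads
    rss_kb = int(rss.split()[0]) if rss else None
    return fields.get("Name"), rss_kb


def sample(proc_root="/proc", name="ns-slapd"):
    """Return {pid: rss_kb} for all running processes called `name`."""
    result = {}
    if not os.path.isdir(proc_root):
        return result  # no procfs here (not Linux, or not mounted)
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        try:
            with open(os.path.join(proc_root, entry, "status")) as fh:
                proc_name, rss_kb = parse_status(fh.read())
        except OSError:
            continue  # process exited between listdir() and open()
        if proc_name == name and rss_kb is not None:
            result[int(entry)] = rss_kb
    return result


def main():
    """Print a timestamped line per ns-slapd process, then exit."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    for pid, rss_kb in sorted(sample().items()):
        print(f"{stamp} pid={pid} VmRSS={rss_kb} kB", flush=True)


if __name__ == "__main__":
    main()
```

Run from an hourly cron job with the output appended to a file, then compare
the VmRSS values over a day or two.  Steady growth would point toward a leak;
a flat line would suggest looking elsewhere.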
>
> It was stable for 17 days while running with debug enabled to 
> console.  I upgraded to the F11 2.6.30 kernel rebase, and now I get 
> some debugging info on the console.  I'm taking a wild guess that it 
> is timing related.  Where should I place a bug report?
Is it related to this?  https://bugzilla.redhat.com/show_bug.cgi?id=521637
>
> Regards,
> Kevin
>
> [root@buildbox-a2 ~]# xm console 8
> INFO: task kjournald:61 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> kjournald     D ffff88003e932000     0    61      2
>  ffff88003e919d40 0000000000000246 ffffffff8100e45c 0000000000000000
>  000000001cee5db8 ffff88003e919d20 ffffffff8100ee82 0000000000000202
>  ffff88003e9c83a8 000000000000e2e8 ffff88003e9c83a8 0000000000012d00
> Call Trace:
>  [<ffffffff8100e45c>] ? xen_force_evtchn_callback+0x20/0x36
>  [<ffffffff8100ee82>] ? check_events+0x12/0x20
>  [<ffffffff8100ee6f>] ? xen_restore_fl_direct_end+0x0/0x1
>  [<ffffffff814993de>] ? _spin_unlock_irqrestore+0x4e/0x64
>  [<ffffffff81496bf6>] schedule+0x21/0x49
>  [<ffffffff811b8b33>] journal_commit_transaction+0x13d/0xe42
>  [<ffffffff8100e45c>] ? xen_force_evtchn_callback+0x20/0x36
>  [<ffffffff81070bfb>] ? autoremove_wake_function+0x0/0x5f
>  [<ffffffff810632bc>] ? try_to_del_timer_sync+0x69/0x87
>  [<ffffffff811bcdf7>] kjournald+0xfd/0x253
>  [<ffffffff81070bfb>] ? autoremove_wake_function+0x0/0x5f
>  [<ffffffff811bccfa>] ? kjournald+0x0/0x253
>  [<ffffffff81070709>] kthread+0x6d/0xae
>  [<ffffffff8101313a>] child_rip+0xa/0x20
>  [<ffffffff81012afd>] ? restore_args+0x0/0x30
>  [<ffffffff81013130>] ? child_rip+0x0/0x20
> INFO: task ns-slapd:1034 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> ns-slapd      D ffffc20000000000     0  1034      1
>  ffff88003dd87908 0000000000000282 ffff88003dd87868 ffffffff8100ed0d
>  ffff88003dd86000 00000000e59205a0 ffff88003dd87888 ffffffff8107957a
>  ffff88003d4fe0e8 000000000000e2e8 ffff88003d4fe0e8 0000000000012d00
> Call Trace:
>  [<ffffffff8100ed0d>] ? xen_clocksource_get_cycles+0x1c/0x32
>  [<ffffffff8107957a>] ? clocksource_read+0x22/0x38
>  [<ffffffff81074986>] ? ktime_get_ts+0x61/0x7d
>  [<ffffffff8113b0ee>] ? sync_buffer+0x0/0x6b
>  [<ffffffff81496bf6>] schedule+0x21/0x49
>  [<ffffffff81496c62>] io_schedule+0x44/0x6c
>  [<ffffffff8113b141>] sync_buffer+0x53/0x6b
>  [<ffffffff81497294>] __wait_on_bit_lock+0x55/0xb2
>  [<ffffffff810d2d1f>] ? find_get_page+0x64/0xa3
>  [<ffffffff8149736e>] out_of_line_wait_on_bit_lock+0x7d/0x9c
>  [<ffffffff8113b0ee>] ? sync_buffer+0x0/0x6b
>  [<ffffffff81070c5a>] ? wake_bit_function+0x0/0x5a
>  [<ffffffff8113b380>] __lock_buffer+0x3d/0x53
>  [<ffffffff811b6eda>] lock_buffer+0x49/0x64
>  [<ffffffff811b7a15>] do_get_write_access+0x82/0x3f3
>  [<ffffffff811bbdb3>] ? journal_add_journal_head+0xce/0x162
>  [<ffffffff811b7dc0>] journal_get_write_access+0x3a/0x65
>  [<ffffffff8118c209>] __ext3_journal_get_write_access+0x34/0x74
>  [<ffffffff8117e464>] ext3_reserve_inode_write+0x50/0xaa
>  [<ffffffff8117e50d>] ext3_mark_inode_dirty+0x4f/0x80
>  [<ffffffff8117e6b8>] ext3_dirty_inode+0x79/0xa7
>  [<ffffffff81135095>] __mark_inode_dirty+0x45/0x190
>  [<ffffffff81129603>] file_update_time+0xc0/0x113
>  [<ffffffff810eb167>] do_wp_page+0x610/0x658
>  [<ffffffff8100bc21>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
>  [<ffffffff810eccd9>] handle_mm_fault+0x6a2/0x72e
>  [<ffffffff814993de>] ? _spin_unlock_irqrestore+0x4e/0x64
>  [<ffffffff8149be99>] do_page_fault+0x226/0x24f
>  [<ffffffff81499965>] page_fault+0x25/0x30
> INFO: task ns-slapd:1040 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> ns-slapd      D ffff88003e932024     0  1040      1
>  ffff88003bc119f8 0000000000000282 ffffffff8100e45c ffffc20000025410
>  00000000f1efb74c ffff88003bc119d8 ffffffff8100ee82 0000000000000004
>  ffff88003bc0b248 000000000000e2e8 ffff88003bc0b248 0000000000012d00
> Call Trace:
>  [<ffffffff8100e45c>] ? xen_force_evtchn_callback+0x20/0x36
>  [<ffffffff8100ee82>] ? check_events+0x12/0x20
>  [<ffffffff8100ee6f>] ? xen_restore_fl_direct_end+0x0/0x1
>  [<ffffffff814993de>] ? _spin_unlock_irqrestore+0x4e/0x64
>  [<ffffffff8100ee82>] ? check_events+0x12/0x20
>  [<ffffffff81496bf6>] schedule+0x21/0x49
>  [<ffffffff811b841d>] start_this_handle+0x2d4/0x373
>  [<ffffffff81070bfb>] ? autoremove_wake_function+0x0/0x5f
>  [<ffffffff811b865d>] journal_start+0xb7/0x106
>  [<ffffffff81187903>] ext3_journal_start_sb+0x62/0x78
>  [<ffffffff8117d60b>] ext3_journal_start+0x28/0x3e
>  [<ffffffff8117e67d>] ext3_dirty_inode+0x3e
>
>
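
To get a quick overview of a console capture like the one above, the
hung-task reports can be reduced to one line per blocked task.  This is a
minimal sketch, assuming the messages keep the "INFO: task NAME:PID blocked
for more than N seconds." format shown in the trace; the names summarize,
TASK_RE and FRAME_RE are illustrative and not part of any kernel or 389
tool.  Frames marked with "?" are skipped because the kernel flags them as
unreliable.

```python
#!/usr/bin/env python3
"""Summarize kernel hung-task reports from a saved console log."""
import re
import sys

# Header line of a hung-task report, e.g.
# "INFO: task ns-slapd:1034 blocked for more than 120 seconds."
TASK_RE = re.compile(
    r"INFO: task (?P<name>.+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)
# One call-trace frame, e.g. "[<ffffffff81496bf6>] schedule+0x21/0x49".
# A "?" before the symbol marks a frame the unwinder could not verify.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?P<unreliable>\?\s+)?(?P<symbol>[A-Za-z_][\w.]*)\+0x"
)


def summarize(lines):
    """Return one dict per report: name, pid, seconds, reliable frames."""
    reports = []
    for line in lines:
        task = TASK_RE.search(line)
        if task:
            reports.append({
                "name": task.group("name"),
                "pid": int(task.group("pid")),
                "seconds": int(task.group("secs")),
                "frames": [],
            })
            continue
        frame = FRAME_RE.search(line)
        if frame and reports and not frame.group("unreliable"):
            reports[-1]["frames"].append(frame.group("symbol"))
    return reports


def main(path):
    """Print each blocked task with its innermost reliable frames."""
    with open(path) as fh:
        for report in summarize(fh):
            chain = " <- ".join(report["frames"][:4])
            print(f'{report["name"]}[{report["pid"]}] '
                  f'blocked >{report["seconds"]}s in: {chain}')


# Only runs when a log file is given, e.g.: python3 hung_tasks.py console.log
if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

The quoting prefix ("> ") does not need to be stripped first, since both
patterns are searched for anywhere in the line.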
