[Fedora-directory-users] help....unable to start fedora server

Tue Sep 18 05:28:28 UTC 2007

Hi,

The error:

[17/Sep/2007:20:52:06 -0500] - libdb: Ignoring log file:
/opt/fedora-ds/slapd-isec-file/db/log.0000000206: magic number 0, not 40988

indicates that the Berkeley DB backend could not use the log file
log.0000000206 because it is not a valid Berkeley DB log file. Since you
mentioned that you had to shut down the system manually and run fsck when it
came back up, one possibility is that log.0000000206 (and maybe other files)
was corrupted. Have you checked the lost+found directory for any recovered
files?
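
To get a rough idea of how many log files are affected, something like the
following Python sketch could be run against a copy of the db directory. It
assumes that a healthy log file carries the value 0x040988 (the "40988" in
the error message, which is printed in hex) somewhere in its first 32 bytes.
The exact header offset depends on the Berkeley DB version, so the sketch
searches the whole prefix in both byte orders. The function names and the
32-byte window are my own choices for illustration, not anything the server
provides.

```python
import glob
import os
import struct

LOG_MAGIC = 0x040988  # value libdb expected in the error message ("not 40988")
PREFIX_LEN = 32       # assumption: the magic sits somewhere in the first 32 bytes


def has_log_magic(path):
    """Return True if the file prefix contains the log magic in either byte order."""
    with open(path, "rb") as f:
        prefix = f.read(PREFIX_LEN)
    candidates = (struct.pack("<I", LOG_MAGIC), struct.pack(">I", LOG_MAGIC))
    return any(c in prefix for c in candidates)


def suspect_logs(db_dir):
    """List log.* files under db_dir whose prefix lacks the magic value."""
    paths = sorted(glob.glob(os.path.join(db_dir, "log.*")))
    return [p for p in paths if not has_log_magic(p)]


# Example (path taken from the error message above):
#   for p in suspect_logs("/opt/fedora-ds/slapd-isec-file/db"):
#       print("suspect:", p)
```

A file that shows up here is only a candidate for corruption; a file that
does not show up may still be damaged further in.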

In any case, I would recommend that before you do any more troubleshooting
with the server, you take a snapshot (tar ball) of the affected directory
tree (/opt/fedora-ds and any other directories you can think of as belonging
to the directory server) and store the tar ball separately (in another
directory or, better, on another machine). This gives you a known state to
return to if a troubleshooting step makes things worse and you need to start
over with a different approach. Of course, if the files were already corrupt
when you took the snapshot, I am not sure how useful it will be.
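
A plain tar command is enough for this, but if you prefer something
scriptable, here is a minimal Python sketch of the same idea. The function
name and the /mnt/backup destination in the comment are hypothetical; the
only requirement is that the destination is outside the tree being archived.

```python
import os
import tarfile
import time


def snapshot(src_dir, dest_dir):
    """Write a gzip-compressed tar ball of src_dir into dest_dir; return its path.

    dest_dir must not be inside src_dir, or the archive would include itself.
    """
    base = os.path.basename(os.path.normpath(src_dir))
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = os.path.join(dest_dir, base + "-" + stamp + ".tar.gz")
    with tarfile.open(dest, "w:gz") as tar:
        # arcname keeps paths inside the archive relative to the tree's own name
        tar.add(src_dir, arcname=base)
    return dest


# Example (hypothetical destination on another filesystem):
#   print(snapshot("/opt/fedora-ds", "/mnt/backup"))
```

Take the snapshot while the directory server is stopped, so the files do not
change underneath the archive.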

Check whether everything is fine at the system level. Look back in the
directory server error log to see what errors showed up when the directory
server first tried to start after the reboot, and check the system log for
disk or filesystem errors from around the same time.
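
If the error log is long, a small filter can pull out the lines worth
reading first. The sketch below is only an illustration: the timestamp
pattern is taken from the log excerpt in this thread, and the keyword list
and the log path in the comment are my own assumptions.

```python
import re

# Line layout as seen in this thread: "[17/Sep/2007:20:52:06 -0500] - message"
LINE_RE = re.compile(r"^\[(?P<ts>[^\]]+)\]\s+(?P<msg>.*)$")
KEYWORDS = ("libdb", "PANIC", "FAILED", "Error", "Disorderly")  # assumed list


def interesting_lines(lines):
    """Yield (timestamp, message) for log lines containing one of KEYWORDS."""
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if m and any(k in m.group("msg") for k in KEYWORDS):
            yield m.group("ts"), m.group("msg")


# Example (assumed log location for this instance):
#   with open("/opt/fedora-ds/slapd-isec-file/logs/errors") as f:
#       for ts, msg in interesting_lines(f):
#           print(ts, msg)
```

The earliest matching entries after the reboot are usually the most telling,
since later errors tend to be consequences of the first failure.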

Finally, check whether you happen to have taken any LDIF dumps of the
directory server data at some point in the past, or whether the file system
(or the whole system) was backed up for some other purpose. Do you have just
one directory server instance running (i.e., only 1 master and no
replicas/consumers)?
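
To hunt for forgotten LDIF dumps, a walk over the likely directories is
quick to do. This is a hedged sketch with a hypothetical function name; it
simply lists every file ending in .ldif, newest first.

```python
import os


def find_ldif(root):
    """Return (path, size, mtime) for every *.ldif file under root, newest first."""
    hits = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if name.lower().endswith(".ldif"):
                path = os.path.join(dirpath, name)
                try:
                    st = os.stat(path)
                except OSError:
                    continue  # unreadable or vanished; skip it
                hits.append((path, st.st_size, st.st_mtime))
    return sorted(hits, key=lambda h: h[2], reverse=True)


# Example:
#   for path, size, mtime in find_ldif("/opt/fedora-ds"):
#       print(size, path)
```

Expect configuration files such as dse.ldif to show up as well; those are
not data exports, so look at the sizes and contents before relying on any
of them.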

PS: A couple of things that could have helped in this scenario are regular
backups of the system and regular backups of the directory server data
(db2ldif.pl<http://www.redhat.com/docs/manuals/dir-server/cli/scripts.htm#pgfId-26364>).
Another system (or a virtual machine) in a development or test environment,
similar to this production server in setup and operation, would also be
useful to have so that updates can be tested on it first before being
deployed into production.

-=Venkat=-
gvenkat at gmail.com

On 9/17/07, Steven Jones <Steven.Jones at vuw.ac.nz> wrote:
>
>  Not knowing a huge amount about FDS/LDAP….I'd start with checking the OS.
> Eg.,
>
> [17/Sep/2007:20:52:06 -0500] - Please make sure there is enough disk space
> for dbcache (10485760 bytes) and db region files
>
> Suggests to me to check the filesystem with df -h to make sure there is
> space left….possibly there is a core dump or something that needs
> deleting…rare in Linux but not known on Solaris….
>
> Or maybe some mount point failed to mount as the OS considered it too
> damaged….make sure all the filespaces are mounted…
>
> Beyond this I cannot help, sorry.
>
> Making no backups or at least not exporting the database is hopefully
> something you will not do again….
>
> regards
>
> Steven Jones
> Senior  Linux/Unix/San/Vmware System Administrator
> APG -Technology Integration Team
> Victoria University of Wellington
> Phone: +64 4 463 6272
>   ------------------------------
>
> *From:* fedora-directory-users-bounces at redhat.com [mailto:
> fedora-directory-users-bounces at redhat.com] *On Behalf Of *bikas gurung
> *Sent:* Tuesday, 18 September 2007 3:50 p.m.
> *To:* fedora-directory-users at redhat.com
> *Subject:* [Fedora-directory-users] help....unable to start fedora server
>
> Hi all,
> I'm certainly in deep s*&#t now. I just updated my file-server with new
> updates and patches and tried to reboot it, but it hung with a kernel
> panic. So I had to shut down the system manually and run 'fsck' by hand
> afterwards. Everything seemed to run well after that. But this evening I
> found that I was not able to connect my PC to the file-server. When I
> checked, it turned out that the 'slapd' daemon hadn't started at all. I
> tried to start the server manually using the scripts (in /rc.d/init.d ) but
> got an error. Here is the error logged in the log file:
>
> Fedora-Directory/1.0.2 B2006.060.1928
>         isec-file:636 (/opt/fedora-ds/slapd-isec-file)
>
> [17/Sep/2007:20:52:06 -0500] - Fedora-Directory/1.0.2 B2006.060.1928 starting up
> [17/Sep/2007:20:52:06 -0500] - Detected Disorderly Shutdown last time
> Directory Server was running, recovering database.
> [17/Sep/2007:20:52:06 -0500] - libdb: Ignoring log file:
> /opt/fedora-ds/slapd-isec-file/db/log.0000000206: magic number 0, not 40988
> [17/Sep/2007:20:52:06 -0500] - libdb: Invalid log file: log.0000000206:
> Invalid argument
> [17/Sep/2007:20:52:06 -0500] - libdb: PANIC: Invalid argument
> [17/Sep/2007:20:52:06 -0500] - libdb: PANIC: DB_RUNRECOVERY: Fatal error,
> run database recovery
> [17/Sep/2007:20:52:06 -0500] - Database Recovery Process FAILED. The
> database is not recoverable. err=-30978: DB_RUNRECOVERY: Fatal error, run
> database recovery
> [17/Sep/2007:20:52:06 -0500] - Please make sure there is enough disk space
> for dbcache (10485760 bytes) and db region files
> [17/Sep/2007:20:52:06 -0500] - start: Failed to init database, err=-30978
> DB_RUNRECOVERY: Fatal error, run database recovery
> [17/Sep/2007:20:52:06 -0500] - Failed to start database plugin ldbm
> database
> [17/Sep/2007:20:52:06 -0500] - WARNING: ldbm instance userRoot already
> exists
> [17/Sep/2007:20:52:06 -0500] - WARNING: ldbm instance NetscapeRoot already
> exists
> [17/Sep/2007:20:52:06 -0500] binder-based resource limits -
> nsLookThroughLimit: parameter error (slapi_reslimit_register() already
> registered)
> [17/Sep/2007:20:52:06 -0500] - start: Resource limit registration failed
> [17/Sep/2007:20:52:06 -0500] - Failed to start database plugin ldbm
> database
> [17/Sep/2007:20:52:06 -0500] - Error: Failed to resolve plugin
> dependencies
> [17/Sep/2007:20:52:06 -0500] - Error: preoperation plugin 7-bit check is
> not started
> [17/Sep/2007:20:52:06 -0500] - Error: accesscontrol plugin ACL Plugin is
> not started
> [17/Sep/2007:20:52:06 -0500] - Error: preoperation plugin ACL preoperation
> is not started
> [17/Sep/2007:20:52:06 -0500] - Error: postoperation plugin Class of
> Service is not started
> [17/Sep/2007:20:52:06 -0500] - Error: preoperation plugin HTTP Client is
> not started
> [17/Sep/2007:20:52:06 -0500] - Error: database plugin ldbm database is not
> started
> [17/Sep/2007:20:52:06 -0500] - Error: object plugin Legacy Replication
> Plugin is not started
> [17/Sep/2007:20:52:06 -0500] - Error: object plugin Multimaster
> Replication Plugin is not started
> [17/Sep/2007:20:52:06 -0500] - Error: postoperation plugin Roles Plugin is
> not started
> [17/Sep/2007:20:52:06 -0500] - Error: object plugin Views is not started
>
> As all the client machines depend upon this server for authentication and
> as weekend is still far away, I'm in big trouble now. I'm quite clueless
> what to do and would really appreciate any kind of help. And no,
> unfortunately I don't have a backup to fall back to .
>
> Thanking you in advance
> bikas
>