[389-users] more MMR issues
Rich Megginson
rmeggins at redhat.com
Tue Nov 10 20:25:59 UTC 2009
Robert Viduya wrote:
> I didn't get a response to my previous request for help and our
> situation degenerated (we lost 3 of our 4 masters) to the point where
> I felt we had to do a clean rebuild. We did that late last week into
> the weekend and set up two masters and assorted hubs and slaves.
> We used a clean ldif file to import into the first master, so no
> previous replica IDs were carried over from the previous environment.
>
> We are running directory version 1.2.2 on RHEL5.4, both 64-bit.
>
> Things were running fine until this morning, when one of our masters
> started reporting errors. We found this in its error log:
>
> [10/Nov/2009:08:56:27 -0500] NSMMReplicationPlugin -
> multimaster_be_state_change: replica
> ou=people,dc=gted,dc=gatech,dc=edu is going offline; disabling
> replication
> [10/Nov/2009:08:59:29 -0500] - WARNING: Import is running with
> nsslapd-db-private-import-mem on; No other process is allowed to
> access the database
> [10/Nov/2009:08:59:33 -0500] - ERROR bulk import abandoned
> [10/Nov/2009:08:59:34 -0500] - import people: Aborting all import
> threads...
> [10/Nov/2009:08:59:42 -0500] - import people: Import threads aborted.
> [10/Nov/2009:08:59:43 -0500] - import people: Closing files...
> [10/Nov/2009:08:59:43 -0500] - import people: Import failed.
> [10/Nov/2009:09:01:51 -0500] NSMMReplicationPlugin -
> replica_replace_ruv_tombstone: failed to update replication update
> vector for replica ou=people,dc=gted,dc=gatech,dc=edu: LDAP error - 1
> [10/Nov/2009:09:01:57 -0500] NSMMReplicationPlugin -
> replica_replace_ruv_tombstone: failed to update replication update
> vector for replica ou=people,dc=gted,dc=gatech,dc=edu: LDAP error - 1
> [10/Nov/2009:09:02:01 -0500] NSMMReplicationPlugin -
> replica_replace_ruv_tombstone: failed to update replication update
> vector for replica ou=people,dc=gted,dc=gatech,dc=edu: LDAP error - 1
> [10/Nov/2009:09:02:21 -0500] NSMMReplicationPlugin -
> replica_replace_ruv_tombstone: failed to update replication update
> vector for replica ou=people,dc=gted,dc=gatech,dc=edu: LDAP error - 1
> [10/Nov/2009:09:02:26 -0500] NSMMReplicationPlugin -
> replica_replace_ruv_tombstone: failed to update replication update
> vector for replica ou=people,dc=gted,dc=gatech,dc=edu: LDAP error - 1
> [10/Nov/2009:09:02:32 -0500] NSMMReplicationPlugin -
> replica_replace_ruv_tombstone: failed to update replication update
> vector for replica ou=people,dc=gted,dc=gatech,dc=edu: LDAP error - 1
>
>
> That last line repeated until we brought the server down. The log
> _looks_ like someone or something triggered an import operation, but
> no one did, on either master.
>
> The error log on the other master shows the following:
>
> [10/Nov/2009:08:39:29 -0500] - repl5_inc_waitfor_async_results timed
> out waiting for responses: 38 46
> [10/Nov/2009:08:39:54 -0500] NSMMReplicationPlugin - agmt="cn=people
> rewbell gertrude" (gertrude:636): Warning: unable to receive
> endReplication extended operation response (Bad parameter to an ldap
> routine)
> [10/Nov/2009:08:40:04 -0500] NSMMReplicationPlugin - agmt="cn=people
> rewbell gertrude" (gertrude:636): Unable to receive the response for a
> startReplication extended operation to consumer (Bad parameter to an
> ldap routine). Will retry later.
> [10/Nov/2009:08:40:08 -0500] NSMMReplicationPlugin - agmt="cn=people
> rewbell gertrude" (gertrude:636): Unable to receive the response for a
> startReplication extended operation to consumer (Bad parameter to an
> ldap routine). Will retry later.
> [10/Nov/2009:08:40:14 -0500] NSMMReplicationPlugin - agmt="cn=people
> rewbell gertrude" (gertrude:636): Unable to receive the response for a
> startReplication extended operation to consumer (Bad parameter to an
> ldap routine). Will retry later.
> [10/Nov/2009:08:40:38 -0500] NSMMReplicationPlugin - agmt="cn=people
> rewbell gertrude" (gertrude:636): Unable to receive the response for a
> startReplication extended operation to consumer (Bad parameter to an
> ldap routine). Will retry later.
> [10/Nov/2009:08:43:05 -0500] NSMMReplicationPlugin - agmt="cn=people
> rewbell gertrude" (gertrude:636): Unable to receive the response for a
> startReplication extended operation to consumer (Bad parameter to an
> ldap routine). Will retry later.
> [10/Nov/2009:08:44:50 -0500] - repl5_inc_waitfor_async_results timed
> out waiting for responses: 6 8
> [10/Nov/2009:08:47:08 -0500] NSMMReplicationPlugin - agmt="cn=people
> rewbell gertrude" (gertrude:636): Unable to receive the response for a
> startReplication extended operation to consumer (Bad parameter to an
> ldap routine). Will retry later.
> [10/Nov/2009:08:47:08 -0500] NSMMReplicationPlugin - agmt="cn=people
> rewbell gertrude" (gertrude:636): Incremental protocol: event
> backoff_timer_expired should not occur in state start_backoff
> [10/Nov/2009:08:47:12 -0500] NSMMReplicationPlugin - agmt="cn=people
> rewbell gertrude" (gertrude:636): Unable to receive the response for a
> startReplication extended operation to consumer (Bad parameter to an
> ldap routine). Will retry later.
> [10/Nov/2009:08:47:18 -0500] NSMMReplicationPlugin - agmt="cn=people
> rewbell gertrude" (gertrude:636): Incremental update failed and
> requires administrator action
> [10/Nov/2009:08:55:01 -0500] - repl5_inc_waitfor_async_results timed
> out waiting for responses: 13 14
> [10/Nov/2009:08:55:01 -0500] - repl5_inc_waitfor_async_results timed
> out waiting for responses: 59 81
> [10/Nov/2009:08:55:14 -0500] NSMMReplicationPlugin - agmt="cn=people
> rewbell gertrude" (gertrude:636): Warning: unable to receive
> endReplication extended operation response (Bad parameter to an ldap
> routine)
> [10/Nov/2009:08:55:24 -0500] NSMMReplicationPlugin - agmt="cn=people
> rewbell gertrude" (gertrude:636): Unable to receive the response for a
> startReplication extended operation to consumer (Bad parameter to an
> ldap routine). Will retry later.
> [10/Nov/2009:08:55:28 -0500] NSMMReplicationPlugin - agmt="cn=people
> rewbell gertrude" (gertrude:636): Unable to receive the response for a
> startReplication extended operation to consumer (Bad parameter to an
> ldap routine). Will retry later.
> [10/Nov/2009:08:55:34 -0500] NSMMReplicationPlugin - agmt="cn=people
> rewbell gertrude" (gertrude:636): Unable to receive the response for a
> startReplication extended operation to consumer (Bad parameter to an
> ldap routine). Will retry later.
> [10/Nov/2009:08:55:46 -0500] NSMMReplicationPlugin - agmt="cn=people
> rewbell gertrude" (gertrude:636): Unable to receive the response for a
> startReplication extended operation to consumer (Bad parameter to an
> ldap routine). Will retry later.
> [10/Nov/2009:08:56:10 -0500] NSMMReplicationPlugin - agmt="cn=people
> rewbell gertrude" (gertrude:636): Unable to receive the response for a
> startReplication extended operation to consumer (Bad parameter to an
> ldap routine). Will retry later.
> [10/Nov/2009:08:56:58 -0500] NSMMReplicationPlugin - agmt="cn=people
> rewbell gertrude" (gertrude:636): Unable to receive the response for a
> startReplication extended operation to consumer (Bad parameter to an
> ldap routine). Will retry later.
> [10/Nov/2009:08:58:34 -0500] NSMMReplicationPlugin - agmt="cn=people
> rewbell gertrude" (gertrude:636): Replication bind with SIMPLE auth
> resumed
> [10/Nov/2009:09:01:47 -0500] NSMMReplicationPlugin - agmt="cn=people
> rewbell gertrude" (gertrude:636): Consumer failed to replay change
> (uniqueid 51dccc08-9efe11de-8efe8516-22c1043e, CSN
> 4af96f8a000200370000): Operations error. Will retry later.
> [10/Nov/2009:09:01:47 -0500] NSMMReplicationPlugin - agmt="cn=people
> rewbell gertrude" (gertrude:636): Consumer failed to replay change
> (uniqueid 5ad5610c-1dd211b2-80b9be51-952a0000, CSN
> 4af96f8b000000370000): Operations error. Will retry later.
> [10/Nov/2009:09:01:47 -0500] NSMMReplicationPlugin - agmt="cn=people
> rewbell gertrude" (gertrude:636): Consumer failed to replay change
> (uniqueid 213cd58e-cd7b11de-b535d108-950067b1, CSN
> 4af96fcf000000370000): Operations error. Will retry later.
>
> Again, that last line repeated until we brought down the errant server.
>
> We've seen this behavior a few times since upgrading. One of our
> masters somehow thinks it's supposed to do an import and trashes its
> copy of the data. No person had triggered an import or a
> supplier->consumer initialization. Are there conditions where the
> directory server itself would trigger such an operation autonomously?
No. Check the access log to see what operations were submitted to the
directory server at or around [10/Nov/2009:08:56:27 -0500], which is
when the error log shows the replica going offline.
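
If it helps, here is a minimal sketch of one way to pull the access log
entries around that time. It assumes each access log line starts with a
bracketed timestamp in the same format as the error log excerpts above.
The function names and the two-minute window are hypothetical choices
for illustration, not part of any 389 tool.

```python
import re
from datetime import datetime, timedelta

# Timestamp layout as it appears in the log excerpts,
# e.g. [10/Nov/2009:08:56:27 -0500]
TS_FORMAT = "%d/%b/%Y:%H:%M:%S %z"
TS_PATTERN = re.compile(r"^\[([^\]]+)\]")


def parse_ts(text):
    """Parse a log timestamp (without brackets) into an aware datetime."""
    return datetime.strptime(text, TS_FORMAT)


def lines_near(lines, center, window_seconds=120):
    """Return the lines whose leading timestamp is within
    window_seconds of center (a string in TS_FORMAT).
    Lines without a parseable leading timestamp are skipped."""
    target = parse_ts(center)
    delta = timedelta(seconds=window_seconds)
    selected = []
    for line in lines:
        match = TS_PATTERN.match(line)
        if not match:
            continue
        try:
            stamp = parse_ts(match.group(1))
        except ValueError:
            continue
        if abs(stamp - target) <= delta:
            selected.append(line.rstrip("\n"))
    return selected
```

To use it, open the access log of the affected master (the path depends
on your instance) and pass the file object as `lines`, with
"10/Nov/2009:08:56:27 -0500" as `center`. Anything that looks like an
extended operation or an unexpected bind in that window is worth a
closer look.
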
Are your servers in time sync? Is "cn=people rewbell gertrude" the
agreement that sends updates to the master that is having the
spontaneous import problem?
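
One rough way to sanity-check the clocks is to decode the CSNs from the
"Consumer failed to replay change" messages and compare the embedded
time with the log timestamps. This is only a sketch, and it assumes the
usual 20-hex-digit CSN layout: 8 digits of seconds since the epoch, 4 of
sequence number, 4 of replica ID, and 4 of sub-sequence number. The
function name `decode_csn` is hypothetical.

```python
from datetime import datetime, timezone


def decode_csn(csn):
    """Split a 20-hex-digit CSN into its fields.

    Assumed layout: timestamp (8), sequence number (4),
    replica ID (4), sub-sequence number (4).
    """
    if len(csn) != 20:
        raise ValueError("expected a 20-character CSN, got %r" % csn)
    return {
        "time": datetime.fromtimestamp(int(csn[0:8], 16), tz=timezone.utc),
        "seq": int(csn[8:12], 16),
        "replica_id": int(csn[12:16], 16),
        "subseq": int(csn[16:20], 16),
    }


if __name__ == "__main__":
    # CSNs copied from the error log excerpt quoted above.
    for csn in ("4af96f8a000200370000",
                "4af96f8b000000370000",
                "4af96fcf000000370000"):
        fields = decode_csn(csn)
        print(csn, fields["time"].isoformat(), fields["replica_id"])
```

If the decoded times are far from the wall-clock times in the logs on
either master, that points toward a clock problem. The replica ID field
also tells you which master originated the changes that failed to
replay.
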