[389-users] more MMR issues

Robert Viduya robert+fds at shangri-la.ts.gatech.edu
Tue Nov 10 20:21:00 UTC 2009


I didn't get a response to my previous request for help, and our
situation degenerated (we lost 3 of our 4 masters) to the point where
I felt we had to do a clean rebuild.  We did that late last week into
the weekend and set up 2 masters and assorted hubs and slaves.
We used a clean ldif file to import into the first master, so no
replica IDs were carried over from the previous environment.

We are running directory version 1.2.2 on RHEL5.4, both 64-bit.

Things were running fine until this morning, when one of our masters
started reporting errors.  We found this in its errorlog:

[10/Nov/2009:08:56:27 -0500] NSMMReplicationPlugin -  
multimaster_be_state_change: replica  
ou=people,dc=gted,dc=gatech,dc=edu is going offline; disabling  
replication
[10/Nov/2009:08:59:29 -0500] - WARNING: Import is running with nsslapd- 
db-private-import-mem on; No other process is allowed to access the  
database
[10/Nov/2009:08:59:33 -0500] - ERROR bulk import abandoned
[10/Nov/2009:08:59:34 -0500] - import people: Aborting all import  
threads...
[10/Nov/2009:08:59:42 -0500] - import people: Import threads aborted.
[10/Nov/2009:08:59:43 -0500] - import people: Closing files...
[10/Nov/2009:08:59:43 -0500] - import people: Import failed.
[10/Nov/2009:09:01:51 -0500] NSMMReplicationPlugin -  
replica_replace_ruv_tombstone: failed to update replication update  
vector for replica ou=people,dc=gted,dc=gatech,dc=edu: LDAP error - 1
[10/Nov/2009:09:01:57 -0500] NSMMReplicationPlugin -  
replica_replace_ruv_tombstone: failed to update replication update  
vector for replica ou=people,dc=gted,dc=gatech,dc=edu: LDAP error - 1
[10/Nov/2009:09:02:01 -0500] NSMMReplicationPlugin -  
replica_replace_ruv_tombstone: failed to update replication update  
vector for replica ou=people,dc=gted,dc=gatech,dc=edu: LDAP error - 1
[10/Nov/2009:09:02:21 -0500] NSMMReplicationPlugin -  
replica_replace_ruv_tombstone: failed to update replication update  
vector for replica ou=people,dc=gted,dc=gatech,dc=edu: LDAP error - 1
[10/Nov/2009:09:02:26 -0500] NSMMReplicationPlugin -  
replica_replace_ruv_tombstone: failed to update replication update  
vector for replica ou=people,dc=gted,dc=gatech,dc=edu: LDAP error - 1
[10/Nov/2009:09:02:32 -0500] NSMMReplicationPlugin -  
replica_replace_ruv_tombstone: failed to update replication update  
vector for replica ou=people,dc=gted,dc=gatech,dc=edu: LDAP error - 1


That last line repeated until we brought the server down.  The log
_looks_ like someone/something triggered an import operation, but no
one did, on either master.

The errorlog on the other master shows the following:

[10/Nov/2009:08:39:29 -0500] - repl5_inc_waitfor_async_results timed  
out waiting for responses: 38 46
[10/Nov/2009:08:39:54 -0500] NSMMReplicationPlugin - agmt="cn=people  
rewbell gertrude" (gertrude:636): Warning: unable to receive  
endReplication extended operation response (Bad parameter to an ldap  
routine)
[10/Nov/2009:08:40:04 -0500] NSMMReplicationPlugin - agmt="cn=people  
rewbell gertrude" (gertrude:636): Unable to receive the response for a  
startReplication extended operation to consumer (Bad parameter to an  
ldap routine). Will retry later.
[10/Nov/2009:08:40:08 -0500] NSMMReplicationPlugin - agmt="cn=people  
rewbell gertrude" (gertrude:636): Unable to receive the response for a  
startReplication extended operation to consumer (Bad parameter to an  
ldap routine). Will retry later.
[10/Nov/2009:08:40:14 -0500] NSMMReplicationPlugin - agmt="cn=people  
rewbell gertrude" (gertrude:636): Unable to receive the response for a  
startReplication extended operation to consumer (Bad parameter to an  
ldap routine). Will retry later.
[10/Nov/2009:08:40:38 -0500] NSMMReplicationPlugin - agmt="cn=people  
rewbell gertrude" (gertrude:636): Unable to receive the response for a  
startReplication extended operation to consumer (Bad parameter to an  
ldap routine). Will retry later.
[10/Nov/2009:08:43:05 -0500] NSMMReplicationPlugin - agmt="cn=people  
rewbell gertrude" (gertrude:636): Unable to receive the response for a  
startReplication extended operation to consumer (Bad parameter to an  
ldap routine). Will retry later.
[10/Nov/2009:08:44:50 -0500] - repl5_inc_waitfor_async_results timed  
out waiting for responses: 6 8
[10/Nov/2009:08:47:08 -0500] NSMMReplicationPlugin - agmt="cn=people  
rewbell gertrude" (gertrude:636): Unable to receive the response for a  
startReplication extended operation to consumer (Bad parameter to an  
ldap routine). Will retry later.
[10/Nov/2009:08:47:08 -0500] NSMMReplicationPlugin - agmt="cn=people  
rewbell gertrude" (gertrude:636): Incremental protocol: event  
backoff_timer_expired should not occur in state start_backoff
[10/Nov/2009:08:47:12 -0500] NSMMReplicationPlugin - agmt="cn=people  
rewbell gertrude" (gertrude:636): Unable to receive the response for a  
startReplication extended operation to consumer (Bad parameter to an  
ldap routine). Will retry later.
[10/Nov/2009:08:47:18 -0500] NSMMReplicationPlugin - agmt="cn=people  
rewbell gertrude" (gertrude:636): Incremental update failed and  
requires administrator action
[10/Nov/2009:08:55:01 -0500] - repl5_inc_waitfor_async_results timed  
out waiting for responses: 13 14
[10/Nov/2009:08:55:01 -0500] - repl5_inc_waitfor_async_results timed  
out waiting for responses: 59 81
[10/Nov/2009:08:55:14 -0500] NSMMReplicationPlugin - agmt="cn=people  
rewbell gertrude" (gertrude:636): Warning: unable to receive  
endReplication extended operation response (Bad parameter to an ldap  
routine)
[10/Nov/2009:08:55:24 -0500] NSMMReplicationPlugin - agmt="cn=people  
rewbell gertrude" (gertrude:636): Unable to receive the response for a  
startReplication extended operation to consumer (Bad parameter to an  
ldap routine). Will retry later.
[10/Nov/2009:08:55:28 -0500] NSMMReplicationPlugin - agmt="cn=people  
rewbell gertrude" (gertrude:636): Unable to receive the response for a  
startReplication extended operation to consumer (Bad parameter to an  
ldap routine). Will retry later.
[10/Nov/2009:08:55:34 -0500] NSMMReplicationPlugin - agmt="cn=people  
rewbell gertrude" (gertrude:636): Unable to receive the response for a  
startReplication extended operation to consumer (Bad parameter to an  
ldap routine). Will retry later.
[10/Nov/2009:08:55:46 -0500] NSMMReplicationPlugin - agmt="cn=people  
rewbell gertrude" (gertrude:636): Unable to receive the response for a  
startReplication extended operation to consumer (Bad parameter to an  
ldap routine). Will retry later.
[10/Nov/2009:08:56:10 -0500] NSMMReplicationPlugin - agmt="cn=people  
rewbell gertrude" (gertrude:636): Unable to receive the response for a  
startReplication extended operation to consumer (Bad parameter to an  
ldap routine). Will retry later.
[10/Nov/2009:08:56:58 -0500] NSMMReplicationPlugin - agmt="cn=people  
rewbell gertrude" (gertrude:636): Unable to receive the response for a  
startReplication extended operation to consumer (Bad parameter to an  
ldap routine). Will retry later.
[10/Nov/2009:08:58:34 -0500] NSMMReplicationPlugin - agmt="cn=people  
rewbell gertrude" (gertrude:636): Replication bind with SIMPLE auth  
resumed
[10/Nov/2009:09:01:47 -0500] NSMMReplicationPlugin - agmt="cn=people  
rewbell gertrude" (gertrude:636): Consumer failed to replay change  
(uniqueid 51dccc08-9efe11de-8efe8516-22c1043e, CSN  
4af96f8a000200370000): Operations error. Will retry later.
[10/Nov/2009:09:01:47 -0500] NSMMReplicationPlugin - agmt="cn=people  
rewbell gertrude" (gertrude:636): Consumer failed to replay change  
(uniqueid 5ad5610c-1dd211b2-80b9be51-952a0000, CSN  
4af96f8b000000370000): Operations error. Will retry later.
[10/Nov/2009:09:01:47 -0500] NSMMReplicationPlugin - agmt="cn=people  
rewbell gertrude" (gertrude:636): Consumer failed to replay change  
(uniqueid 213cd58e-cd7b11de-b535d108-950067b1, CSN  
4af96fcf000000370000): Operations error. Will retry later.

Again, that last line repeated until we brought down the errant server.
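
In case it helps anyone compare the two logs, here is a rough sketch of
how the excerpts above could be condensed.  It is not part of the
directory server tooling; the function names (unwrap_entries, summarize)
are made up for illustration, and it assumes every errorlog entry starts
with a bracketed timestamp like the ones shown above.  It rejoins
entries that were wrapped across lines and counts how often each
message occurs:

```python
import re
from collections import Counter

# Assumption: every errorlog entry begins with a timestamp such as
# [10/Nov/2009:08:56:27 -0500]
ENTRY_START = re.compile(
    r"^\[\d{2}/[A-Za-z]{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4}\]"
)


def unwrap_entries(lines):
    """Rejoin log entries that were hard-wrapped across several lines.

    A blank line ends the current entry; text that does not belong to
    an entry (e.g. surrounding prose) is skipped.
    """
    entries = []
    in_entry = False
    for raw in lines:
        line = raw.strip()
        if not line:
            in_entry = False
            continue
        if ENTRY_START.match(line):
            entries.append(line)
            in_entry = True
        elif in_entry:
            entries[-1] += " " + line
    return entries


def summarize(entries):
    """Count entries by message text, ignoring the leading timestamp."""
    counts = Counter()
    for entry in entries:
        message = ENTRY_START.sub("", entry).strip()
        counts[message] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python summarize_errors.py < errors
    for message, count in summarize(unwrap_entries(sys.stdin)).most_common():
        print(count, message)
```

Because the counts are keyed on the full message text, entries that
differ only in a uniqueid or CSN are still counted separately.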

We've seen this behavior a few times since upgrading.  One of our
masters somehow thinks it's supposed to do an import and trashes its
copy of the data.  No person had triggered an import or a
supplier->consumer initialization.  Are there conditions where the
directory server itself would trigger such an operation autonomously?



