[389-users] more MMR issues

Rich Megginson rmeggins at redhat.com
Tue Nov 10 20:25:59 UTC 2009


Robert Viduya wrote:
> I didn't get a response to my previous request for help and our 
> situation degenerated (we lost 3 of our 4 masters) to the point where 
> I felt we had to do a clean rebuild.  We did that late last week into 
> the weekend and had set up 2 masters and assorted hubs and slaves.  
> We used a clean ldif file to import into the first master, so no 
> previous replica IDs were carried over from the previous environment.
>
> We are running directory version 1.2.2 on RHEL5.4, both 64-bit.
>
> Things were running fine until this morning, when one of our masters 
> started reporting errors.  We found this in its errorlog:
>
> [10/Nov/2009:08:56:27 -0500] NSMMReplicationPlugin - 
> multimaster_be_state_change: replica 
> ou=people,dc=gted,dc=gatech,dc=edu is going offline; disabling 
> replication
> [10/Nov/2009:08:59:29 -0500] - WARNING: Import is running with 
> nsslapd-db-private-import-mem on; No other process is allowed to 
> access the database
> [10/Nov/2009:08:59:33 -0500] - ERROR bulk import abandoned
> [10/Nov/2009:08:59:34 -0500] - import people: Aborting all import 
> threads...
> [10/Nov/2009:08:59:42 -0500] - import people: Import threads aborted.
> [10/Nov/2009:08:59:43 -0500] - import people: Closing files...
> [10/Nov/2009:08:59:43 -0500] - import people: Import failed.
> [10/Nov/2009:09:01:51 -0500] NSMMReplicationPlugin - 
> replica_replace_ruv_tombstone: failed to update replication update 
> vector for replica ou=people,dc=gted,dc=gatech,dc=edu: LDAP error - 1
> [10/Nov/2009:09:01:57 -0500] NSMMReplicationPlugin - 
> replica_replace_ruv_tombstone: failed to update replication update 
> vector for replica ou=people,dc=gted,dc=gatech,dc=edu: LDAP error - 1
> [10/Nov/2009:09:02:01 -0500] NSMMReplicationPlugin - 
> replica_replace_ruv_tombstone: failed to update replication update 
> vector for replica ou=people,dc=gted,dc=gatech,dc=edu: LDAP error - 1
> [10/Nov/2009:09:02:21 -0500] NSMMReplicationPlugin - 
> replica_replace_ruv_tombstone: failed to update replication update 
> vector for replica ou=people,dc=gted,dc=gatech,dc=edu: LDAP error - 1
> [10/Nov/2009:09:02:26 -0500] NSMMReplicationPlugin - 
> replica_replace_ruv_tombstone: failed to update replication update 
> vector for replica ou=people,dc=gted,dc=gatech,dc=edu: LDAP error - 1
> [10/Nov/2009:09:02:32 -0500] NSMMReplicationPlugin - 
> replica_replace_ruv_tombstone: failed to update replication update 
> vector for replica ou=people,dc=gted,dc=gatech,dc=edu: LDAP error - 1
>
>
> That last line repeated until we brought the server down.  The log 
> _looks_ like someone/something triggered an import operation, but 
> no-one did, on either master.
>
> The errorlog on the other master shows the following:
>
> [10/Nov/2009:08:39:29 -0500] - repl5_inc_waitfor_async_results timed 
> out waiting for responses: 38 46
> [10/Nov/2009:08:39:54 -0500] NSMMReplicationPlugin - agmt="cn=people 
> rewbell gertrude" (gertrude:636): Warning: unable to receive 
> endReplication extended operation response (Bad parameter to an ldap 
> routine)
> [10/Nov/2009:08:40:04 -0500] NSMMReplicationPlugin - agmt="cn=people 
> rewbell gertrude" (gertrude:636): Unable to receive the response for a 
> startReplication extended operation to consumer (Bad parameter to an 
> ldap routine). Will retry later.
> [10/Nov/2009:08:40:08 -0500] NSMMReplicationPlugin - agmt="cn=people 
> rewbell gertrude" (gertrude:636): Unable to receive the response for a 
> startReplication extended operation to consumer (Bad parameter to an 
> ldap routine). Will retry later.
> [10/Nov/2009:08:40:14 -0500] NSMMReplicationPlugin - agmt="cn=people 
> rewbell gertrude" (gertrude:636): Unable to receive the response for a 
> startReplication extended operation to consumer (Bad parameter to an 
> ldap routine). Will retry later.
> [10/Nov/2009:08:40:38 -0500] NSMMReplicationPlugin - agmt="cn=people 
> rewbell gertrude" (gertrude:636): Unable to receive the response for a 
> startReplication extended operation to consumer (Bad parameter to an 
> ldap routine). Will retry later.
> [10/Nov/2009:08:43:05 -0500] NSMMReplicationPlugin - agmt="cn=people 
> rewbell gertrude" (gertrude:636): Unable to receive the response for a 
> startReplication extended operation to consumer (Bad parameter to an 
> ldap routine). Will retry later.
> [10/Nov/2009:08:44:50 -0500] - repl5_inc_waitfor_async_results timed 
> out waiting for responses: 6 8
> [10/Nov/2009:08:47:08 -0500] NSMMReplicationPlugin - agmt="cn=people 
> rewbell gertrude" (gertrude:636): Unable to receive the response for a 
> startReplication extended operation to consumer (Bad parameter to an 
> ldap routine). Will retry later.
> [10/Nov/2009:08:47:08 -0500] NSMMReplicationPlugin - agmt="cn=people 
> rewbell gertrude" (gertrude:636): Incremental protocol: event 
> backoff_timer_expired should not occur in state start_backoff
> [10/Nov/2009:08:47:12 -0500] NSMMReplicationPlugin - agmt="cn=people 
> rewbell gertrude" (gertrude:636): Unable to receive the response for a 
> startReplication extended operation to consumer (Bad parameter to an 
> ldap routine). Will retry later.
> [10/Nov/2009:08:47:18 -0500] NSMMReplicationPlugin - agmt="cn=people 
> rewbell gertrude" (gertrude:636): Incremental update failed and 
> requires administrator action
> [10/Nov/2009:08:55:01 -0500] - repl5_inc_waitfor_async_results timed 
> out waiting for responses: 13 14
> [10/Nov/2009:08:55:01 -0500] - repl5_inc_waitfor_async_results timed 
> out waiting for responses: 59 81
> [10/Nov/2009:08:55:14 -0500] NSMMReplicationPlugin - agmt="cn=people 
> rewbell gertrude" (gertrude:636): Warning: unable to receive 
> endReplication extended operation response (Bad parameter to an ldap 
> routine)
> [10/Nov/2009:08:55:24 -0500] NSMMReplicationPlugin - agmt="cn=people 
> rewbell gertrude" (gertrude:636): Unable to receive the response for a 
> startReplication extended operation to consumer (Bad parameter to an 
> ldap routine). Will retry later.
> [10/Nov/2009:08:55:28 -0500] NSMMReplicationPlugin - agmt="cn=people 
> rewbell gertrude" (gertrude:636): Unable to receive the response for a 
> startReplication extended operation to consumer (Bad parameter to an 
> ldap routine). Will retry later.
> [10/Nov/2009:08:55:34 -0500] NSMMReplicationPlugin - agmt="cn=people 
> rewbell gertrude" (gertrude:636): Unable to receive the response for a 
> startReplication extended operation to consumer (Bad parameter to an 
> ldap routine). Will retry later.
> [10/Nov/2009:08:55:46 -0500] NSMMReplicationPlugin - agmt="cn=people 
> rewbell gertrude" (gertrude:636): Unable to receive the response for a 
> startReplication extended operation to consumer (Bad parameter to an 
> ldap routine). Will retry later.
> [10/Nov/2009:08:56:10 -0500] NSMMReplicationPlugin - agmt="cn=people 
> rewbell gertrude" (gertrude:636): Unable to receive the response for a 
> startReplication extended operation to consumer (Bad parameter to an 
> ldap routine). Will retry later.
> [10/Nov/2009:08:56:58 -0500] NSMMReplicationPlugin - agmt="cn=people 
> rewbell gertrude" (gertrude:636): Unable to receive the response for a 
> startReplication extended operation to consumer (Bad parameter to an 
> ldap routine). Will retry later.
> [10/Nov/2009:08:58:34 -0500] NSMMReplicationPlugin - agmt="cn=people 
> rewbell gertrude" (gertrude:636): Replication bind with SIMPLE auth 
> resumed
> [10/Nov/2009:09:01:47 -0500] NSMMReplicationPlugin - agmt="cn=people 
> rewbell gertrude" (gertrude:636): Consumer failed to replay change 
> (uniqueid 51dccc08-9efe11de-8efe8516-22c1043e, CSN 
> 4af96f8a000200370000): Operations error. Will retry later.
> [10/Nov/2009:09:01:47 -0500] NSMMReplicationPlugin - agmt="cn=people 
> rewbell gertrude" (gertrude:636): Consumer failed to replay change 
> (uniqueid 5ad5610c-1dd211b2-80b9be51-952a0000, CSN 
> 4af96f8b000000370000): Operations error. Will retry later.
> [10/Nov/2009:09:01:47 -0500] NSMMReplicationPlugin - agmt="cn=people 
> rewbell gertrude" (gertrude:636): Consumer failed to replay change 
> (uniqueid 213cd58e-cd7b11de-b535d108-950067b1, CSN 
> 4af96fcf000000370000): Operations error. Will retry later.
>
> Again, that last line repeated until we brought down the errant server.
>
> We've seen this behavior a few times since upgrading.  One of our 
> masters somehow thinks it's supposed to do an import and trashes its 
> copy of the data.  No person had triggered an import or a 
> supplier->consumer initialization.  Are there conditions where the 
> directory server itself would trigger such an operation autonomously?
No.  Check the access log to see what operations were submitted to the 
directory server at or around [10/Nov/2009:08:56:27 -0500].
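
A rough sketch of that access log check in Python, in case it helps.  It 
assumes each access log line starts with a bracketed timestamp in the same 
format as the error log lines quoted above.  The function names, the 
5-minute window, and the log path given on the command line are 
illustrative choices, not anything that ships with the server.

```python
import re
import sys
from datetime import datetime, timedelta

# Leading "[10/Nov/2009:08:56:27 -0500]" stamp; an optional fractional-seconds
# part is tolerated and ignored (assumption: some releases log sub-seconds).
TS_RE = re.compile(
    r"^\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2})(?:\.\d+)? ([+-]\d{4})\]"
)
# Note: %b (month abbreviation) is locale dependent; this assumes English names.
TS_FMT = "%d/%b/%Y:%H:%M:%S %z"


def parse_ts(line):
    """Return the timezone-aware timestamp at the start of a log line, or None."""
    m = TS_RE.match(line)
    if not m:
        return None
    return datetime.strptime(m.group(1) + " " + m.group(2), TS_FMT)


def lines_near(lines, target, window_minutes=5):
    """Return the lines whose timestamp is within the window around target."""
    center = datetime.strptime(target, TS_FMT)
    delta = timedelta(minutes=window_minutes)
    hits = []
    for line in lines:
        ts = parse_ts(line)
        if ts is not None and abs(ts - center) <= delta:
            hits.append(line.rstrip("\n"))
    return hits


if __name__ == "__main__":
    # Usage: python near.py <path to access log>
    with open(sys.argv[1]) as fh:
        for hit in lines_near(fh, "10/Nov/2009:08:56:27 -0500"):
            print(hit)
```

Because the timestamps are parsed with their UTC offsets, the comparison 
still works if the two masters log in different time zones.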

Are your servers in time sync?  Is "cn=people rewbell gertrude" the 
agreement that sends updates to the master that is having the 
spontaneous import problem?
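
On the time sync question, one quick cross-check is to decode the CSNs 
from the "Consumer failed to replay change" lines, since a CSN begins with 
the clock value of the master that generated the change.  A minimal sketch, 
assuming the usual 20-hex-digit CSN layout (8 digits of seconds since the 
epoch, 4 of sequence number, 4 of replica ID, 4 of subsequence number).  
decode_csn and the Csn tuple are names made up for this example.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class Csn(NamedTuple):
    when: datetime      # time the change was generated, in UTC
    seq: int            # sequence number within that second
    replica_id: int     # ID of the originating master
    subseq: int         # subsequence number


def decode_csn(csn: str) -> Csn:
    """Split a 20-hex-digit CSN string into its four fields."""
    if len(csn) != 20:
        raise ValueError("expected 20 hex digits, got %d" % len(csn))
    # int(..., 16) raises ValueError if a field is not valid hex.
    seconds = int(csn[0:8], 16)
    return Csn(
        when=datetime.fromtimestamp(seconds, tz=timezone.utc),
        seq=int(csn[8:12], 16),
        replica_id=int(csn[12:16], 16),
        subseq=int(csn[16:20], 16),
    )


if __name__ == "__main__":
    # CSNs copied from the error log quoted in this thread.
    for csn in (
        "4af96f8a000200370000",
        "4af96f8b000000370000",
        "4af96fcf000000370000",
    ):
        d = decode_csn(csn)
        print(csn, d.when.isoformat(), "rid=%d seq=%d" % (d.replica_id, d.seq))
```

The first CSN decodes to 2009-11-10 13:50:02 UTC (08:50:02 -0500) from 
replica ID 55.  If decoded times are far from the log timestamps around 
them, clock skew between the masters is worth a closer look.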
