[389-users] Multimaster replication out of sync

Thu Dec 17 23:59:08 UTC 2009

Mitja Mihelič wrote:
>
>
> On 12/12/2009 12:06 AM, Rich Megginson wrote:
>> Mitja Mihelič wrote:
>>>
>>>
>>> On 12/07/2009 05:18 PM, Rich Megginson wrote:
>>>> Mitja Mihelič wrote:
>>>>> Hi!
>>>>>
>>>>> We have two instances of the DS in a multimaster replication setup.
>>>>> We had to restore the database of one of the servers from backup.
>>>>> While the second master was down, the first was receiving updates.
>>>>> After we fired up the restored master, it started receiving updates as
>>>>> soon as a change occurred on the first master (i.e. after 15 minutes).
>>>>> After the sync finished, we noticed they weren't identical.
>>>>> Clicking "Send updates now" from the replication agreement does not help.
>>>>>
>>>>> Is there a way to get them synced up again, other than reinitializing
>>>>> the second/restored master?
>>>> How long was the server down? How old was the backup it was restored from?
>>> The server was not down long, but the backup was about 10 hours old.
>>> This was a filesystem-level backup made by ufsdump, not a "regular" DS
>>> backup.
>>> When we restored the database file from the dump, the server booted OK.
>>>
>>> Then we ran a little test:
>>> - made another ufsdump of the second master
>>> - shut down the server
>>> - let the primary master update for an hour
>>> - restored the second master's database from the dump
>>> - started the second master
>>> - let them do their replication magic
>>> - isolated both servers (i.e. no updates)
>>> - compared the LDIF dumps
>>> Again, they were not the same.
>>>
>>> We probably should have used the built-in backup functionality, right?
>> Yes, although I'm not sure what would be causing the problems you see.
>>
>> In general, when the database state changes, you have to reinitialize 
>> replication.
> We tried the built-in backup:
> /usr/lib/dirsrv/serverReplica/db2bak /var/lib/dirsrv/serverReplica/bak/`date +%Y_%m_%d_%H_%M_%S`
>
> Executed the same test procedure as described above.
>
> There are still entries on the primary server that do not get replayed on
> the secondary.
>
> The primary master SERVER1 logs the following error (repeated every 5
> minutes) whenever a record that is missing on the secondary is updated on
> the primary:
> [16/Dec/2009:10:26:02 +0100] NSMMReplicationPlugin - agmt="cn=MM to SERVER2" (SERVER2:389): Consumer failed to replay change (uniqueid 25ab6e01-1dd211b2-bdbbda0a-92130000, CSN 4b28a7ac0000000b0000): No such object. Skipping.
>
> My reasoning would be: if the entry does not exist on the consumer,
> create it. But I guess that is not how the mechanism works.
> I'm still scratching my head about this one...
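To see which entries the consumer is refusing, the uniqueid and CSN can be
pulled out of the supplier's errors log. The following is only a rough
sketch: the regular expression is modelled on the single message quoted
above (unwrapped onto one line, as it would appear in the log file), the
function names are made up for illustration, and the CSN layout used in
decode_csn (8 hex digits of timestamp, then 4 each for sequence number,
replica ID and sub-sequence number) is an assumption that should be checked
against your server version.

```python
import re
from datetime import datetime, timezone

# Pattern modelled on the one log message quoted in this thread.
REPLAY_FAILURE = re.compile(
    r'agmt="(?P<agmt>[^"]+)".*?'
    r'Consumer failed to replay change '
    r'\(uniqueid (?P<uniqueid>[0-9a-fA-F-]+), CSN (?P<csn>[0-9a-fA-F]+)\): '
    r'(?P<reason>.*?)\. Skipping'
)


def find_replay_failures(lines):
    """Return one dict (agmt, uniqueid, csn, reason) per matching log line."""
    failures = []
    for line in lines:
        match = REPLAY_FAILURE.search(line)
        if match:
            failures.append(match.groupdict())
    return failures


def decode_csn(csn):
    """Split a 20-hex-digit CSN into its parts.

    ASSUMPTION: timestamp(8) + sequence(4) + replica ID(4) + sub-sequence(4).
    """
    if len(csn) != 20:
        raise ValueError("expected a 20-character CSN, got %r" % csn)
    return {
        "time": datetime.fromtimestamp(int(csn[0:8], 16), tz=timezone.utc),
        "seq": int(csn[8:12], 16),
        "replica_id": int(csn[12:16], 16),
        "subseq": int(csn[16:20], 16),
    }
```

Feeding it the lines of the errors log (for example the result of
open(path).readlines()) gives a list of the changes that were skipped; the
uniqueid values can then be looked up on the supplier through the
nsUniqueId attribute.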
In general, if you restore or otherwise change a database, that server 
will have to be reinitialized in order for replication to work.
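
If you want to check exactly which entries differ between the two masters
(as in the "compared the LDIF dumps" step of your test), a small script is
easier to read than a raw diff of the files. This is a minimal sketch, not a
full LDIF parser: it assumes plain LDIF exports, normalizes DNs naively by
lowercasing them, and compares attribute lines as text. The function names
are made up for illustration. Exports that include replication metadata may
differ for reasons unrelated to missing entries.

```python
import base64


def _dn_of(record):
    """Return the lowercased DN of a record, or None if it has no dn line."""
    first = record[0]
    if first.lower().startswith("dn::"):
        # Base64-encoded DN
        return base64.b64decode(first[4:].strip()).decode("utf-8").lower()
    if first.lower().startswith("dn:"):
        return first[3:].strip().lower()
    return None


def parse_ldif(text):
    """Return {dn: set of 'attr: value' lines} for a plain LDIF export."""
    entries = {}
    record = []
    in_comment = False
    # The extra empty string flushes the last record.
    for raw in text.splitlines() + [""]:
        if raw.startswith(" "):
            # Continuation line: unfold it unless it belongs to a comment.
            if not in_comment and record:
                record[-1] += raw[1:]
            continue
        in_comment = raw.startswith("#")
        if in_comment:
            continue
        if raw.strip():
            record.append(raw)
            continue
        # Blank line: end of the current record.
        if record and record[0].lower().startswith("version:"):
            record = record[1:]
        if record:
            dn = _dn_of(record)
            if dn is not None:
                entries[dn] = set(record[1:])
        record = []
    return entries


def compare_ldif(a_text, b_text):
    """Return (DNs only in a, DNs only in b, DNs whose attributes differ)."""
    a, b = parse_ldif(a_text), parse_ldif(b_text)
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    differ = sorted(dn for dn in set(a) & set(b) if a[dn] != b[dn])
    return only_a, only_b, differ
```

Call compare_ldif() with the contents of the two export files; the first
list shows entries that exist only on the first master, which are the ones
that will produce "No such object" on the other side when they are modified.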
>
> Regards,
> Mitja