[Freeipa-users] Loss of initial master in multi master setup

Neal Harrington | i-Neda Ltd nharrington at i-neda.com
Wed Dec 7 13:10:53 UTC 2016


> From: Rob Crittenden 
> Martin Babinsky wrote:
> > On 12/01/2016 01:28 PM, Neal Harrington | i-Neda Ltd wrote:
> >> Hi IPA Gurus,
> >>
> >>
> >> I had a three-site multi-master IPA replication setup (one office and
> >> two datacentres) with two IPA servers at each site. Each server was
> >> replicating successfully to 3 other servers (the other local site
> >> server and one server at each of the two remote sites). Everything is
> >> running on the default packages from CentOS 7.2 and each server is a
> >> full replica (ipa-replica-install
> >> /var/lib/ipa/replica-info-id-myserver.fqdn.com.gpg  --setup-ca
> >> --setup-dns --mkhomedir --forwarder 8.8.8.8)
> >>
> >>
> >> Everything was ticking over nicely until we were told that the
> >> office site was moving at short notice.
> >>
> >>
> >> I successfully created IPA servers at the new site, set up replication
> >> again between the new office and the two datacentres that were to
> >> remain online, and tested it; everything worked as expected.
> >> Unfortunately, in the rush I did not have time to properly retire the
> >> IPA servers in the old office.
> >>
> >>
> >> The problem this has caused is that I only ever created users on one
> >> of the IPA servers in the original office, so only those servers
> >> have a DNA range and I am now unable to create new users on the active
> >> servers.
> >> The original office servers are still part of the IPA replication
> >> topology and powered on, but offline. Is there a potential split brain?
> >>
> >>
> >> I now have two things I would like to know before proceeding:
> >>
> >>   * Is the best fix here to force-remove the original IPA servers and
> >>     manually add a new DNA range significantly different from the
> >>     original to avoid overlaps?
> >>   * Is there anything else I should check? I can't see any issues,
> >>     but then I did not notice the DNA range problem until I tried to
> >>     create a user.
> >>
> >> Any pointers greatly appreciated.
> >>
> >>
> >> Thanks,
> >>
> >> Neal.
> >>
> >
> > Hi Neal,
> >
> > If you have already disconnected/decommissioned the old masters, then I
> > think the best you can do is your first option, i.e. reset the DNA ranges
> > on the replicas to new values while avoiding overlap with the old ranges.
> >
> > We have an upstream document[1] describing the procedure. Hope it
> > helps.
> >
> > Also make sure that you have migrated the CA renewal and CRL master
> > responsibilities to the new replicas; otherwise you may run into problems
> > with expiring certificates, which are really hard to solve. See the
> > following guide for details [2].
> >
> > [1] http://www.freeipa.org/page/V3/Recover_DNA_Ranges
> > [2] http://www.freeipa.org/page/Howto/Promote_CA_to_Renewal_and_CRL_Master
> >
> 
> You may want to look at this too: http://blog-rcritten.rhcloud.com/?p=50
> 
> rob

Hi Rob & Martin,

Thanks for the pointers. I am now able to create new users on different servers; however, everything to do with replication seems to be failing.

I have changed my replication topology from a mesh to a long chain and ran "ipa-replica-manage -v re-initialize --from <server>", and the same with ipa-csreplica-manage, along the chain. The re-initialization succeeds (any password changes, user creations etc. I have made at the start of the chain are pulled through), but replication fails immediately afterwards. I was hoping that re-initializing the chain like this would flush out any "bad" entries; probably wishful thinking.

"ipa-replica-manage -v list" only shows the servers in the chain. "ipa-replica-manage list-ruv" did show the two original servers I had lost connection to; I removed those, which successfully removed them from all servers, so that part of replication seems to be working. However, an LDAP search still shows those old masters, as well as one previously retired server, blue-auth01, with two different replica IDs. Will I need to delete these manually? (Example search and output below.)

Apart from manually deleting the dead servers from LDAP, what else should I do to get replication working again? I'm waiting for the CentOS 7.3 release so that I can upgrade to IPA 4.3, as I've seen a few posts about improved handling of replication in that version. In the meantime, the errors log (extract below) indicates that I need to re-initialize, which I've done several times without any improvement.

Thanks in advance,
Neal.

[root@office-auth04 ~]# ldapsearch -h $(hostname -f) -D "cn=directory manager" -W -b "o=ipaca" "(&(objectclass=nstombstone)(nsUniqueId=ffffffff-ffffffff-ffffffff-ffffffff))" nscpentrywsi
Enter LDAP Password: 
# extended LDIF
#
# LDAPv3
# base <o=ipaca> with scope subtree
# filter: (&(objectclass=nstombstone)(nsUniqueId=ffffffff-ffffffff-ffffffff-ffffffff))
# requesting: nscpentrywsi 
#

# replica, o\3Dipaca, mapping tree, config
dn: cn=replica,cn=o\3Dipaca,cn=mapping tree,cn=config
nscpentrywsi: dn: cn=replica,cn=o\3Dipaca,cn=mapping tree,cn=config
nscpentrywsi: cn: replica
nscpentrywsi: createTimestamp: 20161122150144Z
nscpentrywsi: creatorsName: cn=directory manager
nscpentrywsi: modifiersName: cn=Multimaster Replication Plugin,cn=plugins,cn=config
nscpentrywsi: modifyTimestamp: 20161204152409Z
nscpentrywsi: nsDS5Flags: 1
nscpentrywsi: nsDS5ReplicaBindDN: cn=Replication Manager cloneAgreement1-office-auth04.int.i-neda.com-pki-tomcat,ou=csusers,cn=config
nscpentrywsi: nsDS5ReplicaBindDN: cn=replication manager,cn=config
nscpentrywsi: nsDS5ReplicaId: 1495
nscpentrywsi: nsDS5ReplicaName: 725aa31e-b0c411e6-b5d989ac-8f24d4e5
nscpentrywsi: nsDS5ReplicaRoot: o=ipaca
nscpentrywsi: nsDS5ReplicaType: 3
nscpentrywsi: nsState:: 1wUAAAAAAAAXNURYAAAAAAAAAAAAAAAAUCAAAAAAAAACAAAAAAAAAA==
nscpentrywsi: objectClass: top
nscpentrywsi: objectClass: nsDS5Replica
nscpentrywsi: objectClass: extensibleobject
nscpentrywsi: numSubordinates: 1
nscpentrywsi: nsds50ruv: {replicageneration} 575ee7d2000000600000
nscpentrywsi: nsds50ruv: {replica 1495 ldap://office-auth04.int.i-neda.com:389} 58347e61000005d70000 58445505000005d70000
nscpentrywsi: nsds50ruv: {replica 91 ldap://power-auth01.int.i-neda.com:389} 577549ad0000005b0000 583457150008005b0000
nscpentrywsi: nsds50ruv: {replica 86 ldap://power-auth02.int.i-neda.com:389} 57754d7b000000560000 583469ad000000560000
nscpentrywsi: nsds50ruv: {replica 1395 ldap://blue-auth04.int.i-neda.com:389} 583469b5000005730000 5841c354000305730000
nscpentrywsi: nsds50ruv: {replica 96 ldap://office-auth01.int.i-neda.com:389} 575ee96e000000600000 58349d69000400600000
nscpentrywsi: nsds50ruv: {replica 97 ldap://office-auth02.int.i-neda.com:389} 575ee993000000610000 5820979e000700610000
nscpentrywsi: nsds50ruv: {replica 1095 ldap://blue-auth02.int.i-neda.com:389} 5783b6fb000004470000 582de13b000d04470000
nscpentrywsi: nsds50ruv: {replica 81 ldap://blue-auth01.int.i-neda.com:389} 5783b6fc000800510000 5783b719000800510000
nscpentrywsi: nsds50ruv: {replica 1195 ldap://blue-auth01.int.i-neda.com:389} 5784d185000004ab0000 5784d1af005604ab0000
nscpentrywsi: nsds50ruv: {replica 76 ldap://blue-auth03.int.i-neda.com:389} 5819fea00000004c0000 58306e950000004c0000
nscpentrywsi: nsds50ruv: {replica 1295 ldap://office-auth03.int.i-neda.com:389} 582f1d660009050f0000 582f1d7c000a050f0000
nscpentrywsi: nsruvReplicaLastModified: {replica 1495 ldap://office-auth04.int.i-neda.com:389} 00000000
nscpentrywsi: nsruvReplicaLastModified: {replica 91 ldap://power-auth01.int.i-neda.com:389} 00000000
nscpentrywsi: nsruvReplicaLastModified: {replica 86 ldap://power-auth02.int.i-neda.com:389} 00000000
nscpentrywsi: nsruvReplicaLastModified: {replica 1395 ldap://blue-auth04.int.i-neda.com:389} 00000000
nscpentrywsi: nsruvReplicaLastModified: {replica 96 ldap://office-auth01.int.i-neda.com:389} 00000000
nscpentrywsi: nsruvReplicaLastModified: {replica 97 ldap://office-auth02.int.i-neda.com:389} 00000000
nscpentrywsi: nsruvReplicaLastModified: {replica 1095 ldap://blue-auth02.int.i-neda.com:389} 00000000
nscpentrywsi: nsruvReplicaLastModified: {replica 81 ldap://blue-auth01.int.i-neda.com:389} 00000000
nscpentrywsi: nsruvReplicaLastModified: {replica 1195 ldap://blue-auth01.int.i-neda.com:389} 00000000
nscpentrywsi: nsruvReplicaLastModified: {replica 76 ldap://blue-auth03.int.i-neda.com:389} 00000000
nscpentrywsi: nsruvReplicaLastModified: {replica 1295 ldap://office-auth03.int.i-neda.com:389} 00000000
nscpentrywsi: nsds5ReplicaChangeCount: 2
nscpentrywsi: nsds5replicareapactive: 0

# search result
search: 2
result: 0 Success

# numResponses: 2
# numEntries: 1
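To pick the stale and duplicate entries out of RUV output like the one above without reading it by eye, here is a minimal Python sketch. It is an illustration, not an IPA or 389-ds tool: it assumes the exact "nsds50ruv: {replica <id> ldap://<host>:<port>}" line layout shown above, and the function names and the set of "active" hosts you pass in are hypothetical. It groups replica IDs by hostname and flags hosts that have more than one ID, or that are not in the active set.

```python
import re
from collections import defaultdict

# Matches RUV elements in ldapsearch output such as:
#   nscpentrywsi: nsds50ruv: {replica 81 ldap://blue-auth01.int.i-neda.com:389} 5783b6fc... 5783b719...
# The "{replicageneration}" line does not match, because no replica ID follows "replica",
# and nsruvReplicaLastModified lines do not match, because they lack the "nsds50ruv: " prefix.
RUV_ELEMENT = re.compile(r"nsds50ruv: \{replica (\d+) ldap://([^:/}]+)")


def parse_ruv(ldif_text: str) -> dict[str, list[int]]:
    """Map each hostname in the RUV to the sorted replica IDs recorded for it."""
    hosts = defaultdict(list)
    for match in RUV_ELEMENT.finditer(ldif_text):
        replica_id, host = int(match.group(1)), match.group(2)
        hosts[host].append(replica_id)
    return {host: sorted(ids) for host, ids in hosts.items()}


def report(ldif_text: str, active_hosts: set[str]) -> list[str]:
    """Flag hosts with several replica IDs, and hosts not listed in active_hosts."""
    findings = []
    for host, ids in sorted(parse_ruv(ldif_text).items()):
        if len(ids) > 1:
            findings.append(f"DUPLICATE {host}: replica IDs {ids}")
        if host not in active_hosts:
            findings.append(f"STALE {host}: replica IDs {ids}")
    return findings
```

Run against the full search output above with an active set that omits the retired servers, this would report blue-auth01 as a duplicate with replica IDs [81, 1195] and flag every host missing from the set (for example office-auth01 and office-auth02) as stale. It only reads the output; deciding which replica IDs to clean is still a manual step.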


Extract from /var/log/dirsrv/REALM/errors on power-auth01, which is one step down from the "start" of the chain (office-auth04):
[07/Dec/2016:12:07:16 +0000] NSMMReplicationPlugin - multimaster_be_state_change: replica dc=int,dc=i-neda,dc=com is going offline; disabling replication
[07/Dec/2016:12:07:17 +0000] - WARNING: Import is running with nsslapd-db-private-import-mem on; No other process is allowed to access the database
[07/Dec/2016:12:07:21 +0000] - import userRoot: Workers finished; cleaning up...
[07/Dec/2016:12:07:21 +0000] - import userRoot: Workers cleaned up.
[07/Dec/2016:12:07:21 +0000] - import userRoot: Indexing complete.  Post-processing...
[07/Dec/2016:12:07:21 +0000] - import userRoot: Generating numsubordinates (this may take several minutes to complete)...
[07/Dec/2016:12:07:21 +0000] - import userRoot: Generating numSubordinates complete.
[07/Dec/2016:12:07:21 +0000] - import userRoot: Gathering ancestorid non-leaf IDs...
[07/Dec/2016:12:07:21 +0000] - import userRoot: Finished gathering ancestorid non-leaf IDs.
[07/Dec/2016:12:07:21 +0000] - import userRoot: Creating ancestorid index (new idl)...
[07/Dec/2016:12:07:21 +0000] - import userRoot: Created ancestorid index (new idl).
[07/Dec/2016:12:07:21 +0000] - import userRoot: Flushing caches...
[07/Dec/2016:12:07:21 +0000] - import userRoot: Closing files...
[07/Dec/2016:12:07:22 +0000] - import userRoot: Import complete.  Processed 777 entries in 5 seconds. (155.40 entries/sec)
[07/Dec/2016:12:07:22 +0000] NSMMReplicationPlugin - multimaster_be_state_change: replica dc=int,dc=i-neda,dc=com is coming online; enabling replication
[07/Dec/2016:12:07:22 +0000] NSMMReplicationPlugin - replica_reload_ruv: Warning: new data for replica dc=int,dc=i-neda,dc=com does not match the data in the changelog.
 Recreating the changelog file. This could affect replication with replica's  consumers in which case the consumers should be reinitialized.
[07/Dec/2016:12:07:22 +0000] - Skipping CoS Definition cn=Password Policy,cn=accounts,dc=int,dc=i-neda,dc=com--no CoS Templates found, which should be added before the CoS Definition.
[07/Dec/2016:12:07:22 +0000] NSACLPlugin - The ACL target cn=groups,cn=compat,dc=int,dc=i-neda,dc=com does not exist
[07/Dec/2016:12:07:22 +0000] NSACLPlugin - The ACL target cn=computers,cn=compat,dc=int,dc=i-neda,dc=com does not exist
[07/Dec/2016:12:07:22 +0000] NSACLPlugin - The ACL target cn=ng,cn=compat,dc=int,dc=i-neda,dc=com does not exist
[07/Dec/2016:12:07:22 +0000] NSACLPlugin - The ACL target ou=sudoers,dc=int,dc=i-neda,dc=com does not exist
[07/Dec/2016:12:07:22 +0000] NSACLPlugin - The ACL target cn=users,cn=compat,dc=int,dc=i-neda,dc=com does not exist
[07/Dec/2016:12:07:22 +0000] NSACLPlugin - The ACL target cn=vaults,cn=kra,dc=int,dc=i-neda,dc=com does not exist
[07/Dec/2016:12:07:22 +0000] NSACLPlugin - The ACL target cn=vaults,cn=kra,dc=int,dc=i-neda,dc=com does not exist
[07/Dec/2016:12:07:22 +0000] NSACLPlugin - The ACL target cn=vaults,cn=kra,dc=int,dc=i-neda,dc=com does not exist
[07/Dec/2016:12:07:22 +0000] NSACLPlugin - The ACL target cn=vaults,cn=kra,dc=int,dc=i-neda,dc=com does not exist
[07/Dec/2016:12:07:22 +0000] NSACLPlugin - The ACL target cn=vaults,cn=kra,dc=int,dc=i-neda,dc=com does not exist
[07/Dec/2016:12:07:22 +0000] NSACLPlugin - The ACL target cn=vaults,cn=kra,dc=int,dc=i-neda,dc=com does not exist
[07/Dec/2016:12:07:22 +0000] NSACLPlugin - The ACL target cn=vaults,cn=kra,dc=int,dc=i-neda,dc=com does not exist
[07/Dec/2016:12:07:22 +0000] NSACLPlugin - The ACL target cn=vaults,cn=kra,dc=int,dc=i-neda,dc=com does not exist
[07/Dec/2016:12:07:22 +0000] NSACLPlugin - The ACL target cn=vaults,cn=kra,dc=int,dc=i-neda,dc=com does not exist
[07/Dec/2016:12:07:22 +0000] NSACLPlugin - The ACL target cn=vaults,cn=kra,dc=int,dc=i-neda,dc=com does not exist
[07/Dec/2016:12:07:22 +0000] NSACLPlugin - The ACL target cn=vaults,cn=kra,dc=int,dc=i-neda,dc=com does not exist
[07/Dec/2016:12:07:22 +0000] NSACLPlugin - The ACL target cn=ad,cn=etc,dc=int,dc=i-neda,dc=com does not exist
[07/Dec/2016:12:07:22 +0000] NSACLPlugin - The ACL target cn=casigningcert cert-pki-ca,cn=ca_renewal,cn=ipa,cn=etc,dc=int,dc=i-neda,dc=com does not exist
[07/Dec/2016:12:07:22 +0000] NSACLPlugin - The ACL target cn=casigningcert cert-pki-ca,cn=ca_renewal,cn=ipa,cn=etc,dc=int,dc=i-neda,dc=com does not exist
[07/Dec/2016:12:07:22 +0000] agmt="cn=meTopower-auth02.int.i-neda.com" (power-auth02:389) - Can't locate CSN 5847f3fd0000000d0000 in the changelog (DB rc=-30988). If replication stops, the consumer may need to be reinitialized.
[07/Dec/2016:12:07:22 +0000] NSMMReplicationPlugin - changelog program - agmt="cn=meTopower-auth02.int.i-neda.com" (power-auth02:389): CSN 5847f3fd0000000d0000 not found, we aren't as up to date, or we purged
[07/Dec/2016:12:07:22 +0000] NSMMReplicationPlugin - agmt="cn=meTopower-auth02.int.i-neda.com" (power-auth02:389): Data required to update replica has been purged. The replica must be reinitialized.
[07/Dec/2016:12:07:22 +0000] NSMMReplicationPlugin - agmt="cn=meTopower-auth02.int.i-neda.com" (power-auth02:389): Incremental update failed and requires administrator action
[07/Dec/2016:12:07:22 +0000] agmt="cn=meTooffice-auth04.int.i-neda.com" (office-auth04:389) - Can't locate CSN 584412d7000200050000 in the changelog (DB rc=-30988). If replication stops, the consumer may need to be reinitialized.
[07/Dec/2016:12:07:22 +0000] NSMMReplicationPlugin - changelog program - agmt="cn=meTooffice-auth04.int.i-neda.com" (office-auth04:389): CSN 584412d7000200050000 not found, we aren't as up to date, or we purged
[07/Dec/2016:12:07:22 +0000] NSMMReplicationPlugin - agmt="cn=meTooffice-auth04.int.i-neda.com" (office-auth04:389): Data required to update replica has been purged. The replica must be reinitialized.
[07/Dec/2016:12:07:22 +0000] NSMMReplicationPlugin - agmt="cn=meTooffice-auth04.int.i-neda.com" (office-auth04:389): Incremental update failed and requires administrator action
[07/Dec/2016:12:07:30 +0000] attrlist_replace - attr_replace (nsslapd-referral, ldap://office-auth04.int.i-neda.com:389/o%3Dipaca) failed.
[07/Dec/2016:12:07:30 +0000] attrlist_replace - attr_replace (nsslapd-referral, ldap://office-auth04.int.i-neda.com:389/o%3Dipaca) failed.
[07/Dec/2016:12:07:30 +0000] attrlist_replace - attr_replace (nsslapd-referral, ldap://office-auth04.int.i-neda.com:389/o%3Dipaca) failed.
[07/Dec/2016:12:07:30 +0000] NSMMReplicationPlugin - multimaster_be_state_change: replica o=ipaca is going offline; disabling replication
[07/Dec/2016:12:07:31 +0000] - Skipping CoS Definition cn=Password Policy,cn=accounts,dc=int,dc=i-neda,dc=com--no CoS Templates found, which should be added before the CoS Definition.
[07/Dec/2016:12:07:31 +0000] - WARNING: Import is running with nsslapd-db-private-import-mem on; No other process is allowed to access the database
[07/Dec/2016:12:07:36 +0000] attrlist_replace - attr_replace (nsslapd-referral, ldap://office-auth04.int.i-neda.com:389/o%3Dipaca) failed.
[07/Dec/2016:12:07:36 +0000] attrlist_replace - attr_replace (nsslapd-referral, ldap://office-auth04.int.i-neda.com:389/o%3Dipaca) failed.
[07/Dec/2016:12:07:36 +0000] attrlist_replace - attr_replace (nsslapd-referral, ldap://office-auth04.int.i-neda.com:389/o%3Dipaca) failed.
[07/Dec/2016:12:07:36 +0000] - import ipaca: Workers finished; cleaning up...
[07/Dec/2016:12:07:36 +0000] - import ipaca: Workers cleaned up.
[07/Dec/2016:12:07:36 +0000] - import ipaca: Indexing complete.  Post-processing...
[07/Dec/2016:12:07:36 +0000] - import ipaca: Generating numsubordinates (this may take several minutes to complete)...
[07/Dec/2016:12:07:36 +0000] - import ipaca: Generating numSubordinates complete.
[07/Dec/2016:12:07:36 +0000] - import ipaca: Gathering ancestorid non-leaf IDs...
[07/Dec/2016:12:07:36 +0000] - import ipaca: Finished gathering ancestorid non-leaf IDs.
[07/Dec/2016:12:07:36 +0000] - import ipaca: Creating ancestorid index (new idl)...
[07/Dec/2016:12:07:36 +0000] - import ipaca: Created ancestorid index (new idl).
[07/Dec/2016:12:07:36 +0000] - import ipaca: Flushing caches...
[07/Dec/2016:12:07:36 +0000] - import ipaca: Closing files...
[07/Dec/2016:12:07:36 +0000] - import ipaca: Import complete.  Processed 375 entries in 6 seconds. (62.50 entries/sec)
[07/Dec/2016:12:07:36 +0000] NSMMReplicationPlugin - multimaster_be_state_change: replica o=ipaca is coming online; enabling replication
[07/Dec/2016:12:07:36 +0000] NSMMReplicationPlugin - replica_reload_ruv: Warning: new data for replica o=ipaca does not match the data in the changelog.
 Recreating the changelog file. This could affect replication with replica's  consumers in which case the consumers should be reinitialized.
[07/Dec/2016:12:07:36 +0000] - Skipping CoS Definition cn=Password Policy,cn=accounts,dc=int,dc=i-neda,dc=com--no CoS Templates found, which should be added before the CoS Definition.
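The "Can't locate CSN ... in the changelog" messages above each name a change sequence number (CSN). A 389 Directory Server CSN is 20 hex digits: a 4-byte Unix timestamp, a 2-byte sequence number, the 2-byte replica ID of the server where the change originated, and a 2-byte sub-sequence number. The short Python sketch below decodes one so you can see which replica ID a missing change came from and when it was made. It is a reading aid under that layout assumption, not part of IPA; the names CSN and decode_csn are hypothetical.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class CSN(NamedTuple):
    timestamp: datetime  # when the change was made (UTC)
    seqnum: int          # orders changes made within the same second
    replica_id: int      # replica ID of the server that originated the change
    subseqnum: int       # orders sub-operations within one change


def decode_csn(csn: str) -> CSN:
    """Split a 20-hex-digit CSN into its four fields."""
    if len(csn) != 20:
        raise ValueError(f"expected 20 hex digits, got {csn!r}")
    return CSN(
        timestamp=datetime.fromtimestamp(int(csn[0:8], 16), tz=timezone.utc),
        seqnum=int(csn[8:12], 16),
        replica_id=int(csn[12:16], 16),
        subseqnum=int(csn[16:20], 16),
    )
```

For example, decode_csn("5847f3fd0000000d0000") gives replica ID 13 and a timestamp of 2016-12-07 11:35:25 UTC, and decode_csn("584412d7000200050000") gives replica ID 5. Neither ID appears in the o=ipaca RUV shown earlier. These messages follow the dc=int,dc=i-neda,dc=com suffix coming back online, so that suffix's RUV is probably the one to compare them against.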

Thanks in advance for any hints,
Neal.



