[Freeipa-devel] [PATCH] WIP backup and restore

Rob Crittenden rcritten at redhat.com
Mon Apr 8 21:41:56 UTC 2013


Petr Viktorin wrote:
> On 04/05/2013 10:54 PM, Rob Crittenden wrote:
>> Petr Viktorin wrote:
>>> On 04/04/2013 03:04 PM, Rob Crittenden wrote:
>>>> Rob Crittenden wrote:
>>>>> Petr Viktorin wrote:
>>>>>> On 03/23/2013 05:06 AM, Rob Crittenden wrote:
>>>>>>> There are strict limits on what can be restored where. Only exact
>>>>>>> matching hostnames and versions are allowed right now. We can
>>>>>>> probably relax the hostname requirement if we're only restoring
>>>>>>> data, and the version perhaps for only the first two values (so
>>>>>>> you can restore a 3.0.0 backup on 3.0.1 but not on 3.1.0).
>>>>>>
>>>>>> Do we also need to limit the versions of Dogtag, 389, Kerberos...?
>>>>>> Or is what they put in /var/lib guaranteed portable across versions?
>>>>>
>>>>> An interesting question. I'd imagine that a major db change would
>>>>> also require a major update to IPA, therefore if we restrict by
>>>>> IPA version we should be ok. I AM doing an untar of files though,
>>>>> so yeah, there is a risk here.
>>>>>
>>>>>>
>>>>>>
>>>>>> That's good to hear!
>>>>>>
>>>>>> I think while developing, you should run with -v to get all the
>>>>>> output. And for production, please have the commands log by
>>>>>> default (set log_file_name).
>>>>>
>>>>> Yes, I plan on adding that in the future.
>>>>>
>>>>>>
>>>>>>> ipa-backup does all its binds using ldapi. ipa-restore needs the DM
>>>>>>> password because we need to contact remote servers to enable/disable
>>>>>>> replication.
>>>>>>
>>>>>> The restore assumes that ipa is already installed, right? I can't
>>>>>> just run it on a clean machine? It would be good to
>>>>>> check/document this.
>>>>>
>>>>> It depends.
>>>>>
>>>>> For a full backup you can actually restore to an uninstalled
>>>>> server. In fact, you could in all likelihood restore to a machine
>>>>> w/no IPA bits on it at all (which wouldn't be very nice, but not
>>>>> exactly wrong either IMHO).
>>>>>
>>>>> I tested this with:
>>>>>   - ipa-server-install
>>>>>   - ipa-backup
>>>>>   - ipa-server-install --uninstall -U
>>>>>   - ipa-restore
>>>>>   - ran the unit tests
>>>>>
>>>>>> I looked in the backup tarball and saw dirsrv PID file and lock
>>>>>> directories. Are these needed?
>>>>>
>>>>> Probably not. I gathered some of the files to back up by watching
>>>>> which files changed during an install independently of us. I'll
>>>>> take another pass through it; there may be other oddities too.
>>>>>
>>>>>> The tool runners (install/tools/ipa-backup and
>>>>>> install/tools/ipa-restore) are missing, so IPA doesn't build.
>>>>>> Probably just a forgotten git add.
>>>>>
>>>>> Yup.
>>>>>
>>>>>>
>>>>>> The patch adds an extra backslash in install/tools/man/Makefile.am;
>>>>>> consider adding $(NULL) at the end.
>>>>>
>>>>> I'll take a look.
>>>>>
>>>>>>
>>>>>> Backup.dirs, Backup.files, Backup.logs:
>>>>>> The code modifies these class-level attributes. This means you
>>>>>> can't run the command more than once in one Python session, so it
>>>>>> can't be used as a library (e.g. in a hypothetical test suite).
>>>>>> Please make the class attributes immutable (tuples), then in
>>>>>> __init__ do `self.dirs = list(self.dirs)` to get
>>>>>> instance-specific lists.
>>>>>
>>>>> Ok, fair point.
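For the record, the pattern Petr describes could look roughly like this
(the class name and paths are just illustrative, not the real Backup
tool):

```python
class Backup(object):
    # Immutable class-level defaults, shared by all instances.
    # These paths are placeholders, not the actual backup lists.
    dirs = ('/etc/dirsrv', '/etc/httpd')
    files = ('/etc/krb5.conf', '/etc/ipa/default.conf')

    def __init__(self):
        # Copy into instance-specific lists so per-run additions
        # never leak into the class (or into a later run).
        self.dirs = list(self.dirs)
        self.files = list(self.files)
```

A second Backup() then always starts from the pristine defaults, no
matter what an earlier run appended to its own lists.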
>>>>>
>>>>>> Code that creates temporary files/directories or does chdir()
>>>>>> should be next to (or in) a try block whose finally: restores
>>>>>> things.
>>>>>
>>>>> Yes, I mean to add a big try/finally block around everything in run
>>>>> eventually (or the equivalent anyway).
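Roughly the shape I have in mind (the helper name is made up):

```python
import os
import shutil
import tempfile


def with_temp_workdir(work):
    """Run work() inside a fresh temp directory, restoring state after."""
    old_cwd = os.getcwd()
    tmpdir = tempfile.mkdtemp()
    try:
        os.chdir(tmpdir)
        return work(tmpdir)
    finally:
        # Runs even if work() raises: put the cwd back and clean up.
        os.chdir(old_cwd)
        shutil.rmtree(tmpdir, ignore_errors=True)
```

That way neither the working directory nor the temp files can leak,
whatever happens inside.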
>>>>>
>>>>>>
>>>>>> Instead of star imports (from ipapython.ipa_log_manager import *),
>>>>>> please explicitly import whatever you're using from that package. In
>>>>>> this case, nothing at all!
>>>>>
>>>>> Yeah, I haven't run this through pylint yet to see all the bad
>>>>> imports I cobbled together.
>>>>>
>>>>>> If there's a get_connection() method, it should return the
>>>>>> connection, and it should always be used to get it. Store the
>>>>>> connection in self._conn, and rather than:
>>>>>>      self.get_connection()
>>>>>>      self.conn.do_work()
>>>>>> do:
>>>>>>      conn = self.get_connection()
>>>>>>      conn.do_work()
>>>>>> This makes forgetting to call get_connection() impossible.
>>>>>
>>>>> My purpose was to avoid creating multiple connections.
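Both goals can be had at once by caching inside get_connection(); a
rough sketch (names hypothetical, _connect() standing in for the real
LDAP bind):

```python
class Restore(object):
    def __init__(self):
        self._conn = None

    def get_connection(self):
        # Create the connection lazily, but only once; every caller
        # goes through here, so there is still a single connection
        # and forgetting to "prime" self.conn is impossible.
        if self._conn is None:
            self._conn = self._connect()
        return self._conn

    def _connect(self):
        # Placeholder for the real bind (e.g. over ldapi).
        return object()
```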
>>>>>
>>>>>> If a variable is only used in a single method, like `filename` in
>>>>>> Restore.extract_backup, don't store it in the admintool object.
>>>>>> In general, try to avoid putting data in self if possible. It's
>>>>>> convenient but it allows complex interdependencies.
>>>>>> (Yes, I'm guilty of not following this myself.)
>>>>>
>>>>> Yes, there is certainly a bit of cleanup to do. I was in a bit of
>>>>> a rush to get this WIP out.
>>>>>
>>>>>> In several places, the backup tool logs critical errors and then just
>>>>>> continues on. Is that by design? I think a nonzero exit code would be
>>>>>> appropriate.
>>>>>
>>>>> I'll take a look at them, all I can say at this point is maybe.
>>>>>
>>>>>> In the ipa-restore man page, --backend is not documented.
>>>>>>
>>>>>> In db2ldif, db2bak, etc., a more concise way to get the time
>>>>>> string is `time.strftime('export_%Y_%m_%d_%H_%M_%S')`.
>>>>>>
>>>>>> When validating --gpg-keyring, it would be good to check both files
>>>>>> (.sec and .pub).
>>>>>
>>>>> Ok, I can do those.
>>>>>
>>>>>>
>>>>>> Here:
>>>>>>      if (self.backup_type != 'FULL' and not options.data_only and
>>>>>>          not instances):
>>>>>>          raise admintool.ScriptError(
>>>>>>              'Cannot restore a data backup into an empty system')
>>>>>> did you mean `(self.backup_type != 'FULL' or options.data_only)
>>>>>> and not instances`? (I'd introduce a `data_only` flag to prevent
>>>>>> confusion.)
>>>>>
>>>>> Yeah, looks like that should be an or.
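With the data_only flag Petr suggests, the corrected check would look
something like this (ValueError standing in for admintool.ScriptError,
and the function pulled out just for illustration):

```python
def check_restore_target(backup_type, data_requested, instances):
    # A data-only restore -- either the backup itself is data-only or
    # the user asked for a data restore -- needs an existing
    # installation to restore into.
    data_only = backup_type != 'FULL' or data_requested
    if data_only and not instances:
        raise ValueError(
            'Cannot restore a data backup into an empty system')
```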
>>>>>
>>>>>> In the ReplicationManager.check_repl_init reporting: use
>>>>>> int(d.total_seconds()) instead of d.seconds, as the latter
>>>>>> doesn't include full days. I don't think anyone would wait long
>>>>>> enough for the overflow, but better to be correct.
>>>>>
>>>>> Sure, I doubt anyone would wait 24 hours either, but it's a
>>>>> no-brainer to make it safe.
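Quick illustration of the difference:

```python
from datetime import timedelta

d = timedelta(days=1, seconds=5)
# .seconds is only the sub-day remainder; total_seconds() covers the
# whole duration, including full days.
assert d.seconds == 5
assert int(d.total_seconds()) == 86405
```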
>>>>
>>>> I think I've addressed just about everything.
>>>>
>>>> The reason that /var/run/dirsrv and /var/lock/dirsrv are included in the
>>>> backup is for the case of a full restore. These directories are not
>>>> created/provided by the package itself, but by installing the first
>>>> instance, so we need the ownership/SELinux context preserved.
>>>>
>>>> One thing I still need to test is restoring over the top of new
>>>> data with multiple replicas. So basically: install, do a backup,
>>>> add a bunch of data, restore, re-initialize all the other masters,
>>>> and then confirm that ALL the new data is gone. I think I've got
>>>> the sequence right but this is critical.
>>>>
>>>> This should work on fresh installs and upgrades from 3.0 (e.g. dogtag
>>>> 9/multi-instance DS configurations).
>>>>
>>>> One thing I tested was:
>>>>
>>>> - ipa-server-install
>>>> - ipa-backup
>>>> - ipa-server-install --uninstall -U
>>>> - ipa-restore ipa-full-...
>>>>
>>>> This will actually get your server back EXCEPT that dogtag won't start.
>>>> This is because the log directories are created by the instance
>>>> creation. There are two solutions: backup with --logs or manually
>>>> re-create the log directories (there are several, depending on dogtag
>>>> version).
>>>
>>> Could backup without --logs save which directories are there, and
>>> restore create empty directories if they don't exist? To me
>>> re-initializing a completely wiped machine looks like a big gain for the
>>> extra effort.
>>
>> I went ahead and made this change, it wasn't a lot of work.
>>
>>>
>>>
>>>
>>> The spec changelog needs a small rebase.
>>
>> Done.
>>
>>>
>>> I've tested some scenarios and didn't find any other issues so far.
>>>
>>> I still want to test some more with upgraded masters; installing them
>>> takes some time.
>>> Also I still need to test CA replicas (with the DNAME patches removed).
>>>
>>
>> rob
>
> I've tested some more. It works well with 3.2 masters, even one upgraded
> from 2.2.
>
> When there's a 2.2 master in the topology, things are not so nice. When
> I ran ipa-replica-manage re-initialize, it started printing "update in
> progress" lines for several minutes with no signs of stopping.
> The following appeared in the slapd error log on the source server:
>
> [08/Apr/2013:13:38:48 +0200] NSMMReplicationPlugin - Replication
> agreement for agmt="cn=meTovm-084.idm.lab.eng.brq.redhat.com"
> (vm-084:389) could not be updated. For replication to take place, please
> enable the suffix and restart the server
>
> I somehow doubt that's caused by this patch, though -- was replica
> re-initialize tested with disparate versions recently?
> I can test again with a simpler scenario to pinpoint the issue.
>

Not sure if we've done any specific testing of this, but this API has 
been very stable throughout IPA (much of it dates back to v1). The 
parts of 389-ds that this relies on have been fairly stable as well.

I did some very basic testing with a 6.3 and 6.4 server and didn't have 
any issues, so I'm very interested to know what you discover.

rob




More information about the Freeipa-devel mailing list