[linux-lvm] LVM archive management (/etc/lvm/archives) expiry / retention misbehaves after index #100,000.

Mark Mielke mark.mielke at gmail.com
Mon Aug 14 03:49:21 UTC 2017


I opened this Bugzilla issue for tracking purposes:

https://bugzilla.redhat.com/show_bug.cgi?id=1481085

On Sun, Aug 13, 2017 at 8:05 AM, Mark Mielke <mark.mielke at gmail.com> wrote:

> I searched around for this a bit, and although other users may have hit
> this, I didn't find a good explanation offered. I suspect the users clean
> it up manually and then it disappears for another 2 years. I hope this
> message will get captured by Google, and help somebody else out. Also, I
> hope to have some discussion about this as it seems like an easily
> preventable problem.
>
> The archive file names are generated like:
>
>                 if (dm_snprintf(archive_name, sizeof(archive_name),
>                                  "%s/%s_%05u-%d.vg",
>                                  dir, vg->name, ix, rnum) < 0) {
>
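
To make the width behaviour of that format string concrete, here is a minimal sketch. It assumes plain snprintf() behaves like dm_snprintf() for this format; the helper name format_archive_name() is made up for illustration and is not part of LVM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper (not LVM code): builds an archive file name with the
 * same format string as the quoted dm_snprintf() call, using snprintf().
 * Returns the name length, or -1 on error or truncation.
 */
static int format_archive_name(char *buf, size_t len, const char *dir,
                               const char *vgname, unsigned ix, int rnum)
{
    int n;

    assert(buf != NULL && dir != NULL && vgname != NULL);

    /* "%05u" pads to at least 5 digits; it does not cap the width. */
    n = snprintf(buf, len, "%s/%s_%05u-%d.vg", dir, vgname, ix, rnum);

    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

With index 99999 the index field is 5 characters wide; with index 100000 it silently grows to 6, so the names no longer line up column for column.
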
> The directory scanning code that loads the archive file names into memory
> recognizes a problem, although it isn't explicit about what the problem is:
>
>         /* Sort fails beyond 5-digit indexes */
>         if ((count = scandir(dir, &dirent, NULL, alphasort)) < 0) {
>                 log_error("Couldn't scan the archive directory (%s).", dir);
>                 return 0;
>         }
>
> The file names encode the index like "00000". The sorting code uses
> "alphasort", which only works properly as long as the index stays
> within 5 digits. As soon as the index exceeds 5 digits, "100000" sorts
> toward the beginning and "99999" sorts to the end. After that, new
> archives seem to *all* get index "100000". We had some 40,000 archive
> files with index "100000" before we noticed. And, because the index is
> followed by a random number, expiry would only remove a few of the
> "100000" files before it hit one that was younger than the 30-day
> retention period set by default. When I reduced the retention period to
> 7 days, it expired only about 12 of the 40,000 archive files. This
> behaviour is probably due to the random number distribution ensuring
> that there are always some recent files near 0?
>
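
The ordering problem can be reproduced outside LVM with a small sketch. It sorts a few made-up file names with strcmp(), which is what alphasort() amounts to in the "C" locale (alphasort() itself compares with strcoll()). The VG name "vg0" and the function names are assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* qsort() comparator: plain byte-wise string comparison. */
static int cmp_names(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/*
 * Hypothetical helper: sorts file names in place, roughly the order
 * scandir(..., alphasort) would deliver them in the "C" locale.
 */
static void sort_like_alphasort(const char **names, size_t n)
{
    assert(names != NULL || n == 0);
    qsort(names, n, sizeof(*names), cmp_names);
}
```

Given "vg0_99999-7.vg", "vg0_00001-3.vg" and "vg0_100000-5.vg", this puts the "100000" entry between "00001" and "99999", because the comparison sees '1' < '9' at the first index character and never looks at the length of the number.
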
> This issue eventually affects everyone, although obviously people who
> use features like snapshots more frequently (we use them every 15
> minutes, across multiple volumes) will hit it sooner.
>
> There are a few fixes possible... Probably, "alphasort" should not be used
> at all. Instead, a context-aware sort should be used, one that can filter
> and sort as it goes, decoding the index correctly as a number and
> comparing it as a number. Then, if performance and scalability matter, it
> would be ideal to do this in a single pass, buffering only the minimum
> needed to expire the correct archive files.
>
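
As a rough illustration of "decode the index as a number and compare it as a number", here is a sketch of such a comparator. This is not a patch against LVM and not the single-pass variant. The helper names are hypothetical, and it assumes the index is the run of digits after the last '_' in the name, followed by '-', as the format string above produces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: extracts the index from "<vg>_<index>-<rnum>.vg".
 * Returns 1 and stores the index on success, 0 if the name does not match.
 */
static int parse_archive_index(const char *name, unsigned long *ix)
{
    const char *us = strrchr(name, '_');
    char *end;

    if (!us || us[1] < '0' || us[1] > '9')
        return 0;

    *ix = strtoul(us + 1, &end, 10);
    return *end == '-';
}

/* qsort() comparator: unparsable names first, then ascending numeric index. */
static int cmp_by_index(const void *a, const void *b)
{
    const char *na = *(const char *const *)a;
    const char *nb = *(const char *const *)b;
    unsigned long ia = 0, ib = 0;
    int oka = parse_archive_index(na, &ia);
    int okb = parse_archive_index(nb, &ib);

    if (oka != okb)
        return oka ? 1 : -1;
    if (oka && ia != ib)
        return ia < ib ? -1 : 1;
    return strcmp(na, nb);  /* same index, or both unparsable */
}

/* Hypothetical helper: sorts archive file names by numeric index. */
static void sort_by_index(const char **names, size_t n)
{
    assert(names != NULL || n == 0);
    qsort(names, n, sizeof(*names), cmp_by_index);
}
```

With this ordering, "vg0_99999-7.vg" sorts before "vg0_100000-5.vg" regardless of how many digits the index has. A real fix would also need to filter on the VG name and deal with expiry, which this sketch leaves out.
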
> We hit this on RHEL 7.2. I wasn't surprised to find it in RHEL 7.2, but I
> was surprised that it still exists on "master". "git blame" says this has
> been an issue since 2002:
>
> 5be981bab5 (Alasdair Kergon  2002-05-07 12:47:11 +0000 139)     /* Sort fails beyond 5-digit indexes */
> 59d6420b9a (Joe Thornber     2002-02-08 11:58:18 +0000 140)     if ((count = scandir(dir, &dirent, NULL, alphasort)) < 0) {
> b8f47d5f69 (Alasdair Kergon  2009-07-15 20:02:46 +0000 141)             log_error("Couldn't scan the archive directory (%s).", dir);
> 952d12a5f5 (Alasdair Kergon  2002-01-09 19:16:48 +0000 142)             return 0;
> 952d12a5f5 (Alasdair Kergon  2002-01-09 19:16:48 +0000 143)     }
>
> Ouch... :-)
>
> For anybody who does hit this... Pruning the archive files with index <
> 100000 is effective. Counting then continues from 100000, and you now
> have 9X more life before it happens again... :-)
>
> --
> Mark Mielke <mark.mielke at gmail.com>
>
>


-- 
Mark Mielke <mark.mielke at gmail.com>