[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]

We should make blivet.size.py a separate package

Hi all,

It turns out that anaconda uses Size() objects for a number of things that
are only tangentially related to storage. There's a bunch of stuff in
packaging, for instance, which does not seem that nearly related to storage
but which does use Size(). It seems cruel to make anaconda restrict its
usage of this handy Size class to only storage related stuff, but inconsistent,
if it is a storage package, not to.

Clearly, a separate package is the solution and might prove to be generally useful.

Other changes that should be made at the same time are:

1) Disentangle parsing user input from Size() construction. This would change the
interface, so that:

* Size("10 MiB") would have to be changed to Size(10, size.MiB)
* Size("%d KiB" % int(val[i])) would have to be changed to Size(int(val[i]), size.KiB)

Note: All other things being equal, the above changes would result in a speed up of about 4x,
when the argument is changed from a string to a tuple.

* Size(user_input) would have to be changed to Size(parseSpec(user_input))

2) Use int instead of Decimal for underlying representation and use delegation, not extension.
int is unbounded, and appropriate
as Size() stores exact number of bytes. The implementation will still use Decimal within
humanReadable() and convertTo(), where Decimal proves its worth. This would make (1) a bit
easier, for it would be possible to override object.__init__() instead of overriding Decimal.__new__()
which is awkward because of the context parameter which Decimal.__new__ requires.

Benefits to the change would be a 5 to 10x reduction in size of Size() objects and a 10 to 15x
speedup in constructing Size() objects and in arithmetic operations. The only arithmetic operation
I profiled was addition; I'm generalizing here.

There would be no change to the interface, and the changes in the class are actually very simple.

3) Make arithmetic operations more type-sensible.

Currently, our code does a bunch of not so good things, like:

>>> divmod(Size("3 MiB"), Size("2 MiB"))
(Decimal('1'), Decimal('1048576'))

The semantics of divmod should agree with the, in this situation, correct semantics of mod,
which gives the type of the remainder as a Size.

>>> Size("3 MiB") % Size("2 MiB")
Size('1 MiB')

2 multiplied by itself 3 bytes times should raise an exception or something, but it doesn't:
>>> 2 ** Size("3 B")

The number of times 2 MiB can be subtracted from 3 MiB is not 1 bytes times, it's 1 times, but:
>>> Size("3 MiB") / Size("2 MiB")
Size('1 B')

and so forth.

In theory, our code should not be asking for really nonsensical operations like 2 ** Size("1 MiB"),
and I think it is  not.

It can be made configurable how extreme the reaction is to a requested operation that makes no sense.

There are certain operations that are mathematically sensible, but which the implementation
would not, because it could not, support. One such operation is 2 MiB * 2 KiB which yields
4096 KiB^^2. We have no ability to represent that, so it would be better to raise a TypeError when
called upon to calculate it rather than return, as now, Size('4 GiB').
I believe that computations in this category are not ones that we
actually do in Blivet or Anaconda or should expect to. The context in which they could arise is
in, e.g., statistics, where in order to calculate the standard deviation, one has to square
the values. But if we start doing statistics on some MiB's we should not be using Size
class at all, we can use pint, which is good for that, but bad for our regular purposes.

NOTE: Most of the _implementation_ is already done, leaving just packaging and a few places
in anaconda where it will require a little thought to decide if parseSpec is called for.

Please give me your opinions. Maybe we can discuss this in installer meeting.

- mulhern


[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]