[dm-devel] [PATCH 2/2] dm-snapshot: Reimplement the cow limit.
Mike Snitzer
snitzer at redhat.com
Fri Oct 4 02:44:41 UTC 2019
On Thu, Oct 03 2019 at 4:06P -0400,
Mikulas Patocka <mpatocka at redhat.com> wrote:
>
>
> On Wed, 2 Oct 2019, Nikos Tsironis wrote:
>
> > Hi Mikulas,
> >
> > I agree that it's better to avoid holding any locks while waiting for
> > some pending kcopyd jobs to finish, but please see the comments below.
> >
> > On 10/2/19 1:15 PM, Mikulas Patocka wrote:
> > > +
> > > +static bool wait_for_in_progress(struct dm_snapshot *s, bool unlock_origins)
> > > +{
> > > +	if (unlikely(s->in_progress > cow_threshold)) {
> > > +		spin_lock(&s->in_progress_wait.lock);
> > > +		if (likely(s->in_progress > cow_threshold)) {
> > > +			DECLARE_WAITQUEUE(wait, current);
> > > +			__add_wait_queue(&s->in_progress_wait, &wait);
> > > +			__set_current_state(TASK_UNINTERRUPTIBLE);
> > > +			spin_unlock(&s->in_progress_wait.lock);
> > > +			if (unlock_origins)
> > > +				up_read(&_origins_lock);
> > > +			io_schedule();
> > > +			remove_wait_queue(&s->in_progress_wait, &wait);
> > > +			return false;
> > > +		}
> > > +		spin_unlock(&s->in_progress_wait.lock);
> > > +	}
> > > +	return true;
> > > +}
> >
> > wait_for_in_progress() doesn't take into account which chunk is being
> > written, or whether that chunk has already been reallocated or is
> > currently being reallocated.
> >
> > This means, if I am not missing something, that both origin_map() and
> > snapshot_map() might unnecessarily throttle writes that don't need any
> > COW to take place.
>
> I know about it, but I think it's not a serious problem - if there are
> 2048 outstanding I/Os, the system is already heavily congested. It
> doesn't matter whether you allow a few more writes or not.
>
> Mikulas
>
> > For example, suppose some incoming writes trigger COW and cause the COW
> > limit to be reached, and then more writes come in for chunks that have
> > already been reallocated (before any kcopyd job finishes and decrements
> > s->in_progress). The latter writes would be delayed without reason, as
> > they don't require any COW to be performed.
> >
> > It seems strange that the COW throttling mechanism might also throttle
> > writes that don't require any COW.
> >
> > Moreover, if we have reached the COW limit and we get a write for a
> > chunk that is currently being reallocated, we will block the thread,
> > when we could just add the bio to the origin_bios list of the pending
> > exception and move on.
> >
> > wait_for_in_progress() could check whether the chunk has already been
> > reallocated or is currently being reallocated, but the extra locking in
> > the critical path might have an adverse effect on performance,
> > especially in multi-threaded workloads. Maybe some benchmarks would
> > help clarify that.
> >
> > As a final note, if the devices are slow, there might be many writers
> > waiting in s->in_progress_wait. When a kcopyd job finishes, all of them
> > will wake up, and in some cases we might end up issuing more COW jobs
> > than the cow_count limit allows, because the accounting for new COW
> > jobs doesn't happen atomically with the check against the cow_count
> > limit in wait_for_in_progress().
> >
> > That said, I don't think having a few more COW jobs in flight, than the
> > defined limit, will cause any issues.
Nikos,
I looked at your concern even before Mikulas replied and found it to be
valid. But in the end I struggled to imagine how imposing extra
throttling, once above the throttle threshold, would significantly impact
performance.
So when I saw Mikulas' reply, it definitely reinforced my thinking. But
please feel free to explore whether further refinement is needed. I
think your concern about extra locking in the hotpath (to check whether a
chunk has already triggered an exception) was a great observation, but if
such a check is added I'm hopeful it won't be _that_ costly: we'll
already have reached the cow threshold and will already be taking the
lock (as the 2nd phase of the double-checked locking).
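To illustrate, here is a rough sketch of where such a check could slot in
(purely hypothetical and untested, NOT what was merged); chunk_needs_cow()
is an imaginary helper that would consult the complete/pending exception
tables, and how it synchronizes with s->lock is exactly the open question:

static bool wait_for_in_progress(struct dm_snapshot *s, chunk_t chunk,
				 bool unlock_origins)
{
	if (unlikely(s->in_progress > cow_threshold)) {
		spin_lock(&s->in_progress_wait.lock);
		/*
		 * 2nd phase of the double-checked locking, extended with a
		 * hypothetical test of whether this chunk needs a COW at all.
		 */
		if (likely(s->in_progress > cow_threshold) &&
		    chunk_needs_cow(s, chunk)) {
			DECLARE_WAITQUEUE(wait, current);
			__add_wait_queue(&s->in_progress_wait, &wait);
			__set_current_state(TASK_UNINTERRUPTIBLE);
			spin_unlock(&s->in_progress_wait.lock);
			if (unlock_origins)
				up_read(&_origins_lock);
			io_schedule();
			remove_wait_queue(&s->in_progress_wait, &wait);
			return false;
		}
		spin_unlock(&s->in_progress_wait.lock);
	}
	return true;
}

Callers would then pass the chunk they are about to write, e.g.
sector_to_chunk(s->store, bio->bi_iter.bi_sector) in snapshot_map().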
Anyway, I folded these small tweaks into Mikulas' 2nd patch:
diff --git a/drivers/md/dm-snap.c b/drivers/md/dm-snap.c
index 560b8cb38026..4fb1a40e68a0 100644
--- a/drivers/md/dm-snap.c
+++ b/drivers/md/dm-snap.c
@@ -1525,7 +1525,8 @@ static void account_end_copy(struct dm_snapshot *s)
 	spin_lock(&s->in_progress_wait.lock);
 	BUG_ON(!s->in_progress);
 	s->in_progress--;
-	if (likely(s->in_progress <= cow_threshold) && unlikely(waitqueue_active(&s->in_progress_wait)))
+	if (likely(s->in_progress <= cow_threshold) &&
+	    unlikely(waitqueue_active(&s->in_progress_wait)))
 		wake_up_locked(&s->in_progress_wait);
 	spin_unlock(&s->in_progress_wait.lock);
 }
@@ -1535,6 +1536,13 @@ static bool wait_for_in_progress(struct dm_snapshot *s, bool unlock_origins)
 	if (unlikely(s->in_progress > cow_threshold)) {
 		spin_lock(&s->in_progress_wait.lock);
 		if (likely(s->in_progress > cow_threshold)) {
+			/*
+			 * NOTE: this throttle doesn't account for whether
+			 * the caller is servicing an IO that will trigger a COW
+			 * so excess throttling may result for chunks not required
+			 * to be COW'd. But if cow_threshold was reached, extra
+			 * throttling is unlikely to negatively impact performance.
+			 */
 			DECLARE_WAITQUEUE(wait, current);
 			__add_wait_queue(&s->in_progress_wait, &wait);
 			__set_current_state(TASK_UNINTERRUPTIBLE);
@@ -1955,7 +1963,8 @@ static int snapshot_map(struct dm_target *ti, struct bio *bio)
 		return DM_MAPIO_KILL;
 
 	if (bio_data_dir(bio) == WRITE) {
-		while (unlikely(!wait_for_in_progress(s, false))) ;
+		while (unlikely(!wait_for_in_progress(s, false)))
+			; /* wait_for_in_progress() has slept */
 	}
 
 	down_read(&s->lock);
@@ -2538,8 +2547,8 @@ static int do_origin(struct dm_dev *origin, struct bio *bio, bool limit)
 	down_read(&_origins_lock);
 	o = __lookup_origin(origin->bdev);
 	if (o) {
-		struct dm_snapshot *s;
 		if (limit) {
+			struct dm_snapshot *s;
 			list_for_each_entry(s, &o->snapshots, list)
 				if (unlikely(!wait_for_in_progress(s, true)))
 					goto again;
and I've pushed the changes to linux-next via linux-dm.git's for-next
(with tweaked commit headers). But if you or Mikulas find something that
would warrant destaging these changes, I'd welcome that feedback.
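On Nikos' final point: the throttle check and the COW accounting are
indeed two separate critical sections. Paraphrasing the patch (details
may differ slightly), the increment happens later, in account_start_copy():

static void account_start_copy(struct dm_snapshot *s)
{
	spin_lock(&s->in_progress_wait.lock);
	/* not atomic with the threshold check in wait_for_in_progress() */
	s->in_progress++;
	spin_unlock(&s->in_progress_wait.lock);
}

so several writers woken by one account_end_copy() can each pass the
check before any of them increments s->in_progress, briefly pushing it
past cow_threshold. As you say, a small overshoot like that is harmless.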
Thanks,
Mike