[Pulp-dev] Hanging tasks (reloaded)

Simon Baatz gmbnomis at gmail.com
Tue Jan 21 21:37:25 UTC 2020


I can reproduce the problem described by Matthias with pulp_cookbook's
tests almost instantly (I haven't seen a single complete run of the
functional tests).

Box is pulp3-source-fedora31, redis-py 3.3.1, redis 5.0.7.
I don't see these hangs anymore after downgrading redis-py to 3.1.0.

Vagrant's VM provider is libvirt with an additional
`domain.volume_cache = 'unsafe'` setting.  Write I/O is probably
extremely fast in this setup (VM can't force flush to disk)


On Tue, Jan 21, 2020 at 12:22:33PM -0500, Daniel Alley wrote:
>    4 runs on Fedora 31 has not triggered it either. This is using the
>    latest version of redis-py, 3.3.11, and redis 5.0.7.
> 
>    On Tue, Jan 21, 2020 at 11:16 AM Daniel Alley <[1]dalley at redhat.com>
>    wrote:
> 
>    So far I've run it 4 times back to back on a CentOS 7 box and not had
>    any lockups.  I'll try Fedora
> 
>    On Tue, Jan 21, 2020 at 9:36 AM Matthias Dellweg <[2]dellweg at atix.de>
>    wrote:
> 
>      @Daniel
>      It is happening with fresh installations (pulplift with libvirt) on
>      both boxes i have tried (pulp3-source-debian10 and
>      pulp3-source-fedora30). To test, i run `prestart; django-admin test
>      pulpcore` multiple times until it hangs (though i have seen it hang
>      on
>      the first try). There is no modification that i know of. Also, up
>      until
>      now, i seem to be able to reproduce reliably, while downgrading
>      (pinning) redis-py completely solves the issue.
>      On Tue, 21 Jan 2020 09:16:43 -0500
>      Daniel Alley <[3]dalley at redhat.com> wrote:
>      > @Matthias Dellweg <[4]dellweg at atix.de>
>      > did you restart Pulp after upgrading redis-py?  Are you seeing
>      this on
>      > fresh boxes, or does it require some modification to reproduce?  I
>      > don't think any of us have experienced this thus far.
>      >
>      > The only thing I can think of is that maybe when the worker
>      process
>      > forks it ends up using a different version of redis-py than the
>      > parent worker.
>      >
>      > On Tue, Jan 21, 2020 at 4:49 AM Matthias Dellweg
>      <[5]dellweg at atix.de>
>      > wrote:
>      >
>      > > [@ Brian:
>      > >
>      > >
>      [6]https://github.com/pulp/pulpcore/commit/e36e7b5f0eccc176a6e6298df
>      29293b014f4710c
>      > > ]
>      > >
>      > > Hi Daniel,
>      > > thank you for looking into this.
>      > > What i am seeing is:
>      > >
>      > > Jan 21 09:23:10 [7]pulp3-source-fedora30.anubis.example.com
>      > > gunicorn[23274]: 127.0.0.1 - admin [21/Jan/2020:09:23:10 +0000]
>      > > "PATCH
>      > >
>      /pulp/api/v3/repositories/file/file/3a31ed13-585d-4a36-8398-7df40560
>      ffa4/
>      > > HTTP/1.1" 202 67 "-" "python-requests/2.22.0" Jan 21 09:23:10
>      > > [8]pulp3-source-fedora30.anubis.example.com rq[23269]: pulp:
>      > > rq.worker:INFO:
>      [9]23269 at pulp3-source-fedora30.anubis.example.com:
>      > > 3a776d4d-ff0c-4b44-afae-08dc7c6cc415 Jan 21 09:23:10
>      > > [10]pulp3-source-fedora30.anubis.example.com rq[23270]: pulp:
>      > > rq.worker:INFO: resource-manager: Job OK
>      > > (aa22aed0-0363-45ae-8254-4ab14893a4a1) Jan 21 09:23:10
>      > > [11]pulp3-source-fedora30.anubis.example.com rq[23269]: pulp:
>      > > rq.worker:ERROR: Worker
>      > > [12]rq:worker:23269 at pulp3-source-fedora30.anubis.example.com:
>      found an
>      > > unhandled exception, quitting... Jan 21 09:23:10
>      > > [13]pulp3-source-fedora30.anubis.example.com rq[23269]:
>      Traceback (most
>      > > recent call last): Jan 21 09:23:10
>      > > [14]pulp3-source-fedora30.anubis.example.com rq[23269]:   File
>      > >
>      "/usr/local/lib/pulp/lib64/python3.7/site-packages/rq/worker.py",
>      > > line 782, in prepare_job_execution Jan 21 09:23:10
>      > > [15]pulp3-source-fedora30.anubis.example.com rq[23269]:
>      > > pipeline.execute() Jan 21 09:23:10
>      > > [16]pulp3-source-fedora30.anubis.example.com rq[23269]:   File
>      > >
>      "/usr/local/lib/pulp/lib64/python3.7/site-packages/redis/client.py",
>      > > line 3707, in execute Jan 21 09:23:10
>      > > [17]pulp3-source-fedora30.anubis.example.com rq[23269]:
>      > > self.reset() Jan 21 09:23:10
>      > > [18]pulp3-source-fedora30.anubis.example.com rq[23269]: File
>      > >
>      "/usr/local/lib/pulp/lib64/python3.7/site-packages/redis/client.py",
>      > > line 3476, in reset Jan 21 09:23:10
>      > > [19]pulp3-source-fedora30.anubis.example.com rq[23269]:
>      > > self.connection_pool.release(self.connection) Jan 21 09:23:10
>      > > [20]pulp3-source-fedora30.anubis.example.com rq[23269]:   File
>      > >
>      "/usr/local/lib/pulp/lib64/python3.7/site-packages/redis/connection.
>      py",
>      > > line 1114, in release Jan 21 09:23:10
>      > > [21]pulp3-source-fedora30.anubis.example.com rq[23269]:
>      > > self._in_use_connections.remove(connection) Jan 21 09:23:10
>      > > [22]pulp3-source-fedora30.anubis.example.com rq[23269]:
>      KeyError:
>      > > Connection<host=localhost,port=6379,db=0> Jan 21 09:23:10
>      > > [23]pulp3-source-fedora30.anubis.example.com rq[23269]: During
>      handling
>      > > of the above exception, another exception occurred: Jan 21
>      09:23:10
>      > > [24]pulp3-source-fedora30.anubis.example.com rq[23269]:
>      Traceback (most
>      > > recent call last): Jan 21 09:23:10
>      > > [25]pulp3-source-fedora30.anubis.example.com rq[23269]:   File
>      > >
>      "/usr/local/lib/pulp/lib64/python3.7/site-packages/rq/worker.py",
>      > > line 515, in work Jan 21 09:23:10
>      > > [26]pulp3-source-fedora30.anubis.example.com rq[23269]:
>      > > self.execute_job(job, queue) Jan 21 09:23:10
>      > > [27]pulp3-source-fedora30.anubis.example.com rq[23269]:   File
>      > > "/home/vagrant/devel/pulpcore/pulpcore/tasking/worker.py", line
>      72,
>      > > in execute_job Jan 21 09:23:10
>      > > [28]pulp3-source-fedora30.anubis.example.com rq[23269]:
>      > > super().execute_job(*args, **kwargs) Jan 21 09:23:10
>      > > [29]pulp3-source-fedora30.anubis.example.com rq[23269]:   File
>      > >
>      "/usr/local/lib/pulp/lib64/python3.7/site-packages/rq/worker.py",
>      > > line 727, in execute_job Jan 21 09:23:10
>      > > [30]pulp3-source-fedora30.anubis.example.com rq[23269]:
>      > > self.fork_work_horse(job, queue) Jan 21 09:23:10
>      > > [31]pulp3-source-fedora30.anubis.example.com rq[23269]:   File
>      > >
>      "/usr/local/lib/pulp/lib64/python3.7/site-packages/rq/worker.py",
>      > > line 667, in fork_work_horse Jan 21 09:23:10
>      > > [32]pulp3-source-fedora30.anubis.example.com rq[23269]:
>      > > self.main_work_horse(job, queue) Jan 21 09:23:10
>      > > [33]pulp3-source-fedora30.anubis.example.com rq[23269]:   File
>      > >
>      "/usr/local/lib/pulp/lib64/python3.7/site-packages/rq/worker.py",
>      > > line 744, in main_work_horse Jan 21 09:23:10
>      > > [34]pulp3-source-fedora30.anubis.example.com rq[23269]:
>      raise e Jan
>      > > 21 09:23:10 [35]pulp3-source-fedora30.anubis.example.com
>      rq[23269]:
>      > > File
>      > >
>      "/usr/local/lib/pulp/lib64/python3.7/site-packages/rq/worker.py",
>      > > line 741, in main_work_horse Jan 21 09:23:10
>      > > [36]pulp3-source-fedora30.anubis.example.com rq[23269]:
>      > > self.perform_job(job, queue) Jan 21 09:23:10
>      > > [37]pulp3-source-fedora30.anubis.example.com rq[23269]:   File
>      > > "/home/vagrant/devel/pulpcore/pulpcore/tasking/worker.py", line
>      > > 103, in perform_job Jan 21 09:23:10
>      > > [38]pulp3-source-fedora30.anubis.example.com rq[23269]:
>      return
>      > > super().perform_job(job, queue) Jan 21 09:23:10
>      > > [39]pulp3-source-fedora30.anubis.example.com rq[23269]:   File
>      > >
>      "/usr/local/lib/pulp/lib64/python3.7/site-packages/rq/worker.py",
>      > > line 866, in perform_job Jan 21 09:23:10
>      > > [40]pulp3-source-fedora30.anubis.example.com rq[23269]:
>      > > self.prepare_job_execution(job, heartbeat_ttl) Jan 21 09:23:10
>      > > [41]pulp3-source-fedora30.anubis.example.com rq[23269]:   File
>      > >
>      "/usr/local/lib/pulp/lib64/python3.7/site-packages/rq/worker.py",
>      > > line 782, in prepare_job_execution Jan 21 09:23:10
>      > > [42]pulp3-source-fedora30.anubis.example.com rq[23269]:
>      > > pipeline.execute() Jan 21 09:23:10
>      > > [43]pulp3-source-fedora30.anubis.example.com rq[23269]:   File
>      > >
>      "/usr/local/lib/pulp/lib64/python3.7/site-packages/redis/client.py",
>      > > line 3445, in __exit__ Jan 21 09:23:10
>      > > [44]pulp3-source-fedora30.anubis.example.com rq[23269]:
>      > > self.reset() Jan 21 09:23:10
>      > > [45]pulp3-source-fedora30.anubis.example.com rq[23269]: File
>      > >
>      "/usr/local/lib/pulp/lib64/python3.7/site-packages/redis/client.py",
>      > > line 3476, in reset Jan 21 09:23:10
>      > > [46]pulp3-source-fedora30.anubis.example.com rq[23269]:
>      > > self.connection_pool.release(self.connection) Jan 21 09:23:10
>      > > [47]pulp3-source-fedora30.anubis.example.com rq[23269]:   File
>      > >
>      "/usr/local/lib/pulp/lib64/python3.7/site-packages/redis/connection.
>      py",
>      > > line 1114, in release Jan 21 09:23:10
>      > > [48]pulp3-source-fedora30.anubis.example.com rq[23269]:
>      > > self._in_use_connections.remove(connection) Jan 21 09:23:10
>      > > [49]pulp3-source-fedora30.anubis.example.com rq[23269]:
>      KeyError:
>      > > Connection<host=localhost,port=6379,db=0> Jan 21 09:23:10
>      > > [50]pulp3-source-fedora30.anubis.example.com gunicorn[23274]:
>      127.0.0.1
>      > > - admin [21/Jan/2020:09:23:10 +0000] "GET
>      > > /pulp/api/v3/tasks/3a776d4d-ff0c-4b44-afae-08dc7c6cc415/
>      HTTP/1.1"
>      > > 200 477 "-" "python-requests/2.22.0"
>      > >
>      > > and that last line repeats forever.
>      > >
>      > > On Fri, 17 Jan 2020 09:09:01 -0500
>      > > Daniel Alley <[51]dalley at redhat.com> wrote:
>      > >
>      > > > Different issue perhaps?  Are you seeing anything in the logs
>      that
>      > > > looks like this? [52]https://github.com/rq/rq/issues/1044
>      > > >
>      > > > On Fri, Jan 17, 2020 at 9:06 AM Daniel Alley
>      <[53]dalley at redhat.com>
>      > > > wrote:
>      > > >
>      > > > > Strange, I'm pretty sure an issue like this is why we pinned
>      > > > > originally, but upstream said that that particular issue was
>      > > > > (supposedly) fixed.
>      > > > >
>      > > > >
>      > >
>      [54]https://github.com/andymccurdy/redis-py/issues/1136#issuecomment
>      -571168161
>      > >
>      > > > >
>      > > > > On Fri, Jan 17, 2020 at 4:15 AM Matthias Dellweg
>      > > > > <[55]dellweg at atix.de> wrote:
>      > > > >> Hello all,
>      > > > >> I believe i have  found a new incarnation of hanging tasks
>      > > > >> (tm). This time it is pulp3 and as hard to nail down as
>      ever.
>      > > > >> I think, it is introduced by
>      > > > >> e36e7b5f0eccc176a6e6298df29293b014f4710c. Where the
>      dependency
>      > > > >> on redis was dropped with the result that 3.3.smth instead
>      of
>      > > > >> 3.1.smth was installed.
>      > > > >>
>      > > > >> Before filing an issue, is there anyone out there to share
>      that
>      > > > >> experience?
>      > > > >>
>      > > > >> Also as a thought protocol of how to reproduce:
>      > > > >> I have seen tasks hanging in both "waiting" and "running"
>      state
>      > > > >> when using the command `prestart; django-admin test
>      pulp_deb`
>      > > > >> or `<...> test pulpcore`. All the tasks I have seen were
>      > > > >> `sync`, `general_create` or `general_delete` and looked
>      like
>      > > > >> they never started to do anything for real. To have
>      consistent
>      > > > >> results, i had the impression that i needed to rebuild the
>      > > > >> vagrant boxes for every bisecting step. Also updating the
>      > > > >> python-redis package on a box that worked, produced a
>      hanging
>      > > > >> task in the next run.
>      > > > >>
>      > > > >> Have a good day,
>      > > > >>   Matthias
> 
> References
> 
>    1. mailto:dalley at redhat.com
>    2. mailto:dellweg at atix.de
>    3. mailto:dalley at redhat.com
>    4. mailto:dellweg at atix.de
>    5. mailto:dellweg at atix.de
>    6. https://github.com/pulp/pulpcore/commit/e36e7b5f0eccc176a6e6298df29293b014f4710c
>    7. http://pulp3-source-fedora30.anubis.example.com/
>    8. http://pulp3-source-fedora30.anubis.example.com/
>    9. mailto:23269 at pulp3-source-fedora30.anubis.example.com
>   10. http://pulp3-source-fedora30.anubis.example.com/
>   11. http://pulp3-source-fedora30.anubis.example.com/
>   12. mailto:rq:worker:23269 at pulp3-source-fedora30.anubis.example.com
>   13. http://pulp3-source-fedora30.anubis.example.com/
>   14. http://pulp3-source-fedora30.anubis.example.com/
>   15. http://pulp3-source-fedora30.anubis.example.com/
>   16. http://pulp3-source-fedora30.anubis.example.com/
>   17. http://pulp3-source-fedora30.anubis.example.com/
>   18. http://pulp3-source-fedora30.anubis.example.com/
>   19. http://pulp3-source-fedora30.anubis.example.com/
>   20. http://pulp3-source-fedora30.anubis.example.com/
>   21. http://pulp3-source-fedora30.anubis.example.com/
>   22. http://pulp3-source-fedora30.anubis.example.com/
>   23. http://pulp3-source-fedora30.anubis.example.com/
>   24. http://pulp3-source-fedora30.anubis.example.com/
>   25. http://pulp3-source-fedora30.anubis.example.com/
>   26. http://pulp3-source-fedora30.anubis.example.com/
>   27. http://pulp3-source-fedora30.anubis.example.com/
>   28. http://pulp3-source-fedora30.anubis.example.com/
>   29. http://pulp3-source-fedora30.anubis.example.com/
>   30. http://pulp3-source-fedora30.anubis.example.com/
>   31. http://pulp3-source-fedora30.anubis.example.com/
>   32. http://pulp3-source-fedora30.anubis.example.com/
>   33. http://pulp3-source-fedora30.anubis.example.com/
>   34. http://pulp3-source-fedora30.anubis.example.com/
>   35. http://pulp3-source-fedora30.anubis.example.com/
>   36. http://pulp3-source-fedora30.anubis.example.com/
>   37. http://pulp3-source-fedora30.anubis.example.com/
>   38. http://pulp3-source-fedora30.anubis.example.com/
>   39. http://pulp3-source-fedora30.anubis.example.com/
>   40. http://pulp3-source-fedora30.anubis.example.com/
>   41. http://pulp3-source-fedora30.anubis.example.com/
>   42. http://pulp3-source-fedora30.anubis.example.com/
>   43. http://pulp3-source-fedora30.anubis.example.com/
>   44. http://pulp3-source-fedora30.anubis.example.com/
>   45. http://pulp3-source-fedora30.anubis.example.com/
>   46. http://pulp3-source-fedora30.anubis.example.com/
>   47. http://pulp3-source-fedora30.anubis.example.com/
>   48. http://pulp3-source-fedora30.anubis.example.com/
>   49. http://pulp3-source-fedora30.anubis.example.com/
>   50. http://pulp3-source-fedora30.anubis.example.com/
>   51. mailto:dalley at redhat.com
>   52. https://github.com/rq/rq/issues/1044
>   53. mailto:dalley at redhat.com
>   54. https://github.com/andymccurdy/redis-py/issues/1136#issuecomment-571168161
>   55. mailto:dellweg at atix.de

> _______________________________________________
> Pulp-dev mailing list
> Pulp-dev at redhat.com
> https://www.redhat.com/mailman/listinfo/pulp-dev


-- 
Simon Baatz <gmbnomis at gmail.com>





More information about the Pulp-dev mailing list