[rdo-list] Updates to RDO slaves and jobs in ci.centos.org

David Moreau Simard dms at redhat.com
Fri Apr 21 12:40:41 UTC 2017


Yeah, jobs are in the ballpark of 35 to 45 minutes.
What's great, too, is that the hardware is homogeneous, so we can always
expect reliable and consistent results.

I'm going to repeat myself, but this will also alleviate the load on
Duffy and help TripleO-based jobs hit the rate limiting less often (if
at all).

David Moreau Simard
Senior Software Engineer | OpenStack RDO

dmsimard = [irc, github, twitter]


On Fri, Apr 21, 2017 at 8:08 AM, Alfredo Moralejo Alonso
<amoralej at redhat.com> wrote:
> Yeah, I was only taking into account the time spent running run_test.sh,
> which shouldn't be impacted by slowness in rdo-ci-slave01. These are my
> findings for the job weirdo-master-promote-puppet-openstack-scenario002:
>
> RDO Cloud:  39 mins
> n30.dusty:  33 mins
> n13.pufty:  60 mins
> n54.cursty: 58 mins
>
> I think it's pretty good.
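>
> In case anyone wants to reproduce the comparison: those numbers are just
> the build durations Jenkins reports, which can also be pulled from the
> standard Jenkins JSON API. A rough sketch (the job name is the one above,
> "lastBuild" picked as an example):
>
>   # Fetch the last build's metadata and convert its duration (ms) to minutes
>   curl -s https://ci.centos.org/job/weirdo-master-promote-puppet-openstack-scenario002/lastBuild/api/json \
>     | python -c 'import json, sys; print(json.load(sys.stdin)["duration"] / 60000.0, "minutes")'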
>
>
>
> On Fri, Apr 21, 2017 at 1:55 PM, David Moreau Simard <dms at redhat.com> wrote:
>> The performance is not great because of "rdo-ci-slave01", the node that
>> Ansible runs from.
>>
>> We all know that node has performance problems (especially i/o).
>> For example, a promote job [1] will take 1 hour and 4 minutes while the
>> equivalent generic job [2] (run on a cloudslave) will finish in about 35
>> minutes.
>>
>> I mean, it takes rdo-ci-slave01 more than five (5!) minutes just to
>> bootstrap the job (clone weirdo, set up a virtualenv with ara, ansible
>> and shade, and initialize ara).
>> The same thing takes less than 30 seconds on a cloudslave.
>>
>> [1]:
>> https://ci.centos.org/job/weirdo-master-promote-packstack-scenario001/1080/
>> [2]:
>> https://ci.centos.org/view/rdo/view/weirdo/job/weirdo-generic-packstack-scenario001/515/
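>>
>> P.S.: for context, that five-minute bootstrap amounts to roughly the
>> following shell sequence. This is a sketch from memory; the actual repo
>> URL and package versions may differ:
>>
>>   # Fetch WeIRDO and isolate its Python dependencies in a virtualenv
>>   git clone https://github.com/rdo-infra/weirdo.git && cd weirdo
>>   virtualenv .venv && source .venv/bin/activate
>>   pip install ansible ara shade
>>
>>   # "Initialize ara": point Ansible at ARA's callback plugins so that
>>   # playbook runs get recorded
>>   ara_path=$(python -c 'import os, ara; print(os.path.dirname(ara.__file__))')
>>   export ANSIBLE_CALLBACK_PLUGINS="${ara_path}/plugins/callbacks"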
>>
>> David Moreau Simard
>> Senior Software Engineer | OpenStack RDO
>>
>> dmsimard = [irc, github, twitter]
>>
>> On Apr 21, 2017 4:22 AM, "Alfredo Moralejo Alonso" <amoralej at redhat.com>
>> wrote:
>>>
>>> On Fri, Apr 21, 2017 at 2:40 AM, David Moreau Simard <dms at redhat.com>
>>> wrote:
>>> > WeIRDO jobs were tested manually on rdo-ci-slave01 (the promote slave),
>>> > on which the jobs would not run successfully yesterday.
>>> >
>>> > Everything now looks good after untangling yesterday's update issue,
>>> > and the WeIRDO promote jobs have been switched to RDO Cloud.
>>> >
>>>
>>> Nice! I've seen the weirdo jobs in
>>>
>>> https://ci.centos.org/view/rdo/view/promotion-pipeline/job/rdo_trunk-promote-master-current-tripleo/44/
>>>
>>> run in RDO Cloud with pretty good performance. They seem to run
>>> slower than jobs running on the dusty servers in ci.centos.org, but
>>> faster than on the rest of the servers.
>>>
>>> I'll keep an eye on it too to find out if there is any abnormal behavior.
>>>
>>>
>>> > I'll be monitoring this closely but let me know if you see any problems.
>>> >
>>> > David Moreau Simard
>>> > Senior Software Engineer | OpenStack RDO
>>> >
>>> > dmsimard = [irc, github, twitter]
>>> >
>>> >
>>> > On Thu, Apr 20, 2017 at 12:26 AM, David Moreau Simard <dms at redhat.com>
>>> > wrote:
>>> >> Hi,
>>> >>
>>> >> There have been a few updates worth mentioning and explaining to a
>>> >> wider audience, as far as RDO is concerned, on the ci.centos.org
>>> >> environment.
>>> >>
>>> >> First, please note that all packages on the five RDO slaves have been
>>> >> updated to the latest version; we had not yet updated them to CentOS 7.3.
>>> >>
>>> >> The rdo-ci-slave01 node (the "promotion" slave) ran into some issues
>>> >> that took some time to fix: EPEL was enabled and it picked up Python
>>> >> packages it shouldn't have.
>>> >> Things seem to be back in order now, but some jobs might have failed in
>>> >> a weird way; triggering them again should be fine.
>>> >>
>>> >> Otherwise, all generic WeIRDO jobs are now running on OpenStack
>>> >> virtual machines provided by the RDO Cloud.
>>> >> This is done through the "rdo-virtualized" slave tag.
>>> >> The "rdo-promote-virtualized" tag will be used for the WeIRDO promote
>>> >> jobs once we're sure there are no more issues running them on the
>>> >> promotion slave.
>>> >>
>>> >> These tags are designed to work with WeIRDO jobs only for the time
>>> >> being; please contact me if you'd like to run virtualized workloads
>>> >> from ci.centos.org.
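>>> >>
>>> >> If you're curious which tags (labels) a given slave carries, the
>>> >> standard Jenkins JSON API exposes them; a quick sketch, using one of
>>> >> the slaves listed below:
>>> >>
>>> >>   # List the labels assigned to a slave, e.g. rdo-ci-cloudslave03
>>> >>   curl -s https://ci.centos.org/computer/rdo-ci-cloudslave03/api/json \
>>> >>     | python -c 'import json, sys; print([l["name"] for l in json.load(sys.stdin)["assignedLabels"]])'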
>>> >>
>>> >> This amounts to around 35 fewer jobs running on Duffy-provisioned
>>> >> ci.centos.org hardware on a typical day (including both the generic
>>> >> and the promote WeIRDO jobs).
>>> >>
>>> >> I've re-shuffled the capacity around a bit, considering we've now
>>> >> freed significant capacity for bare-metal based TripleO jobs.
>>> >> The slave threads are now as follows:
>>> >> - rdo-ci-slave01: 12 threads (up from 11), tagged with "rdo-promote"
>>> >> and "rdo-promote-virtualized"
>>> >> - rdo-ci-cloudslave01: 6 threads (up from 4), tagged with "rdo"
>>> >> - rdo-ci-cloudslave02: 6 threads (up from 4), tagged with "rdo"
>>> >> - rdo-ci-cloudslave03: 8 threads (up from 4), tagged with
>>> >> "rdo-virtualized"
>>> >> - rdo-ci-cloudslave04: 8 threads (down from 15), tagged with
>>> >> "rdo-virtualized"
>>> >>
>>> >> There is a specific reason why cloudslave03 and cloudslave04 amount to
>>> >> 16 threads between the two: it matches the capacity quota we have been
>>> >> given at RDO Cloud (8 + 8 = 16 jobs against the cloud at most).
>>> >> The thread counts artificially limit how many jobs run against the
>>> >> cloud concurrently without needing to implement queueing on our end.
>>> >>
>>> >> You'll otherwise notice that the net effect for the "rdo" and
>>> >> "rdo-promote" tags isn't much, at least for the time being; capacity is
>>> >> very much the same since I've re-allocated cloudslave03 to load balance
>>> >> virtualized jobs.
>>> >> However, jobs are likely to be more reliable and faster now that they
>>> >> won't have to retry for nodes, since we're less likely to hit rate
>>> >> limiting.
>>> >>
>>> >> I'll monitor the situation over the next few days and bump the numbers
>>> >> if everything looks good.
>>> >> That said, I'd like to hear your feedback on whether things are looking
>>> >> better and whether we are running into "out of inventory" errors less
>>> >> often.
>>> >>
>>> >> Let me know if you have any questions,
>>> >>
>>> >> David Moreau Simard
>>> >> Senior Software Engineer | OpenStack RDO
>>> >>
>>> >> dmsimard = [irc, github, twitter]



