[Platformone] [EXT] Re: Unified Platform Pod Deploy Errors

Kendall, Russell C Russell.Kendall at mantech.com
Fri Dec 6 18:20:58 UTC 2019


Nine tainted pods. Running apps seem to be okay where they happened to be running at the time the taint flood occurred. This will block IATT efforts, since we cannot deploy our apps once we have remediated the vulnerabilities, nor confirm remediation with TL and Anchore (there is no local scanning capability).
V/R,
Russell C Kendall
________________________________
From: Jonathan Rickard <jrickard at redhat.com>
Sent: Friday, December 6, 2019 12:13:29 PM
To: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP
Cc: Kendall, Russell C; platformONE at redhat.com; Miller, Timothy J.; Keegan Reap; Bubb, Mike; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP; Jonathan Rickard
Subject: Re: [Platformone] [EXT] Re: Unified Platform Pod Deploy Errors

Also, is every application having problems or a specific one?


Jonathan Rickard, RHCE, RHCA

Consulting Architect

Red Hat Public Sector<https://www.redhat.com/>

jonny at redhat.com<mailto:jonny at redhat.com>
M: 210.862.9739<tel:210.862.9739>

@redhatjobs<https://twitter.com/redhatjobs>   redhatjobs<https://www.facebook.com/redhatjobs> @redhatjobs<https://instagram.com/redhatjobs>
[https://static.redhat.com/libs/redhat/brand-assets/2/corp/logo--200.png]<https://www.redhat.com/>


On Fri, Dec 6, 2019 at 12:06 PM Jonathan Rickard <jrickard at redhat.com<mailto:jrickard at redhat.com>> wrote:
Ade,

What does that mean? You can't login, you can't deploy?



On Fri, Dec 6, 2019 at 12:02 PM ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP <ademola.abodunrin at us.af.mil<mailto:ademola.abodunrin at us.af.mil>> wrote:
ALCON,

The cluster is down again. Please assist.

Most Sincerely,

Ade Abodunrin, GG-12, USAF
Product Owner (Cybertron & Ginyu Force), Unified Platform

LevelUP Code Works
Commercial: (210) 890-2113
NIPR email: ademola.abodunrin at us.af.mil<mailto:ademola.abodunrin at us.af.mil>

From: Kendall, Russell C <Russell.Kendall at ManTech.com>
Sent: Thursday, December 5, 2019 9:55 AM
To: Jonathan Rickard <jrickard at redhat.com<mailto:jrickard at redhat.com>>
Cc: Miller, Timothy J. <tmiller at mitre.org<mailto:tmiller at mitre.org>>; Keegan Reap <kreap at redhat.com<mailto:kreap at redhat.com>>; Bubb, Mike <mbubb at mitre.org<mailto:mbubb at mitre.org>>; platformONE at redhat.com<mailto:platformONE at redhat.com>; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP <jose.ramirez.50.ctr at us.af.mil<mailto:jose.ramirez.50.ctr at us.af.mil>>; ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP <ademola.abodunrin at us.af.mil<mailto:ademola.abodunrin at us.af.mil>>; Jonathan Rickard <jonny at redhat.com<mailto:jonny at redhat.com>>
Subject: [Non-DoD Source] Re: [Platformone] [EXT] Re: Unified Platform Pod Deploy Errors


Jonny,

I'll see you Friday at 500 Nav. Travel safe.



V/R,

Russell C Kendall



________________________________
From: Jonathan Rickard <jrickard at redhat.com<mailto:jrickard at redhat.com>>
Sent: Wednesday, December 4, 2019 5:29 PM
To: Kendall, Russell C
Cc: Miller, Timothy J.; Keegan Reap; Bubb, Mike; platformONE at redhat.com<mailto:platformONE at redhat.com>; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP; ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP; Jonathan Rickard
Subject: Re: [Platformone] [EXT] Re: Unified Platform Pod Deploy Errors

Russell,

I have definitely been terrible with email lately and I apologize for the slow response times. I get back to San Antonio tomorrow but I have a pretty full afternoon. I can stop by Friday if you'd like.

Thanks,
jonny


Jonathan Rickard, RHCA

Principal Consultant, NAPS

Red Hat Remote - Texas<https://www.redhat.com/>

jonny at redhat.com<mailto:jonny at redhat.com>
M: 210-862-9739<tel:210-862-9739>



On Wed, Dec 4, 2019 at 10:16 AM Kendall, Russell C <Russell.Kendall at mantech.com<mailto:Russell.Kendall at mantech.com>> wrote:
Jonny,
I'd like to suggest you come to 500 to wrap this up, since it seems there are significant delays in communication that are contributing to downtime.
V/R,
Russell C Kendall
________________________________________
From: Miller, Timothy J. <tmiller at mitre.org<mailto:tmiller at mitre.org>>
Sent: Wednesday, December 4, 2019 7:02 AM
To: Jonathan Rickard; Keegan Reap
Cc: Bubb, Mike; platformONE at redhat.com<mailto:platformONE at redhat.com>; Kendall, Russell C; RAMIREZ,    JOSE A CTR USAF AFMC AFLCMC/HNCP; ABODUNRIN,    ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP; Jonathan Rickard
Subject: Re: [Platformone] [EXT] Re: Unified Platform Pod Deploy Errors

Jonny--

Update the issue, if you would be so kind.

-- T

On 12/3/19, 18:00, "platformone-bounces at redhat.com<mailto:platformone-bounces at redhat.com> on behalf of Jonathan Rickard" <platformone-bounces at redhat.com<mailto:platformone-bounces at redhat.com> on behalf of jrickard at redhat.com<mailto:jrickard at redhat.com>> wrote:

    Hey Guys - Sorry for taking so long - this has been completed. Please run your builds and let us know if you're having any problems.
    jonny

    On Tue, Dec 3, 2019 at 3:47 PM Jonathan Rickard <jrickard at redhat.com<mailto:jrickard at redhat.com>> wrote:


    Russell / Team,


    We believe we've identified the issue with your application deploying. In order to rectify the issue I need to evacuate pods so you will probably see some hiccups while deploying. I will update when this is resolved.


    Thanks,
    jonny


    On Tue, Dec 3, 2019 at 12:53 PM Keegan Reap <kreap at redhat.com<mailto:kreap at redhat.com>> wrote:


    Hey all, we have opened the issue below for what we believe to be the cause. We are currently investigating:


    https://dccscr.dsop.io/ginyu-force/up-iatt/issues/32



    On Tue, Dec 3, 2019 at 11:27 AM Jonathan Rickard <jrickard at redhat.com<mailto:jrickard at redhat.com>> wrote:


    Russell,


    Getting more eyes on this @platformONE at redhat.com<mailto:platformONE at redhat.com>


    We'll keep you posted.
    jonny

    On Tue, Dec 3, 2019 at 11:59 AM Kendall, Russell C <Russell.Kendall at mantech.com<mailto:Russell.Kendall at mantech.com>> wrote:


    Kevin,

    Unfortunately we are receiving deployment errors again. This is the event:

    0/9 nodes are available: 1 node(s) had disk pressure, 2 node(s) had taints that the pod didn't tolerate, 6 node(s) didn't match node selector.

    This is the deployment:

    https://cluster.unified-platform.io/console/project/ccat-dev/browse/dc/driveup


    V/R,
    Russell C Kendall
    ________________________________________
    From: Miller, Timothy J. <tmiller at mitre.org<mailto:tmiller at mitre.org>>
    Sent: Monday, December 2, 2019 2:44:21 PM
    To: Kevin O'Donnell
    Cc: Kendall, Russell C; Mark Nissley; ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP; Jonathan Rickard; Chris Kuperstein; BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP; Brenna Gordon; Bubb, Mike; DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP
    Subject: Re: [EXT] Re: Unified Platform Pod Deploy Errors

    Tagged you on it.

    -- T

    On 12/2/19, 14:03, "Kevin O'Donnell" <kodonnel at redhat.com<mailto:kodonnel at redhat.com>> wrote:

        Hello,


        Autoscaling is on our future IAC roadmap.  Tim, the additional ticket would be appreciated.


        We have swapped out the app/worker instances with m5a.8xlarge (32 cores, 128 GB of RAM). Please let us know if you have any other issues.


        Thanks,

        KEVIN O'DONNELL
        ARCHITECT MANAGER
        Red Hat NA Public Sector Consulting <https://www.redhat.com/>

        kodonnell at redhat.com<mailto:kodonnell at redhat.com> M: 240-605-4654

        On Mon, Dec 2, 2019 at 2:29 PM Miller, Timothy J. <tmiller at mitre.org<mailto:tmiller at mitre.org>> wrote:


        I'll open an issue.  IaC needs to have instance size as a host_var to facilitate scaling.

        -- T

        On 12/2/19, 13:15, "Kevin O'Donnell" <kodonnel at redhat.com<mailto:kodonnel at redhat.com>> wrote:

            Tim,


            Thanks for the information. We are undersized on the app/worker nodes: the cluster has 3, and they are m5.xlarge with only 16 GB of RAM. From what I have read, each Labs engagement operated on a 3-node worker cluster with each node having 6 cores and 28 GB of RAM. We will need to swap out the existing instances for ones with larger specs.
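The sizing gap described above can be worked through as simple arithmetic. The sketch below is only illustrative: the function name is hypothetical, the figures are the ones quoted in this message, and it ignores memory reserved for the operating system and cluster services, so real schedulable capacity would be lower.

```python
def total_memory_gb(node_count: int, memory_gb_per_node: int) -> int:
    """Aggregate raw memory across identical worker nodes (no system reservations)."""
    return node_count * memory_gb_per_node


# Figures quoted in the message above (assumed to be accurate as stated).
current = total_memory_gb(3, 16)    # 3 x m5.xlarge workers
reference = total_memory_gb(3, 28)  # 3-node Labs worker cluster
shortfall = reference - current

print(current, reference, shortfall)  # 48 84 36
```

On these numbers the existing workers provide roughly 48 GB in total against about 84 GB on the reference Labs layout, a shortfall of about 36 GB before any overhead is counted.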


            We are going to try to flush the existing workload out on one of the workers to see if we can swap them out one at a time.


            Thanks,

            On Mon, Dec 2, 2019 at 2:07 PM Miller, Timothy J. <tmiller at mitre.org<mailto:tmiller at mitre.org>> wrote:


            Here's what I can see, given the perm limits I seem to be under:

            - NS:develop-misp-app and NS:lp-develop-misp-app both have several sync jobs in CrashLoopBackOff b/c of an HTTP 401 error returned while trying to fetch something from somewhere (URL isn't recorded in the stack trace).

            - NS:minishift-misp-app has most of its pods/jobs stuck in ImagePullBackOff.  No detail there in the event stream so I'll see if I can dig deeper.

            - NS:aam-ci-cd has Jenkins trying to spin up three workers, those are coming back as unschedulable.

            I can't see into NS:aam-bases or NS:dsop-images b/c of perm limits.

            I see no DAS-related project(s).

            The MISP stuff needs debugging before calling "blocked" since it looks like an internal error from this perspective.



            In re: AAM Jenkins:  If this deployment is coming out of the OCP storefront, then maybe it should be ephemeral rather than persistent.  If it's a custom deployment, then it probably needs a rethink.

            I'm also not sure why there are two MISP dev projects.

            -- T



            On 12/2/19, 12:46, "Kevin O'Donnell" <kodonnel at redhat.com<mailto:kodonnel at redhat.com>> wrote:

                Russell,


                Thank you for the information. We can switch out the instance type for the worker nodes. How much memory is required by the apps?



                Thanks,

                On Mon, Dec 2, 2019 at 1:32 PM Kendall, Russell C <Russell.Kendall at mantech.com<mailto:Russell.Kendall at mantech.com>> wrote:


                Kevin,
                The lack of resources on the u-p.io<http://u-p.io> cluster is hindering development, testing, and integration of the apps from CCAT, AAM, and DAS, which is putting one of our PI goals at risk.


                We are blocked by the fact that we (CCAT and AAM) cannot deploy additional pods to the unified-platform.io<http://unified-platform.io> cluster. We have a subset of containers deployed, but rolling deployments and new deployments fail. This means that we are not able to execute integration testing or peer reviews. We are temporarily working around this by NOT testing/reviewing our code changes live, something that no one likes. Also, we are now running weeks-old instances of our containers, so we are very likely producing some technical debt. We currently have developers approaching idle or doing non-priority work until the resource issue is resolved.



                Here is the particular error from the OCP cluster I received while attempting a redeploy of one of our apps.



                0/9 nodes are available: 1 node(s) had taints that the pod didn't tolerate, 2 Insufficient pods, 6 node(s) didn't match node selector. (11 times in the last minute)

                Since we do not have any cluster permissions, I cannot verify which resource is running out, but from experience, I assess it is a memory issue.
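For anyone triaging an event like the one above without cluster permissions, the message itself can be broken down per reason. The following is a minimal sketch, assuming the message keeps the comma-separated "N reason" shape shown in this thread; the function name is hypothetical and this is not part of any OpenShift or Kubernetes tooling.

```python
import re


def parse_failed_scheduling(message: str):
    """Split a scheduler event message into (available, total, {reason: node_count})."""
    head = re.match(r"\s*(\d+)/(\d+) nodes are available:\s*(.*)", message, re.S)
    if head is None:
        raise ValueError("message does not look like a scheduling failure event")
    available, total, rest = int(head.group(1)), int(head.group(2)), head.group(3)

    reasons = {}
    for part in rest.split(","):
        # Each part looks like "2 Insufficient pods"; drop the trailing period.
        part = part.strip().rstrip(".")
        m = re.match(r"(\d+)\s+(.*)", part)
        if m:
            reasons[m.group(2)] = int(m.group(1))
    return available, total, reasons


event = (
    "0/9 nodes are available: 1 node(s) had taints that the pod didn't tolerate, "
    "2 Insufficient pods, 6 node(s) didn't match node selector."
)
available, total, reasons = parse_failed_scheduling(event)
for reason, count in reasons.items():
    print(f"{count} of {total} nodes: {reason}")
```

Note that in upstream Kubernetes the "Insufficient pods" reason generally refers to a node's pod-count capacity rather than its memory, so per-node pod limits may be worth checking alongside memory. Whether that applies to this cluster is an assumption that would need confirming by someone with cluster access.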



                It appears the cluster has been provisioned with a silly allocation of node types. Without knowing exactly what was deployed, it appears only 3 of the 9 hosts are suitable worker nodes. We would expect the cluster to respond to resource limitations and scale, but if a scheduled downtime is required, please work with us so we can anticipate. As it stands, the cluster does not support resources required by CCAT and the other dev teams (AAM, DAS, etc.). We would accept any downtime if it will improve the situation, as we are blocked from progressing under the current constraints. My hope was we could get the cluster redeployed over the TG holiday to eliminate developer impact, but as Mark pointed out, there were limited support folks available. Now I am just trying to minimize the losses.



                V/R,

                Russell C Kendall





                ________________________________________
                From: Kevin O'Donnell <kodonnel at redhat.com<mailto:kodonnel at redhat.com>>
                Sent: Monday, December 2, 2019 11:52 AM
                To: Kendall, Russell C
                Cc: Mark Nissley; ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP; Jonathan Rickard; Chris Kuperstein; BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP; Brenna Gordon; Bubb, Mike (mbubb at mitre.org<mailto:mbubb at mitre.org>); DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP; Miller, Timothy J.; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP
                Subject: Re: Unified Platform Pod Deploy Errors

                Hello Russell,


                Can you elaborate on the term Blocked? What specific issues are the blockers?



                Thanks,

                On Mon, Dec 2, 2019 at 11:11 AM Kendall, Russell C <Russell.Kendall at mantech.com<mailto:Russell.Kendall at mantech.com>> wrote:


                Mark,

                Thanks for acknowledging. Please be aware the San Antonio dev teams working in unified-platform.io<http://unified-platform.io> are currently blocked.

                V/R,

                Russell C Kendall

                ________________________________________
                From: Mark Nissley <mnissley at redhat.com<mailto:mnissley at redhat.com>>
                Sent: Monday, December 2, 2019 9:36 AM
                To: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP; Jonathan Rickard; Chris Kuperstein
                Cc: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP; Kevin O'Donnell; Brenna Gordon; Kendall, Russell C; Bubb, Mike (mbubb at mitre.org<mailto:mbubb at mitre.org>); DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP; Miller, Timothy J.; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP
                Subject: Re: Unified Platform Pod Deploy Errors

                As noted, I don't suspect much got done on this over the holiday weekend. I did see the ticket, and dropped some details into it. I also assigned it to @Jonathan Rickard <mailto:jonny at redhat.com> and @Chris Kuperstein <mailto:ckuperst at redhat.com>.



                It looks like short term solutions have been easy but the issue is recurring.




                Mark Nissley, PMP, CSM, LEAN

                Program Manager & Sr. Technical Project Manager
                North American Consulting, Public Sector
                M: 850-530-3234

                Scheduled Training: October 14-18

                On Mon, Dec 2, 2019 at 10:21 AM ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP <ademola.abodunrin at us.af.mil<mailto:ademola.abodunrin at us.af.mil>> wrote:


                Mark/Kevin,


                I just heard at the team stand up that we are still blocked. This is also affecting the AAM team from my investigations.


                Please let me know if there is something we need to do to move this forward.

                Most Sincerely,



                ________________________________________
                From: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP
                Sent: Wednesday, November 27, 2019 12:58 PM
                To: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP <austen.bryan.1 at us.af.mil<mailto:austen.bryan.1 at us.af.mil>>; Mark Nissley <mnissley at redhat.com<mailto:mnissley at redhat.com>>; Kevin O'Donnell <kodonnel at redhat.com<mailto:kodonnel at redhat.com>>; Brenna Gordon <bgordon at redhat.com<mailto:bgordon at redhat.com>>
                Cc: Kendall, Russell C <Russell.Kendall at ManTech.com<mailto:Russell.Kendall at ManTech.com>>; Bubb, Mike (mbubb at mitre.org<mailto:mbubb at mitre.org>) <mbubb at mitre.org<mailto:mbubb at mitre.org>>; DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP <roger.dirocco.4 at us.af.mil<mailto:roger.dirocco.4 at us.af.mil>>; Miller, Timothy J. <tmiller at mitre.org<mailto:tmiller at mitre.org>>; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP <jose.ramirez.50.ctr at us.af.mil<mailto:jose.ramirez.50.ctr at us.af.mil>>
                Subject: Re: Unified Platform Pod Deploy Errors

                Thanks a lot Capt Bryan! Russell created the ticket on GitLab UP Node Project.




                Most Sincerely,



                ________________________________________
                From: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP <austen.bryan.1 at us.af.mil<mailto:austen.bryan.1 at us.af.mil>>
                Sent: Wednesday, November 27, 2019 12:56 PM
                To: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP <ademola.abodunrin at us.af.mil<mailto:ademola.abodunrin at us.af.mil>>; Mark Nissley <mnissley at redhat.com<mailto:mnissley at redhat.com>>; Kevin O'Donnell <kodonnel at redhat.com<mailto:kodonnel at redhat.com>>; Brenna Gordon <bgordon at redhat.com<mailto:bgordon at redhat.com>>
                Cc: Kendall, Russell C <Russell.Kendall at ManTech.com<mailto:Russell.Kendall at ManTech.com>>; Bubb, Mike (mbubb at mitre.org<mailto:mbubb at mitre.org>) <mbubb at mitre.org<mailto:mbubb at mitre.org>>; DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP <roger.dirocco.4 at us.af.mil<mailto:roger.dirocco.4 at us.af.mil>>; Miller, Timothy J. <tmiller at mitre.org<mailto:tmiller at mitre.org>>; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP <jose.ramirez.50.ctr at us.af.mil<mailto:jose.ramirez.50.ctr at us.af.mil>>
                Subject: RE: Unified Platform Pod Deploy Errors

                Thanks Ade. The team is thin until next week due to the holidays but I will make sure it is addressed.  Were there any issues submitted to Gitlab’s UP Node Project on DCCSCR?

                @Mark/Kevin – can we address?

                -Austen

                From: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP <ademola.abodunrin at us.af.mil<mailto:ademola.abodunrin at us.af.mil>>

                Sent: Wednesday, November 27, 2019 9:51 AM
                To: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP <austen.bryan.1 at us.af.mil<mailto:austen.bryan.1 at us.af.mil>>
                Cc: Kendall, Russell C <Russell.Kendall at ManTech.com<mailto:Russell.Kendall at ManTech.com>>; Bubb, Mike (mbubb at mitre.org<mailto:mbubb at mitre.org>) <mbubb at mitre.org<mailto:mbubb at mitre.org>>
                Subject: Fw: Unified Platform Pod Deploy Errors



                Capt Bryan,

                Please see the explanation on the issue that Ginyu Force is currently experiencing below.



                Most Sincerely,


                ________________________________________

                From: Kendall, Russell C <Russell.Kendall at ManTech.com<mailto:Russell.Kendall at ManTech.com>>
                Sent: Wednesday, November 27, 2019 9:46 AM
                To: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP <ademola.abodunrin at us.af.mil<mailto:ademola.abodunrin at us.af.mil>>; Buffaloe, Christopher <Christopher.Buffaloe at ManTech.com<mailto:Christopher.Buffaloe at ManTech.com>>; Molina, Toby <Toby.Molina at ManTech.com<mailto:Toby.Molina at ManTech.com>>; Crace, Jared E <Jared.Crace at ManTech.com<mailto:Jared.Crace at ManTech.com>>; SANCHEZ, MARK GG-13 USAF AFMC AFLCMC/HNCP <mark.sanchez.8 at us.af.mil<mailto:mark.sanchez.8 at us.af.mil>>
                Cc: tmiller at mitre.org<mailto:tmiller at mitre.org>
                Subject: [Non-DoD Source] Fw: Unified Platform Pod Deploy Errors



                Gentlemen,

                The application development teams working in the new GovCloud OCP environment (unified-platform.io<http://unified-platform.io>) are currently blocked in efforts to deploy new pods for testing, development, and UAT.

                Red Hat and RogueOne SMEs have been notified and have attempted some fixes starting on Monday 11/25, but at this point have not been able to provision resources sufficient to host CCAT and AAM.

                We have taken steps to minimize our footprint (eliminating demonstration environment, deleting developer namespaces), but this is not a sustainable approach, and has only resulted in moderate improvements in cluster performance.

                Our hope is the U-P.io cluster compute resources can be increased very soon, so that we may resume normal development activities. Our understanding is that such a scaling requires a complete redeployment of the cluster, which is unusual, but an acceptable loss to productivity. If the cluster can be scaled up over the Thanksgiving holiday, the impact will be minimal to developers and cluster administrators, alike.

                We are currently collaborating on solutions on the following MatterMost channel behind the space camp VPN (link below), and via the email thread forwarded (further below).



                https://chat.spacecamp.ninja/levelup/channels/unified-platform-node

                Please keep me posted on developments and I will coordinate developer activities with any scheduled platform outages.

                V/R,
                Russell C Kendall

                ________________________________________

                From: Curran, Daniel M
                Sent: Monday, November 25, 2019 2:47 PM
                To: Jonathan Rickard
                Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; dlystra at redhat.com<mailto:dlystra at redhat.com>; Sison, Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher; Torres, Alexander; Crace, Jared E; Middleton, Joseph J
                Subject: Re: Unified Platform Pod Deploy Errors



                Sounds great. Appreciate it.
                I'll watch email and Mattermost in case you need more from us.

                -Daniel

                ________________________________________

                From: Jonathan Rickard <jrickard at redhat.com<mailto:jrickard at redhat.com>>
                Sent: Monday, November 25, 2019 2:44 PM
                To: Curran, Daniel M
                Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; dlystra at redhat.com<mailto:dlystra at redhat.com>; Sison, Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher; Torres, Alexander; Crace, Jared E; Middleton, Joseph J
                Subject: Re: Unified Platform Pod Deploy Errors



                Thanks Daniel -



                I'll continue to look into the resource issue that you're seeing - I'd like to identify the root cause and then work with the team to come up with a solution.



                On Mon, Nov 25, 2019 at 2:17 PM Curran, Daniel M <Daniel.Curran at mantech.com<mailto:Daniel.Curran at mantech.com>> wrote:


                Yeah we hit the limit then had AAM kill some of their projects and then our pods got scheduled.
                We've hit the limit again though. Here's an example pod that cannot be scheduled



                https://cluster.unified-platform.io/console/project/ccat-dev/browse/pods/sso-8-2cjth
                They're seeing it when their jenkins slaves can't deploy but it's basically any pod after we hit some limit.

                -Daniel
                ________________________________________

                From: Jonathan Rickard <jrickard at redhat.com<mailto:jrickard at redhat.com>>
                Sent: Monday, November 25, 2019 1:26 PM
                To: Curran, Daniel M
                Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; dlystra at redhat.com<mailto:dlystra at redhat.com>; Sison, Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher; Torres, Alexander; Crace, Jared E; Middleton, Joseph J
                Subject: Re: Unified Platform Pod Deploy Errors



                Daniel,



                I can see that you have 3 mongo pods, 1 chatup and 1 upbot pod running ... is your app good to go?



                Looks like there was an issue with memory on 1 pod, then some node selector being mismatched - just what I could see in the events...






                On Mon, Nov 25, 2019 at 12:50 PM Curran, Daniel M <Daniel.Curran at mantech.com<mailto:Daniel.Curran at mantech.com>> wrote:


                Also, AAM was having similar issues. Looks like they had a lot of namespaces and scaling down the pods on their deployments didn't help but actually deleting the namespaces did.
                We have pods scheduling now but I'm adding them and we'd still like to work through what resource limit we were hitting to avoid this in the future.

                -Daniel

                ________________________________________

                From: Curran, Daniel M
                Sent: Monday, November 25, 2019 12:25 PM
                To: Jonathan Rickard
                Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; dlystra at redhat.com<mailto:dlystra at redhat.com>; Sison, Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher; Torres, Alexander
                Subject: Re: Unified Platform Pod Deploy Errors



                Thanks, sir.
                Most important for us to get working is "ccat-demo" but it's also happening in "ccat-dev" and "ccat-ci-cd".

                -Daniel
                ________________________________________

                From: Jonathan Rickard <jrickard at redhat.com>
                Sent: Monday, November 25, 2019 12:22 PM
                To: Curran, Daniel M
                Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; dlystra at redhat.com; Sison, Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher; Torres, Alexander
                Subject: Re: Unified Platform Pod Deploy Errors



                What's the name of the project you're working in? I'm going to be back at my laptop in about 30 and will take a look when I get there.



                Is it just the Jenkins pods failing?







                On Mon, Nov 25, 2019, 12:20 PM Curran, Daniel M <Daniel.Curran at mantech.com> wrote:


                Adding Dean and Alex.
                Also, I'm sitting in Mattermost if anyone needs to get online and chat for more information.

                -Daniel

                ________________________________________

                From: Curran, Daniel M
                Sent: Monday, November 25, 2019 12:07 PM
                To: jonny at redhat.com; ckuperst at redhat.com; Mark Nissley
                Cc: Sison, Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher
                Subject: Re: Unified Platform Pod Deploy Errors



                Adding Kupe and Mark.

                -Daniel
                ________________________________________

                From: Curran, Daniel M
                Sent: Monday, November 25, 2019 11:43 AM
                To: jonny at redhat.com
                Cc: Sison, Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher
                Subject: Unified Platform Pod Deploy Errors



                Hey Jonny,

                We met briefly at SpaceCAMP a couple weeks ago when cluster.unified-platform.io <http://cluster.unified-platform.io> was stood up. We've been trying to deploy some apps today, and so far we're getting errors on most (if not all) of our pods.

                0/9 nodes are available: 3 Insufficient pods, 6 node(s) didn't match node selector.

                is what we're seeing. We thought some volume types might be incorrect, but some of our pods don't even have volumes attached and still give us this error (e.g. Jenkins slaves or web frontends without persistent storage).
                Any idea what this could be? We're not running out of space on the nodes themselves, are we?
                We have a demo scheduled for tomorrow at 9:30 AM CST and are hoping to get a demo env up for them today, but this error came up unexpectedly. Also, we're here at 500 Navarro St. in San Antonio if working through this in person is better/easier.

                Thanks,
                Daniel Curran
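
The scheduler message quoted in this email packs a per-reason node count into a single line, and the counts should add up to the total number of nodes. As a rough sketch only, assuming the message always follows the shape of the line quoted above (the function name `parse_scheduler_message` is hypothetical and not part of any Kubernetes or OpenShift tooling), the reasons could be tallied like this:

```python
import re


def parse_scheduler_message(message: str) -> dict:
    """Split a scheduling-failure summary into node totals and per-reason counts.

    Assumes the format "<available>/<total> nodes are available: <n> <reason>, ...".
    """
    head = re.match(r"\s*(\d+)/(\d+) nodes are available:\s*(.*)", message)
    if head is None:
        raise ValueError("unrecognized scheduler message")
    available, total, rest = int(head.group(1)), int(head.group(2)), head.group(3)

    reasons = {}
    # Each comma-separated part looks like "<count> <reason text>".
    for part in rest.rstrip(". ").split(", "):
        m = re.match(r"(\d+)\s+(.*)", part.strip())
        if m:
            reasons[m.group(2)] = int(m.group(1))
    return {"available": available, "total": total, "reasons": reasons}


# The exact line reported in this thread.
msg = "0/9 nodes are available: 3 Insufficient pods, 6 node(s) didn't match node selector."
result = parse_scheduler_message(msg)
print(result)
```

Read this way, 6 of the 9 nodes were excluded by the node selector and the remaining 3 reported "Insufficient pods". In Kubernetes that reason generally refers to a node's pod-count capacity rather than to disk space, which would be consistent with deleting namespaces having helped, though the thread itself does not confirm the root cause.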





                ________________________________________


                This e-mail and any attachments are intended only for the use of the addressee(s) named herein and may contain proprietary information. If you are not the intended recipient of this e-mail or believe that you received this email in error, please take immediate action to notify the sender of the apparent error by reply e-mail; permanently delete the e-mail and any attachments from your computer; and do not disseminate, distribute, use, or copy this message and any attachments.

    _______________________________________________
    platformONE mailing list
    platformONE at redhat.com
    https://www.redhat.com/mailman/listinfo/platformone









