[Platformone] [EXT] Re: Unified Platform Pod Deploy Errors
Mark Nissley
mnissley at redhat.com
Fri Dec 6 18:26:08 UTC 2019
Issue created here: https://dccscr.dsop.io/ginyu-force/ginyu-force/issues/1
Mark NISSLEY, PMP, CSM, LEAN
PROGRAM MaNAGER & SR technical Project Manager
North American Consulting, Public Sector
<https://www.redhat.com/>
M: 850-530-3234
<https://www.redhat.com/>
*Scheduled Training: October 14-18*
On Fri, Dec 6, 2019 at 12:21 PM Kendall, Russell C <
Russell.Kendall at mantech.com> wrote:
> Nine tainted pods. Running apps seem to be okay, where they happened to be
> running at the time the taint flood occurred. This will block IATT efforts,
> since we cannot deploy our apps once we have remediated the
> vulnerabilities, nor confirm remediation with TL and Anchore (there is
> no local scanning capability).
> V/R,
> Russell C Kendall
> ------------------------------
> *From:* Jonathan Rickard <jrickard at redhat.com>
> *Sent:* Friday, December 6, 2019 12:13:29 PM
> *To:* ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP
> *Cc:* Kendall, Russell C; platformONE at redhat.com; Miller, Timothy J.;
> Keegan Reap; Bubb, Mike; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP;
> Jonathan Rickard
> *Subject:* Re: [Platformone] [EXT] Re: Unified Platform Pod Deploy Errors
>
> Also, is every application having problems or a specific one?
>
> Jonathan Rickard, RHCE, RHCA
>
> Consulting Architect
>
> Red Hat Public Sector <https://www.redhat.com/>
>
> jonny at redhat.com
> M: 210.862.9739
>
>
> On Fri, Dec 6, 2019 at 12:06 PM Jonathan Rickard <jrickard at redhat.com>
> wrote:
>
>> Ade,
>>
>> What does that mean? You can't log in, you can't deploy?
>>
>> Jonathan Rickard, RHCE, RHCA
>>
>> Consulting Architect
>>
>> Red Hat Public Sector <https://www.redhat.com/>
>>
>> jonny at redhat.com
>> M: 210.862.9739
>>
>>
>> On Fri, Dec 6, 2019 at 12:02 PM ABODUNRIN, ADEMOLA A GG-12 USAF AFMC
>> AFLCMC/HNCP <ademola.abodunrin at us.af.mil> wrote:
>>
>>> ALCON,
>>>
>>>
>>>
>>> The cluster is down again. Please assist.
>>>
>>>
>>>
>>> Most Sincerely,
>>>
>>>
>>>
>>> Ade Abodunrin, GG-12, USAF
>>>
>>> Product Owner (Cybertron & Ginyu Force), Unified Platform
>>>
>>>
>>>
>>>
>>> LevelUP Code Works
>>>
>>> Commercial: (210) 890-2113
>>>
>>> NIPR email: *ademola.abodunrin at us.af.mil <ademola.abodunrin at us.af.mil>*
>>>
>>>
>>>
>>> *From:* Kendall, Russell C <Russell.Kendall at ManTech.com>
>>> *Sent:* Thursday, December 5, 2019 9:55 AM
>>> *To:* Jonathan Rickard <jrickard at redhat.com>
>>> *Cc:* Miller, Timothy J. <tmiller at mitre.org>; Keegan Reap <
>>> kreap at redhat.com>; Bubb, Mike <mbubb at mitre.org>; platformONE at redhat.com;
>>> RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP <jose.ramirez.50.ctr at us.af.mil>;
>>> ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP <
>>> ademola.abodunrin at us.af.mil>; Jonathan Rickard <jonny at redhat.com>
>>> *Subject:* [Non-DoD Source] Re: [Platformone] [EXT] Re: Unified
>>> Platform Pod Deploy Errors
>>>
>>>
>>>
>>> Jonny,
>>>
>>> I'll see you Friday at 500 Nav. Travel safe.
>>>
>>>
>>>
>>> V/R,
>>>
>>> Russell C Kendall
>>>
>>>
>>> ------------------------------
>>>
>>> *From:* Jonathan Rickard <jrickard at redhat.com>
>>> *Sent:* Wednesday, December 4, 2019 5:29 PM
>>> *To:* Kendall, Russell C
>>> *Cc:* Miller, Timothy J.; Keegan Reap; Bubb, Mike;
>>> platformONE at redhat.com; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP;
>>> ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP; Jonathan Rickard
>>> *Subject:* Re: [Platformone] [EXT] Re: Unified Platform Pod Deploy
>>> Errors
>>>
>>>
>>>
>>> Russell,
>>>
>>>
>>>
>>> I have definitely been terrible with email lately and I apologize for
>>> the slow response times. I get back to San Antonio tomorrow but I have a
>>> pretty full afternoon. I can stop by Friday if you'd like.
>>>
>>>
>>>
>>> Thanks,
>>>
>>> jonny
>>>
>>>
>>>
>>> *Jonathan Rickard**, RHCA*
>>>
>>> Principal Consultant, NAPS
>>>
>>> Red Hat Remote - Texas <https://www.redhat.com/>
>>>
>>> jonny at redhat.com
>>> M: 210-862-9739
>>>
>>> <https://www.redhat.com/>
>>>
>>>
>>>
>>>
>>>
>>> On Wed, Dec 4, 2019 at 10:16 AM Kendall, Russell C <
>>> Russell.Kendall at mantech.com> wrote:
>>>
>>> Jonny,
>>> I'd like to suggest you come to 500 to wrap this up, since it seems
>>> there are significant delays in communication that are contributing to
>>> downtime.
>>> V/R,
>>> Russell C Kendall
>>> ________________________________________
>>> From: Miller, Timothy J. <tmiller at mitre.org>
>>> Sent: Wednesday, December 4, 2019 7:02 AM
>>> To: Jonathan Rickard; Keegan Reap
>>> Cc: Bubb, Mike; platformONE at redhat.com; Kendall, Russell C; RAMIREZ,
>>> JOSE A CTR USAF AFMC AFLCMC/HNCP; ABODUNRIN, ADEMOLA A GG-12 USAF AFMC
>>> AFLCMC/HNCP; Jonathan Rickard
>>> Subject: Re: [Platformone] [EXT] Re: Unified Platform Pod Deploy Errors
>>>
>>> Jonny--
>>>
>>> Update the issue, if you would be so kind.
>>>
>>> -- T
>>>
>>> On 12/3/19, 18:00, "platformone-bounces at redhat.com on behalf of
>>> Jonathan Rickard" <platformone-bounces at redhat.com on behalf of
>>> jrickard at redhat.com> wrote:
>>>
>>> Hey Guys - Sorry for taking so long - this has been completed.
>>> Please run your builds and let us know if you're having any problems.
>>> jonny
>>> Jonathan Rickard, RHCA
>>> Principal Consultant, NAPS
>>> Red Hat Remote - Texas <https://www.redhat.com/>
>>>
>>> jonny at redhat.com
>>> M: 210-862-9739
>>> <https://www.redhat.com/>
>>>
>>> On Tue, Dec 3, 2019 at 3:47 PM Jonathan Rickard <jrickard at redhat.com>
>>> wrote:
>>>
>>>
>>> Russell / Team,
>>>
>>>
>>> We believe we've identified the issue with your application
>>> deploying. In order to rectify the issue I need to evacuate pods so you
>>> will probably see some hiccups while deploying. I will update when this is
>>> resolved.
>>>
>>>
>>> Thanks,
>>> jonny
>>>
>>> Jonathan Rickard, RHCA
>>> Principal Consultant, NAPS
>>> Red Hat Remote - Texas <https://www.redhat.com/>
>>>
>>> jonny at redhat.com
>>> M: 210-862-9739
>>> <https://www.redhat.com/>
>>>
>>> On Tue, Dec 3, 2019 at 12:53 PM Keegan Reap <kreap at redhat.com>
>>> wrote:
>>>
>>>
>>> Hey all, we have opened an issue (below) for what we believe to be the
>>> cause; we are currently investigating:
>>>
>>>
>>> https://dccscr.dsop.io/ginyu-force/up-iatt/issues/32
>>>
>>>
>>>
>>> On Tue, Dec 3, 2019 at 11:27 AM Jonathan Rickard <
>>> jrickard at redhat.com> wrote:
>>>
>>>
>>> Russell,
>>>
>>>
>>> Getting more eyes on this @platformONE at redhat.com <mailto:
>>> platformONE at redhat.com>
>>>
>>>
>>> We'll keep you posted.
>>> jonny
>>> Jonathan Rickard, RHCA
>>> Principal Consultant, NAPS
>>> Red Hat Remote - Texas <https://www.redhat.com/>
>>>
>>> jonny at redhat.com
>>> M: 210-862-9739
>>> <https://www.redhat.com/>
>>>
>>> On Tue, Dec 3, 2019 at 11:59 AM Kendall, Russell C <
>>> Russell.Kendall at mantech.com> wrote:
>>>
>>>
>>> Kevin,
>>>
>>> Unfortunately we are receiving deployment errors again. This is the
>>> event:
>>>
>>> 0/9 nodes are available: 1 node(s) had disk pressure, 2 node(s) had
>>> taints that the pod didn't tolerate, 6 node(s) didn't match node selector.
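An event like the one above packs several independent causes into one sentence, so it can help to split it into a count per reason before deciding which one to chase. The following is only a minimal sketch: the function name `parse_scheduling_event` is hypothetical, and the regular expressions assume the message keeps the "N/M nodes are available: ..." shape seen in this thread.

```python
import re


def parse_scheduling_event(message: str) -> dict:
    """Break a '0/N nodes are available: ...' event into counts per reason."""
    head, _, tail = message.partition(":")
    match = re.fullmatch(r"\s*(\d+)/(\d+) nodes are available\s*", head)
    if match is None:
        raise ValueError("unrecognized scheduling message")

    reasons = {}
    for clause in tail.split(","):
        clause = clause.strip().rstrip(".")
        # Each clause looks like "2 node(s) had taints ..." or "3 Insufficient pods".
        found = re.fullmatch(r"(\d+)\s+(?:node\(s\)\s+)?(.+)", clause)
        if found:
            reasons[found.group(2)] = int(found.group(1))

    return {
        "schedulable": int(match.group(1)),
        "total": int(match.group(2)),
        "reasons": reasons,
    }


# The event text quoted in this message.
EVENT = (
    "0/9 nodes are available: 1 node(s) had disk pressure, "
    "2 node(s) had taints that the pod didn't tolerate, "
    "6 node(s) didn't match node selector."
)

if __name__ == "__main__":
    parsed = parse_scheduling_event(EVENT)
    for reason, count in parsed["reasons"].items():
        print(f"{count} of {parsed['total']} nodes: {reason}")
```

For this particular event the per-reason counts add up to the 9 nodes in the cluster, which suggests that the 3 nodes matching the node selector are the ones reporting disk pressure or taints.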
>>>
>>> This is the deployment:
>>>
>>>
>>> https://cluster.unified-platform.io/console/project/ccat-dev/browse/dc/driveup
>>>
>>>
>>> V/R,
>>> Russell C Kendall
>>> ________________________________________
>>> From: Miller, Timothy J. <tmiller at mitre.org>
>>> Sent: Monday, December 2, 2019 2:44:21 PM
>>> To: Kevin O'Donnell
>>> Cc: Kendall, Russell C; Mark Nissley; ABODUNRIN, ADEMOLA A GG-12
>>> USAF AFMC AFLCMC/HNCP; Jonathan Rickard; Chris Kuperstein; BRYAN, AUSTEN R
>>> Capt USAF AFMC AFLCMC/HNCP; Brenna Gordon; Bubb, Mike; DIROCCO, ROGER E
>>> GG-13 USAF AFMC ESC/AFLCMC/HNCP; RAMIREZ, JOSE
>>> A CTR USAF AFMC AFLCMC/HNCP
>>> Subject: Re: [EXT] Re: Unified Platform Pod Deploy Errors
>>>
>>> Tagged you on it.
>>>
>>> -- T
>>>
>>> On 12/2/19, 14:03, "Kevin O'Donnell" <kodonnel at redhat.com> wrote:
>>>
>>> Hello,
>>>
>>>
>>> Autoscaling is on our future IAC roadmap. Tim, the additional
>>> ticket would be appreciated.
>>>
>>>
>>> We have swapped out the app/worker instances with m5a.8xlarge instances (32
>>> cores, 128 GB of RAM). Please let us know if you have any other issues.
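To put the instance swap in perspective, the aggregate worker memory can be compared using only the figures quoted in this thread (16 GB per m5.xlarge worker, 28 GB per node in the Labs reference setup, 128 GB per m5a.8xlarge worker). This is a rough sketch, assuming the worker count stays at 3 after the swap, which the thread does not state explicitly; the `WorkerPool` class is purely illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkerPool:
    """A group of identical worker nodes (illustrative helper, not a real API)."""
    label: str
    nodes: int
    ram_gb_per_node: int

    @property
    def total_ram_gb(self) -> int:
        return self.nodes * self.ram_gb_per_node


# Figures taken from the thread; the node count of 3 for the replacement
# pool is an assumption.
BEFORE = WorkerPool("m5.xlarge (original)", nodes=3, ram_gb_per_node=16)
LABS = WorkerPool("Labs reference", nodes=3, ram_gb_per_node=28)
AFTER = WorkerPool("m5a.8xlarge (replacement)", nodes=3, ram_gb_per_node=128)

if __name__ == "__main__":
    for pool in (BEFORE, LABS, AFTER):
        print(f"{pool.label}: {pool.total_ram_gb} GB total worker RAM")
    print(f"Increase over original: {AFTER.total_ram_gb / BEFORE.total_ram_gb:.0f}x")
```

Under that assumption, total worker memory goes from 48 GB to 384 GB, compared with 84 GB for the Labs reference configuration.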
>>>
>>>
>>> Thanks,
>>>
>>> KEVIN O'DONNELL
>>> ARCHITECT MANAGER
>>> Red Hat NA Public Sector Consulting <https://www.redhat.com/>
>>>
>>> kodonnell at redhat.com
>>> M: 240-605-4654
>>> <https://red.ht/sig>
>>>
>>> On Mon, Dec 2, 2019 at 2:29 PM Miller, Timothy J. <
>>> tmiller at mitre.org> wrote:
>>>
>>>
>>> I'll open an issue. IaC needs to have instance size as a
>>> host_var to facilitate scaling.
>>>
>>> -- T
>>>
>>> On 12/2/19, 13:15, "Kevin O'Donnell" <kodonnel at redhat.com>
>>> wrote:
>>>
>>> Tim,
>>>
>>>
>>> Thanks for the information. We are undersized on the app/worker nodes:
>>> the cluster has 3, and they are m5.xlarge with only 16 GB of RAM. From what
>>> I have read, each Labs engagement operated on a 3-node worker cluster with
>>> each node having 6 cores and 28 GB of RAM. We will need to swap out the
>>> existing instances for ones with larger specs.
>>>
>>>
>>> We are going to try to flush the existing workload out on
>>> one of the workers to see if we can swap them out one at a time.
>>>
>>>
>>> Thanks,
>>>
>>> KEVIN O'DONNELL
>>> ARCHITECT MANAGER
>>> Red Hat NA Public Sector Consulting <https://www.redhat.com/>
>>>
>>> kodonnell at redhat.com
>>> M: 240-605-4654
>>> <https://red.ht/sig>
>>>
>>> On Mon, Dec 2, 2019 at 2:07 PM Miller, Timothy J. <
>>> tmiller at mitre.org> wrote:
>>>
>>>
>>> Here's what I can see, given the perm limits I seem to be
>>> under:
>>>
>>> - NS:develop-misp-app and NS:lp-develop-misp-app both have
>>> several sync jobs in CrashLoopBackOff b/c of an HTTP 401 error returned
>>> while trying to fetch something from somewhere (URL isn't recorded in the
>>> stack trace).
>>>
>>> - NS:minishift-misp-app has most of its pods/jobs stuck in
>>> ImagePullBackOff. No detail there in the event stream so I'll see if I can
>>> dig deeper.
>>>
>>> - NS:aam-ci-cd has Jenkins trying to spin up three workers;
>>> those are coming back as unschedulable.
>>>
>>> I can't see into NS:aam-bases or NS:dsop-images b/c of perm
>>> limits.
>>>
>>> I see no DAS-related project(s).
>>>
>>> The MISP stuff needs debugging before calling "blocked"
>>> since it looks like an internal error from this perspective.
>>>
>>>
>>>
>>> In re: AAM Jenkins: If this deployment is coming out of the
>>> OCP storefront, then maybe it should be ephemeral rather than persistent.
>>> If it's a custom deployment, then it probably needs a rethink.
>>>
>>> I'm also not sure why there are two MISP dev projects.
>>>
>>> -- T
>>>
>>>
>>>
>>> On 12/2/19, 12:46, "Kevin O'Donnell" <kodonnel at redhat.com>
>>> wrote:
>>>
>>> Russell,
>>>
>>>
>>> Thank you for the information. We can switch out the
>>> instance type for the worker nodes. How much memory is required by the apps?
>>>
>>>
>>>
>>> Thanks,
>>>
>>> KEVIN O'DONNELL
>>> ARCHITECT MANAGER
>>> Red Hat NA Public Sector Consulting <https://www.redhat.com/>
>>>
>>> kodonnell at redhat.com
>>> M: 240-605-4654
>>> <https://red.ht/sig>
>>>
>>> On Mon, Dec 2, 2019 at 1:32 PM Kendall, Russell C <
>>> Russell.Kendall at mantech.com> wrote:
>>>
>>>
>>> Kevin,
>>> The lack of resources on the u-p.io <http://u-p.io> cluster is hindering
>>> development, testing, and integration of the apps from CCAT, AAM, and DAS,
>>> which is putting one of our PI goals at risk.
>>>
>>> We are blocked by the fact that we (CCAT and AAM) cannot deploy additional
>>> pods to the unified-platform.io <http://unified-platform.io> cluster. We
>>> have a subset of containers deployed, but rolling deployments and new
>>> deployments fail. This means that we are not able to execute integration
>>> testing or peer reviews. We are temporarily working around this by NOT
>>> testing/reviewing our code changes live, something that no one likes. Also,
>>> we are now running weeks-old instances of our containers, so we are very
>>> likely producing some technical debt. We currently have developers
>>> approaching idle or doing non-priority work until the resource issue is
>>> resolved.
>>>
>>>
>>>
>>> Here is the particular error from the OCP cluster I
>>> received while attempting a redeploy of one of our apps.
>>>
>>>
>>>
>>> 0/9 nodes are available: 1 node(s) had taints that the
>>> pod didn't tolerate, 2 Insufficient pods, 6 node(s) didn't match node
>>> selector. (11 times in the last minute)
>>>
>>> Since we do not have any cluster permissions, I cannot
>>> verify which resource is running out, but from experience, I assess it is a
>>> memory issue.
>>>
>>>
>>>
>>> It appears the cluster has been provisioned with a silly
>>> allocation of node types. Without knowing exactly what was deployed, it
>>> appears only 3 of the 9 hosts are suitable worker nodes. We would expect
>>> the cluster to respond to resource limitations and scale, but if a
>>> scheduled downtime is required, please work with us so we can anticipate.
>>> As it stands, the cluster does not support resources required by CCAT and
>>> the other dev teams (AAM, DAS, etc.). We would accept any downtime if it
>>> will improve the situation, as we are blocked from progressing under the
>>> current constraints. My hope was we could get the cluster redeployed over
>>> the TG holiday to eliminate developer impact, but as Mark pointed out,
>>> there were limited support folks available. Now I am just trying to
>>> minimize the losses.
>>>
>>>
>>>
>>> V/R,
>>>
>>> Russell C Kendall
>>>
>>>
>>>
>>>
>>>
>>> ________________________________________
>>> From: Kevin O'Donnell <kodonnel at redhat.com>
>>> Sent: Monday, December 2, 2019 11:52 AM
>>> To: Kendall, Russell C
>>> Cc: Mark Nissley; ABODUNRIN, ADEMOLA A GG-12 USAF AFMC
>>> AFLCMC/HNCP; Jonathan Rickard; Chris Kuperstein; BRYAN, AUSTEN R Capt USAF
>>> AFMC AFLCMC/HNCP; Brenna Gordon; Bubb, Mike (mbubb at mitre.org);
>>> DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP; Miller,
>>> Timothy J.; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP
>>> Subject: Re: Unified Platform Pod Deploy Errors
>>>
>>> Hello Russell,
>>>
>>>
>>> Can you elaborate on the term Blocked? What specific
>>> issues are the blockers?
>>>
>>>
>>>
>>> Thanks,
>>>
>>> KEVIN O'DONNELL
>>> ARCHITECT MANAGER
>>> Red Hat NA Public Sector Consulting <https://www.redhat.com/>
>>>
>>> kodonnell at redhat.com
>>> M: 240-605-4654
>>> <https://red.ht/sig>
>>>
>>> On Mon, Dec 2, 2019 at 11:11 AM Kendall, Russell C <
>>> Russell.Kendall at mantech.com> wrote:
>>>
>>>
>>> Mark,
>>>
>>> Thanks for acknowledging. Please be aware the San Antonio dev teams
>>> working in unified-platform.io <http://unified-platform.io> are currently
>>> blocked.
>>>
>>> V/R,
>>>
>>> Russell C Kendall
>>>
>>> ________________________________________
>>> From: Mark Nissley <mnissley at redhat.com>
>>> Sent: Monday, December 2, 2019 9:36 AM
>>> To: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP;
>>> Jonathan Rickard; Chris Kuperstein
>>> Cc: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP; Kevin
>>> O'Donnell; Brenna Gordon; Kendall, Russell C; Bubb, Mike (
>>> mbubb at mitre.org); DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP;
>>> Miller, Timothy J.; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP
>>> Subject: Re: Unified Platform Pod Deploy Errors
>>>
>>> As noted, I don't suspect much got done on this over the
>>> holiday weekend. I did see the ticket, and dropped some details into it. I
>>> also assigned it to @Jonathan Rickard <mailto:jonny at redhat.com> and
>>> @Chris Kuperstein <mailto:ckuperst at redhat.com>.
>>>
>>>
>>>
>>> It looks like short term solutions have been easy but
>>> the issue is recurring.
>>>
>>>
>>>
>>>
>>> Mark NISSLEY, PMP, CSM, LEAN
>>>
>>> PROGRAM MaNAGER & SR technical Project Manager
>>> North American Consulting, Public Sector
>>> <https://www.redhat.com/>
>>> M: 850-530-3234
>>> <https://www.redhat.com/>
>>>
>>> Scheduled Training: October 14-18
>>>
>>> On Mon, Dec 2, 2019 at 10:21 AM ABODUNRIN, ADEMOLA A
>>> GG-12 USAF AFMC AFLCMC/HNCP <ademola.abodunrin at us.af.mil> wrote:
>>>
>>>
>>> Mark/Kevin,
>>>
>>>
>>> I just heard at the team stand up that we are still
>>> blocked. This is also affecting the AAM team from my investigations.
>>>
>>>
>>> Please let me know if there is something we need to do
>>> to move this forward.
>>>
>>> Most Sincerely,
>>>
>>>
>>> Ade Abodunrin, GG-12, USAF
>>> Product Owner (Cybertron & Ginyu Force), Unified Platform
>>>
>>>
>>>
>>> LevelUP Code Works
>>> Commercial: (210) 890-2113
>>> NIPR email: ademola.abodunrin at us.af.mil
>>>
>>> ________________________________________
>>> From: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP
>>> Sent: Wednesday, November 27, 2019 12:58 PM
>>> To: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP <
>>> austen.bryan.1 at us.af.mil>; Mark Nissley <mnissley at redhat.com>; Kevin
>>> O'Donnell
>>> <kodonnel at redhat.com>;
>>> Brenna Gordon <bgordon at redhat.com>
>>> Cc: Kendall, Russell C <Russell.Kendall at ManTech.com>;
>>> Bubb, Mike (mbubb at mitre.org) <mbubb at mitre.org>; DIROCCO, ROGER E GG-13
>>> USAF AFMC ESC/AFLCMC/HNCP
>>> <roger.dirocco.4 at us.af.mil>; Miller, Timothy J. <
>>> tmiller at mitre.org>; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP <
>>> jose.ramirez.50.ctr at us.af.mil>
>>> Subject: Re: Unified Platform Pod Deploy Errors
>>>
>>> Thanks a lot Capt Bryan! Russell created the ticket on
>>> GitLab UP Node Project.
>>>
>>>
>>>
>>>
>>> Most Sincerely,
>>>
>>>
>>> Ade Abodunrin, GG-12, USAF
>>> Product Owner (Cybertron & Ginyu Force), Unified Platform
>>>
>>>
>>>
>>> LevelUP Code Works
>>> Commercial: (210) 890-2113
>>> NIPR email: ademola.abodunrin at us.af.mil
>>>
>>> ________________________________________
>>> From: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP <
>>> austen.bryan.1 at us.af.mil>
>>> Sent: Wednesday, November 27, 2019 12:56 PM
>>> To: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP <
>>> ademola.abodunrin at us.af.mil>; Mark Nissley <mnissley at redhat.com>; Kevin
>>> O'Donnell
>>> <kodonnel at redhat.com>; Brenna Gordon <
>>> bgordon at redhat.com>
>>> Cc: Kendall, Russell C <Russell.Kendall at ManTech.com>;
>>> Bubb, Mike (mbubb at mitre.org) <mbubb at mitre.org>; DIROCCO, ROGER E GG-13
>>> USAF AFMC ESC/AFLCMC/HNCP
>>> <roger.dirocco.4 at us.af.mil>; Miller, Timothy J. <
>>> tmiller at mitre.org>; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP <
>>> jose.ramirez.50.ctr at us.af.mil>
>>> Subject: RE: Unified Platform Pod Deploy Errors
>>>
>>> Thanks Ade. The team is thin until next week due to the
>>> holidays but I will make sure it is addressed. Were there any issues
>>> submitted to Gitlab’s UP Node Project on DCCSCR?
>>>
>>> @Mark/Kevin – can we address?
>>>
>>> -Austen
>>>
>>> From: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP <
>>> ademola.abodunrin at us.af.mil>
>>>
>>> Sent: Wednesday, November 27, 2019 9:51 AM
>>> To: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP <
>>> austen.bryan.1 at us.af.mil>
>>> Cc: Kendall, Russell C <Russell.Kendall at ManTech.com>;
>>> Bubb, Mike (mbubb at mitre.org) <mbubb at mitre.org>
>>> Subject: Fw: Unified Platform Pod Deploy Errors
>>>
>>>
>>>
>>> Capt Bryan,
>>>
>>> Please see the explanation on the issue that Ginyu Force
>>> is currently experiencing below.
>>>
>>>
>>>
>>> Most Sincerely,
>>>
>>> Ade Abodunrin, GG-12, USAF
>>> Product Owner (Cybertron & Ginyu Force), Unified Platform
>>>
>>>
>>> LevelUP Code Works
>>> Commercial: (210) 890-2113
>>> NIPR email: ademola.abodunrin at us.af.mil
>>>
>>>
>>>
>>>
>>>
>>> ________________________________________
>>>
>>> From: Kendall, Russell C <Russell.Kendall at ManTech.com>
>>> Sent: Wednesday, November 27, 2019 9:46 AM
>>> To: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP <
>>> ademola.abodunrin at us.af.mil>; Buffaloe,
>>> Christopher <Christopher.Buffaloe at ManTech.com>;
>>> Molina, Toby <Toby.Molina at ManTech.com>;
>>> Crace, Jared E <Jared.Crace at ManTech.com>; SANCHEZ,
>>> MARK GG-13 USAF AFMC AFLCMC/HNCP <mark.sanchez.8 at us.af.mil>
>>> Cc: tmiller at mitre.org <mailto:tmiller at mitre.org>
>>> Subject: [Non-DoD Source] Fw: Unified Platform Pod
>>> Deploy Errors
>>>
>>>
>>>
>>> Gentlemen,
>>>
>>> The application development teams working in the new GovCloud OCP
>>> environment (unified-platform.io <http://unified-platform.io>) are
>>> currently blocked in efforts to deploy new pods for testing, development,
>>> and UAT.
>>>
>>> Red Hat and RogueOne SMEs have been notified and have attempted some fixes
>>> starting on Monday 11/25, but at this point have not been able to provision
>>> resources sufficient to host CCAT and AAM.
>>>
>>> We have taken steps to minimize our footprint (eliminating demonstration
>>> environment, deleting developer namespaces), but this is not a sustainable
>>> approach, and has only resulted in moderate improvements in cluster
>>> performance.
>>>
>>> Our hope is the U-P.io cluster compute resources can be increased very
>>> soon, so that we may resume normal development activities. Our
>>> understanding is that such a scaling requires a complete redeployment of
>>> the cluster, which is unusual, but an acceptable loss to productivity. If
>>> the cluster can be scaled up over the Thanksgiving holiday, the impact will
>>> be minimal to developers and cluster administrators alike.
>>>
>>> We are currently collaborating on solutions on the following MatterMost
>>> channel behind the space camp VPN (link below), and via the email thread
>>> forwarded (further below).
>>>
>>> https://chat.spacecamp.ninja/levelup/channels/unified-platform-node
>>>
>>> Please keep me posted on developments and I will
>>> coordinate developer activities with any scheduled platform outages.
>>>
>>> V/R,
>>> Russell C Kendall
>>>
>>> ________________________________________
>>>
>>> From: Curran, Daniel M
>>> Sent: Monday, November 25, 2019 2:47 PM
>>> To: Jonathan Rickard
>>> Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley;
>>> dlystra at redhat.com <mailto:dlystra at redhat.com>; Sison,
>>> Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J;
>>> Phil Soliz; Buffaloe, Christopher; Torres, Alexander; Crace, Jared E;
>>> Middleton, Joseph J
>>> Subject: Re: Unified Platform Pod Deploy Errors
>>>
>>>
>>>
>>> Sounds great. Appreciate it.
>>> I'll watch email and Mattermost in case you need more
>>> from us.
>>>
>>> -Daniel
>>>
>>> ________________________________________
>>>
>>> From: Jonathan Rickard <jrickard at redhat.com>
>>> Sent: Monday, November 25, 2019 2:44 PM
>>> To: Curran, Daniel M
>>> Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley;
>>> dlystra at redhat.com <mailto:dlystra at redhat.com>; Sison,
>>> Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J;
>>> Phil Soliz; Buffaloe, Christopher; Torres, Alexander; Crace, Jared E;
>>> Middleton, Joseph J
>>> Subject: Re: Unified Platform Pod Deploy Errors
>>>
>>>
>>>
>>> Thanks Daniel -
>>>
>>>
>>>
>>> I'll continue to look into the resource issue that
>>> you're seeing - I'd like to identify the root cause and then work with the
>>> team to come up with a solution.
>>>
>>>
>>>
>>> Jonathan Rickard, RHCA
>>> Principal Consultant, NAPS
>>> Red Hat Remote - Texas <https://www.redhat.com/>
>>> jonny at redhat.com
>>>
>>> M: 210-862-9739
>>> <https://www.redhat.com/>
>>>
>>> On Mon, Nov 25, 2019 at 2:17 PM Curran, Daniel M <
>>> Daniel.Curran at mantech.com>
>>> wrote:
>>>
>>>
>>> Yeah we hit the limit then had AAM kill some of their
>>> projects and then our pods got scheduled.
>>> We've hit the limit again though. Here's an example pod
>>> that cannot be scheduled:
>>>
>>> https://cluster.unified-platform.io/console/project/ccat-dev/browse/pods/sso-8-2cjth
>>>
>>> They're seeing it when their jenkins slaves can't deploy
>>> but it's basically any pod after we hit some limit.
>>>
>>> -Daniel
>>> ________________________________________
>>>
>>> From: Jonathan Rickard <jrickard at redhat.com>
>>> Sent: Monday, November 25, 2019 1:26 PM
>>> To: Curran, Daniel M
>>> Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley;
>>> dlystra at redhat.com <mailto:dlystra at redhat.com>; Sison,
>>> Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J;
>>> Phil Soliz; Buffaloe, Christopher; Torres, Alexander; Crace, Jared E;
>>> Middleton, Joseph J
>>> Subject: Re: Unified Platform Pod Deploy Errors
>>>
>>>
>>>
>>> Daniel,
>>>
>>>
>>>
>>> I can see that you have 3 mongo pods, 1 chatup and 1
>>> upbot pod running ... is your app good to go?
>>>
>>>
>>>
>>> Looks like there was an issue with memory on 1 pod, then
>>> some node selector being mismatched - just what I could see in the events...
>>>
>>>
>>> Jonathan Rickard, RHCA
>>> Principal Consultant, NAPS
>>> Red Hat Remote - Texas <https://www.redhat.com/>
>>> jonny at redhat.com
>>>
>>> M: 210-862-9739
>>> <https://www.redhat.com/>
>>>
>>> On Mon, Nov 25, 2019 at 12:50 PM Curran, Daniel M <
>>> Daniel.Curran at mantech.com>
>>> wrote:
>>>
>>>
>>> Also, AAM was having similar issues. Looks like they had
>>> a lot of namespaces and scaling down the pods on their deployments didn't
>>> help but actually deleting the namespaces did.
>>> We have pods scheduling now but I'm adding them and we'd
>>> still like to work through what resource limit we were hitting to avoid
>>> this in the future.
>>>
>>> -Daniel
>>>
>>> ________________________________________
>>>
>>> From: Curran, Daniel M
>>> Sent: Monday, November 25, 2019 12:25 PM
>>> To: Jonathan Rickard
>>> Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley;
>>> dlystra at redhat.com <mailto:dlystra at redhat.com>; Sison,
>>> Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J;
>>> Phil Soliz; Buffaloe, Christopher; Torres, Alexander
>>> Subject: Re: Unified Platform Pod Deploy Errors
>>>
>>>
>>>
>>> Thanks, sir.
>>> Most important for us to get working is "ccat-demo" but
>>> it's also happening in "ccat-dev" and "ccat-ci-cd".
>>>
>>> -Daniel
>>> ________________________________________
>>>
>>> From: Jonathan Rickard <jrickard at redhat.com>
>>> Sent: Monday, November 25, 2019 12:22 PM
>>> To: Curran, Daniel M
>>> Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley;
>>> dlystra at redhat.com <mailto:dlystra at redhat.com>; Sison,
>>> Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J;
>>> Phil Soliz; Buffaloe, Christopher; Torres, Alexander
>>> Subject: Re: Unified Platform Pod Deploy Errors
>>>
>>>
>>>
>>> What's the name of the project you're working in? I'm
>>> going to be back at my laptop in about 30 and will take a look when I get
>>> there.
>>>
>>>
>>>
>>> Is it just the Jenkins pods failing?
>>>
>>> On Mon, Nov 25, 2019, 12:20 PM Curran, Daniel M <
>>> Daniel.Curran at mantech.com>
>>> wrote:
>>>
>>>
>>> Adding Dean and Alex.
>>> Also, sitting in mattermost if anyone needs to get
>>> online and chat for more information.
>>>
>>> -Daniel
>>>
>>> ________________________________________
>>>
>>> From: Curran, Daniel M
>>> Sent: Monday, November 25, 2019 12:07 PM
>>> To: jonny at redhat.com <mailto:jonny at redhat.com>;
>>> ckuperst at redhat.com <mailto:ckuperst at redhat.com>; Mark Nissley
>>> Cc: Sison, Mark Anthony; Cepeda, Rolando; Kendall,
>>> Russell C; Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher
>>> Subject: Re: Unified Platform Pod Deploy Errors
>>>
>>>
>>>
>>> Adding Kupe and Mark.
>>>
>>> -Daniel
>>> ________________________________________
>>>
>>> From: Curran, Daniel M
>>> Sent: Monday, November 25, 2019 11:43 AM
>>> To: jonny at redhat.com <mailto:jonny at redhat.com>
>>> Cc: Sison, Mark Anthony; Cepeda, Rolando; Kendall,
>>> Russell C; Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher
>>> Subject: Unified Platform Pod Deploy Errors
>>>
>>>
>>>
>>> Hey Jonny,
>>>
>>> We met briefly at SpaceCAMP a couple weeks ago when
>>> cluster.unified-platform.io <http://cluster.unified-platform.io> was stood
>>> up. We've been trying to deploy some apps today and so far we're getting
>>> errors on most (if not all) of our pods.
>>>
>>> 0/9 nodes are available: 3 Insufficient pods, 6 node(s)
>>> didn't match node selector.
>>>
>>> That is what we're seeing. We were thinking it was some volume types that
>>> weren't correct, but some of our pods don't even have volumes attached and
>>> still give us this error (i.e. Jenkins slaves or web frontends without
>>> persistent storage).
>>> Any idea what this could be? We're not running out of space on the nodes
>>> themselves, are we?
>>> We have a demo scheduled for tomorrow at 9:30 AM CST and are hoping to get
>>> a demo env up for them today but this error came up unexpectedly. Also,
>>> we're here at 500 Navarro St. in San Antonio if working through this in
>>> person is better/easier.
>>>
>>> Thanks,
>>> Daniel Curran
>>>
>>>
>>>
>>>
>>>
>>> ________________________________________
>>>
>>>
>>> This e-mail and any attachments are intended only for the use of the
>>> addressee(s) named herein and may contain proprietary information. If you
>>> are not the intended recipient of this e-mail or believe that you received
>>> this email in error, please take immediate action to notify the sender of
>>> the apparent error by reply e-mail; permanently delete the e-mail and any
>>> attachments from your computer; and do not disseminate, distribute, use, or
>>> copy this message and any attachments.
>>>
>>> _______________________________________________
>>> platformONE mailing list
>>> platformONE at redhat.com
>>> https://www.redhat.com/mailman/listinfo/platformone