[Platformone] [EXT] Re: Unified Platform Pod Deploy Errors
Mark Nissley
mnissley at redhat.com
Fri Dec 6 18:13:04 UTC 2019
Screenshots are also very helpful!
Mark NISSLEY, PMP, CSM, LEAN
PROGRAM MaNAGER & SR technical Project Manager
North American Consulting, Public Sector
<https://www.redhat.com/>
M: 850-530-3234
<https://www.redhat.com/>
*Scheduled Training: October 14-18*
On Fri, Dec 6, 2019 at 12:07 PM Jonathan Rickard <jrickard at redhat.com>
wrote:
> Ade,
>
> What does that mean? You can't login, you can't deploy?
>
> Jonathan Rickard, RHCE, RHCA
>
> Consulting Architect
>
> Red Hat Public Sector <https://www.redhat.com/>
>
> jonny at redhat.com
> M: 210.862.9739
> @redhatjobs <https://twitter.com/redhatjobs> redhatjobs
> <https://www.facebook.com/redhatjobs> @redhatjobs
> <https://instagram.com/redhatjobs>
> <https://www.redhat.com/>
>
>
> On Fri, Dec 6, 2019 at 12:02 PM ABODUNRIN, ADEMOLA A GG-12 USAF AFMC
> AFLCMC/HNCP <ademola.abodunrin at us.af.mil> wrote:
>
>> ALCON,
>>
>>
>>
>> The cluster is down again. Please assist.
>>
>>
>>
>> Most Sincerely,
>>
>>
>>
>> Ade Abodunrin, GG-12, USAF
>>
>> Product Owner (Cybertron & Ginyu Force), Unified Platform
>>
>>
>>
>> [image: cid:image001.png at 01D4F814.4AA552D0]
>>
>> LevelUP Code Works
>>
>> Commercial: (210) 890-2113
>>
>> NIPR email: *ademola.abodunrin at us.af.mil <ademola.abodunrin at us.af.mil>*
>>
>>
>>
>> *From:* Kendall, Russell C <Russell.Kendall at ManTech.com>
>> *Sent:* Thursday, December 5, 2019 9:55 AM
>> *To:* Jonathan Rickard <jrickard at redhat.com>
>> *Cc:* Miller, Timothy J. <tmiller at mitre.org>; Keegan Reap <
>> kreap at redhat.com>; Bubb, Mike <mbubb at mitre.org>; platformONE at redhat.com;
>> RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP <jose.ramirez.50.ctr at us.af.mil>;
>> ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP <
>> ademola.abodunrin at us.af.mil>; Jonathan Rickard <jonny at redhat.com>
>> *Subject:* [Non-DoD Source] Re: [Platformone] [EXT] Re: Unified Platform
>> Pod Deploy Errors
>>
>>
>>
>> Jonny,
>>
>> I'll see you Friday at 500 Nav. Travel safe.
>>
>>
>>
>> V/R,
>>
>> Russell C Kendall
>>
>>
>> ------------------------------
>>
>> *From:* Jonathan Rickard <jrickard at redhat.com>
>> *Sent:* Wednesday, December 4, 2019 5:29 PM
>> *To:* Kendall, Russell C
>> *Cc:* Miller, Timothy J.; Keegan Reap; Bubb, Mike; platformONE at redhat.com;
>> RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP; ABODUNRIN, ADEMOLA A GG-12 USAF
>> AFMC AFLCMC/HNCP; Jonathan Rickard
>> *Subject:* Re: [Platformone] [EXT] Re: Unified Platform Pod Deploy Errors
>>
>>
>>
>> Russell,
>>
>>
>>
>> I have definitely been terrible with email lately and I apologize for the
>> slow response times. I get back to San Antonio tomorrow but I have a pretty
>> full afternoon. I can stop by Friday if you'd like.
>>
>>
>>
>> Thanks,
>>
>> jonny
>>
>>
>>
>> *Jonathan Rickard**, RHCA*
>>
>> Principal Consultant, NAPS
>>
>> Red Hat Remote - Texas <https://www.redhat.com/>
>>
>> jonny at redhat.com
>> M: 210-862-9739
>>
>> <https://www.redhat.com/>
>>
>>
>>
>>
>>
>> On Wed, Dec 4, 2019 at 10:16 AM Kendall, Russell C <
>> Russell.Kendall at mantech.com> wrote:
>>
>> Jonny,
>> I'd like to suggest you come to 500 to wrap this up, since it seems there
>> are significant delays in communication that are contributing to downtime.
>> V/R,
>> Russell C Kendall
>> ________________________________________
>> From: Miller, Timothy J. <tmiller at mitre.org>
>> Sent: Wednesday, December 4, 2019 7:02 AM
>> To: Jonathan Rickard; Keegan Reap
>> Cc: Bubb, Mike; platformONE at redhat.com; Kendall, Russell C; RAMIREZ,
>> JOSE A CTR USAF AFMC AFLCMC/HNCP; ABODUNRIN, ADEMOLA A GG-12 USAF AFMC
>> AFLCMC/HNCP; Jonathan Rickard
>> Subject: Re: [Platformone] [EXT] Re: Unified Platform Pod Deploy Errors
>>
>> Johnny--
>>
>> Update the issue, if you would be so kind.
>>
>> -- T
>>
>> On 12/3/19, 18:00, "platformone-bounces at redhat.com on behalf of
>> Jonathan Rickard" <platformone-bounces at redhat.com on behalf of
>> jrickard at redhat.com> wrote:
>>
>> Hey Guys - Sorry for taking so long - this has been completed. Please
>> run your builds and let us know if you're having any problems.
>> jonny
>> Jonathan Rickard, RHCA
>> Principal Consultant, NAPS
>> Red Hat Remote - Texas <https://www.redhat.com/>
>>
>> jonny at redhat.com
>> M: 210-862-9739
>> <https://www.redhat.com/>
>>
>> On Tue, Dec 3, 2019 at 3:47 PM Jonathan Rickard <jrickard at redhat.com>
>> wrote:
>>
>>
>> Russell / Team,
>>
>>
>> We believe we've identified the issue with your application
>> deploying. In order to rectify the issue I need to evacuate pods so you
>> will probably see some hiccups while deploying. I will update when this is
>> resolved.
>>
>>
>> Thanks,
>> jonny
>>
>> Jonathan Rickard, RHCA
>> Principal Consultant, NAPS
>> Red Hat Remote - Texas <https://www.redhat.com/>
>>
>> jonny at redhat.com
>> M: 210-862-9739
>> <https://www.redhat.com/>
>>
>> On Tue, Dec 3, 2019 at 12:53 PM Keegan Reap <kreap at redhat.com> wrote:
>>
>>
>> Hey all, we have opened an issue below, that we believe to be the
>> cause, we are currently investigating:
>>
>>
>> https://dccscr.dsop.io/ginyu-force/up-iatt/issues/32
>>
>>
>>
>> On Tue, Dec 3, 2019 at 11:27 AM Jonathan Rickard <jrickard at redhat.com>
>> wrote:
>>
>>
>> Russell,
>>
>>
>> Getting more eyes on this @platformONE at redhat.com <mailto:
>> platformONE at redhat.com>
>>
>>
>> We'll keep you posted.
>> jonny
>> Jonathan Rickard, RHCA
>> Principal Consultant, NAPS
>> Red Hat Remote - Texas <https://www.redhat.com/>
>>
>> jonny at redhat.com
>> M: 210-862-9739
>> <https://www.redhat.com/>
>>
>> On Tue, Dec 3, 2019 at 11:59 AM Kendall, Russell C <
>> Russell.Kendall at mantech.com> wrote:
>>
>>
>> Kevin,
>>
>> Unfortunately we are receiving deployment errors again. This is the
>> event:
>>
>> 0/9 nodes are available: 1 node(s) had disk pressure, 2 node(s) had
>> taints that the pod didn't tolerate, 6 node(s) didn't match node selector.
>>
>> This is the deployment:
>>
>>
>> https://cluster.unified-platform.io/console/project/ccat-dev/browse/dc/driveup
>>
>>
>> V/R,
>> Russell C Kendall
>> ________________________________________
>> From: Miller, Timothy J. <tmiller at mitre.org>
>> Sent: Monday, December 2, 2019 2:44:21 PM
>> To: Kevin O'Donnell
>> Cc: Kendall, Russell C; Mark Nissley; ABODUNRIN, ADEMOLA A GG-12 USAF
>> AFMC AFLCMC/HNCP; Jonathan Rickard; Chris Kuperstein; BRYAN, AUSTEN R Capt
>> USAF AFMC AFLCMC/HNCP; Brenna Gordon; Bubb, Mike; DIROCCO, ROGER E GG-13
>> USAF AFMC ESC/AFLCMC/HNCP; RAMIREZ, JOSE
>> A CTR USAF AFMC AFLCMC/HNCP
>> Subject: Re: [EXT] Re: Unified Platform Pod Deploy Errors
>>
>> Tagged you on it.
>>
>> -- T
>>
>> On 12/2/19, 14:03, "Kevin O'Donnell" <kodonnel at redhat.com> wrote:
>>
>> Hello,
>>
>>
>> Autoscaling is on our future IAC roadmap. Tim, the additional
>> ticket would be appreciated.
>>
>>
>> We have swapped out the app/worker instances with m5a.8xlarge (32
>> cores, 128 GB of RAM). Please let us know if you have any other issues.
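
To put the resize in perspective, here is a rough back-of-the-envelope sketch in Python (not from the thread). The per-node memory figures are the ones quoted in this thread; keeping the worker count at 3 after the swap is an assumption, and these are raw instance sizes rather than the memory actually allocatable to pods.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkerPool:
    """A group of identical worker nodes (illustrative helper, not a real API)."""
    name: str
    nodes: int
    mem_gb_per_node: int

    @property
    def total_mem_gb(self) -> int:
        # Raw instance memory; system reservations are ignored here.
        return self.nodes * self.mem_gb_per_node


# Figures quoted in the thread.
original = WorkerPool("m5.xlarge (original)", nodes=3, mem_gb_per_node=16)
labs = WorkerPool("Labs reference", nodes=3, mem_gb_per_node=28)
# Assumption: the worker count stays at 3 after the swap.
resized = WorkerPool("m5a.8xlarge (resized)", nodes=3, mem_gb_per_node=128)

for pool in (original, labs, resized):
    print(f"{pool.name}: {pool.total_mem_gb} GB total")

growth = resized.total_mem_gb / original.total_mem_gb
print(f"memory growth factor: {growth:.1f}x")
```

Under those assumptions the original pool had less raw memory than the Labs reference, which is consistent with the "undersized" assessment quoted further down.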
>>
>>
>> Thanks,
>>
>> KEVIN O'DONNELL
>> ARCHITECT MANAGER
>> Red Hat Red Hat NA Public Sector Consulting <
>> https://www.redhat.com/>
>>
>> kodonnell at redhat.com
>> M: 240-605-4654
>> <https://red.ht/sig>
>>
>> On Mon, Dec 2, 2019 at 2:29 PM Miller, Timothy J. <
>> tmiller at mitre.org> wrote:
>>
>>
>> I'll open an issue. IaC needs to have instance size as a
>> host_var to facilitate scaling.
>>
>> -- T
>>
>> On 12/2/19, 13:15, "Kevin O'Donnell" <kodonnel at redhat.com> wrote:
>>
>> Tim,
>>
>>
>> Thanks for the information. We are undersized on the
>> app/worker nodes: the cluster has 3, and they are m5.xlarge with only
>> 16 GB of RAM. From what I have read, each Labs engagement operated on a
>> 3-node worker cluster with each node having 6 cores and 28 GB of RAM.
>> We will need to swap out the existing instances for ones with larger
>> specs.
>>
>>
>> We are going to try to flush the existing workload out on one
>> of the workers to see if we can swap them out one at a time.
>>
>>
>> Thanks,
>>
>> KEVIN O'DONNELL
>> ARCHITECT MANAGER
>> Red Hat Red Hat NA Public Sector Consulting <
>> https://www.redhat.com/>
>>
>> kodonnell at redhat.com
>> M: 240-605-4654
>> <https://red.ht/sig>
>>
>> On Mon, Dec 2, 2019 at 2:07 PM Miller, Timothy J. <
>> tmiller at mitre.org> wrote:
>>
>>
>> Here's what I can see, given the perm limits I seem to be
>> under:
>>
>> - NS:develop-misp-app and NS:lp-develop-misp-app both have
>> several sync jobs in CrashLoopBackOff b/c of an HTTP 401 error returned
>> while trying to fetch something from somewhere (URL isn't recorded in the
>> stack trace).
>>
>> - NS:minishift-misp-app has most of its pods/jobs stuck in
>> ImagePullBackOff. No detail there in the event stream so I'll see if I can
>> dig deeper.
>>
>> - NS:aam-ci-cd has Jenkins trying to spin up three workers,
>> those are coming back as unschedulable.
>>
>> I can't see into NS:aam-bases or NS:dsop-images b/c of perm
>> limits.
>>
>> I see no DAS-related project(s).
>>
>> The MISP stuff needs debugging before calling "blocked" since
>> it looks like an internal error from this perspective.
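
The triage above can be sketched as a small lookup in Python. This is only an illustration (not from the thread): the namespaces and symptoms are the ones reported above, while the `LIKELY_CAUSE` mapping and its category names are hypothetical labels that mirror the reasoning in this message. Kubernetes spells the states `CrashLoopBackOff` and `ImagePullBackOff`.

```python
from collections import defaultdict

# (namespace, symptom) pairs as reported in the message above.
OBSERVATIONS = [
    ("develop-misp-app", "CrashLoopBackOff"),
    ("lp-develop-misp-app", "CrashLoopBackOff"),
    ("minishift-misp-app", "ImagePullBackOff"),
    ("aam-ci-cd", "Unschedulable"),
]

# Hypothetical mapping from symptom to where to look first.
LIKELY_CAUSE = {
    "CrashLoopBackOff": "application",         # container starts, then fails (here: HTTP 401)
    "ImagePullBackOff": "needs-investigation",  # no detail in the event stream yet
    "Unschedulable": "cluster-capacity",        # scheduler found no node for the pod
}


def group_by_cause(observations):
    """Group namespaces by the likely place to start debugging."""
    grouped = defaultdict(list)
    for namespace, symptom in observations:
        grouped[LIKELY_CAUSE.get(symptom, "unknown")].append(namespace)
    return dict(grouped)


grouped = group_by_cause(OBSERVATIONS)
for cause, namespaces in grouped.items():
    print(f"{cause}: {', '.join(namespaces)}")
```

Only the `cluster-capacity` bucket would count as "blocked by the platform" under this reading; the rest needs debugging on the application side first.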
>>
>>
>>
>> In re: AAM Jenkins: If this deployment is coming out of the
>> OCP storefront, then maybe it should be ephemeral rather than persistent.
>> If it's a custom deployment, then it probably needs a rethink.
>>
>> I'm also not sure why there are two MISP dev projects.
>>
>> -- T
>>
>>
>>
>> On 12/2/19, 12:46, "Kevin O'Donnell" <kodonnel at redhat.com>
>> wrote:
>>
>> Russell,
>>
>>
>> Thank you for the information. We can switch out the
>> instance type for the worker nodes. How much memory is required by the apps?
>>
>>
>>
>> Thanks,
>>
>> KEVIN O'DONNELL
>> ARCHITECT MANAGER
>> Red Hat Red Hat NA Public Sector Consulting <
>> https://www.redhat.com/>
>>
>> kodonnell at redhat.com
>> M: 240-605-4654
>> <https://red.ht/sig>
>>
>> On Mon, Dec 2, 2019 at 1:32 PM Kendall, Russell C <
>> Russell.Kendall at mantech.com> wrote:
>>
>>
>> Kevin,
>> The lack of resources on the u-p.io <http://u-p.io> cluster is
>> hindering development, testing, and integration of the apps from CCAT,
>> AAM, and DAS, which is putting one of our PI goals at risk.
>>
>>
>> We are blocked by the fact that we (CCAT and AAM) cannot
>> deploy additional pods to the unified-platform.io
>> <http://unified-platform.io> cluster. We have a subset of containers
>> deployed, but rolling deployments and new deployments fail. This means
>> that we are not able to execute integration testing or peer reviews.
>> We are temporarily working around this by NOT testing/reviewing our code
>> changes live, something that no one likes. Also, we are now running
>> weeks-old instances of our containers, so we are very likely producing
>> some technical debt. We currently have developers approaching idle or
>> doing non-priority work until the resource issue is resolved.
>>
>>
>>
>> Here is the particular error from the OSP cluster I
>> received while attempting a redeploy of one of our apps.
>>
>>
>>
>> 0/9 nodes are available: 1 node(s) had taints that the
>> pod didn't tolerate, 2 Insufficient pods, 6 node(s) didn't match node
>> selector. (11 times in the last minute)
>>
>> Since we do not have any cluster permissions, I cannot
>> verify which resource is running out, but from experience, I assess it is a
>> memory issue.
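
For anyone reading these events later, a minimal Python sketch (not from the thread) that splits a scheduler message like the one above into per-reason node counts. The function name and the returned dictionary layout are made up for illustration. As far as I know, "Insufficient pods" in upstream Kubernetes scheduler messages refers to a node's pod-count capacity and is reported separately from "Insufficient memory", so it may be worth checking that limit as well as memory.

```python
import re

# Event text as quoted above, without the trailing repeat count.
MESSAGE = (
    "0/9 nodes are available: 1 node(s) had taints that the pod didn't "
    "tolerate, 2 Insufficient pods, 6 node(s) didn't match node selector."
)


def parse_scheduler_event(message: str) -> dict:
    """Split an 'N/M nodes are available: ...' message into reason counts."""
    head, _, tail = message.partition(":")
    match = re.match(r"\s*(\d+)/(\d+) nodes are available", head)
    if not match:
        raise ValueError("unrecognized scheduler message")
    reasons = {}
    for part in tail.split(","):
        part = part.strip().rstrip(".")
        part_match = re.match(r"(\d+)\s+(.*)", part)
        if part_match:
            # Drop the leading "node(s) " so reasons are easier to compare.
            reason = re.sub(r"^node\(s\)\s+", "", part_match.group(2))
            reasons[reason] = int(part_match.group(1))
    return {
        "available": int(match.group(1)),
        "total": int(match.group(2)),
        "reasons": reasons,
    }


event = parse_scheduler_event(MESSAGE)
for reason, count in event["reasons"].items():
    print(f"{count} node(s): {reason}")
```

The per-reason counts add up to the total node count, which makes it easy to see that only the nodes reporting "Insufficient pods" were candidates for this workload at all.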
>>
>>
>>
>> It appears the cluster has been provisioned with a silly
>> allocation of node types. Without knowing exactly what was deployed, it
>> appears only 3 of the 9 hosts are suitable worker nodes. We would expect
>> the cluster to respond to resource limitations and scale, but if a
>> scheduled downtime is required, please work with us so we can anticipate
>> it. As it stands, the cluster does not support the resources required by
>> CCAT and the other dev teams (AAM, DAS, etc.). We would accept any
>> downtime if it will improve the situation, as we are blocked from
>> progressing under the current constraints. My hope was we could get the
>> cluster redeployed over the TG holiday to eliminate developer impact, but
>> as Mark pointed out, there were limited support folks available. Now I am
>> just trying to minimize the losses.
>>
>>
>>
>> V/R,
>>
>> Russell C Kendall
>>
>>
>>
>>
>>
>> ________________________________________
>> From: Kevin O'Donnell <kodonnel at redhat.com>
>> Sent: Monday, December 2, 2019 11:52 AM
>> To: Kendall, Russell C
>> Cc: Mark Nissley; ABODUNRIN, ADEMOLA A GG-12 USAF AFMC
>> AFLCMC/HNCP; Jonathan Rickard; Chris Kuperstein; BRYAN, AUSTEN R Capt USAF
>> AFMC AFLCMC/HNCP; Brenna Gordon; Bubb, Mike (mbubb at mitre.org);
>> DIROCCO,
>> ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP; Miller, Timothy
>> J.; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP
>> Subject: Re: Unified Platform Pod Deploy Errors
>>
>> Hello Russell,
>>
>>
>> Can you elaborate on the term Blocked? What specific
>> issues are the blockers?
>>
>>
>>
>> Thanks,
>>
>> KEVIN O'DONNELL
>> ARCHITECT MANAGER
>> Red Hat Red Hat NA Public Sector Consulting <
>> https://www.redhat.com/>
>>
>> kodonnell at redhat.com
>> M: 240-605-4654
>> <https://red.ht/sig>
>>
>> On Mon, Dec 2, 2019 at 11:11 AM Kendall, Russell C <
>> Russell.Kendall at mantech.com> wrote:
>>
>>
>> Mark,
>>
>> Thanks for acknowledging. Please be aware that the San Antonio
>> dev teams working in unified-platform.io <http://unified-platform.io>
>> are currently blocked.
>>
>> V/R,
>>
>> Russell C Kendall
>>
>> ________________________________________
>> From: Mark Nissley <mnissley at redhat.com>
>> Sent: Monday, December 2, 2019 9:36 AM
>> To: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP;
>> Jonathan Rickard; Chris Kuperstein
>> Cc: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP; Kevin
>> O'Donnell; Brenna Gordon; Kendall, Russell C; Bubb, Mike (mbubb at mitre.org);
>> DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP; Miller, Timothy
>> J.;
>> RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP
>> Subject: Re: Unified Platform Pod Deploy Errors
>>
>> As noted, I don't suspect much got done on this over the
>> holiday weekend. I did see the ticket, and dropped some details into it.
>> I also assigned it to @Jonathan Rickard <mailto:jonny at redhat.com> and
>> @Chris Kuperstein <mailto:ckuperst at redhat.com>.
>>
>>
>>
>> It looks like short term solutions have been easy but the
>> issue is recurring.
>>
>>
>>
>>
>> Mark NISSLEY, PMP,
>> CSM, LEAN
>>
>> PROGRAM MaNAGER & SR technical Project Manager
>> North American Consulting, Public Sector
>> <https://www.redhat.com/>
>> M:
>> 850-530-3234
>> <https://www.redhat.com/>
>>
>> Scheduled Training: October 14-18
>>
>> On Mon, Dec 2, 2019 at 10:21 AM ABODUNRIN, ADEMOLA A
>> GG-12 USAF AFMC AFLCMC/HNCP <ademola.abodunrin at us.af.mil> wrote:
>>
>>
>> Mark/Kevin,
>>
>>
>> I just heard at the team stand up that we are still
>> blocked. This is also affecting the AAM team from my investigations.
>>
>>
>> Please let me know if there is something we need to do to
>> move this forward.
>>
>> Most Sincerely,
>>
>>
>> Ade Abodunrin, GG-12, USAF
>> Product Owner (Cybertron & Ginyu Force), Unified Platform
>>
>>
>>
>> LevelUP Code Works
>> Commercial:
>> (210) 890-2113
>> NIPR email:
>> ademola.abodunrin at us.af.mil
>>
>>
>>
>>
>>
>>
>>
>>
>> ________________________________________
>> From: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP
>> Sent: Wednesday, November 27, 2019 12:58 PM
>> To: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP <
>> austen.bryan.1 at us.af.mil>; Mark Nissley <mnissley at redhat.com>; Kevin
>> O'Donnell
>> <kodonnel at redhat.com>;
>> Brenna Gordon <bgordon at redhat.com>
>> Cc: Kendall, Russell C <Russell.Kendall at ManTech.com>;
>> Bubb, Mike (mbubb at mitre.org) <mbubb at mitre.org>; DIROCCO, ROGER E GG-13
>> USAF AFMC ESC/AFLCMC/HNCP
>> <roger.dirocco.4 at us.af.mil>; Miller, Timothy J. <
>> tmiller at mitre.org>; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP <
>> jose.ramirez.50.ctr at us.af.mil>
>> Subject: Re: Unified Platform Pod Deploy Errors
>>
>> Thanks a lot Capt Bryan! Russell created the ticket on
>> GitLab UP Node Project.
>>
>>
>>
>>
>> Most Sincerely,
>>
>>
>> Ade Abodunrin, GG-12, USAF
>> Product Owner (Cybertron & Ginyu Force), Unified Platform
>>
>>
>>
>> LevelUP Code Works
>> Commercial:
>> (210) 890-2113
>> NIPR email:
>> ademola.abodunrin at us.af.mil
>>
>>
>>
>>
>>
>>
>>
>>
>> ________________________________________
>> From: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP <
>> austen.bryan.1 at us.af.mil>
>> Sent: Wednesday, November 27, 2019 12:56 PM
>> To: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP <
>> ademola.abodunrin at us.af.mil>; Mark Nissley <mnissley at redhat.com>; Kevin
>> O'Donnell
>> <kodonnel at redhat.com>; Brenna Gordon <bgordon at redhat.com
>> >
>> Cc: Kendall, Russell C <Russell.Kendall at ManTech.com>;
>> Bubb, Mike (mbubb at mitre.org) <mbubb at mitre.org>; DIROCCO, ROGER E GG-13
>> USAF AFMC ESC/AFLCMC/HNCP
>> <roger.dirocco.4 at us.af.mil>; Miller, Timothy J. <
>> tmiller at mitre.org>; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP <
>> jose.ramirez.50.ctr at us.af.mil>
>> Subject: RE: Unified Platform Pod Deploy Errors
>>
>> Thanks Ade. The team is thin until next week due to the
>> holidays but I will make sure it is addressed. Were there any issues
>> submitted to Gitlab’s UP Node Project on DCCSCR?
>>
>> @Mark/Kevin – can we address?
>>
>> -Austen
>>
>> From: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP <
>> ademola.abodunrin at us.af.mil>
>>
>> Sent: Wednesday, November 27, 2019 9:51 AM
>> To: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP <
>> austen.bryan.1 at us.af.mil>
>> Cc: Kendall, Russell C <Russell.Kendall at ManTech.com>;
>> Bubb, Mike (mbubb at mitre.org) <mbubb at mitre.org>
>> Subject: Fw: Unified Platform Pod Deploy Errors
>>
>>
>>
>> Capt Bryan,
>>
>> Please see the explanation on the issue that Ginyu Force
>> is currently experiencing below.
>>
>>
>>
>> Most Sincerely,
>>
>> Ade Abodunrin, GG-12, USAF
>> Product Owner (Cybertron & Ginyu Force), Unified Platform
>>
>>
>> LevelUP Code Works
>> Commercial: (210) 890-2113
>> NIPR email:
>> ademola.abodunrin at us.af.mil
>>
>>
>>
>>
>>
>> ________________________________________
>>
>> From: Kendall, Russell C <Russell.Kendall at ManTech.com>
>> Sent: Wednesday, November 27, 2019 9:46 AM
>> To: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP
>> <ademola.abodunrin at us.af.mil>; Buffaloe, Christopher
>> <Christopher.Buffaloe at ManTech.com>; Molina, Toby
>> <Toby.Molina at ManTech.com>; Crace, Jared E <Jared.Crace at ManTech.com>;
>> SANCHEZ, MARK GG-13 USAF AFMC AFLCMC/HNCP <mark.sanchez.8 at us.af.mil>
>> Cc: tmiller at mitre.org
>> Subject: [Non-DoD Source] Fw: Unified Platform Pod Deploy
>> Errors
>>
>>
>>
>> Gentlemen,
>>
>> The application development teams working in the new
>> GovCloud OCP environment (unified-platform.io <http://unified-platform.io>)
>> are currently blocked in efforts to deploy new pods for
>> testing, development, and UAT.
>>
>> Red Hat and RogueOne SMEs have been notified and have
>> attempted some fixes starting on Monday 11/25, but at this point have not
>> been able to provision resources
>> sufficient to host CCAT and AAM.
>>
>> We have taken steps to minimize our footprint
>> (eliminating demonstration environment, deleting developer namespaces), but
>> this is not a sustainable approach,
>> and has only resulted in moderate improvements in
>> cluster performance.
>>
>> Our hope is the U-P.io cluster compute resources can be
>> increased very soon, so that we may resume normal development activities.
>> Our understanding is that such a scaling requires a complete redeployment
>> of the cluster, which is unusual, but an acceptable loss to productivity.
>> If the cluster can be scaled up over the Thanksgiving holiday, the impact
>> will be minimal to developers and cluster administrators alike.
>>
>> We are currently collaborating on solutions on the
>> following MatterMost channel behind the space camp VPN (link below), and
>> via the email thread forwarded
>> (further below).
>>
>>
>>
>> https://chat.spacecamp.ninja/levelup/channels/unified-platform-node
>>
>> Please keep me posted on developments and I will
>> coordinate developer activities with any scheduled platform outages.
>>
>> V/R,
>> Russell C Kendall
>>
>> ________________________________________
>>
>> From: Curran, Daniel M
>> Sent: Monday, November 25, 2019 2:47 PM
>> To: Jonathan Rickard
>> Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley;
>> dlystra at redhat.com <mailto:dlystra at redhat.com>; Sison, Mark Anthony;
>> Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; Phil Soliz;
>> Buffaloe, Christopher; Torres, Alexander; Crace, Jared E;
>> Middleton, Joseph J
>> Subject: Re: Unified Platform Pod Deploy Errors
>>
>>
>>
>> Sounds great. Appreciate it.
>> I'll watch email and Mattermost in case you need more
>> from us.
>>
>> -Daniel
>>
>> ________________________________________
>>
>> From: Jonathan Rickard <jrickard at redhat.com>
>> Sent: Monday, November 25, 2019 2:44 PM
>> To: Curran, Daniel M
>> Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley;
>> dlystra at redhat.com <mailto:dlystra at redhat.com>; Sison, Mark Anthony;
>> Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; Phil Soliz;
>> Buffaloe, Christopher; Torres, Alexander; Crace, Jared E;
>> Middleton, Joseph J
>> Subject: Re: Unified Platform Pod Deploy Errors
>>
>>
>>
>> Thanks Daniel -
>>
>>
>>
>> I'll continue to look into the resource issue that you're
>> seeing - I'd like to identify the root cause and then work with the team to
>> come up with a solution.
>>
>>
>>
>> Jonathan Rickard,
>> RHCA
>> Principal Consultant, NAPS
>> Red
>> Hat Remote - Texas <https://www.redhat.com/>
>> jonny at redhat.com
>>
>> M: 210-862-9739
>> <https://www.redhat.com/>
>>
>> On Mon, Nov 25, 2019 at 2:17 PM Curran, Daniel M <
>> Daniel.Curran at mantech.com>
>> wrote:
>>
>>
>> Yeah we hit the limit then had AAM kill some of their
>> projects and then our pods got scheduled.
>> We've hit the limit again though. Here's an example pod
>> that cannot be scheduled
>>
>>
>>
>>
>> https://cluster.unified-platform.io/console/project/ccat-dev/browse/pods/sso-8-2cjth
>> They're seeing it when their jenkins slaves can't deploy
>> but it's basically any pod after we hit some limit.
>>
>> -Daniel
>> ________________________________________
>>
>> From: Jonathan Rickard <jrickard at redhat.com>
>> Sent: Monday, November 25, 2019 1:26 PM
>> To: Curran, Daniel M
>> Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley;
>> dlystra at redhat.com <mailto:dlystra at redhat.com>; Sison, Mark Anthony;
>> Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; Phil Soliz;
>> Buffaloe, Christopher; Torres, Alexander; Crace, Jared E;
>> Middleton, Joseph J
>> Subject: Re: Unified Platform Pod Deploy Errors
>>
>>
>>
>> Daniel,
>>
>>
>>
>> I can see that you have 3 mongo pods, 1 chatup and 1
>> upbot pod running ... is your app good to go?
>>
>>
>>
>> Looks like there was an issue with memory on 1 pod, then
>> some node selector being mismatched - just what i could see in the events...
>>
>>
>>
>>
>>
>>
>> Jonathan Rickard,
>> RHCA
>> Principal Consultant, NAPS
>> Red
>> Hat Remote - Texas <https://www.redhat.com/>
>> jonny at redhat.com
>>
>> M: 210-862-9739
>> <https://www.redhat.com/>
>>
>> On Mon, Nov 25, 2019 at 12:50 PM Curran, Daniel M <
>> Daniel.Curran at mantech.com>
>> wrote:
>>
>>
>> Also, AAM was having similar issues. Looks like they had
>> a lot of namespaces and scaling down the pods on their deployments didn't
>> help but actually deleting the namespaces
>> did.
>> We have pods scheduling now but I'm adding them and we'd
>> still like to work through what resource limit we were hitting to avoid
>> this in the future.
>>
>> -Daniel
>>
>> ________________________________________
>>
>> From: Curran, Daniel M
>> Sent: Monday, November 25, 2019 12:25 PM
>> To: Jonathan Rickard
>> Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley;
>> dlystra at redhat.com <mailto:dlystra at redhat.com>; Sison, Mark Anthony;
>> Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; Phil Soliz;
>> Buffaloe, Christopher; Torres, Alexander
>> Subject: Re: Unified Platform Pod Deploy Errors
>>
>>
>>
>> Thanks, sir.
>> Most important for us to get working is "ccat-demo" but
>> it's also happening in "ccat-dev" and "ccat-ci-cd".
>>
>> -Daniel
>> ________________________________________
>>
>> From: Jonathan Rickard <jrickard at redhat.com>
>> Sent: Monday, November 25, 2019 12:22 PM
>> To: Curran, Daniel M
>> Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley;
>> dlystra at redhat.com <mailto:dlystra at redhat.com>; Sison, Mark Anthony;
>> Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; Phil Soliz;
>> Buffaloe, Christopher; Torres, Alexander
>> Subject: Re: Unified Platform Pod Deploy Errors
>>
>>
>>
>> What's the name of the project you're working in? I'm
>> going to be back at my laptop in about 30 and will take a look when I get
>> there.
>>
>>
>>
>> Is it just the Jenkins pods failing?
>>
>>
>>
>>
>>
>>
>>
>> On Mon, Nov 25, 2019, 12:20 PM Curran, Daniel M <
>> Daniel.Curran at mantech.com>
>> wrote:
>>
>>
>> Adding Dean and Alex.
>> Also, sitting in mattermost if anyone needs to get online
>> and chat for more information.
>>
>> -Daniel
>>
>> ________________________________________
>>
>> From: Curran, Daniel M
>> Sent: Monday, November 25, 2019 12:07 PM
>> To:
>> jonny at redhat.com <mailto:jonny at redhat.com>;
>>
>> ckuperst at redhat.com <mailto:ckuperst at redhat.com>; Mark
>> Nissley
>> Cc: Sison, Mark Anthony; Cepeda, Rolando; Kendall,
>> Russell C; Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher
>> Subject: Re: Unified Platform Pod Deploy Errors
>>
>>
>>
>> Adding Kupe and Mark.
>>
>> -Daniel
>> ________________________________________
>>
>> From: Curran, Daniel M
>> Sent: Monday, November 25, 2019 11:43 AM
>> To:
>> jonny at redhat.com <mailto:jonny at redhat.com>
>> Cc: Sison, Mark Anthony; Cepeda, Rolando; Kendall,
>> Russell C; Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher
>> Subject: Unified Platform Pod Deploy Errors
>>
>>
>>
>> Hey Jonny,
>>
>> We met briefly at SpaceCAMP a couple weeks ago when
>> cluster.unified-platform.io <http://cluster.unified-platform.io>
>> was stood up. We've been trying to deploy some apps today and so
>> far today we're getting errors on most (if not all) of our pods.
>>
>> 0/9 nodes are available: 3 Insufficient pods, 6 node(s)
>> didn't match node selector.
>>
>> Is what we're seeing. We were thinking some volume types weren't
>> correct, but some of our pods don't even have volumes attached and still
>> give us this error (i.e. Jenkins slaves or web frontends without
>> persistent storage).
>> Any idea what this could be? We're not running out of
>> space on the nodes themselves, are we?
>> We have a demo scheduled for tomorrow at 9:30 AM CST and
>> are hoping to get a demo env up for them today, but this error came up
>> unexpectedly. Also, we're here at 500 Navarro St. in San Antonio if
>> working through this in person is better/easier.
>>
>> Thanks,
>> Daniel Curran
>>
>>
>>
>>
>>
>> ________________________________________
>>
>>
>> This e-mail and any attachments are intended only for the
>> use of the addressee(s) named herein and may contain proprietary
>> information. If you are not the intended recipient of this e-mail or
>> believe that you received this email in error, please take immediate
>> action to notify the sender of the apparent error by reply e-mail;
>> permanently delete the e-mail and any attachments from your computer;
>> and do not disseminate, distribute, use, or copy this message and any
>> attachments.
>>
>> _______________________________________________
>> platformONE mailing list
>> platformONE at redhat.com
>> https://www.redhat.com/mailman/listinfo/platformone
>>
>> _______________________________________________
> platformONE mailing list
> platformONE at redhat.com
> https://www.redhat.com/mailman/listinfo/platformone
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/platformone/attachments/20191206/1321a04a/attachment.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.png
Type: image/png
Size: 2127 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/platformone/attachments/20191206/1321a04a/attachment.png>