[Platformone] [EXT] Re: Unified Platform Pod Deploy Errors
Jonathan Rickard
jrickard at redhat.com
Fri Dec 6 18:37:26 UTC 2019
Russell,
Is CCAT the only application having problems? I see your project has a few
failed pv's.
The taints appear to be
Jonathan Rickard, RHCE, RHCA
Consulting Architect
Red Hat Public Sector <https://www.redhat.com/>
jonny at redhat.com
M: 210.862.9739
@redhatjobs <https://twitter.com/redhatjobs> redhatjobs
<https://www.facebook.com/redhatjobs> @redhatjobs
<https://instagram.com/redhatjobs>
<https://www.redhat.com/>
On Fri, Dec 6, 2019 at 12:26 PM Mark Nissley <mnissley at redhat.com> wrote:
> Issue created here:
> https://dccscr.dsop.io/ginyu-force/ginyu-force/issues/1
>
>
> Mark NISSLEY, PMP, CSM, LEAN
>
> Program Manager & Sr. Technical Project Manager
>
> North American Consulting, Public Sector
> <https://www.redhat.com/>
>
> M: 850-530-3234
>
> <https://www.redhat.com/>
>
> *Scheduled Training: October 14-18*
>
>
> On Fri, Dec 6, 2019 at 12:21 PM Kendall, Russell C <
> Russell.Kendall at mantech.com> wrote:
>
>> Nine tainted pods. Running apps seem to be okay where they happened to
>> be running at the time the taint flood occurred. This will block IATT
>> efforts, since we cannot deploy our apps once we have remediated the
>> vulnerabilities, nor confirm remediation with TL and Anchore (there is
>> no local scanning capability).
>> V/R,
>> Russell C Kendall
>> ------------------------------
>> *From:* Jonathan Rickard <jrickard at redhat.com>
>> *Sent:* Friday, December 6, 2019 12:13:29 PM
>> *To:* ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP
>> *Cc:* Kendall, Russell C; platformONE at redhat.com; Miller, Timothy J.;
>> Keegan Reap; Bubb, Mike; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP;
>> Jonathan Rickard
>> *Subject:* Re: [Platformone] [EXT] Re: Unified Platform Pod Deploy Errors
>>
>> Also, is every application having problems or a specific one?
>>
>> Jonathan Rickard, RHCE, RHCA
>>
>> Consulting Architect
>>
>> Red Hat Public Sector <https://www.redhat.com/>
>>
>> jonny at redhat.com
>> M: 210.862.9739
>>
>>
>> On Fri, Dec 6, 2019 at 12:06 PM Jonathan Rickard <jrickard at redhat.com>
>> wrote:
>>
>>> Ade,
>>>
>>> What does that mean? You can't login, you can't deploy?
>>>
>>> Jonathan Rickard, RHCE, RHCA
>>>
>>> Consulting Architect
>>>
>>> Red Hat Public Sector <https://www.redhat.com/>
>>>
>>> jonny at redhat.com
>>> M: 210.862.9739
>>>
>>>
>>> On Fri, Dec 6, 2019 at 12:02 PM ABODUNRIN, ADEMOLA A GG-12 USAF AFMC
>>> AFLCMC/HNCP <ademola.abodunrin at us.af.mil> wrote:
>>>
>>>> ALCON,
>>>>
>>>>
>>>>
>>>> The cluster is down again. Please assist.
>>>>
>>>>
>>>>
>>>> Most Sincerely,
>>>>
>>>>
>>>>
>>>> Ade Abodunrin, GG-12, USAF
>>>>
>>>> Product Owner (Cybertron & Ginyu Force), Unified Platform
>>>>
>>>>
>>>>
>>>>
>>>> LevelUP Code Works
>>>>
>>>> Commercial: (210) 890-2113
>>>>
>>>> NIPR email: *ademola.abodunrin at us.af.mil <ademola.abodunrin at us.af.mil>*
>>>>
>>>>
>>>>
>>>> *From:* Kendall, Russell C <Russell.Kendall at ManTech.com>
>>>> *Sent:* Thursday, December 5, 2019 9:55 AM
>>>> *To:* Jonathan Rickard <jrickard at redhat.com>
>>>> *Cc:* Miller, Timothy J. <tmiller at mitre.org>; Keegan Reap <
>>>> kreap at redhat.com>; Bubb, Mike <mbubb at mitre.org>; platformONE at redhat.com;
>>>> RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP <
>>>> jose.ramirez.50.ctr at us.af.mil>; ABODUNRIN, ADEMOLA A GG-12 USAF AFMC
>>>> AFLCMC/HNCP <ademola.abodunrin at us.af.mil>; Jonathan Rickard <
>>>> jonny at redhat.com>
>>>> *Subject:* [Non-DoD Source] Re: [Platformone] [EXT] Re: Unified
>>>> Platform Pod Deploy Errors
>>>>
>>>>
>>>>
>>>> Jonny,
>>>>
>>>> I'll see you Friday at 500 Nav. Travel safe.
>>>>
>>>>
>>>>
>>>> V/R,
>>>>
>>>> Russell C Kendall
>>>>
>>>>
>>>> ------------------------------
>>>>
>>>> *From:* Jonathan Rickard <jrickard at redhat.com>
>>>> *Sent:* Wednesday, December 4, 2019 5:29 PM
>>>> *To:* Kendall, Russell C
>>>> *Cc:* Miller, Timothy J.; Keegan Reap; Bubb, Mike;
>>>> platformONE at redhat.com; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP;
>>>> ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP; Jonathan Rickard
>>>> *Subject:* Re: [Platformone] [EXT] Re: Unified Platform Pod Deploy
>>>> Errors
>>>>
>>>>
>>>>
>>>> Russell,
>>>>
>>>>
>>>>
>>>> I have definitely been terrible with email lately and I apologize for
>>>> the slow response times. I get back to San Antonio tomorrow but I have a
>>>> pretty full afternoon. I can stop by Friday if you'd like.
>>>>
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> jonny
>>>>
>>>>
>>>>
>>>> Jonathan Rickard, RHCA
>>>>
>>>> Principal Consultant, NAPS
>>>>
>>>> Red Hat Remote - Texas <https://www.redhat.com/>
>>>>
>>>> jonny at redhat.com
>>>> M: 210-862-9739
>>>>
>>>> <https://www.redhat.com/>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Wed, Dec 4, 2019 at 10:16 AM Kendall, Russell C <
>>>> Russell.Kendall at mantech.com> wrote:
>>>>
>>>> Jonny,
>>>> I'd like to suggest you come to 500 to wrap this up, since it seems
>>>> there are significant delays in communication that are contributing to
>>>> downtime.
>>>> V/R,
>>>> Russell C Kendall
>>>> ________________________________________
>>>> From: Miller, Timothy J. <tmiller at mitre.org>
>>>> Sent: Wednesday, December 4, 2019 7:02 AM
>>>> To: Jonathan Rickard; Keegan Reap
>>>> Cc: Bubb, Mike; platformONE at redhat.com; Kendall, Russell C; RAMIREZ,
>>>> JOSE A CTR USAF AFMC AFLCMC/HNCP; ABODUNRIN, ADEMOLA A GG-12 USAF AFMC
>>>> AFLCMC/HNCP; Jonathan Rickard
>>>> Subject: Re: [Platformone] [EXT] Re: Unified Platform Pod Deploy Errors
>>>>
>>>> Johnny--
>>>>
>>>> Update the issue, if you would be so kind.
>>>>
>>>> -- T
>>>>
>>>> On 12/3/19, 18:00, "platformone-bounces at redhat.com on behalf of
>>>> Jonathan Rickard" <platformone-bounces at redhat.com on behalf of
>>>> jrickard at redhat.com> wrote:
>>>>
>>>> Hey Guys - Sorry for taking so long - this has been completed.
>>>> Please run your builds and let us know if you're having any problems.
>>>> jonny
>>>> Jonathan Rickard, RHCA
>>>> Principal Consultant, NAPS
>>>> Red Hat Remote - Texas <https://www.redhat.com/>
>>>>
>>>> jonny at redhat.com
>>>> M: 210-862-9739
>>>> <https://www.redhat.com/>
>>>>
>>>> On Tue, Dec 3, 2019 at 3:47 PM Jonathan Rickard <
>>>> jrickard at redhat.com> wrote:
>>>>
>>>>
>>>> Russell / Team,
>>>>
>>>>
>>>> We believe we've identified the issue with your application
>>>> deploying. In order to rectify the issue I need to evacuate pods so you
>>>> will probably see some hiccups while deploying. I will update when this is
>>>> resolved.
>>>>
>>>>
>>>> Thanks,
>>>> jonny
>>>>
>>>> Jonathan Rickard, RHCA
>>>> Principal Consultant, NAPS
>>>> Red Hat Remote - Texas <https://www.redhat.com/>
>>>>
>>>> jonny at redhat.com
>>>> M: 210-862-9739
>>>> <https://www.redhat.com/>
>>>>
>>>> On Tue, Dec 3, 2019 at 12:53 PM Keegan Reap <kreap at redhat.com>
>>>> wrote:
>>>>
>>>>
>>>> Hey all, we have opened an issue below, that we believe to be the
>>>> cause, we are currently investigating:
>>>>
>>>>
>>>> https://dccscr.dsop.io/ginyu-force/up-iatt/issues/32
>>>>
>>>>
>>>>
>>>> On Tue, Dec 3, 2019 at 11:27 AM Jonathan Rickard <
>>>> jrickard at redhat.com> wrote:
>>>>
>>>>
>>>> Russell,
>>>>
>>>>
>>>> Getting more eyes on this: @platformONE at redhat.com
>>>>
>>>>
>>>> We'll keep you posted.
>>>> jonny
>>>> Jonathan Rickard, RHCA
>>>> Principal Consultant, NAPS
>>>> Red Hat Remote - Texas <https://www.redhat.com/>
>>>>
>>>> jonny at redhat.com
>>>> M: 210-862-9739
>>>> <https://www.redhat.com/>
>>>>
>>>> On Tue, Dec 3, 2019 at 11:59 AM Kendall, Russell C <
>>>> Russell.Kendall at mantech.com> wrote:
>>>>
>>>>
>>>> Kevin,
>>>>
>>>> Unfortunately we are receiving deployment errors again. This is the
>>>> event:
>>>>
>>>> 0/9 nodes are available: 1 node(s) had disk pressure, 2 node(s) had
>>>> taints that the pod didn't tolerate, 6 node(s) didn't match node selector.
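For readers following the thread, here is a minimal sketch of how an event message like the one above can be broken into per-reason node counts. The function name `parse_scheduling_message` is hypothetical, and the parsing assumes the comma-separated "<count> <reason>" layout seen in the messages quoted in this thread; it is not part of any OpenShift or Kubernetes tooling.

```python
import re


def parse_scheduling_message(message):
    """Split a scheduler event message into (available, total, reasons).

    Hypothetical helper: assumes the "N/M nodes are available: ..." layout
    quoted in this thread, with clauses separated by commas.
    """
    header = re.match(
        r"\s*(\d+)/(\d+) nodes are available:\s*(.*)", message, re.DOTALL
    )
    if header is None:
        raise ValueError("not a recognised scheduler message")
    available, total = int(header.group(1)), int(header.group(2))
    reasons = {}
    # Each clause looks like "<count> <reason>".
    for clause in header.group(3).rstrip(". \n").split(","):
        clause = " ".join(clause.split())  # collapse any line wraps
        count, _, reason = clause.partition(" ")
        reasons[reason] = int(count)
    return available, total, reasons


message = (
    "0/9 nodes are available: 1 node(s) had disk pressure, 2 node(s) had "
    "taints that the pod didn't tolerate, 6 node(s) didn't match node selector."
)
available, total, reasons = parse_scheduling_message(message)

# List the largest group of rejected nodes first.
for reason, count in sorted(reasons.items(), key=lambda item: -item[1]):
    print(f"{count}/{total} nodes: {reason}")
```

Read this way, six of the nine nodes were never candidates for this pod because of the node selector, so only the remaining three (one under disk pressure, two tainted) are relevant to the failure.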
>>>>
>>>> This is the deployment:
>>>>
>>>>
>>>> https://cluster.unified-platform.io/console/project/ccat-dev/browse/dc/driveup
>>>>
>>>>
>>>> V/R,
>>>> Russell C Kendall
>>>> ________________________________________
>>>> From: Miller, Timothy J. <tmiller at mitre.org>
>>>> Sent: Monday, December 2, 2019 2:44:21 PM
>>>> To: Kevin O'Donnell
>>>> Cc: Kendall, Russell C; Mark Nissley; ABODUNRIN, ADEMOLA A GG-12
>>>> USAF AFMC AFLCMC/HNCP; Jonathan Rickard; Chris Kuperstein; BRYAN, AUSTEN R
>>>> Capt USAF AFMC AFLCMC/HNCP; Brenna Gordon; Bubb, Mike; DIROCCO, ROGER E
>>>> GG-13 USAF AFMC ESC/AFLCMC/HNCP; RAMIREZ, JOSE
>>>> A CTR USAF AFMC AFLCMC/HNCP
>>>> Subject: Re: [EXT] Re: Unified Platform Pod Deploy Errors
>>>>
>>>> Tagged you on it.
>>>>
>>>> -- T
>>>>
>>>> On 12/2/19, 14:03, "Kevin O'Donnell" <kodonnel at redhat.com> wrote:
>>>>
>>>> Hello,
>>>>
>>>>
>>>> Autoscaling is on our future IAC roadmap. Tim, the additional
>>>> ticket would be appreciated.
>>>>
>>>>
>>>> We have swapped out the app/worker instances for m5a.8xlarge
>>>> (32 cores, 128 GB of RAM). Please let us know if you have any other issues.
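A rough sketch of the raw memory arithmetic behind the swap, using only the figures quoted in this thread (three app/worker nodes, 16 GB each before, 128 GB each after). It assumes the worker count stayed at three, which the thread does not confirm, and it ignores memory reserved for the operating system and platform services.

```python
# Assumption: the number of app/worker nodes stayed at three after the swap.
WORKER_COUNT = 3


def pool_memory_gb(nodes, memory_per_node_gb):
    """Raw memory across a pool of identical nodes, before any reservations."""
    return nodes * memory_per_node_gb


before = pool_memory_gb(WORKER_COUNT, 16)   # m5.xlarge workers
after = pool_memory_gb(WORKER_COUNT, 128)   # m5a.8xlarge workers

print(f"before: {before} GB, after: {after} GB, factor: {after / before:.0f}x")
```

The usable figure per node will be lower than these totals, so the numbers are best read as an upper bound on what the scheduler can place.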
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> KEVIN O'DONNELL
>>>> ARCHITECT MANAGER
>>>> Red Hat Red Hat NA Public Sector Consulting <
>>>> https://www.redhat.com/>
>>>>
>>>> kodonnell at redhat.com
>>>> M: 240-605-4654
>>>> <https://red.ht/sig>
>>>>
>>>> On Mon, Dec 2, 2019 at 2:29 PM Miller, Timothy J. <
>>>> tmiller at mitre.org> wrote:
>>>>
>>>>
>>>> I'll open an issue. IaC needs to have instance size as a
>>>> host_var to facilitate scaling.
>>>>
>>>> -- T
>>>>
>>>> On 12/2/19, 13:15, "Kevin O'Donnell" <kodonnel at redhat.com>
>>>> wrote:
>>>>
>>>> Tim,
>>>>
>>>>
>>>> Thanks for the information. We are undersized on the
>>>> app/worker nodes: the cluster has 3, and they are m5.xlarge with only
>>>> 16 GB of RAM. From what I have read, each Labs engagement operated on a
>>>> 3-node worker cluster with each node having 6 cores and 28 GB of RAM.
>>>> We will need to swap out the existing instances for larger specs.
>>>>
>>>>
>>>> We are going to try to flush the existing workload out on
>>>> one of the workers to see if we can swap them out one at a time.
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> KEVIN O'DONNELL
>>>> ARCHITECT MANAGER
>>>> Red Hat Red Hat NA Public Sector Consulting <
>>>> https://www.redhat.com/>
>>>>
>>>> kodonnell at redhat.com
>>>> M: 240-605-4654
>>>> <https://red.ht/sig>
>>>>
>>>> On Mon, Dec 2, 2019 at 2:07 PM Miller, Timothy J. <
>>>> tmiller at mitre.org> wrote:
>>>>
>>>>
>>>> Here's what I can see, given the perm limits I seem to be
>>>> under:
>>>>
>>>> - NS:develop-misp-app and NS:lp-develop-misp-app both have
>>>> several sync jobs in CrashLoopBackOff b/c of an HTTP 401 error returned
>>>> while trying to fetch something from somewhere (URL isn't recorded in the
>>>> stack trace).
>>>>
>>>> - NS:minishift-misp-app has most of its pods/jobs stuck in
>>>> ImagePullBackOff. No detail there in the event stream so I'll see if I can
>>>> dig deeper.
>>>>
>>>> - NS:aam-ci-cd has Jenkins trying to spin up three workers,
>>>> those are coming back as unschedulable.
>>>>
>>>> I can't see into NS:aam-bases or NS:dsop-images b/c of perm
>>>> limits.
>>>>
>>>> I see no DAS-related project(s).
>>>>
>>>> The MISP stuff needs debugging before calling "blocked"
>>>> since it looks like an internal error from this perspective.
>>>>
>>>>
>>>>
>>>> In re: AAM Jenkins: If this deployment is coming out of
>>>> the OCP storefront, then maybe it should be ephemeral rather than
>>>> persistent. If it's a custom deployment, then it probably needs a rethink.
>>>>
>>>> I'm also not sure why there are two MISP dev projects.
>>>>
>>>> -- T
>>>>
>>>>
>>>>
>>>> On 12/2/19, 12:46, "Kevin O'Donnell" <kodonnel at redhat.com>
>>>> wrote:
>>>>
>>>> Russell,
>>>>
>>>>
>>>> Thank you for the information. We can switch out the
>>>> instance type for the worker nodes. How much memory is required by the apps?
>>>>
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> KEVIN O'DONNELL
>>>> ARCHITECT MANAGER
>>>> Red Hat Red Hat NA Public Sector Consulting <
>>>> https://www.redhat.com/>
>>>>
>>>> kodonnell at redhat.com
>>>> M: 240-605-4654
>>>> <https://red.ht/sig>
>>>>
>>>> On Mon, Dec 2, 2019 at 1:32 PM Kendall, Russell C <
>>>> Russell.Kendall at mantech.com> wrote:
>>>>
>>>>
>>>> Kevin,
>>>> The lack of resources on the u-p.io <http://u-p.io> cluster is hindering
>>>> development, testing, and integration of the apps from CCAT, AAM, and DAS,
>>>> which is putting one of our PI goals at risk.
>>>>
>>>>
>>>> We are blocked by the fact that we (CCAT and AAM) cannot deploy
>>>> additional pods to the unified-platform.io <http://unified-platform.io>
>>>> cluster. We have a subset of containers deployed, but rolling
>>>> deployments and new deployments fail. This means that we are not able to
>>>> execute integration testing or peer reviews. We are temporarily working
>>>> around this by NOT testing/reviewing our code changes live, something
>>>> that no one likes. Also, we are now running weeks-old instances of our
>>>> containers, so we are very likely producing some technical debt. We
>>>> currently have developers approaching idle or doing non-priority work
>>>> until the resource issue is resolved.
>>>>
>>>>
>>>>
>>>> Here is the particular error from the OCP cluster I
>>>> received while attempting a redeploy of one of our apps.
>>>>
>>>>
>>>>
>>>> 0/9 nodes are available: 1 node(s) had taints that the
>>>> pod didn't tolerate, 2 Insufficient pods, 6 node(s) didn't match node
>>>> selector. (11 times in the last minute)
>>>>
>>>> Since we do not have any cluster permissions, I cannot
>>>> verify which resource is running out, but from experience, I assess it is a
>>>> memory issue.
>>>>
>>>>
>>>>
>>>> It appears the cluster has been provisioned with a
>>>> silly allocation of node types. Without knowing exactly what was deployed,
>>>> it appears only 3 of the 9 hosts are suitable worker nodes. We would expect
>>>> the cluster to respond to resource limitations and scale, but if a
>>>> scheduled downtime is required, please work with us so we can anticipate
>>>> it. As it stands, the cluster does not support the resources required by
>>>> CCAT and the other dev teams (AAM, DAS, etc.). We would accept any
>>>> downtime if it will improve the situation, as we are blocked from
>>>> progressing under the current constraints. My hope was we could get the
>>>> cluster redeployed over the TG holiday to eliminate developer impact, but
>>>> as Mark pointed out, there were limited support folks available. Now I am
>>>> just trying to minimize the losses.
>>>>
>>>>
>>>>
>>>> V/R,
>>>>
>>>> Russell C Kendall
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> ________________________________________
>>>> From: Kevin O'Donnell <kodonnel at redhat.com>
>>>> Sent: Monday, December 2, 2019 11:52 AM
>>>> To: Kendall, Russell C
>>>> Cc: Mark Nissley; ABODUNRIN, ADEMOLA A GG-12 USAF AFMC
>>>> AFLCMC/HNCP; Jonathan Rickard; Chris Kuperstein; BRYAN, AUSTEN R Capt USAF
>>>> AFMC AFLCMC/HNCP; Brenna Gordon; Bubb, Mike (mbubb at mitre.org);
>>>> DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP; Miller,
>>>> Timothy J.; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP
>>>> Subject: Re: Unified Platform Pod Deploy Errors
>>>>
>>>> Hello Russell,
>>>>
>>>>
>>>> Can you elaborate on the term Blocked? What specific
>>>> issues are the blockers?
>>>>
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> KEVIN O'DONNELL
>>>> ARCHITECT MANAGER
>>>> Red Hat Red Hat NA Public Sector Consulting <
>>>> https://www.redhat.com/>
>>>>
>>>> kodonnell at redhat.com
>>>> M: 240-605-4654
>>>> <https://red.ht/sig>
>>>>
>>>> On Mon, Dec 2, 2019 at 11:11 AM Kendall, Russell C <
>>>> Russell.Kendall at mantech.com> wrote:
>>>>
>>>>
>>>> Mark,
>>>>
>>>> Thanks for acknowledging. Please be aware that the San
>>>> Antonio dev teams working in unified-platform.io
>>>> <http://unified-platform.io> are currently blocked.
>>>>
>>>> V/R,
>>>>
>>>> Russell C Kendall
>>>>
>>>> ________________________________________
>>>> From: Mark Nissley <mnissley at redhat.com>
>>>> Sent: Monday, December 2, 2019 9:36 AM
>>>> To: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP;
>>>> Jonathan Rickard; Chris Kuperstein
>>>> Cc: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP; Kevin
>>>> O'Donnell; Brenna Gordon; Kendall, Russell C; Bubb, Mike (
>>>> mbubb at mitre.org); DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP;
>>>> Miller, Timothy J.; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP
>>>> Subject: Re: Unified Platform Pod Deploy Errors
>>>>
>>>> As noted, I don't suspect much got done on this over
>>>> the holiday weekend. I did see the ticket, and dropped some details into it.
>>>> I also assigned it to @Jonathan Rickard <mailto:jonny at redhat.com> and
>>>> @Chris Kuperstein <mailto:ckuperst at redhat.com>.
>>>>
>>>>
>>>>
>>>> It looks like short term solutions have been easy but
>>>> the issue is recurring.
>>>>
>>>>
>>>>
>>>>
>>>> Mark NISSLEY, PMP, CSM, LEAN
>>>>
>>>> Program Manager & Sr. Technical Project Manager
>>>> North American Consulting, Public Sector
>>>> <https://www.redhat.com/>
>>>> M: 850-530-3234
>>>> <https://www.redhat.com/>
>>>>
>>>> Scheduled Training: October 14-18
>>>>
>>>> On Mon, Dec 2, 2019 at 10:21 AM ABODUNRIN, ADEMOLA A
>>>> GG-12 USAF AFMC AFLCMC/HNCP <ademola.abodunrin at us.af.mil> wrote:
>>>>
>>>>
>>>> Mark/Kevin,
>>>>
>>>>
>>>> I just heard at the team stand up that we are still
>>>> blocked. This is also affecting the AAM team from my investigations.
>>>>
>>>>
>>>> Please let me know if there is something we need to do
>>>> to move this forward.
>>>>
>>>> Most Sincerely,
>>>>
>>>>
>>>> Ade Abodunrin, GG-12, USAF
>>>> Product Owner (Cybertron & Ginyu Force), Unified
>>>> Platform
>>>>
>>>>
>>>>
>>>> LevelUP Code Works
>>>> Commercial: (210) 890-2113
>>>> NIPR email: ademola.abodunrin at us.af.mil
>>>>
>>>> ________________________________________
>>>> From: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP
>>>> Sent: Wednesday, November 27, 2019 12:58 PM
>>>> To: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP <
>>>> austen.bryan.1 at us.af.mil>; Mark Nissley <mnissley at redhat.com>; Kevin
>>>> O'Donnell
>>>> <kodonnel at redhat.com>;
>>>> Brenna Gordon <bgordon at redhat.com>
>>>> Cc: Kendall, Russell C <Russell.Kendall at ManTech.com>;
>>>> Bubb, Mike (mbubb at mitre.org) <mbubb at mitre.org>; DIROCCO, ROGER E GG-13
>>>> USAF AFMC ESC/AFLCMC/HNCP
>>>> <roger.dirocco.4 at us.af.mil>; Miller, Timothy J. <
>>>> tmiller at mitre.org>; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP <
>>>> jose.ramirez.50.ctr at us.af.mil>
>>>> Subject: Re: Unified Platform Pod Deploy Errors
>>>>
>>>> Thanks a lot Capt Bryan! Russell created the ticket on
>>>> GitLab UP Node Project.
>>>>
>>>>
>>>>
>>>>
>>>> Most Sincerely,
>>>>
>>>>
>>>> Ade Abodunrin, GG-12, USAF
>>>> Product Owner (Cybertron & Ginyu Force), Unified
>>>> Platform
>>>>
>>>>
>>>>
>>>> LevelUP Code Works
>>>> Commercial: (210) 890-2113
>>>> NIPR email: ademola.abodunrin at us.af.mil
>>>>
>>>> ________________________________________
>>>> From: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP <
>>>> austen.bryan.1 at us.af.mil>
>>>> Sent: Wednesday, November 27, 2019 12:56 PM
>>>> To: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP <
>>>> ademola.abodunrin at us.af.mil>; Mark Nissley <mnissley at redhat.com>; Kevin
>>>> O'Donnell
>>>> <kodonnel at redhat.com>; Brenna Gordon <
>>>> bgordon at redhat.com>
>>>> Cc: Kendall, Russell C <Russell.Kendall at ManTech.com>;
>>>> Bubb, Mike (mbubb at mitre.org) <mbubb at mitre.org>; DIROCCO, ROGER E GG-13
>>>> USAF AFMC ESC/AFLCMC/HNCP
>>>> <roger.dirocco.4 at us.af.mil>; Miller, Timothy J. <
>>>> tmiller at mitre.org>; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP <
>>>> jose.ramirez.50.ctr at us.af.mil>
>>>> Subject: RE: Unified Platform Pod Deploy Errors
>>>>
>>>> Thanks Ade. The team is thin until next week due to the
>>>> holidays but I will make sure it is addressed. Were there any issues
>>>> submitted to Gitlab’s UP Node Project on DCCSCR?
>>>>
>>>> @Mark/Kevin – can we address?
>>>>
>>>> -Austen
>>>>
>>>> From: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP <
>>>> ademola.abodunrin at us.af.mil>
>>>>
>>>> Sent: Wednesday, November 27, 2019 9:51 AM
>>>> To: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP <
>>>> austen.bryan.1 at us.af.mil>
>>>> Cc: Kendall, Russell C <Russell.Kendall at ManTech.com>;
>>>> Bubb, Mike (mbubb at mitre.org) <mbubb at mitre.org>
>>>> Subject: Fw: Unified Platform Pod Deploy Errors
>>>>
>>>>
>>>>
>>>> Capt Bryan,
>>>>
>>>> Please see the explanation on the issue that Ginyu
>>>> Force is currently experiencing below.
>>>>
>>>>
>>>>
>>>> Most Sincerely,
>>>>
>>>> Ade Abodunrin, GG-12, USAF
>>>> Product Owner (Cybertron & Ginyu Force), Unified
>>>> Platform
>>>>
>>>>
>>>> LevelUP Code Works
>>>> Commercial: (210) 890-2113
>>>> NIPR email: ademola.abodunrin at us.af.mil
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> ________________________________________
>>>>
>>>> From: Kendall, Russell C <Russell.Kendall at ManTech.com>
>>>> Sent: Wednesday, November 27, 2019 9:46 AM
>>>> To: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP <
>>>> ademola.abodunrin at us.af.mil>; Buffaloe, Christopher
>>>> <Christopher.Buffaloe at ManTech.com>; Molina, Toby
>>>> <Toby.Molina at ManTech.com>; Crace, Jared E <Jared.Crace at ManTech.com>;
>>>> SANCHEZ, MARK GG-13 USAF AFMC AFLCMC/HNCP <mark.sanchez.8 at us.af.mil>
>>>> Cc: tmiller at mitre.org
>>>> Subject: [Non-DoD Source] Fw: Unified Platform Pod
>>>> Deploy Errors
>>>>
>>>>
>>>>
>>>> Gentlemen,
>>>>
>>>> The application development teams working in the new
>>>> GovCloud OCP environment (unified-platform.io
>>>> <http://unified-platform.io>) are currently blocked in efforts to deploy
>>>> new pods for testing, development, and UAT.
>>>>
>>>> Red Hat and RogueOne SMEs have been notified and have
>>>> attempted some fixes starting on Monday 11/25, but at this point have not
>>>> been able to provision resources
>>>> sufficient to host CCAT and AAM.
>>>>
>>>> We have taken steps to minimize our footprint
>>>> (eliminating demonstration environment, deleting developer namespaces), but
>>>> this is not a sustainable approach,
>>>> and has only resulted in moderate improvements in
>>>> cluster performance.
>>>>
>>>> Our hope is the U-P.io cluster compute resources can be
>>>> increased very soon, so that we may resume normal development activities.
>>>> Our understanding is that such a scaling requires a complete redeployment
>>>> of the cluster, which is unusual, but an acceptable loss to productivity.
>>>> If the cluster can be scaled up over the Thanksgiving holiday, the impact
>>>> will be minimal to developers and cluster administrators alike.
>>>>
>>>> We are currently collaborating on solutions on the
>>>> following MatterMost channel behind the space camp VPN (link below), and
>>>> via the email thread forwarded
>>>> (further below).
>>>>
>>>>
>>>>
>>>> https://chat.spacecamp.ninja/levelup/channels/unified-platform-node
>>>>
>>>> Please keep me posted on developments and I will
>>>> coordinate developer activities with any scheduled platform outages.
>>>>
>>>> V/R,
>>>> Russell C Kendall
>>>>
>>>> ________________________________________
>>>>
>>>> From: Curran, Daniel M
>>>> Sent: Monday, November 25, 2019 2:47 PM
>>>> To: Jonathan Rickard
>>>> Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley;
>>>> dlystra at redhat.com; Sison, Mark Anthony; Cepeda, Rolando;
>>>> Kendall, Russell C; Andrichak IV, John J; Phil Soliz;
>>>> Buffaloe, Christopher; Torres, Alexander; Crace, Jared E;
>>>> Middleton, Joseph J
>>>> Subject: Re: Unified Platform Pod Deploy Errors
>>>>
>>>>
>>>>
>>>> Sounds great. Appreciate it.
>>>> I'll watch email and Mattermost in case you need more
>>>> from us.
>>>>
>>>> -Daniel
>>>>
>>>> ________________________________________
>>>>
>>>> From: Jonathan Rickard <jrickard at redhat.com>
>>>> Sent: Monday, November 25, 2019 2:44 PM
>>>> To: Curran, Daniel M
>>>> Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley;
>>>> dlystra at redhat.com; Sison, Mark Anthony; Cepeda, Rolando;
>>>> Kendall, Russell C; Andrichak IV, John J; Phil Soliz;
>>>> Buffaloe, Christopher; Torres, Alexander; Crace, Jared E;
>>>> Middleton, Joseph J
>>>> Subject: Re: Unified Platform Pod Deploy Errors
>>>>
>>>>
>>>>
>>>> Thanks Daniel -
>>>>
>>>>
>>>>
>>>> I'll continue to look into the resource issue that
>>>> you're seeing - I'd like to identify the root cause and then work with the
>>>> team to come up with a solution.
>>>>
>>>>
>>>>
>>>> Jonathan Rickard, RHCA
>>>> Principal Consultant, NAPS
>>>> Red Hat Remote - Texas <https://www.redhat.com/>
>>>> jonny at redhat.com
>>>>
>>>> M: 210-862-9739
>>>> <https://www.redhat.com/>
>>>>
>>>> On Mon, Nov 25, 2019 at 2:17 PM Curran, Daniel M <
>>>> Daniel.Curran at mantech.com>
>>>> wrote:
>>>>
>>>>
>>>> Yeah we hit the limit then had AAM kill some of their
>>>> projects and then our pods got scheduled.
>>>> We've hit the limit again though. Here's an example pod
>>>> that cannot be scheduled
>>>>
>>>>
>>>>
>>>>
>>>> https://cluster.unified-platform.io/console/project/ccat-dev/browse/pods/sso-8-2cjth
>>>> They're seeing it when their jenkins slaves can't
>>>> deploy but it's basically any pod after we hit some limit.
>>>>
>>>> -Daniel
>>>> ________________________________________
>>>>
>>>> From: Jonathan Rickard <jrickard at redhat.com>
>>>> Sent: Monday, November 25, 2019 1:26 PM
>>>> To: Curran, Daniel M
>>>> Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley;
>>>> dlystra at redhat.com; Sison, Mark Anthony; Cepeda, Rolando;
>>>> Kendall, Russell C; Andrichak IV, John J; Phil Soliz;
>>>> Buffaloe, Christopher; Torres, Alexander; Crace, Jared E;
>>>> Middleton, Joseph J
>>>> Subject: Re: Unified Platform Pod Deploy Errors
>>>>
>>>>
>>>>
>>>> Daniel,
>>>>
>>>>
>>>>
>>>> I can see that you have 3 mongo pods, 1 chatup and 1
>>>> upbot pod running ... is your app good to go?
>>>>
>>>>
>>>>
>>>> Looks like there was an issue with memory on 1 pod,
>>>> then some node selector being mismatched - just what I could see in the
>>>> events...
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Jonathan Rickard, RHCA
>>>> Principal Consultant, NAPS
>>>> Red Hat Remote - Texas <https://www.redhat.com/>
>>>> jonny at redhat.com
>>>>
>>>> M: 210-862-9739
>>>> <https://www.redhat.com/>
>>>>
>>>> On Mon, Nov 25, 2019 at 12:50 PM Curran, Daniel M <
>>>> Daniel.Curran at mantech.com>
>>>> wrote:
>>>>
>>>>
>>>> Also, AAM was having similar issues. Looks like they
>>>> had a lot of namespaces and scaling down the pods on their deployments
>>>> didn't help but actually deleting the namespaces
>>>> did.
>>>> We have pods scheduling now but I'm adding them and
>>>> we'd still like to work through what resource limit we were hitting to
>>>> avoid this in the future.
>>>>
>>>> -Daniel
>>>>
>>>> ________________________________________
>>>>
>>>> From: Curran, Daniel M
>>>> Sent: Monday, November 25, 2019 12:25 PM
>>>> To: Jonathan Rickard
>>>> Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley;
>>>> dlystra at redhat.com; Sison, Mark Anthony; Cepeda, Rolando;
>>>> Kendall, Russell C; Andrichak IV, John J; Phil Soliz;
>>>> Buffaloe, Christopher; Torres, Alexander
>>>> Subject: Re: Unified Platform Pod Deploy Errors
>>>>
>>>>
>>>>
>>>> Thanks, sir.
>>>> Most important for us to get working is "ccat-demo" but
>>>> it's also happening in "ccat-dev" and "ccat-ci-cd".
>>>>
>>>> -Daniel
>>>> ________________________________________
>>>>
>>>> From: Jonathan Rickard <jrickard at redhat.com>
>>>> Sent: Monday, November 25, 2019 12:22 PM
>>>> To: Curran, Daniel M
>>>> Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley;
>>>> dlystra at redhat.com; Sison, Mark Anthony; Cepeda, Rolando;
>>>> Kendall, Russell C; Andrichak IV, John J; Phil Soliz;
>>>> Buffaloe, Christopher; Torres, Alexander
>>>> Subject: Re: Unified Platform Pod Deploy Errors
>>>>
>>>>
>>>>
>>>> What's the name of the project you're working in? I'm
>>>> going to be back at my laptop in about 30 and will take a look when I get
>>>> there.
>>>>
>>>>
>>>>
>>>> Is it just the Jenkins pods failing?
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Mon, Nov 25, 2019, 12:20 PM Curran, Daniel M <
>>>> Daniel.Curran at mantech.com>
>>>> wrote:
>>>>
>>>>
>>>> Adding Dean and Alex.
>>>> Also, sitting in mattermost if anyone needs to get
>>>> online and chat for more information.
>>>>
>>>> -Daniel
>>>>
>>>> ________________________________________
>>>>
>>>> From: Curran, Daniel M
>>>> Sent: Monday, November 25, 2019 12:07 PM
>>>> To: jonny at redhat.com; ckuperst at redhat.com; Mark Nissley
>>>> Cc: Sison, Mark Anthony; Cepeda, Rolando; Kendall,
>>>> Russell C; Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher
>>>> Subject: Re: Unified Platform Pod Deploy Errors
>>>>
>>>>
>>>>
>>>> Adding Kupe and Mark.
>>>>
>>>> -Daniel
>>>> ________________________________________
>>>>
>>>> From: Curran, Daniel M
>>>> Sent: Monday, November 25, 2019 11:43 AM
>>>> To: jonny at redhat.com
>>>> Cc: Sison, Mark Anthony; Cepeda, Rolando; Kendall,
>>>> Russell C; Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher
>>>> Subject: Unified Platform Pod Deploy Errors
>>>>
>>>>
>>>>
>>>> Hey Jonny,
>>>>
>>>> We met briefly at SpaceCAMP a couple weeks ago when
>>>> cluster.unified-platform.io <http://cluster.unified-platform.io> was
>>>> stood up. We've been trying to deploy some apps today and so far we're
>>>> getting errors on most (if not all) of our pods.
>>>>
>>>> 0/9 nodes are available: 3 Insufficient pods, 6 node(s)
>>>> didn't match node selector.
>>>>
>>>> Is what we're seeing. We were thinking some volume types weren't
>>>> correct, but some of our pods don't even have volumes attached and still
>>>> give us this error (i.e. Jenkins slaves or web frontends without
>>>> persistent storage).
>>>> Any idea what this could be? We're not running out of
>>>> space on the nodes themselves, are we?
>>>> We have a demo scheduled for tomorrow at 9:30 AM CST
>>>> and are hoping to get a demo env up for them today, but this error came up
>>>> unexpectedly. Also, we're here at 500 Navarro St. in San Antonio if
>>>> working through this in person is better/easier.
>>>>
>>>> Thanks,
>>>> Daniel Curran
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> ________________________________________
>>>>
>>>>
>>>> This e-mail and any attachments are intended only for
>>>> the use of the addressee(s) named herein and may contain proprietary
>>>> information. If you are not the intended recipient of this e-mail or
>>>> believe that you received this email in error, please take immediate
>>>> action to notify the sender of the apparent error by reply e-mail;
>>>> permanently delete the e-mail and any attachments from your computer;
>>>> and do not disseminate, distribute, use, or copy this message and
>>>> any attachments.
>>>>
>>>> _______________________________________________
>>>> platformONE mailing list
>>>> platformONE at redhat.com
>>>> https://www.redhat.com/mailman/listinfo/platformone
>>>>