From noreply at redhat.com Tue Dec 3 00:11:33 2019
From: noreply at redhat.com (noreply at redhat.com)
Date: Mon, 2 Dec 2019 19:11:33 -0500
Subject: [Platformone] test msg to platformONE
Message-ID: <201912030011.xB30BXBr028360@lists01.pubmisc.prod.ext.phx2.redhat.com>

test

From mnissley at redhat.com Tue Dec 3 00:30:17 2019
From: mnissley at redhat.com (Mark Nissley)
Date: Mon, 2 Dec 2019 19:30:17 -0500
Subject: [Platformone] This list is now active.
Message-ID:

Current membership is:

- aslawter at redhat.com
- bgordon at redhat.com
- ckuperst at redhat.com
- cmckee at redhat.com
- darachch at redhat.com
- dlystra at redhat.com
- jayissi at redhat.com
- jhultz at redhat.com
- jhurlock at redhat.com
- jrickard at redhat.com
- kmendez at redhat.com
- kmorgan at redhat.com
- kodonnel at redhat.com
- kreap at redhat.com
- mholmes at redhat.com
- mnissley at redhat.com
- pminchew at redhat.com
- tbiggs at redhat.com
- tcort at redhat.com
- huston at diux.org
- roger.dirocco.4 at us.af.mil
- tmiller at mitre.org
- andrew.goss at accenturefederal.com
- austen.bryan.1 at us.af.mil

Mark NISSLEY, PMP, CSM, LEAN
PROGRAM MaNAGER & SR technical Project Manager
North American Consulting, Public Sector
M: 850-530-3234
*Scheduled Training: October 14-18*

From mnissley at redhat.com Tue Dec 3 11:43:15 2019
From: mnissley at redhat.com (Mark Nissley)
Date: Tue, 3 Dec 2019 06:43:15 -0500
Subject: [Platformone] test
Message-ID:

tes1

Mark NISSLEY, PMP, CSM, LEAN
PROGRAM MaNAGER & SR technical Project Manager
North American Consulting, Public Sector
M: 850-530-3234
*Scheduled Training: October 14-18*
From jrickard at redhat.com Tue Dec 3 14:12:50 2019
From: jrickard at redhat.com (Jonathan Rickard)
Date: Tue, 3 Dec 2019 09:12:50 -0500
Subject: [Platformone] test
In-Reply-To:
References:
Message-ID:

Test for Mark

Jonathan Rickard, RHCA
Principal Consultant, NAPS
Red Hat Remote - Texas
jonny at redhat.com M: 210-862-9739

On Tue, Dec 3, 2019 at 6:43 AM Mark Nissley wrote:

> tes1
>
> Mark NISSLEY, PMP, CSM, LEAN
> PROGRAM MaNAGER & SR technical Project Manager
> North American Consulting, Public Sector
> M: 850-530-3234
> *Scheduled Training: October 14-18*
> _______________________________________________
> platformONE mailing list
> platformONE at redhat.com
> https://www.redhat.com/mailman/listinfo/platformone

From tmiller at mitre.org Tue Dec 3 14:18:21 2019
From: tmiller at mitre.org (Miller, Timothy J.)
Date: Tue, 3 Dec 2019 14:18:21 +0000
Subject: [Platformone] [EXT] test
In-Reply-To: <2683_1575373436_5DE64A79_2683_1031_1_CAPeAGCd6NHO3T9LaQYL_uoJF=rHd-OspSvgvstVV75z_-QGP8w@mail.gmail.com>
References: <2683_1575373436_5DE64A79_2683_1031_1_CAPeAGCd6NHO3T9LaQYL_uoJF=rHd-OspSvgvstVV75z_-QGP8w@mail.gmail.com>
Message-ID: <1504FDC9-D6C0-4BF3-B6A8-ED40371DF949@mitre.org>

Oh, thank Bob! Managing distros by hand is so...1985. :)

-- T

On 12/3/19, 05:44, "platformone-bounces at redhat.com on behalf of Mark Nissley" wrote:

tes1

Mark NISSLEY, PMP, CSM, LEAN
PROGRAM MaNAGER & SR technical Project Manager
North American Consulting, Public Sector
M: 850-530-3234
Scheduled Training: October 14-18

From mnissley at redhat.com Tue Dec 3 15:05:36 2019
From: mnissley at redhat.com (Mark Nissley)
Date: Tue, 3 Dec 2019 10:05:36 -0500
Subject: [Platformone] Rogue One IATT Actions
Message-ID:

I am on call with UP Node aka Rogue One. They are getting ready for IATT. Here are the actions that they asked of our team, due COB today:

1.
They asked if we can utilize Anchore and/or Twistlock to scan their apps and provide a report. They will be glad to do it as well if we want to make the containers available, but they emphasized that the shortest course of action is the best.
2. A plan of action for all High and Critical items in the results of Colleen's scan (if hardening scripts will be needed, they must be delivered IATT, 20 December)

As this is the highest urgency task on our list right now, we need to be able to assign these tasks to specific people and knock them out. *The deadline is COB today on both items*. Who can work with me to make these things happen?

Mark NISSLEY, PMP, CSM, LEAN
PROGRAM MaNAGER & SR technical Project Manager
North American Consulting, Public Sector
M: 850-530-3234
*Scheduled Training: October 14-18*

From kreap at redhat.com Tue Dec 3 15:21:00 2019
From: kreap at redhat.com (Keegan Reap)
Date: Tue, 3 Dec 2019 09:21:00 -0600
Subject: [Platformone] Rogue One IATT Actions
In-Reply-To:
References:
Message-ID:

Hey Mark & all,

As for the first objective, Twistlock has been deployed to the environment and is ready to start scanning (links below). The Twistlock app is locked behind an admin account, so we will need a POC to share the admin account with. As for Anchore, we have it deployed, but something seems to be preventing it from coming up successfully; we are investigating.

https://cluster.unified-platform.io/console/project/levelup-twistlock/overview
https://levelup-twistlock.apps.cluster.unified-platform.io/

Thanks,
Keegan Reap

On Tue, Dec 3, 2019 at 9:07 AM Mark Nissley wrote:

> I am on call with UP Node aka Rogue One. They are getting ready for IATT.
> Here are the actions that they asked of our team, due COB today:
>
> 1. They asked if we can utilize Anchore and/or Twistlock to scan their
> apps and provide a report.
> They will be glad to do it as well if we want to
> make the containers available, but they emphasized that the shortest course
> of action is the best.
> 2. A plan of action for all High and Critical items scan results from
> Colleen's scan (if hardening scripts will be needed, they must be delivered
> IATT, 20 December)
>
> As this is the highest urgency task on our list right now, we need to be
> able to assign these tasks to specific people and knock them out. *The
> deadline is COB today on both items*. Who can work with me to make these
> things happen?
>
> Mark NISSLEY, PMP, CSM, LEAN
> PROGRAM MaNAGER & SR technical Project Manager
> North American Consulting, Public Sector
> M: 850-530-3234
> *Scheduled Training: October 14-18*

From jhultz at redhat.com Tue Dec 3 15:24:34 2019
From: jhultz at redhat.com (Jonathan Hultz)
Date: Tue, 3 Dec 2019 10:24:34 -0500
Subject: [Platformone] Rogue One IATT Actions
In-Reply-To:
References:
Message-ID:

Mark,

Here are the results for the initial STIG run against one of the UP Node EC2 instances.
https://dccscr.dsop.io/levelup-automation/security/rhel7-stig/issues/1

There are several Cat 1 and 2s that are not implemented and the reasoning is in the ticket. Corey is currently working on the Sat role, which will also need several STIGs disabled to run correctly.

We are currently waiting for Colleen to rescan the UP Prod host with the STIGs applied.

Cheers, Jon

On Tue, Dec 3, 2019 at 10:07 AM Mark Nissley wrote:

> I am on call with UP Node aka Rogue One. They are getting ready for IATT.
> Here are the actions that they asked of our team, due COB today:
>
> 1. They asked if we can utilize Anchore and/or Twistlock to scan their
> apps and provide a report.
> They will be glad to do it as well if we want to
> make the containers available, but they emphasized that the shortest course
> of action is the best.
> 2. A plan of action for all High and Critical items scan results from
> Colleen's scan (if hardening scripts will be needed, they must be delivered
> IATT, 20 December)
>
> As this is the highest urgency task on our list right now, we need to be
> able to assign these tasks to specific people and knock them out. *The
> deadline is COB today on both items*. Who can work with me to make these
> things happen?
>
> Mark NISSLEY, PMP, CSM, LEAN
> PROGRAM MaNAGER & SR technical Project Manager
> North American Consulting, Public Sector
> M: 850-530-3234
> *Scheduled Training: October 14-18*

--
JONATHAN HULTZ, RHCSA
SENIOR CONSULTANT
Red Hat Remote US CA
jhultz at redhat.com M: 609-713-9778

From mnissley at redhat.com Tue Dec 3 15:26:54 2019
From: mnissley at redhat.com (Mark Nissley)
Date: Tue, 3 Dec 2019 10:26:54 -0500
Subject: [Platformone] AAM Twistlock scans
In-Reply-To:
References:
Message-ID:

Team-

Can someone help Mike?

Mark NISSLEY, PMP, CSM, LEAN
PROGRAM MaNAGER & SR technical Project Manager
North American Consulting, Public Sector
M: 850-530-3234

On Tue, Dec 3, 2019 at 10:19 AM Bubb, Mike wrote:

> Mark,
>
> The AAM team has tried to stand up Twistlock, but they do not have admin
> privileges. Jay has done Anchore scans on his local machine only. Ed
> Mucker has code to stand up Twistlock from what Lettie has told me.
>
> I will address the DAS and CCAT teams after I speak with them.
>
> V/R,
>
> Bubb
From mnissley at redhat.com Tue Dec 3 15:29:15 2019
From: mnissley at redhat.com (Mark Nissley)
Date: Tue, 3 Dec 2019 10:29:15 -0500
Subject: [Platformone] Rogue One IATT Actions
In-Reply-To:
References:
Message-ID:

Adding some of the UP Nodes Team to this thread. Mike, in a separate thread you noted that you were having trouble with Twistlock. Could you send a name and email address for someone on your team that we can grant access to? That may be you...

Mark NISSLEY, PMP, CSM, LEAN
PROGRAM MaNAGER & SR technical Project Manager
North American Consulting, Public Sector
M: 850-530-3234
*Scheduled Training: October 14-18*

On Tue, Dec 3, 2019 at 10:25 AM Jonathan Hultz wrote:

> Mark,
>
> Here are the results for the initial STIG run against one of the
> UP Node EC2 instances.
> https://dccscr.dsop.io/levelup-automation/security/rhel7-stig/issues/1
>
> There are several Cat 1 and 2s that are not implemented and the reasoning
> is in the ticket. Corey is currently working on the Sat role which will
> also need several STIGs disabled to run correctly.
>
> We are currently waiting for Colleen to rescan the UP Prod host with the
> STIGs applied.
>
> Cheers, Jon
>
> On Tue, Dec 3, 2019 at 10:07 AM Mark Nissley wrote:
>
>> I am on call with UP Node aka Rogue One. They are getting ready for IATT.
>> Here are the actions that they asked of our team, due COB today:
>>
>> 1. They asked if we can utilize Anchore and/or Twistlock to scan
>> their apps and provide a report. They will be glad to do it as well if we
>> want to make the containers available, but they emphasized that the
>> shortest course of action is the best.
>> 2. A plan of action for all High and Critical items scan results from
>> Colleen's scan (if hardening scripts will be needed, they must be delivered
>> IATT, 20 December)
>>
>> As this is the highest urgency task on our list right now, we need to be
>> able to assign these tasks to specific people and knock them out.
>> *The deadline is COB today on both items*. Who can work with me to make these
>> things happen?
>>
>> Mark NISSLEY, PMP, CSM, LEAN
>> PROGRAM MaNAGER & SR technical Project Manager
>> North American Consulting, Public Sector
>> M: 850-530-3234
>> *Scheduled Training: October 14-18*
>
> --
> JONATHAN HULTZ, RHCSA
> SENIOR CONSULTANT
> Red Hat Remote US CA
> jhultz at redhat.com M: 609-713-9778

From kreap at redhat.com Tue Dec 3 15:33:44 2019
From: kreap at redhat.com (Keegan Reap)
Date: Tue, 3 Dec 2019 09:33:44 -0600
Subject: [Platformone] Rogue One IATT Actions
In-Reply-To:
References:
Message-ID:

With the added people to the thread, I will go ahead and reiterate these points just in case, for full transparency:

Hey Mark & all,

As for the first objective, Twistlock has been deployed to the environment and is ready to start scanning (links below). The Twistlock app is locked behind an admin account, so we will need a POC to share the admin account with. As for Anchore, we have it deployed, but something seems to be preventing it from coming up successfully; we are investigating.

https://cluster.unified-platform.io/console/project/levelup-twistlock/overview
https://levelup-twistlock.apps.cluster.unified-platform.io/

Thanks,
Keegan Reap

On Tue, Dec 3, 2019 at 9:29 AM Mark Nissley wrote:

> Adding some of the UP Nodes Team to this thread. Mike, in a
> separate thread you noted that you were having trouble with Twistlock.
> Could you send a name and email address for someone on your team that we
> can grant access to? That may be you...
>
> Mark NISSLEY, PMP, CSM, LEAN
> PROGRAM MaNAGER & SR technical Project Manager
> North American Consulting, Public Sector
> M: 850-530-3234
> *Scheduled Training: October 14-18*
>
> On Tue, Dec 3, 2019 at 10:25 AM Jonathan Hultz wrote:
>
>> Mark,
>>
>> Here are the results for the initial STIG run against one of the
>> UP Node EC2 instances.
>> https://dccscr.dsop.io/levelup-automation/security/rhel7-stig/issues/1
>>
>> There are several Cat 1 and 2s that are not implemented and the reasoning
>> is in the ticket. Corey is currently working on the Sat role which will
>> also need several STIGs disabled to run correctly.
>>
>> We are currently waiting for Colleen to rescan the UP Prod host with the
>> STIGs applied.
>>
>> Cheers, Jon
>>
>> On Tue, Dec 3, 2019 at 10:07 AM Mark Nissley wrote:
>>
>>> I am on call with UP Node aka Rogue One. They are getting ready for
>>> IATT. Here are the actions that they asked of our team, due COB today:
>>>
>>> 1. They asked if we can utilize Anchore and/or Twistlock to scan
>>> their apps and provide a report. They will be glad to do it as well if we
>>> want to make the containers available, but they emphasized that the
>>> shortest course of action is the best.
>>> 2. A plan of action for all High and Critical items scan results
>>> from Colleen's scan (if hardening scripts will be needed, they must be
>>> delivered IATT, 20 December)
>>>
>>> As this is the highest urgency task on our list right now, we need to be
>>> able to assign these tasks to specific people and knock them out. *The
>>> deadline is COB today on both items*. Who can work with me to make
>>> these things happen?
>>>
>>> Mark NISSLEY, PMP, CSM, LEAN
>>> PROGRAM MaNAGER & SR technical Project Manager
>>> North American Consulting, Public Sector
>>> M: 850-530-3234
>>> *Scheduled Training: October 14-18*
>>
>> --
>> JONATHAN HULTZ, RHCSA
>> SENIOR CONSULTANT
>> Red Hat Remote US CA
>> jhultz at redhat.com M: 609-713-9778

From mbubb at mitre.org Tue Dec 3 15:35:12 2019
From: mbubb at mitre.org (Bubb, Mike)
Date: Tue, 3 Dec 2019 15:35:12 +0000
Subject: [Platformone] [EXT] Re: Rogue One IATT Actions
In-Reply-To: <18425_1575386985_5DE67F68_18425_284_5_CAPeAGCcGEpjzahh+yHFGR_-2xkNmhb7suzts=wBJmb=wsgXSbg@mail.gmail.com>
References: <18425_1575386985_5DE67F68_18425_284_5_CAPeAGCcGEpjzahh+yHFGR_-2xkNmhb7suzts=wBJmb=wsgXSbg@mail.gmail.com>
Message-ID:

Mark,

Please use Jay Pascal (cc'd) as the Twistlock POC for the AAM team.

Bubb

From: Mark Nissley
Sent: Tuesday, December 3, 2019 9:29 AM
To: Jonathan Hultz; Bubb, Mike; SANCHEZ, MARK GG-13 USAF AFMC AFLCMC/HNCP; jay.pascal at diligent-us.com
Cc: platformONE at redhat.com
Subject: [EXT] Re: [Platformone] Rogue One IATT Actions

Adding some of the UP Nodes Team to this thread. Mike, in a separate thread you noted that you were having trouble with Twistlock. Could you send a name and email address for someone on your team that we can grant access to? That may be you...
Mark NISSLEY, PMP, CSM, LEAN
PROGRAM MaNAGER & SR technical Project Manager
North American Consulting, Public Sector
M: 850-530-3234
Scheduled Training: October 14-18

On Tue, Dec 3, 2019 at 10:25 AM Jonathan Hultz wrote:

Mark,

Here are the results for the initial STIG run against one of the UP Node EC2 instances.
https://dccscr.dsop.io/levelup-automation/security/rhel7-stig/issues/1

There are several Cat 1 and 2s that are not implemented and the reasoning is in the ticket. Corey is currently working on the Sat role which will also need several STIGs disabled to run correctly.

We are currently waiting for Colleen to rescan the UP Prod host with the STIGs applied.

Cheers, Jon

On Tue, Dec 3, 2019 at 10:07 AM Mark Nissley wrote:

I am on call with UP Node aka Rogue One. They are getting ready for IATT. Here are the actions that they asked of our team, due COB today:

1. They asked if we can utilize Anchore and/or Twistlock to scan their apps and provide a report. They will be glad to do it as well if we want to make the containers available, but they emphasized that the shortest course of action is the best.
2. A plan of action for all High and Critical items scan results from Colleen's scan (if hardening scripts will be needed, they must be delivered IATT, 20 December)

As this is the highest urgency task on our list right now, we need to be able to assign these tasks to specific people and knock them out. The deadline is COB today on both items. Who can work with me to make these things happen?
Mark NISSLEY, PMP, CSM, LEAN
PROGRAM MaNAGER & SR technical Project Manager
North American Consulting, Public Sector
M: 850-530-3234
Scheduled Training: October 14-18

--
JONATHAN HULTZ, RHCSA
SENIOR CONSULTANT
Red Hat Remote US CA
jhultz at redhat.com M: 609-713-9778

From jrickard at redhat.com Tue Dec 3 17:26:26 2019
From: jrickard at redhat.com (Jonathan Rickard)
Date: Tue, 3 Dec 2019 12:26:26 -0500
Subject: [Platformone] [EXT] Re: Unified Platform Pod Deploy Errors
In-Reply-To: <3052516616bb44caa350c3c6897bc122@DCCPRDEXCH02.ManTech.com>
References: <1574703796843.94672@ManTech.com> <1574705271520.5593@ManTech.com> <1574705967080.55870@ManTech.com> <1574706319228.45425@ManTech.com> <1574707851362.16520@ManTech.com> <1574713055636.69810@ManTech.com> <1574714851887.51023@ManTech.com> <1574869597001.92851@ManTech.com> <1575303080504.65964@ManTech.com> <1575311511650.96981@ManTech.com> <6277_1575312368_5DE55BEF_6277_803_1_CA+EGyAB6=xB9Na5Wf-nWY52dwt1200Rk8uh6krw1CXojEaO1PQ@mail.gmail.com> <7E65F8DF-F96B-4657-82A2-4A95715FAC0B@mitre.org> <5A89B24C-3C36-4205-B083-CBFBF71ED122@mitre.org> <3052516616bb44caa350c3c6897bc122@DCCPRDEXCH02.ManTech.com>
Message-ID:

Russell,

Getting more eyes on this @platformONE at redhat.com

We'll keep you posted.

jonny

Jonathan Rickard, RHCA
Principal Consultant, NAPS
Red Hat Remote - Texas
jonny at redhat.com M: 210-862-9739

On Tue, Dec 3, 2019 at 11:59 AM Kendall, Russell C <Russell.Kendall at mantech.com> wrote:

> Kevin,
>
> Unfortunately we are receiving deployment errors again.
> This is the event:
>
> 0/9 nodes are available: 1 node(s) had disk pressure, 2 node(s) had taints
> that the pod didn't tolerate, 6 node(s) didn't match node selector.
>
> This is the deployment:
>
> https://cluster.unified-platform.io/console/project/ccat-dev/browse/dc/driveup
>
> V/R,
> Russell C Kendall
> ________________________________________
> From: Miller, Timothy J.
> Sent: Monday, December 2, 2019 2:44:21 PM
> To: Kevin O'Donnell
> Cc: Kendall, Russell C; Mark Nissley; ABODUNRIN, ADEMOLA A GG-12 USAF AFMC
> AFLCMC/HNCP; Jonathan Rickard; Chris Kuperstein; BRYAN, AUSTEN R Capt USAF
> AFMC AFLCMC/HNCP; Brenna Gordon; Bubb, Mike; DIROCCO, ROGER E GG-13 USAF
> AFMC ESC/AFLCMC/HNCP; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP
> Subject: Re: [EXT] Re: Unified Platform Pod Deploy Errors
>
> Tagged you on it.
>
> -- T
>
> On 12/2/19, 14:03, "Kevin O'Donnell" wrote:
>
> Hello,
>
> Autoscaling is on our future IaC roadmap. Tim, the additional ticket
> would be appreciated.
>
> We have swapped out the app/worker instances with m5a.8xlarge (32
> cores, 128 GB of RAM). Please let us know if you have any other issues.
>
> Thanks,
>
> KEVIN O'DONNELL
> ARCHITECT MANAGER
> Red Hat NA Public Sector Consulting
>
> kodonnell at redhat.com
> M: 240-605-4654
>
> On Mon, Dec 2, 2019 at 2:29 PM Miller, Timothy J. wrote:
>
> I'll open an issue. IaC needs to have instance size as a host_var to
> facilitate scaling.
>
> -- T
>
> On 12/2/19, 13:15, "Kevin O'Donnell" wrote:
>
> Tim,
>
> Thanks for the information. We are undersized on the app/worker
> nodes: the cluster has 3, and they are m5.xlarge with only 16 GB of RAM. From
> what I have read, each Labs engagement operated on a 3-node worker cluster
> with each node having 6 cores and 28 GB
> of RAM. We will need to swap out the existing instances with
> larger specs.
>
> We are going to try to flush the existing workload out on one of
> the workers to see if we can swap them out one at a time.
>
> Thanks,
>
> KEVIN O'DONNELL
> ARCHITECT MANAGER
> Red Hat NA Public Sector Consulting
>
> kodonnell at redhat.com
> M: 240-605-4654
>
> On Mon, Dec 2, 2019 at 2:07 PM Miller, Timothy J. <tmiller at mitre.org> wrote:
>
> Here's what I can see, given the perm limits I seem to be under:
>
> - NS:develop-misp-app and NS:lp-develop-misp-app both have several
> sync jobs in CrashLoopBackOff b/c of an HTTP 401 error returned while
> trying to fetch something from somewhere (URL isn't recorded in the stack
> trace).
>
> - NS:minishift-misp-app has most of its pods/jobs stuck in
> ImagePullBackOff. No detail there in the event stream so I'll see if I can
> dig deeper.
>
> - NS:aam-ci-cd has Jenkins trying to spin up three workers, those
> are coming back as unschedulable.
>
> I can't see into NS:aam-bases or NS:dsop-images b/c of perm limits.
>
> I see no DAS-related project(s).
>
> The MISP stuff needs debugging before calling "blocked" since it
> looks like an internal error from this perspective.
>
> In re: AAM Jenkins: If this deployment is coming out of the OCP
> storefront, then maybe it should be ephemeral rather than persistent. If
> it's a custom deployment, then it probably needs a rethink.
>
> I'm also not sure why there are two MISP dev projects.
>
> -- T
>
> On 12/2/19, 12:46, "Kevin O'Donnell" wrote:
>
> Russell,
>
> Thank you for the information. We can switch out the instance
> type for the worker nodes. How much memory is required by the apps?
>
> Thanks,
>
> KEVIN O'DONNELL
> ARCHITECT MANAGER
> Red Hat NA Public Sector Consulting
>
> kodonnell at redhat.com
> M: 240-605-4654
>
> On Mon, Dec 2, 2019 at 1:32 PM Kendall, Russell C <Russell.Kendall at mantech.com> wrote:
>
> Kevin,
>
> The lack of resources on the u-p.io cluster is hindering development,
> testing, and integration of the apps from CCAT, AAM, and DAS, which is
> putting one of our PI goals at risk.
>
> We are blocked by the fact that we (CCAT and AAM) cannot deploy
> additional pods to the unified-platform.io cluster. We have a subset of
> containers deployed, but rolling deployments and new deployments fail.
> This means that we are not able to execute integration testing or peer
> reviews. We are temporarily working around this by NOT testing/reviewing
> our code changes live, something that no one likes. Also, we are now
> running weeks-old instances of our containers, so we are very likely
> producing some technical debt. We currently have developers approaching
> idle or doing non-priority work until the resource issue is resolved.
>
> Here is the particular error from the OCP cluster I received while
> attempting a redeploy of one of our apps.
>
> 0/9 nodes are available: 1 node(s) had taints that the pod didn't
> tolerate, 2 Insufficient pods, 6 node(s) didn't match node selector.
> 11 times in the last minute
>
> Since we do not have any cluster permissions, I cannot verify which
> resource is running out, but from experience, I assess it is a memory
> issue.
>
> It appears the cluster has been provisioned with a silly allocation of
> node types. Without knowing exactly what was deployed, it appears only 3
> of the 9 hosts are suitable worker nodes. We would expect the cluster to
> respond to resource limitations and scale, but if a scheduled downtime is
> required, please work with us so we can anticipate.
> As it stands, the cluster does not support resources
> required by CCAT and the other dev teams (AAM, DAS, etc.). We would accept
> any downtime if it will improve the situation, as we are blocked from
> progressing under the current constraints. My hope was we could get the
> cluster redeployed over the TG holiday to eliminate developer impact, but
> as Mark pointed out, there were limited support folks available. Now I am
> just trying to minimize the losses.
>
> V/R,
>
> Russell C Kendall
>
> ________________________________________
> From: Kevin O'Donnell
> Sent: Monday, December 2, 2019 11:52 AM
> To: Kendall, Russell C
> Cc: Mark Nissley; ABODUNRIN, ADEMOLA A GG-12 USAF AFMC
> AFLCMC/HNCP; Jonathan Rickard; Chris Kuperstein; BRYAN, AUSTEN R Capt USAF
> AFMC AFLCMC/HNCP; Brenna Gordon; Bubb, Mike (mbubb at mitre.org); DIROCCO,
> ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP; Miller, Timothy J.;
> RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP
> Subject: Re: Unified Platform Pod Deploy Errors
>
> Hello Russell,
>
> Can you elaborate on the term Blocked? What specific issues
> are the blockers?
>
> Thanks,
>
> KEVIN O'DONNELL
> ARCHITECT MANAGER
> Red Hat NA Public Sector Consulting
>
> kodonnell at redhat.com
> M: 240-605-4654
>
> On Mon, Dec 2, 2019 at 11:11 AM Kendall, Russell C <Russell.Kendall at mantech.com> wrote:
>
> Mark,
>
> Thanks for acknowledging. Please be aware the San Antonio dev
> teams working in unified-platform.io are currently blocked.
>
> V/R,
>
> Russell C Kendall
>
> ________________________________________
> From: Mark Nissley
> Sent: Monday, December 2, 2019 9:36 AM
> To: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP; Jonathan
> Rickard; Chris Kuperstein
> Cc: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP; Kevin
> O'Donnell; Brenna Gordon; Kendall, Russell C; Bubb, Mike (mbubb at mitre.org);
> DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP; Miller, Timothy J.;
> RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP
> Subject: Re: Unified Platform Pod Deploy Errors
>
> As noted, I don't suspect much got done on this over the
> holiday weekend. I did see the ticket, and dropped some details into it. I
> also assigned it to @Jonathan Rickard and @Chris Kuperstein.
>
> It looks like short term solutions have been easy but the
> issue is recurring.
>
> Mark NISSLEY, PMP, CSM, LEAN
> PROGRAM MaNAGER & SR technical Project Manager
> North American Consulting, Public Sector
> M: 850-530-3234
> Scheduled Training: October 14-18
>
> On Mon, Dec 2, 2019 at 10:21 AM ABODUNRIN, ADEMOLA A GG-12
> USAF AFMC AFLCMC/HNCP wrote:
>
> Mark/Kevin,
>
> I just heard at the team stand up that we are still blocked.
> This is also affecting the AAM team from my investigations.
>
> Please let me know if there is something we need to do to move
> this forward.
>
> Most Sincerely,
>
> Ade Abodunrin, GG-12, USAF
> Product Owner (Cybertron & Ginyu Force), Unified Platform
>
> LevelUP Code Works
> Commercial: (210) 890-2113
> NIPR email: ademola.abodunrin at us.af.mil
>
> ________________________________________
> From: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP
> Sent: Wednesday, November 27, 2019 12:58 PM
> To: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP <austen.bryan.1 at us.af.mil>;
> Mark Nissley; Kevin O'Donnell; Brenna Gordon
> Cc: Kendall, Russell C; Bubb, Mike (mbubb at mitre.org); DIROCCO, ROGER E
> GG-13 USAF AFMC ESC/AFLCMC/HNCP; Miller, Timothy J. <tmiller at mitre.org>;
> RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP <jose.ramirez.50.ctr at us.af.mil>
> Subject: Re: Unified Platform Pod Deploy Errors
>
> Thanks a lot Capt Bryan! Russell created the ticket on GitLab
> UP Node Project.
>
> Most Sincerely,
>
> Ade Abodunrin, GG-12, USAF
> Product Owner (Cybertron & Ginyu Force), Unified Platform
>
> LevelUP Code Works
> Commercial: (210) 890-2113
> NIPR email: ademola.abodunrin at us.af.mil
>
> ________________________________________
> From: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP <austen.bryan.1 at us.af.mil>
> Sent: Wednesday, November 27, 2019 12:56 PM
> To: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP <ademola.abodunrin at us.af.mil>;
> Mark Nissley; Kevin O'Donnell; Brenna Gordon
> Cc: Kendall, Russell C; Bubb, Mike (mbubb at mitre.org); DIROCCO, ROGER E
> GG-13 USAF AFMC ESC/AFLCMC/HNCP; Miller, Timothy J. <tmiller at mitre.org>;
> RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP <jose.ramirez.50.ctr at us.af.mil>
> Subject: RE: Unified Platform Pod Deploy Errors
>
> Thanks Ade. The team is thin until next week due to the
> holidays but I will make sure it is addressed. Were there any issues
> submitted to GitLab's UP Node Project on DCCSCR?
>
> @Mark/Kevin - can we address?
>
> -Austen
>
> From: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP <ademola.abodunrin at us.af.mil>
> Sent: Wednesday, November 27, 2019 9:51 AM
> To: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP <austen.bryan.1 at us.af.mil>
> Cc: Kendall, Russell C; Bubb, Mike (mbubb at mitre.org)
> Subject: Fw: Unified Platform Pod Deploy Errors
>
> Capt Bryan,
>
> Please see the explanation on the issue that Ginyu Force is
> currently experiencing below.
>
> Most Sincerely,
>
> Ade Abodunrin, GG-12, USAF
> Product Owner (Cybertron & Ginyu Force), Unified Platform
>
> LevelUP Code Works
> Commercial: (210) 890-2113
> NIPR email: ademola.abodunrin at us.af.mil
>
> ________________________________________
>
> From: Kendall, Russell C
> Sent: Wednesday, November 27, 2019 9:46 AM
> To: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP <ademola.abodunrin at us.af.mil>;
> Buffaloe, Christopher; Molina, Toby; Crace, Jared E; SANCHEZ, MARK
> GG-13 USAF AFMC AFLCMC/HNCP
> Cc: tmiller at mitre.org
> Subject: [Non-DoD Source] Fw: Unified Platform Pod Deploy Errors
>
> Gentlemen,
>
> The application development teams working in the new GovCloud OCP
> environment (unified-platform.io) are currently blocked in efforts to
> deploy new pods for testing, development, and UAT.
>
> Red Hat and RogueOne SMEs have been notified and have attempted some
> fixes starting on Monday 11/25, but at this point have not been able to
> provision resources sufficient to host CCAT and AAM.
>
> We have taken steps to minimize our footprint (eliminating
> demonstration environment, deleting developer namespaces), but this is not
> a sustainable approach, and has only resulted in moderate improvements in
> cluster performance.
>
> Our hope is the U-P.io cluster compute resources can be
> increased very soon, so that we may resume normal development activities.
> Our understanding is that > such a scaling requires a complete redeployment of the > cluster, which is unusual, but an acceptable loss to productivity. If the > cluster can be scaled up over the Thanksgiving holiday, the impact will be > minimal to developers and cluster administrators, > alike. > > We are currently collaborating on solutions on the following > MatterMost channel behind the space camp VPN (link below), and via the > email thread forwarded > (further below). > > > https://chat.spacecamp.ninja/levelup/channels/unified-platform-node < > https://chat.spacecamp.ninja/levelup/channels/unified-platform-node> < > https://chat.spacecamp.ninja/levelup/channels/unified-platform-node> > > Please keep me posted on developments and I will coordinate > developer activities with any scheduled platform outages. > > V/R, > Russell C Kendall > > ________________________________________ > > From: Curran, Daniel M > Sent: Monday, November 25, 2019 2:47 PM > To: Jonathan Rickard > Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; > dlystra at redhat.com ; Sison, Mark > Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; Phil > Soliz; > Buffaloe, > Christopher; Torres, Alexander; Crace, Jared E; Middleton, Joseph > J > Subject: Re: Unified Platform Pod Deploy Errors > > > > Sounds great. Appreciate it. > I'll watch email and Mattermost in case you need more from us. 
> > -Daniel > > ________________________________________ > > From: Jonathan Rickard > Sent: Monday, November 25, 2019 2:44 PM > To: Curran, Daniel M > Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; > dlystra at redhat.com ; Sison, Mark > Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; Phil > Soliz; > Buffaloe, > Christopher; Torres, Alexander; Crace, Jared E; Middleton, Joseph > J > Subject: Re: Unified Platform Pod Deploy Errors > > > > Thanks Daniel - > > > > I'll continue to look into the resource issue that you're > seeing - I'd like to identify the root cause and then work with the team to > come up with a solution. > > > > Jonathan Rickard, > RHCA > Principal Consultant, NAPS > Red > Hat Remote - Texas > jonny at redhat.com > > M: 210-862-9739 > > > > > > > > > > > > > > > On Mon, Nov 25, 2019 at 2:17 PM Curran, Daniel M < > Daniel.Curran at mantech.com> > wrote: > > > Yeah we hit the limit then had AAM kill some of their projects > and then our pods got scheduled. > We've hit the limit again though. Here's an example pod that > cannot be scheduled > > > > https://cluster.unified-platform.io/console/project/ccat-dev/browse/pods/sso-8-2cjth > < > https://cluster.unified-platform.io/console/project/ccat-dev/browse/pods/sso-8-2cjth> > < > https://cluster.unified-platform.io/console/project/ccat-dev/browse/pods/sso-8-2cjth > > > They're seeing it when their jenkins slaves can't deploy but > it's basically any pod after we hit some limit. 
> > -Daniel > ________________________________________ > > From: Jonathan Rickard > Sent: Monday, November 25, 2019 1:26 PM > To: Curran, Daniel M > Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; > dlystra at redhat.com ; Sison, Mark > Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; Phil > Soliz; > Buffaloe, > Christopher; Torres, Alexander; Crace, Jared E; Middleton, Joseph > J > Subject: Re: Unified Platform Pod Deploy Errors > > > > Daniel, > > > > I can see that you have 3 mongo pods, 1 chatup and 1 upbot pod > running ... is your app good to go? > > > > Looks like there was an issue with memory on 1 pod, then some > node selector being mismatched - just what i could see in the events... > > > > > > > Jonathan Rickard, > RHCA > Principal Consultant, NAPS > Red > Hat Remote - Texas > jonny at redhat.com > > M: 210-862-9739 > > > > > > > > > > > > > > > On Mon, Nov 25, 2019 at 12:50 PM Curran, Daniel M < > Daniel.Curran at mantech.com> > wrote: > > > Also, AAM was having similar issues. Looks like they had a lot > of namespaces and scaling down the pods on their deployments didn't help > but actually deleting the namespaces > did. > We have pods scheduling now but I'm adding them and we'd still > like to work through what resource limit we were hitting to avoid this in > the future. > > -Daniel > > ________________________________________ > > From: Curran, Daniel M > Sent: Monday, November 25, 2019 12:25 PM > To: Jonathan Rickard > Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; > dlystra at redhat.com ; Sison, Mark > Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; Phil > Soliz; > Buffaloe, > Christopher; Torres, Alexander > Subject: Re: Unified Platform Pod Deploy Errors > > > > Thanks, sir. > Most important for us to get working is "ccat-demo" but it's > also happening in "ccat-dev" and "ccat-ci-cd". 
> > -Daniel > ________________________________________ > > From: Jonathan Rickard > Sent: Monday, November 25, 2019 12:22 PM > To: Curran, Daniel M > Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; > dlystra at redhat.com ; Sison, Mark > Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; Phil > Soliz; > Buffaloe, > Christopher; Torres, Alexander > Subject: Re: Unified Platform Pod Deploy Errors > > > > What's the name of the project you're working in? I'm going to > be back at my laptop in about 30 and will take a look when I get there. > > > > Is it just the Jenkins pods failing? > > > > > > > > On Mon, Nov 25, 2019, 12:20 PM Curran, Daniel M < > Daniel.Curran at mantech.com> > wrote: > > > Adding Dean and Alex. > Also, sitting in mattermost if anyone needs to get online and > chat for more information. > > -Daniel > > ________________________________________ > > From: Curran, Daniel M > Sent: Monday, November 25, 2019 12:07 PM > To: > jonny at redhat.com ; > > ckuperst at redhat.com ; Mark Nissley > Cc: Sison, Mark Anthony; Cepeda, Rolando; Kendall, Russell C; > Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher > Subject: Re: Unified Platform Pod Deploy Errors > > > > Adding Kupe and Mark. > > -Daniel > ________________________________________ > > From: Curran, Daniel M > Sent: Monday, November 25, 2019 11:43 AM > To: > jonny at redhat.com > Cc: Sison, Mark Anthony; Cepeda, Rolando; Kendall, Russell C; > Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher > Subject: Unified Platform Pod Deploy Errors > > > > Hey Jonny, > > We met briefly at SpaceCAMP a couple weeks ago when > > > > cluster.unified-platform.io < > http://cluster.unified-platform.io> > was stood up. We've been trying to deploy some apps today and so far > today we're getting errors on most (if > not all) of our pods. > > 0/9 nodes are available: 3 Insufficient pods, 6 node(s) didn't > match node selector. > > Is what we're seeing. 
> We were thinking it was some volume types weren't correct but some of our pods don't even have volumes attached and still give us this error (i.e. Jenkins slaves or web frontends without persistent storage).
> Any idea what this could be? We're not running out of space on the nodes themselves are we?
> We have a demo scheduled for tomorrow at 9:30 AM CST and are hoping to get a demo env up for them today but this error came up unexpectedly. Also, we're here at 500 Navarro St. in San Antonio working through this in person is better/easier.
>
> Thanks,
> Daniel Curran
>
> ________________________________________
>
> This e-mail and any attachments are intended only for the use of the addressee(s) named herein and may contain proprietary information. If you are not the intended recipient of this e-mail or believe that you received this email in error, please take immediate action to notify the sender of the apparent error by reply e-mail; permanently delete the e-mail and any attachments from your computer; and do not disseminate, distribute, use, or copy this message and any attachments.
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From Edward.Mucker at ngc.com Tue Dec 3 17:41:53 2019
From: Edward.Mucker at ngc.com (Mucker, Edward [US] (MS) (Contr))
Date: Tue, 3 Dec 2019 17:41:53 +0000
Subject: [Platformone] EXT :Re: AAM Twistlock scans
In-Reply-To:
References: ,
Message-ID: <1575394913213.71273@ngc.com>

Mike,

Give me a call when you get a chance and we can figure out the best way to help out.

Thanks,

Ed
210-854-2155
________________________________
From: Mark Nissley
Sent: Tuesday, December 3, 2019 9:26 AM
To: Bubb, Mike; platformONE at redhat.com
Cc: SANCHEZ, MARK GG-13 USAF AFMC AFLCMC/HNCP; Pascal, Jay [Diligent Consulting Inc.,]; Mucker, Edward [US] (MS) (Contr)
Subject: EXT :Re: AAM Twistlock scans

Team-

Can someone help Mike?

Mark NISSLEY, PMP, CSM, LEAN
PROGRAM MaNAGER & SR technical Project Manager
North American Consulting, Public Sector
M: 850-530-3234

On Tue, Dec 3, 2019 at 10:19 AM Bubb, Mike wrote:

Mark,

The AAM team has tried to stand up Twistlock, but they do not have admin privileges. Jay has done Anchore scans on his local machine only. Ed Mucker has code to stand up Twistlock from what Lettie has told me.

I will address the DAS and CCAT teams after I speak with them.

V/R,

Bubb
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From jhurlock at redhat.com Tue Dec 3 17:50:26 2019
From: jhurlock at redhat.com (John Hurlocker)
Date: Tue, 3 Dec 2019 12:50:26 -0500
Subject: [Platformone] EXT :Re: AAM Twistlock scans
In-Reply-To: <1575394913213.71273@ngc.com>
References: <1575394913213.71273@ngc.com>
Message-ID:

Both Twistlock and Anchore are up now.

Twistlock
https://cluster.unified-platform.io/console/project/levelup-twistlock/overview

Anchore
https://cluster.unified-platform.io/console/project/levelup-anchore/overview

On Tue, Dec 3, 2019 at 12:42 PM Mucker, Edward [US] (MS) (Contr) <Edward.Mucker at ngc.com> wrote:

> Mike,
>
> Give me a call when you get a chance and we can figure out the best way to
> help out.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From kreap at redhat.com Tue Dec 3 17:53:14 2019
From: kreap at redhat.com (Keegan Reap)
Date: Tue, 3 Dec 2019 11:53:14 -0600
Subject: [Platformone] [EXT] Re: Unified Platform Pod Deploy Errors
In-Reply-To:
References: <1574703796843.94672@ManTech.com> <1574705271520.5593@ManTech.com> <1574705967080.55870@ManTech.com> <1574706319228.45425@ManTech.com> <1574707851362.16520@ManTech.com> <1574713055636.69810@ManTech.com> <1574714851887.51023@ManTech.com> <1574869597001.92851@ManTech.com> <1575303080504.65964@ManTech.com> <1575311511650.96981@ManTech.com> <6277_1575312368_5DE55BEF_6277_803_1_CA+EGyAB6=xB9Na5Wf-nWY52dwt1200Rk8uh6krw1CXojEaO1PQ@mail.gmail.com> <7E65F8DF-F96B-4657-82A2-4A95715FAC0B@mitre.org> <5A89B24C-3C36-4205-B083-CBFBF71ED122@mitre.org> <3052516616bb44caa350c3c6897bc122@DCCPRDEXCH02.ManTech.com>
Message-ID:

Hey all, we have opened the issue below, which we believe to be the cause, and we are currently investigating:
https://dccscr.dsop.io/ginyu-force/up-iatt/issues/32

On Tue, Dec 3, 2019 at 11:27 AM Jonathan Rickard wrote:

> Russell,
>
> Getting more eyes on this @platformONE at redhat.com
>
> We'll keep you posted.
> jonny
>
> Jonathan Rickard, RHCA
> Principal Consultant, NAPS
> Red Hat Remote - Texas
> jonny at redhat.com
> M: 210-862-9739
>
> On Tue, Dec 3, 2019 at 11:59 AM Kendall, Russell C <Russell.Kendall at mantech.com> wrote:
>
>> Kevin,
>>
>> Unfortunately we are receiving deployment errors again. This is the event:
>>
>> 0/9 nodes are available: 1 node(s) had disk pressure, 2 node(s) had taints that the pod didn't tolerate, 6 node(s) didn't match node selector.
>>
>> This is the deployment:
>>
>> https://cluster.unified-platform.io/console/project/ccat-dev/browse/dc/driveup
>>
>> V/R,
>> Russell C Kendall
>> ________________________________________
>> From: Miller, Timothy J.
>> Sent: Monday, December 2, 2019 2:44:21 PM >> To: Kevin O'Donnell >> Cc: Kendall, Russell C; Mark Nissley; ABODUNRIN, ADEMOLA A GG-12 USAF >> AFMC AFLCMC/HNCP; Jonathan Rickard; Chris Kuperstein; BRYAN, AUSTEN R Capt >> USAF AFMC AFLCMC/HNCP; Brenna Gordon; Bubb, Mike; DIROCCO, ROGER E GG-13 >> USAF AFMC ESC/AFLCMC/HNCP; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP >> Subject: Re: [EXT] Re: Unified Platform Pod Deploy Errors >> >> Tagged you on it. >> >> -- T >> >> ?On 12/2/19, 14:03, "Kevin O'Donnell" wrote: >> >> Hello, >> >> >> Autoscaling is on our future IAC roadmap. Tim, the additional ticket >> would be appreciated. >> >> >> We have swapped out the app/worker instances with m5a.8xlarge 32 >> cores, 128gb of ram. Please let us know if you have any other issues. >> >> >> Thanks, >> >> KEVIN O'DONNELL >> ARCHITECT MANAGER >> Red Hat Red Hat NA Public Sector Consulting >> >> kodonnell at redhat.com >> M: 240-605-4654 >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> On Mon, Dec 2, 2019 at 2:29 PM Miller, Timothy J. >> wrote: >> >> >> I'll open an issue. IaC needs to have instance size as a host_var to >> facilitate scaling. >> >> -- T >> >> On 12/2/19, 13:15, "Kevin O'Donnell" wrote: >> >> Tim, >> >> >> Thanks for the information. We are undersized on the app/worker >> nodes, the cluster has 3 and they are m5.xlarge with only 16gb of ram. From >> what I have read each Labs engagement operated on a 3 node worker cluster >> with each node having 6core's and 28gb >> of ram. We will need to swap out the existing instances with >> larger spec's. >> >> >> We are going to try to flush the existing workload out on one of >> the workers to see if we can swap them out one at a time. >> >> >> Thanks, >> >> KEVIN O'DONNELL >> ARCHITECT MANAGER >> Red Hat Red Hat NA Public Sector Consulting < >> https://www.redhat.com/> >> >> kodonnell at redhat.com >> M: 240-605-4654 >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> On Mon, Dec 2, 2019 at 2:07 PM Miller, Timothy J. 
< >> tmiller at mitre.org> wrote: >> >> >> Here's what I can see, given the perm limits I seem to be under: >> >> - NS:develop-misp-app and NS:lp-develop-misp-app both have >> several sync jobs in CrashLoopBackoff b/c of an HTTP 401 error returned >> while trying to fetch something from somewhere (URL isn't recorded in the >> stack trace). >> >> - NS:minishift-misp-app has most of its pods/jobs stuck in >> ImagePullBackoff. No detail there in the event stream so I'll see if I can >> dig deeper. >> >> - NS:aam-ci-cd has Jenkins trying to spin up three workers, those >> are coming back as unschedulable. >> >> I can't see into NS:aam-bases or NS:dsop-images b/c of perm >> limits. >> >> I see no DAS-related project(s). >> >> The MISP stuff needs debugging before calling "blocked" since it >> looks like an internal error from this perspective. >> >> >> >> In re: AAM Jenkins: If this deployment is coming out of the OCP >> storefront, then maybe it should be ephemeral rather than persistent. If >> it's a custom deployment, then it probably needs a rethink. >> >> I'm also not sure why there are two MISP dev projects. >> >> -- T >> >> >> >> On 12/2/19, 12:46, "Kevin O'Donnell" wrote: >> >> Russell, >> >> >> Thank you for the information. We can switch out the instance >> type for the worker nodes. How much memory is required by the apps? >> >> >> >> Thanks, >> >> KEVIN O'DONNELL >> ARCHITECT MANAGER >> Red Hat Red Hat NA Public Sector Consulting < >> https://www.redhat.com/> >> >> kodonnell at redhat.com >> M: 240-605-4654 >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> On Mon, Dec 2, 2019 at 1:32 PM Kendall, Russell C < >> Russell.Kendall at mantech.com> wrote: >> >> >> Kevin, >> The lack of resources on >> u-p.io >> cluster is hindering development, >> testing, and integration of the apps from CCAT AAM DAS, which is >> putting one >> of our PI goals at risk. 
>> >> >> We are blocked by the fact that we (CCAT and AAM) cannot >> deploy additional pods to the >> unified-platform.io < >> http://unified-platform.io> >> cluster. We have a subset of containers deployed, but rolling >> deployments and new deployments fail. This means that we are not >> able to execute integration testing or peer reviews. >> We are temporarily working around by NOT testing/reviewing >> our code changes live, something that no one likes. Also, we are now >> running weeks-old instances of our containers, so we are very likely >> producing some technical debt. We currently have developers >> approaching idle or doing non-priority work until the >> resource issue is resolved. >> >> >> >> Here is the particular error from the OSP cluster I received >> while attempting a redeploy of one of our apps. >> >> >> >> 0/9 nodes are available: 1 node(s) had taints that the pod >> didn't tolerate, 2 Insufficient pods, 6 node(s) didn't match node >> selector.11 times in the last minute >> >> Since we do not have any cluster permissions, I cannot verify >> which resource is running out, but from experience, I assess it is a memory >> issue. >> >> >> >> It appears the cluster has been provisioned with a silly >> allocation of node types. Without knowing exactly what was deployed, it >> appears only 3 of the 9 hosts are suitable worker nodes. We would expect >> the cluster to respond to resource limitations >> and >> scale, >> but if a scheduled downtime is required, please work with us >> so we can anticipate. As it stands, the cluster does not support resources >> required by CCAT and the other dev teams (AAM, DAS, etc.). We would accept >> any downtime if it will improve the situation, >> as we are blocked from progressing under the current >> constraints. My hope was we could get the cluster redeployed over the TG >> holiday to eliminate developer impact, but as Mark pointed out, there were >> limited support folks available. 
Now I am just >> trying >> to >> minimize the losses. >> >> >> >> V/R, >> >> Russell C Kendall >> >> >> >> >> >> ________________________________________ >> From: Kevin O'Donnell >> Sent: Monday, December 2, 2019 11:52 AM >> To: Kendall, Russell C >> Cc: Mark Nissley; ABODUNRIN, ADEMOLA A GG-12 USAF AFMC >> AFLCMC/HNCP; Jonathan Rickard; Chris Kuperstein; BRYAN, AUSTEN R Capt USAF >> AFMC AFLCMC/HNCP; Brenna Gordon; Bubb, Mike (mbubb at mitre.org); DIROCCO, >> ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP; Miller, Timothy J.; >> RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP >> Subject: Re: Unified Platform Pod Deploy Errors >> >> Hello Russell, >> >> >> Can you elaborate on the term Blocked? What specific issues >> are the blockers? >> >> >> >> Thanks, >> >> KEVIN O'DONNELL >> ARCHITECT MANAGER >> Red Hat Red Hat NA Public Sector Consulting < >> https://www.redhat.com/> >> >> kodonnell at redhat.com >> M: 240-605-4654 >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> On Mon, Dec 2, 2019 at 11:11 AM Kendall, Russell C < >> Russell.Kendall at mantech.com> wrote: >> >> >> Mark, >> >> Thank for acknowledging, please be aware the San Antonio dev >> teams working in >> >> unified-platform.io < >> http://unified-platform.io> >> are currently blocked. >> >> V/R, >> >> Russell C Kendall >> >> ________________________________________ >> From: Mark Nissley >> Sent: Monday, December 2, 2019 9:36 AM >> To: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP; >> Jonathan Rickard; Chris Kuperstein >> Cc: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP; Kevin >> O'Donnell; Brenna Gordon; Kendall, Russell C; Bubb, Mike (mbubb at mitre.org); >> DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP; Miller, Timothy >> J.; >> RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP >> Subject: Re: Unified Platform Pod Deploy Errors >> >> As noted, I don't suspect much got done on this over the >> holiday weekend. I did see the ticket, as dropped some details into it. 
I >> also assigned it to @Jonathan >> Rickard and @Chris Kuperstein >> . >> >> >> >> It looks like short term solutions have been easy but the >> issue is recurring. >> >> >> >> >> Mark NISSLEY, PMP, >> CSM, LEAN >> >> PROGRAM MaNAGER & SR technical Project Manager >> North American Consulting, Public Sector >> >> M: >> 850-530-3234 >> >> >> Scheduled Training: October 14-18 >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> On Mon, Dec 2, 2019 at 10:21 AM ABODUNRIN, ADEMOLA A GG-12 >> USAF AFMC AFLCMC/HNCP wrote: >> >> >> Mark/Kevin, >> >> >> I just heard at the team stand up that we are still blocked. >> This is also affecting the AAM team from my investigations. >> >> >> Please let me know if there is something we need to do to >> move this forward. >> >> Most Sincerely, >> >> >> Ade Abodunrin, GG-12, USAF >> Product Owner (Cybertron & Ginyu Force), Unified Platform >> >> >> >> LevelUP Code Works >> Commercial: >> (210) 890-2113 >> NIPR email: >> ademola.abodunrin at us.af.mil >> >> >> >> >> >> >> >> >> ________________________________________ >> From: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP >> Sent: Wednesday, November 27, 2019 12:58 PM >> To: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP < >> austen.bryan.1 at us.af.mil>; Mark Nissley ; Kevin >> O'Donnell ; >> Brenna Gordon >> Cc: Kendall, Russell C ; Bubb, >> Mike (mbubb at mitre.org) ; DIROCCO, ROGER E GG-13 USAF >> AFMC ESC/AFLCMC/HNCP >> ; Miller, Timothy J. < >> tmiller at mitre.org>; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP < >> jose.ramirez.50.ctr at us.af.mil> >> Subject: Re: Unified Platform Pod Deploy Errors >> >> Thanks a lot Capt Bryan! Russell created the ticket on GitLab >> UP Node Project. 
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From ademola.abodunrin at us.af.mil Tue Dec 3 18:43:10 2019
From: ademola.abodunrin at us.af.mil (ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP)
Date: Tue, 3 Dec 2019 18:43:10 +0000
Subject: [Platformone] [Non-DoD Source] Re: EXT :Re: OpenShift Questions
In-Reply-To:
References: <0db5dd4e86c24c69ace02da1309ccb22@XCGC3021.northgrum.com> <1574192134493.86386@ManTech.com> ,
Message-ID:

Good afternoon All,

Please assist us with the problem below. The team has logged a ticket in the GitLab as well.

Most Sincerely,

Ade Abodunrin, GG-12, USAF
Product Owner (Cybertron & Ginyu Force), Unified Platform
LevelUP Code Works
Commercial: (210) 890-2113
NIPR email: ademola.abodunrin at us.af.mil
________________________________
From: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP
Sent: Friday, November 22, 2019 1:50 PM
To: Mike Knoth; Kendall, Russell C; Walter Steins; Blade, Eric D [US] (MS)
Cc: McKay, Brent [US] (MS) (Contr); Marc Cooper
Subject: RE: [Non-DoD Source] Re: EXT :Re: OpenShift Questions

Good afternoon Walter/Eric,

Please who is able to assist us with Mike's concern below? Thanks for your help!

Most Sincerely,

Ade Abodunrin, GG-12, USAF
Acquisition Program Manager
LevelUP Code Works
Commercial: (210) 890-2113
NIPR email: ademola.abodunrin at us.af.mil

From: Mike Knoth
Sent: Wednesday, November 20, 2019 10:22 AM
To: Kendall, Russell C
Cc: Walter Steins; Blade, Eric D [US] (MS); McKay, Brent [US] (MS) (Contr); ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP; Marc Cooper
Subject: [Non-DoD Source] Re: EXT :Re: OpenShift Questions

Thanks, I got a lot closer now, with some components being deployed. I'm getting some errors unique to this OpenShift though. The below is something I have in my YAML file, for several of the components.

    securityContext:
      fsGroup: 11111
      runAsUser: 11111

With the "runAsUser", OpenShift would say:

    Error creating: pods "openam-1-" is forbidden: unable to validate against any security context constraint: [fsGroup: Invalid value: []int64{11111}: 11111 is not an allowed group spec.initContainers[0].securityContext.securityContext.runAsUser: Invalid value: 11111: must be in the ranges: [1000910000, 1000919999]

I fixed that by making the "runAsUser" 1000911111 instead, though I'm not sure what effects that will have once everything is running.

And then for the group, it says:

    Error creating: pods "openig-1-" is forbidden: unable to validate against any security context constraint: [fsGroup: Invalid value: []int64{11111}: 11111 is not an allowed group]

I tried changing this "fsGroup" to 1000911111 but that also fails. So I'm not sure what to put in this value.

Do you know how you can make your policy less restrictive, or how I could make the policy less restrictive, to fix the above?

On Tue, Nov 19, 2019 at 2:35 PM Kendall, Russell C wrote:

Mike,

Here's the URL for the registry:

https://docker-registry-default.apps.cluster.unified-platform.io

I'm not sure how you deploy your pipeline and apps, but our Ansible scripts take care of creating the namespaces (projects) for us. For example, you may deploy your projects stored locally via oc new-app /path/to/project

There are a number of existing projects; you just don't have visibility. Mr. Steins is responsible for assigning roles and is figuring out group memberships that will allow you to control access to your projects by groups instead of by individual. In the meantime you'll need to add each user to each project.

V/R,
Russell C Kendall

________________________________
From: Mike Knoth
Sent: Tuesday, November 19, 2019 12:35 PM
To: Walter Steins
Cc: Blade, Eric D [US] (MS); McKay, Brent [US] (MS) (Contr); Kendall, Russell C
Subject: Re: EXT :Re: OpenShift Questions

Yes I'm logged on OpenShift right now. And I'm logged on the OC console. But I'm a bit stuck until I can figure out how to docker login, as something like this does not work:

    docker login -u $(oc whoami) -p $(oc whoami -t) docker-registry-default.unified-platform.io

And I'm also stuck until this can show my project which I can deploy to:

    UrsaMajor:up mike.knoth$ oc projects
    You have one project on this server: "dsop-images".

On Tue, Nov 19, 2019 at 1:33 PM Walter Steins wrote:

Eric,

All of the requested accounts were created.

Walter "Wally" Steins
Cloud Engineer
m: 210.383.9227 | walter.steins at bylight.com
By Light Professional IT Services LLC
8484 Westpark Drive Suite 600 McLean VA 22102
f: 703.778.7835 | www.bylight.com

From: Blade, Eric D [US] (MS)
Sent: Tuesday, November 19, 2019 12:32 PM
To: 'Mike Knoth'; McKay, Brent [US] (MS) (Contr)
Cc: Kendall, Russell C; Walter Steins
Subject: RE: EXT :Re: OpenShift Questions

[EXTERNAL EMAIL]

Mike,

This is deployed as a "production cluster", so there are no development capabilities. Just an OpenShift environment for running the apps.

You will need to get your OpenShift account created if it was not done so already. Wally Steins (CC'd) can do that for you. After that my knowledge runs thin. Russell was able to get their app deployed via the OpenShift console.

Thanks

Eric

From: Mike Knoth
Sent: Tuesday, November 19, 2019 1:27 PM
To: McKay, Brent [US] (MS) (Contr)
Cc: Kendall, Russell C; Blade, Eric D [US] (MS)
Subject: EXT :Re: OpenShift Questions

Russell/Eric,

Hi - do either of you know how I can log in to docker from my local macbook? (to the openshift on https://cluster.unified-platform.io/)

I was going to use the "bastion" box (52.222.26.122) to do development on, but that doesn't even have git on it. So I guess I have to use my macbook.

Also do you know who can create new openshift projects for me on https://cluster.unified-platform.io/?

On Tue, Nov 19, 2019 at 1:23 PM McKay, Brent [US] (MS) (Contr) wrote:

Russell/Eric,

Mike Knoth (cc'd) approached me regarding the OpenShift deployment I understand the two of you stood up last week while at SpaceCAMP. I believe he was instructed to deploy DAS on said cluster. I wanted to get him in contact with the two of you so he can get his questions to the individuals in the know.

Thanks,

Brent

--
Mike Knoth
Software Engineer
HII Mission Driven Innovative Solutions (HII-MDIS) - formerly G2, Inc.
Technical Solutions Division
302 Sentinel Drive | Annapolis Junction, MD 20701
Email: mike.knoth at g2-inc.com
Mobile: (320) 305-6453

Confidentiality Statement:
HUNTINGTON INGALLS INDUSTRIES PROPRIETARY - This e-mail contains information proprietary or private to Huntington Ingalls Industries, Inc., and is not to be disclosed to, copied by, or used in any manner by others without the prior express, written permission. If you are not the intended recipient, please delete without copying and kindly advise the sender by e-mail of the mistake in delivery.
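
For readers of the archive, the two SCC errors quoted above in this thread can be reasoned about with a small check. This is only a sketch: the UID range is copied from the error text, while the idea that `fsGroup` is validated against just the first ID of the project's range (the documented behavior of the `MustRunAs` fsGroup strategy in the default `restricted` SCC) and that the group range matches the UID range are assumptions that have not been confirmed for this cluster. The function names are made up for illustration.

```python
# Range taken verbatim from the runAsUser error message in this thread.
UID_RANGE = (1000910000, 1000919999)


def run_as_user_allowed(uid, uid_range=UID_RANGE):
    """True if uid falls inside the inclusive range reported by the SCC error."""
    low, high = uid_range
    return low <= uid <= high


def fs_group_allowed(gid, group_range=UID_RANGE):
    """ASSUMPTION: a MustRunAs fsGroup strategy accepts only the first ID of
    the range, and the group range equals the UID range (unconfirmed here)."""
    return gid == group_range[0]


# The three values discussed: the original ID, Mike's replacement, and the
# first ID of the range.
for value in (11111, 1000911111, 1000910000):
    print(value, run_as_user_allowed(value), fs_group_allowed(value))
```

Under those assumptions, 1000911111 passes the `runAsUser` check but not the `fsGroup` check, which would be consistent with what Mike reported. The ranges actually assigned to a project should be visible in its `openshift.io/sa.scc.uid-range` and `openshift.io/sa.scc.supplemental-groups` annotations, for example via `oc describe project <name>`.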
This communication (including any attachments) may contain information that is proprietary, confidential or exempt from disclosure. If you are not the intended recipient, please note that further dissemination, distribution, use or copying of this communication is strictly prohibited. Anyone who received this message in error should notify the sender immediately by telephone or by return email and delete it from his or her computer. -- Mike Knoth Software Engineer HII Mission Driven Innovative Solutions (HII-MDIS) ? formerly G2, Inc. Technical Solutions Division [https://lh6.googleusercontent.com/JthxIRRs8H68c8eZoIPzuaQByK3jEdbuNj59yB9juKJ8PLnRr8ZDwXL4mzmYmA-IYpuwjak8UIeh6PR58XzU9TCCwHjQqGZC5-Lw2AN8OYXHyzxIlgfTNwDu-ADOz8wCza_qi2a5] 302 Sentinel Drive | Annapolis Junction, MD 20701 Email: mike.knoth at g2-inc.com Mobile: (320) 305-6453 Confidentiality Statement: HUNTINGTON INGALLS INDUSTRIES PROPRIETARY ? This e-mail contains information proprietary or private to Huntington Ingalls Industries, Inc., and is not to be disclosed to, copied by, or used in any manner by others without the prior express, written permission. If you are not the intended recipient, please delete without copying and kindly advise the sender by e-mail of the mistake in delivery. ________________________________ This e-mail and any attachments are intended only for the use of the addressee(s) named herein and may contain proprietary information. If you are not the intended recipient of this e-mail or believe that you received this email in error, please take immediate action to notify the sender of the apparent error by reply e-mail; permanently delete the e-mail and any attachments from your computer; and do not disseminate, distribute, use, or copy this message and any attachments. -- Mike Knoth Software Engineer HII Mission Driven Innovative Solutions (HII-MDIS) ? formerly G2, Inc. 
Technical Solutions Division [https://lh6.googleusercontent.com/JthxIRRs8H68c8eZoIPzuaQByK3jEdbuNj59yB9juKJ8PLnRr8ZDwXL4mzmYmA-IYpuwjak8UIeh6PR58XzU9TCCwHjQqGZC5-Lw2AN8OYXHyzxIlgfTNwDu-ADOz8wCza_qi2a5] 302 Sentinel Drive | Annapolis Junction, MD 20701 Email: mike.knoth at g2-inc.com Mobile: (320) 305-6453 Confidentiality Statement: HUNTINGTON INGALLS INDUSTRIES PROPRIETARY ? This e-mail contains information proprietary or private to Huntington Ingalls Industries, Inc., and is not to be disclosed to, copied by, or used in any manner by others without the prior express, written permission. If you are not the intended recipient, please delete without copying and kindly advise the sender by e-mail of the mistake in delivery. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mike.knoth at g2-inc.com Tue Dec 3 18:51:13 2019 From: mike.knoth at g2-inc.com (Mike Knoth) Date: Tue, 3 Dec 2019 13:51:13 -0500 Subject: [Platformone] [Non-DoD Source] Re: EXT :Re: OpenShift Questions In-Reply-To: References: <0db5dd4e86c24c69ace02da1309ccb22@XCGC3021.northgrum.com> <1574192134493.86386@ManTech.com> Message-ID: yes here is the ticket - https://dccscr.dsop.io/dsop/dccscr/issues/195 On Tue, Dec 3, 2019 at 1:43 PM ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP wrote: > Good afternoon All, > > > Please assist us with the problem below. The team has logged a ticket in > the GitLab as well. 
> > > > > Most Sincerely, > > > Ade Abodunrin, GG-12, USAF > > Product Owner (Cybertron & Ginyu Force), Unified Platform > > > [image: cid:image001.png at 01D4F814.4AA552D0] > > LevelUP Code Works > > Commercial: (210) 890-2113 > > NIPR email: *ademola.abodunrin at us.af.mil * > > > > > > ------------------------------ > *From:* ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP > *Sent:* Friday, November 22, 2019 1:50 PM > *To:* Mike Knoth ; Kendall, Russell C < > Russell.Kendall at mantech.com>; Walter Steins ; > Blade, Eric D [US] (MS) > *Cc:* McKay, Brent [US] (MS) (Contr) ; Marc Cooper < > marc.cooper at g2-inc.com> > *Subject:* RE: [Non-DoD Source] Re: EXT :Re: OpenShift Questions > > > Good afternoon Walter/Eric, > > > > Please who is able to assist us with Mike?s concern below? > > > > Thanks for your help! > > > > Most Sincerely, > > > > Ade Abodunrin, GG-12, USAF > > Acquisition Program Manager > > > > [image: cid:image001.png at 01D4F814.4AA552D0] > > LevelUP Code Works > > > > Commercial: (210) 890-2113 > > NIPR email: *ademola.abodunrin at us.af.mil * > > > > *From:* Mike Knoth > *Sent:* Wednesday, November 20, 2019 10:22 AM > *To:* Kendall, Russell C > *Cc:* Walter Steins ; Blade, Eric D [US] (MS) < > Eric.Blade at ngc.com>; McKay, Brent [US] (MS) (Contr) ; > ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP < > ademola.abodunrin at us.af.mil>; Marc Cooper > *Subject:* [Non-DoD Source] Re: EXT :Re: OpenShift Questions > > > > Thanks I got a lot closer now, with some components being deployed. I'm > getting some errors unique to this Openshift though. The below is something > I have in my YAML file, for several of the components. 
> > > > securityContext: > fsGroup: 11111 > runAsUser: 11111 > > > > With the "runAsUser", Openshift would say: > > Error creating: pods "openam-1-" is forbidden: unable to validate against > any security context constraint: [fsGroup: Invalid value: []int64{11111}: > 11111 is not an allowed group spec.initContainers[0].securityContext.securityContext.runAsUser: > Invalid value: 11111: must be in the ranges: [1000910000, 1000919999] > > > > I fixed that by making the "runAsUser" 1000911111 instead, though I'm not > sure what affects that will have once everything is running. > > > > And then for the group, it says: > > Error creating: pods "openig-1-" is forbidden: unable to validate against > any security context constraint: [fsGroup: Invalid value: []int64{11111}: > 11111 is not an allowed group] > > > > I tried changing this "fsGroup" to 1000911111 but that also fails. So I'm > not sure what to put in this value. > > > > *Do you know how you can make your policy less restrictive, or how I could > make the policy less restrictive, to fix the above?* > > > > > > > > > > > > > > On Tue, Nov 19, 2019 at 2:35 PM Kendall, Russell C < > Russell.Kendall at mantech.com> wrote: > > Mike, > > Here's the URL for the registry: > > https://docker-registry-default.apps.cluster.unified-platform.io > > > > > I'm not sure how you deploy your pipeline and apps, but our Ansible > scripts take care of creating the namespaces (projects) for us. For > example, you may deploy your projects stored locally via oc new-app > /path/to/project > > > > There are a number of existing projects, you just don't have > visibility. Mr. Steins is responsible for assigning roles and is figuring > out group memberships that will allow you to control access to your > projects by groups instead of by individual. In the meantime you'll need to > add each user to each project. 
> > > > V/R, > > Russell C Kendall > ------------------------------ > > *From:* Mike Knoth > *Sent:* Tuesday, November 19, 2019 12:35 PM > *To:* Walter Steins > *Cc:* Blade, Eric D [US] (MS); McKay, Brent [US] (MS) (Contr); Kendall, > Russell C > *Subject:* Re: EXT :Re: OpenShift Questions > > > > Yes I'm logged on openshift right now. And I'm logged on the OC console. *But > I'm a bit stuck until I can figure out how to docker login*, as something > like this does not work: > > > > docker login -u $(oc whoami) -p $(oc whoami -t) > docker-registry-default.unified-platform.io > > > > > > And *I'm also stuck until this can show my project which I can deploy to:* > > > > UrsaMajor:up mike.knoth$ oc projects > You have one project on this server: "dsop-images". > > > > > > > > On Tue, Nov 19, 2019 at 1:33 PM Walter Steins > wrote: > > Eric, > > > > All of the requested accounts were created. > > > > > > [image: cid:image001.jpg at 01D5791E.F20F5AD0] > > *Walter ?Wally? Steins* > > Cloud Engineer > > m: 210.383.9227 | *walter.steins at bylight.com * > > *By Light Professional IT Services LLC* > 8484 Westpark Drive Suite 600 McLean VA 22102 > f: 703.778.7835 | *www.bylight.com * > > > > > > > > *From:* Blade, Eric D [US] (MS) > *Sent:* Tuesday, November 19, 2019 12:32 PM > *To:* 'Mike Knoth' ; McKay, Brent [US] (MS) > (Contr) > *Cc:* Kendall, Russell C ; Walter Steins < > walter.steins at bylight.com> > *Subject:* RE: EXT :Re: OpenShift Questions > > > > [EXTERNAL EMAIL] > > Mike, > > This is deployed as a ?production cluster?, so there is no development > capabilities. Just an OpenShift environment for running the apps. > > > > You will need to get your Openshift account created if it was not done so > already. Wally Stein (CC?d) can do that for you. After that my knowledge > runs thin. Russell was able to get their app deployed via the OpenShift > console. 
> > > > Thanks > > > > Eric > > > > > > *From:* Mike Knoth > *Sent:* Tuesday, November 19, 2019 1:27 PM > *To:* McKay, Brent [US] (MS) (Contr) > *Cc:* Kendall, Russell C ; Blade, Eric D > [US] (MS) > *Subject:* EXT :Re: OpenShift Questions > > > > Russell/Eric, > > > > Hi - do either of you know how I can login to docker from my local > macbook? (to the openshift on https://cluster.unified-platform.io/) > > > > I was going to use the "bastion" box (52.222.26.122) to do development > on, but that doesn't even have git on it. So I guess I have to use my > macbook. > > > > Also do you know who can create new openshift projects for me on > https://cluster.unified-platform.io/? > > > > On Tue, Nov 19, 2019 at 1:23 PM McKay, Brent [US] (MS) (Contr) < > Brent.McKay at ngc.com> wrote: > > Russell/Eric, > > > > Mike Knoth(cc?d) approached me regarding the OpenShift deployment I > understand the two of you stood up last week while at SpaceCAMP. I believe > he was instructed to deploy DAS on said cluster. I wanted to get him in > contact with the two of you so he can get his questions to the individuals > in the know. Thanks, > > > > Brent > > > > > -- > > Mike Knoth > > Software Engineer > > HII Mission Driven Innovative Solutions (HII-MDIS) ? formerly G2, Inc. > > Technical Solutions Division > > 302 Sentinel Drive | Annapolis Junction, MD 20701 > > Email: mike.knoth at g2-inc.com > > Mobile: (320) 305-6453 > > > > Confidentiality Statement: > > HUNTINGTON INGALLS INDUSTRIES PROPRIETARY ? This e-mail contains > information proprietary or private to Huntington Ingalls Industries, Inc., > and is not to be disclosed to, copied by, or used in any manner by others > without the prior express, written permission. If you are not the intended > recipient, please delete without copying and kindly advise the sender by > e-mail of the mistake in delivery. 
> > This communication (including any attachments) may contain information > that is proprietary, confidential or exempt from disclosure. If you are not > the intended recipient, please note that further dissemination, > distribution, use or copying of this communication is strictly prohibited. > Anyone who received this message in error should notify the sender > immediately by telephone or by return email and delete it from his or her > computer. > > > > > -- > > Mike Knoth > > Software Engineer > > HII Mission Driven Innovative Solutions (HII-MDIS) ? formerly G2, Inc. > > Technical Solutions Division > > 302 Sentinel Drive | Annapolis Junction, MD 20701 > > Email: mike.knoth at g2-inc.com > > Mobile: (320) 305-6453 > > > > Confidentiality Statement: > > HUNTINGTON INGALLS INDUSTRIES PROPRIETARY ? This e-mail contains > information proprietary or private to Huntington Ingalls Industries, Inc., > and is not to be disclosed to, copied by, or used in any manner by others > without the prior express, written permission. If you are not the intended > recipient, please delete without copying and kindly advise the sender by > e-mail of the mistake in delivery. > > > ------------------------------ > > > This e-mail and any attachments are intended only for the use of the > addressee(s) named herein and may contain proprietary information. If you > are not the intended recipient of this e-mail or believe that you received > this email in error, please take immediate action to notify the sender of > the apparent error by reply e-mail; permanently delete the e-mail and any > attachments from your computer; and do not disseminate, distribute, use, or > copy this message and any attachments. > > > > > -- > > Mike Knoth > > Software Engineer > > HII Mission Driven Innovative Solutions (HII-MDIS) ? formerly G2, Inc. 
> > Technical Solutions Division > > 302 Sentinel Drive | Annapolis Junction, MD 20701 > > Email: mike.knoth at g2-inc.com > > Mobile: (320) 305-6453 > > > > Confidentiality Statement: > > HUNTINGTON INGALLS INDUSTRIES PROPRIETARY ? This e-mail contains > information proprietary or private to Huntington Ingalls Industries, Inc., > and is not to be disclosed to, copied by, or used in any manner by others > without the prior express, written permission. If you are not the intended > recipient, please delete without copying and kindly advise the sender by > e-mail of the mistake in delivery. > -- Mike Knoth Software Engineer HII Mission Driven Innovative Solutions (HII-MDIS) ? formerly G2, Inc. Technical Solutions Division 302 Sentinel Drive | Annapolis Junction, MD 20701 Email: mike.knoth at g2-inc.com Mobile: (320) 305-6453 Confidentiality Statement: HUNTINGTON INGALLS INDUSTRIES PROPRIETARY ? This e-mail contains information proprietary or private to Huntington Ingalls Industries, Inc., and is not to be disclosed to, copied by, or used in any manner by others without the prior express, written permission. If you are not the intended recipient, please delete without copying and kindly advise the sender by e-mail of the mistake in delivery. -------------- next part -------------- An HTML attachment was scrubbed... URL: From kmendez at redhat.com Tue Dec 3 19:06:18 2019 From: kmendez at redhat.com (Khary Mendez) Date: Tue, 3 Dec 2019 14:06:18 -0500 Subject: [Platformone] [Non-DoD Source] Re: EXT :Re: OpenShift Questions In-Reply-To: References: <0db5dd4e86c24c69ace02da1309ccb22@XCGC3021.northgrum.com> <1574192134493.86386@ManTech.com> Message-ID: Thanks Mike - I just added a comment to your ticket with a preferred path forward along with a less preferred option. Khary A. 
Mendez, RHCA (150-047-298) Senior Principal Consultant Red Hat Public Sector khary at redhat.com M: (240)888-9170 On Tue, Dec 3, 2019 at 1:52 PM Mike Knoth wrote: > yes here is the ticket - https://dccscr.dsop.io/dsop/dccscr/issues/195 > > On Tue, Dec 3, 2019 at 1:43 PM ABODUNRIN, ADEMOLA A GG-12 USAF AFMC > AFLCMC/HNCP wrote: > >> Good afternoon All, >> >> >> Please assist us with the problem below. The team has logged a ticket in >> the GitLab as well. >> >> >> >> >> Most Sincerely, >> >> >> Ade Abodunrin, GG-12, USAF >> >> Product Owner (Cybertron & Ginyu Force), Unified Platform >> >> >> [image: cid:image001.png at 01D4F814.4AA552D0] >> >> LevelUP Code Works >> >> Commercial: (210) 890-2113 >> >> NIPR email: *ademola.abodunrin at us.af.mil * >> >> >> >> >> >> ------------------------------ >> *From:* ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP >> *Sent:* Friday, November 22, 2019 1:50 PM >> *To:* Mike Knoth ; Kendall, Russell C < >> Russell.Kendall at mantech.com>; Walter Steins ; >> Blade, Eric D [US] (MS) >> *Cc:* McKay, Brent [US] (MS) (Contr) ; Marc Cooper < >> marc.cooper at g2-inc.com> >> *Subject:* RE: [Non-DoD Source] Re: EXT :Re: OpenShift Questions >> >> >> Good afternoon Walter/Eric, >> >> >> >> Please who is able to assist us with Mike?s concern below? >> >> >> >> Thanks for your help! 
>> >> >> >> Most Sincerely, >> >> >> >> Ade Abodunrin, GG-12, USAF >> >> Acquisition Program Manager >> >> >> >> [image: cid:image001.png at 01D4F814.4AA552D0] >> >> LevelUP Code Works >> >> >> >> Commercial: (210) 890-2113 >> >> NIPR email: *ademola.abodunrin at us.af.mil * >> >> >> >> *From:* Mike Knoth >> *Sent:* Wednesday, November 20, 2019 10:22 AM >> *To:* Kendall, Russell C >> *Cc:* Walter Steins ; Blade, Eric D [US] (MS) >> ; McKay, Brent [US] (MS) (Contr) ; >> ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP < >> ademola.abodunrin at us.af.mil>; Marc Cooper >> *Subject:* [Non-DoD Source] Re: EXT :Re: OpenShift Questions >> >> >> >> Thanks I got a lot closer now, with some components being deployed. I'm >> getting some errors unique to this Openshift though. The below is something >> I have in my YAML file, for several of the components. >> >> >> >> securityContext: >> fsGroup: 11111 >> runAsUser: 11111 >> >> >> >> With the "runAsUser", Openshift would say: >> >> Error creating: pods "openam-1-" is forbidden: unable to validate >> against any security context constraint: [fsGroup: Invalid value: >> []int64{11111}: 11111 is not an allowed group spec.initContainers[0].securityContext.securityContext.runAsUser: >> Invalid value: 11111: must be in the ranges: [1000910000, 1000919999] >> >> >> >> I fixed that by making the "runAsUser" 1000911111 instead, though I'm >> not sure what affects that will have once everything is running. >> >> >> >> And then for the group, it says: >> >> Error creating: pods "openig-1-" is forbidden: unable to validate against >> any security context constraint: [fsGroup: Invalid value: []int64{11111}: >> 11111 is not an allowed group] >> >> >> >> I tried changing this "fsGroup" to 1000911111 but that also fails. So >> I'm not sure what to put in this value. 
>> >> >> >> *Do you know how you can make your policy less restrictive, or how I >> could make the policy less restrictive, to fix the above?* >> >> >> >> >> >> >> >> >> >> >> >> >> >> On Tue, Nov 19, 2019 at 2:35 PM Kendall, Russell C < >> Russell.Kendall at mantech.com> wrote: >> >> Mike, >> >> Here's the URL for the registry: >> >> https://docker-registry-default.apps.cluster.unified-platform.io >> >> >> >> >> I'm not sure how you deploy your pipeline and apps, but our Ansible >> scripts take care of creating the namespaces (projects) for us. For >> example, you may deploy your projects stored locally via oc new-app >> /path/to/project >> >> >> >> There are a number of existing projects, you just don't have >> visibility. Mr. Steins is responsible for assigning roles and is figuring >> out group memberships that will allow you to control access to your >> projects by groups instead of by individual. In the meantime you'll need to >> add each user to each project. >> >> >> >> V/R, >> >> Russell C Kendall >> ------------------------------ >> >> *From:* Mike Knoth >> *Sent:* Tuesday, November 19, 2019 12:35 PM >> *To:* Walter Steins >> *Cc:* Blade, Eric D [US] (MS); McKay, Brent [US] (MS) (Contr); Kendall, >> Russell C >> *Subject:* Re: EXT :Re: OpenShift Questions >> >> >> >> Yes I'm logged on openshift right now. And I'm logged on the OC console. *But >> I'm a bit stuck until I can figure out how to docker login*, as >> something like this does not work: >> >> >> >> docker login -u $(oc whoami) -p $(oc whoami -t) >> docker-registry-default.unified-platform.io >> >> >> >> >> >> And *I'm also stuck until this can show my project which I can deploy >> to:* >> >> >> >> UrsaMajor:up mike.knoth$ oc projects >> You have one project on this server: "dsop-images". >> >> >> >> >> >> >> >> On Tue, Nov 19, 2019 at 1:33 PM Walter Steins >> wrote: >> >> Eric, >> >> >> >> All of the requested accounts were created. 
>> >> >> >> >> >> [image: cid:image001.jpg at 01D5791E.F20F5AD0] >> >> *Walter ?Wally? Steins* >> >> Cloud Engineer >> >> m: 210.383.9227 | *walter.steins at bylight.com * >> >> *By Light Professional IT Services LLC* >> 8484 Westpark Drive Suite 600 McLean VA 22102 >> f: 703.778.7835 | *www.bylight.com * >> >> >> >> >> >> >> >> *From:* Blade, Eric D [US] (MS) >> *Sent:* Tuesday, November 19, 2019 12:32 PM >> *To:* 'Mike Knoth' ; McKay, Brent [US] (MS) >> (Contr) >> *Cc:* Kendall, Russell C ; Walter Steins < >> walter.steins at bylight.com> >> *Subject:* RE: EXT :Re: OpenShift Questions >> >> >> >> [EXTERNAL EMAIL] >> >> Mike, >> >> This is deployed as a ?production cluster?, so there is no development >> capabilities. Just an OpenShift environment for running the apps. >> >> >> >> You will need to get your Openshift account created if it was not done so >> already. Wally Stein (CC?d) can do that for you. After that my knowledge >> runs thin. Russell was able to get their app deployed via the OpenShift >> console. >> >> >> >> Thanks >> >> >> >> Eric >> >> >> >> >> >> *From:* Mike Knoth >> *Sent:* Tuesday, November 19, 2019 1:27 PM >> *To:* McKay, Brent [US] (MS) (Contr) >> *Cc:* Kendall, Russell C ; Blade, Eric D >> [US] (MS) >> *Subject:* EXT :Re: OpenShift Questions >> >> >> >> Russell/Eric, >> >> >> >> Hi - do either of you know how I can login to docker from my local >> macbook? (to the openshift on https://cluster.unified-platform.io/) >> >> >> >> I was going to use the "bastion" box (52.222.26.122) to do development >> on, but that doesn't even have git on it. So I guess I have to use my >> macbook. >> >> >> >> Also do you know who can create new openshift projects for me on >> https://cluster.unified-platform.io/? 
>> >> >> >> On Tue, Nov 19, 2019 at 1:23 PM McKay, Brent [US] (MS) (Contr) < >> Brent.McKay at ngc.com> wrote: >> >> Russell/Eric, >> >> >> >> Mike Knoth(cc?d) approached me regarding the OpenShift deployment I >> understand the two of you stood up last week while at SpaceCAMP. I believe >> he was instructed to deploy DAS on said cluster. I wanted to get him in >> contact with the two of you so he can get his questions to the individuals >> in the know. Thanks, >> >> >> >> Brent >> >> >> >> >> -- >> >> Mike Knoth >> >> Software Engineer >> >> HII Mission Driven Innovative Solutions (HII-MDIS) ? formerly G2, Inc. >> >> Technical Solutions Division >> >> 302 Sentinel Drive | Annapolis Junction, MD 20701 >> >> Email: mike.knoth at g2-inc.com >> >> Mobile: (320) 305-6453 >> >> >> >> Confidentiality Statement: >> >> HUNTINGTON INGALLS INDUSTRIES PROPRIETARY ? This e-mail contains >> information proprietary or private to Huntington Ingalls Industries, Inc., >> and is not to be disclosed to, copied by, or used in any manner by others >> without the prior express, written permission. If you are not the intended >> recipient, please delete without copying and kindly advise the sender by >> e-mail of the mistake in delivery. >> >> This communication (including any attachments) may contain information >> that is proprietary, confidential or exempt from disclosure. If you are not >> the intended recipient, please note that further dissemination, >> distribution, use or copying of this communication is strictly prohibited. >> Anyone who received this message in error should notify the sender >> immediately by telephone or by return email and delete it from his or her >> computer. >> >> >> >> >> -- >> >> Mike Knoth >> >> Software Engineer >> >> HII Mission Driven Innovative Solutions (HII-MDIS) ? formerly G2, Inc. 
>> >> Technical Solutions Division >> >> 302 Sentinel Drive | Annapolis Junction, MD 20701 >> >> Email: mike.knoth at g2-inc.com >> >> Mobile: (320) 305-6453 >> >> >> >> Confidentiality Statement: >> >> HUNTINGTON INGALLS INDUSTRIES PROPRIETARY ? This e-mail contains >> information proprietary or private to Huntington Ingalls Industries, Inc., >> and is not to be disclosed to, copied by, or used in any manner by others >> without the prior express, written permission. If you are not the intended >> recipient, please delete without copying and kindly advise the sender by >> e-mail of the mistake in delivery. >> >> >> ------------------------------ >> >> >> This e-mail and any attachments are intended only for the use of the >> addressee(s) named herein and may contain proprietary information. If you >> are not the intended recipient of this e-mail or believe that you received >> this email in error, please take immediate action to notify the sender of >> the apparent error by reply e-mail; permanently delete the e-mail and any >> attachments from your computer; and do not disseminate, distribute, use, or >> copy this message and any attachments. >> >> >> >> >> -- >> >> Mike Knoth >> >> Software Engineer >> >> HII Mission Driven Innovative Solutions (HII-MDIS) ? formerly G2, Inc. >> >> Technical Solutions Division >> >> 302 Sentinel Drive | Annapolis Junction, MD 20701 >> >> Email: mike.knoth at g2-inc.com >> >> Mobile: (320) 305-6453 >> >> >> >> Confidentiality Statement: >> >> HUNTINGTON INGALLS INDUSTRIES PROPRIETARY ? This e-mail contains >> information proprietary or private to Huntington Ingalls Industries, Inc., >> and is not to be disclosed to, copied by, or used in any manner by others >> without the prior express, written permission. If you are not the intended >> recipient, please delete without copying and kindly advise the sender by >> e-mail of the mistake in delivery. 
From tmiller at mitre.org Tue Dec 3 20:31:42 2019 From: tmiller at mitre.org (Miller, Timothy J.) Date: Tue, 3 Dec 2019 20:31:42 +0000 Subject: [Platformone] Riddle me this, Batman (odd things in up-prod) Message-ID: <49479C3E-4D9F-4C8C-AE04-585855858CCF@mitre.org> - There are three bastion hosts (up-prod-bastion, up-prod-ocp-bastion, and "onetime"). Of these, I can find only up-prod-ocp-bastion in the IaC definition. Both up-prod-bastion and "onetime" look like they were built separately ("onetime" is baselined on CentOS--which is a giveaway--and up-prod-bastion is attached to the `bastion-ssh` security group--which AFAICT is also not part of the IaC). I recall someone (Dean?) telling me that there's no BH in the IaC, but that's not true (see consumers/up-node-infrastructure/environments/production/group_vars/all/ec2-instances.yml). - up-prod-openscap and up-prod-sso-server each have a public IP, but their inbound rules permit only traffic from the VPC subnets (10.40.0.0/16) and the up-ss-vpc gitlab-ci-runner instance.
- up-prod-openscap is attached to the up-prod-ocp-nodes SG, which doesn't seem right. That opens a bunch of ports that probably don't matter to a scan host. - up-prod-sso-server has a public IP it doesn't need since traffic is handled by up-prod-sso-elb. FWIW, public IPs are assigned to up-prod-bastion, up-prod-openscap, up-prod-satellite, up-prod-sso-server, and "onetime". The bastion host and openscap kinda make sense, though you can jump to openscap from the BH. Damnfino what "onetime" is supposed to be. I'm not sure which of these, or all of 'em, should be turned into issues. Comments? -- T From jrickard at redhat.com Tue Dec 3 20:47:52 2019 From: jrickard at redhat.com (Jonathan Rickard) Date: Tue, 3 Dec 2019 15:47:52 -0500 Subject: [Platformone] [EXT] Re: Unified Platform Pod Deploy Errors In-Reply-To: References: <1574703796843.94672@ManTech.com> <1574705271520.5593@ManTech.com> <1574705967080.55870@ManTech.com> <1574706319228.45425@ManTech.com> <1574707851362.16520@ManTech.com> <1574713055636.69810@ManTech.com> <1574714851887.51023@ManTech.com> <1574869597001.92851@ManTech.com> <1575303080504.65964@ManTech.com> <1575311511650.96981@ManTech.com> <6277_1575312368_5DE55BEF_6277_803_1_CA+EGyAB6=xB9Na5Wf-nWY52dwt1200Rk8uh6krw1CXojEaO1PQ@mail.gmail.com> <7E65F8DF-F96B-4657-82A2-4A95715FAC0B@mitre.org> <5A89B24C-3C36-4205-B083-CBFBF71ED122@mitre.org> <3052516616bb44caa350c3c6897bc122@DCCPRDEXCH02.ManTech.com> Message-ID: Russell / Team, We believe we've identified the issue with your application deploying. In order to rectify the issue, I need to evacuate pods, so you will probably see some hiccups while deploying. I will update when this is resolved.
Thanks, jonny Jonathan Rickard, RHCA Principal Consultant, NAPS Red Hat Remote - Texas jonny at redhat.com M: 210-862-9739 On Tue, Dec 3, 2019 at 12:53 PM Keegan Reap wrote: > Hey all, we have opened an issue below, that we believe to be the cause, > we are currently investigating: > > https://dccscr.dsop.io/ginyu-force/up-iatt/issues/32 > > On Tue, Dec 3, 2019 at 11:27 AM Jonathan Rickard > wrote: > >> Russell, >> >> Getting more eyes on this @platformONE at redhat.com >> >> >> We'll keep you posted. >> jonny >> >> Jonathan Rickard, RHCA >> >> Principal Consultant, NAPS >> >> Red Hat Remote - Texas >> >> jonny at redhat.com >> M: 210-862-9739 >> >> >> >> On Tue, Dec 3, 2019 at 11:59 AM Kendall, Russell C < >> Russell.Kendall at mantech.com> wrote: >> >>> Kevin, >>> >>> Unfortunately we are receiving deployment errors again. This is the >>> event: >>> >>> 0/9 nodes are available: 1 node(s) had disk pressure, 2 node(s) had >>> taints that the pod didn't tolerate, 6 node(s) didn't match node selector. >>> >>> This is the deployment: >>> >>> >>> https://cluster.unified-platform.io/console/project/ccat-dev/browse/dc/driveup >>> >>> >>> V/R, >>> Russell C Kendall >>> ________________________________________ >>> From: Miller, Timothy J. >>> Sent: Monday, December 2, 2019 2:44:21 PM >>> To: Kevin O'Donnell >>> Cc: Kendall, Russell C; Mark Nissley; ABODUNRIN, ADEMOLA A GG-12 USAF >>> AFMC AFLCMC/HNCP; Jonathan Rickard; Chris Kuperstein; BRYAN, AUSTEN R Capt >>> USAF AFMC AFLCMC/HNCP; Brenna Gordon; Bubb, Mike; DIROCCO, ROGER E GG-13 >>> USAF AFMC ESC/AFLCMC/HNCP; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP >>> Subject: Re: [EXT] Re: Unified Platform Pod Deploy Errors >>> >>> Tagged you on it. >>> >>> -- T >>> >>> ?On 12/2/19, 14:03, "Kevin O'Donnell" wrote: >>> >>> Hello, >>> >>> >>> Autoscaling is on our future IAC roadmap. Tim, the additional >>> ticket would be appreciated. 
>>> >>> >>> We have swapped out the app/worker instances with m5a.8xlarge 32 >>> cores, 128gb of ram. Please let us know if you have any other issues. >>> >>> >>> Thanks, >>> >>> KEVIN O'DONNELL >>> ARCHITECT MANAGER >>> Red Hat Red Hat NA Public Sector Consulting >> > >>> >>> kodonnell at redhat.com >>> M: 240-605-4654 >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> On Mon, Dec 2, 2019 at 2:29 PM Miller, Timothy J. >>> wrote: >>> >>> >>> I'll open an issue. IaC needs to have instance size as a host_var >>> to facilitate scaling. >>> >>> -- T >>> >>> On 12/2/19, 13:15, "Kevin O'Donnell" wrote: >>> >>> Tim, >>> >>> >>> Thanks for the information. We are undersized on the app/worker >>> nodes, the cluster has 3 and they are m5.xlarge with only 16gb of ram. From >>> what I have read each Labs engagement operated on a 3 node worker cluster >>> with each node having 6core's and 28gb >>> of ram. We will need to swap out the existing instances with >>> larger spec's. >>> >>> >>> We are going to try to flush the existing workload out on one of >>> the workers to see if we can swap them out one at a time. >>> >>> >>> Thanks, >>> >>> KEVIN O'DONNELL >>> ARCHITECT MANAGER >>> Red Hat Red Hat NA Public Sector Consulting < >>> https://www.redhat.com/> >>> >>> kodonnell at redhat.com >>> M: 240-605-4654 >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> On Mon, Dec 2, 2019 at 2:07 PM Miller, Timothy J. < >>> tmiller at mitre.org> wrote: >>> >>> >>> Here's what I can see, given the perm limits I seem to be under: >>> >>> - NS:develop-misp-app and NS:lp-develop-misp-app both have >>> several sync jobs in CrashLoopBackoff b/c of an HTTP 401 error returned >>> while trying to fetch something from somewhere (URL isn't recorded in the >>> stack trace). >>> >>> - NS:minishift-misp-app has most of its pods/jobs stuck in >>> ImagePullBackoff. No detail there in the event stream so I'll see if I can >>> dig deeper. 
>>> >>> - NS:aam-ci-cd has Jenkins trying to spin up three workers, >>> those are coming back as unschedulable. >>> >>> I can't see into NS:aam-bases or NS:dsop-images b/c of perm >>> limits. >>> >>> I see no DAS-related project(s). >>> >>> The MISP stuff needs debugging before calling "blocked" since it >>> looks like an internal error from this perspective. >>> >>> >>> >>> In re: AAM Jenkins: If this deployment is coming out of the OCP >>> storefront, then maybe it should be ephemeral rather than persistent. If >>> it's a custom deployment, then it probably needs a rethink. >>> >>> I'm also not sure why there are two MISP dev projects. >>> >>> -- T >>> >>> >>> >>> On 12/2/19, 12:46, "Kevin O'Donnell" >>> wrote: >>> >>> Russell, >>> >>> >>> Thank you for the information. We can switch out the >>> instance type for the worker nodes. How much memory is required by the apps? >>> >>> >>> >>> Thanks, >>> >>> KEVIN O'DONNELL >>> ARCHITECT MANAGER >>> Red Hat Red Hat NA Public Sector Consulting < >>> https://www.redhat.com/> >>> >>> kodonnell at redhat.com >>> M: 240-605-4654 >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> On Mon, Dec 2, 2019 at 1:32 PM Kendall, Russell C < >>> Russell.Kendall at mantech.com> wrote: >>> >>> >>> Kevin, >>> The lack of resources on >>> u-p.io >>> cluster is hindering development, >>> testing, and integration of the apps from CCAT AAM DAS, which is >>> putting one >>> of our PI goals at risk. >>> >>> >>> We are blocked by the fact that we (CCAT and AAM) cannot >>> deploy additional pods to the >>> unified-platform.io < >>> http://unified-platform.io> >>> cluster. We have a subset of containers deployed, but rolling >>> deployments and new deployments fail. This means that we are >>> not able to execute integration testing or peer reviews. >>> We are temporarily working around by NOT testing/reviewing >>> our code changes live, something that no one likes. 
Also, we are now >>> running weeks-old instances of our containers, so we are very likely >>> producing some technical debt. We currently have developers >>> approaching idle or doing non-priority work until the >>> resource issue is resolved. >>> >>> >>> >>> Here is the particular error from the OSP cluster I received >>> while attempting a redeploy of one of our apps. >>> >>> >>> >>> 0/9 nodes are available: 1 node(s) had taints that the pod >>> didn't tolerate, 2 Insufficient pods, 6 node(s) didn't match node >>> selector.11 times in the last minute >>> >>> Since we do not have any cluster permissions, I cannot >>> verify which resource is running out, but from experience, I assess it is a >>> memory issue. >>> >>> >>> >>> It appears the cluster has been provisioned with a silly >>> allocation of node types. Without knowing exactly what was deployed, it >>> appears only 3 of the 9 hosts are suitable worker nodes. We would expect >>> the cluster to respond to resource limitations >>> and >>> scale, >>> but if a scheduled downtime is required, please work with >>> us so we can anticipate. As it stands, the cluster does not support >>> resources required by CCAT and the other dev teams (AAM, DAS, etc.). We >>> would accept any downtime if it will improve the situation, >>> as we are blocked from progressing under the current >>> constraints. My hope was we could get the cluster redeployed over the TG >>> holiday to eliminate developer impact, but as Mark pointed out, there were >>> limited support folks available. Now I am just >>> trying >>> to >>> minimize the losses. 
>>> V/R, >>> Russell C Kendall >>> >>> ________________________________________ >>> From: Kevin O'Donnell >>> Sent: Monday, December 2, 2019 11:52 AM >>> To: Kendall, Russell C >>> Cc: Mark Nissley; ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP; Jonathan Rickard; Chris Kuperstein; BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP; Brenna Gordon; Bubb, Mike (mbubb at mitre.org); DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP; Miller, Timothy J.; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP >>> Subject: Re: Unified Platform Pod Deploy Errors >>> >>> Hello Russell, >>> >>> Can you elaborate on the term Blocked? What specific issues are the blockers? >>> >>> Thanks, >>> KEVIN O'DONNELL >>> ARCHITECT MANAGER >>> Red Hat NA Public Sector Consulting <https://www.redhat.com/> >>> kodonnell at redhat.com >>> M: 240-605-4654 >>> >>> On Mon, Dec 2, 2019 at 11:11 AM Kendall, Russell C <Russell.Kendall at mantech.com> wrote: >>> >>> Mark, >>> >>> Thanks for acknowledging. Please be aware that the San Antonio dev teams working in unified-platform.io <http://unified-platform.io> are currently blocked. >>> >>> V/R, >>> Russell C Kendall >>> >>> ________________________________________ >>> From: Mark Nissley >>> Sent: Monday, December 2, 2019 9:36 AM >>> To: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP; Jonathan Rickard; Chris Kuperstein >>> Cc: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP; Kevin O'Donnell; Brenna Gordon; Kendall, Russell C; Bubb, Mike (mbubb at mitre.org); DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP; Miller, Timothy J.; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP >>> Subject: Re: Unified Platform Pod Deploy Errors >>> >>> As noted, I don't suspect much got done on this over the holiday weekend. I did see the ticket and dropped some details into it.
I also assigned it to @Jonathan Rickard and @Chris Kuperstein. >>> >>> It looks like short-term solutions have been easy, but the issue is recurring. >>> >>> Mark NISSLEY, PMP, CSM, LEAN >>> PROGRAM MaNAGER & SR technical Project Manager >>> North American Consulting, Public Sector >>> M: 850-530-3234 >>> Scheduled Training: October 14-18 >>> >>> On Mon, Dec 2, 2019 at 10:21 AM ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP wrote: >>> >>> Mark/Kevin, >>> >>> I just heard at the team stand-up that we are still blocked. From my investigations, this is also affecting the AAM team. >>> >>> Please let me know if there is something we need to do to move this forward. >>> >>> Most Sincerely, >>> Ade Abodunrin, GG-12, USAF >>> Product Owner (Cybertron & Ginyu Force), Unified Platform >>> LevelUP Code Works >>> Commercial: (210) 890-2113 >>> NIPR email: ademola.abodunrin at us.af.mil >>> >>> ________________________________________ >>> From: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP >>> Sent: Wednesday, November 27, 2019 12:58 PM >>> To: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP <austen.bryan.1 at us.af.mil>; Mark Nissley; Kevin O'Donnell; Brenna Gordon >>> Cc: Kendall, Russell C; Bubb, Mike (mbubb at mitre.org); DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP; Miller, Timothy J. <tmiller at mitre.org>; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP <jose.ramirez.50.ctr at us.af.mil> >>> Subject: Re: Unified Platform Pod Deploy Errors >>> >>> Thanks a lot Capt Bryan! Russell created the ticket on GitLab UP Node Project.
>>> Most Sincerely, >>> Ade Abodunrin, GG-12, USAF >>> Product Owner (Cybertron & Ginyu Force), Unified Platform >>> LevelUP Code Works >>> Commercial: (210) 890-2113 >>> NIPR email: ademola.abodunrin at us.af.mil >>> >>> ________________________________________ >>> From: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP <austen.bryan.1 at us.af.mil> >>> Sent: Wednesday, November 27, 2019 12:56 PM >>> To: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP <ademola.abodunrin at us.af.mil>; Mark Nissley; Kevin O'Donnell; Brenna Gordon >>> Cc: Kendall, Russell C; Bubb, Mike (mbubb at mitre.org); DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP; Miller, Timothy J. <tmiller at mitre.org>; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP <jose.ramirez.50.ctr at us.af.mil> >>> Subject: RE: Unified Platform Pod Deploy Errors >>> >>> Thanks Ade. The team is thin until next week due to the holidays but I will make sure it is addressed. Were there any issues submitted to Gitlab's UP Node Project on DCCSCR? >>> >>> @Mark/Kevin - can we address? >>> >>> -Austen >>> >>> From: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP <ademola.abodunrin at us.af.mil> >>> Sent: Wednesday, November 27, 2019 9:51 AM >>> To: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP <austen.bryan.1 at us.af.mil> >>> Cc: Kendall, Russell C; Bubb, Mike (mbubb at mitre.org) >>> Subject: Fw: Unified Platform Pod Deploy Errors >>> >>> Capt Bryan, >>> >>> Please see the explanation on the issue that Ginyu Force is currently experiencing below.
>>> >>> >>> >>> Most Sincerely, >>> >>> Ade Abodunrin, GG-12, USAF >>> Product Owner (Cybertron & Ginyu Force), Unified Platform >>> >>> >>> LevelUP Code Works >>> Commercial: (210) 890-2113 >>> NIPR email: >>> ademola.abodunrin at us.af.mil >>> >>> >>> >>> >>> >>> ________________________________________ >>> >>> From: Kendall, Russell C >>> Sent: Wednesday, November 27, 2019 9:46 AM >>> To: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP < >>> ademola.abodunrin at us.af.mil>; Buffaloe, >>> Christopher ; Molina, >>> Toby ; >>> Crace, Jared E ; SANCHEZ, MARK >>> GG-13 USAF AFMC AFLCMC/HNCP >>> Cc: >>> tmiller at mitre.org < >>> tmiller at mitre.org> >>> Subject: [Non-DoD Source] Fw: Unified Platform Pod Deploy >>> Errors >>> >>> >>> >>> Gentlemen, >>> >>> The application development teams working in the new >>> GovCloud OCP environment (unified-platform.io < >>> http://unified-platform.io> >>> ) >>> are currently blocked in efforts to deploy new pods for >>> testing, development, and UAT. >>> >>> Red Hat and RogueOne SMEs have been notified and have >>> attempted some fixes starting on Monday 11/25, but at this point have not >>> been able to provision resources >>> sufficient to host CCAT and AAM. >>> >>> We have taken steps to minimize our footprint (eliminating >>> demonstration environment, deleting developer namespaces), but this is not >>> a sustainable approach, >>> and has only resulted in moderate improvements in cluster >>> performance. >>> >>> Our hope is the U-P.io cluster compute resources can be >>> increased very soon, so that we may resume normal development activities. >>> Our understanding is that >>> such a scaling requires a complete redeployment of the >>> cluster, which is unusual, but an acceptable loss to productivity. If the >>> cluster can be scaled up over the Thanksgiving holiday, the impact will be >>> minimal to developers and cluster administrators, >>> alike. 
>>> >>> We are currently collaborating on solutions on the following >>> MatterMost channel behind the space camp VPN (link below), and via the >>> email thread forwarded >>> (further below). >>> >>> >>> https://chat.spacecamp.ninja/levelup/channels/unified-platform-node >>> < >>> https://chat.spacecamp.ninja/levelup/channels/unified-platform-node> >>> >>> Please keep me posted on developments and I will coordinate >>> developer activities with any scheduled platform outages. >>> >>> V/R, >>> Russell C Kendall >>> >>> ________________________________________ >>> >>> From: Curran, Daniel M >>> Sent: Monday, November 25, 2019 2:47 PM >>> To: Jonathan Rickard >>> Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; >>> dlystra at redhat.com ; Sison, Mark >>> Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; Phil >>> Soliz; >>> Buffaloe, >>> Christopher; Torres, Alexander; Crace, Jared E; Middleton, >>> Joseph J >>> Subject: Re: Unified Platform Pod Deploy Errors >>> >>> >>> >>> Sounds great. Appreciate it. >>> I'll watch email and Mattermost in case you need more from >>> us. >>> >>> -Daniel >>> >>> ________________________________________ >>> >>> From: Jonathan Rickard >>> Sent: Monday, November 25, 2019 2:44 PM >>> To: Curran, Daniel M >>> Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; >>> dlystra at redhat.com ; Sison, Mark >>> Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; Phil >>> Soliz; >>> Buffaloe, >>> Christopher; Torres, Alexander; Crace, Jared E; Middleton, >>> Joseph J >>> Subject: Re: Unified Platform Pod Deploy Errors >>> >>> >>> >>> Thanks Daniel - >>> >>> >>> >>> I'll continue to look into the resource issue that you're >>> seeing - I'd like to identify the root cause and then work with the team to >>> come up with a solution. 
>>> >>> >>> >>> Jonathan Rickard, >>> RHCA >>> Principal Consultant, NAPS >>> Red >>> Hat Remote - Texas >>> jonny at redhat.com >>> >>> M: 210-862-9739 >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> On Mon, Nov 25, 2019 at 2:17 PM Curran, Daniel M < >>> Daniel.Curran at mantech.com> >>> wrote: >>> >>> >>> Yeah we hit the limit then had AAM kill some of their >>> projects and then our pods got scheduled. >>> We've hit the limit again though. Here's an example pod that >>> cannot be scheduled >>> >>> >>> >>> https://cluster.unified-platform.io/console/project/ccat-dev/browse/pods/sso-8-2cjth >>> < >>> https://cluster.unified-platform.io/console/project/ccat-dev/browse/pods/sso-8-2cjth> >>> < >>> https://cluster.unified-platform.io/console/project/ccat-dev/browse/pods/sso-8-2cjth >>> > >>> They're seeing it when their jenkins slaves can't deploy but >>> it's basically any pod after we hit some limit. >>> >>> -Daniel >>> ________________________________________ >>> >>> From: Jonathan Rickard >>> Sent: Monday, November 25, 2019 1:26 PM >>> To: Curran, Daniel M >>> Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; >>> dlystra at redhat.com ; Sison, Mark >>> Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; Phil >>> Soliz; >>> Buffaloe, >>> Christopher; Torres, Alexander; Crace, Jared E; Middleton, >>> Joseph J >>> Subject: Re: Unified Platform Pod Deploy Errors >>> >>> >>> >>> Daniel, >>> >>> >>> >>> I can see that you have 3 mongo pods, 1 chatup and 1 upbot >>> pod running ... is your app good to go? >>> >>> >>> >>> Looks like there was an issue with memory on 1 pod, then >>> some node selector being mismatched - just what i could see in the events... 
>>> >>> >>> >>> >>> >>> >>> Jonathan Rickard, >>> RHCA >>> Principal Consultant, NAPS >>> Red >>> Hat Remote - Texas >>> jonny at redhat.com >>> >>> M: 210-862-9739 >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> On Mon, Nov 25, 2019 at 12:50 PM Curran, Daniel M < >>> Daniel.Curran at mantech.com> >>> wrote: >>> >>> >>> Also, AAM was having similar issues. Looks like they had a >>> lot of namespaces and scaling down the pods on their deployments didn't >>> help but actually deleting the namespaces >>> did. >>> We have pods scheduling now but I'm adding them and we'd >>> still like to work through what resource limit we were hitting to avoid >>> this in the future. >>> >>> -Daniel >>> >>> ________________________________________ >>> >>> From: Curran, Daniel M >>> Sent: Monday, November 25, 2019 12:25 PM >>> To: Jonathan Rickard >>> Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; >>> dlystra at redhat.com ; Sison, Mark >>> Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; Phil >>> Soliz; >>> Buffaloe, >>> Christopher; Torres, Alexander >>> Subject: Re: Unified Platform Pod Deploy Errors >>> >>> >>> >>> Thanks, sir. >>> Most important for us to get working is "ccat-demo" but it's >>> also happening in "ccat-dev" and "ccat-ci-cd". >>> >>> -Daniel >>> ________________________________________ >>> >>> From: Jonathan Rickard >>> Sent: Monday, November 25, 2019 12:22 PM >>> To: Curran, Daniel M >>> Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; >>> dlystra at redhat.com ; Sison, Mark >>> Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; Phil >>> Soliz; >>> Buffaloe, >>> Christopher; Torres, Alexander >>> Subject: Re: Unified Platform Pod Deploy Errors >>> >>> >>> >>> What's the name of the project you're working in? I'm going >>> to be back at my laptop in about 30 and will take a look when I get there. >>> >>> >>> >>> Is it just the Jenkins pods failing? 
>>> >>> >>> >>> >>> >>> >>> >>> On Mon, Nov 25, 2019, 12:20 PM Curran, Daniel M < >>> Daniel.Curran at mantech.com> >>> wrote: >>> >>> >>> Adding Dean and Alex. >>> Also, sitting in mattermost if anyone needs to get online >>> and chat for more information. >>> >>> -Daniel >>> >>> ________________________________________ >>> >>> From: Curran, Daniel M >>> Sent: Monday, November 25, 2019 12:07 PM >>> To: >>> jonny at redhat.com ; >>> >>> ckuperst at redhat.com ; Mark >>> Nissley >>> Cc: Sison, Mark Anthony; Cepeda, Rolando; Kendall, Russell >>> C; Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher >>> Subject: Re: Unified Platform Pod Deploy Errors >>> >>> >>> >>> Adding Kupe and Mark. >>> >>> -Daniel >>> ________________________________________ >>> >>> From: Curran, Daniel M >>> Sent: Monday, November 25, 2019 11:43 AM >>> To: >>> jonny at redhat.com >>> Cc: Sison, Mark Anthony; Cepeda, Rolando; Kendall, Russell >>> C; Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher >>> Subject: Unified Platform Pod Deploy Errors >>> >>> >>> >>> Hey Jonny, >>> >>> We met briefly at SpaceCAMP a couple weeks ago when >>> >>> >>> >>> cluster.unified-platform.io < >>> http://cluster.unified-platform.io> >>> was stood up. We've been trying to deploy some apps today and so >>> far today we're getting errors on most (if >>> not all) of our pods. >>> >>> 0/9 nodes are available: 3 Insufficient pods, 6 node(s) >>> didn't match node selector. >>> >>> Is what we're seeing. We were thinking it was some volume >>> types weren't correct but some of our pods don't even have volumes attached >>> and still give us this error (i.e. Jenkins >>> slaves or web frontends without persistent storage). >>> Any idea what this could be? We're not running out of space >>> on the nodes themselves are we? >>> We have a demo scheduled for tomorrow at 9:30 AM CST and are >>> hoping to get a demo env up for them today but this error came up >>> unexpectedly. Also, we're here at 500 Navarro >>> St. 
in San Antonio if working through this in person is better/easier. >>> >>> Thanks, >>> Daniel Curran >>> >>> ________________________________________ >>> >>> This e-mail and any attachments are intended only for the use of the addressee(s) named herein and may contain proprietary information. If you are not the intended recipient of this e-mail or believe that you received this email in error, please take immediate action to notify the sender of the apparent error by reply e-mail; permanently delete the e-mail and any attachments from your computer; and do not disseminate, distribute, use, or copy this message and any attachments. From kodonnel at redhat.com Tue Dec 3 22:14:53 2019 From: kodonnel at redhat.com (Kevin O'Donnell) Date: Tue, 3 Dec 2019 17:14:53 -0500 Subject: [Platformone] Riddle me this, Batman (odd things in up-prod) In-Reply-To: <49479C3E-4D9F-4C8C-AE04-585855858CCF@mitre.org> References: <49479C3E-4D9F-4C8C-AE04-585855858CCF@mitre.org> Message-ID: Bastion creation is IaC, and the other EC2 instance that's running in prod is for ACAS; it was created to scan and will be shut down after the scans are done. On Tue, Dec 3, 2019 at 3:34 PM Miller, Timothy J. wrote: > - There are three bastion hosts (up-prod-bastion, up-prod-ocp-bastion, and "onetime"). Of these, I can find only up-prod-ocp-bastion in the IaC definition.
Both up-prod-bastion and "onetime" look like they were built separately ("onetime" is baselined on CentOS--which is a giveaway--and up-prod-bastion is attached to the `bastion-ssh` security group--which AFAICT is also not part of the IaC). > > I recall someone (Dean?) telling me that there's no BH in the IaC, but that's not true (see consumers/up-node-infrastructure/environments/production/group_vars/all/ec2-instances.yml). > > - up-prod-openscap and up-prod-sso-server each have a public IP, but their inbound rules permit only traffic from the VPC subnets (10.40.0.0/16) and the up-ss-vpc gitlab-ci-runner instance. > > - up-prod-openscap is attached to the up-prod-ocp-nodes SG, which doesn't seem right. That opens a bunch of ports that probably don't matter to a scan host. > > - up-prod-sso-server has a public IP it doesn't need since traffic is handled by up-prod-sso-elb. > > FWIW, public IPs are assigned to up-prod-bastion, up-prod-openscap, up-prod-satellite, up-prod-sso-server, and "onetime". The bastion host and openscap kinda make sense, though you can jump to openscap from the BH. > > Damnfino what "onetime" is supposed to be. > > I'm not sure which of these, or all of 'em, should be turned into issues. Comments? > > -- T -- KEVIN O'DONNELL ARCHITECT MANAGER Red Hat NA Public Sector Consulting kodonnell at redhat.com M: 240-605-4654
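The audit quoted above comes down to two set comparisons: which running hosts are absent from the IaC definition, and which of those also hold a public IP. A minimal Python sketch of that comparison follows, using only the host names given in this thread. The variable and function names are hypothetical, and the IaC set reflects only what Tim reports finding in ec2-instances.yml (Kevin's reply disputes part of that reading); it is not the result of querying AWS or the repository.

```python
# Host names are transcribed from the thread; everything else is illustrative.

# The three bastion hosts Tim lists as running in up-prod.
BASTION_HOSTS = ["up-prod-bastion", "up-prod-ocp-bastion", "onetime"]

# Assumption: the only bastion Tim reports finding in the IaC definition.
# This mirrors his description, not the actual contents of ec2-instances.yml.
IAC_DEFINED = {"up-prod-ocp-bastion"}

# Hosts Tim lists as having a public IP assigned.
PUBLIC_IP_HOSTS = {
    "up-prod-bastion",
    "up-prod-openscap",
    "up-prod-satellite",
    "up-prod-sso-server",
    "onetime",
}


def not_in_iac(hosts, iac_defined):
    """Return the hosts that are running but missing from the IaC definition."""
    return [h for h in hosts if h not in iac_defined]


def exposed_and_unmanaged(hosts, iac_defined, public_ip_hosts):
    """Return the hosts that are outside the IaC and also hold a public IP."""
    return [h for h in not_in_iac(hosts, iac_defined) if h in public_ip_hosts]


if __name__ == "__main__":
    print(not_in_iac(BASTION_HOSTS, IAC_DEFINED))
    print(exposed_and_unmanaged(BASTION_HOSTS, IAC_DEFINED, PUBLIC_IP_HOSTS))
```

With the data as reported, both checks flag up-prod-bastion and "onetime", which are the two hosts the thread goes on to discuss.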
From dlystra at redhat.com Tue Dec 3 22:33:46 2019 From: dlystra at redhat.com (Dean Lystra) Date: Tue, 3 Dec 2019 14:33:46 -0800 Subject: [Platformone] Riddle me this, Batman (odd things in up-prod) In-Reply-To: References: <49479C3E-4D9F-4C8C-AE04-585855858CCF@mitre.org> Message-ID: One bastion host was created for the sole purpose of allowing access to the IdM CLI. This was done as a quick fix to get the users created and for administrative purposes. Access to IdM via web console or CLI is not available from the internet. "onetime" is a mystery to me. On Tue, Dec 3, 2019, 2:15 PM Kevin O'Donnell wrote: > Bastion creation is iac, and the other ec2 that's running in prod is for acas and was created to scan and will be shutdown after the scans are done > > On Tue, Dec 3, 2019 at 3:34 PM Miller, Timothy J. wrote: > >> - There are three bastion hosts (up-prod-bastion, up-prod-ocp-bastion, and "onetime"). Of these, I can find only up-prod-ocp-bastion in the IaC definition. Both up-prod-bastion and "onetime" look like they were built separately ("onetime" is baselined on CentOS--which is a giveaway--and up-prod-bastion is attached to the `bastion-ssh` security group--which AFAICT is also not part of the IaC). >> >> I recall someone (Dean?) telling me that there's no BH in the IaC, but that's not true (see consumers/up-node-infrastructure/environments/production/group_vars/all/ec2-instances.yml). >> >> - up-prod-openscap and up-prod-sso-server have a public IP but its inbound rules permit only traffic from the VPC subnets (10.40.0.0/16) and the up-ss-vpc gitlab-ci-runner instance. >> >> - up-prod-openscap is attached to the up-prod-ocp-nodes SG, which is doesn't seem right. That opens a bunch of ports that probably don't matter to a scan host. >> >> - up-prod-sso-server has a public IP it doesn't need since traffic is handled by up-prod-sso-elb.
>> >> FWIW, public IPs are assigned to up-prod-bastion, up-prod-openscap, >> up-prod-satellite, up-prod-sso-server, and "onetime". The bastion host and >> openscap kinda make sense, though you can jump to openscap from the BH. >> >> Damnfino what "onetime" is supposed to be. >> >> I'm not sure which of these or all of 'em should be turned into issues. >> Comments? >> >> -- T >> >> >> _______________________________________________ >> platformONE mailing list >> platformONE at redhat.com >> https://www.redhat.com/mailman/listinfo/platformone >> >> -- > > KEVIN O'DONNELL > > ARCHITECT MANAGER > > Red Hat Red Hat NA Public Sector Consulting > > kodonnell at redhat.com M: > 240-605-4654 > > _______________________________________________ > platformONE mailing list > platformONE at redhat.com > https://www.redhat.com/mailman/listinfo/platformone > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jrickard at redhat.com Tue Dec 3 23:58:10 2019 From: jrickard at redhat.com (Jonathan Rickard) Date: Tue, 3 Dec 2019 18:58:10 -0500 Subject: [Platformone] [EXT] Re: Unified Platform Pod Deploy Errors In-Reply-To: References: <1574703796843.94672@ManTech.com> <1574705271520.5593@ManTech.com> <1574705967080.55870@ManTech.com> <1574706319228.45425@ManTech.com> <1574707851362.16520@ManTech.com> <1574713055636.69810@ManTech.com> <1574714851887.51023@ManTech.com> <1574869597001.92851@ManTech.com> <1575303080504.65964@ManTech.com> <1575311511650.96981@ManTech.com> <6277_1575312368_5DE55BEF_6277_803_1_CA+EGyAB6=xB9Na5Wf-nWY52dwt1200Rk8uh6krw1CXojEaO1PQ@mail.gmail.com> <7E65F8DF-F96B-4657-82A2-4A95715FAC0B@mitre.org> <5A89B24C-3C36-4205-B083-CBFBF71ED122@mitre.org> <3052516616bb44caa350c3c6897bc122@DCCPRDEXCH02.ManTech.com> Message-ID: Hey Guys - Sorry for taking so long - this has been completed. Please run your builds and let us know if you're having any problems. 
jonny Jonathan Rickard, RHCA Principal Consultant, NAPS Red Hat Remote - Texas jonny at redhat.com M: 210-862-9739 On Tue, Dec 3, 2019 at 3:47 PM Jonathan Rickard wrote: > Russell / Team, > > We believe we've identified the issue with your application deploying. In > order to rectify the issue I need to evacuate pods so you will probably see > some hiccups while deploying. I will update when this is resolved. > > Thanks, > jonny > > Jonathan Rickard, RHCA > > Principal Consultant, NAPS > > Red Hat Remote - Texas > > jonny at redhat.com > M: 210-862-9739 > > > > On Tue, Dec 3, 2019 at 12:53 PM Keegan Reap wrote: > >> Hey all, we have opened an issue below, that we believe to be the cause, >> we are currently investigating: >> >> https://dccscr.dsop.io/ginyu-force/up-iatt/issues/32 >> >> On Tue, Dec 3, 2019 at 11:27 AM Jonathan Rickard >> wrote: >> >>> Russell, >>> >>> Getting more eyes on this @platformONE at redhat.com >>> >>> >>> We'll keep you posted. >>> jonny >>> >>> Jonathan Rickard, RHCA >>> >>> Principal Consultant, NAPS >>> >>> Red Hat Remote - Texas >>> >>> jonny at redhat.com >>> M: 210-862-9739 >>> >>> >>> >>> On Tue, Dec 3, 2019 at 11:59 AM Kendall, Russell C < >>> Russell.Kendall at mantech.com> wrote: >>> >>>> Kevin, >>>> >>>> Unfortunately we are receiving deployment errors again. This is the >>>> event: >>>> >>>> 0/9 nodes are available: 1 node(s) had disk pressure, 2 node(s) had >>>> taints that the pod didn't tolerate, 6 node(s) didn't match node selector. >>>> >>>> This is the deployment: >>>> >>>> >>>> https://cluster.unified-platform.io/console/project/ccat-dev/browse/dc/driveup >>>> >>>> >>>> V/R, >>>> Russell C Kendall >>>> ________________________________________ >>>> From: Miller, Timothy J. 
>>>> Sent: Monday, December 2, 2019 2:44:21 PM >>>> To: Kevin O'Donnell >>>> Cc: Kendall, Russell C; Mark Nissley; ABODUNRIN, ADEMOLA A GG-12 USAF >>>> AFMC AFLCMC/HNCP; Jonathan Rickard; Chris Kuperstein; BRYAN, AUSTEN R Capt >>>> USAF AFMC AFLCMC/HNCP; Brenna Gordon; Bubb, Mike; DIROCCO, ROGER E GG-13 >>>> USAF AFMC ESC/AFLCMC/HNCP; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP >>>> Subject: Re: [EXT] Re: Unified Platform Pod Deploy Errors >>>> >>>> Tagged you on it. >>>> >>>> -- T >>>> >>>> ?On 12/2/19, 14:03, "Kevin O'Donnell" wrote: >>>> >>>> Hello, >>>> >>>> >>>> Autoscaling is on our future IAC roadmap. Tim, the additional >>>> ticket would be appreciated. >>>> >>>> >>>> We have swapped out the app/worker instances with m5a.8xlarge 32 >>>> cores, 128gb of ram. Please let us know if you have any other issues. >>>> >>>> >>>> Thanks, >>>> >>>> KEVIN O'DONNELL >>>> ARCHITECT MANAGER >>>> Red Hat Red Hat NA Public Sector Consulting < >>>> https://www.redhat.com/> >>>> >>>> kodonnell at redhat.com >>>> M: 240-605-4654 >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> On Mon, Dec 2, 2019 at 2:29 PM Miller, Timothy J. < >>>> tmiller at mitre.org> wrote: >>>> >>>> >>>> I'll open an issue. IaC needs to have instance size as a host_var >>>> to facilitate scaling. >>>> >>>> -- T >>>> >>>> On 12/2/19, 13:15, "Kevin O'Donnell" wrote: >>>> >>>> Tim, >>>> >>>> >>>> Thanks for the information. We are undersized on the app/worker >>>> nodes, the cluster has 3 and they are m5.xlarge with only 16gb of ram. From >>>> what I have read each Labs engagement operated on a 3 node worker cluster >>>> with each node having 6core's and 28gb >>>> of ram. We will need to swap out the existing instances with >>>> larger spec's. >>>> >>>> >>>> We are going to try to flush the existing workload out on one >>>> of the workers to see if we can swap them out one at a time. 
>>>> >>>> Thanks, >>>> >>>> KEVIN O'DONNELL >>>> ARCHITECT MANAGER >>>> Red Hat Red Hat NA Public Sector Consulting <https://www.redhat.com/> >>>> >>>> kodonnell at redhat.com >>>> M: 240-605-4654 >>>> >>>> On Mon, Dec 2, 2019 at 2:07 PM Miller, Timothy J. <tmiller at mitre.org> wrote: >>>> >>>> Here's what I can see, given the perm limits I seem to be under: >>>> >>>> - NS:develop-misp-app and NS:lp-develop-misp-app both have >>>> several sync jobs in CrashLoopBackoff b/c of an HTTP 401 error returned >>>> while trying to fetch something from somewhere (URL isn't recorded in the >>>> stack trace). >>>> >>>> - NS:minishift-misp-app has most of its pods/jobs stuck in >>>> ImagePullBackoff. No detail there in the event stream so I'll see if I can >>>> dig deeper. >>>> >>>> - NS:aam-ci-cd has Jenkins trying to spin up three workers, >>>> those are coming back as unschedulable. >>>> >>>> I can't see into NS:aam-bases or NS:dsop-images b/c of perm >>>> limits. >>>> >>>> I see no DAS-related project(s). >>>> >>>> The MISP stuff needs debugging before calling "blocked" since >>>> it looks like an internal error from this perspective. >>>> >>>> In re: AAM Jenkins: If this deployment is coming out of the >>>> OCP storefront, then maybe it should be ephemeral rather than persistent. >>>> If it's a custom deployment, then it probably needs a rethink. >>>> >>>> I'm also not sure why there are two MISP dev projects. >>>> >>>> -- T >>>> >>>> On 12/2/19, 12:46, "Kevin O'Donnell" wrote: >>>> >>>> Russell, >>>> >>>> Thank you for the information. We can switch out the >>>> instance type for the worker nodes. How much memory is required by the apps?
>>>> >>>> >>>> >>>> Thanks, >>>> >>>> KEVIN O'DONNELL >>>> ARCHITECT MANAGER >>>> Red Hat Red Hat NA Public Sector Consulting < >>>> https://www.redhat.com/> >>>> >>>> kodonnell at redhat.com >>>> M: 240-605-4654 >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> On Mon, Dec 2, 2019 at 1:32 PM Kendall, Russell C < >>>> Russell.Kendall at mantech.com> wrote: >>>> >>>> >>>> Kevin, >>>> The lack of resources on >>>> u-p.io >>>> cluster is hindering development, >>>> testing, and integration of the apps from CCAT AAM DAS, which is >>>> putting one >>>> of our PI goals at risk. >>>> >>>> >>>> We are blocked by the fact that we (CCAT and AAM) cannot >>>> deploy additional pods to the >>>> unified-platform.io < >>>> http://unified-platform.io> >>>> cluster. We have a subset of containers deployed, but rolling >>>> deployments and new deployments fail. This means that we are >>>> not able to execute integration testing or peer reviews. >>>> We are temporarily working around by NOT testing/reviewing >>>> our code changes live, something that no one likes. Also, we are now >>>> running weeks-old instances of our containers, so we are very likely >>>> producing some technical debt. We currently have developers >>>> approaching idle or doing non-priority work until the >>>> resource issue is resolved. >>>> >>>> >>>> >>>> Here is the particular error from the OSP cluster I >>>> received while attempting a redeploy of one of our apps. >>>> >>>> >>>> >>>> 0/9 nodes are available: 1 node(s) had taints that the pod >>>> didn't tolerate, 2 Insufficient pods, 6 node(s) didn't match node >>>> selector.11 times in the last minute >>>> >>>> Since we do not have any cluster permissions, I cannot >>>> verify which resource is running out, but from experience, I assess it is a >>>> memory issue. >>>> >>>> >>>> >>>> It appears the cluster has been provisioned with a silly >>>> allocation of node types. 
Without knowing exactly what was deployed, it >>>> appears only 3 of the 9 hosts are suitable worker nodes. We would expect >>>> the cluster to respond to resource limitations >>>> and >>>> scale, >>>> but if a scheduled downtime is required, please work with >>>> us so we can anticipate. As it stands, the cluster does not support >>>> resources required by CCAT and the other dev teams (AAM, DAS, etc.). We >>>> would accept any downtime if it will improve the situation, >>>> as we are blocked from progressing under the current >>>> constraints. My hope was we could get the cluster redeployed over the TG >>>> holiday to eliminate developer impact, but as Mark pointed out, there were >>>> limited support folks available. Now I am just >>>> trying >>>> to >>>> minimize the losses. >>>> >>>> >>>> >>>> V/R, >>>> >>>> Russell C Kendall >>>> >>>> >>>> >>>> >>>> >>>> ________________________________________ >>>> From: Kevin O'Donnell >>>> Sent: Monday, December 2, 2019 11:52 AM >>>> To: Kendall, Russell C >>>> Cc: Mark Nissley; ABODUNRIN, ADEMOLA A GG-12 USAF AFMC >>>> AFLCMC/HNCP; Jonathan Rickard; Chris Kuperstein; BRYAN, AUSTEN R Capt USAF >>>> AFMC AFLCMC/HNCP; Brenna Gordon; Bubb, Mike (mbubb at mitre.org); DIROCCO, >>>> ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP; Miller, Timothy >>>> J.; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP >>>> Subject: Re: Unified Platform Pod Deploy Errors >>>> >>>> Hello Russell, >>>> >>>> >>>> Can you elaborate on the term Blocked? What specific issues >>>> are the blockers? 
>>>> >>>> Thanks, >>>> >>>> KEVIN O'DONNELL >>>> ARCHITECT MANAGER >>>> Red Hat Red Hat NA Public Sector Consulting <https://www.redhat.com/> >>>> >>>> kodonnell at redhat.com >>>> M: 240-605-4654 >>>> >>>> On Mon, Dec 2, 2019 at 11:11 AM Kendall, Russell C <Russell.Kendall at mantech.com> wrote: >>>> >>>> Mark, >>>> >>>> Thanks for acknowledging. Please be aware the San Antonio >>>> dev teams working in unified-platform.io <http://unified-platform.io> >>>> are currently blocked. >>>> >>>> V/R, >>>> >>>> Russell C Kendall >>>> >>>> ________________________________________ >>>> From: Mark Nissley >>>> Sent: Monday, December 2, 2019 9:36 AM >>>> To: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP; >>>> Jonathan Rickard; Chris Kuperstein >>>> Cc: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP; Kevin >>>> O'Donnell; Brenna Gordon; Kendall, Russell C; Bubb, Mike (mbubb at mitre.org); >>>> DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP; Miller, Timothy J.; >>>> RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP >>>> Subject: Re: Unified Platform Pod Deploy Errors >>>> >>>> As noted, I don't suspect much got done on this over the >>>> holiday weekend. I did see the ticket, and dropped some details into it. I >>>> also assigned it to @Jonathan Rickard and @Chris Kuperstein. >>>> >>>> It looks like short term solutions have been easy but the >>>> issue is recurring.
>>>> >>>> Mark NISSLEY, PMP, CSM, LEAN >>>> >>>> PROGRAM MaNAGER & SR technical Project Manager >>>> North American Consulting, Public Sector >>>> >>>> M: 850-530-3234 >>>> >>>> Scheduled Training: October 14-18 >>>> >>>> On Mon, Dec 2, 2019 at 10:21 AM ABODUNRIN, ADEMOLA A GG-12 >>>> USAF AFMC AFLCMC/HNCP wrote: >>>> >>>> Mark/Kevin, >>>> >>>> I just heard at the team stand up that we are still >>>> blocked. This is also affecting the AAM team from my investigations. >>>> >>>> Please let me know if there is something we need to do to >>>> move this forward. >>>> >>>> Most Sincerely, >>>> >>>> Ade Abodunrin, GG-12, USAF >>>> Product Owner (Cybertron & Ginyu Force), Unified Platform >>>> >>>> LevelUP Code Works >>>> Commercial: (210) 890-2113 >>>> NIPR email: ademola.abodunrin at us.af.mil >>>> >>>> ________________________________________ >>>> From: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP >>>> Sent: Wednesday, November 27, 2019 12:58 PM >>>> To: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP <austen.bryan.1 at us.af.mil>; >>>> Mark Nissley; Kevin O'Donnell; Brenna Gordon >>>> Cc: Kendall, Russell C; Bubb, Mike (mbubb at mitre.org); DIROCCO, ROGER E GG-13 >>>> USAF AFMC ESC/AFLCMC/HNCP; Miller, Timothy J. <tmiller at mitre.org>; >>>> RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP <jose.ramirez.50.ctr at us.af.mil> >>>> Subject: Re: Unified Platform Pod Deploy Errors >>>> >>>> Thanks a lot Capt Bryan! Russell created the ticket on >>>> GitLab UP Node Project.
>>>> >>>> >>>> >>>> >>>> Most Sincerely, >>>> >>>> >>>> Ade Abodunrin, GG-12, USAF >>>> Product Owner (Cybertron & Ginyu Force), Unified Platform >>>> >>>> >>>> >>>> LevelUP Code Works >>>> Commercial: >>>> (210) 890-2113 >>>> NIPR email: >>>> ademola.abodunrin at us.af.mil >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> ________________________________________ >>>> From: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP < >>>> austen.bryan.1 at us.af.mil> >>>> Sent: Wednesday, November 27, 2019 12:56 PM >>>> To: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP < >>>> ademola.abodunrin at us.af.mil>; Mark Nissley ; >>>> Kevin O'Donnell >>>> ; Brenna Gordon >>>> Cc: Kendall, Russell C ; >>>> Bubb, Mike (mbubb at mitre.org) ; DIROCCO, ROGER E GG-13 >>>> USAF AFMC ESC/AFLCMC/HNCP >>>> ; Miller, Timothy J. < >>>> tmiller at mitre.org>; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP < >>>> jose.ramirez.50.ctr at us.af.mil> >>>> Subject: RE: Unified Platform Pod Deploy Errors >>>> >>>> Thanks Ade. The team is thin until next week due to the >>>> holidays but I will make sure it is addressed. Were there any issues >>>> submitted to Gitlab?s UP Node Project on DCCSCR? >>>> >>>> @Mark/Kevin ? can we address? >>>> >>>> -Austen >>>> >>>> From: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP < >>>> ademola.abodunrin at us.af.mil> >>>> >>>> Sent: Wednesday, November 27, 2019 9:51 AM >>>> To: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP < >>>> austen.bryan.1 at us.af.mil> >>>> Cc: Kendall, Russell C ; >>>> Bubb, Mike (mbubb at mitre.org) >>>> Subject: Fw: Unified Platform Pod Deploy Errors >>>> >>>> >>>> >>>> Capt Bryan, >>>> >>>> Please see the explanation on the issue that Ginyu Force is >>>> currently experiencing below. 
>>>> >>>> >>>> >>>> Most Sincerely, >>>> >>>> Ade Abodunrin, GG-12, USAF >>>> Product Owner (Cybertron & Ginyu Force), Unified Platform >>>> >>>> >>>> LevelUP Code Works >>>> Commercial: (210) 890-2113 >>>> NIPR email: >>>> ademola.abodunrin at us.af.mil >>>> >>>> >>>> >>>> >>>> >>>> ________________________________________ >>>> >>>> From: Kendall, Russell C >>>> Sent: Wednesday, November 27, 2019 9:46 AM >>>> To: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP < >>>> ademola.abodunrin at us.af.mil>; Buffaloe, >>>> Christopher ; Molina, >>>> Toby ; >>>> Crace, Jared E ; SANCHEZ, MARK >>>> GG-13 USAF AFMC AFLCMC/HNCP >>>> Cc: >>>> tmiller at mitre.org < >>>> tmiller at mitre.org> >>>> Subject: [Non-DoD Source] Fw: Unified Platform Pod Deploy >>>> Errors >>>> >>>> >>>> >>>> Gentlemen, >>>> >>>> The application development teams working in the new >>>> GovCloud OCP environment (unified-platform.io < >>>> http://unified-platform.io> >>>> ) >>>> are currently blocked in efforts to deploy new pods for >>>> testing, development, and UAT. >>>> >>>> Red Hat and RogueOne SMEs have been notified and have >>>> attempted some fixes starting on Monday 11/25, but at this point have not >>>> been able to provision resources >>>> sufficient to host CCAT and AAM. >>>> >>>> We have taken steps to minimize our footprint (eliminating >>>> demonstration environment, deleting developer namespaces), but this is not >>>> a sustainable approach, >>>> and has only resulted in moderate improvements in cluster >>>> performance. >>>> >>>> Our hope is the U-P.io cluster compute resources can be >>>> increased very soon, so that we may resume normal development activities. >>>> Our understanding is that >>>> such a scaling requires a complete redeployment of the >>>> cluster, which is unusual, but an acceptable loss to productivity. If the >>>> cluster can be scaled up over the Thanksgiving holiday, the impact will be >>>> minimal to developers and cluster administrators, >>>> alike. 
>>>> >>>> We are currently collaborating on solutions on the >>>> following MatterMost channel behind the space camp VPN (link below), and >>>> via the email thread forwarded >>>> (further below). >>>> >>>> >>>> https://chat.spacecamp.ninja/levelup/channels/unified-platform-node >>>> < >>>> https://chat.spacecamp.ninja/levelup/channels/unified-platform-node> >>>> >>>> Please keep me posted on developments and I will coordinate >>>> developer activities with any scheduled platform outages. >>>> >>>> V/R, >>>> Russell C Kendall >>>> >>>> ________________________________________ >>>> >>>> From: Curran, Daniel M >>>> Sent: Monday, November 25, 2019 2:47 PM >>>> To: Jonathan Rickard >>>> Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; >>>> dlystra at redhat.com ; Sison, >>>> Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; >>>> Phil Soliz; >>>> Buffaloe, >>>> Christopher; Torres, Alexander; Crace, Jared E; Middleton, >>>> Joseph J >>>> Subject: Re: Unified Platform Pod Deploy Errors >>>> >>>> >>>> >>>> Sounds great. Appreciate it. >>>> I'll watch email and Mattermost in case you need more from >>>> us. >>>> >>>> -Daniel >>>> >>>> ________________________________________ >>>> >>>> From: Jonathan Rickard >>>> Sent: Monday, November 25, 2019 2:44 PM >>>> To: Curran, Daniel M >>>> Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; >>>> dlystra at redhat.com ; Sison, >>>> Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; >>>> Phil Soliz; >>>> Buffaloe, >>>> Christopher; Torres, Alexander; Crace, Jared E; Middleton, >>>> Joseph J >>>> Subject: Re: Unified Platform Pod Deploy Errors >>>> >>>> >>>> >>>> Thanks Daniel - >>>> >>>> >>>> >>>> I'll continue to look into the resource issue that you're >>>> seeing - I'd like to identify the root cause and then work with the team to >>>> come up with a solution. 
>>>> >>>> >>>> >>>> Jonathan Rickard, >>>> RHCA >>>> Principal Consultant, NAPS >>>> Red >>>> Hat Remote - Texas >>>> jonny at redhat.com >>>> >>>> M: 210-862-9739 >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> On Mon, Nov 25, 2019 at 2:17 PM Curran, Daniel M < >>>> Daniel.Curran at mantech.com> >>>> wrote: >>>> >>>> >>>> Yeah we hit the limit then had AAM kill some of their >>>> projects and then our pods got scheduled. >>>> We've hit the limit again though. Here's an example pod >>>> that cannot be scheduled >>>> >>>> >>>> >>>> https://cluster.unified-platform.io/console/project/ccat-dev/browse/pods/sso-8-2cjth >>>> < >>>> https://cluster.unified-platform.io/console/project/ccat-dev/browse/pods/sso-8-2cjth> >>>> < >>>> https://cluster.unified-platform.io/console/project/ccat-dev/browse/pods/sso-8-2cjth >>>> > >>>> They're seeing it when their jenkins slaves can't deploy >>>> but it's basically any pod after we hit some limit. >>>> >>>> -Daniel >>>> ________________________________________ >>>> >>>> From: Jonathan Rickard >>>> Sent: Monday, November 25, 2019 1:26 PM >>>> To: Curran, Daniel M >>>> Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; >>>> dlystra at redhat.com ; Sison, >>>> Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; >>>> Phil Soliz; >>>> Buffaloe, >>>> Christopher; Torres, Alexander; Crace, Jared E; Middleton, >>>> Joseph J >>>> Subject: Re: Unified Platform Pod Deploy Errors >>>> >>>> >>>> >>>> Daniel, >>>> >>>> >>>> >>>> I can see that you have 3 mongo pods, 1 chatup and 1 upbot >>>> pod running ... is your app good to go? >>>> >>>> >>>> >>>> Looks like there was an issue with memory on 1 pod, then >>>> some node selector being mismatched - just what i could see in the events... 
>>>> >>>> >>>> >>>> >>>> >>>> >>>> Jonathan Rickard, >>>> RHCA >>>> Principal Consultant, NAPS >>>> Red >>>> Hat Remote - Texas >>>> jonny at redhat.com >>>> >>>> M: 210-862-9739 >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> On Mon, Nov 25, 2019 at 12:50 PM Curran, Daniel M < >>>> Daniel.Curran at mantech.com> >>>> wrote: >>>> >>>> >>>> Also, AAM was having similar issues. Looks like they had a >>>> lot of namespaces and scaling down the pods on their deployments didn't >>>> help but actually deleting the namespaces >>>> did. >>>> We have pods scheduling now but I'm adding them and we'd >>>> still like to work through what resource limit we were hitting to avoid >>>> this in the future. >>>> >>>> -Daniel >>>> >>>> ________________________________________ >>>> >>>> From: Curran, Daniel M >>>> Sent: Monday, November 25, 2019 12:25 PM >>>> To: Jonathan Rickard >>>> Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; >>>> dlystra at redhat.com ; Sison, >>>> Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; >>>> Phil Soliz; >>>> Buffaloe, >>>> Christopher; Torres, Alexander >>>> Subject: Re: Unified Platform Pod Deploy Errors >>>> >>>> >>>> >>>> Thanks, sir. >>>> Most important for us to get working is "ccat-demo" but >>>> it's also happening in "ccat-dev" and "ccat-ci-cd". >>>> >>>> -Daniel >>>> ________________________________________ >>>> >>>> From: Jonathan Rickard >>>> Sent: Monday, November 25, 2019 12:22 PM >>>> To: Curran, Daniel M >>>> Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; >>>> dlystra at redhat.com ; Sison, >>>> Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; >>>> Phil Soliz; >>>> Buffaloe, >>>> Christopher; Torres, Alexander >>>> Subject: Re: Unified Platform Pod Deploy Errors >>>> >>>> >>>> >>>> What's the name of the project you're working in? I'm going >>>> to be back at my laptop in about 30 and will take a look when I get there. 
>>>> >>>> >>>> >>>> Is it just the Jenkins pods failing? >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> On Mon, Nov 25, 2019, 12:20 PM Curran, Daniel M < >>>> Daniel.Curran at mantech.com> >>>> wrote: >>>> >>>> >>>> Adding Dean and Alex. >>>> Also, sitting in mattermost if anyone needs to get online >>>> and chat for more information. >>>> >>>> -Daniel >>>> >>>> ________________________________________ >>>> >>>> From: Curran, Daniel M >>>> Sent: Monday, November 25, 2019 12:07 PM >>>> To: >>>> jonny at redhat.com ; >>>> >>>> ckuperst at redhat.com ; Mark >>>> Nissley >>>> Cc: Sison, Mark Anthony; Cepeda, Rolando; Kendall, Russell >>>> C; Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher >>>> Subject: Re: Unified Platform Pod Deploy Errors >>>> >>>> >>>> >>>> Adding Kupe and Mark. >>>> >>>> -Daniel >>>> ________________________________________ >>>> >>>> From: Curran, Daniel M >>>> Sent: Monday, November 25, 2019 11:43 AM >>>> To: >>>> jonny at redhat.com >>>> Cc: Sison, Mark Anthony; Cepeda, Rolando; Kendall, Russell >>>> C; Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher >>>> Subject: Unified Platform Pod Deploy Errors >>>> >>>> >>>> >>>> Hey Jonny, >>>> >>>> We met briefly at SpaceCAMP a couple weeks ago when >>>> >>>> >>>> >>>> cluster.unified-platform.io < >>>> http://cluster.unified-platform.io> >>> > >>>> was stood up. We've been trying to deploy some apps today and so >>>> far today we're getting errors on most (if >>>> not all) of our pods. >>>> >>>> 0/9 nodes are available: 3 Insufficient pods, 6 node(s) >>>> didn't match node selector. >>>> >>>> Is what we're seeing. We were thinking it was some volume >>>> types weren't correct but some of our pods don't even have volumes attached >>>> and still give us this error (i.e. Jenkins >>>> slaves or web frontends without persistent storage). >>>> Any idea what this could be? We're not running out of space >>>> on the nodes themselves are we? 
>>>> We have a demo scheduled for tomorrow at 9:30 AM CST and >>>> are hoping to get a demo env up for them today but this error came up >>>> unexpectedly. Also, we're here at 500 Navarro St. in San Antonio if >>>> working through this in person is better/easier. >>>> >>>> Thanks, >>>> Daniel Curran >>>> >>>> ________________________________________ >>>> >>>> This e-mail and any attachments are intended only for the >>>> use of the addressee(s) named herein and may contain proprietary >>>> information. If you are not the intended recipient of this e-mail or >>>> believe that you received this email in error, please take immediate >>>> action to notify the sender of the apparent error by reply >>>> e-mail; permanently delete the e-mail and any attachments from your >>>> computer; and do not disseminate, distribute, use, or copy this message and >>>> any attachments. >>>> >>>> _______________________________________________ >>> platformONE mailing list >>> platformONE at redhat.com >>> https://www.redhat.com/mailman/listinfo/platformone From tmiller at mitre.org Wed Dec 4 13:02:08 2019 From: tmiller at mitre.org (Miller, Timothy J.)
Date: Wed, 4 Dec 2019 13:02:08 +0000 Subject: [Platformone] [EXT] Re: Unified Platform Pod Deploy Errors In-Reply-To: References: <1574703796843.94672@ManTech.com> <1574705271520.5593@ManTech.com> <1574705967080.55870@ManTech.com> <1574706319228.45425@ManTech.com> <1574707851362.16520@ManTech.com> <1574713055636.69810@ManTech.com> <1574714851887.51023@ManTech.com> <1574869597001.92851@ManTech.com> <1575303080504.65964@ManTech.com> <1575311511650.96981@ManTech.com> <6277_1575312368_5DE55BEF_6277_803_1_CA+EGyAB6=xB9Na5Wf-nWY52dwt1200Rk8uh6krw1CXojEaO1PQ@mail.gmail.com> <7E65F8DF-F96B-4657-82A2-4A95715FAC0B@mitre.org> <5A89B24C-3C36-4205-B083-CBFBF71ED122@mitre.org> <3052516616bb44caa350c3c6897bc122@DCCPRDEXCH02.ManTech.com> Message-ID: <406B92B0-26F9-48FD-B533-549CC36B4FB2@mitre.org> Johnny-- Update the issue, if you would be so kind. -- T ?On 12/3/19, 18:00, "platformone-bounces at redhat.com on behalf of Jonathan Rickard" wrote: Hey Guys - Sorry for taking so long - this has been completed. Please run your builds and let us know if you're having any problems. jonny Jonathan Rickard, RHCA Principal Consultant, NAPS Red Hat Remote - Texas jonny at redhat.com M: 210-862-9739 On Tue, Dec 3, 2019 at 3:47 PM Jonathan Rickard wrote: Russell / Team, We believe we've identified the issue with your application deploying. In order to rectify the issue I need to evacuate pods so you will probably see some hiccups while deploying. I will update when this is resolved. Thanks, jonny Jonathan Rickard, RHCA Principal Consultant, NAPS Red Hat Remote - Texas jonny at redhat.com M: 210-862-9739 On Tue, Dec 3, 2019 at 12:53 PM Keegan Reap wrote: Hey all, we have opened an issue below, that we believe to be the cause, we are currently investigating: https://dccscr.dsop.io/ginyu-force/up-iatt/issues/32 On Tue, Dec 3, 2019 at 11:27 AM Jonathan Rickard wrote: Russell, Getting more eyes on this @platformONE at redhat.com We'll keep you posted. 
jonny Jonathan Rickard, RHCA Principal Consultant, NAPS Red Hat Remote - Texas jonny at redhat.com M: 210-862-9739 On Tue, Dec 3, 2019 at 11:59 AM Kendall, Russell C wrote: Kevin, Unfortunately we are receiving deployment errors again. This is the event: 0/9 nodes are available: 1 node(s) had disk pressure, 2 node(s) had taints that the pod didn't tolerate, 6 node(s) didn't match node selector. This is the deployment: https://cluster.unified-platform.io/console/project/ccat-dev/browse/dc/driveup V/R, Russell C Kendall ________________________________________ From: Miller, Timothy J. Sent: Monday, December 2, 2019 2:44:21 PM To: Kevin O'Donnell Cc: Kendall, Russell C; Mark Nissley; ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP; Jonathan Rickard; Chris Kuperstein; BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP; Brenna Gordon; Bubb, Mike; DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP Subject: Re: [EXT] Re: Unified Platform Pod Deploy Errors Tagged you on it. -- T On 12/2/19, 14:03, "Kevin O'Donnell" wrote: Hello, Autoscaling is on our future IAC roadmap. Tim, the additional ticket would be appreciated. We have swapped out the app/worker instances with m5a.8xlarge 32 cores, 128gb of ram. Please let us know if you have any other issues. Thanks, KEVIN O'DONNELL ARCHITECT MANAGER Red Hat Red Hat NA Public Sector Consulting kodonnell at redhat.com M: 240-605-4654 On Mon, Dec 2, 2019 at 2:29 PM Miller, Timothy J. wrote: I'll open an issue. IaC needs to have instance size as a host_var to facilitate scaling. -- T On 12/2/19, 13:15, "Kevin O'Donnell" wrote: Tim, Thanks for the information. We are undersized on the app/worker nodes, the cluster has 3 and they are m5.xlarge with only 16gb of ram. From what I have read each Labs engagement operated on a 3 node worker cluster with each node having 6core's and 28gb of ram. We will need to swap out the existing instances with larger spec's. 
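[Archive note] The sizing comparison in the message above can be worked through as simple arithmetic. The sketch below is illustrative only: the node counts, core counts, and RAM figures for the Labs reference and the m5a.8xlarge replacement come from this thread; the 4 vCPUs for m5.xlarge come from the published AWS instance specification rather than from the thread; and the assumption that the replacement pool still has 3 workers is mine. The `WorkerPool` class and `summarize` helper are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkerPool:
    """A group of identically sized worker nodes."""
    name: str
    nodes: int
    cores_per_node: int
    ram_gb_per_node: int

    @property
    def total_cores(self) -> int:
        return self.nodes * self.cores_per_node

    @property
    def total_ram_gb(self) -> int:
        return self.nodes * self.ram_gb_per_node


POOLS = [
    # 16 GB per node is from the thread; 4 vCPUs is the AWS spec for m5.xlarge.
    WorkerPool("current m5.xlarge", nodes=3, cores_per_node=4, ram_gb_per_node=16),
    # Per-engagement Labs sizing quoted in the thread.
    WorkerPool("Labs reference", nodes=3, cores_per_node=6, ram_gb_per_node=28),
    # Replacement type from the thread; keeping 3 workers is an assumption.
    WorkerPool("m5a.8xlarge", nodes=3, cores_per_node=32, ram_gb_per_node=128),
]


def summarize(pools):
    """Map each pool name to its (total cores, total RAM in GB)."""
    return {p.name: (p.total_cores, p.total_ram_gb) for p in pools}


if __name__ == "__main__":
    for name, (cores, ram) in summarize(POOLS).items():
        print(f"{name}: {cores} cores, {ram} GB RAM")
```

Under these assumptions the original worker pool offers less aggregate memory than a single Labs engagement was sized for, which is consistent with the decision to swap instance types.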
We are going to try to flush the existing workload out on one of the workers to see if we can swap them out one at a time. Thanks, KEVIN O'DONNELL ARCHITECT MANAGER Red Hat Red Hat NA Public Sector Consulting kodonnell at redhat.com M: 240-605-4654 On Mon, Dec 2, 2019 at 2:07 PM Miller, Timothy J. wrote: Here's what I can see, given the perm limits I seem to be under: - NS:develop-misp-app and NS:lp-develop-misp-app both have several sync jobs in CrashLoopBackoff b/c of an HTTP 401 error returned while trying to fetch something from somewhere (URL isn't recorded in the stack trace). - NS:minishift-misp-app has most of its pods/jobs stuck in ImagePullBackoff. No detail there in the event stream so I'll see if I can dig deeper. - NS:aam-ci-cd has Jenkins trying to spin up three workers, those are coming back as unschedulable. I can't see into NS:aam-bases or NS:dsop-images b/c of perm limits. I see no DAS-related project(s). The MISP stuff needs debugging before calling "blocked" since it looks like an internal error from this perspective. In re: AAM Jenkins: If this deployment is coming out of the OCP storefront, then maybe it should be ephemeral rather than persistent. If it's a custom deployment, then it probably needs a rethink. I'm also not sure why there are two MISP dev projects. -- T On 12/2/19, 12:46, "Kevin O'Donnell" wrote: Russell, Thank you for the information. We can switch out the instance type for the worker nodes. How much memory is required by the apps? Thanks, KEVIN O'DONNELL ARCHITECT MANAGER Red Hat Red Hat NA Public Sector Consulting kodonnell at redhat.com M: 240-605-4654 On Mon, Dec 2, 2019 at 1:32 PM Kendall, Russell C wrote: Kevin, The lack of resources on u-p.io cluster is hindering development, testing, and integration of the apps from CCAT AAM DAS, which is putting one of our PI goals at risk. We are blocked by the fact that we (CCAT and AAM) cannot deploy additional pods to the unified-platform.io cluster. 
We have a subset of containers deployed, but rolling deployments and new deployments fail. This means that we are not able to execute integration testing or peer reviews. We are temporarily working around by NOT testing/reviewing our code changes live, something that no one likes. Also, we are now running weeks-old instances of our containers, so we are very likely producing some technical debt. We currently have developers approaching idle or doing non-priority work until the resource issue is resolved. Here is the particular error from the OSP cluster I received while attempting a redeploy of one of our apps. 0/9 nodes are available: 1 node(s) had taints that the pod didn't tolerate, 2 Insufficient pods, 6 node(s) didn't match node selector.11 times in the last minute Since we do not have any cluster permissions, I cannot verify which resource is running out, but from experience, I assess it is a memory issue. It appears the cluster has been provisioned with a silly allocation of node types. Without knowing exactly what was deployed, it appears only 3 of the 9 hosts are suitable worker nodes. We would expect the cluster to respond to resource limitations and scale, but if a scheduled downtime is required, please work with us so we can anticipate. As it stands, the cluster does not support resources required by CCAT and the other dev teams (AAM, DAS, etc.). We would accept any downtime if it will improve the situation, as we are blocked from progressing under the current constraints. My hope was we could get the cluster redeployed over the TG holiday to eliminate developer impact, but as Mark pointed out, there were limited support folks available. Now I am just trying to minimize the losses. 
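[Archive note] The scheduler events quoted throughout this thread all share one shape: a "0/9 nodes are available:" header followed by comma-separated reasons, each with a node count. A minimal sketch of splitting such a message into per-reason counts follows; the function name `parse_scheduling_failure` is hypothetical, and the parser assumes reasons never contain commas. As general Kubernetes background, "Insufficient pods" normally refers to a node's pod-count capacity, while a memory shortfall is usually reported as "Insufficient memory", so the counts are worth reading before concluding which resource ran out.

```python
import re

# Header such as "0/9 nodes are available:"
_HEADER = re.compile(r"^\s*(\d+)/(\d+) nodes are available:\s*")
# One reason such as "2 Insufficient pods" or "6 node(s) didn't match node selector"
_REASON = re.compile(r"(\d+)\s+(.+)")


def parse_scheduling_failure(message: str):
    """Return (schedulable_nodes, total_nodes, {reason: node_count})."""
    header = _HEADER.match(message)
    if header is None:
        raise ValueError("not a 'nodes are available' scheduler message")
    schedulable, total = int(header.group(1)), int(header.group(2))

    reasons = {}
    for part in message[header.end():].split(","):
        match = _REASON.match(part.strip().rstrip("."))
        if match is None:
            continue  # skip fragments that do not start with a count
        # Drop the generic "node(s) " prefix so reasons read uniformly.
        reason = re.sub(r"^node\(s\)\s+", "", match.group(2))
        reasons[reason] = int(match.group(1))
    return schedulable, total, reasons


if __name__ == "__main__":
    event = (
        "0/9 nodes are available: 1 node(s) had taints that the pod didn't "
        "tolerate, 2 Insufficient pods, 6 node(s) didn't match node selector."
    )
    schedulable, total, reasons = parse_scheduling_failure(event)
    print(f"{schedulable}/{total} schedulable")
    for reason, count in reasons.items():
        print(f"  {count}: {reason}")
```

For the events in this thread the per-reason counts add up to the 9 nodes in the cluster, and the 6 nodes that "didn't match node selector" leave only 3 candidate workers, matching the observation above that only 3 of the 9 hosts are suitable worker nodes.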
V/R, Russell C Kendall ________________________________________ From: Kevin O'Donnell Sent: Monday, December 2, 2019 11:52 AM To: Kendall, Russell C Cc: Mark Nissley; ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP; Jonathan Rickard; Chris Kuperstein; BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP; Brenna Gordon; Bubb, Mike (mbubb at mitre.org); DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP; Miller, Timothy J.; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP Subject: Re: Unified Platform Pod Deploy Errors Hello Russell, Can you elaborate on the term Blocked? What specific issues are the blockers? Thanks, KEVIN O'DONNELL ARCHITECT MANAGER Red Hat Red Hat NA Public Sector Consulting kodonnell at redhat.com M: 240-605-4654 On Mon, Dec 2, 2019 at 11:11 AM Kendall, Russell C wrote: Mark, Thank for acknowledging, please be aware the San Antonio dev teams working in unified-platform.io are currently blocked. V/R, Russell C Kendall ________________________________________ From: Mark Nissley Sent: Monday, December 2, 2019 9:36 AM To: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP; Jonathan Rickard; Chris Kuperstein Cc: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP; Kevin O'Donnell; Brenna Gordon; Kendall, Russell C; Bubb, Mike (mbubb at mitre.org); DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP; Miller, Timothy J.; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP Subject: Re: Unified Platform Pod Deploy Errors As noted, I don't suspect much got done on this over the holiday weekend. I did see the ticket, as dropped some details into it. I also assigned it to @Jonathan Rickard and @Chris Kuperstein . It looks like short term solutions have been easy but the issue is recurring. 
Mark NISSLEY, PMP, CSM, LEAN PROGRAM MaNAGER & SR technical Project Manager North American Consulting, Public Sector M: 850-530-3234 Scheduled Training: October 14-18 On Mon, Dec 2, 2019 at 10:21 AM ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP wrote: Mark/Kevin, I just heard at the team stand up that we are still blocked. This is also affecting the AAM team from my investigations. Please let me know if there is something we need to do to move this forward. Most Sincerely, Ade Abodunrin, GG-12, USAF Product Owner (Cybertron & Ginyu Force), Unified Platform LevelUP Code Works Commercial: (210) 890-2113 NIPR email: ademola.abodunrin at us.af.mil ________________________________________ From: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP Sent: Wednesday, November 27, 2019 12:58 PM To: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP ; Mark Nissley ; Kevin O'Donnell ; Brenna Gordon Cc: Kendall, Russell C ; Bubb, Mike (mbubb at mitre.org) ; DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP ; Miller, Timothy J. ; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP Subject: Re: Unified Platform Pod Deploy Errors Thanks a lot Capt Bryan! Russell created the ticket on GitLab UP Node Project. Most Sincerely, Ade Abodunrin, GG-12, USAF Product Owner (Cybertron & Ginyu Force), Unified Platform LevelUP Code Works Commercial: (210) 890-2113 NIPR email: ademola.abodunrin at us.af.mil ________________________________________ From: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP Sent: Wednesday, November 27, 2019 12:56 PM To: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP ; Mark Nissley ; Kevin O'Donnell ; Brenna Gordon Cc: Kendall, Russell C ; Bubb, Mike (mbubb at mitre.org) ; DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP ; Miller, Timothy J. ; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP Subject: RE: Unified Platform Pod Deploy Errors Thanks Ade. The team is thin until next week due to the holidays but I will make sure it is addressed. 
Were there any issues submitted to Gitlab's UP Node Project on DCCSCR? @Mark/Kevin - can we address? -Austen From: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP Sent: Wednesday, November 27, 2019 9:51 AM To: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP Cc: Kendall, Russell C; Bubb, Mike (mbubb at mitre.org) Subject: Fw: Unified Platform Pod Deploy Errors Capt Bryan, Please see the explanation on the issue that Ginyu Force is currently experiencing below. Most Sincerely, Ade Abodunrin, GG-12, USAF Product Owner (Cybertron & Ginyu Force), Unified Platform LevelUP Code Works Commercial: (210) 890-2113 NIPR email: ademola.abodunrin at us.af.mil ________________________________________ From: Kendall, Russell C Sent: Wednesday, November 27, 2019 9:46 AM To: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP; Buffaloe, Christopher; Molina, Toby; Crace, Jared E; SANCHEZ, MARK GG-13 USAF AFMC AFLCMC/HNCP Cc: tmiller at mitre.org Subject: [Non-DoD Source] Fw: Unified Platform Pod Deploy Errors Gentlemen, The application development teams working in the new GovCloud OCP environment (unified-platform.io) are currently blocked in efforts to deploy new pods for testing, development, and UAT. Red Hat and RogueOne SMEs have been notified and have attempted some fixes starting on Monday 11/25, but at this point have not been able to provision resources sufficient to host CCAT and AAM. We have taken steps to minimize our footprint (eliminating demonstration environment, deleting developer namespaces), but this is not a sustainable approach, and has only resulted in moderate improvements in cluster performance. Our hope is the U-P.io cluster compute resources can be increased very soon, so that we may resume normal development activities. Our understanding is that such a scaling requires a complete redeployment of the cluster, which is unusual, but an acceptable loss to productivity.
If the cluster can be scaled up over the Thanksgiving holiday, the impact will be minimal to developers and cluster administrators, alike. We are currently collaborating on solutions on the following MatterMost channel behind the space camp VPN (link below), and via the email thread forwarded (further below). https://chat.spacecamp.ninja/levelup/channels/unified-platform-node Please keep me posted on developments and I will coordinate developer activities with any scheduled platform outages. V/R, Russell C Kendall ________________________________________ From: Curran, Daniel M Sent: Monday, November 25, 2019 2:47 PM To: Jonathan Rickard Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; dlystra at redhat.com ; Sison, Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher; Torres, Alexander; Crace, Jared E; Middleton, Joseph J Subject: Re: Unified Platform Pod Deploy Errors Sounds great. Appreciate it. I'll watch email and Mattermost in case you need more from us. -Daniel ________________________________________ From: Jonathan Rickard Sent: Monday, November 25, 2019 2:44 PM To: Curran, Daniel M Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; dlystra at redhat.com ; Sison, Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher; Torres, Alexander; Crace, Jared E; Middleton, Joseph J Subject: Re: Unified Platform Pod Deploy Errors Thanks Daniel - I'll continue to look into the resource issue that you're seeing - I'd like to identify the root cause and then work with the team to come up with a solution. Jonathan Rickard, RHCA Principal Consultant, NAPS Red Hat Remote - Texas jonny at redhat.com M: 210-862-9739 On Mon, Nov 25, 2019 at 2:17 PM Curran, Daniel M wrote: Yeah we hit the limit then had AAM kill some of their projects and then our pods got scheduled. We've hit the limit again though. 
Here's an example pod that cannot be scheduled https://cluster.unified-platform.io/console/project/ccat-dev/browse/pods/sso-8-2cjth They're seeing it when their jenkins slaves can't deploy but it's basically any pod after we hit some limit. -Daniel ________________________________________ From: Jonathan Rickard Sent: Monday, November 25, 2019 1:26 PM To: Curran, Daniel M Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; dlystra at redhat.com ; Sison, Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher; Torres, Alexander; Crace, Jared E; Middleton, Joseph J Subject: Re: Unified Platform Pod Deploy Errors Daniel, I can see that you have 3 mongo pods, 1 chatup and 1 upbot pod running ... is your app good to go? Looks like there was an issue with memory on 1 pod, then some node selector being mismatched - just what i could see in the events... Jonathan Rickard, RHCA Principal Consultant, NAPS Red Hat Remote - Texas jonny at redhat.com M: 210-862-9739 On Mon, Nov 25, 2019 at 12:50 PM Curran, Daniel M wrote: Also, AAM was having similar issues. Looks like they had a lot of namespaces and scaling down the pods on their deployments didn't help but actually deleting the namespaces did. We have pods scheduling now but I'm adding them and we'd still like to work through what resource limit we were hitting to avoid this in the future. -Daniel ________________________________________ From: Curran, Daniel M Sent: Monday, November 25, 2019 12:25 PM To: Jonathan Rickard Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; dlystra at redhat.com ; Sison, Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher; Torres, Alexander Subject: Re: Unified Platform Pod Deploy Errors Thanks, sir. Most important for us to get working is "ccat-demo" but it's also happening in "ccat-dev" and "ccat-ci-cd". 
-Daniel ________________________________________ From: Jonathan Rickard Sent: Monday, November 25, 2019 12:22 PM To: Curran, Daniel M Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; dlystra at redhat.com ; Sison, Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher; Torres, Alexander Subject: Re: Unified Platform Pod Deploy Errors What's the name of the project you're working in? I'm going to be back at my laptop in about 30 and will take a look when I get there. Is it just the Jenkins pods failing? On Mon, Nov 25, 2019, 12:20 PM Curran, Daniel M wrote: Adding Dean and Alex. Also, sitting in mattermost if anyone needs to get online and chat for more information. -Daniel ________________________________________ From: Curran, Daniel M Sent: Monday, November 25, 2019 12:07 PM To: jonny at redhat.com ; ckuperst at redhat.com ; Mark Nissley Cc: Sison, Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher Subject: Re: Unified Platform Pod Deploy Errors Adding Kupe and Mark. -Daniel ________________________________________ From: Curran, Daniel M Sent: Monday, November 25, 2019 11:43 AM To: jonny at redhat.com Cc: Sison, Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher Subject: Unified Platform Pod Deploy Errors Hey Jonny, We met briefly at SpaceCAMP a couple weeks ago when cluster.unified-platform.io was stood up. We've been trying to deploy some apps today and so far today we're getting errors on most (if not all) of our pods. 0/9 nodes are available: 3 Insufficient pods, 6 node(s) didn't match node selector. Is what we're seeing. We were thinking it was some volume types weren't correct but some of our pods don't even have volumes attached and still give us this error (i.e. Jenkins slaves or web frontends without persistent storage). Any idea what this could be? 
We're not running out of space on the nodes themselves are we? We have a demo scheduled for tomorrow at 9:30 AM CST and are hoping to get a demo env up for them today but this error came up unexpectedly. Also, we're here at 500 Navarro St. in San Antonio working through this in person is better/easier. Thanks, Daniel Curran ________________________________________ This e-mail and any attachments are intended only for the use of the addressee(s) named herein and may contain proprietary information. If you are not the intended recipient of this e-mail or believe that you received this email in error, please take immediate action to notify the sender of the apparent error by reply e-mail; permanently delete the e-mail and any attachments from your computer; and do not disseminate, distribute, use, or copy this message and any attachments. _______________________________________________ platformONE mailing list platformONE at redhat.com https://www.redhat.com/mailman/listinfo/platformone From tmiller at mitre.org Wed Dec 4 13:44:16 2019 From: tmiller at mitre.org (Miller, Timothy J.) Date: Wed, 4 Dec 2019 13:44:16 +0000 Subject: [Platformone] [EXT] Re: Riddle me this, Batman (odd things in up-prod) In-Reply-To: <31917_1575411307_5DE6DE6B_31917_556_1_CA+EGyABXovDYz0NacD6P3gs3Ufze1Qy_0eon40BP8VN-cP0X7w@mail.gmail.com> References: <49479C3E-4D9F-4C8C-AE04-585855858CCF@mitre.org> <31917_1575411307_5DE6DE6B_31917_556_1_CA+EGyABXovDYz0NacD6P3gs3Ufze1Qy_0eon40BP8VN-cP0X7w@mail.gmail.com> Message-ID: <222A25E0-4FB0-41E4-B573-3F69C72B760C@mitre.org> up-prod-ocp-bastion is in the IaC but is only accessible from up-ss-vpc's gitlab-ci-runner and from up-prod's /16 address space. So it's not much use as a bastion. That leaves "onetime" and up-prod-bastion. Neither is IaC. Which is the ACAS host? 
-- T On 12/3/19, 16:15, "Kevin O'Donnell" wrote: Bastion creation is iac, and the other ec2 that's running in prod is for acas and was created to scan and will be shutdown after the scans are done On Tue, Dec 3, 2019 at 3:34 PM Miller, Timothy J. wrote: - There are three bastion hosts (up-prod-bastion, up-prod-ocp-bastion, and "onetime"). Of these, I can find only up-prod-ocp-bastion in the IaC definition. Both up-prod-bastion and "onetime" look like they were built separately ("onetime" is baselined on CentOS--which is a giveaway--and up-prod-bastion is attached to the `bastion-ssh` security group--which AFAICT is also not part of the IaC). I recall someone (Dean?) telling me that there's no BH in the IaC, but that's not true (see consumers/up-node-infrastructure/environments/production/group_vars/all/ec2-instances.yml). - up-prod-openscap and up-prod-sso-server have a public IP but their inbound rules permit only traffic from the VPC subnets (10.40.0.0/16) and the up-ss-vpc gitlab-ci-runner instance. - up-prod-openscap is attached to the up-prod-ocp-nodes SG, which doesn't seem right. That opens a bunch of ports that probably don't matter to a scan host. - up-prod-sso-server has a public IP it doesn't need since traffic is handled by up-prod-sso-elb. FWIW, public IPs are assigned to up-prod-bastion, up-prod-openscap, up-prod-satellite, up-prod-sso-server, and "onetime". The bastion host and openscap kinda make sense, though you can jump to openscap from the BH. Damnfino what "onetime" is supposed to be. I'm not sure which of these or all of 'em should be turned into issues. Comments? -- T _______________________________________________ platformONE mailing list platformONE at redhat.com https://www.redhat.com/mailman/listinfo/platformone -- KEVIN O'DONNELL ARCHITECT MANAGER Red Hat Red Hat NA Public Sector Consulting kodonnell at redhat.com M: 240-605-4654 From tmiller at mitre.org Wed Dec 4 13:53:48 2019 From: tmiller at mitre.org (Miller, Timothy J.)
Date: Wed, 4 Dec 2019 13:53:48 +0000 Subject: [Platformone] [EXT] Re: Riddle me this, Batman (odd things in up-prod) In-Reply-To: <7098_1575412443_5DE6E2DB_7098_1003_1_CAPeU9HY-y6dX2ZqyDe_xk=xs9CxuAGcifQyT8nSgxo6wWZ0XPQ@mail.gmail.com> References: <49479C3E-4D9F-4C8C-AE04-585855858CCF@mitre.org> <7098_1575412443_5DE6E2DB_7098_1003_1_CAPeU9HY-y6dX2ZqyDe_xk=xs9CxuAGcifQyT8nSgxo6wWZ0XPQ@mail.gmail.com> Message-ID: Is that one up-prod-bastion? I'm putting an issue against platform-infrastructure. The bastion is broken in a couple ways: - inbound SG rule defaults to `{{ cidr }}` address space, which resolves out to the VPC addresses - it's in the private subnet (probably doesn't matter, but helps humans keep things straight) - no public IP. -- T On 12/3/19, 16:34, "Dean Lystra" wrote: One bastion host was created for the sole purpose of allowing access to the IdM CLI. This was done as a quick fix to get the users created and for administrative purposes. Access to IdM via web console or CLI is not available from the internet. onetime is a mystery to me.
From tmiller at mitre.org Wed Dec 4 14:34:48 2019 From: tmiller at mitre.org (Miller, Timothy J.) Date: Wed, 4 Dec 2019 14:34:48 +0000 Subject: [Platformone] Uneven scheduling? Message-ID: Since I still lack a cluster scope role binding of sufficient power I can't actually poke into the cluster APIs to investigate, but from AWS stats it looks like only one of three workers (i-0b5853ff5f6d7a2ad) is actually scheduling pods. i-0b5853ff5f6d7a2ad is clocking in @~20% CPU util, but i-0ee498e48327725ec is clocking @~5% and i-02fa1e07bc88e79c6 is clocking @~0%. I don't (yet) grok the internals of kube-scheduler, but is this a case of associated workloads being kept together to limit SDN utilization, or is it just normal for the scheduler to be this uneven?
-- T From Colleen.Feiglstok at ngc.com Wed Dec 4 15:05:24 2019 From: Colleen.Feiglstok at ngc.com (Feiglstok, Colleen M [US] (MS)) Date: Wed, 4 Dec 2019 15:05:24 +0000 Subject: [Platformone] [EXT] Re: Riddle me this, Batman (odd things in up-prod) In-Reply-To: References: <49479C3E-4D9F-4C8C-AE04-585855858CCF@mitre.org> <7098_1575412443_5DE6E2DB_7098_1003_1_CAPeU9HY-y6dX2ZqyDe_xk=xs9CxuAGcifQyT8nSgxo6wWZ0XPQ@mail.gmail.com> Message-ID: <71b7c7b0221a45189020af37a02537c2@XCGVAG22.northgrum.com> The onetime was stood up for security testing. Eric Blade is in the process of creating an AMI that we will use in the future that will have all security tools in one place. It is not part of the IAC.
From Carlos.Nunez2 at ngc.com Wed Dec 4 15:07:00 2019 From: Carlos.Nunez2 at ngc.com (Nunez, Carlos A [US] (MS) (Contr)) Date: Wed, 4 Dec 2019 15:07:00 +0000 Subject: [Platformone] [EXT] Re: Riddle me this, Batman (odd things in up-prod) In-Reply-To: References: <49479C3E-4D9F-4C8C-AE04-585855858CCF@mitre.org> <7098_1575412443_5DE6E2DB_7098_1003_1_CAPeU9HY-y6dX2ZqyDe_xk=xs9CxuAGcifQyT8nSgxo6wWZ0XPQ@mail.gmail.com> Message-ID: <98b636fa65e74df9a32cca3a37381c1c@XCGVAG21.northgrum.com> Bastion cannot be in a private subnet. That is the EC2 that admins/devs will connect to at port 22. If it is in a private subnet it has no way to be public facing and cannot have a public IP assigned to it. V/R Adrian Nuñez Senior Cloud Architect NG Email: carlos.nunez2 at ngc.com Bylight email: carlos.nunez at bylight.com Cell: (571)230-5289
From Russell.Kendall at mantech.com Wed Dec 4 15:12:01 2019 From: Russell.Kendall at mantech.com (Kendall, Russell C) Date: Wed, 4 Dec 2019 15:12:01 +0000 Subject: [Platformone] [EXT] Re: Unified Platform Pod Deploy Errors In-Reply-To: <406B92B0-26F9-48FD-B533-549CC36B4FB2@mitre.org> References: <1574703796843.94672@ManTech.com> <1574705271520.5593@ManTech.com> <1574705967080.55870@ManTech.com> <1574706319228.45425@ManTech.com> <1574707851362.16520@ManTech.com> <1574713055636.69810@ManTech.com> <1574714851887.51023@ManTech.com> <1574869597001.92851@ManTech.com> <1575303080504.65964@ManTech.com> <1575311511650.96981@ManTech.com> <6277_1575312368_5DE55BEF_6277_803_1_CA+EGyAB6=xB9Na5Wf-nWY52dwt1200Rk8uh6krw1CXojEaO1PQ@mail.gmail.com> <7E65F8DF-F96B-4657-82A2-4A95715FAC0B@mitre.org> <5A89B24C-3C36-4205-B083-CBFBF71ED122@mitre.org> <3052516616bb44caa350c3c6897bc122@DCCPRDEXCH02.ManTech.com> , <406B92B0-26F9-48FD-B533-549CC36B4FB2@mitre.org> Message-ID: <1575472320840.60379@ManTech.com> Jonny, This issue does not appear to be resolved. 3 nodes have taints. V/R, Russell C Kendall ________________________________________ From: Miller, Timothy J. Sent: Wednesday, December 4, 2019 7:02 AM To: Jonathan Rickard; Keegan Reap Cc: Bubb, Mike; platformONE at redhat.com; Kendall, Russell C; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP; ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP; Jonathan Rickard Subject: Re: [Platformone] [EXT] Re: Unified Platform Pod Deploy Errors Johnny-- Update the issue, if you would be so kind.
-- T On 12/3/19, 18:00, "platformone-bounces at redhat.com on behalf of Jonathan Rickard" wrote: Hey Guys - Sorry for taking so long - this has been completed. Please run your builds and let us know if you're having any problems. jonny Jonathan Rickard, RHCA Principal Consultant, NAPS Red Hat Remote - Texas jonny at redhat.com M: 210-862-9739 On Tue, Dec 3, 2019 at 3:47 PM Jonathan Rickard wrote: Russell / Team, We believe we've identified the issue with your application deploying. In order to rectify the issue I need to evacuate pods so you will probably see some hiccups while deploying. I will update when this is resolved. Thanks, jonny Jonathan Rickard, RHCA Principal Consultant, NAPS Red Hat Remote - Texas jonny at redhat.com M: 210-862-9739 On Tue, Dec 3, 2019 at 12:53 PM Keegan Reap wrote: Hey all, we have opened an issue below, that we believe to be the cause, we are currently investigating: https://dccscr.dsop.io/ginyu-force/up-iatt/issues/32 On Tue, Dec 3, 2019 at 11:27 AM Jonathan Rickard wrote: Russell, Getting more eyes on this @platformONE at redhat.com We'll keep you posted. jonny Jonathan Rickard, RHCA Principal Consultant, NAPS Red Hat Remote - Texas jonny at redhat.com M: 210-862-9739 On Tue, Dec 3, 2019 at 11:59 AM Kendall, Russell C wrote: Kevin, Unfortunately we are receiving deployment errors again. This is the event: 0/9 nodes are available: 1 node(s) had disk pressure, 2 node(s) had taints that the pod didn't tolerate, 6 node(s) didn't match node selector. This is the deployment: https://cluster.unified-platform.io/console/project/ccat-dev/browse/dc/driveup V/R, Russell C Kendall ________________________________________ From: Miller, Timothy J.
Sent: Monday, December 2, 2019 2:44:21 PM To: Kevin O'Donnell Cc: Kendall, Russell C; Mark Nissley; ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP; Jonathan Rickard; Chris Kuperstein; BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP; Brenna Gordon; Bubb, Mike; DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP Subject: Re: [EXT] Re: Unified Platform Pod Deploy Errors Tagged you on it. -- T On 12/2/19, 14:03, "Kevin O'Donnell" wrote: Hello, Autoscaling is on our future IAC roadmap. Tim, the additional ticket would be appreciated. We have swapped out the app/worker instances with m5a.8xlarge 32 cores, 128gb of ram. Please let us know if you have any other issues. Thanks, KEVIN O'DONNELL ARCHITECT MANAGER Red Hat Red Hat NA Public Sector Consulting kodonnell at redhat.com M: 240-605-4654 On Mon, Dec 2, 2019 at 2:29 PM Miller, Timothy J. wrote: I'll open an issue. IaC needs to have instance size as a host_var to facilitate scaling. -- T On 12/2/19, 13:15, "Kevin O'Donnell" wrote: Tim, Thanks for the information. We are undersized on the app/worker nodes, the cluster has 3 and they are m5.xlarge with only 16gb of ram. From what I have read each Labs engagement operated on a 3 node worker cluster with each node having 6core's and 28gb of ram. We will need to swap out the existing instances with larger spec's. We are going to try to flush the existing workload out on one of the workers to see if we can swap them out one at a time. Thanks, KEVIN O'DONNELL ARCHITECT MANAGER Red Hat Red Hat NA Public Sector Consulting kodonnell at redhat.com M: 240-605-4654 On Mon, Dec 2, 2019 at 2:07 PM Miller, Timothy J. wrote: Here's what I can see, given the perm limits I seem to be under: - NS:develop-misp-app and NS:lp-develop-misp-app both have several sync jobs in CrashLoopBackoff b/c of an HTTP 401 error returned while trying to fetch something from somewhere (URL isn't recorded in the stack trace). 
- NS:minishift-misp-app has most of its pods/jobs stuck in ImagePullBackoff. No detail there in the event stream so I'll see if I can dig deeper. - NS:aam-ci-cd has Jenkins trying to spin up three workers, those are coming back as unschedulable. I can't see into NS:aam-bases or NS:dsop-images b/c of perm limits. I see no DAS-related project(s). The MISP stuff needs debugging before calling "blocked" since it looks like an internal error from this perspective. In re: AAM Jenkins: If this deployment is coming out of the OCP storefront, then maybe it should be ephemeral rather than persistent. If it's a custom deployment, then it probably needs a rethink. I'm also not sure why there are two MISP dev projects. -- T On 12/2/19, 12:46, "Kevin O'Donnell" wrote: Russell, Thank you for the information. We can switch out the instance type for the worker nodes. How much memory is required by the apps? Thanks, KEVIN O'DONNELL ARCHITECT MANAGER Red Hat Red Hat NA Public Sector Consulting kodonnell at redhat.com M: 240-605-4654 On Mon, Dec 2, 2019 at 1:32 PM Kendall, Russell C wrote: Kevin, The lack of resources on u-p.io cluster is hindering development, testing, and integration of the apps from CCAT AAM DAS, which is putting one of our PI goals at risk. We are blocked by the fact that we (CCAT and AAM) cannot deploy additional pods to the unified-platform.io cluster. We have a subset of containers deployed, but rolling deployments and new deployments fail. This means that we are not able to execute integration testing or peer reviews. We are temporarily working around by NOT testing/reviewing our code changes live, something that no one likes. Also, we are now running weeks-old instances of our containers, so we are very likely producing some technical debt. We currently have developers approaching idle or doing non-priority work until the resource issue is resolved. Here is the particular error from the OSP cluster I received while attempting a redeploy of one of our apps. 
0/9 nodes are available: 1 node(s) had taints that the pod didn't tolerate, 2 Insufficient pods, 6 node(s) didn't match node selector. (11 times in the last minute) Since we do not have any cluster permissions, I cannot verify which resource is running out, but from experience, I assess it is a memory issue. It appears the cluster has been provisioned with a silly allocation of node types. Without knowing exactly what was deployed, it appears only 3 of the 9 hosts are suitable worker nodes. We would expect the cluster to respond to resource limitations and scale, but if a scheduled downtime is required, please work with us so we can anticipate. As it stands, the cluster does not support resources required by CCAT and the other dev teams (AAM, DAS, etc.). We would accept any downtime if it will improve the situation, as we are blocked from progressing under the current constraints. My hope was we could get the cluster redeployed over the TG holiday to eliminate developer impact, but as Mark pointed out, there were limited support folks available. Now I am just trying to minimize the losses. V/R, Russell C Kendall ________________________________________ From: Kevin O'Donnell Sent: Monday, December 2, 2019 11:52 AM To: Kendall, Russell C Cc: Mark Nissley; ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP; Jonathan Rickard; Chris Kuperstein; BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP; Brenna Gordon; Bubb, Mike (mbubb at mitre.org); DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP; Miller, Timothy J.; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP Subject: Re: Unified Platform Pod Deploy Errors Hello Russell, Can you elaborate on the term Blocked? What specific issues are the blockers?
Thanks, KEVIN O'DONNELL ARCHITECT MANAGER Red Hat Red Hat NA Public Sector Consulting kodonnell at redhat.com M: 240-605-4654 On Mon, Dec 2, 2019 at 11:11 AM Kendall, Russell C wrote: Mark, Thanks for acknowledging. Please be aware the San Antonio dev teams working in unified-platform.io are currently blocked. V/R, Russell C Kendall ________________________________________ From: Mark Nissley Sent: Monday, December 2, 2019 9:36 AM To: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP; Jonathan Rickard; Chris Kuperstein Cc: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP; Kevin O'Donnell; Brenna Gordon; Kendall, Russell C; Bubb, Mike (mbubb at mitre.org); DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP; Miller, Timothy J.; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP Subject: Re: Unified Platform Pod Deploy Errors As noted, I don't suspect much got done on this over the holiday weekend. I did see the ticket, and dropped some details into it. I also assigned it to @Jonathan Rickard and @Chris Kuperstein. It looks like short term solutions have been easy but the issue is recurring. Mark NISSLEY, PMP, CSM, LEAN PROGRAM MaNAGER & SR technical Project Manager North American Consulting, Public Sector M: 850-530-3234 Scheduled Training: October 14-18 On Mon, Dec 2, 2019 at 10:21 AM ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP wrote: Mark/Kevin, I just heard at the team stand up that we are still blocked. This is also affecting the AAM team from my investigations. Please let me know if there is something we need to do to move this forward.
Most Sincerely, Ade Abodunrin, GG-12, USAF Product Owner (Cybertron & Ginyu Force), Unified Platform LevelUP Code Works Commercial: (210) 890-2113 NIPR email: ademola.abodunrin at us.af.mil ________________________________________ From: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP Sent: Wednesday, November 27, 2019 12:58 PM To: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP; Mark Nissley; Kevin O'Donnell; Brenna Gordon Cc: Kendall, Russell C; Bubb, Mike (mbubb at mitre.org); DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP; Miller, Timothy J.; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP Subject: Re: Unified Platform Pod Deploy Errors Thanks a lot Capt Bryan! Russell created the ticket on GitLab UP Node Project. Most Sincerely, Ade Abodunrin, GG-12, USAF Product Owner (Cybertron & Ginyu Force), Unified Platform LevelUP Code Works Commercial: (210) 890-2113 NIPR email: ademola.abodunrin at us.af.mil ________________________________________ From: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP Sent: Wednesday, November 27, 2019 12:56 PM To: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP; Mark Nissley; Kevin O'Donnell; Brenna Gordon Cc: Kendall, Russell C; Bubb, Mike (mbubb at mitre.org); DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP; Miller, Timothy J.; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP Subject: RE: Unified Platform Pod Deploy Errors Thanks Ade. The team is thin until next week due to the holidays but I will make sure it is addressed. Were there any issues submitted to GitLab's UP Node Project on DCCSCR? @Mark/Kevin - can we address? -Austen From: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP Sent: Wednesday, November 27, 2019 9:51 AM To: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP Cc: Kendall, Russell C; Bubb, Mike (mbubb at mitre.org) Subject: Fw: Unified Platform Pod Deploy Errors Capt Bryan, Please see the explanation on the issue that Ginyu Force is currently experiencing below.
Most Sincerely, Ade Abodunrin, GG-12, USAF Product Owner (Cybertron & Ginyu Force), Unified Platform LevelUP Code Works Commercial: (210) 890-2113 NIPR email: ademola.abodunrin at us.af.mil ________________________________________ From: Kendall, Russell C Sent: Wednesday, November 27, 2019 9:46 AM To: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP ; Buffaloe, Christopher ; Molina, Toby ; Crace, Jared E ; SANCHEZ, MARK GG-13 USAF AFMC AFLCMC/HNCP Cc: tmiller at mitre.org Subject: [Non-DoD Source] Fw: Unified Platform Pod Deploy Errors Gentlemen, The application development teams working in the new GovCloud OCP environment (unified-platform.io ) are currently blocked in efforts to deploy new pods for testing, development, and UAT. Red Hat and RogueOne SMEs have been notified and have attempted some fixes starting on Monday 11/25, but at this point have not been able to provision resources sufficient to host CCAT and AAM. We have taken steps to minimize our footprint (eliminating demonstration environment, deleting developer namespaces), but this is not a sustainable approach, and has only resulted in moderate improvements in cluster performance. Our hope is the U-P.io cluster compute resources can be increased very soon, so that we may resume normal development activities. Our understanding is that such a scaling requires a complete redeployment of the cluster, which is unusual, but an acceptable loss to productivity. If the cluster can be scaled up over the Thanksgiving holiday, the impact will be minimal to developers and cluster administrators, alike. We are currently collaborating on solutions on the following MatterMost channel behind the space camp VPN (link below), and via the email thread forwarded (further below). https://chat.spacecamp.ninja/levelup/channels/unified-platform-node Please keep me posted on developments and I will coordinate developer activities with any scheduled platform outages. 
V/R, Russell C Kendall ________________________________________ From: Curran, Daniel M Sent: Monday, November 25, 2019 2:47 PM To: Jonathan Rickard Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; dlystra at redhat.com ; Sison, Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher; Torres, Alexander; Crace, Jared E; Middleton, Joseph J Subject: Re: Unified Platform Pod Deploy Errors Sounds great. Appreciate it. I'll watch email and Mattermost in case you need more from us. -Daniel ________________________________________ From: Jonathan Rickard Sent: Monday, November 25, 2019 2:44 PM To: Curran, Daniel M Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; dlystra at redhat.com ; Sison, Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher; Torres, Alexander; Crace, Jared E; Middleton, Joseph J Subject: Re: Unified Platform Pod Deploy Errors Thanks Daniel - I'll continue to look into the resource issue that you're seeing - I'd like to identify the root cause and then work with the team to come up with a solution. Jonathan Rickard, RHCA Principal Consultant, NAPS Red Hat Remote - Texas jonny at redhat.com M: 210-862-9739 On Mon, Nov 25, 2019 at 2:17 PM Curran, Daniel M wrote: Yeah we hit the limit then had AAM kill some of their projects and then our pods got scheduled. We've hit the limit again though. Here's an example pod that cannot be scheduled https://cluster.unified-platform.io/console/project/ccat-dev/browse/pods/sso-8-2cjth They're seeing it when their jenkins slaves can't deploy but it's basically any pod after we hit some limit. 
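The scheduler events quoted in this thread ("0/9 nodes are available: ...") list one reason per group of nodes, so breaking a message apart shows which limit is actually being hit. The following is a minimal Python sketch, not part of the original emails; the function name `parse_scheduling_event` is made up for illustration. It assumes the message follows the comma-separated shape seen in the thread. As far as I know, "Insufficient pods" refers to a node reaching its pod-count capacity rather than running out of memory (which the scheduler reports separately as "Insufficient memory"), but treat that reading as an assumption to confirm against the cluster.

```python
import re


def parse_scheduling_event(message):
    """Split a 'nodes are available' event into (available, total, reasons).

    `reasons` maps each reason string to the number of nodes it applies to.
    Hypothetical helper: it only understands the message shape quoted in
    this thread.
    """
    head = re.match(r"\s*(\d+)/(\d+) nodes are available:\s*(.*)", message, re.S)
    if head is None:
        raise ValueError("not a scheduler 'nodes are available' message")
    available, total = int(head.group(1)), int(head.group(2))
    reasons = {}
    for part in head.group(3).rstrip(". \n").split(", "):
        m = re.match(r"(\d+)\s+(.*)", part.strip())
        if m is None:
            continue  # skip fragments that do not start with a node count
        reason = m.group(2)
        if reason.startswith("node(s) "):
            reason = reason[len("node(s) "):]
        reasons[reason] = int(m.group(1))
    return available, total, reasons


event = ("0/9 nodes are available: 3 Insufficient pods, "
         "6 node(s) didn't match node selector.")
available, total, reasons = parse_scheduling_event(event)
for reason, count in reasons.items():
    print(f"{count} of {total} nodes: {reason}")
```

Read this way, the first event says that six nodes were never candidates (node selector mismatch) and the remaining three were rejected on pod count, which fits the observation that deleting namespaces helped.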
-Daniel ________________________________________ From: Jonathan Rickard Sent: Monday, November 25, 2019 1:26 PM To: Curran, Daniel M Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; dlystra at redhat.com ; Sison, Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher; Torres, Alexander; Crace, Jared E; Middleton, Joseph J Subject: Re: Unified Platform Pod Deploy Errors Daniel, I can see that you have 3 mongo pods, 1 chatup and 1 upbot pod running ... is your app good to go? Looks like there was an issue with memory on 1 pod, then some node selector being mismatched - just what i could see in the events... Jonathan Rickard, RHCA Principal Consultant, NAPS Red Hat Remote - Texas jonny at redhat.com M: 210-862-9739 On Mon, Nov 25, 2019 at 12:50 PM Curran, Daniel M wrote: Also, AAM was having similar issues. Looks like they had a lot of namespaces and scaling down the pods on their deployments didn't help but actually deleting the namespaces did. We have pods scheduling now but I'm adding them and we'd still like to work through what resource limit we were hitting to avoid this in the future. -Daniel ________________________________________ From: Curran, Daniel M Sent: Monday, November 25, 2019 12:25 PM To: Jonathan Rickard Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; dlystra at redhat.com ; Sison, Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher; Torres, Alexander Subject: Re: Unified Platform Pod Deploy Errors Thanks, sir. Most important for us to get working is "ccat-demo" but it's also happening in "ccat-dev" and "ccat-ci-cd". 
-Daniel ________________________________________ From: Jonathan Rickard Sent: Monday, November 25, 2019 12:22 PM To: Curran, Daniel M Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; dlystra at redhat.com ; Sison, Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher; Torres, Alexander Subject: Re: Unified Platform Pod Deploy Errors What's the name of the project you're working in? I'm going to be back at my laptop in about 30 and will take a look when I get there. Is it just the Jenkins pods failing? On Mon, Nov 25, 2019, 12:20 PM Curran, Daniel M wrote: Adding Dean and Alex. Also, sitting in mattermost if anyone needs to get online and chat for more information. -Daniel ________________________________________ From: Curran, Daniel M Sent: Monday, November 25, 2019 12:07 PM To: jonny at redhat.com ; ckuperst at redhat.com ; Mark Nissley Cc: Sison, Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher Subject: Re: Unified Platform Pod Deploy Errors Adding Kupe and Mark. -Daniel ________________________________________ From: Curran, Daniel M Sent: Monday, November 25, 2019 11:43 AM To: jonny at redhat.com Cc: Sison, Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher Subject: Unified Platform Pod Deploy Errors Hey Jonny, We met briefly at SpaceCAMP a couple weeks ago when cluster.unified-platform.io was stood up. We've been trying to deploy some apps today and so far today we're getting errors on most (if not all) of our pods. 0/9 nodes are available: 3 Insufficient pods, 6 node(s) didn't match node selector. Is what we're seeing. We were thinking it was some volume types weren't correct but some of our pods don't even have volumes attached and still give us this error (i.e. Jenkins slaves or web frontends without persistent storage). Any idea what this could be? 
We're not running out of space on the nodes themselves are we? We have a demo scheduled for tomorrow at 9:30 AM CST and are hoping to get a demo env up for them today but this error came up unexpectedly. Also, we're here at 500 Navarro St. in San Antonio working through this in person is better/easier. Thanks, Daniel Curran ________________________________________ This e-mail and any attachments are intended only for the use of the addressee(s) named herein and may contain proprietary information. If you are not the intended recipient of this e-mail or believe that you received this email in error, please take immediate action to notify the sender of the apparent error by reply e-mail; permanently delete the e-mail and any attachments from your computer; and do not disseminate, distribute, use, or copy this message and any attachments. _______________________________________________ platformONE mailing list platformONE at redhat.com https://www.redhat.com/mailman/listinfo/platformone From Russell.Kendall at mantech.com Wed Dec 4 15:16:08 2019 From: Russell.Kendall at mantech.com (Kendall, Russell C) Date: Wed, 4 Dec 2019 15:16:08 +0000 Subject: [Platformone] [EXT] Re: Unified Platform Pod Deploy Errors In-Reply-To: <406B92B0-26F9-48FD-B533-549CC36B4FB2@mitre.org> References: <1574703796843.94672@ManTech.com> <1574705271520.5593@ManTech.com> <1574705967080.55870@ManTech.com> <1574706319228.45425@ManTech.com> <1574707851362.16520@ManTech.com> <1574713055636.69810@ManTech.com> <1574714851887.51023@ManTech.com> <1574869597001.92851@ManTech.com> <1575303080504.65964@ManTech.com> <1575311511650.96981@ManTech.com> <6277_1575312368_5DE55BEF_6277_803_1_CA+EGyAB6=xB9Na5Wf-nWY52dwt1200Rk8uh6krw1CXojEaO1PQ@mail.gmail.com> <7E65F8DF-F96B-4657-82A2-4A95715FAC0B@mitre.org> <5A89B24C-3C36-4205-B083-CBFBF71ED122@mitre.org> <3052516616bb44caa350c3c6897bc122@DCCPRDEXCH02.ManTech.com> , <406B92B0-26F9-48FD-B533-549CC36B4FB2@mitre.org> Message-ID: <1575472568190.35402@ManTech.com> 
Jonny, I'd like to suggest you come to 500 to wrap this up, since it seems there are significant delays in communication that are contributing to downtime. V/R, Russell C Kendall ________________________________________ From: Miller, Timothy J. Sent: Wednesday, December 4, 2019 7:02 AM To: Jonathan Rickard; Keegan Reap Cc: Bubb, Mike; platformONE at redhat.com; Kendall, Russell C; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP; ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP; Jonathan Rickard Subject: Re: [Platformone] [EXT] Re: Unified Platform Pod Deploy Errors Johnny-- Update the issue, if you would be so kind. -- T On 12/3/19, 18:00, "platformone-bounces at redhat.com on behalf of Jonathan Rickard" wrote: Hey Guys - Sorry for taking so long - this has been completed. Please run your builds and let us know if you're having any problems. jonny Jonathan Rickard, RHCA Principal Consultant, NAPS Red Hat Remote - Texas jonny at redhat.com M: 210-862-9739 On Tue, Dec 3, 2019 at 3:47 PM Jonathan Rickard wrote: Russell / Team, We believe we've identified the issue with your application deploying. In order to rectify the issue I need to evacuate pods so you will probably see some hiccups while deploying. I will update when this is resolved. Thanks, jonny Jonathan Rickard, RHCA Principal Consultant, NAPS Red Hat Remote - Texas jonny at redhat.com M: 210-862-9739 On Tue, Dec 3, 2019 at 12:53 PM Keegan Reap wrote: Hey all, we have opened an issue below, that we believe to be the cause, we are currently investigating: https://dccscr.dsop.io/ginyu-force/up-iatt/issues/32 On Tue, Dec 3, 2019 at 11:27 AM Jonathan Rickard wrote: Russell, Getting more eyes on this @platformONE at redhat.com We'll keep you posted. jonny Jonathan Rickard, RHCA Principal Consultant, NAPS Red Hat Remote - Texas jonny at redhat.com M: 210-862-9739 On Tue, Dec 3, 2019 at 11:59 AM Kendall, Russell C wrote: Kevin, Unfortunately we are receiving deployment errors again.
This is the event: 0/9 nodes are available: 1 node(s) had disk pressure, 2 node(s) had taints that the pod didn't tolerate, 6 node(s) didn't match node selector. This is the deployment: https://cluster.unified-platform.io/console/project/ccat-dev/browse/dc/driveup V/R, Russell C Kendall ________________________________________ From: Miller, Timothy J. Sent: Monday, December 2, 2019 2:44:21 PM To: Kevin O'Donnell Cc: Kendall, Russell C; Mark Nissley; ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP; Jonathan Rickard; Chris Kuperstein; BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP; Brenna Gordon; Bubb, Mike; DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP Subject: Re: [EXT] Re: Unified Platform Pod Deploy Errors Tagged you on it. -- T On 12/2/19, 14:03, "Kevin O'Donnell" wrote: Hello, Autoscaling is on our future IAC roadmap. Tim, the additional ticket would be appreciated. We have swapped out the app/worker instances with m5a.8xlarge 32 cores, 128gb of ram. Please let us know if you have any other issues. Thanks, KEVIN O'DONNELL ARCHITECT MANAGER Red Hat Red Hat NA Public Sector Consulting kodonnell at redhat.com M: 240-605-4654 On Mon, Dec 2, 2019 at 2:29 PM Miller, Timothy J. wrote: I'll open an issue. IaC needs to have instance size as a host_var to facilitate scaling. -- T On 12/2/19, 13:15, "Kevin O'Donnell" wrote: Tim, Thanks for the information. We are undersized on the app/worker nodes, the cluster has 3 and they are m5.xlarge with only 16gb of ram. From what I have read each Labs engagement operated on a 3 node worker cluster with each node having 6core's and 28gb of ram. We will need to swap out the existing instances with larger spec's. We are going to try to flush the existing workload out on one of the workers to see if we can swap them out one at a time. 
Thanks, KEVIN O'DONNELL ARCHITECT MANAGER Red Hat Red Hat NA Public Sector Consulting kodonnell at redhat.com M: 240-605-4654 On Mon, Dec 2, 2019 at 2:07 PM Miller, Timothy J. wrote: Here's what I can see, given the perm limits I seem to be under: - NS:develop-misp-app and NS:lp-develop-misp-app both have several sync jobs in CrashLoopBackoff b/c of an HTTP 401 error returned while trying to fetch something from somewhere (URL isn't recorded in the stack trace). - NS:minishift-misp-app has most of its pods/jobs stuck in ImagePullBackoff. No detail there in the event stream so I'll see if I can dig deeper. - NS:aam-ci-cd has Jenkins trying to spin up three workers, those are coming back as unschedulable. I can't see into NS:aam-bases or NS:dsop-images b/c of perm limits. I see no DAS-related project(s). The MISP stuff needs debugging before calling "blocked" since it looks like an internal error from this perspective. In re: AAM Jenkins: If this deployment is coming out of the OCP storefront, then maybe it should be ephemeral rather than persistent. If it's a custom deployment, then it probably needs a rethink. I'm also not sure why there are two MISP dev projects. -- T On 12/2/19, 12:46, "Kevin O'Donnell" wrote: Russell, Thank you for the information. We can switch out the instance type for the worker nodes. How much memory is required by the apps? Thanks, KEVIN O'DONNELL ARCHITECT MANAGER Red Hat Red Hat NA Public Sector Consulting kodonnell at redhat.com M: 240-605-4654 On Mon, Dec 2, 2019 at 1:32 PM Kendall, Russell C wrote: Kevin, The lack of resources on u-p.io cluster is hindering development, testing, and integration of the apps from CCAT AAM DAS, which is putting one of our PI goals at risk. We are blocked by the fact that we (CCAT and AAM) cannot deploy additional pods to the unified-platform.io cluster. We have a subset of containers deployed, but rolling deployments and new deployments fail. 
This means that we are not able to execute integration testing or peer reviews. We are temporarily working around by NOT testing/reviewing our code changes live, something that no one likes. Also, we are now running weeks-old instances of our containers, so we are very likely producing some technical debt. We currently have developers approaching idle or doing non-priority work until the resource issue is resolved. Here is the particular error from the OSP cluster I received while attempting a redeploy of one of our apps. 0/9 nodes are available: 1 node(s) had taints that the pod didn't tolerate, 2 Insufficient pods, 6 node(s) didn't match node selector.11 times in the last minute Since we do not have any cluster permissions, I cannot verify which resource is running out, but from experience, I assess it is a memory issue. It appears the cluster has been provisioned with a silly allocation of node types. Without knowing exactly what was deployed, it appears only 3 of the 9 hosts are suitable worker nodes. We would expect the cluster to respond to resource limitations and scale, but if a scheduled downtime is required, please work with us so we can anticipate. As it stands, the cluster does not support resources required by CCAT and the other dev teams (AAM, DAS, etc.). We would accept any downtime if it will improve the situation, as we are blocked from progressing under the current constraints. My hope was we could get the cluster redeployed over the TG holiday to eliminate developer impact, but as Mark pointed out, there were limited support folks available. Now I am just trying to minimize the losses. 
V/R, Russell C Kendall ________________________________________ From: Kevin O'Donnell Sent: Monday, December 2, 2019 11:52 AM To: Kendall, Russell C Cc: Mark Nissley; ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP; Jonathan Rickard; Chris Kuperstein; BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP; Brenna Gordon; Bubb, Mike (mbubb at mitre.org); DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP; Miller, Timothy J.; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP Subject: Re: Unified Platform Pod Deploy Errors Hello Russell, Can you elaborate on the term Blocked? What specific issues are the blockers? Thanks, KEVIN O'DONNELL ARCHITECT MANAGER Red Hat Red Hat NA Public Sector Consulting kodonnell at redhat.com M: 240-605-4654 On Mon, Dec 2, 2019 at 11:11 AM Kendall, Russell C wrote: Mark, Thank for acknowledging, please be aware the San Antonio dev teams working in unified-platform.io are currently blocked. V/R, Russell C Kendall ________________________________________ From: Mark Nissley Sent: Monday, December 2, 2019 9:36 AM To: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP; Jonathan Rickard; Chris Kuperstein Cc: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP; Kevin O'Donnell; Brenna Gordon; Kendall, Russell C; Bubb, Mike (mbubb at mitre.org); DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP; Miller, Timothy J.; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP Subject: Re: Unified Platform Pod Deploy Errors As noted, I don't suspect much got done on this over the holiday weekend. I did see the ticket, as dropped some details into it. I also assigned it to @Jonathan Rickard and @Chris Kuperstein . It looks like short term solutions have been easy but the issue is recurring. 
Mark NISSLEY, PMP, CSM, LEAN PROGRAM MaNAGER & SR technical Project Manager North American Consulting, Public Sector M: 850-530-3234 Scheduled Training: October 14-18 On Mon, Dec 2, 2019 at 10:21 AM ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP wrote: Mark/Kevin, I just heard at the team stand up that we are still blocked. This is also affecting the AAM team from my investigations. Please let me know if there is something we need to do to move this forward. Most Sincerely, Ade Abodunrin, GG-12, USAF Product Owner (Cybertron & Ginyu Force), Unified Platform LevelUP Code Works Commercial: (210) 890-2113 NIPR email: ademola.abodunrin at us.af.mil ________________________________________ From: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP Sent: Wednesday, November 27, 2019 12:58 PM To: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP ; Mark Nissley ; Kevin O'Donnell ; Brenna Gordon Cc: Kendall, Russell C ; Bubb, Mike (mbubb at mitre.org) ; DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP ; Miller, Timothy J. ; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP Subject: Re: Unified Platform Pod Deploy Errors Thanks a lot Capt Bryan! Russell created the ticket on GitLab UP Node Project. Most Sincerely, Ade Abodunrin, GG-12, USAF Product Owner (Cybertron & Ginyu Force), Unified Platform LevelUP Code Works Commercial: (210) 890-2113 NIPR email: ademola.abodunrin at us.af.mil ________________________________________ From: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP Sent: Wednesday, November 27, 2019 12:56 PM To: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP ; Mark Nissley ; Kevin O'Donnell ; Brenna Gordon Cc: Kendall, Russell C ; Bubb, Mike (mbubb at mitre.org) ; DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP ; Miller, Timothy J. ; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP Subject: RE: Unified Platform Pod Deploy Errors Thanks Ade. The team is thin until next week due to the holidays but I will make sure it is addressed. 
Were there any issues submitted to Gitlab?s UP Node Project on DCCSCR? @Mark/Kevin ? can we address? -Austen From: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP Sent: Wednesday, November 27, 2019 9:51 AM To: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP Cc: Kendall, Russell C ; Bubb, Mike (mbubb at mitre.org) Subject: Fw: Unified Platform Pod Deploy Errors Capt Bryan, Please see the explanation on the issue that Ginyu Force is currently experiencing below. Most Sincerely, Ade Abodunrin, GG-12, USAF Product Owner (Cybertron & Ginyu Force), Unified Platform LevelUP Code Works Commercial: (210) 890-2113 NIPR email: ademola.abodunrin at us.af.mil ________________________________________ From: Kendall, Russell C Sent: Wednesday, November 27, 2019 9:46 AM To: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP ; Buffaloe, Christopher ; Molina, Toby ; Crace, Jared E ; SANCHEZ, MARK GG-13 USAF AFMC AFLCMC/HNCP Cc: tmiller at mitre.org Subject: [Non-DoD Source] Fw: Unified Platform Pod Deploy Errors Gentlemen, The application development teams working in the new GovCloud OCP environment (unified-platform.io ) are currently blocked in efforts to deploy new pods for testing, development, and UAT. Red Hat and RogueOne SMEs have been notified and have attempted some fixes starting on Monday 11/25, but at this point have not been able to provision resources sufficient to host CCAT and AAM. We have taken steps to minimize our footprint (eliminating demonstration environment, deleting developer namespaces), but this is not a sustainable approach, and has only resulted in moderate improvements in cluster performance. Our hope is the U-P.io cluster compute resources can be increased very soon, so that we may resume normal development activities. Our understanding is that such a scaling requires a complete redeployment of the cluster, which is unusual, but an acceptable loss to productivity. 
If the cluster can be scaled up over the Thanksgiving holiday, the impact will be minimal to developers and cluster administrators, alike. We are currently collaborating on solutions on the following MatterMost channel behind the space camp VPN (link below), and via the email thread forwarded (further below). https://chat.spacecamp.ninja/levelup/channels/unified-platform-node Please keep me posted on developments and I will coordinate developer activities with any scheduled platform outages. V/R, Russell C Kendall ________________________________________ From: Curran, Daniel M Sent: Monday, November 25, 2019 2:47 PM To: Jonathan Rickard Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; dlystra at redhat.com ; Sison, Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher; Torres, Alexander; Crace, Jared E; Middleton, Joseph J Subject: Re: Unified Platform Pod Deploy Errors Sounds great. Appreciate it. I'll watch email and Mattermost in case you need more from us. -Daniel ________________________________________ From: Jonathan Rickard Sent: Monday, November 25, 2019 2:44 PM To: Curran, Daniel M Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; dlystra at redhat.com ; Sison, Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher; Torres, Alexander; Crace, Jared E; Middleton, Joseph J Subject: Re: Unified Platform Pod Deploy Errors Thanks Daniel - I'll continue to look into the resource issue that you're seeing - I'd like to identify the root cause and then work with the team to come up with a solution. Jonathan Rickard, RHCA Principal Consultant, NAPS Red Hat Remote - Texas jonny at redhat.com M: 210-862-9739 On Mon, Nov 25, 2019 at 2:17 PM Curran, Daniel M wrote: Yeah we hit the limit then had AAM kill some of their projects and then our pods got scheduled. We've hit the limit again though. 
Here's an example pod that cannot be scheduled https://cluster.unified-platform.io/console/project/ccat-dev/browse/pods/sso-8-2cjth They're seeing it when their jenkins slaves can't deploy but it's basically any pod after we hit some limit. -Daniel ________________________________________ From: Jonathan Rickard Sent: Monday, November 25, 2019 1:26 PM To: Curran, Daniel M Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; dlystra at redhat.com ; Sison, Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher; Torres, Alexander; Crace, Jared E; Middleton, Joseph J Subject: Re: Unified Platform Pod Deploy Errors Daniel, I can see that you have 3 mongo pods, 1 chatup and 1 upbot pod running ... is your app good to go? Looks like there was an issue with memory on 1 pod, then some node selector being mismatched - just what i could see in the events... Jonathan Rickard, RHCA Principal Consultant, NAPS Red Hat Remote - Texas jonny at redhat.com M: 210-862-9739 On Mon, Nov 25, 2019 at 12:50 PM Curran, Daniel M wrote: Also, AAM was having similar issues. Looks like they had a lot of namespaces and scaling down the pods on their deployments didn't help but actually deleting the namespaces did. We have pods scheduling now but I'm adding them and we'd still like to work through what resource limit we were hitting to avoid this in the future. -Daniel ________________________________________ From: Curran, Daniel M Sent: Monday, November 25, 2019 12:25 PM To: Jonathan Rickard Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; dlystra at redhat.com ; Sison, Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher; Torres, Alexander Subject: Re: Unified Platform Pod Deploy Errors Thanks, sir. Most important for us to get working is "ccat-demo" but it's also happening in "ccat-dev" and "ccat-ci-cd". 
-Daniel ________________________________________ From: Jonathan Rickard Sent: Monday, November 25, 2019 12:22 PM To: Curran, Daniel M Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; dlystra at redhat.com ; Sison, Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher; Torres, Alexander Subject: Re: Unified Platform Pod Deploy Errors What's the name of the project you're working in? I'm going to be back at my laptop in about 30 and will take a look when I get there. Is it just the Jenkins pods failing? On Mon, Nov 25, 2019, 12:20 PM Curran, Daniel M wrote: Adding Dean and Alex. Also, sitting in mattermost if anyone needs to get online and chat for more information. -Daniel ________________________________________ From: Curran, Daniel M Sent: Monday, November 25, 2019 12:07 PM To: jonny at redhat.com ; ckuperst at redhat.com ; Mark Nissley Cc: Sison, Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher Subject: Re: Unified Platform Pod Deploy Errors Adding Kupe and Mark. -Daniel ________________________________________ From: Curran, Daniel M Sent: Monday, November 25, 2019 11:43 AM To: jonny at redhat.com Cc: Sison, Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher Subject: Unified Platform Pod Deploy Errors Hey Jonny, We met briefly at SpaceCAMP a couple weeks ago when cluster.unified-platform.io was stood up. We've been trying to deploy some apps today and so far today we're getting errors on most (if not all) of our pods. 0/9 nodes are available: 3 Insufficient pods, 6 node(s) didn't match node selector. Is what we're seeing. We were thinking it was some volume types weren't correct but some of our pods don't even have volumes attached and still give us this error (i.e. Jenkins slaves or web frontends without persistent storage). Any idea what this could be? 
We're not running out of space on the nodes themselves, are we? We have a demo scheduled for tomorrow at 9:30 AM CST and are hoping to get a demo env up for them today but this error came up unexpectedly. Also, we're here at 500 Navarro St. in San Antonio if working through this in person is better/easier. Thanks, Daniel Curran ________________________________________ This e-mail and any attachments are intended only for the use of the addressee(s) named herein and may contain proprietary information. If you are not the intended recipient of this e-mail or believe that you received this email in error, please take immediate action to notify the sender of the apparent error by reply e-mail; permanently delete the e-mail and any attachments from your computer; and do not disseminate, distribute, use, or copy this message and any attachments. _______________________________________________ platformONE mailing list platformONE at redhat.com https://www.redhat.com/mailman/listinfo/platformone From tmiller at mitre.org Wed Dec 4 15:28:30 2019 From: tmiller at mitre.org (Miller, Timothy J.) Date: Wed, 4 Dec 2019 15:28:30 +0000 Subject: [Platformone] [EXT] Re: Riddle me this, Batman (odd things in up-prod) In-Reply-To: <98b636fa65e74df9a32cca3a37381c1c@XCGVAG21.northgrum.com> References: <49479C3E-4D9F-4C8C-AE04-585855858CCF@mitre.org> <7098_1575412443_5DE6E2DB_7098_1003_1_CAPeU9HY-y6dX2ZqyDe_xk=xs9CxuAGcifQyT8nSgxo6wWZ0XPQ@mail.gmail.com> <98b636fa65e74df9a32cca3a37381c1c@XCGVAG21.northgrum.com> Message-ID: That's great, once a VPN into the VPC is active. In the meantime...? -- T On 12/4/19, 09:07, "Nunez, Carlos A [US] (MS) (Contr)" wrote: Bastion cannot be in a private subnet. That is the EC2 that admins/devs will connect to at port 22. If it is in a private subnet it has no way to be public facing and cannot have a public IP assigned to it.
V/R Adrian Nuñez Senior Cloud Architect NG Email: carlos.nunez2 at ngc.com Bylight email: carlos.nunez at bylight.com Cell: (571)230-5289 -----Original Message----- From: Miller, Timothy J. Sent: Wednesday, December 4, 2019 8:54 AM To: Dean Lystra ; Kevin O'Donnell Cc: platformONE at redhat.com; Feiglstok, Colleen M [US] (MS) ; Mathew Huston ; Nunez, Carlos A [US] (MS) (Contr) Subject: EXT :Re: [EXT] Re: [Platformone] Riddle me this, Batman (odd things in up-prod) Is that one up-prod-bastion? I'm putting an issue against platform-infrastructure. The bastion is broken in a couple ways: - inbound SG rule defaults to `{{ cidr }}` address space, which resolves out to the VPC addresses - it's in the private subnet (probably doesn't matter, but helps humans keep things straight) - no public IP. -- T On 12/3/19, 16:34, "Dean Lystra" wrote: One bastion host was created for the sole purpose of allowing access to the IdM CLI. This was done as a quick fix to get the users created and for administrative purposes. Access to IdM via web console or CLI is not available from the internet. onetime is a mystery to me. On Tue, Dec 3, 2019, 2:15 PM Kevin O'Donnell wrote: Bastion creation is iac, and the other ec2 that's running in prod is for acas and was created to scan and will be shutdown after the scans are done On Tue, Dec 3, 2019 at 3:34 PM Miller, Timothy J. wrote: - There are three bastion hosts (up-prod-bastion, up-prod-ocp-bastion, and "onetime"). Of these, I can find only up-prod-ocp-bastion in the IaC definition. Both up-prod-bastion and "onetime" look like they were built separately ("onetime" is baselined on CentOS--which is a giveaway--and up-prod-bastion is attached to the `bastion-ssh` security group--which AFAICT is also not part of the IaC). I recall someone (Dean?) telling me that there's no BH in the IaC, but that's not true (see consumers/up-node-infrastructure/environments/production/group_vars/all/ec2-instances.yml).
- up-prod-openscap and up-prod-sso-server have a public IP but its inbound rules permit only traffic from the VPC subnets (10.40.0.0/16 ) and the up-ss-vpc gitlab-ci-runner instance. - up-prod-openscap is attached to the up-prod-ocp-nodes SG, which is doesn't seem right. That opens a bunch of ports that probably don't matter to a scan host. - up-prod-sso-server has a public IP it doesn't need since traffic is handled by up-prod-sso-elb. FWIW, public IPs are assigned to up-prod-bastion, up-prod-openscap, up-prod-satellite, up-prod-sso-server, and "onetime". The bastion host and openscap kinda make sense, though you can jump to openscap from the BH. Damnfino what "onetime" is supposed to be. I'm not sure which of these or all of 'em should be turned into issues. Comments? -- T _______________________________________________ platformONE mailing list platformONE at redhat.com https://www.redhat.com/mailman/listinfo/platformone -- KEVIN O'DONNELL ARCHITECT MANAGER Red Hat Red Hat NA Public Sector Consulting kodonnell at redhat.com M: 240-605-4654 _______________________________________________ platformONE mailing list platformONE at redhat.com https://www.redhat.com/mailman/listinfo/platformone
From Carlos.Nunez2 at ngc.com Wed Dec 4 15:33:14 2019 From: Carlos.Nunez2 at ngc.com (Nunez, Carlos A [US] (MS) (Contr)) Date: Wed, 4 Dec 2019 15:33:14 +0000 Subject: [Platformone] [EXT] Re: Riddle me this, Batman (odd things in up-prod) In-Reply-To: References: <49479C3E-4D9F-4C8C-AE04-585855858CCF@mitre.org> <7098_1575412443_5DE6E2DB_7098_1003_1_CAPeU9HY-y6dX2ZqyDe_xk=xs9CxuAGcifQyT8nSgxo6wWZ0XPQ@mail.gmail.com> <98b636fa65e74df9a32cca3a37381c1c@XCGVAG21.northgrum.com> Message-ID: <682371889abb42abaff12c59764e6798@XCGVAG21.northgrum.com> AWS will not allow a public IP in a private subnet. No cloud provider does. The NAT is what translates Private Subnet resources to be connected to Public Subnet resources. V/R Adrian
From Eric.Blade at ngc.com Wed Dec 4 15:37:28 2019 From: Eric.Blade at ngc.com (Blade, Eric D [US] (MS)) Date: Wed, 4 Dec 2019 15:37:28 +0000 Subject: [Platformone] [EXT] Re: Riddle me this, Batman (odd things in up-prod) In-Reply-To: <71b7c7b0221a45189020af37a02537c2@XCGVAG22.northgrum.com> References: <49479C3E-4D9F-4C8C-AE04-585855858CCF@mitre.org> <7098_1575412443_5DE6E2DB_7098_1003_1_CAPeU9HY-y6dX2ZqyDe_xk=xs9CxuAGcifQyT8nSgxo6wWZ0XPQ@mail.gmail.com> <71b7c7b0221a45189020af37a02537c2@XCGVAG22.northgrum.com> Message-ID: The Onetime can be destroyed if Colleen is done testing with it. Eric -----Original Message----- From: Feiglstok, Colleen M [US] (MS) Sent: Wednesday, December 04, 2019 10:05 AM To: Miller, Timothy J. ; Dean Lystra ; Kevin O'Donnell Cc: platformONE at redhat.com; Mathew Huston ; Nunez, Carlos A [US] (MS) (Contr) ; Blade, Eric D [US] (MS) Subject: RE: [EXT] Re: [Platformone] Riddle me this, Batman (odd things in up-prod) The onetime was stood up for security testing. Eric Blade is in the process of creating an AMI that we will use in the future that will have all security tools in one place. It is not part of the IAC.
From dlystra at redhat.com Wed Dec 4 16:09:15 2019 From: dlystra at redhat.com (Dean Lystra) Date: Wed, 4 Dec 2019 08:09:15 -0800 Subject: [Platformone] [EXT] Re: Riddle me this, Batman (odd things in up-prod) In-Reply-To: References: <49479C3E-4D9F-4C8C-AE04-585855858CCF@mitre.org> <7098_1575412443_5DE6E2DB_7098_1003_1_CAPeU9HY-y6dX2ZqyDe_xk=xs9CxuAGcifQyT8nSgxo6wWZ0XPQ@mail.gmail.com> <71b7c7b0221a45189020af37a02537c2@XCGVAG22.northgrum.com> Message-ID: The OCP bastion instance is used as part of the OCP installation process. This is where the openshift installation playbooks are run. It is not meant to be a jump box. The prod bastion instance was configured with an EIP in the public subnet for external access to the IdM CLI, but was manually deployed. -------------- next part -------------- An HTML attachment was scrubbed... URL:
From tmiller at mitre.org Wed Dec 4 16:20:14 2019 From: tmiller at mitre.org (Miller, Timothy J.)
Date: Wed, 4 Dec 2019 16:20:14 +0000 Subject: [Platformone] [EXT] Re: Riddle me this, Batman (odd things in up-prod) In-Reply-To: <682371889abb42abaff12c59764e6798@XCGVAG21.northgrum.com> References: <49479C3E-4D9F-4C8C-AE04-585855858CCF@mitre.org> <7098_1575412443_5DE6E2DB_7098_1003_1_CAPeU9HY-y6dX2ZqyDe_xk=xs9CxuAGcifQyT8nSgxo6wWZ0XPQ@mail.gmail.com> <98b636fa65e74df9a32cca3a37381c1c@XCGVAG21.northgrum.com> <682371889abb42abaff12c59764e6798@XCGVAG21.northgrum.com> Message-ID: <6C0974A0-0436-4D05-9F6A-8FB5B21FECF8@mitre.org> I wouldn't expect a public IP into a private subnet, and I'm wouldn't be comfy with port forwarding into it either. I sense a broken concept of operations here. As currently defined in IaC, ocp-bastion is useless or redundant: - It's in the private subnet and unreachable from any admin console by the current SG anyway. A VPN is required to reach it. - OTOH, if we have a VPN, we can whitelist admin consoles via the VPN in the default SG. This would save a little $. - In C1, ocp-bastion is duplicative with the C1 provided bastion. We can whitelist the C1 bastion in the default SG and again save some $. -- T
From dlystra at redhat.com Wed Dec 4 16:29:18 2019 From: dlystra at redhat.com (Dean Lystra) Date: Wed, 4 Dec 2019 08:29:18 -0800 Subject: [Platformone] [EXT] Re: Riddle me this, Batman (odd things in up-prod) In-Reply-To: <6C0974A0-0436-4D05-9F6A-8FB5B21FECF8@mitre.org> References: <49479C3E-4D9F-4C8C-AE04-585855858CCF@mitre.org> <7098_1575412443_5DE6E2DB_7098_1003_1_CAPeU9HY-y6dX2ZqyDe_xk=xs9CxuAGcifQyT8nSgxo6wWZ0XPQ@mail.gmail.com> <98b636fa65e74df9a32cca3a37381c1c@XCGVAG21.northgrum.com> <682371889abb42abaff12c59764e6798@XCGVAG21.northgrum.com> <6C0974A0-0436-4D05-9F6A-8FB5B21FECF8@mitre.org> Message-ID: IMO a VPN
solution would be the most preferred option. This would be up to the decision makers to go this route. We used to have OpenVPN in our VPCs, but I am unsure why we no longer use it. -------------- next part -------------- An HTML attachment was scrubbed...
URL: From Jay.Pascal at diligent-us.com Wed Dec 4 16:36:51 2019 From: Jay.Pascal at diligent-us.com (Pascal, Jay) Date: Wed, 4 Dec 2019 16:36:51 +0000 Subject: [Platformone] [PossibleSpam] Re: Rogue One IATT Actions In-Reply-To: References: , Message-ID: <1575477411616.89460@diligent-us.com> I was able to get the scan results for Twistlock. I have also been able to log in to Anchore. However, when I attempt to analyze a repository or tag, Anchore does not appear able to locate it. I have tried various values. I am attempting to analyze the following image: docker-registry-default.apps.cluster.unified-platform.io/aam-ci-cd/develop-misp-app-web:latest I have tried docker-registry.default.svc as the registry and aam-ci-cd as the repository. I have tried both the analyze repository and analyze tag options. Using docker-registry.default.svc seems to attempt to pull from the registry, but does not find the image. It may be that the Anchore user does not have privileges to pull from the registry/repository. I don't know which user/service account Anchore uses to pull from the registry, so I can't grant it privileges. v/r, Jay L Pascal Senior Systems Engineer DILIGENT Consulting Inc. (O) 210.826.9300 (C) 210.827.5323 A Service Disabled Veteran Owned Small Business CMMI-DEV Maturity Level 3 ISO 9001:2015 certified ________________________________ From: Keegan Reap Sent: Tuesday, December 3, 2019 9:33 AM To: Mark Nissley Cc: Jonathan Hultz; Bubb, Mike; SANCHEZ, MARK GG-13 USAF AFMC AFLCMC/HNCP; Pascal, Jay; platformONE at redhat.com Subject: [PossibleSpam] Re: [Platformone] Rogue One IATT Actions With the added people to the thread, I will go ahead and reiterate these points just in case, for full transparency: Hey Mark & all, As far as the first objective, Twistlock has been deployed to the environment and is ready to start scanning, link below. The Twistlock app is locked behind an admin account, so we will need a POC to share the admin account with.
As far as Anchore goes, we have it deployed, but something seems to be preventing it from coming up successfully; we are investigating. https://cluster.unified-platform.io/console/project/levelup-twistlock/overview https://levelup-twistlock.apps.cluster.unified-platform.io/ Thanks, Keegan Reap On Tue, Dec 3, 2019 at 9:29 AM Mark Nissley wrote: Adding some of the UP Nodes Team to this thread. Mike, in a separate thread you noted that you were having trouble with Twistlock. Could you send a name and email address for someone on your team that we can grant access to? That may be you... Mark Nissley, PMP, CSM, LEAN Program Manager & Sr. Technical Project Manager North American Consulting, Public Sector M: 850-530-3234 Scheduled Training: October 14-18 On Tue, Dec 3, 2019 at 10:25 AM Jonathan Hultz wrote: Mark, Here are the results of the initial STIG run against one of the UP Node EC2 instances. https://dccscr.dsop.io/levelup-automation/security/rhel7-stig/issues/1 There are several Cat 1s and 2s that are not implemented, and the reasoning is in the ticket. Corey is currently working on the Sat role, which will also need several STIGs disabled to run correctly. We are currently waiting for Colleen to rescan the UP Prod host with the STIGs applied. Cheers, Jon On Tue, Dec 3, 2019 at 10:07 AM Mark Nissley wrote: I am on a call with UP Node aka Rogue One. They are getting ready for IATT. Here are the actions that they asked of our team, due COB today: 1. They asked if we can utilize Anchore and/or Twistlock to scan their apps and provide a report. They will be glad to do it as well if we want to make the containers available, but they emphasized that the shortest course of action is the best. 2. 
A plan of action for all High and Critical items scan results from Colleen's scan (if hardening scripts will be needed, they must be delivered IATT, 20 December) As this is the highest urgency task on our list right now, we need to be able to assign these tasks to specific people and knock them out. The deadline is COB today on both items. Who can work with me to make these things happen? Mark NISSLEY, PMP, CSM, LEAN PROGRAM MaNAGER & SR technical Project Manager North American Consulting, Public Sector M: 850-530-3234 [https://marketing-outfit-prod-images.s3-us-west-2.amazonaws.com/f5445ae0c9ddafd5b2f1836854d7416a/Logo-RedHat-Email.png] Scheduled Training: October 14-18 _______________________________________________ platformONE mailing list platformONE at redhat.com https://www.redhat.com/mailman/listinfo/platformone -- JONATHAN HULTZ, RHCSA SENIOR CONSULTANT Red Hat Remote US CA jhultz at redhat.com M: 609-713-9778 [https://www.redhat.com/files/brand/email/sig-redhat.png] _______________________________________________ platformONE mailing list platformONE at redhat.com https://www.redhat.com/mailman/listinfo/platformone Confidentiality Notice: This e-mail may contain confidential and privileged material for the sole use of the intended recipient(s). Any review, use, distribution or disclosure by others is strictly prohibited. If you are not the intended recipient (or authorized to receive for the recipient) please contact the sender by reply e-mail and delete all copies of this message. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jhultz at redhat.com Wed Dec 4 16:43:23 2019 From: jhultz at redhat.com (Jonathan Hultz) Date: Wed, 4 Dec 2019 11:43:23 -0500 Subject: [Platformone] [EXT] Re: Riddle me this, Batman (odd things in up-prod) In-Reply-To: References: <49479C3E-4D9F-4C8C-AE04-585855858CCF@mitre.org> <7098_1575412443_5DE6E2DB_7098_1003_1_CAPeU9HY-y6dX2ZqyDe_xk=xs9CxuAGcifQyT8nSgxo6wWZ0XPQ@mail.gmail.com> <98b636fa65e74df9a32cca3a37381c1c@XCGVAG21.northgrum.com> <682371889abb42abaff12c59764e6798@XCGVAG21.northgrum.com> <6C0974A0-0436-4D05-9F6A-8FB5B21FECF8@mitre.org> Message-ID: Tim, The OCP_Bastion is necessary to deploy the OCP 3.11 cluster. It holds a specific version of Ansible and the inventory file that OpenShift deploys from. The CI-Runner triggers the build, but it's fully automated and should have zero hands-on-keyboard access. On Wed, Dec 4, 2019 at 11:29 AM Dean Lystra wrote: > IMO a VPN solution would be the most preferred option. This would be up to > the decision makers to go this route. We used to have OpenVPN in our VPCs, > but I am unsure why we no longer use it. > > On Wed, Dec 4, 2019, 8:21 AM Miller, Timothy J. wrote: > >> I wouldn't expect a public IP into a private subnet, and I wouldn't be >> comfy with port forwarding into it either. >> >> I sense a broken concept of operations here. As currently defined in >> IaC, ocp-bastion is useless or redundant: >> >> - It's in the private subnet and unreachable from any admin console by >> the current SG anyway. A VPN is required to reach it. >> >> - OTOH, if we have a VPN, we can whitelist admin consoles via the VPN in >> the default SG. This would save a little $. >> >> - In C1, ocp-bastion is duplicative with the C1 provided bastion. We can >> whitelist the C1 bastion in the default SG and again save some $. >> >> -- T >> >> On 12/4/19, 09:33, "Nunez, Carlos A [US] (MS) (Contr)" < >> Carlos.Nunez2 at ngc.com> wrote: >> >> AWS will not allow a public IP in a private subnet. No cloud provider >> does. 
The NAT is what translates Private Subnet resources to be connected >> to Public Subnet resources. >> >> V/R >> Adrian >> >> -----Original Message----- >> From: Miller, Timothy J. >> Sent: Wednesday, December 4, 2019 10:29 AM >> To: Nunez, Carlos A [US] (MS) (Contr) ; Dean >> Lystra ; Kevin O'Donnell >> Cc: platformONE at redhat.com; Feiglstok, Colleen M [US] (MS) < >> Colleen.Feiglstok at ngc.com>; Mathew Huston >> Subject: EXT :Re: [EXT] Re: [Platformone] Riddle me this, Batman (odd >> things in up-prod) >> >> That's great, once a VPN into the VPC is active. >> >> In the meantime...? >> >> -- T >> >> On 12/4/19, 09:07, "Nunez, Carlos A [US] (MS) (Contr)" < >> Carlos.Nunez2 at ngc.com> wrote: >> >> Bastion cannot be in a private subnet. That is the EC2 that >> admins/devs will connect to at port 22. If it is in a private subnet it >> has no way to be public facing and cannot have a public IP assigned to it. >> >> V/R >> Adrian Nuñez >> Senior Cloud Architect >> NG Email: carlos.nunez2 at ngc.com >> Bylight email: carlos.nunez at bylight.com >> Cell: (571)230-5289 >> >> >> >> -----Original Message----- >> From: Miller, Timothy J. >> Sent: Wednesday, December 4, 2019 8:54 AM >> To: Dean Lystra ; Kevin O'Donnell < >> kodonnel at redhat.com> >> Cc: platformONE at redhat.com; Feiglstok, Colleen M [US] (MS) < >> Colleen.Feiglstok at ngc.com>; Mathew Huston ; Nunez, >> Carlos A [US] (MS) (Contr) >> Subject: EXT :Re: [EXT] Re: [Platformone] Riddle me this, Batman >> (odd things in up-prod) >> >> Is that one up-prod-bastion? >> >> I'm putting an issue against platform-infrastructure. The >> bastion is broken in a couple ways: >> >> - inbound SG rule defaults to `{{ cidr }}` address space, which >> resolves out to the VPC addresses >> - it's in the private subnet (probably doesn't matter, but helps >> humans keep things straight) >> - no public IP. 
>> >> -- T >> >> On 12/3/19, 16:34, "Dean Lystra" wrote: >> >> One bastion host was created for the sole purpose of allowing >> access to the IdM CLI. This was done as a quick fix to get the users >> created and for administrative purposes. Access to IdM via web console or >> CLI is not available from the internet. >> onetime is a mystery to me. >> >> On Tue, Dec 3, 2019, 2:15 PM Kevin O'Donnell < >> kodonnel at redhat.com> wrote: >> >> >> Bastion creation is IaC, and the other EC2 that's running in >> prod is for ACAS and was created to scan and will be shut down after the >> scans are done >> >> >> >> >> >> >> On Tue, Dec 3, 2019 at 3:34 PM Miller, Timothy J. < >> tmiller at mitre.org> wrote: >> >> >> - There are three bastion hosts (up-prod-bastion, >> up-prod-ocp-bastion, and "onetime"). Of these, I can find only >> up-prod-ocp-bastion in the IaC definition. Both up-prod-bastion and >> "onetime" look like they were built separately ("onetime" is baselined on >> CentOS--which is a giveaway--and up-prod-bastion is attached >> to the `bastion-ssh` security group--which AFAICT is also not part of the >> IaC). >> >> I recall someone (Dean?) telling me that there's no BH in the >> IaC, but that's not true (see >> consumers/up-node-infrastructure/environments/production/group_vars/all/ec2-instances.yml). >> >> - up-prod-openscap and up-prod-sso-server have public IPs >> but their inbound rules permit only traffic from the VPC subnets ( >> 10.40.0.0/16 ) and the up-ss-vpc gitlab-ci-runner >> instance. >> >> - up-prod-openscap is attached to the up-prod-ocp-nodes SG, >> which doesn't seem right. That opens a bunch of ports that probably >> don't matter to a scan host. >> >> - up-prod-sso-server has a public IP it doesn't need since >> traffic is handled by up-prod-sso-elb. >> >> FWIW, public IPs are assigned to up-prod-bastion, >> up-prod-openscap, up-prod-satellite, up-prod-sso-server, and "onetime". 
>> The bastion host and openscap kinda make sense, though you can jump to >> openscap from the BH. >> >> Damnfino what "onetime" is supposed to be. >> >> I'm not sure which of these or all of 'em should be turned >> into issues. Comments? >> >> -- T >> >> >> _______________________________________________ >> platformONE mailing list >> platformONE at redhat.com >> https://www.redhat.com/mailman/listinfo/platformone >> >> >> >> >> >> -- >> KEVIN O'DONNELL >> ARCHITECT MANAGER >> Red Hat Red Hat NA Public Sector Consulting < >> https://www.redhat.com/> >> >> kodonnell at redhat.com >> M: 240-605-4654 >> >> >> >> >> >> >> >> >> >> >> >> _______________________________________________ >> platformONE mailing list >> platformONE at redhat.com >> https://www.redhat.com/mailman/listinfo/platformone >> >> >> >> >> >> >> >> >> >> _______________________________________________ > platformONE mailing list > platformONE at redhat.com > https://www.redhat.com/mailman/listinfo/platformone > -- JONATHAN HULTZ, RHCSA SENIOR CONSULTANT Red Hat Remote US CA jhultz at redhat.com M: 609-713-9778 -------------- next part -------------- An HTML attachment was scrubbed... URL: From taylor at redhat.com Wed Dec 4 16:56:32 2019 From: taylor at redhat.com (Taylor Biggs) Date: Wed, 4 Dec 2019 11:56:32 -0500 Subject: [Platformone] [PossibleSpam] Re: Rogue One IATT Actions In-Reply-To: <1575477411616.89460@diligent-us.com> References: <1575477411616.89460@diligent-us.com> Message-ID: Adding in Hayden for Anchore Support. Hayden - please see below. Thanks, Taylor ---- Taylor Biggs taylor at redhat.com 850-449-2220 On Wed, Dec 4, 2019 at 11:37 AM Pascal, Jay wrote: > I was able to get the scan results for Twistlock. I have also been able > to log in to Anchore. However, when I attempt to analyze a repository or > tag, Anchore does not appear able to locate it. I have tried various > values. I am attempting to analyze the following image. 
> > > > docker-registry-default.apps.cluster.unified-platform.io/aam-ci-cd/develop-misp-app-web:latest > > > I have tried docker-registry.default.svc as the registry and aam-ci-cd as > the repository. I have tried both the analyze repository and analyze tag > options. using docker-registry.default.svc seems to attempt to pull from > the registry, but does not find the image. > > It may be the Anchore user does not have privileges to pull from the > registry/repository. I don't know which user/service account Anchore > attempts to use to pull from the registry in order to grant them privileges. > > v/r, > > Jay L Pascal > Senior Systems Engineer > *DILIGENT* Consulting Inc. > (O) 210.826.9300 > (C) 210.827.5323 > A Service Disabled Veteran Owned Small Business > CMMI-DEV Maturity Level 3 > ISO 9001:2015 certified > ------------------------------ > *From:* Keegan Reap > *Sent:* Tuesday, December 3, 2019 9:33 AM > *To:* Mark Nissley > *Cc:* Jonathan Hultz; Bubb, Mike; SANCHEZ, MARK GG-13 USAF AFMC > AFLCMC/HNCP; Pascal, Jay; platformONE at redhat.com > *Subject:* [PossibleSpam] Re: [Platformone] Rogue One IATT Actions > > With the added people to the thread, I will go ahead and reiterate these > points just in case, for full transparency: > > > Hey Mark & all, > > As far as the first objective, Twistlock has been deployed to the > environment and is ready to start scanning, link below. The Twistlock app > is locked behind an admin account, so we will need a POC to share the admin > account with. As far as Anchore goes, we have it deployed but it seems > something is preventing it from coming up successfully, we are currently > going to investigate. > > > https://cluster.unified-platform.io/console/project/levelup-twistlock/overview > https://levelup-twistlock.apps.cluster.unified-platform.io/ > > Thanks, > Keegan Reap > > > On Tue, Dec 3, 2019 at 9:29 AM Mark Nissley wrote: > >> Adding some of the UP Nodes Team to this thread. 
Mike, in a >> separate thread you noted that you were having trouble with Twistlock. >> Could you send a name and email address for someone on your team that we >> can grant access to? That may be you... >> >> >> Mark NISSLEY, PMP, CSM, LEAN >> >> PROGRAM MaNAGER & SR technical Project Manager >> >> North American Consulting, Public Sector >> >> >> M: 850-530-3234 >> >> >> >> *Scheduled Training: October 14-18* >> >> >> On Tue, Dec 3, 2019 at 10:25 AM Jonathan Hultz wrote: >> >>> Mark, >>> >>> Here is the results for the initial stig run against one of the >>> UP Node ec2 instances. >>> https://dccscr.dsop.io/levelup-automation/security/rhel7-stig/issues/1 >>> >>> There are several Cat 1 and 2s that are not implemented and the >>> reasoning is in the ticket. Corey is currently working on the Sat role >>> which will also need several stigs disabled to run correctly. >>> >>> We are currently waiting for Colleen to rescan the UP Prod host with the >>> stigs applied. >>> >>> Cheers, Jon >>> >>> On Tue, Dec 3, 2019 at 10:07 AM Mark Nissley >>> wrote: >>> >>>> I am on call with UP Node aka Rogue One. They are getting ready for >>>> IATT. Here is the actions that they asked of our team, due COB today: >>>> >>>> 1. They asked if we can utilize Anchore and/or Twistlock to scan >>>> their apps and provide a report. They will be glad to do it as well if we >>>> want to make the containers available, but they emphasized that the >>>> shortest course of action is the best. >>>> 2. A plan of action for all High and Critical items scan results >>>> from Colleen's scan (if hardening scripts will be needed, they must be >>>> delivered IATT, 20 December) >>>> >>>> As this is the highest urgency task on our list right now, we need to >>>> be able to assign these tasks to specific people and knock them out. *The >>>> deadline is COB today on both items*. Who can work with me to make >>>> these things happen? 
>>>> >>>> >>>> Mark NISSLEY, PMP, CSM, LEAN >>>> >>>> PROGRAM MaNAGER & SR technical Project Manager >>>> >>>> North American Consulting, Public Sector >>>> >>>> >>>> M: 850-530-3234 >>>> >>>> >>>> >>>> *Scheduled Training: October 14-18* >>>> _______________________________________________ >>>> platformONE mailing list >>>> platformONE at redhat.com >>>> https://www.redhat.com/mailman/listinfo/platformone >>>> >>> >>> >>> -- >>> >>> JONATHAN HULTZ, RHCSA >>> >>> SENIOR CONSULTANT >>> >>> Red Hat Remote US CA >>> >>> jhultz at redhat.com M: 609-713-9778 >>> >>> >> _______________________________________________ >> platformONE mailing list >> platformONE at redhat.com >> https://www.redhat.com/mailman/listinfo/platformone >> > Confidentiality Notice: This e-mail may contain confidential and > privileged material for the sole use of the intended recipient(s). Any > review, use, distribution or disclosure by others is strictly prohibited. > If you are not the intended recipient (or authorized to receive for the > recipient) please contact the sender by reply e-mail and delete all copies > of this message. > _______________________________________________ > platformONE mailing list > platformONE at redhat.com > https://www.redhat.com/mailman/listinfo/platformone > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Jay.Pascal at diligent-us.com Wed Dec 4 17:02:22 2019 From: Jay.Pascal at diligent-us.com (Pascal, Jay) Date: Wed, 4 Dec 2019 17:02:22 +0000 Subject: [Platformone] [EXTERNAL] Re: [PossibleSpam] Re: Rogue One IATT Actions In-Reply-To: References: <1575477411616.89460@diligent-us.com>, Message-ID: <1575478943027.20438@diligent-us.com> I think I have figured it out. I created an account (jpascal) in Anchore and granted privileges. I then created a user in that account. Logged in to Anchore with the new account. I created a registry entry and used my credentials. 
At first, logging in to the registry with a username and password failed. I then used the token as the password and the registry was created. After this I was able to initiate an image analysis. Jay L Pascal Senior Systems Engineer DILIGENT Consulting Inc. (O) 210.826.9300 (C) 210.827.5323 A Service Disabled Veteran Owned Small Business CMMI-DEV Maturity Level 3 ISO 9001:2015 certified ________________________________ From: Taylor Biggs Sent: Wednesday, December 4, 2019 10:56 AM To: Pascal, Jay Cc: Keegan Reap; Mark Nissley; Bubb, Mike; platformONE at redhat.com; SANCHEZ, MARK GG-13 USAF AFMC AFLCMC/HNCP; Hayden Smith Subject: [EXTERNAL] Re: [Platformone] [PossibleSpam] Re: Rogue One IATT Actions Adding in Hayden for Anchore Support. Hayden - please see below. Thanks, Taylor ---- Taylor Biggs taylor at redhat.com 850-449-2220 On Wed, Dec 4, 2019 at 11:37 AM Pascal, Jay wrote: I was able to get the scan results for Twistlock. I have also been able to log in to Anchore. However, when I attempt to analyze a repository or tag, Anchore does not appear able to locate it. I have tried various values. I am attempting to analyze the following image. docker-registry-default.apps.cluster.unified-platform.io/aam-ci-cd/develop-misp-app-web:latest I have tried docker-registry.default.svc as the registry and aam-ci-cd as the repository. I have tried both the analyze repository and analyze tag options. Using docker-registry.default.svc seems to attempt to pull from the registry, but does not find the image. It may be the Anchore user does not have privileges to pull from the registry/repository. I don't know which user/service account Anchore attempts to use to pull from the registry in order to grant them privileges. v/r, Jay L Pascal Senior Systems Engineer DILIGENT Consulting Inc. 
(O) 210.826.9300 (C) 210.827.5323 A Service Disabled Veteran Owned Small Business CMMI-DEV Maturity Level 3 ISO 9001:2015 certified ________________________________ From: Keegan Reap > Sent: Tuesday, December 3, 2019 9:33 AM To: Mark Nissley Cc: Jonathan Hultz; Bubb, Mike; SANCHEZ, MARK GG-13 USAF AFMC AFLCMC/HNCP; Pascal, Jay; platformONE at redhat.com Subject: [PossibleSpam] Re: [Platformone] Rogue One IATT Actions With the added people to the thread, I will go ahead and reiterate these points just in case, for full transparency: Hey Mark & all, As far as the first objective, Twistlock has been deployed to the environment and is ready to start scanning, link below. The Twistlock app is locked behind an admin account, so we will need a POC to share the admin account with. As far as Anchore goes, we have it deployed but it seems something is preventing it from coming up successfully, we are currently going to investigate. https://cluster.unified-platform.io/console/project/levelup-twistlock/overview https://levelup-twistlock.apps.cluster.unified-platform.io/ Thanks, Keegan Reap On Tue, Dec 3, 2019 at 9:29 AM Mark Nissley > wrote: Adding some of the UP Nodes Team to this thread. Mike, in a separate thread you noted that you were having trouble with Twistlock. Could you send a name and email address for someone on your team that we can grant access to? That may be you... Mark NISSLEY, PMP, CSM, LEAN PROGRAM MaNAGER & SR technical Project Manager North American Consulting, Public Sector M: 850-530-3234 [https://marketing-outfit-prod-images.s3-us-west-2.amazonaws.com/f5445ae0c9ddafd5b2f1836854d7416a/Logo-RedHat-Email.png] Scheduled Training: October 14-18 On Tue, Dec 3, 2019 at 10:25 AM Jonathan Hultz > wrote: Mark, Here is the results for the initial stig run against one of the UP Node ec2 instances. https://dccscr.dsop.io/levelup-automation/security/rhel7-stig/issues/1 There are several Cat 1 and 2s that are not implemented and the reasoning is in the ticket. 
Corey is currently working on the Sat role which will also need several stigs disabled to run correctly. We are currently waiting for Colleen to rescan the UP Prod host with the stigs applied. Cheers, Jon On Tue, Dec 3, 2019 at 10:07 AM Mark Nissley > wrote: I am on call with UP Node aka Rogue One. They are getting ready for IATT. Here is the actions that they asked of our team, due COB today: 1. They asked if we can utilize Anchore and/or Twistlock to scan their apps and provide a report. They will be glad to do it as well if we want to make the containers available, but they emphasized that the shortest course of action is the best. 2. A plan of action for all High and Critical items scan results from Colleen's scan (if hardening scripts will be needed, they must be delivered IATT, 20 December) As this is the highest urgency task on our list right now, we need to be able to assign these tasks to specific people and knock them out. The deadline is COB today on both items. Who can work with me to make these things happen? Mark NISSLEY, PMP, CSM, LEAN PROGRAM MaNAGER & SR technical Project Manager North American Consulting, Public Sector M: 850-530-3234 [https://marketing-outfit-prod-images.s3-us-west-2.amazonaws.com/f5445ae0c9ddafd5b2f1836854d7416a/Logo-RedHat-Email.png] Scheduled Training: October 14-18 _______________________________________________ platformONE mailing list platformONE at redhat.com https://www.redhat.com/mailman/listinfo/platformone -- JONATHAN HULTZ, RHCSA SENIOR CONSULTANT Red Hat Remote US CA jhultz at redhat.com M: 609-713-9778 [https://www.redhat.com/files/brand/email/sig-redhat.png] _______________________________________________ platformONE mailing list platformONE at redhat.com https://www.redhat.com/mailman/listinfo/platformone Confidentiality Notice: This e-mail may contain confidential and privileged material for the sole use of the intended recipient(s). Any review, use, distribution or disclosure by others is strictly prohibited. 
If you are not the intended recipient (or authorized to receive for the recipient) please contact the sender by reply e-mail and delete all copies of this message. _______________________________________________ platformONE mailing list platformONE at redhat.com https://www.redhat.com/mailman/listinfo/platformone -------------- next part -------------- An HTML attachment was scrubbed... URL: From Jay.Pascal at diligent-us.com Wed Dec 4 18:48:38 2019 From: Jay.Pascal at diligent-us.com (Pascal, Jay) Date: Wed, 4 Dec 2019 18:48:38 +0000 Subject: [Platformone] [EXTERNAL] Re: [PossibleSpam] Re: Rogue One IATT Actions In-Reply-To: <1575478943027.20438@diligent-us.com> References: <1575477411616.89460@diligent-us.com>, , <1575478943027.20438@diligent-us.com> Message-ID: <1575485318926.25191@diligent-us.com> On another note, adding the registry/repository settings in Anchore, I am only able to access images I have access to either explicitly or inherited via a group, i.e. aam-ci-cd, develop-misp-app. Currently I am the only user with access to Anchore. In order to accomplish the scans needed and set up configuration information, I will need access to custom repositories within the Openshift registry or the entire registry and all repositories. v/r, Jay L Pascal Senior Systems Engineer DILIGENT Consulting Inc. (O) 210.826.9300 (C) 210.827.5323 A Service Disabled Veteran Owned Small Business CMMI-DEV Maturity Level 3 ISO 9001:2015 certified ________________________________ From: Pascal, Jay Sent: Wednesday, December 4, 2019 11:02 AM To: Taylor Biggs Cc: Keegan Reap; Mark Nissley; Bubb, Mike; platformONE at redhat.com; SANCHEZ, MARK GG-13 USAF AFMC AFLCMC/HNCP; Hayden Smith Subject: Re: [EXTERNAL] Re: [Platformone] [PossibleSpam] Re: Rogue One IATT Actions I think I have figured it out. I created an account (jpascal) in Anchore and granted privileges. I then created a user in that account. Logged in to Anchore with the new account. 
I created a registry entry and used my credentials. At first, logging in to the registry with a username and password failed. I then used the token as the password and the registry was created. After this I was able to initiate an image analysis. Jay L Pascal Senior Systems Engineer DILIGENT Consulting Inc. (O) 210.826.9300 (C) 210.827.5323 A Service Disabled Veteran Owned Small Business CMMI-DEV Maturity Level 3 ISO 9001:2015 certified ________________________________ From: Taylor Biggs Sent: Wednesday, December 4, 2019 10:56 AM To: Pascal, Jay Cc: Keegan Reap; Mark Nissley; Bubb, Mike; platformONE at redhat.com; SANCHEZ, MARK GG-13 USAF AFMC AFLCMC/HNCP; Hayden Smith Subject: [EXTERNAL] Re: [Platformone] [PossibleSpam] Re: Rogue One IATT Actions Adding in Hayden for Anchore Support. Hayden - please see below. Thanks, Taylor ---- Taylor Biggs taylor at redhat.com 850-449-2220 On Wed, Dec 4, 2019 at 11:37 AM Pascal, Jay wrote: I was able to get the scan results for Twistlock. I have also been able to log in to Anchore. However, when I attempt to analyze a repository or tag, Anchore does not appear able to locate it. I have tried various values. I am attempting to analyze the following image. docker-registry-default.apps.cluster.unified-platform.io/aam-ci-cd/develop-misp-app-web:latest I have tried docker-registry.default.svc as the registry and aam-ci-cd as the repository. I have tried both the analyze repository and analyze tag options. Using docker-registry.default.svc seems to attempt to pull from the registry, but does not find the image. It may be the Anchore user does not have privileges to pull from the registry/repository. I don't know which user/service account Anchore attempts to use to pull from the registry in order to grant them privileges. v/r, Jay L Pascal Senior Systems Engineer DILIGENT Consulting Inc. 
(O) 210.826.9300 (C) 210.827.5323 A Service Disabled Veteran Owned Small Business CMMI-DEV Maturity Level 3 ISO 9001:2015 certified ________________________________ From: Keegan Reap > Sent: Tuesday, December 3, 2019 9:33 AM To: Mark Nissley Cc: Jonathan Hultz; Bubb, Mike; SANCHEZ, MARK GG-13 USAF AFMC AFLCMC/HNCP; Pascal, Jay; platformONE at redhat.com Subject: [PossibleSpam] Re: [Platformone] Rogue One IATT Actions With the added people to the thread, I will go ahead and reiterate these points just in case, for full transparency: Hey Mark & all, As far as the first objective, Twistlock has been deployed to the environment and is ready to start scanning, link below. The Twistlock app is locked behind an admin account, so we will need a POC to share the admin account with. As far as Anchore goes, we have it deployed but it seems something is preventing it from coming up successfully, we are currently going to investigate. https://cluster.unified-platform.io/console/project/levelup-twistlock/overview https://levelup-twistlock.apps.cluster.unified-platform.io/ Thanks, Keegan Reap On Tue, Dec 3, 2019 at 9:29 AM Mark Nissley > wrote: Adding some of the UP Nodes Team to this thread. Mike, in a separate thread you noted that you were having trouble with Twistlock. Could you send a name and email address for someone on your team that we can grant access to? That may be you... Mark NISSLEY, PMP, CSM, LEAN PROGRAM MaNAGER & SR technical Project Manager North American Consulting, Public Sector M: 850-530-3234 [https://marketing-outfit-prod-images.s3-us-west-2.amazonaws.com/f5445ae0c9ddafd5b2f1836854d7416a/Logo-RedHat-Email.png] Scheduled Training: October 14-18 On Tue, Dec 3, 2019 at 10:25 AM Jonathan Hultz > wrote: Mark, Here is the results for the initial stig run against one of the UP Node ec2 instances. https://dccscr.dsop.io/levelup-automation/security/rhel7-stig/issues/1 There are several Cat 1 and 2s that are not implemented and the reasoning is in the ticket. 
Corey is currently working on the Sat role which will also need several stigs disabled to run correctly. We are currently waiting for Colleen to rescan the UP Prod host with the stigs applied. Cheers, Jon On Tue, Dec 3, 2019 at 10:07 AM Mark Nissley > wrote: I am on call with UP Node aka Rogue One. They are getting ready for IATT. Here is the actions that they asked of our team, due COB today: 1. They asked if we can utilize Anchore and/or Twistlock to scan their apps and provide a report. They will be glad to do it as well if we want to make the containers available, but they emphasized that the shortest course of action is the best. 2. A plan of action for all High and Critical items scan results from Colleen's scan (if hardening scripts will be needed, they must be delivered IATT, 20 December) As this is the highest urgency task on our list right now, we need to be able to assign these tasks to specific people and knock them out. The deadline is COB today on both items. Who can work with me to make these things happen? Mark NISSLEY, PMP, CSM, LEAN PROGRAM MaNAGER & SR technical Project Manager North American Consulting, Public Sector M: 850-530-3234 [https://marketing-outfit-prod-images.s3-us-west-2.amazonaws.com/f5445ae0c9ddafd5b2f1836854d7416a/Logo-RedHat-Email.png] Scheduled Training: October 14-18 _______________________________________________ platformONE mailing list platformONE at redhat.com https://www.redhat.com/mailman/listinfo/platformone -- JONATHAN HULTZ, RHCSA SENIOR CONSULTANT Red Hat Remote US CA jhultz at redhat.com M: 609-713-9778 [https://www.redhat.com/files/brand/email/sig-redhat.png] _______________________________________________ platformONE mailing list platformONE at redhat.com https://www.redhat.com/mailman/listinfo/platformone Confidentiality Notice: This e-mail may contain confidential and privileged material for the sole use of the intended recipient(s). Any review, use, distribution or disclosure by others is strictly prohibited. 
If you are not the intended recipient (or authorized to receive for the recipient) please contact the sender by reply e-mail and delete all copies of this message. _______________________________________________ platformONE mailing list platformONE at redhat.com https://www.redhat.com/mailman/listinfo/platformone -------------- next part -------------- An HTML attachment was scrubbed... URL: From hsmith at anchore.com Wed Dec 4 19:15:27 2019 From: hsmith at anchore.com (Hayden Smith) Date: Wed, 4 Dec 2019 14:15:27 -0500 Subject: [Platformone] [EXTERNAL] Re: [PossibleSpam] Re: Rogue One IATT Actions In-Reply-To: <1575485318926.25191@diligent-us.com> References: <1575477411616.89460@diligent-us.com> <1575478943027.20438@diligent-us.com> <1575485318926.25191@diligent-us.com> Message-ID: Hey Jay, I totally understand the need to get the scans done. Sorry, I was out of pocket for a couple of hours. You are correct that you have limited access. You will need to add every registry that contains images you are trying to scan; only then can Anchore scan the repos and images within that registry. What registry do your images currently sit in? You can add a registry by doing the following: 1) Open the Anchore UI and navigate to the Configuration tab 2) Select Registries on the left-hand side 3) Add the new registry with the proper credentials. Let me know if you have any additional questions. V/r Hayden Smith Senior Engineer Anchore Los Angeles, CA Cell: (562) 676-5815 On Wed, Dec 4, 2019 at 1:48 PM Pascal, Jay wrote: > On another note, adding the registry/repository settings in Anchore, I am > only able to access images I have access to either explicitly or inherited > via a group, i.e. aam-ci-cd, develop-misp-app. Currently I am the only > user with access to Anchore. 
In order to accomplish the scans needed and > set up configuration information, I will need access to custom repositories > within the Openshift registry or the entire registry and all repositories. > > > v/r, > > > Jay L Pascal > Senior Systems Engineer > *DILIGENT* Consulting Inc. > (O) 210.826.9300 > (C) 210.827.5323 > A Service Disabled Veteran Owned Small Business > CMMI-DEV Maturity Level 3 > ISO 9001:2015 certified > ------------------------------ > *From:* Pascal, Jay > *Sent:* Wednesday, December 4, 2019 11:02 AM > *To:* Taylor Biggs > *Cc:* Keegan Reap; Mark Nissley; Bubb, Mike; platformONE at redhat.com; > SANCHEZ, MARK GG-13 USAF AFMC AFLCMC/HNCP; Hayden Smith > *Subject:* Re: [EXTERNAL] Re: [Platformone] [PossibleSpam] Re: Rogue One > IATT Actions > > > I think I have figured it out. I created an account (jpascal) in Anchore > and granted privileges. I then created a user in that account. Logged in > to Anchore with the new account. I created a registry entry and used my > credentials. At first it was failing with username/password to log in to > the registry. I then used the token as the password and the registry was > created. After this I was able to initiate an image analysis. > > ? > > > Jay L Pascal > Senior Systems Engineer > *DILIGENT* Consulting Inc. > (O) 210.826.9300 > (C) 210.827.5323 > A Service Disabled Veteran Owned Small Business > CMMI-DEV Maturity Level 3 > ISO 9001:2015 certified > ------------------------------ > *From:* Taylor Biggs > *Sent:* Wednesday, December 4, 2019 10:56 AM > *To:* Pascal, Jay > *Cc:* Keegan Reap; Mark Nissley; Bubb, Mike; platformONE at redhat.com; > SANCHEZ, MARK GG-13 USAF AFMC AFLCMC/HNCP; Hayden Smith > *Subject:* [EXTERNAL] Re: [Platformone] [PossibleSpam] Re: Rogue One IATT > Actions > > Adding in Hayden for Anchore Support. Hayden - please see below. 
> > Thanks, > Taylor > > ---- > Taylor Biggs > taylor at redhat.com > 850-449-2220 > > > > On Wed, Dec 4, 2019 at 11:37 AM Pascal, Jay > wrote: > >> I was able to get the scan results for Twistlock. I have also been able >> to log in to Anchore. However, when I attempt to analyze a repository or >> tag, Anchore does not appear able to locate it. I have tried various >> values. I am attempting to analyze the following image. >> >> >> >> docker-registry-default.apps.cluster.unified-platform.io/aam-ci-cd/develop-misp-app-web:latest >> >> >> I have tried docker-registry.default.svc as the registry and aam-ci-cd as >> the repository. I have tried both the analyze repository and analyze tag >> options. using docker-registry.default.svc seems to attempt to pull from >> the registry, but does not find the image. >> >> It may be the Anchore user does not have privileges to pull from the >> registry/repository. I don't know which user/service account Anchore >> attempts to use to pull from the registry in order to grant them privileges. >> >> v/r, >> >> Jay L Pascal >> Senior Systems Engineer >> *DILIGENT* Consulting Inc. >> (O) 210.826.9300 >> (C) 210.827.5323 >> A Service Disabled Veteran Owned Small Business >> CMMI-DEV Maturity Level 3 >> ISO 9001:2015 certified >> ------------------------------ >> *From:* Keegan Reap >> *Sent:* Tuesday, December 3, 2019 9:33 AM >> *To:* Mark Nissley >> *Cc:* Jonathan Hultz; Bubb, Mike; SANCHEZ, MARK GG-13 USAF AFMC >> AFLCMC/HNCP; Pascal, Jay; platformONE at redhat.com >> *Subject:* [PossibleSpam] Re: [Platformone] Rogue One IATT Actions >> >> With the added people to the thread, I will go ahead and reiterate these >> points just in case, for full transparency: >> >> >> Hey Mark & all, >> >> As far as the first objective, Twistlock has been deployed to the >> environment and is ready to start scanning, link below. The Twistlock app >> is locked behind an admin account, so we will need a POC to share the admin >> account with. 
As far as Anchore goes, we have it deployed but it seems >> something is preventing it from coming up successfully, we are currently >> going to investigate. >> >> >> https://cluster.unified-platform.io/console/project/levelup-twistlock/overview >> https://levelup-twistlock.apps.cluster.unified-platform.io/ >> >> Thanks, >> Keegan Reap >> >> >> On Tue, Dec 3, 2019 at 9:29 AM Mark Nissley wrote: >> >>> Adding some of the UP Nodes Team to this thread. Mike, in a >>> separate thread you noted that you were having trouble with Twistlock. >>> Could you send a name and email address for someone on your team that we >>> can grant access to? That may be you... >>> >>> >>> Mark NISSLEY, PMP, CSM, LEAN >>> >>> PROGRAM MaNAGER & SR technical Project Manager >>> >>> North American Consulting, Public Sector >>> >>> >>> M: 850-530-3234 >>> >>> >>> >>> *Scheduled Training: October 14-18* >>> >>> >>> On Tue, Dec 3, 2019 at 10:25 AM Jonathan Hultz >>> wrote: >>> >>>> Mark, >>>> >>>> Here is the results for the initial stig run against one of the >>>> UP Node ec2 instances. >>>> https://dccscr.dsop.io/levelup-automation/security/rhel7-stig/issues/1 >>>> >>>> There are several Cat 1 and 2s that are not implemented and the >>>> reasoning is in the ticket. Corey is currently working on the Sat role >>>> which will also need several stigs disabled to run correctly. >>>> >>>> We are currently waiting for Colleen to rescan the UP Prod host with >>>> the stigs applied. >>>> >>>> Cheers, Jon >>>> >>>> On Tue, Dec 3, 2019 at 10:07 AM Mark Nissley >>>> wrote: >>>> >>>>> I am on call with UP Node aka Rogue One. They are getting ready for >>>>> IATT. Here is the actions that they asked of our team, due COB today: >>>>> >>>>> 1. They asked if we can utilize Anchore and/or Twistlock to scan >>>>> their apps and provide a report. They will be glad to do it as well if we >>>>> want to make the containers available, but they emphasized that the >>>>> shortest course of action is the best. >>>>> 2. 
A plan of action for all High and Critical items scan results >>>>> from Colleen's scan (if hardening scripts will be needed, they must be >>>>> delivered IATT, 20 December) >>>>> >>>>> As this is the highest urgency task on our list right now, we need to >>>>> be able to assign these tasks to specific people and knock them out. *The >>>>> deadline is COB today on both items*. Who can work with me to make >>>>> these things happen? >>>>> >>>>> >>>>> Mark NISSLEY, PMP, CSM, LEAN >>>>> >>>>> PROGRAM MaNAGER & SR technical Project Manager >>>>> >>>>> North American Consulting, Public Sector >>>>> >>>>> >>>>> M: 850-530-3234 >>>>> >>>>> >>>>> >>>>> *Scheduled Training: October 14-18* >>>>> _______________________________________________ >>>>> platformONE mailing list >>>>> platformONE at redhat.com >>>>> https://www.redhat.com/mailman/listinfo/platformone >>>>> >>>> >>>> >>>> -- >>>> >>>> JONATHAN HULTZ, RHCSA >>>> >>>> SENIOR CONSULTANT >>>> >>>> Red Hat Remote US CA >>>> >>>> jhultz at redhat.com M: 609-713-9778 >>>> >>>> >>> _______________________________________________ >>> platformONE mailing list >>> platformONE at redhat.com >>> https://www.redhat.com/mailman/listinfo/platformone >>> >> Confidentiality Notice: This e-mail may contain confidential and >> privileged material for the sole use of the intended recipient(s). Any >> review, use, distribution or disclosure by others is strictly prohibited. >> If you are not the intended recipient (or authorized to receive for the >> recipient) please contact the sender by reply e-mail and delete all copies >> of this message. >> _______________________________________________ >> platformONE mailing list >> platformONE at redhat.com >> https://www.redhat.com/mailman/listinfo/platformone >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: Screen Shot 2019-12-04 at 2.13.10 PM.png
Type: image/png
Size: 697768 bytes
Desc: not available
URL:

From Jay.Pascal at diligent-us.com Wed Dec 4 19:49:42 2019
From: Jay.Pascal at diligent-us.com (Pascal, Jay)
Date: Wed, 4 Dec 2019 19:49:42 +0000
Subject: [Platformone] [EXTERNAL] Re: [PossibleSpam] Re: Rogue One IATT Actions
In-Reply-To:
References: <1575477411616.89460@diligent-us.com> <1575478943027.20438@diligent-us.com> <1575485318926.25191@diligent-us.com>,
Message-ID: <1575488982072.12560@diligent-us.com>

Hayden,

I was able to set it up to access the repositories I currently have access to in Openshift. The issue is I am unable to access all the repositories within Openshift. So, it's an Openshift access issue, not Anchore.

However, I ran into some issues when attempting to access the repositories. I would enter my credentials and they were not accepted. I eventually tried it using the token provided by Openshift for my login session and it worked. My concern is the token will expire and I will need to edit the registry/repository settings pretty much every time we need to scan.

Jay L Pascal
Senior Systems Engineer
DILIGENT Consulting Inc.
(O) 210.826.9300
(C) 210.827.5323
A Service Disabled Veteran Owned Small Business
CMMI-DEV Maturity Level 3
ISO 9001:2015 certified
________________________________
From: Hayden Smith
Sent: Wednesday, December 4, 2019 1:15 PM
To: Pascal, Jay
Cc: Taylor Biggs; Keegan Reap; Mark Nissley; Bubb, Mike; platformONE at redhat.com; SANCHEZ, MARK GG-13 USAF AFMC AFLCMC/HNCP
Subject: Re: [EXTERNAL] Re: [Platformone] [PossibleSpam] Re: Rogue One IATT Actions

Hey Jay,

I totally understand the need to get the scans done. Sorry, I was out of pocket for a couple hours. That is correct that you have limited access. You will need to add every registry you are trying to scan in order to scan the repos and images within that registry. What registry do your images currently sit in?
You will need to add all registries that contain the images you are trying to scan. You can do that by doing the following: 1) Open Anchore UI and Navigate to the Configuration Tab 2) Select Registries on the left hand side 3) Add the new registry with proper credentials. Let me know if you have any additional questions. V/r Hayden Smith Senior Engineer Anchore Los Angeles, CA Cell: (562) 676-5815 On Wed, Dec 4, 2019 at 1:48 PM Pascal, Jay > wrote: On another note, adding the registry/repository settings in Anchore, I am only able to access images I have access to either explicitly or inherited via a group, i.e. aam-ci-cd, develop-misp-app. Currently I am the only user with access to Anchore. In order to accomplish the scans needed and set up configuration information, I will need access to custom repositories within the Openshift registry or the entire registry and all repositories. v/r, Jay L Pascal Senior Systems Engineer DILIGENT Consulting Inc. (O) 210.826.9300 (C) 210.827.5323 A Service Disabled Veteran Owned Small Business CMMI-DEV Maturity Level 3 ISO 9001:2015 certified ________________________________ From: Pascal, Jay Sent: Wednesday, December 4, 2019 11:02 AM To: Taylor Biggs Cc: Keegan Reap; Mark Nissley; Bubb, Mike; platformONE at redhat.com; SANCHEZ, MARK GG-13 USAF AFMC AFLCMC/HNCP; Hayden Smith Subject: Re: [EXTERNAL] Re: [Platformone] [PossibleSpam] Re: Rogue One IATT Actions I think I have figured it out. I created an account (jpascal) in Anchore and granted privileges. I then created a user in that account. Logged in to Anchore with the new account. I created a registry entry and used my credentials. At first it was failing with username/password to log in to the registry. I then used the token as the password and the registry was created. After this I was able to initiate an image analysis. ? Jay L Pascal Senior Systems Engineer DILIGENT Consulting Inc. 
(O) 210.826.9300 (C) 210.827.5323 A Service Disabled Veteran Owned Small Business CMMI-DEV Maturity Level 3 ISO 9001:2015 certified ________________________________ From: Taylor Biggs > Sent: Wednesday, December 4, 2019 10:56 AM To: Pascal, Jay Cc: Keegan Reap; Mark Nissley; Bubb, Mike; platformONE at redhat.com; SANCHEZ, MARK GG-13 USAF AFMC AFLCMC/HNCP; Hayden Smith Subject: [EXTERNAL] Re: [Platformone] [PossibleSpam] Re: Rogue One IATT Actions Adding in Hayden for Anchore Support. Hayden - please see below. Thanks, Taylor ---- Taylor Biggs taylor at redhat.com 850-449-2220 On Wed, Dec 4, 2019 at 11:37 AM Pascal, Jay > wrote: I was able to get the scan results for Twistlock. I have also been able to log in to Anchore. However, when I attempt to analyze a repository or tag, Anchore does not appear able to locate it. I have tried various values. I am attempting to analyze the following image. docker-registry-default.apps.cluster.unified-platform.io/aam-ci-cd/develop-misp-app-web:latest I have tried docker-registry.default.svc as the registry and aam-ci-cd as the repository. I have tried both the analyze repository and analyze tag options. using docker-registry.default.svc seems to attempt to pull from the registry, but does not find the image. It may be the Anchore user does not have privileges to pull from the registry/repository. I don't know which user/service account Anchore attempts to use to pull from the registry in order to grant them privileges. v/r, Jay L Pascal Senior Systems Engineer DILIGENT Consulting Inc. 
(O) 210.826.9300 (C) 210.827.5323 A Service Disabled Veteran Owned Small Business CMMI-DEV Maturity Level 3 ISO 9001:2015 certified ________________________________ From: Keegan Reap > Sent: Tuesday, December 3, 2019 9:33 AM To: Mark Nissley Cc: Jonathan Hultz; Bubb, Mike; SANCHEZ, MARK GG-13 USAF AFMC AFLCMC/HNCP; Pascal, Jay; platformONE at redhat.com Subject: [PossibleSpam] Re: [Platformone] Rogue One IATT Actions With the added people to the thread, I will go ahead and reiterate these points just in case, for full transparency: Hey Mark & all, As far as the first objective, Twistlock has been deployed to the environment and is ready to start scanning, link below. The Twistlock app is locked behind an admin account, so we will need a POC to share the admin account with. As far as Anchore goes, we have it deployed but it seems something is preventing it from coming up successfully, we are currently going to investigate. https://cluster.unified-platform.io/console/project/levelup-twistlock/overview https://levelup-twistlock.apps.cluster.unified-platform.io/ Thanks, Keegan Reap On Tue, Dec 3, 2019 at 9:29 AM Mark Nissley > wrote: Adding some of the UP Nodes Team to this thread. Mike, in a separate thread you noted that you were having trouble with Twistlock. Could you send a name and email address for someone on your team that we can grant access to? That may be you... Mark NISSLEY, PMP, CSM, LEAN PROGRAM MaNAGER & SR technical Project Manager North American Consulting, Public Sector M: 850-530-3234 [https://marketing-outfit-prod-images.s3-us-west-2.amazonaws.com/f5445ae0c9ddafd5b2f1836854d7416a/Logo-RedHat-Email.png] Scheduled Training: October 14-18 On Tue, Dec 3, 2019 at 10:25 AM Jonathan Hultz > wrote: Mark, Here is the results for the initial stig run against one of the UP Node ec2 instances. https://dccscr.dsop.io/levelup-automation/security/rhel7-stig/issues/1 There are several Cat 1 and 2s that are not implemented and the reasoning is in the ticket. 
Corey is currently working on the Sat role which will also need several stigs disabled to run correctly.

We are currently waiting for Colleen to rescan the UP Prod host with the stigs applied.

Cheers, Jon

On Tue, Dec 3, 2019 at 10:07 AM Mark Nissley wrote:

I am on call with UP Node aka Rogue One. They are getting ready for IATT. Here is the actions that they asked of our team, due COB today:

1. They asked if we can utilize Anchore and/or Twistlock to scan their apps and provide a report. They will be glad to do it as well if we want to make the containers available, but they emphasized that the shortest course of action is the best.
2. A plan of action for all High and Critical items scan results from Colleen's scan (if hardening scripts will be needed, they must be delivered IATT, 20 December)

As this is the highest urgency task on our list right now, we need to be able to assign these tasks to specific people and knock them out. The deadline is COB today on both items. Who can work with me to make these things happen?

Mark NISSLEY, PMP, CSM, LEAN
PROGRAM MaNAGER & SR technical Project Manager
North American Consulting, Public Sector
M: 850-530-3234
Scheduled Training: October 14-18
_______________________________________________
platformONE mailing list
platformONE at redhat.com
https://www.redhat.com/mailman/listinfo/platformone

--
JONATHAN HULTZ, RHCSA
SENIOR CONSULTANT
Red Hat Remote US CA
jhultz at redhat.com M: 609-713-9778
_______________________________________________
platformONE mailing list
platformONE at redhat.com
https://www.redhat.com/mailman/listinfo/platformone

Confidentiality Notice: This e-mail may contain confidential and privileged material for the sole use of the intended recipient(s). Any review, use, distribution or disclosure by others is strictly prohibited.
If you are not the intended recipient (or authorized to receive for the recipient) please contact the sender by reply e-mail and delete all copies of this message.
_______________________________________________
platformONE mailing list
platformONE at redhat.com
https://www.redhat.com/mailman/listinfo/platformone
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From jrickard at redhat.com Wed Dec 4 23:29:07 2019
From: jrickard at redhat.com (Jonathan Rickard)
Date: Wed, 4 Dec 2019 18:29:07 -0500
Subject: [Platformone] [EXT] Re: Unified Platform Pod Deploy Errors
In-Reply-To: <1575472568190.35402@ManTech.com>
References: <1574703796843.94672@ManTech.com> <1574705271520.5593@ManTech.com> <1574705967080.55870@ManTech.com> <1574706319228.45425@ManTech.com> <1574707851362.16520@ManTech.com> <1574713055636.69810@ManTech.com> <1574714851887.51023@ManTech.com> <1574869597001.92851@ManTech.com> <1575303080504.65964@ManTech.com> <1575311511650.96981@ManTech.com> <6277_1575312368_5DE55BEF_6277_803_1_CA+EGyAB6=xB9Na5Wf-nWY52dwt1200Rk8uh6krw1CXojEaO1PQ@mail.gmail.com> <7E65F8DF-F96B-4657-82A2-4A95715FAC0B@mitre.org> <5A89B24C-3C36-4205-B083-CBFBF71ED122@mitre.org> <3052516616bb44caa350c3c6897bc122@DCCPRDEXCH02.ManTech.com> <406B92B0-26F9-48FD-B533-549CC36B4FB2@mitre.org> <1575472568190.35402@ManTech.com>
Message-ID:

Russell,

I have definitely been terrible with email lately and I apologize for the slow response times. I get back to San Antonio tomorrow but I have a pretty full afternoon. I can stop by Friday if you'd like.

Thanks,
jonny

Jonathan Rickard, RHCA
Principal Consultant, NAPS
Red Hat Remote - Texas
jonny at redhat.com
M: 210-862-9739

On Wed, Dec 4, 2019 at 10:16 AM Kendall, Russell C <Russell.Kendall at mantech.com> wrote:

> Jonny,
> I'd like to suggest you come to 500 to wrap this up, since it seems there
> are significant delays in communication that are contributing to downtime.
> V/R, > Russell C Kendall > ________________________________________ > From: Miller, Timothy J. > Sent: Wednesday, December 4, 2019 7:02 AM > To: Jonathan Rickard; Keegan Reap > Cc: Bubb, Mike; platformONE at redhat.com; Kendall, Russell C; RAMIREZ, > JOSE A CTR USAF AFMC AFLCMC/HNCP; ABODUNRIN, ADEMOLA A GG-12 USAF AFMC > AFLCMC/HNCP; Jonathan Rickard > Subject: Re: [Platformone] [EXT] Re: Unified Platform Pod Deploy Errors > > Johnny-- > > Update the issue, if you would be so kind. > > -- T > > ?On 12/3/19, 18:00, "platformone-bounces at redhat.com on behalf of Jonathan > Rickard" > wrote: > > Hey Guys - Sorry for taking so long - this has been completed. Please > run your builds and let us know if you're having any problems. > jonny > Jonathan Rickard, RHCA > Principal Consultant, NAPS > Red Hat Remote - Texas > > jonny at redhat.com > M: 210-862-9739 > > > > > > > > > > > > > > On Tue, Dec 3, 2019 at 3:47 PM Jonathan Rickard > wrote: > > > Russell / Team, > > > We believe we've identified the issue with your application deploying. > In order to rectify the issue I need to evacuate pods so you will probably > see some hiccups while deploying. I will update when this is resolved. > > > Thanks, > jonny > > Jonathan Rickard, RHCA > Principal Consultant, NAPS > Red Hat Remote - Texas > > jonny at redhat.com > M: 210-862-9739 > > > > > > > > > > > > > > On Tue, Dec 3, 2019 at 12:53 PM Keegan Reap wrote: > > > Hey all, we have opened an issue below, that we believe to be the > cause, we are currently investigating: > > > https://dccscr.dsop.io/ginyu-force/up-iatt/issues/32 > > > > On Tue, Dec 3, 2019 at 11:27 AM Jonathan Rickard > wrote: > > > Russell, > > > Getting more eyes on this @platformONE at redhat.com platformONE at redhat.com> > > > We'll keep you posted. 
> jonny > Jonathan Rickard, RHCA > Principal Consultant, NAPS > Red Hat Remote - Texas > > jonny at redhat.com > M: 210-862-9739 > > > > > > > > > > > > > > On Tue, Dec 3, 2019 at 11:59 AM Kendall, Russell C < > Russell.Kendall at mantech.com> wrote: > > > Kevin, > > Unfortunately we are receiving deployment errors again. This is the > event: > > 0/9 nodes are available: 1 node(s) had disk pressure, 2 node(s) had > taints that the pod didn't tolerate, 6 node(s) didn't match node selector. > > This is the deployment: > > > https://cluster.unified-platform.io/console/project/ccat-dev/browse/dc/driveup > > > V/R, > Russell C Kendall > ________________________________________ > From: Miller, Timothy J. > Sent: Monday, December 2, 2019 2:44:21 PM > To: Kevin O'Donnell > Cc: Kendall, Russell C; Mark Nissley; ABODUNRIN, ADEMOLA A GG-12 USAF > AFMC AFLCMC/HNCP; Jonathan Rickard; Chris Kuperstein; BRYAN, AUSTEN R Capt > USAF AFMC AFLCMC/HNCP; Brenna Gordon; Bubb, Mike; DIROCCO, ROGER E GG-13 > USAF AFMC ESC/AFLCMC/HNCP; RAMIREZ, JOSE > A CTR USAF AFMC AFLCMC/HNCP > Subject: Re: [EXT] Re: Unified Platform Pod Deploy Errors > > Tagged you on it. > > -- T > > On 12/2/19, 14:03, "Kevin O'Donnell" wrote: > > Hello, > > > Autoscaling is on our future IAC roadmap. Tim, the additional > ticket would be appreciated. > > > We have swapped out the app/worker instances with m5a.8xlarge 32 > cores, 128gb of ram. Please let us know if you have any other issues. > > > Thanks, > > KEVIN O'DONNELL > ARCHITECT MANAGER > Red Hat Red Hat NA Public Sector Consulting < > https://www.redhat.com/> > > kodonnell at redhat.com > M: 240-605-4654 > > > > > > > > > > > > > > > > > > On Mon, Dec 2, 2019 at 2:29 PM Miller, Timothy J. < > tmiller at mitre.org> wrote: > > > I'll open an issue. IaC needs to have instance size as a host_var > to facilitate scaling. > > -- T > > On 12/2/19, 13:15, "Kevin O'Donnell" wrote: > > Tim, > > > Thanks for the information. 
> We are undersized on the app/worker nodes: the cluster has 3, and they
> are m5.xlarge with only 16 GB of RAM. From what I have read, each Labs
> engagement operated on a 3-node worker cluster with each node having
> 6 cores and 28 GB of RAM. We will need to swap out the existing
> instances for ones with larger specs.
>
> We are going to try to flush the existing workload out on one of the
> workers to see if we can swap them out one at a time.
>
> Thanks,
>
> KEVIN O'DONNELL
> ARCHITECT MANAGER
> Red Hat Red Hat NA Public Sector Consulting <https://www.redhat.com/>
> kodonnell at redhat.com
> M: 240-605-4654
>
> On Mon, Dec 2, 2019 at 2:07 PM Miller, Timothy J. <tmiller at mitre.org> wrote:
>
> Here's what I can see, given the perm limits I seem to be under:
>
> - NS:develop-misp-app and NS:lp-develop-misp-app both have several sync
>   jobs in CrashLoopBackOff b/c of an HTTP 401 error returned while
>   trying to fetch something from somewhere (URL isn't recorded in the
>   stack trace).
>
> - NS:minishift-misp-app has most of its pods/jobs stuck in
>   ImagePullBackOff. No detail there in the event stream so I'll see if
>   I can dig deeper.
>
> - NS:aam-ci-cd has Jenkins trying to spin up three workers; those are
>   coming back as unschedulable.
>
> I can't see into NS:aam-bases or NS:dsop-images b/c of perm limits.
>
> I see no DAS-related project(s).
>
> The MISP stuff needs debugging before calling "blocked" since it looks
> like an internal error from this perspective.
>
> In re: AAM Jenkins: If this deployment is coming out of the OCP
> storefront, then maybe it should be ephemeral rather than persistent.
> If it's a custom deployment, then it probably needs a rethink.
>
> I'm also not sure why there are two MISP dev projects.
>
> -- T
>
> On 12/2/19, 12:46, "Kevin O'Donnell" wrote:
>
> Russell,
>
> Thank you for the information. We can switch out the instance type for
> the worker nodes.
> How much memory is required by the apps?
>
> Thanks,
>
> KEVIN O'DONNELL
> ARCHITECT MANAGER
> Red Hat Red Hat NA Public Sector Consulting <https://www.redhat.com/>
> kodonnell at redhat.com
> M: 240-605-4654
>
> On Mon, Dec 2, 2019 at 1:32 PM Kendall, Russell C <Russell.Kendall at mantech.com> wrote:
>
> Kevin,
> The lack of resources on the u-p.io <http://u-p.io> cluster is hindering
> development, testing, and integration of the apps from CCAT, AAM, and
> DAS, which is putting one of our PI goals at risk.
>
> We are blocked by the fact that we (CCAT and AAM) cannot deploy
> additional pods to the unified-platform.io <http://unified-platform.io>
> cluster. We have a subset of containers deployed, but rolling
> deployments and new deployments fail. This means that we are not able
> to execute integration testing or peer reviews. We are temporarily
> working around this by NOT testing/reviewing our code changes live,
> something that no one likes. Also, we are now running weeks-old
> instances of our containers, so we are very likely producing some
> technical debt. We currently have developers approaching idle or doing
> non-priority work until the resource issue is resolved.
>
> Here is the particular error from the OSP cluster I received while
> attempting a redeploy of one of our apps.
>
> 0/9 nodes are available: 1 node(s) had taints that the pod didn't
> tolerate, 2 Insufficient pods, 6 node(s) didn't match node selector.
> 11 times in the last minute
>
> Since we do not have any cluster permissions, I cannot verify which
> resource is running out, but from experience, I assess it is a memory
> issue.
>
> It appears the cluster has been provisioned with a silly allocation of
> node types. Without knowing exactly what was deployed, it appears only
> 3 of the 9 hosts are suitable worker nodes.
> We would expect the cluster to respond to resource limitations and
> scale, but if a scheduled downtime is required, please work with us so
> we can anticipate it. As it stands, the cluster does not support the
> resources required by CCAT and the other dev teams (AAM, DAS, etc.). We
> would accept any downtime if it will improve the situation, as we are
> blocked from progressing under the current constraints. My hope was we
> could get the cluster redeployed over the Thanksgiving holiday to
> eliminate developer impact, but as Mark pointed out, there were limited
> support folks available. Now I am just trying to minimize the losses.
>
> V/R,
>
> Russell C Kendall
>
> ________________________________________
> From: Kevin O'Donnell
> Sent: Monday, December 2, 2019 11:52 AM
> To: Kendall, Russell C
> Cc: Mark Nissley; ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP;
>   Jonathan Rickard; Chris Kuperstein; BRYAN, AUSTEN R Capt USAF AFMC
>   AFLCMC/HNCP; Brenna Gordon; Bubb, Mike (mbubb at mitre.org); DIROCCO,
>   ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP; Miller, Timothy J.; RAMIREZ,
>   JOSE A CTR USAF AFMC AFLCMC/HNCP
> Subject: Re: Unified Platform Pod Deploy Errors
>
> Hello Russell,
>
> Can you elaborate on the term Blocked? What specific issues are the
> blockers?
>
> Thanks,
>
> KEVIN O'DONNELL
> ARCHITECT MANAGER
> Red Hat Red Hat NA Public Sector Consulting <https://www.redhat.com/>
> kodonnell at redhat.com
> M: 240-605-4654
>
> On Mon, Dec 2, 2019 at 11:11 AM Kendall, Russell C <Russell.Kendall at mantech.com> wrote:
>
> Mark,
>
> Thanks for acknowledging. Please be aware the San Antonio dev teams
> working in unified-platform.io <http://unified-platform.io> are
> currently blocked.
> > V/R, > > Russell C Kendall > > ________________________________________ > From: Mark Nissley > Sent: Monday, December 2, 2019 9:36 AM > To: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP; > Jonathan Rickard; Chris Kuperstein > Cc: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP; Kevin > O'Donnell; Brenna Gordon; Kendall, Russell C; Bubb, Mike (mbubb at mitre.org); > DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP; Miller, Timothy > J.; > RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP > Subject: Re: Unified Platform Pod Deploy Errors > > As noted, I don't suspect much got done on this over the > holiday weekend. I did see the ticket, as dropped some details into it. I > also assigned it to @Jonathan > Rickard and @Chris Kuperstein > . > > > > It looks like short term solutions have been easy but the > issue is recurring. > > > > > Mark NISSLEY, PMP, > CSM, LEAN > > PROGRAM MaNAGER & SR technical Project Manager > North American Consulting, Public Sector > > M: > 850-530-3234 > > > Scheduled Training: October 14-18 > > > > > > > > > > > > > > > > > > > > > > > > > > > On Mon, Dec 2, 2019 at 10:21 AM ABODUNRIN, ADEMOLA A GG-12 > USAF AFMC AFLCMC/HNCP wrote: > > > Mark/Kevin, > > > I just heard at the team stand up that we are still > blocked. This is also affecting the AAM team from my investigations. > > > Please let me know if there is something we need to do to > move this forward. 
> > Most Sincerely, > > > Ade Abodunrin, GG-12, USAF > Product Owner (Cybertron & Ginyu Force), Unified Platform > > > > LevelUP Code Works > Commercial: > (210) 890-2113 > NIPR email: > ademola.abodunrin at us.af.mil > > > > > > > > > ________________________________________ > From: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP > Sent: Wednesday, November 27, 2019 12:58 PM > To: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP < > austen.bryan.1 at us.af.mil>; Mark Nissley ; Kevin > O'Donnell > ; > Brenna Gordon > Cc: Kendall, Russell C ; > Bubb, Mike (mbubb at mitre.org) ; DIROCCO, ROGER E GG-13 > USAF AFMC ESC/AFLCMC/HNCP > ; Miller, Timothy J. < > tmiller at mitre.org>; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP < > jose.ramirez.50.ctr at us.af.mil> > Subject: Re: Unified Platform Pod Deploy Errors > > Thanks a lot Capt Bryan! Russell created the ticket on > GitLab UP Node Project. > > > > > Most Sincerely, > > > Ade Abodunrin, GG-12, USAF > Product Owner (Cybertron & Ginyu Force), Unified Platform > > > > LevelUP Code Works > Commercial: > (210) 890-2113 > NIPR email: > ademola.abodunrin at us.af.mil > > > > > > > > > ________________________________________ > From: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP < > austen.bryan.1 at us.af.mil> > Sent: Wednesday, November 27, 2019 12:56 PM > To: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP < > ademola.abodunrin at us.af.mil>; Mark Nissley ; Kevin > O'Donnell > ; Brenna Gordon > Cc: Kendall, Russell C ; > Bubb, Mike (mbubb at mitre.org) ; DIROCCO, ROGER E GG-13 > USAF AFMC ESC/AFLCMC/HNCP > ; Miller, Timothy J. < > tmiller at mitre.org>; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP < > jose.ramirez.50.ctr at us.af.mil> > Subject: RE: Unified Platform Pod Deploy Errors > > Thanks Ade. The team is thin until next week due to the > holidays but I will make sure it is addressed. Were there any issues > submitted to Gitlab?s UP Node Project on DCCSCR? > > @Mark/Kevin ? can we address? 
> > -Austen > > From: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP < > ademola.abodunrin at us.af.mil> > > Sent: Wednesday, November 27, 2019 9:51 AM > To: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP < > austen.bryan.1 at us.af.mil> > Cc: Kendall, Russell C ; > Bubb, Mike (mbubb at mitre.org) > Subject: Fw: Unified Platform Pod Deploy Errors > > > > Capt Bryan, > > Please see the explanation on the issue that Ginyu Force > is currently experiencing below. > > > > Most Sincerely, > > Ade Abodunrin, GG-12, USAF > Product Owner (Cybertron & Ginyu Force), Unified Platform > > > LevelUP Code Works > Commercial: (210) 890-2113 > NIPR email: > ademola.abodunrin at us.af.mil > > > > > > ________________________________________ > > From: Kendall, Russell C > Sent: Wednesday, November 27, 2019 9:46 AM > To: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP < > ademola.abodunrin at us.af.mil>; Buffaloe, > Christopher ; Molina, > Toby ; > Crace, Jared E ; SANCHEZ, MARK > GG-13 USAF AFMC AFLCMC/HNCP > Cc: > tmiller at mitre.org < > tmiller at mitre.org> > Subject: [Non-DoD Source] Fw: Unified Platform Pod Deploy > Errors > > > > Gentlemen, > > The application development teams working in the new > GovCloud OCP environment (unified-platform.io > > > ) > are currently blocked in efforts to deploy new pods for > testing, development, and UAT. > > Red Hat and RogueOne SMEs have been notified and have > attempted some fixes starting on Monday 11/25, but at this point have not > been able to provision resources > sufficient to host CCAT and AAM. > > We have taken steps to minimize our footprint (eliminating > demonstration environment, deleting developer namespaces), but this is not > a sustainable approach, > and has only resulted in moderate improvements in cluster > performance. > > Our hope is the U-P.io cluster compute resources can be > increased very soon, so that we may resume normal development activities. 
> Our understanding is that > such a scaling requires a complete redeployment of the > cluster, which is unusual, but an acceptable loss to productivity. If the > cluster can be scaled up over the Thanksgiving holiday, the impact will be > minimal to developers and cluster administrators, > alike. > > We are currently collaborating on solutions on the > following MatterMost channel behind the space camp VPN (link below), and > via the email thread forwarded > (further below). > > > > https://chat.spacecamp.ninja/levelup/channels/unified-platform-node < > https://chat.spacecamp.ninja/levelup/channels/unified-platform-node> < > https://chat.spacecamp.ninja/levelup/channels/unified-platform-node> > > > Please keep me posted on developments and I will > coordinate developer activities with any scheduled platform outages. > > V/R, > Russell C Kendall > > ________________________________________ > > From: Curran, Daniel M > Sent: Monday, November 25, 2019 2:47 PM > To: Jonathan Rickard > Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; > dlystra at redhat.com ; Sison, > Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; > Phil > Soliz; > Buffaloe, > Christopher; Torres, Alexander; Crace, Jared E; Middleton, > Joseph J > Subject: Re: Unified Platform Pod Deploy Errors > > > > Sounds great. Appreciate it. > I'll watch email and Mattermost in case you need more from > us. 
> > -Daniel > > ________________________________________ > > From: Jonathan Rickard > Sent: Monday, November 25, 2019 2:44 PM > To: Curran, Daniel M > Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; > dlystra at redhat.com ; Sison, > Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; > Phil > Soliz; > Buffaloe, > Christopher; Torres, Alexander; Crace, Jared E; Middleton, > Joseph J > Subject: Re: Unified Platform Pod Deploy Errors > > > > Thanks Daniel - > > > > I'll continue to look into the resource issue that you're > seeing - I'd like to identify the root cause and then work with the team to > come up with a solution. > > > > Jonathan Rickard, > RHCA > Principal Consultant, NAPS > Red > Hat Remote - Texas > jonny at redhat.com > > M: 210-862-9739 > > > > > > > > > > > > > > > On Mon, Nov 25, 2019 at 2:17 PM Curran, Daniel M < > Daniel.Curran at mantech.com> > wrote: > > > Yeah we hit the limit then had AAM kill some of their > projects and then our pods got scheduled. > We've hit the limit again though. Here's an example pod > that cannot be scheduled > > > > > https://cluster.unified-platform.io/console/project/ccat-dev/browse/pods/sso-8-2cjth > < > https://cluster.unified-platform.io/console/project/ccat-dev/browse/pods/sso-8-2cjth> > < > https://cluster.unified-platform.io/console/project/ccat-dev/browse/pods/sso-8-2cjth > > > < > https://cluster.unified-platform.io/console/project/ccat-dev/browse/pods/sso-8-2cjth > > > They're seeing it when their jenkins slaves can't deploy > but it's basically any pod after we hit some limit. 
> > -Daniel > ________________________________________ > > From: Jonathan Rickard > Sent: Monday, November 25, 2019 1:26 PM > To: Curran, Daniel M > Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; > dlystra at redhat.com ; Sison, > Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; > Phil > Soliz; > Buffaloe, > Christopher; Torres, Alexander; Crace, Jared E; Middleton, > Joseph J > Subject: Re: Unified Platform Pod Deploy Errors > > > > Daniel, > > > > I can see that you have 3 mongo pods, 1 chatup and 1 upbot > pod running ... is your app good to go? > > > > Looks like there was an issue with memory on 1 pod, then > some node selector being mismatched - just what i could see in the events... > > > > > > > Jonathan Rickard, > RHCA > Principal Consultant, NAPS > Red > Hat Remote - Texas > jonny at redhat.com > > M: 210-862-9739 > > > > > > > > > > > > > > > On Mon, Nov 25, 2019 at 12:50 PM Curran, Daniel M < > Daniel.Curran at mantech.com> > wrote: > > > Also, AAM was having similar issues. Looks like they had a > lot of namespaces and scaling down the pods on their deployments didn't > help but actually deleting the namespaces > did. > We have pods scheduling now but I'm adding them and we'd > still like to work through what resource limit we were hitting to avoid > this in the future. > > -Daniel > > ________________________________________ > > From: Curran, Daniel M > Sent: Monday, November 25, 2019 12:25 PM > To: Jonathan Rickard > Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; > dlystra at redhat.com ; Sison, > Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; > Phil > Soliz; > Buffaloe, > Christopher; Torres, Alexander > Subject: Re: Unified Platform Pod Deploy Errors > > > > Thanks, sir. > Most important for us to get working is "ccat-demo" but > it's also happening in "ccat-dev" and "ccat-ci-cd". 
> > -Daniel > ________________________________________ > > From: Jonathan Rickard > Sent: Monday, November 25, 2019 12:22 PM > To: Curran, Daniel M > Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; > dlystra at redhat.com ; Sison, > Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; > Phil > Soliz; > Buffaloe, > Christopher; Torres, Alexander > Subject: Re: Unified Platform Pod Deploy Errors > > > > What's the name of the project you're working in? I'm > going to be back at my laptop in about 30 and will take a look when I get > there. > > > > Is it just the Jenkins pods failing? > > > > > > > > On Mon, Nov 25, 2019, 12:20 PM Curran, Daniel M < > Daniel.Curran at mantech.com> > wrote: > > > Adding Dean and Alex. > Also, sitting in mattermost if anyone needs to get online > and chat for more information. > > -Daniel > > ________________________________________ > > From: Curran, Daniel M > Sent: Monday, November 25, 2019 12:07 PM > To: > jonny at redhat.com ; > > ckuperst at redhat.com ; Mark > Nissley > Cc: Sison, Mark Anthony; Cepeda, Rolando; Kendall, Russell > C; Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher > Subject: Re: Unified Platform Pod Deploy Errors > > > > Adding Kupe and Mark. > > -Daniel > ________________________________________ > > From: Curran, Daniel M > Sent: Monday, November 25, 2019 11:43 AM > To: > jonny at redhat.com > Cc: Sison, Mark Anthony; Cepeda, Rolando; Kendall, Russell > C; Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher > Subject: Unified Platform Pod Deploy Errors > > > > Hey Jonny, > > We met briefly at SpaceCAMP a couple weeks ago when > > > > > cluster.unified-platform.io < > http://cluster.unified-platform.io> > > was stood up. We've been trying to deploy some apps today and so > far today we're getting errors on most (if > not all) of our pods. > > 0/9 nodes are available: 3 Insufficient pods, 6 node(s) > didn't match node selector. > > Is what we're seeing. 
We were thinking it was some volume > types weren't correct but some of our pods don't even have volumes attached > and still give us this error (i.e. Jenkins > slaves or web frontends without persistent storage). > Any idea what this could be? We're not running out of > space on the nodes themselves are we? > We have a demo scheduled for tomorrow at 9:30 AM CST and > are hoping to get a demo env up for them today but this error came up > unexpectedly. Also, we're here at 500 Navarro > St. in San Antonio working through this in person is > better/easier. > > Thanks, > Daniel Curran > > ________________________________________ > > This e-mail and any attachments are intended only for the > use of the addressee(s) named herein and may contain proprietary > information. If you are not the intended recipient of this e-mail or > believe that you received this email in error, please > take > immediate > action to notify the sender of the apparent error by > reply e-mail; permanently delete the e-mail and any attachments from your > computer; and do not disseminate, distribute, use, or copy this message and > any attachments. > > _______________________________________________ > platformONE mailing list > platformONE at redhat.com > https://www.redhat.com/mailman/listinfo/platformone > -------------- next part -------------- An HTML attachment was scrubbed...
URL: From jrickard at redhat.com Thu Dec 5 01:01:11 2019 From: jrickard at redhat.com (Jonathan Rickard) Date: Wed, 4 Dec 2019 20:01:11 -0500 Subject: [Platformone] [EXT] Re: Riddle me this, Batman (odd things in up-prod) In-Reply-To: References: <49479C3E-4D9F-4C8C-AE04-585855858CCF@mitre.org> <7098_1575412443_5DE6E2DB_7098_1003_1_CAPeU9HY-y6dX2ZqyDe_xk=xs9CxuAGcifQyT8nSgxo6wWZ0XPQ@mail.gmail.com> <98b636fa65e74df9a32cca3a37381c1c@XCGVAG21.northgrum.com> <682371889abb42abaff12c59764e6798@XCGVAG21.northgrum.com> <6C0974A0-0436-4D05-9F6A-8FB5B21FECF8@mitre.org> Message-ID: Tim, The bastion node in C1 is a windows virtual machine that must be accessed via Guacamole and is used as a legit jump/deploy server. As stated by Jon and Dean - the ocp-bastion's only purpose is to deploy OpenShift - we could probably rename it to make it easier for folks to understand. In our early deployments this node was a true jump box as well, and we used it for executing playbooks and IDM management...etc. Once we moved to the ci-runner deployment the need for remote access really went away. The server that Dean stood up (up-prod-bastion) is more of a management server, which is where we're managing IDM access..etc from. In C1, we break policy by implementing a "deploy" node in the environment so we have to use a container that performs the same functions. This container is run by the Jenkins instance, provided by the C1 team. Hopefully that clears things up jonny Jonathan Rickard, RHCA Principal Consultant, NAPS Red Hat Remote - Texas jonny at redhat.com M: 210-862-9739 On Wed, Dec 4, 2019 at 11:43 AM Jonathan Hultz wrote: > Tim, > > The OCP_Bastion is necessary to deploy the OCP 3.11 cluster. It holds a > specific version of ansible and inventory file that Openshift deploys from. > The CI-Runner triggers the build but its fully automated and it should have > zero hands on keyboard access. 
> > On Wed, Dec 4, 2019 at 11:29 AM Dean Lystra wrote: > >> IMO a VPN solution would be the most preferred option. This would be up >> to the decision makers to go this route. We used to have OpenVPN in our >> VPCs, but I am unsure why we no longer use it. >> >> On Wed, Dec 4, 2019, 8:21 AM Miller, Timothy J. >> wrote: >> >>> I wouldn't expect a public IP into a private subnet, and I'm wouldn't be >>> comfy with port forwarding into it either. >>> >>> I sense a broken concept of operations here. As currently defined in >>> IaC, ocp-bastion is useless or redundant: >>> >>> - It's in the private subnet and unreachable from any admin console by >>> the current SG anyway. A VPN is required to reach it. >>> >>> - OTOH, if we have a VPN, we can whitelist admin consoles via the VPN in >>> the default SG. This would save a little $. >>> >>> - In C1, ocp-bastion is duplicative with the C1 provided bastion. We >>> can whitelist the C1 bastion in the default SG and again save some $. >>> >>> -- T >>> >>> ?On 12/4/19, 09:33, "Nunez, Carlos A [US] (MS) (Contr)" < >>> Carlos.Nunez2 at ngc.com> wrote: >>> >>> AWS will not allow a public IP in a private subnet. No cloud >>> provider does. The NAT is what translates Private Subnet resources to be >>> connected to Public Subnet resources. >>> >>> V/R >>> Adrian >>> >>> -----Original Message----- >>> From: Miller, Timothy J. >>> Sent: Wednesday, December 4, 2019 10:29 AM >>> To: Nunez, Carlos A [US] (MS) (Contr) ; Dean >>> Lystra ; Kevin O'Donnell >>> Cc: platformONE at redhat.com; Feiglstok, Colleen M [US] (MS) < >>> Colleen.Feiglstok at ngc.com>; Mathew Huston >>> Subject: EXT :Re: [EXT] Re: [Platformone] Riddle me this, Batman >>> (odd things in up-prod) >>> >>> That's great, once a VPN into the VPC is active. >>> >>> In the meantime...? >>> >>> -- T >>> >>> On 12/4/19, 09:07, "Nunez, Carlos A [US] (MS) (Contr)" < >>> Carlos.Nunez2 at ngc.com> wrote: >>> >>> Bastion cannot be in a private subnet. 
That is the EC2 that >>> admins/devs will connect to at port 22. If it is in a private subnet it >>> has no way to be public facing and cannot have a public IP assigned to it. >>> >>> V/R >>> Adrian Nuñez >>> Senior Cloud Architect >>> NG Email: carlos.nunez2 at ngc.com >>> Bylight email: carlos.nunez at bylight.com >>> Cell: (571)230-5289 >>> >>> >>> >>> -----Original Message----- >>> From: Miller, Timothy J. >>> Sent: Wednesday, December 4, 2019 8:54 AM >>> To: Dean Lystra ; Kevin O'Donnell < >>> kodonnel at redhat.com> >>> Cc: platformONE at redhat.com; Feiglstok, Colleen M [US] (MS) < >>> Colleen.Feiglstok at ngc.com>; Mathew Huston ; Nunez, >>> Carlos A [US] (MS) (Contr) >>> Subject: EXT :Re: [EXT] Re: [Platformone] Riddle me this, Batman >>> (odd things in up-prod) >>> >>> Is that one up-prod-bastion? >>> >>> I'm putting an issue against platform-infrastructure. The >>> bastion is broken in a couple ways: >>> >>> - inbound SG rule defaults to `{{ cidr }}` address space, which >>> resolves out to the VPC addresses >>> - it's in the private subnet (probably doesn't matter, but helps >>> humans keep things straight) >>> - no public IP. >>> >>> -- T >>> >>> On 12/3/19, 16:34, "Dean Lystra" wrote: >>> >>> One bastion host was created for the sole purpose of >>> allowing access to the IdM CLI. This was done as a quick fix to get the >>> users created and for administrative purposes. Access to IdM via web >>> console or CLI is not available from the internet. >>> onetime is a mystery to me. >>> >>> On Tue, Dec 3, 2019, 2:15 PM Kevin O'Donnell < >>> kodonnel at redhat.com> wrote: >>> >>> >>> Bastion creation is iac, and the other ec2 that's running in >>> prod is for acas and was created to scan and will be shutdown after the >>> scans are done >>> >>> >>> >>> >>> >>> >>> On Tue, Dec 3, 2019 at 3:34 PM Miller, Timothy J. < >>> tmiller at mitre.org> wrote: >>> >>> >>> - There are three bastion hosts (up-prod-bastion, >>> up-prod-ocp-bastion, and "onetime").
Of these, I can find only >>> up-prod-ocp-bastion in the IaC definition. Both up-prod-bastion and >>> "onetime" look like they were built separately ("onetime" is baselined on >>> CentOS--which is a giveaway--and up-prod-bastion is >>> attached to the `bastion-ssh` security group--which AFAICT is also not part >>> of the IaC). >>> >>> I recall someone (Dean?) telling me that there's no BH in >>> the IaC, but that's not true (see >>> consumers/up-node-infrastructure/environments/production/group_vars/all/ec2-instances.yml). >>> >>> - up-prod-openscap and up-prod-sso-server have a public IP >>> but its inbound rules permit only traffic from the VPC subnets ( >>> 10.40.0.0/16 ) and the up-ss-vpc gitlab-ci-runner >>> instance. >>> >>> - up-prod-openscap is attached to the up-prod-ocp-nodes SG, >>> which is doesn't seem right. That opens a bunch of ports that probably >>> don't matter to a scan host. >>> >>> - up-prod-sso-server has a public IP it doesn't need since >>> traffic is handled by up-prod-sso-elb. >>> >>> FWIW, public IPs are assigned to up-prod-bastion, >>> up-prod-openscap, up-prod-satellite, up-prod-sso-server, and "onetime". >>> The bastion host and openscap kinda make sense, though you can jump to >>> openscap from the BH. >>> >>> Damnfino what "onetime" is supposed to be. >>> >>> I'm not sure which of these or all of 'em should be turned >>> into issues. Comments? 
>>> >>> -- T >>> >>> >>> _______________________________________________ >>> platformONE mailing list >>> platformONE at redhat.com >>> https://www.redhat.com/mailman/listinfo/platformone >>> >>> >>> >>> >>> >>> -- >>> KEVIN O'DONNELL >>> ARCHITECT MANAGER >>> Red Hat Red Hat NA Public Sector Consulting < >>> https://www.redhat.com/> >>> >>> kodonnell at redhat.com >>> M: 240-605-4654 >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> _______________________________________________ >>> platformONE mailing list >>> platformONE at redhat.com >>> https://www.redhat.com/mailman/listinfo/platformone >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> _______________________________________________ >> platformONE mailing list >> platformONE at redhat.com >> https://www.redhat.com/mailman/listinfo/platformone >> > > > -- > > JONATHAN HULTZ, RHCSA > > SENIOR CONSULTANT > > Red Hat Remote US CA > > jhultz at redhat.com M: 609-713-9778 > > _______________________________________________ > platformONE mailing list > platformONE at redhat.com > https://www.redhat.com/mailman/listinfo/platformone > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tmiller at mitre.org Thu Dec 5 13:08:04 2019 From: tmiller at mitre.org (Miller, Timothy J.) Date: Thu, 5 Dec 2019 13:08:04 +0000 Subject: [Platformone] Container runtime Message-ID: <112BEE93-7133-4B85-AB5F-54297F559326@mitre.org> Is P1 deployed with dockerd, or crio? I recall a Nic requirement to use crio. -- T? From jrickard at redhat.com Thu Dec 5 13:13:07 2019 From: jrickard at redhat.com (Jonathan Rickard) Date: Thu, 5 Dec 2019 08:13:07 -0500 Subject: [Platformone] Container runtime In-Reply-To: <112BEE93-7133-4B85-AB5F-54297F559326@mitre.org> References: <112BEE93-7133-4B85-AB5F-54297F559326@mitre.org> Message-ID: Tim, Currently deployed using docker - I don't think that I've heard of crio being a requirement at any time. Not being argumentative - I just haven't heard that. 
jonny Jonathan Rickard, RHCA Principal Consultant, NAPS Red Hat Remote - Texas jonny at redhat.com M: 210-862-9739 On Thu, Dec 5, 2019 at 8:08 AM Miller, Timothy J. wrote: > Is P1 deployed with dockerd, or crio? > > I recall a Nic requirement to use crio. > > -- T > > > _______________________________________________ > platformONE mailing list > platformONE at redhat.com > https://www.redhat.com/mailman/listinfo/platformone > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tmiller at mitre.org Thu Dec 5 14:02:05 2019 From: tmiller at mitre.org (Miller, Timothy J.) Date: Thu, 5 Dec 2019 14:02:05 +0000 Subject: [Platformone] [EXT] Re: Container runtime In-Reply-To: <6775_1575551626_5DE9028A_6775_610_1_CAPHCx3KJsOies5UqrBka75BxZt7eByUpUvY0rG2eJxJiyoYSPQ@mail.gmail.com> References: <112BEE93-7133-4B85-AB5F-54297F559326@mitre.org> <6775_1575551626_5DE9028A_6775_610_1_CAPHCx3KJsOies5UqrBka75BxZt7eByUpUvY0rG2eJxJiyoYSPQ@mail.gmail.com> Message-ID: Ahh, I see crio's the default in 4.x. Good enough. Nothing to see here, carry on. :) -- T On 12/5/19, 07:14, "Jonathan Rickard" wrote: Tim, Currently deployed using docker - I don't think that I've heard of crio being a requirement at any time. Not being argumentative - I just haven't heard that. jonny Jonathan Rickard, RHCA Principal Consultant, NAPS Red Hat Remote - Texas jonny at redhat.com M: 210-862-9739 On Thu, Dec 5, 2019 at 8:08 AM Miller, Timothy J. wrote: Is P1 deployed with dockerd, or crio? I recall a Nic requirement to use crio.
-- T _______________________________________________ platformONE mailing list platformONE at redhat.com https://www.redhat.com/mailman/listinfo/platformone From Russell.Kendall at mantech.com Thu Dec 5 15:55:15 2019 From: Russell.Kendall at mantech.com (Kendall, Russell C) Date: Thu, 5 Dec 2019 15:55:15 +0000 Subject: [Platformone] [EXT] Re: Unified Platform Pod Deploy Errors In-Reply-To: References: <1574703796843.94672@ManTech.com> <1574705271520.5593@ManTech.com> <1574705967080.55870@ManTech.com> <1574706319228.45425@ManTech.com> <1574707851362.16520@ManTech.com> <1574713055636.69810@ManTech.com> <1574714851887.51023@ManTech.com> <1574869597001.92851@ManTech.com> <1575303080504.65964@ManTech.com> <1575311511650.96981@ManTech.com> <6277_1575312368_5DE55BEF_6277_803_1_CA+EGyAB6=xB9Na5Wf-nWY52dwt1200Rk8uh6krw1CXojEaO1PQ@mail.gmail.com> <7E65F8DF-F96B-4657-82A2-4A95715FAC0B@mitre.org> <5A89B24C-3C36-4205-B083-CBFBF71ED122@mitre.org> <3052516616bb44caa350c3c6897bc122@DCCPRDEXCH02.ManTech.com> <406B92B0-26F9-48FD-B533-549CC36B4FB2@mitre.org> <1575472568190.35402@ManTech.com>, Message-ID: <1575561314717.40064@ManTech.com> Jonny, I'll see you Friday at 500 Nav. Travel safe. V/R, Russell C Kendall? ________________________________ From: Jonathan Rickard Sent: Wednesday, December 4, 2019 5:29 PM To: Kendall, Russell C Cc: Miller, Timothy J.; Keegan Reap; Bubb, Mike; platformONE at redhat.com; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP; ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP; Jonathan Rickard Subject: Re: [Platformone] [EXT] Re: Unified Platform Pod Deploy Errors Russell, I have definitely been terrible with email lately and I apologize for the slow response times. I get back to San Antonio tomorrow but I have a pretty full afternoon. I can stop by Friday if you'd like. 
Thanks, jonny Jonathan Rickard, RHCA Principal Consultant, NAPS Red Hat Remote - Texas jonny at redhat.com M: 210-862-9739 On Wed, Dec 4, 2019 at 10:16 AM Kendall, Russell C wrote: Jonny, I'd like to suggest you come to 500 to wrap this up, since it seems there are significant delays in communication that are contributing to downtime. V/R, Russell C Kendall ________________________________________ From: Miller, Timothy J. Sent: Wednesday, December 4, 2019 7:02 AM To: Jonathan Rickard; Keegan Reap Cc: Bubb, Mike; platformONE at redhat.com; Kendall, Russell C; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP; ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP; Jonathan Rickard Subject: Re: [Platformone] [EXT] Re: Unified Platform Pod Deploy Errors Johnny-- Update the issue, if you would be so kind. -- T On 12/3/19, 18:00, "platformone-bounces at redhat.com on behalf of Jonathan Rickard" wrote: Hey Guys - Sorry for taking so long - this has been completed. Please run your builds and let us know if you're having any problems. jonny Jonathan Rickard, RHCA Principal Consultant, NAPS Red Hat Remote - Texas jonny at redhat.com M: 210-862-9739 On Tue, Dec 3, 2019 at 3:47 PM Jonathan Rickard wrote: Russell / Team, We believe we've identified the issue with your application deploying. In order to rectify the issue I need to evacuate pods so you will probably see some hiccups while deploying. I will update when this is resolved.
Thanks, jonny Jonathan Rickard, RHCA Principal Consultant, NAPS Red Hat Remote - Texas jonny at redhat.com M: 210-862-9739 On Tue, Dec 3, 2019 at 12:53 PM Keegan Reap > wrote: Hey all, we have opened an issue below, that we believe to be the cause, we are currently investigating: https://dccscr.dsop.io/ginyu-force/up-iatt/issues/32 On Tue, Dec 3, 2019 at 11:27 AM Jonathan Rickard > wrote: Russell, Getting more eyes on this @platformONE at redhat.com > We'll keep you posted. jonny Jonathan Rickard, RHCA Principal Consultant, NAPS Red Hat Remote - Texas jonny at redhat.com M: 210-862-9739 On Tue, Dec 3, 2019 at 11:59 AM Kendall, Russell C > wrote: Kevin, Unfortunately we are receiving deployment errors again. This is the event: 0/9 nodes are available: 1 node(s) had disk pressure, 2 node(s) had taints that the pod didn't tolerate, 6 node(s) didn't match node selector. This is the deployment: https://cluster.unified-platform.io/console/project/ccat-dev/browse/dc/driveup V/R, Russell C Kendall ________________________________________ From: Miller, Timothy J. > Sent: Monday, December 2, 2019 2:44:21 PM To: Kevin O'Donnell Cc: Kendall, Russell C; Mark Nissley; ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP; Jonathan Rickard; Chris Kuperstein; BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP; Brenna Gordon; Bubb, Mike; DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP Subject: Re: [EXT] Re: Unified Platform Pod Deploy Errors Tagged you on it. -- T On 12/2/19, 14:03, "Kevin O'Donnell" > wrote: Hello, Autoscaling is on our future IAC roadmap. Tim, the additional ticket would be appreciated. We have swapped out the app/worker instances with m5a.8xlarge 32 cores, 128gb of ram. Please let us know if you have any other issues. Thanks, KEVIN O'DONNELL ARCHITECT MANAGER Red Hat Red Hat NA Public Sector Consulting kodonnell at redhat.com %20M:240-605-4654> M: 240-605-4654 On Mon, Dec 2, 2019 at 2:29 PM Miller, Timothy J. 
> wrote: I'll open an issue. IaC needs to have instance size as a host_var to facilitate scaling. -- T On 12/2/19, 13:15, "Kevin O'Donnell" > wrote: Tim, Thanks for the information. We are undersized on the app/worker nodes, the cluster has 3 and they are m5.xlarge with only 16gb of ram. From what I have read each Labs engagement operated on a 3 node worker cluster with each node having 6core's and 28gb of ram. We will need to swap out the existing instances with larger spec's. We are going to try to flush the existing workload out on one of the workers to see if we can swap them out one at a time. Thanks, KEVIN O'DONNELL ARCHITECT MANAGER Red Hat Red Hat NA Public Sector Consulting kodonnell at redhat.com %20M:240-605-4654> M: 240-605-4654 On Mon, Dec 2, 2019 at 2:07 PM Miller, Timothy J. > wrote: Here's what I can see, given the perm limits I seem to be under: - NS:develop-misp-app and NS:lp-develop-misp-app both have several sync jobs in CrashLoopBackoff b/c of an HTTP 401 error returned while trying to fetch something from somewhere (URL isn't recorded in the stack trace). - NS:minishift-misp-app has most of its pods/jobs stuck in ImagePullBackoff. No detail there in the event stream so I'll see if I can dig deeper. - NS:aam-ci-cd has Jenkins trying to spin up three workers, those are coming back as unschedulable. I can't see into NS:aam-bases or NS:dsop-images b/c of perm limits. I see no DAS-related project(s). The MISP stuff needs debugging before calling "blocked" since it looks like an internal error from this perspective. In re: AAM Jenkins: If this deployment is coming out of the OCP storefront, then maybe it should be ephemeral rather than persistent. If it's a custom deployment, then it probably needs a rethink. I'm also not sure why there are two MISP dev projects. -- T On 12/2/19, 12:46, "Kevin O'Donnell" > wrote: Russell, Thank you for the information. We can switch out the instance type for the worker nodes. How much memory is required by the apps? 
Thanks, KEVIN O'DONNELL ARCHITECT MANAGER Red Hat Red Hat NA Public Sector Consulting kodonnell at redhat.com %20M:240-605-4654> M: 240-605-4654 On Mon, Dec 2, 2019 at 1:32 PM Kendall, Russell C > wrote: Kevin, The lack of resources on u-p.io cluster is hindering development, testing, and integration of the apps from CCAT AAM DAS, which is putting one of our PI goals at risk. We are blocked by the fact that we (CCAT and AAM) cannot deploy additional pods to the unified-platform.io cluster. We have a subset of containers deployed, but rolling deployments and new deployments fail. This means that we are not able to execute integration testing or peer reviews. We are temporarily working around by NOT testing/reviewing our code changes live, something that no one likes. Also, we are now running weeks-old instances of our containers, so we are very likely producing some technical debt. We currently have developers approaching idle or doing non-priority work until the resource issue is resolved. Here is the particular error from the OSP cluster I received while attempting a redeploy of one of our apps. 0/9 nodes are available: 1 node(s) had taints that the pod didn't tolerate, 2 Insufficient pods, 6 node(s) didn't match node selector.11 times in the last minute Since we do not have any cluster permissions, I cannot verify which resource is running out, but from experience, I assess it is a memory issue. It appears the cluster has been provisioned with a silly allocation of node types. Without knowing exactly what was deployed, it appears only 3 of the 9 hosts are suitable worker nodes. We would expect the cluster to respond to resource limitations and scale, but if a scheduled downtime is required, please work with us so we can anticipate. As it stands, the cluster does not support resources required by CCAT and the other dev teams (AAM, DAS, etc.). 
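The scheduler event quoted above packs several per-node reasons into one sentence, and tallying them separately can make it easier to see how many nodes fall under each reason. The following is a minimal sketch under stated assumptions: the function name `parse_scheduling_failure` is hypothetical, and the parser assumes the "N/M nodes are available: count reason, count reason." layout seen in the messages in this thread. As general background and not something confirmed for this cluster, "Insufficient pods" in upstream Kubernetes scheduler messages usually refers to a node's pod-count limit and not to memory, so it may be worth checking before concluding the shortage is RAM.

```python
import re


def parse_scheduling_failure(message):
    """Split a scheduler 'nodes are available' message into per-reason counts.

    Hypothetical helper; assumes the comma-separated layout used in the
    events quoted in this thread.
    """
    head = re.match(r"\s*(\d+)/(\d+) nodes are available:\s*(.*)", message, re.S)
    if head is None:
        raise ValueError("not a scheduler availability message")
    schedulable, total, rest = int(head.group(1)), int(head.group(2)), head.group(3)
    reasons = {}
    for part in rest.rstrip(". ").split(", "):
        match = re.match(r"(\d+)\s+(.*)", part.strip())
        if match:
            reasons[match.group(2)] = int(match.group(1))
    return {"schedulable": schedulable, "total": total, "reasons": reasons}


event = (
    "0/9 nodes are available: 1 node(s) had taints that the pod didn't "
    "tolerate, 2 Insufficient pods, 6 node(s) didn't match node selector."
)
result = parse_scheduling_failure(event)
for reason, count in result["reasons"].items():
    print(f"{count} of {result['total']} nodes: {reason}")
```

Read this way, six of the nine nodes are excluded by the node selector before capacity is considered, which is consistent with the observation above that only three hosts appear to act as worker nodes.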
We would accept any downtime if it will improve the situation, as we are blocked from progressing under the current constraints. My hope was we could get the cluster redeployed over the TG holiday to eliminate developer impact, but as Mark pointed out, there were limited support folks available. Now I am just trying to minimize the losses. V/R, Russell C Kendall ________________________________________ From: Kevin O'Donnell > Sent: Monday, December 2, 2019 11:52 AM To: Kendall, Russell C Cc: Mark Nissley; ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP; Jonathan Rickard; Chris Kuperstein; BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP; Brenna Gordon; Bubb, Mike (mbubb at mitre.org); DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP; Miller, Timothy J.; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP Subject: Re: Unified Platform Pod Deploy Errors Hello Russell, Can you elaborate on the term Blocked? What specific issues are the blockers? Thanks, KEVIN O'DONNELL ARCHITECT MANAGER Red Hat Red Hat NA Public Sector Consulting kodonnell at redhat.com %20M:240-605-4654> M: 240-605-4654 On Mon, Dec 2, 2019 at 11:11 AM Kendall, Russell C > wrote: Mark, Thank for acknowledging, please be aware the San Antonio dev teams working in unified-platform.io are currently blocked. V/R, Russell C Kendall ________________________________________ From: Mark Nissley > Sent: Monday, December 2, 2019 9:36 AM To: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP; Jonathan Rickard; Chris Kuperstein Cc: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP; Kevin O'Donnell; Brenna Gordon; Kendall, Russell C; Bubb, Mike (mbubb at mitre.org); DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP; Miller, Timothy J.; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP Subject: Re: Unified Platform Pod Deploy Errors As noted, I don't suspect much got done on this over the holiday weekend. I did see the ticket, as dropped some details into it. I also assigned it to @Jonathan Rickard > and @Chris Kuperstein > . 
It looks like short term solutions have been easy but the issue is recurring. Mark NISSLEY, PMP, CSM, LEAN PROGRAM MaNAGER & SR technical Project Manager North American Consulting, Public Sector M: 850-530-3234 Scheduled Training: October 14-18 On Mon, Dec 2, 2019 at 10:21 AM ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP > wrote: Mark/Kevin, I just heard at the team stand up that we are still blocked. This is also affecting the AAM team from my investigations. Please let me know if there is something we need to do to move this forward. Most Sincerely, Ade Abodunrin, GG-12, USAF Product Owner (Cybertron & Ginyu Force), Unified Platform LevelUP Code Works Commercial: (210) 890-2113 NIPR email: ademola.abodunrin at us.af.mil ________________________________________ From: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP Sent: Wednesday, November 27, 2019 12:58 PM To: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP >; Mark Nissley >; Kevin O'Donnell >; Brenna Gordon > Cc: Kendall, Russell C ; Bubb, Mike (mbubb at mitre.org) >; DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP >; Miller, Timothy J. >; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP > Subject: Re: Unified Platform Pod Deploy Errors Thanks a lot Capt Bryan! Russell created the ticket on GitLab UP Node Project. Most Sincerely, Ade Abodunrin, GG-12, USAF Product Owner (Cybertron & Ginyu Force), Unified Platform LevelUP Code Works Commercial: (210) 890-2113 NIPR email: ademola.abodunrin at us.af.mil ________________________________________ From: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP > Sent: Wednesday, November 27, 2019 12:56 PM To: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP >; Mark Nissley >; Kevin O'Donnell >; Brenna Gordon > Cc: Kendall, Russell C ; Bubb, Mike (mbubb at mitre.org) >; DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP >; Miller, Timothy J. >; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP > Subject: RE: Unified Platform Pod Deploy Errors Thanks Ade. 
The team is thin until next week due to the holidays but I will make sure it is addressed. Were there any issues submitted to GitLab's UP Node Project on DCCSCR? @Mark/Kevin - can we address? -Austen From: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP Sent: Wednesday, November 27, 2019 9:51 AM To: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP Cc: Kendall, Russell C; Bubb, Mike (mbubb at mitre.org) Subject: Fw: Unified Platform Pod Deploy Errors Capt Bryan, Please see below the explanation of the issue that Ginyu Force is currently experiencing. Most Sincerely, Ade Abodunrin, GG-12, USAF Product Owner (Cybertron & Ginyu Force), Unified Platform LevelUP Code Works Commercial: (210) 890-2113 NIPR email: ademola.abodunrin at us.af.mil ________________________________________ From: Kendall, Russell C Sent: Wednesday, November 27, 2019 9:46 AM To: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP; Buffaloe, Christopher; Molina, Toby; Crace, Jared E; SANCHEZ, MARK GG-13 USAF AFMC AFLCMC/HNCP Cc: tmiller at mitre.org Subject: [Non-DoD Source] Fw: Unified Platform Pod Deploy Errors Gentlemen, The application development teams working in the new GovCloud OCP environment (unified-platform.io) are currently blocked in efforts to deploy new pods for testing, development, and UAT. Red Hat and RogueOne SMEs have been notified and have attempted some fixes starting on Monday 11/25, but at this point have not been able to provision resources sufficient to host CCAT and AAM. We have taken steps to minimize our footprint (eliminating the demonstration environment, deleting developer namespaces), but this is not a sustainable approach, and has resulted in only moderate improvements in cluster performance. Our hope is that the U-P.io cluster compute resources can be increased very soon, so that we may resume normal development activities. 
Our understanding is that such a scaling requires a complete redeployment of the cluster, which is unusual, but an acceptable loss to productivity. If the cluster can be scaled up over the Thanksgiving holiday, the impact will be minimal to developers and cluster administrators, alike. We are currently collaborating on solutions on the following MatterMost channel behind the space camp VPN (link below), and via the email thread forwarded (further below). https://chat.spacecamp.ninja/levelup/channels/unified-platform-node Please keep me posted on developments and I will coordinate developer activities with any scheduled platform outages. V/R, Russell C Kendall ________________________________________ From: Curran, Daniel M Sent: Monday, November 25, 2019 2:47 PM To: Jonathan Rickard Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; dlystra at redhat.com >; Sison, Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher; Torres, Alexander; Crace, Jared E; Middleton, Joseph J Subject: Re: Unified Platform Pod Deploy Errors Sounds great. Appreciate it. I'll watch email and Mattermost in case you need more from us. -Daniel ________________________________________ From: Jonathan Rickard > Sent: Monday, November 25, 2019 2:44 PM To: Curran, Daniel M Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; dlystra at redhat.com >; Sison, Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher; Torres, Alexander; Crace, Jared E; Middleton, Joseph J Subject: Re: Unified Platform Pod Deploy Errors Thanks Daniel - I'll continue to look into the resource issue that you're seeing - I'd like to identify the root cause and then work with the team to come up with a solution. 
Jonathan Rickard, RHCA Principal Consultant, NAPS Red Hat Remote - Texas jonny at redhat.com M: 210-862-9739 On Mon, Nov 25, 2019 at 2:17 PM Curran, Daniel M wrote: Yeah, we hit the limit, then had AAM kill some of their projects, and then our pods got scheduled. We've hit the limit again though. Here's an example pod that cannot be scheduled: https://cluster.unified-platform.io/console/project/ccat-dev/browse/pods/sso-8-2cjth They're seeing it when their Jenkins slaves can't deploy, but it's basically any pod after we hit some limit. -Daniel ________________________________________ From: Jonathan Rickard Sent: Monday, November 25, 2019 1:26 PM To: Curran, Daniel M Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; dlystra at redhat.com; Sison, Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher; Torres, Alexander; Crace, Jared E; Middleton, Joseph J Subject: Re: Unified Platform Pod Deploy Errors Daniel, I can see that you have 3 mongo pods, 1 chatup and 1 upbot pod running ... is your app good to go? Looks like there was an issue with memory on 1 pod, then some node selector being mismatched - just what I could see in the events... Jonathan Rickard, RHCA Principal Consultant, NAPS Red Hat Remote - Texas jonny at redhat.com M: 210-862-9739 On Mon, Nov 25, 2019 at 12:50 PM Curran, Daniel M wrote: Also, AAM was having similar issues. Looks like they had a lot of namespaces, and scaling down the pods on their deployments didn't help, but actually deleting the namespaces did. We have pods scheduling now but I'm adding them and we'd still like to work through what resource limit we were hitting to avoid this in the future. 
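On the question of which limit was being hit: the scheduler message reported at the start of this thread already breaks the nine nodes down by reason. The sketch below (Python, illustrative only; the function name is hypothetical) parses that message into per-reason node counts. As far as I know, "Insufficient pods" refers to a node's pod-count capacity rather than CPU or memory, which would fit the observation that deleting namespaces freed room. That the six selector-mismatched nodes are masters or infra nodes is an assumption, not something the thread confirms.

```python
import re
from typing import Dict, Tuple

# Scheduler event text quoted verbatim from the thread.
MESSAGE = "0/9 nodes are available: 3 Insufficient pods, 6 node(s) didn't match node selector."


def parse_unschedulable(message: str) -> Tuple[int, int, Dict[str, int]]:
    """Split a FailedScheduling message into (available, total, {reason: node_count})."""
    head, _, tail = message.partition(":")
    match = re.match(r"\s*(\d+)/(\d+) nodes are available", head)
    if match is None:
        raise ValueError(f"unrecognized scheduler message: {message!r}")
    available, total = int(match.group(1)), int(match.group(2))

    reasons: Dict[str, int] = {}
    # Each comma-separated part looks like "<count> <reason text>".
    for part in tail.strip().rstrip(".").split(", "):
        count, _, reason = part.strip().partition(" ")
        reasons[reason] = int(count)
    return available, total, reasons


available, total, reasons = parse_unschedulable(MESSAGE)
for reason, count in reasons.items():
    print(f"{count} of {total} nodes rejected: {reason}")
```

Read this way, only three of the nine nodes were candidates for these pods at all, and all three were rejected for the same reason, so the per-node pod capacity on those three is the first thing worth checking.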
-Daniel ________________________________________ From: Curran, Daniel M Sent: Monday, November 25, 2019 12:25 PM To: Jonathan Rickard Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; dlystra at redhat.com >; Sison, Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher; Torres, Alexander Subject: Re: Unified Platform Pod Deploy Errors Thanks, sir. Most important for us to get working is "ccat-demo" but it's also happening in "ccat-dev" and "ccat-ci-cd". -Daniel ________________________________________ From: Jonathan Rickard > Sent: Monday, November 25, 2019 12:22 PM To: Curran, Daniel M Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; dlystra at redhat.com >; Sison, Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher; Torres, Alexander Subject: Re: Unified Platform Pod Deploy Errors What's the name of the project you're working in? I'm going to be back at my laptop in about 30 and will take a look when I get there. Is it just the Jenkins pods failing? On Mon, Nov 25, 2019, 12:20 PM Curran, Daniel M > wrote: Adding Dean and Alex. Also, sitting in mattermost if anyone needs to get online and chat for more information. -Daniel ________________________________________ From: Curran, Daniel M Sent: Monday, November 25, 2019 12:07 PM To: jonny at redhat.com >; ckuperst at redhat.com >; Mark Nissley Cc: Sison, Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher Subject: Re: Unified Platform Pod Deploy Errors Adding Kupe and Mark. 
-Daniel ________________________________________ From: Curran, Daniel M Sent: Monday, November 25, 2019 11:43 AM To: jonny at redhat.com Cc: Sison, Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher Subject: Unified Platform Pod Deploy Errors Hey Jonny, We met briefly at SpaceCAMP a couple weeks ago when cluster.unified-platform.io was stood up. We've been trying to deploy some apps today, and so far we're getting errors on most (if not all) of our pods. This is what we're seeing: 0/9 nodes are available: 3 Insufficient pods, 6 node(s) didn't match node selector. We were thinking that some volume types weren't correct, but some of our pods don't even have volumes attached and still give us this error (e.g. Jenkins slaves or web frontends without persistent storage). Any idea what this could be? We're not running out of space on the nodes themselves, are we? We have a demo scheduled for tomorrow at 9:30 AM CST and are hoping to get a demo env up for them today, but this error came up unexpectedly. Also, we're here at 500 Navarro St. in San Antonio if working through this in person is better/easier. Thanks, Daniel Curran ________________________________________ This e-mail and any attachments are intended only for the use of the addressee(s) named herein and may contain proprietary information. If you are not the intended recipient of this e-mail or believe that you received this email in error, please take immediate action to notify the sender of the apparent error by reply e-mail; permanently delete the e-mail and any attachments from your computer; and do not disseminate, distribute, use, or copy this message and any attachments. _______________________________________________ platformONE mailing list platformONE at redhat.com https://www.redhat.com/mailman/listinfo/platformone -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ademola.abodunrin at us.af.mil Thu Dec 5 19:00:16 2019 From: ademola.abodunrin at us.af.mil (ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP) Date: Thu, 5 Dec 2019 19:00:16 +0000 Subject: [Platformone] [Non-DoD Source] Re: EXT :Re: OpenShift Questions In-Reply-To: References: <0db5dd4e86c24c69ace02da1309ccb22@XCGC3021.northgrum.com> <1574192134493.86386@ManTech.com> Message-ID: Looks like this is still hanging. We added comments to the ticket but have yet to receive a response. Please help, as we are trying to make sure that all is ready for the demo on 12/10/2019. We have a dry run tomorrow at 0945 CST. Most Sincerely, Ade Abodunrin, GG-12, USAF Product Owner (Cybertron & Ginyu Force), Unified Platform LevelUP Code Works Commercial: (210) 890-2113 NIPR email: ademola.abodunrin at us.af.mil From: Khary Mendez Sent: Tuesday, December 3, 2019 1:06 PM To: Mike Knoth Cc: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP; platformONE at redhat.com; Blade, Eric D [US] (MS); McKay, Brent [US] (MS) (Contr); Marc Cooper; Walter Steins Subject: Re: [Platformone] [Non-DoD Source] Re: EXT :Re: OpenShift Questions Thanks Mike - I just added a comment to your ticket with a preferred path forward along with a less preferred option. Khary A. Mendez, RHCA (150-047-298) Senior Principal Consultant Red Hat Public Sector khary at redhat.com M: (240)888-9170 On Tue, Dec 3, 2019 at 1:52 PM Mike Knoth wrote: yes here is the ticket - https://dccscr.dsop.io/dsop/dccscr/issues/195 On Tue, Dec 3, 2019 at 1:43 PM ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP wrote: Good afternoon All, Please assist us with the problem below. The team has logged a ticket in GitLab as well. 
Most Sincerely, Ade Abodunrin, GG-12, USAF Product Owner (Cybertron & Ginyu Force), Unified Platform LevelUP Code Works Commercial: (210) 890-2113 NIPR email: ademola.abodunrin at us.af.mil _____ From: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP Sent: Friday, November 22, 2019 1:50 PM To: Mike Knoth; Kendall, Russell C; Walter Steins; Blade, Eric D [US] (MS) Cc: McKay, Brent [US] (MS) (Contr); Marc Cooper Subject: RE: [Non-DoD Source] Re: EXT :Re: OpenShift Questions Good afternoon Walter/Eric, Who is able to assist us with Mike's concern below? Thanks for your help! Most Sincerely, Ade Abodunrin, GG-12, USAF Acquisition Program Manager LevelUP Code Works Commercial: (210) 890-2113 NIPR email: ademola.abodunrin at us.af.mil From: Mike Knoth Sent: Wednesday, November 20, 2019 10:22 AM To: Kendall, Russell C Cc: Walter Steins; Blade, Eric D [US] (MS); McKay, Brent [US] (MS) (Contr); ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP; Marc Cooper Subject: [Non-DoD Source] Re: EXT :Re: OpenShift Questions Thanks, I got a lot closer now, with some components being deployed. I'm getting some errors unique to this OpenShift though. The below is something I have in my YAML file, for several of the components.

securityContext:
  fsGroup: 11111
  runAsUser: 11111

With the "runAsUser", OpenShift would say:

Error creating: pods "openam-1-" is forbidden: unable to validate against any security context constraint: [fsGroup: Invalid value: []int64{11111}: 11111 is not an allowed group spec.initContainers[0].securityContext.securityContext.runAsUser: Invalid value: 11111: must be in the ranges: [1000910000, 1000919999]

I fixed that by making the "runAsUser" 1000911111 instead, though I'm not sure what effects that will have once everything is running. 
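The runAsUser check in the admission error above is a plain range test, which can be reproduced offline. The sketch below (Python, illustrative only; the helper names are hypothetical) parses the range text exactly as it appears in the error and checks candidate UIDs against it. Treating both ends of the range as inclusive is an assumption based on how the message is worded.

```python
import re
from typing import Tuple

# Range text copied from the admission error in the thread.
ERROR_RANGE = "[1000910000, 1000919999]"


def parse_range(text: str) -> Tuple[int, int]:
    """Parse '[low, high]' as printed in the SCC validation error."""
    match = re.fullmatch(r"\[\s*(\d+)\s*,\s*(\d+)\s*\]", text.strip())
    if match is None:
        raise ValueError(f"unrecognized range: {text!r}")
    return int(match.group(1)), int(match.group(2))


def uid_allowed(uid: int, range_text: str = ERROR_RANGE) -> bool:
    """Return True if uid falls inside the range (both ends assumed inclusive)."""
    low, high = parse_range(range_text)
    return low <= uid <= high


for candidate in (11111, 1000911111):
    verdict = "allowed" if uid_allowed(candidate) else "rejected"
    print(f"runAsUser {candidate}: {verdict}")
```

This matches what was observed for runAsUser: 11111 is rejected and 1000911111 is accepted. The same test does not necessarily carry over to fsGroup. If I recall the OpenShift 3.x security context constraint documentation correctly, fsGroup under the MustRunAs strategy is validated against only the first ID of the first allowed range, which would be consistent with 1000911111 failing there; that is worth verifying against the documentation for the installed version.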
And then for the group, it says:

Error creating: pods "openig-1-" is forbidden: unable to validate against any security context constraint: [fsGroup: Invalid value: []int64{11111}: 11111 is not an allowed group]

I tried changing this "fsGroup" to 1000911111 but that also fails. So I'm not sure what to put in this value. Do you know how you can make your policy less restrictive, or how I could make the policy less restrictive, to fix the above? On Tue, Nov 19, 2019 at 2:35 PM Kendall, Russell C wrote: Mike, Here's the URL for the registry: https://docker-registry-default.apps.cluster.unified-platform.io I'm not sure how you deploy your pipeline and apps, but our Ansible scripts take care of creating the namespaces (projects) for us. For example, you may deploy your projects stored locally via

oc new-app /path/to/project

There are a number of existing projects; you just don't have visibility into them. Mr. Steins is responsible for assigning roles and is figuring out group memberships that will allow you to control access to your projects by groups instead of by individual. In the meantime you'll need to add each user to each project. V/R, Russell C Kendall _____ From: Mike Knoth Sent: Tuesday, November 19, 2019 12:35 PM To: Walter Steins Cc: Blade, Eric D [US] (MS); McKay, Brent [US] (MS) (Contr); Kendall, Russell C Subject: Re: EXT :Re: OpenShift Questions Yes, I'm logged in to OpenShift right now. And I'm logged in to the OC console. But I'm a bit stuck until I can figure out how to docker login, as something like this does not work:

docker login -u $(oc whoami) -p $(oc whoami -t) docker-registry-default.unified-platform.io

And I'm also stuck until this can show my project which I can deploy to:

UrsaMajor:up mike.knoth$ oc projects
You have one project on this server: "dsop-images".

On Tue, Nov 19, 2019 at 1:33 PM Walter Steins wrote: Eric, All of the requested accounts were created. Walter "Wally" Steins Cloud Engineer m: 210.383.9227 | walter.steins at bylight.com By Light Professional IT Services LLC 8484 Westpark Drive Suite 600 McLean VA 22102 f: 703.778.7835 | www.bylight.com From: Blade, Eric D [US] (MS) Sent: Tuesday, November 19, 2019 12:32 PM To: 'Mike Knoth'; McKay, Brent [US] (MS) (Contr) Cc: Kendall, Russell C; Walter Steins Subject: RE: EXT :Re: OpenShift Questions [EXTERNAL EMAIL] Mike, This is deployed as a "production cluster", so there are no development capabilities. Just an OpenShift environment for running the apps. You will need to get your OpenShift account created if that has not been done already. Wally Steins (CC'd) can do that for you. After that my knowledge runs thin. Russell was able to get their app deployed via the OpenShift console. Thanks Eric From: Mike Knoth Sent: Tuesday, November 19, 2019 1:27 PM To: McKay, Brent [US] (MS) (Contr) Cc: Kendall, Russell C; Blade, Eric D [US] (MS) Subject: EXT :Re: OpenShift Questions Russell/Eric, Hi - do either of you know how I can log in to docker from my local macbook? (to the OpenShift on https://cluster.unified-platform.io/) I was going to use the "bastion" box (52.222.26.122) to do development on, but that doesn't even have git on it. So I guess I have to use my macbook. Also do you know who can create new OpenShift projects for me on https://cluster.unified-platform.io/? On Tue, Nov 19, 2019 at 1:23 PM McKay, Brent [US] (MS) (Contr) wrote: Russell/Eric, Mike Knoth (cc'd) approached me regarding the OpenShift deployment I understand the two of you stood up last week while at SpaceCAMP. I believe he was instructed to deploy DAS on said cluster. I wanted to get him in contact with the two of you so he can get his questions to the individuals in the know. Thanks, Brent -- Mike Knoth Software Engineer HII Mission Driven Innovative Solutions (HII-MDIS) - formerly G2, Inc. 
Technical Solutions Division 302 Sentinel Drive | Annapolis Junction, MD 20701 Email: mike.knoth at g2-inc.com Mobile: (320) 305-6453 Confidentiality Statement: HUNTINGTON INGALLS INDUSTRIES PROPRIETARY - This e-mail contains information proprietary or private to Huntington Ingalls Industries, Inc., and is not to be disclosed to, copied by, or used in any manner by others without the prior express, written permission. If you are not the intended recipient, please delete without copying and kindly advise the sender by e-mail of the mistake in delivery. This communication (including any attachments) may contain information that is proprietary, confidential or exempt from disclosure. If you are not the intended recipient, please note that further dissemination, distribution, use or copying of this communication is strictly prohibited. Anyone who received this message in error should notify the sender immediately by telephone or by return email and delete it from his or her computer. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: image001.png Type: image/png Size: 2127 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 5490 bytes Desc: not available URL: From jrickard at redhat.com Thu Dec 5 19:11:17 2019 From: jrickard at redhat.com (Jonathan Rickard) Date: Thu, 5 Dec 2019 13:11:17 -0600 Subject: [Platformone] [Non-DoD Source] Re: EXT :Re: OpenShift Questions In-Reply-To: References: <0db5dd4e86c24c69ace02da1309ccb22@XCGC3021.northgrum.com> <1574192134493.86386@ManTech.com> Message-ID: Just updated the issue - give it a shot now. Jonathan Rickard, RHCA Principal Consultant, NAPS Red Hat Remote - Texas jonny at redhat.com M: 210-862-9739 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: image001.png
Type: image/png
Size: 2127 bytes
Desc: not available
URL: 

From jrickard at redhat.com Thu Dec 5 23:21:08 2019
From: jrickard at redhat.com (Jonathan Rickard)
Date: Thu, 5 Dec 2019 17:21:08 -0600
Subject: [Platformone] AWS Network Issue
Message-ID: 

Adrian / Andrew,

Are you guys aware of any changes to connectivity between prod and dev/staging VPCs? We're just looking into it now, but we figured we'd throw it out there and ask before we get too deep.

Thanks,
jonny

Jonathan Rickard, RHCA
Principal Consultant, NAPS
Red Hat Remote - Texas
jonny at redhat.com
M: 210-862-9739

From adrian.nunez at bylight.com Thu Dec 5 23:49:35 2019
From: adrian.nunez at bylight.com (Adrian Nunez)
Date: Thu, 5 Dec 2019 23:49:35 +0000
Subject: [Platformone] AWS Network Issue
In-Reply-To: 
References: 
Message-ID: 

I didn't touch the VPC peering connections. I saw you guys had 2 yesterday.

From jrickard at redhat.com Thu Dec 5 23:51:32 2019
From: jrickard at redhat.com (Jonathan Rickard)
Date: Thu, 5 Dec 2019 17:51:32 -0600
Subject: [Platformone] AWS Network Issue
In-Reply-To: 
References: 
Message-ID: 

Yeah, the peering connections still look good, but we can't SSH from production to the dev/staging environments.

From adrian.nunez at bylight.com Thu Dec 5 23:55:05 2019
From: adrian.nunez at bylight.com (Adrian Nunez)
Date: Thu, 5 Dec 2019 23:55:05 +0000
Subject: [Platformone] AWS Network Issue
In-Reply-To: 
References: 
Message-ID: 

Check security groups and IP addresses. There was talk about whitelisting IPs yesterday.

Which VPCs are you trying to SSH into? I can go take a look.

From jrickard at redhat.com Thu Dec 5 23:59:34 2019
From: jrickard at redhat.com (Jonathan Rickard)
Date: Thu, 5 Dec 2019 17:59:34 -0600
Subject: [Platformone] AWS Network Issue
In-Reply-To: 
References: 
Message-ID: 

Yep, we checked the security groups. We even tried opening it up to 0.0.0.0 to test and got nothing. I checked the peering and that looks good; I also checked the route tables, and from what I can tell it all looks right.

We're trying to go from production-vpc to dev-up-vpc and staging-up-vpc (production to staging is the most important at the moment).

From adrian.nunez at bylight.com Fri Dec 6 00:01:54 2019
From: adrian.nunez at bylight.com (Adrian Nunez)
Date: Fri, 6 Dec 2019 00:01:54 +0000
Subject: [Platformone] AWS Network Issue
In-Reply-To: 
References: 
Message-ID: 

OK. I'll take a look in about 10. Eating dinner.
From jrickard at redhat.com Fri Dec 6 00:07:17 2019
From: jrickard at redhat.com (Jonathan Rickard)
Date: Thu, 5 Dec 2019 18:07:17 -0600
Subject: [Platformone] AWS Network Issue
In-Reply-To: 
References: 
Message-ID: 

Sounds good, thanks Adrian. I'm trying to figure out the syntax to query against a log group to see if I can catch the reason for the drop.

Even running a curl against the port is timing out (curl -v :22).

From adrian.nunez at bylight.com Fri Dec 6 00:13:12 2019
From: adrian.nunez at bylight.com (Adrian Nunez)
Date: Fri, 6 Dec 2019 00:13:12 +0000
Subject: [Platformone] AWS Network Issue
In-Reply-To: 
References: 
Message-ID: 

Try it now. Your peering connection didn't have subnets associated with it.
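
For the log-group query mentioned in this thread, one possible approach is to export the records and filter them locally. The following is a minimal sketch, assuming VPC Flow Logs are enabled and use the default (version 2) space-separated record format; nothing in the thread confirms either. The function names, the sample records, the account ID, the interface ID, and the addresses are all hypothetical and only serve as illustration.

```python
from typing import Dict, Iterable, Iterator, Optional

# Field order of the default (version 2) VPC Flow Logs record format.
FIELDS = (
    "version account_id interface_id srcaddr dstaddr srcport dstport "
    "protocol packets bytes start end action log_status"
).split()


def parse_record(line: str) -> Optional[Dict[str, str]]:
    """Split one flow log line into named fields; return None if malformed."""
    parts = line.split()
    if len(parts) != len(FIELDS):
        return None
    return dict(zip(FIELDS, parts))


def rejected_records(lines: Iterable[str], dstport: str = "22") -> Iterator[Dict[str, str]]:
    """Yield records whose action is REJECT for the given destination port."""
    for line in lines:
        record = parse_record(line)
        if record and record["action"] == "REJECT" and record["dstport"] == dstport:
            yield record


# Hypothetical sample records, made up for illustration only.
SAMPLE_LINES = [
    "2 123456789012 eni-0123456789abcdef0 10.0.1.5 10.1.2.7 49152 22 6 3 180 1575590400 1575590460 REJECT OK",
    "2 123456789012 eni-0123456789abcdef0 10.0.1.5 10.1.2.7 49153 443 6 8 640 1575590400 1575590460 ACCEPT OK",
    "not a flow log record",
]

for rec in rejected_records(SAMPLE_LINES):
    print(rec["srcaddr"], "->", rec["dstaddr"], "port", rec["dstport"], rec["action"])
```

A REJECT record for port 22 would point at a security group or network ACL rule. Custom flow log formats use a different field order, so FIELDS would need adjusting in that case.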
From jrickard at redhat.com Fri Dec 6 00:29:35 2019
From: jrickard at redhat.com (Jonathan Rickard)
Date: Thu, 5 Dec 2019 18:29:35 -0600
Subject: [Platformone] AWS Network Issue
In-Reply-To: 
References: 
Message-ID: 

Adrian,

I think that made it worse. Now I can't connect to the GitLab in the production VPC (SSH or HTTPS).

From adrian.nunez at bylight.com Fri Dec 6 00:36:19 2019
From: adrian.nunez at bylight.com (Adrian Nunez)
Date: Fri, 6 Dec 2019 00:36:19 +0000
Subject: [Platformone] AWS Network Issue
In-Reply-To: 
References: 
Message-ID: 

You have quad zeros (0.0.0.0/0) on the GitLab for 22 and 443. You should be able to SSH and HTTPS in from anywhere.
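
A quick way to retest reachability on ports 22 and 443 is a plain TCP connect with a timeout. The following is a minimal sketch using only the Python standard library; the function name is hypothetical and the host to probe is left to the reader, since the thread does not give one. As a rule of thumb, a timeout tends to mean packets are being dropped somewhere on the path (security group, network ACL, or routing), while a refusal means the host was reached but nothing accepted the connection.

```python
import socket


def check_port(host: str, port: int, timeout: float = 3.0) -> str:
    """Attempt a TCP connection and classify the outcome.

    Returns "open", "timeout", "refused", or "error: <detail>".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        # No answer within the timeout: packets were likely dropped silently.
        return "timeout"
    except ConnectionRefusedError:
        # The host answered, but nothing is listening (or the connection was reset).
        return "refused"
    except OSError as exc:
        # DNS failures, unreachable networks, and similar problems.
        return f"error: {exc}"


# Example call (host is a placeholder to fill in):
#   for port in (22, 443):
#       print(port, check_port("<gitlab-host>", port))
```

This only tests the TCP handshake, so it says nothing about whether SSH or HTTPS itself is healthy once the port is reachable.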
________________________________
From: Jonathan Rickard
Sent: Thursday, December 5, 2019 7:29 PM
To: Adrian Nunez
Cc: platformONE at redhat.com; Nunez, Carlos A [US] (MS) (Contr)
Subject: Re: AWS Network Issue

[EXTERNAL EMAIL]

Adrian,

I think that made it worse - now I can't connect to the gitlab in production vpc (SSH or HTTPS).

Jonathan Rickard, RHCA
Principal Consultant, NAPS
Red Hat Remote - Texas
jonny at redhat.com
M: 210-862-9739

On Thu, Dec 5, 2019 at 6:13 PM Adrian Nunez wrote:

Try it now. Your peering connection didn't have subnets associated with it.

________________________________
From: Jonathan Rickard
Sent: Thursday, December 5, 2019 7:07:17 PM
To: Adrian Nunez
Cc: platformONE at redhat.com; Nunez, Carlos A [US] (MS) (Contr)
Subject: Re: AWS Network Issue

Sounds good - thanks Adrian ... I'm trying to figure out the syntax to query against a log group to see if I can catch the reason for the drop ...

Even running a curl against the port is timing out (curl -v :22 ) ...

On Thu, Dec 5, 2019 at 6:02 PM Adrian Nunez wrote:

Ok. I'll take a look in about 10. Eating dinner.

________________________________
From: Jonathan Rickard
Sent: Thursday, December 5, 2019 6:59:34 PM
To: Adrian Nunez
Cc: platformONE at redhat.com; Nunez, Carlos A [US] (MS) (Contr)
Subject: Re: AWS Network Issue

Yep, we checked the security groups - even tried to open it up 0.0.0.0 to test and got nothing. I checked the peering and that looks good, I also checked the route tables and from what I can tell it all looks right.

Trying to go from production-vpc to dev-up-vpc and staging-up-vpc (production to staging is most important ATM) ..

On Thu, Dec 5, 2019 at 5:55 PM Adrian Nunez wrote:

Check security groups and IP addresses. There was talk about whitelisting IPs yesterday.

Which VPCs are you trying to SSH into? I can go take a look.

________________________________
From: Jonathan Rickard
Sent: Thursday, December 5, 2019 6:51:32 PM
To: Adrian Nunez
Cc: platformONE at redhat.com; Nunez, Carlos A [US] (MS) (Contr)
Subject: Re: AWS Network Issue

Yah, the peering connections still look good - but we can't ssh from production to dev/staging environments.

On Thu, Dec 5, 2019 at 5:49 PM Adrian Nunez wrote:

I didn't touch the VPC peering connections. I saw you guys had 2 yesterday.

________________________________
From: Jonathan Rickard
Sent: Thursday, December 5, 2019 6:21:08 PM
To: platformONE at redhat.com; Nunez, Carlos A [US] (MS) (Contr); Adrian Nunez
Subject: AWS Network Issue

Adrian / Andrew,

Are you guys aware of any changes to connectivity between prod and dev/staging VPCs? We're just looking into it now, but we figured we'd throw it out there and ask before we get too deep.

Thanks,
jonny

This communication (including any attachments) may contain information that is proprietary, confidential or exempt from disclosure. If you are not the intended recipient, please note that further dissemination, distribution, use or copying of this communication is strictly prohibited. Anyone who received this message in error should notify the sender immediately by telephone or by return email and delete it from his or her computer.
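For the log-group query mentioned in the thread above, a minimal sketch follows. It assumes the log group holds VPC Flow Logs in the default (version 2) record format and that the records are already available as plain text lines. The sample records, addresses, and the helper name `rejected_flows` are hypothetical and are not taken from the thread.

```python
from typing import Iterable, List, NamedTuple

# Field order of the default (version 2) VPC Flow Logs record format.
FIELDS = (
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
)


class Flow(NamedTuple):
    srcaddr: str
    dstaddr: str
    dstport: int
    action: str


def rejected_flows(lines: Iterable[str], dstport: int = 22) -> List[Flow]:
    """Return REJECT records whose destination port equals dstport."""
    hits = []
    for line in lines:
        parts = line.split()
        if len(parts) != len(FIELDS):
            continue  # not a default-format record (header, custom format)
        rec = dict(zip(FIELDS, parts))
        if rec["log_status"] != "OK":
            continue  # NODATA / SKIPDATA records carry "-" placeholders
        if rec["action"] == "REJECT" and int(rec["dstport"]) == dstport:
            hits.append(
                Flow(rec["srcaddr"], rec["dstaddr"], int(rec["dstport"]), rec["action"])
            )
    return hits


# Hypothetical sample records (made-up account, interface, and addresses).
SAMPLE = [
    "2 123456789012 eni-0abc 10.0.1.10 10.1.2.20 49152 22 6 3 180 1575590000 1575590060 REJECT OK",
    "2 123456789012 eni-0abc 10.0.1.10 10.1.2.20 49153 443 6 10 840 1575590000 1575590060 ACCEPT OK",
    "2 123456789012 eni-0abc - - - - - - - 1575590000 1575590060 - NODATA",
]

if __name__ == "__main__":
    for flow in rejected_flows(SAMPLE):
        print(flow)
```

A REJECT record indicates that a security group or network ACL denied the traffic. If no records for the connection show up on the destination interface at all, the packets may not be arriving there, which would point toward routing rather than filtering.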
From mnissley at redhat.com Fri Dec 6 00:40:42 2019
From: mnissley at redhat.com (Mark Nissley)
Date: Thu, 5 Dec 2019 19:40:42 -0500
Subject: [Platformone] AWS Network Issue
In-Reply-To:
References:
Message-ID:

Yup. GitLab is down for me too.

Mark NISSLEY, PMP, CSM, LEAN
PROGRAM MaNAGER & SR technical Project Manager
North American Consulting, Public Sector
M: 850-530-3234
*Scheduled Training: October 14-18*

On Thu, Dec 5, 2019 at 7:30 PM Jonathan Rickard wrote:

> Adrian,
>
> I think that made it worse - now I can't connect to the gitlab in
> production vpc (SSH or HTTPS).

From jrickard at redhat.com Fri Dec 6 00:41:10 2019
From: jrickard at redhat.com (Jonathan Rickard)
Date: Thu, 5 Dec 2019 18:41:10 -0600
Subject: [Platformone] AWS Network Issue
In-Reply-To:
References:
Message-ID:

I completely agree with you - but whenever I go to https://dccscr.dsop.io it's dead now - and if I ssh into the EIP for it, it times out ... before you added the subnets to the peers I was able to reach them ...

https:
[image: image.png]

ssh:
[image: image.png]

Jonathan Rickard, RHCA
Principal Consultant, NAPS
Red Hat Remote - Texas
jonny at redhat.com
M: 210-862-9739

On Thu, Dec 5, 2019 at 6:36 PM Adrian Nunez wrote:

> You have quad zeros on the Gitlab for 22 and 443. You should be able to
> SSH and HTTPS in from anywhere.
From taylor at redhat.com Fri Dec 6 00:45:21 2019
From: taylor at redhat.com (Taylor Biggs)
Date: Thu, 5 Dec 2019 19:45:21 -0500
Subject: [Platformone] AWS Network Issue
In-Reply-To:
References:
Message-ID:

Can we please back out any changes made just now, and test future changes in the other VPCs/subnets/etc. before we make them to the prod VPCs?

Thanks,
Taylor

----
Taylor Biggs
taylor at redhat.com
850-449-2220

On Thu, Dec 5, 2019 at 7:42 PM Jonathan Rickard wrote:

> I completely agree with you - but whenever I go to https://dccscr.dsop.io
> it's dead now - and if I ssh into the EIP for it, it times out ...
> before you added the subnets to the peers I was able to reach them ...

From adrian.nunez at bylight.com Fri Dec 6 01:02:35 2019
From: adrian.nunez at bylight.com (Adrian Nunez)
Date: Fri, 6 Dec 2019 01:02:35 +0000
Subject: [Platformone] AWS Network Issue
In-Reply-To:
References: ,
Message-ID:

I put the routing tables back to where they were before you emailed. If Nic can hit sso.dsop.io, then the problem might be internal. If there were no internet connectivity, you wouldn't get sso.dsop.io in a browser.
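The reachability reasoning above (if sso.dsop.io loads in a browser, general connectivity exists) can be checked per host and port. The following is a minimal sketch, assuming Python 3 on the machine doing the testing. The helper name `probe` is hypothetical; the hostnames in the trailing comment are the ones mentioned in the thread.

```python
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify one TCP connection attempt.

    Returns "open", "refused", "timeout", or "error" (for example a DNS failure).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError:
        return "error"


# Example (needs network access; hostnames are the ones named in the thread):
#   for host, port in [("sso.dsop.io", 443), ("dccscr.dsop.io", 443), ("dccscr.dsop.io", 22)]:
#       print(host, port, probe(host, port))
```

A "timeout" result is what silently dropped packets look like from the client side, which is consistent with either a missing route or a filtering rule. A "refused" result means the host was reached but nothing accepted the connection on that port.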
From bgordon at redhat.com Fri Dec 6 01:07:46 2019
From: bgordon at redhat.com (Brenna Gordon)
Date: Thu, 5 Dec 2019 20:07:46 -0500
Subject: [Platformone] AWS Network Issue
In-Reply-To:
References:
Message-ID:

All - FYSA, Nic has an Ask Me Anything at 9:30 AM EST tomorrow and needs the VPCs back online.

On Thu, Dec 5, 2019 at 8:02 PM Adrian Nunez wrote:

> I put the routing tables back to where they were before you emailed. If
> Nic can hit sso.dsop.io, then the problem might be internal. If there
> were no internet connectivity, you wouldn't get sso.dsop.io in a browser.
> > Trying to go from production-vpc to dev-up-vpc and staging-up-vpc > (production to staging is most important ATM) .. > > Jonathan Rickard, RHCA > > Principal Consultant, NAPS > > Red Hat Remote - Texas > > jonny at redhat.com > M: 210-862-9739 > > > > On Thu, Dec 5, 2019 at 5:55 PM Adrian Nunez > wrote: > > Check security groups and IP addresses. > There was talk about whitelisting IPs yesterday. > > Which VPCs are you trying to SSH into? I can go take a look. > > > > Get Outlook for Android > ------------------------------ > *From:* Jonathan Rickard > *Sent:* Thursday, December 5, 2019 6:51:32 PM > *To:* Adrian Nunez > *Cc:* platformONE at redhat.com ; Nunez, Carlos A > [US] (MS) (Contr) > *Subject:* Re: AWS Network Issue > > > [EXTERNAL EMAIL] > Yah, the peering connections still look good - but we can't ssh from > production to dev/staging environments. > > Jonathan Rickard, RHCA > > Principal Consultant, NAPS > > Red Hat Remote - Texas > > jonny at redhat.com > M: 210-862-9739 > > > > On Thu, Dec 5, 2019 at 5:49 PM Adrian Nunez > wrote: > > I didn't touch the VPC peering connections. I saw you guys had 2 > yesterday. > > Get Outlook for Android > > ------------------------------ > *From:* Jonathan Rickard > *Sent:* Thursday, December 5, 2019 6:21:08 PM > *To:* platformONE at redhat.com ; Nunez, Carlos A > [US] (MS) (Contr) ; Adrian Nunez < > adrian.nunez at bylight.com> > *Subject:* AWS Network Issue > > > [EXTERNAL EMAIL] > Adrian / Andrew, > > Are you guys aware of any changes to connectivity between prod and > dev/staging VPCs? We're just looking into it now, but we figured we'd throw > it out there and ask before we get too deep. > > Thanks, > jonny > > Jonathan Rickard, RHCA > > Principal Consultant, NAPS > > Red Hat Remote - Texas > > jonny at redhat.com > M: 210-862-9739 > > > This communication (including any attachments) may contain information > that is proprietary, confidential or exempt from disclosure. 
If you are not > the intended recipient, please note that further dissemination, > distribution, use or copying of this communication is strictly prohibited.
> Anyone who received this message in error should notify the sender > immediately by telephone or by return email and delete it from his or her > computer. > _______________________________________________ > platformONE mailing list > platformONE at redhat.com > https://www.redhat.com/mailman/listinfo/platformone > -- Brenna Gordon Client Manager, NAPS Red Hat bgordon at redhat.com M: 703-650-8755 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 108107 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 101111 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 67330 bytes Desc: not available URL: From taylor at redhat.com Fri Dec 6 01:14:06 2019 From: taylor at redhat.com (Taylor Biggs) Date: Thu, 5 Dec 2019 20:14:06 -0500 Subject: [Platformone] AWS Network Issue In-Reply-To: References: Message-ID: Adrian, Looks like sso.dsop.io is pointing to the CHT VPC, so that should be ignored. Definitely a problem that still affects dccscr.dsop.io, dcar.dsop.io, and cluster.dsop.io. Thanks, Taylor ---- Taylor Biggs taylor at redhat.com 850-449-2220 On Thu, Dec 5, 2019 at 8:08 PM Brenna Gordon wrote: > All - FYSA, Nic has an Ask Me Anything at 9:30 AM EST tomorrow and needs > the VPCs back online.
>> >> This communication (including any attachments) may contain information >> that is proprietary, confidential or exempt from disclosure.
If you are not >> the intended recipient, please note that further dissemination, >> distribution, use or copying of this communication is strictly prohibited. >> Anyone who received this message in error should notify the sender >> immediately by telephone or by return email and delete it from his or her >> computer. >> _______________________________________________ >> platformONE mailing list >> platformONE at redhat.com >> https://www.redhat.com/mailman/listinfo/platformone >> > -- > > Brenna Gordon > > Client Manager, NAPS > > Red Hat > > bgordon at redhat.com > M: 703-650-8755 > > _______________________________________________ > platformONE mailing list > platformONE at redhat.com > https://www.redhat.com/mailman/listinfo/platformone > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 108107 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 101111 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 67330 bytes Desc: not available URL: From taylor at redhat.com Fri Dec 6 01:15:12 2019 From: taylor at redhat.com (Taylor Biggs) Date: Thu, 5 Dec 2019 20:15:12 -0500 Subject: [Platformone] AWS Network Issue In-Reply-To: References: Message-ID: This is definitely a change that happened in the past hour or so; I was just pulling metrics from Kibana (also affected). ---- Taylor Biggs taylor at redhat.com 850-449-2220 On Thu, Dec 5, 2019 at 8:14 PM Taylor Biggs wrote: > Adrian, > > Looks like sso.dsop.io is pointing to the CHT VPC, so that should be > ignored. Definitely a problem that still affects dccscr.dsop.io, > dcar.dsop.io, and cluster.dsop.io.
> > Thanks, > Taylor > > ---- > Taylor Biggs > taylor at redhat.com > 850-449-2220 > > >>> This communication (including any attachments) may contain information >>> that is proprietary, confidential or exempt from disclosure. If you are not >>> the intended recipient, please note that further dissemination, >>> distribution, use or copying of this communication is strictly prohibited.
>>> Anyone who received this message in error should notify the sender >>> immediately by telephone or by return email and delete it from his or her >>> computer. >>> _______________________________________________ >>> platformONE mailing list >>> platformONE at redhat.com >>> https://www.redhat.com/mailman/listinfo/platformone >>> >> -- >> >> Brenna Gordon >> >> Client Manager, NAPS >> >> Red Hat >> >> bgordon at redhat.com >> M: 703-650-8755 >> >> _______________________________________________ >> platformONE mailing list >> platformONE at redhat.com >> https://www.redhat.com/mailman/listinfo/platformone >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 108107 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed...
Name: image.png Type: image/png Size: 101111 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 67330 bytes Desc: not available URL: From adrian.nunez at bylight.com Fri Dec 6 01:26:20 2019 From: adrian.nunez at bylight.com (Adrian Nunez) Date: Fri, 6 Dec 2019 01:26:20 +0000 Subject: [Platformone] AWS Network Issue In-Reply-To: References: , Message-ID: dccscr is back up. The peering connections are not set up correctly. When I associated the subnets with the routing tables for the peering connection, connectivity to everything in the VPC was lost. So for now the site is up, but peering may not be working properly. I suggest deleting the current peering connections and setting them up again. With the site up and a demo tomorrow, I do not plan on touching anything else. Taylor is correct: any change to production should be tested before it is made. ________________________________ From: Taylor Biggs Sent: Thursday, December 5, 2019 8:15 PM To: Brenna Gordon Cc: Adrian Nunez ; platformONE at redhat.com ; Nunez, Carlos A [US] (MS) (Contr) Subject: Re: [Platformone] AWS Network Issue [EXTERNAL EMAIL] This is definitely a change that happened in the past hour or so; I was just pulling metrics from Kibana (also affected). ---- Taylor Biggs taylor at redhat.com 850-449-2220 On Thu, Dec 5, 2019 at 8:14 PM Taylor Biggs wrote: Adrian, Looks like sso.dsop.io is pointing to the CHT VPC, so that should be ignored. Definitely a problem that still affects dccscr.dsop.io, dcar.dsop.io, and cluster.dsop.io. Thanks, Taylor ---- Taylor Biggs taylor at redhat.com 850-449-2220 On Thu, Dec 5, 2019 at 8:08 PM Brenna Gordon wrote: All - FYSA, Nic has an Ask Me Anything at 9:30 AM EST tomorrow and needs the VPCs back online. On Thu, Dec 5, 2019 at 8:02 PM Adrian Nunez wrote: I put the routing tables back to where they were before you emailed.
If Nic can hit sso.dsop.io, then the problem might be internal. If there was no internet connectivity to the internet you wouldn't get sso.dsop.io in a browser. ________________________________ From: Jonathan Rickard > Sent: Thursday, December 5, 2019 7:41 PM To: Adrian Nunez > Cc: platformONE at redhat.com >; Nunez, Carlos A [US] (MS) (Contr) > Subject: Re: AWS Network Issue [EXTERNAL EMAIL] I completely agree with you - but whenever i go to https://dccscr.dsop.io it's dead now - and if i were to ssh into the EIP for it, it times out ... before you added the subnets to the peers I was able to reach them ... https: [image.png] ssh: [image.png] Jonathan Rickard, RHCA Principal Consultant, NAPS Red Hat Remote - Texas jonny at redhat.com M: 210-862-9739 [https://marketing-outfit-prod-images.s3-us-west-2.amazonaws.com/f5445ae0c9ddafd5b2f1836854d7416a/Logo-RedHat-Email.png] On Thu, Dec 5, 2019 at 6:36 PM Adrian Nunez > wrote: You have quad zeros on the Gitlab for 22 and 443. You should be able to SSH and HTTPS in from anywhere. [cid:16ed8be7dedea654a7f1] ________________________________ From: Jonathan Rickard > Sent: Thursday, December 5, 2019 7:29 PM To: Adrian Nunez > Cc: platformONE at redhat.com >; Nunez, Carlos A [US] (MS) (Contr) > Subject: Re: AWS Network Issue [EXTERNAL EMAIL] Adrian, I think that made it worse - now I can't connect to the gitlab in production vpc (SSH or HTTPS). Jonathan Rickard, RHCA Principal Consultant, NAPS Red Hat Remote - Texas jonny at redhat.com M: 210-862-9739 [https://marketing-outfit-prod-images.s3-us-west-2.amazonaws.com/f5445ae0c9ddafd5b2f1836854d7416a/Logo-RedHat-Email.png] On Thu, Dec 5, 2019 at 6:13 PM Adrian Nunez > wrote: Try it now. Your peering connection didn't have subnets associated with it. 
Get Outlook for Android ________________________________ From: Jonathan Rickard > Sent: Thursday, December 5, 2019 7:07:17 PM To: Adrian Nunez > Cc: platformONE at redhat.com >; Nunez, Carlos A [US] (MS) (Contr) > Subject: Re: AWS Network Issue [EXTERNAL EMAIL] Sounds good - thanks Adrian ... I'm trying to figure out the syntax to query against a log group to see if I can catch the reason for the drop ... Even running a curl against the port is timing out (curl -v :22 ) ... Jonathan Rickard, RHCA Principal Consultant, NAPS Red Hat Remote - Texas jonny at redhat.com M: 210-862-9739 [https://marketing-outfit-prod-images.s3-us-west-2.amazonaws.com/f5445ae0c9ddafd5b2f1836854d7416a/Logo-RedHat-Email.png] On Thu, Dec 5, 2019 at 6:02 PM Adrian Nunez > wrote: Ok. I'll take a look in about 10. Eating dinner. Get Outlook for Android ________________________________ From: Jonathan Rickard > Sent: Thursday, December 5, 2019 6:59:34 PM To: Adrian Nunez > Cc: platformONE at redhat.com >; Nunez, Carlos A [US] (MS) (Contr) > Subject: Re: AWS Network Issue [EXTERNAL EMAIL] Yep, we checked the security groups - even tried to open it up 0.0.0.0 to test and got nothing. I checked the peering and that looks good, I also checked the route tables and from what I can tell it all looks right. Trying to go from production-vpc to dev-up-vpc and staging-up-vpc (production to staging is most important ATM) .. Jonathan Rickard, RHCA Principal Consultant, NAPS Red Hat Remote - Texas jonny at redhat.com M: 210-862-9739 [https://marketing-outfit-prod-images.s3-us-west-2.amazonaws.com/f5445ae0c9ddafd5b2f1836854d7416a/Logo-RedHat-Email.png] On Thu, Dec 5, 2019 at 5:55 PM Adrian Nunez > wrote: Check security groups and IP addresses. There was talk about whitelisting IPs yesterday. Which VPCs are you trying to SSH into? I can go take a look. 
________________________________
From: Jonathan Rickard
Sent: Thursday, December 5, 2019 6:51:32 PM
To: Adrian Nunez
Cc: platformONE at redhat.com; Nunez, Carlos A [US] (MS) (Contr)
Subject: Re: AWS Network Issue [EXTERNAL EMAIL]

Yah, the peering connections still look good - but we can't ssh from production to dev/staging environments.

Jonathan Rickard, RHCA
Principal Consultant, NAPS
Red Hat Remote - Texas
jonny at redhat.com M: 210-862-9739

On Thu, Dec 5, 2019 at 5:49 PM Adrian Nunez wrote:

I didn't touch the VPC peering connections. I saw you guys had 2 yesterday.

________________________________
From: Jonathan Rickard
Sent: Thursday, December 5, 2019 6:21:08 PM
To: platformONE at redhat.com; Nunez, Carlos A [US] (MS) (Contr); Adrian Nunez
Subject: AWS Network Issue [EXTERNAL EMAIL]

Adrian / Andrew,

Are you guys aware of any changes to connectivity between prod and dev/staging VPCs? We're just looking into it now, but we figured we'd throw it out there and ask before we get too deep.

Thanks,
jonny

Jonathan Rickard, RHCA
Principal Consultant, NAPS
Red Hat Remote - Texas
jonny at redhat.com M: 210-862-9739

This communication (including any attachments) may contain information that is proprietary, confidential or exempt from disclosure. If you are not the intended recipient, please note that further dissemination, distribution, use or copying of this communication is strictly prohibited. Anyone who received this message in error should notify the sender immediately by telephone or by return email and delete it from his or her computer.
_______________________________________________
platformONE mailing list
platformONE at redhat.com
https://www.redhat.com/mailman/listinfo/platformone

--
Brenna Gordon
Client Manager, NAPS
Red Hat
bgordon at redhat.com M: 703-650-8755

_______________________________________________
platformONE mailing list
platformONE at redhat.com
https://www.redhat.com/mailman/listinfo/platformone

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png Type: image/png Size: 108107 bytes Desc: image.png URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png Type: image/png Size: 101111 bytes Desc: image.png URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png Type: image/png Size: 67330 bytes Desc: image.png URL: From ademola.abodunrin at us.af.mil Fri Dec 6 18:01:46 2019 From: ademola.abodunrin at us.af.mil (ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP) Date: Fri, 6 Dec 2019 18:01:46 +0000 Subject: [Platformone] [EXT] Re: Unified Platform Pod Deploy Errors In-Reply-To: <1575561314717.40064@ManTech.com> References: <1574703796843.94672@ManTech.com> <1574705271520.5593@ManTech.com> <1574705967080.55870@ManTech.com> <1574706319228.45425@ManTech.com> <1574707851362.16520@ManTech.com> <1574713055636.69810@ManTech.com> <1574714851887.51023@ManTech.com> <1574869597001.92851@ManTech.com> <1575303080504.65964@ManTech.com> <1575311511650.96981@ManTech.com> <6277_1575312368_5DE55BEF_6277_803_1_CA+EGyAB6=xB9Na5Wf-nWY52dwt1200Rk8uh6krw1CXojEaO1PQ@mail.gmail.com> <7E65F8DF-F96B-4657-82A2-4A95715FAC0B@mitre.org> <5A89B24C-3C36-4205-B083-CBFBF71ED122@mitre.org> <3052516616bb44caa350c3c6897bc122@DCCPRDEXCH02.ManTech.com> <406B92B0-26F9-48FD-B533-549CC36B4FB2@mitre.org> <1575472568190.35402@ManTech.com>, <1575561314717.40064@ManTech.com> Message-ID: ALCON, The cluster is down again. Please assist. Most Sincerely, Ade Abodunrin, GG-12, USAF Product Owner (Cybertron & Ginyu Force), Unified Platform LevelUP Code Works Commercial: (210) 890-2113 NIPR email: ademola.abodunrin at us.af.mil From: Kendall, Russell C Sent: Thursday, December 5, 2019 9:55 AM To: Jonathan Rickard Cc: Miller, Timothy J. ; Keegan Reap ; Bubb, Mike ; platformONE at redhat.com; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP ; ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP ; Jonathan Rickard Subject: [Non-DoD Source] Re: [Platformone] [EXT] Re: Unified Platform Pod Deploy Errors Jonny, I'll see you Friday at 500 Nav. Travel safe. V/R, Russell C Kendall? 
_____ From: Jonathan Rickard > Sent: Wednesday, December 4, 2019 5:29 PM To: Kendall, Russell C Cc: Miller, Timothy J.; Keegan Reap; Bubb, Mike; platformONE at redhat.com ; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP; ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP; Jonathan Rickard Subject: Re: [Platformone] [EXT] Re: Unified Platform Pod Deploy Errors Russell, I have definitely been terrible with email lately and I apologize for the slow response times. I get back to San Antonio tomorrow but I have a pretty full afternoon. I can stop by Friday if you'd like. Thanks, jonny Jonathan Rickard, RHCA Principal Consultant, NAPS Red Hat Remote - Texas jonny at redhat.com M: 210-862-9739 On Wed, Dec 4, 2019 at 10:16 AM Kendall, Russell C > wrote: Jonny, I'd like to suggest you come to 500 to wrap this up, since it seems there are significant delays in communication that are contributing to downtime. V/R, Russell C Kendall ________________________________________ From: Miller, Timothy J. > Sent: Wednesday, December 4, 2019 7:02 AM To: Jonathan Rickard; Keegan Reap Cc: Bubb, Mike; platformONE at redhat.com ; Kendall, Russell C; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP; ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP; Jonathan Rickard Subject: Re: [Platformone] [EXT] Re: Unified Platform Pod Deploy Errors Johnny-- Update the issue, if you would be so kind. -- T ?On 12/3/19, 18:00, "platformone-bounces at redhat.com on behalf of Jonathan Rickard" on behalf of jrickard at redhat.com > wrote: Hey Guys - Sorry for taking so long - this has been completed. Please run your builds and let us know if you're having any problems. jonny Jonathan Rickard, RHCA Principal Consultant, NAPS Red Hat Remote - Texas jonny at redhat.com M: 210-862-9739 On Tue, Dec 3, 2019 at 3:47 PM Jonathan Rickard > wrote: Russell / Team, We believe we've identified the issue with your application deploying. 
In order to rectify the issue I need to evacuate pods so you will probably see some hiccups while deploying. I will update when this is resolved. Thanks, jonny Jonathan Rickard, RHCA Principal Consultant, NAPS Red Hat Remote - Texas jonny at redhat.com M: 210-862-9739 On Tue, Dec 3, 2019 at 12:53 PM Keegan Reap > wrote: Hey all, we have opened an issue below, that we believe to be the cause, we are currently investigating: https://dccscr.dsop.io/ginyu-force/up-iatt/issues/32 On Tue, Dec 3, 2019 at 11:27 AM Jonathan Rickard > wrote: Russell, Getting more eyes on this @platformONE at redhat.com > We'll keep you posted. jonny Jonathan Rickard, RHCA Principal Consultant, NAPS Red Hat Remote - Texas jonny at redhat.com M: 210-862-9739 On Tue, Dec 3, 2019 at 11:59 AM Kendall, Russell C > wrote: Kevin, Unfortunately we are receiving deployment errors again. This is the event: 0/9 nodes are available: 1 node(s) had disk pressure, 2 node(s) had taints that the pod didn't tolerate, 6 node(s) didn't match node selector. This is the deployment: https://cluster.unified-platform.io/console/project/ccat-dev/browse/dc/driveup V/R, Russell C Kendall ________________________________________ From: Miller, Timothy J. > Sent: Monday, December 2, 2019 2:44:21 PM To: Kevin O'Donnell Cc: Kendall, Russell C; Mark Nissley; ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP; Jonathan Rickard; Chris Kuperstein; BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP; Brenna Gordon; Bubb, Mike; DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP Subject: Re: [EXT] Re: Unified Platform Pod Deploy Errors Tagged you on it. -- T On 12/2/19, 14:03, "Kevin O'Donnell" > wrote: Hello, Autoscaling is on our future IAC roadmap. Tim, the additional ticket would be appreciated. We have swapped out the app/worker instances with m5a.8xlarge 32 cores, 128gb of ram. Please let us know if you have any other issues. 
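For a sense of scale, the figures quoted in this thread can be compared directly: three m5.xlarge workers with 16 GB each, the Labs reference of three workers with 6 cores and 28 GB each, and the replacement m5a.8xlarge instances with 32 cores and 128 GB. The Python sketch below does the arithmetic under two assumptions that the thread does not state: that the worker count stays at three after the swap, and that an m5.xlarge has 4 vCPUs (taken from the public AWS instance listing).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkerPool:
    name: str
    nodes: int
    cores_per_node: int
    ram_gb_per_node: int

    @property
    def total_cores(self):
        return self.nodes * self.cores_per_node

    @property
    def total_ram_gb(self):
        return self.nodes * self.ram_gb_per_node


POOLS = [
    # 4 vCPUs per m5.xlarge is from the AWS listing, not from the thread.
    WorkerPool("original (m5.xlarge)", nodes=3, cores_per_node=4, ram_gb_per_node=16),
    WorkerPool("Labs reference", nodes=3, cores_per_node=6, ram_gb_per_node=28),
    # Assumes the worker count is still three after the swap.
    WorkerPool("replacement (m5a.8xlarge)", nodes=3, cores_per_node=32, ram_gb_per_node=128),
]


def summarize(pools):
    """Map each pool name to its (total cores, total RAM in GB)."""
    return {p.name: (p.total_cores, p.total_ram_gb) for p in pools}


if __name__ == "__main__":
    for name, (cores, ram) in summarize(POOLS).items():
        print(f"{name}: {cores} cores, {ram} GB RAM")
```

Under those assumptions the original pool had roughly half the memory of the Labs reference, and the replacement pool has several times more.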
Thanks,

KEVIN O'DONNELL
ARCHITECT MANAGER
Red Hat NA Public Sector Consulting
kodonnell at redhat.com M: 240-605-4654

On Mon, Dec 2, 2019 at 2:29 PM Miller, Timothy J. wrote:

I'll open an issue. IaC needs to have instance size as a host_var to facilitate scaling.

-- T

On 12/2/19, 13:15, "Kevin O'Donnell" wrote:

Tim,

Thanks for the information. We are undersized on the app/worker nodes: the cluster has 3 and they are m5.xlarge with only 16 GB of RAM. From what I have read, each Labs engagement operated on a 3-node worker cluster with each node having 6 cores and 28 GB of RAM. We will need to swap out the existing instances for larger specs.

We are going to try to flush the existing workload out on one of the workers to see if we can swap them out one at a time.

Thanks,

KEVIN O'DONNELL
ARCHITECT MANAGER
Red Hat NA Public Sector Consulting
kodonnell at redhat.com M: 240-605-4654

On Mon, Dec 2, 2019 at 2:07 PM Miller, Timothy J. wrote:

Here's what I can see, given the perm limits I seem to be under:

- NS:develop-misp-app and NS:lp-develop-misp-app both have several sync jobs in CrashLoopBackOff b/c of an HTTP 401 error returned while trying to fetch something from somewhere (URL isn't recorded in the stack trace).

- NS:minishift-misp-app has most of its pods/jobs stuck in ImagePullBackOff. No detail there in the event stream so I'll see if I can dig deeper.

- NS:aam-ci-cd has Jenkins trying to spin up three workers, those are coming back as unschedulable.

I can't see into NS:aam-bases or NS:dsop-images b/c of perm limits. I see no DAS-related project(s).

The MISP stuff needs debugging before calling "blocked" since it looks like an internal error from this perspective.

In re: AAM Jenkins: If this deployment is coming out of the OCP storefront, then maybe it should be ephemeral rather than persistent. If it's a custom deployment, then it probably needs a rethink.
I'm also not sure why there are two MISP dev projects. -- T On 12/2/19, 12:46, "Kevin O'Donnell" > wrote: Russell, Thank you for the information. We can switch out the instance type for the worker nodes. How much memory is required by the apps? Thanks, KEVIN O'DONNELL ARCHITECT MANAGER Red Hat Red Hat NA Public Sector Consulting kodonnell at redhat.com %20M:240-605-4654> M: 240-605-4654 On Mon, Dec 2, 2019 at 1:32 PM Kendall, Russell C > wrote: Kevin, The lack of resources on u-p.io cluster is hindering development, testing, and integration of the apps from CCAT AAM DAS, which is putting one of our PI goals at risk. We are blocked by the fact that we (CCAT and AAM) cannot deploy additional pods to the unified-platform.io cluster. We have a subset of containers deployed, but rolling deployments and new deployments fail. This means that we are not able to execute integration testing or peer reviews. We are temporarily working around by NOT testing/reviewing our code changes live, something that no one likes. Also, we are now running weeks-old instances of our containers, so we are very likely producing some technical debt. We currently have developers approaching idle or doing non-priority work until the resource issue is resolved. Here is the particular error from the OSP cluster I received while attempting a redeploy of one of our apps. 0/9 nodes are available: 1 node(s) had taints that the pod didn't tolerate, 2 Insufficient pods, 6 node(s) didn't match node selector.11 times in the last minute Since we do not have any cluster permissions, I cannot verify which resource is running out, but from experience, I assess it is a memory issue. It appears the cluster has been provisioned with a silly allocation of node types. Without knowing exactly what was deployed, it appears only 3 of the 9 hosts are suitable worker nodes. 
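The scheduler event quoted above already contains the per-reason node counts, and the "3 of the 9 hosts" estimate follows from it: nine nodes in total, six of which fail the pod's node selector. A minimal Python sketch of that tally is below; the function names are hypothetical, and the parser only handles the comma-separated form seen in this thread. As a general observation that the thread does not confirm, "Insufficient pods" in Kubernetes scheduler messages normally refers to a node's pod-count capacity rather than to memory, so it may be worth checking that limit alongside RAM.

```python
import re

EVENT_RE = re.compile(r"^(\d+)/(\d+) nodes are available: (.*)$")


def parse_scheduling_event(message):
    """Split a FailedScheduling message into (available, total, reasons).

    reasons maps each reason string to the number of nodes reporting it.
    """
    match = EVENT_RE.match(message.strip())
    if match is None:
        raise ValueError("not a scheduler availability message")
    available, total = int(match.group(1)), int(match.group(2))
    reasons = {}
    for part in match.group(3).rstrip(".").split(","):
        item = re.match(r"^(\d+)\s+(.*)$", part.strip())
        if item is None:
            continue  # skip fragments that do not start with a count
        # Drop the leading "node(s) " so reasons read uniformly.
        reason = re.sub(r"^node\(s\)\s+", "", item.group(2))
        reasons[reason] = reasons.get(reason, 0) + int(item.group(1))
    return available, total, reasons


def selector_matching_nodes(total, reasons):
    """Nodes that pass the pod's node selector (total minus mismatches)."""
    return total - reasons.get("didn't match node selector", 0)


if __name__ == "__main__":
    event = (
        "0/9 nodes are available: 1 node(s) had taints that the pod didn't "
        "tolerate, 2 Insufficient pods, 6 node(s) didn't match node selector."
    )
    available, total, reasons = parse_scheduling_event(event)
    for reason, count in reasons.items():
        print(f"{count} node(s): {reason}")
    print("nodes matching the selector:", selector_matching_nodes(total, reasons))
```

For this event the three selector-matching nodes are exactly the ones reporting a taint or pod-count problem, which is why no node is left to schedule on.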
We would expect the cluster to respond to resource limitations and scale, but if a scheduled downtime is required, please work with us so we can anticipate. As it stands, the cluster does not support resources required by CCAT and the other dev teams (AAM, DAS, etc.). We would accept any downtime if it will improve the situation, as we are blocked from progressing under the current constraints. My hope was we could get the cluster redeployed over the TG holiday to eliminate developer impact, but as Mark pointed out, there were limited support folks available. Now I am just trying to minimize the losses. V/R, Russell C Kendall ________________________________________ From: Kevin O'Donnell > Sent: Monday, December 2, 2019 11:52 AM To: Kendall, Russell C Cc: Mark Nissley; ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP; Jonathan Rickard; Chris Kuperstein; BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP; Brenna Gordon; Bubb, Mike (mbubb at mitre.org ); DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP; Miller, Timothy J.; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP Subject: Re: Unified Platform Pod Deploy Errors Hello Russell, Can you elaborate on the term Blocked? What specific issues are the blockers? Thanks, KEVIN O'DONNELL ARCHITECT MANAGER Red Hat Red Hat NA Public Sector Consulting kodonnell at redhat.com %20M:240-605-4654> M: 240-605-4654 On Mon, Dec 2, 2019 at 11:11 AM Kendall, Russell C > wrote: Mark, Thank for acknowledging, please be aware the San Antonio dev teams working in unified-platform.io are currently blocked. 
V/R, Russell C Kendall ________________________________________ From: Mark Nissley > Sent: Monday, December 2, 2019 9:36 AM To: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP; Jonathan Rickard; Chris Kuperstein Cc: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP; Kevin O'Donnell; Brenna Gordon; Kendall, Russell C; Bubb, Mike (mbubb at mitre.org ); DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP; Miller, Timothy J.; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP Subject: Re: Unified Platform Pod Deploy Errors As noted, I don't suspect much got done on this over the holiday weekend. I did see the ticket, as dropped some details into it. I also assigned it to @Jonathan Rickard > and @Chris Kuperstein > . It looks like short term solutions have been easy but the issue is recurring. Mark NISSLEY, PMP, CSM, LEAN PROGRAM MaNAGER & SR technical Project Manager North American Consulting, Public Sector M: 850-530-3234 Scheduled Training: October 14-18 On Mon, Dec 2, 2019 at 10:21 AM ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP > wrote: Mark/Kevin, I just heard at the team stand up that we are still blocked. This is also affecting the AAM team from my investigations. Please let me know if there is something we need to do to move this forward. Most Sincerely, Ade Abodunrin, GG-12, USAF Product Owner (Cybertron & Ginyu Force), Unified Platform LevelUP Code Works Commercial: (210) 890-2113 NIPR email: ademola.abodunrin at us.af.mil ________________________________________ From: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP Sent: Wednesday, November 27, 2019 12:58 PM To: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP >; Mark Nissley >; Kevin O'Donnell >; Brenna Gordon > Cc: Kendall, Russell C >; Bubb, Mike (mbubb at mitre.org ) >; DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP >; Miller, Timothy J. >; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP > Subject: Re: Unified Platform Pod Deploy Errors Thanks a lot Capt Bryan! Russell created the ticket on GitLab UP Node Project. 
Most Sincerely,

Ade Abodunrin, GG-12, USAF
Product Owner (Cybertron & Ginyu Force), Unified Platform
LevelUP Code Works
Commercial: (210) 890-2113
NIPR email: ademola.abodunrin at us.af.mil

________________________________________
From: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP
Sent: Wednesday, November 27, 2019 12:56 PM
To: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP; Mark Nissley; Kevin O'Donnell; Brenna Gordon
Cc: Kendall, Russell C; Bubb, Mike (mbubb at mitre.org); DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP; Miller, Timothy J.; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP
Subject: RE: Unified Platform Pod Deploy Errors

Thanks Ade. The team is thin until next week due to the holidays but I will make sure it is addressed. Were there any issues submitted to GitLab's UP Node Project on DCCSCR?

@Mark/Kevin - can we address?

-Austen

From: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP
Sent: Wednesday, November 27, 2019 9:51 AM
To: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP
Cc: Kendall, Russell C; Bubb, Mike (mbubb at mitre.org)
Subject: Fw: Unified Platform Pod Deploy Errors

Capt Bryan,

Please see the explanation on the issue that Ginyu Force is currently experiencing below.

Most Sincerely,

Ade Abodunrin, GG-12, USAF
Product Owner (Cybertron & Ginyu Force), Unified Platform
LevelUP Code Works
Commercial: (210) 890-2113
NIPR email: ademola.abodunrin at us.af.mil

________________________________________
From: Kendall, Russell C
Sent: Wednesday, November 27, 2019 9:46 AM
To: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP; Buffaloe, Christopher; Molina, Toby; Crace, Jared E; SANCHEZ, MARK GG-13 USAF AFMC AFLCMC/HNCP
Cc: tmiller at mitre.org
Subject: [Non-DoD Source] Fw: Unified Platform Pod Deploy Errors

Gentlemen,

The application development teams working in the new GovCloud OCP environment (unified-platform.io) are currently blocked in efforts to deploy new pods for testing, development, and UAT.
Red Hat and RogueOne SMEs have been notified and have attempted some fixes starting on Monday 11/25, but at this point have not been able to provision resources sufficient to host CCAT and AAM. We have taken steps to minimize our footprint (eliminating demonstration environment, deleting developer namespaces), but this is not a sustainable approach, and has only resulted in moderate improvements in cluster performance. Our hope is the U-P.io cluster compute resources can be increased very soon, so that we may resume normal development activities. Our understanding is that such a scaling requires a complete redeployment of the cluster, which is unusual, but an acceptable loss to productivity. If the cluster can be scaled up over the Thanksgiving holiday, the impact will be minimal to developers and cluster administrators, alike. We are currently collaborating on solutions on the following MatterMost channel behind the space camp VPN (link below), and via the email thread forwarded (further below). https://chat.spacecamp.ninja/levelup/channels/unified-platform-node Please keep me posted on developments and I will coordinate developer activities with any scheduled platform outages. V/R, Russell C Kendall ________________________________________ From: Curran, Daniel M Sent: Monday, November 25, 2019 2:47 PM To: Jonathan Rickard Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; dlystra at redhat.com >; Sison, Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher; Torres, Alexander; Crace, Jared E; Middleton, Joseph J Subject: Re: Unified Platform Pod Deploy Errors Sounds great. Appreciate it. I'll watch email and Mattermost in case you need more from us. 
-Daniel ________________________________________ From: Jonathan Rickard > Sent: Monday, November 25, 2019 2:44 PM To: Curran, Daniel M Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; dlystra at redhat.com >; Sison, Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher; Torres, Alexander; Crace, Jared E; Middleton, Joseph J Subject: Re: Unified Platform Pod Deploy Errors Thanks Daniel - I'll continue to look into the resource issue that you're seeing - I'd like to identify the root cause and then work with the team to come up with a solution. Jonathan Rickard, RHCA Principal Consultant, NAPS Red Hat Remote - Texas jonny at redhat.com M: 210-862-9739 On Mon, Nov 25, 2019 at 2:17 PM Curran, Daniel M > wrote: Yeah we hit the limit then had AAM kill some of their projects and then our pods got scheduled. We've hit the limit again though. Here's an example pod that cannot be scheduled https://cluster.unified-platform.io/console/project/ccat-dev/browse/pods/sso-8-2cjth They're seeing it when their jenkins slaves can't deploy but it's basically any pod after we hit some limit. -Daniel ________________________________________ From: Jonathan Rickard > Sent: Monday, November 25, 2019 1:26 PM To: Curran, Daniel M Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; dlystra at redhat.com >; Sison, Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher; Torres, Alexander; Crace, Jared E; Middleton, Joseph J Subject: Re: Unified Platform Pod Deploy Errors Daniel, I can see that you have 3 mongo pods, 1 chatup and 1 upbot pod running ... is your app good to go? Looks like there was an issue with memory on 1 pod, then some node selector being mismatched - just what i could see in the events... 
Jonathan Rickard, RHCA Principal Consultant, NAPS Red Hat Remote - Texas jonny at redhat.com M: 210-862-9739 On Mon, Nov 25, 2019 at 12:50 PM Curran, Daniel M > wrote: Also, AAM was having similar issues. Looks like they had a lot of namespaces and scaling down the pods on their deployments didn't help but actually deleting the namespaces did. We have pods scheduling now but I'm adding them and we'd still like to work through what resource limit we were hitting to avoid this in the future. -Daniel ________________________________________ From: Curran, Daniel M Sent: Monday, November 25, 2019 12:25 PM To: Jonathan Rickard Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; dlystra at redhat.com >; Sison, Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher; Torres, Alexander Subject: Re: Unified Platform Pod Deploy Errors Thanks, sir. Most important for us to get working is "ccat-demo" but it's also happening in "ccat-dev" and "ccat-ci-cd". -Daniel ________________________________________ From: Jonathan Rickard > Sent: Monday, November 25, 2019 12:22 PM To: Curran, Daniel M Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; dlystra at redhat.com >; Sison, Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher; Torres, Alexander Subject: Re: Unified Platform Pod Deploy Errors What's the name of the project you're working in? I'm going to be back at my laptop in about 30 and will take a look when I get there. Is it just the Jenkins pods failing? On Mon, Nov 25, 2019, 12:20 PM Curran, Daniel M > wrote: Adding Dean and Alex. Also, sitting in mattermost if anyone needs to get online and chat for more information. 
-Daniel ________________________________________ From: Curran, Daniel M Sent: Monday, November 25, 2019 12:07 PM To: jonny at redhat.com >; ckuperst at redhat.com >; Mark Nissley Cc: Sison, Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher Subject: Re: Unified Platform Pod Deploy Errors Adding Kupe and Mark. -Daniel ________________________________________ From: Curran, Daniel M Sent: Monday, November 25, 2019 11:43 AM To: jonny at redhat.com > Cc: Sison, Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher Subject: Unified Platform Pod Deploy Errors Hey Jonny, We met briefly at SpaceCAMP a couple weeks ago when cluster.unified-platform.io was stood up. We've been trying to deploy some apps today and so far today we're getting errors on most (if not all) of our pods. 0/9 nodes are available: 3 Insufficient pods, 6 node(s) didn't match node selector. Is what we're seeing. We were thinking it was some volume types weren't correct but some of our pods don't even have volumes attached and still give us this error (i.e. Jenkins slaves or web frontends without persistent storage). Any idea what this could be? We're not running out of space on the nodes themselves are we? We have a demo scheduled for tomorrow at 9:30 AM CST and are hoping to get a demo env up for them today but this error came up unexpectedly. Also, we're here at 500 Navarro St. in San Antonio working through this in person is better/easier. Thanks, Daniel Curran ________________________________________ This e-mail and any attachments are intended only for the use of the addressee(s) named herein and may contain proprietary information. 
If you are not the intended recipient of this e-mail or believe that you received this email in error, please take immediate action to notify the sender of the apparent error by reply e-mail; permanently delete the e-mail and any attachments from your computer; and do not disseminate, distribute, use, or copy this message and any attachments. _______________________________________________ platformONE mailing list platformONE at redhat.com https://www.redhat.com/mailman/listinfo/platformone -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 2127 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 5490 bytes Desc: not available URL: From jrickard at redhat.com Fri Dec 6 18:06:19 2019 From: jrickard at redhat.com (Jonathan Rickard) Date: Fri, 6 Dec 2019 12:06:19 -0600 Subject: [Platformone] [EXT] Re: Unified Platform Pod Deploy Errors In-Reply-To: References: <1574703796843.94672@ManTech.com> <1574705271520.5593@ManTech.com> <1574705967080.55870@ManTech.com> <1574706319228.45425@ManTech.com> <1574707851362.16520@ManTech.com> <1574713055636.69810@ManTech.com> <1574714851887.51023@ManTech.com> <1574869597001.92851@ManTech.com> <1575303080504.65964@ManTech.com> <1575311511650.96981@ManTech.com> <6277_1575312368_5DE55BEF_6277_803_1_CA+EGyAB6=xB9Na5Wf-nWY52dwt1200Rk8uh6krw1CXojEaO1PQ@mail.gmail.com> <7E65F8DF-F96B-4657-82A2-4A95715FAC0B@mitre.org> <5A89B24C-3C36-4205-B083-CBFBF71ED122@mitre.org> <3052516616bb44caa350c3c6897bc122@DCCPRDEXCH02.ManTech.com> <406B92B0-26F9-48FD-B533-549CC36B4FB2@mitre.org> <1575472568190.35402@ManTech.com> <1575561314717.40064@ManTech.com> Message-ID: Ade, What does that mean? You can't login, you can't deploy? 
Jonathan Rickard, RHCE, RHCA Consulting Architect Red Hat Public Sector jonny at redhat.com M: 210.862.9739 @redhatjobs redhatjobs @redhatjobs On Fri, Dec 6, 2019 at 12:02 PM ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP wrote: > ALCON, > > > > The cluster is down again. Please assist. > > > > Most Sincerely, > > > > Ade Abodunrin, GG-12, USAF > > Product Owner (Cybertron & Ginyu Force), Unified Platform > > > > [image: cid:image001.png at 01D4F814.4AA552D0] > > LevelUP Code Works > > Commercial: (210) 890-2113 > > NIPR email: *ademola.abodunrin at us.af.mil * > > > > *From:* Kendall, Russell C > *Sent:* Thursday, December 5, 2019 9:55 AM > *To:* Jonathan Rickard > *Cc:* Miller, Timothy J. ; Keegan Reap < > kreap at redhat.com>; Bubb, Mike ; platformONE at redhat.com; > RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP ; > ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP < > ademola.abodunrin at us.af.mil>; Jonathan Rickard > *Subject:* [Non-DoD Source] Re: [Platformone] [EXT] Re: Unified Platform > Pod Deploy Errors > > > > Jonny, > > I'll see you Friday at 500 Nav. Travel safe. > > > > V/R, > > Russell C Kendall? > > > ------------------------------ > > *From:* Jonathan Rickard > *Sent:* Wednesday, December 4, 2019 5:29 PM > *To:* Kendall, Russell C > *Cc:* Miller, Timothy J.; Keegan Reap; Bubb, Mike; platformONE at redhat.com; > RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP; ABODUNRIN, ADEMOLA A GG-12 USAF > AFMC AFLCMC/HNCP; Jonathan Rickard > *Subject:* Re: [Platformone] [EXT] Re: Unified Platform Pod Deploy Errors > > > > Russell, > > > > I have definitely been terrible with email lately and I apologize for the > slow response times. I get back to San Antonio tomorrow but I have a pretty > full afternoon. I can stop by Friday if you'd like. 
> > > > Thanks, > > jonny > > > > *Jonathan Rickard**, RHCA* > > Principal Consultant, NAPS > > Red Hat Remote - Texas > > jonny at redhat.com > M: 210-862-9739 > > > > > > > > On Wed, Dec 4, 2019 at 10:16 AM Kendall, Russell C < > Russell.Kendall at mantech.com> wrote: > > Jonny, > I'd like to suggest you come to 500 to wrap this up, since it seems there > are significant delays in communication that are contributing to downtime. > V/R, > Russell C Kendall > ________________________________________ > From: Miller, Timothy J. > Sent: Wednesday, December 4, 2019 7:02 AM > To: Jonathan Rickard; Keegan Reap > Cc: Bubb, Mike; platformONE at redhat.com; Kendall, Russell C; RAMIREZ, > JOSE A CTR USAF AFMC AFLCMC/HNCP; ABODUNRIN, ADEMOLA A GG-12 USAF AFMC > AFLCMC/HNCP; Jonathan Rickard > Subject: Re: [Platformone] [EXT] Re: Unified Platform Pod Deploy Errors > > Johnny-- > > Update the issue, if you would be so kind. > > -- T > > ?On 12/3/19, 18:00, "platformone-bounces at redhat.com on behalf of Jonathan > Rickard" > wrote: > > Hey Guys - Sorry for taking so long - this has been completed. Please > run your builds and let us know if you're having any problems. > jonny > Jonathan Rickard, RHCA > Principal Consultant, NAPS > Red Hat Remote - Texas > > jonny at redhat.com > M: 210-862-9739 > > > > > > > > > > > > > > > On Tue, Dec 3, 2019 at 3:47 PM Jonathan Rickard > wrote: > > > Russell / Team, > > > We believe we've identified the issue with your application deploying. > In order to rectify the issue I need to evacuate pods so you will probably > see some hiccups while deploying. I will update when this is resolved. 
> > > Thanks, > jonny > > Jonathan Rickard, RHCA > Principal Consultant, NAPS > Red Hat Remote - Texas > > jonny at redhat.com > M: 210-862-9739 > > > > > > > > > > > > > > > On Tue, Dec 3, 2019 at 12:53 PM Keegan Reap wrote: > > > Hey all, we have opened an issue below, that we believe to be the > cause, we are currently investigating: > > > https://dccscr.dsop.io/ginyu-force/up-iatt/issues/32 > > > > On Tue, Dec 3, 2019 at 11:27 AM Jonathan Rickard > wrote: > > > Russell, > > > Getting more eyes on this @platformONE at redhat.com platformONE at redhat.com> > > > We'll keep you posted. > jonny > Jonathan Rickard, RHCA > Principal Consultant, NAPS > Red Hat Remote - Texas > > jonny at redhat.com > M: 210-862-9739 > > > > > > > > > > > > > > > On Tue, Dec 3, 2019 at 11:59 AM Kendall, Russell C < > Russell.Kendall at mantech.com> wrote: > > > Kevin, > > Unfortunately we are receiving deployment errors again. This is the > event: > > 0/9 nodes are available: 1 node(s) had disk pressure, 2 node(s) had > taints that the pod didn't tolerate, 6 node(s) didn't match node selector. > > This is the deployment: > > > https://cluster.unified-platform.io/console/project/ccat-dev/browse/dc/driveup > > > V/R, > Russell C Kendall > ________________________________________ > From: Miller, Timothy J. > Sent: Monday, December 2, 2019 2:44:21 PM > To: Kevin O'Donnell > Cc: Kendall, Russell C; Mark Nissley; ABODUNRIN, ADEMOLA A GG-12 USAF > AFMC AFLCMC/HNCP; Jonathan Rickard; Chris Kuperstein; BRYAN, AUSTEN R Capt > USAF AFMC AFLCMC/HNCP; Brenna Gordon; Bubb, Mike; DIROCCO, ROGER E GG-13 > USAF AFMC ESC/AFLCMC/HNCP; RAMIREZ, JOSE > A CTR USAF AFMC AFLCMC/HNCP > Subject: Re: [EXT] Re: Unified Platform Pod Deploy Errors > > Tagged you on it. > > -- T > > On 12/2/19, 14:03, "Kevin O'Donnell" wrote: > > Hello, > > > Autoscaling is on our future IAC roadmap. Tim, the additional > ticket would be appreciated. 
> > > We have swapped out the app/worker instances with m5a.8xlarge 32 > cores, 128gb of ram. Please let us know if you have any other issues. > > > Thanks, > > KEVIN O'DONNELL > ARCHITECT MANAGER > Red Hat Red Hat NA Public Sector Consulting < > https://www.redhat.com/> > > kodonnell at redhat.com > M: 240-605-4654 > > > > > > > > > > > > > > > > > > On Mon, Dec 2, 2019 at 2:29 PM Miller, Timothy J. < > tmiller at mitre.org> wrote: > > > I'll open an issue. IaC needs to have instance size as a host_var > to facilitate scaling. > > -- T > > On 12/2/19, 13:15, "Kevin O'Donnell" wrote: > > Tim, > > > Thanks for the information. We are undersized on the > app/worker nodes, the cluster has 3 and they are m5.xlarge with only 16gb > of ram. From what I have read each Labs engagement operated on a 3 node > worker cluster with each node having 6core's and 28gb > of ram. We will need to swap out the existing instances with > larger spec's. > > > We are going to try to flush the existing workload out on one > of the workers to see if we can swap them out one at a time. > > > Thanks, > > KEVIN O'DONNELL > ARCHITECT MANAGER > Red Hat Red Hat NA Public Sector Consulting < > https://www.redhat.com/> > > kodonnell at redhat.com > M: 240-605-4654 > > > > > > > > > > > > > > > > > On Mon, Dec 2, 2019 at 2:07 PM Miller, Timothy J. < > tmiller at mitre.org> wrote: > > > Here's what I can see, given the perm limits I seem to be > under: > > - NS:develop-misp-app and NS:lp-develop-misp-app both have > several sync jobs in CrashLoopBackoff b/c of an HTTP 401 error returned > while trying to fetch something from somewhere (URL isn't recorded in the > stack trace). > > - NS:minishift-misp-app has most of its pods/jobs stuck in > ImagePullBackoff. No detail there in the event stream so I'll see if I can > dig deeper. > > - NS:aam-ci-cd has Jenkins trying to spin up three workers, > those are coming back as unschedulable. 
> > I can't see into NS:aam-bases or NS:dsop-images b/c of perm > limits. > > I see no DAS-related project(s). > > The MISP stuff needs debugging before calling "blocked" since > it looks like an internal error from this perspective. > > > > In re: AAM Jenkins: If this deployment is coming out of the > OCP storefront, then maybe it should be ephemeral rather than persistent. > If it's a custom deployment, then it probably needs a rethink. > > I'm also not sure why there are two MISP dev projects. > > -- T > > > > On 12/2/19, 12:46, "Kevin O'Donnell" > wrote: > > Russell, > > > Thank you for the information. We can switch out the > instance type for the worker nodes. How much memory is required by the apps? > > > > Thanks, > > KEVIN O'DONNELL > ARCHITECT MANAGER > Red Hat Red Hat NA Public Sector Consulting < > https://www.redhat.com/> > > kodonnell at redhat.com > M: 240-605-4654 > > > > > > > > > > > > > > > > > On Mon, Dec 2, 2019 at 1:32 PM Kendall, Russell C < > Russell.Kendall at mantech.com> wrote: > > > Kevin, > The lack of resources on > u-p.io < > http://u-p.io> > cluster is hindering development, > testing, and integration of the apps from CCAT AAM DAS, which is > putting one > of our PI goals at risk. > > > We are blocked by the fact that we (CCAT and AAM) cannot > deploy additional pods to the > > unified-platform.io < > http://unified-platform.io> < > http://unified-platform.io> > cluster. We have a subset of containers deployed, but rolling > deployments and new deployments fail. This means that we are > not able to execute integration testing or peer reviews. > We are temporarily working around by NOT > testing/reviewing our code changes live, something that no one likes. Also, > we are now running weeks-old instances of our containers, so we are very > likely producing some technical debt. We currently have > developers > approaching idle or doing non-priority work until the > resource issue is resolved. 
> > > > Here is the particular error from the OSP cluster I > received while attempting a redeploy of one of our apps. > > > > 0/9 nodes are available: 1 node(s) had taints that the pod > didn't tolerate, 2 Insufficient pods, 6 node(s) didn't match node > selector.11 times in the last minute > > Since we do not have any cluster permissions, I cannot > verify which resource is running out, but from experience, I assess it is a > memory issue. > > > > It appears the cluster has been provisioned with a silly > allocation of node types. Without knowing exactly what was deployed, it > appears only 3 of the 9 hosts are suitable worker nodes. We would expect > the cluster to respond to resource limitations > and > scale, > but if a scheduled downtime is required, please work with > us so we can anticipate. As it stands, the cluster does not support > resources required by CCAT and the other dev teams (AAM, DAS, etc.). We > would accept any downtime if it will improve the > situation, > as we are blocked from progressing under the current > constraints. My hope was we could get the cluster redeployed over the TG > holiday to eliminate developer impact, but as Mark pointed out, there were > limited support folks available. Now I am just > trying > to > minimize the losses. > > > > V/R, > > Russell C Kendall > > > > > > ________________________________________ > From: Kevin O'Donnell > Sent: Monday, December 2, 2019 11:52 AM > To: Kendall, Russell C > Cc: Mark Nissley; ABODUNRIN, ADEMOLA A GG-12 USAF AFMC > AFLCMC/HNCP; Jonathan Rickard; Chris Kuperstein; BRYAN, AUSTEN R Capt USAF > AFMC AFLCMC/HNCP; Brenna Gordon; Bubb, Mike (mbubb at mitre.org); > DIROCCO, > ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP; Miller, Timothy > J.; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP > Subject: Re: Unified Platform Pod Deploy Errors > > Hello Russell, > > > Can you elaborate on the term Blocked? What specific > issues are the blockers? 
> > Thanks, > > KEVIN O'DONNELL > ARCHITECT MANAGER > Red Hat NA Public Sector Consulting <https://www.redhat.com/> > > kodonnell at redhat.com > M: 240-605-4654 > > On Mon, Dec 2, 2019 at 11:11 AM Kendall, Russell C <Russell.Kendall at mantech.com> wrote: > > Mark, > > Thanks for acknowledging. Please be aware the San Antonio > dev teams working in unified-platform.io <http://unified-platform.io> > are currently blocked. > > V/R, > > Russell C Kendall > > ________________________________________ > From: Mark Nissley > Sent: Monday, December 2, 2019 9:36 AM > To: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP; > Jonathan Rickard; Chris Kuperstein > Cc: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP; Kevin > O'Donnell; Brenna Gordon; Kendall, Russell C; Bubb, Mike (mbubb at mitre.org); > DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP; Miller, Timothy J.; > RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP > Subject: Re: Unified Platform Pod Deploy Errors > > As noted, I don't suspect much got done on this over the > holiday weekend. I did see the ticket, and dropped some details into it. I > also assigned it to @Jonathan Rickard and @Chris Kuperstein. > > It looks like short term solutions have been easy but the > issue is recurring. > > Mark NISSLEY, PMP, CSM, LEAN > PROGRAM MaNAGER & SR technical Project Manager > North American Consulting, Public Sector > M: 850-530-3234 > > Scheduled Training: October 14-18 > > On Mon, Dec 2, 2019 at 10:21 AM ABODUNRIN, ADEMOLA A GG-12 > USAF AFMC AFLCMC/HNCP wrote: > > Mark/Kevin, > > I just heard at the team stand up that we are still > blocked. This is also affecting the AAM team from my investigations. > > Please let me know if there is something we need to do to > move this forward.
> > Most Sincerely, > > Ade Abodunrin, GG-12, USAF > Product Owner (Cybertron & Ginyu Force), Unified Platform > > LevelUP Code Works > Commercial: (210) 890-2113 > NIPR email: ademola.abodunrin at us.af.mil > > ________________________________________ > From: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP > Sent: Wednesday, November 27, 2019 12:58 PM > To: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP <austen.bryan.1 at us.af.mil>; Mark Nissley; Kevin O'Donnell; Brenna Gordon > Cc: Kendall, Russell C; Bubb, Mike (mbubb at mitre.org); DIROCCO, ROGER E GG-13 > USAF AFMC ESC/AFLCMC/HNCP; Miller, Timothy J. <tmiller at mitre.org>; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP <jose.ramirez.50.ctr at us.af.mil> > Subject: Re: Unified Platform Pod Deploy Errors > > Thanks a lot Capt Bryan! Russell created the ticket on > GitLab UP Node Project. > > Most Sincerely, > > Ade Abodunrin, GG-12, USAF > Product Owner (Cybertron & Ginyu Force), Unified Platform > > LevelUP Code Works > Commercial: (210) 890-2113 > NIPR email: ademola.abodunrin at us.af.mil > > ________________________________________ > From: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP <austen.bryan.1 at us.af.mil> > Sent: Wednesday, November 27, 2019 12:56 PM > To: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP <ademola.abodunrin at us.af.mil>; Mark Nissley; Kevin O'Donnell; Brenna Gordon > Cc: Kendall, Russell C; Bubb, Mike (mbubb at mitre.org); DIROCCO, ROGER E GG-13 > USAF AFMC ESC/AFLCMC/HNCP; Miller, Timothy J. <tmiller at mitre.org>; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP <jose.ramirez.50.ctr at us.af.mil> > Subject: RE: Unified Platform Pod Deploy Errors > > Thanks Ade. The team is thin until next week due to the > holidays but I will make sure it is addressed. Were there any issues > submitted to GitLab's UP Node Project on DCCSCR? > > @Mark/Kevin - can we address?
> > -Austen > > From: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP < > ademola.abodunrin at us.af.mil> > > Sent: Wednesday, November 27, 2019 9:51 AM > To: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP < > austen.bryan.1 at us.af.mil> > Cc: Kendall, Russell C ; > Bubb, Mike (mbubb at mitre.org) > Subject: Fw: Unified Platform Pod Deploy Errors > > > > Capt Bryan, > > Please see the explanation on the issue that Ginyu Force > is currently experiencing below. > > > > Most Sincerely, > > Ade Abodunrin, GG-12, USAF > Product Owner (Cybertron & Ginyu Force), Unified Platform > > > LevelUP Code Works > Commercial: (210) 890-2113 > NIPR email: > ademola.abodunrin at us.af.mil > > > > > > ________________________________________ > > From: Kendall, Russell C > Sent: Wednesday, November 27, 2019 9:46 AM > To: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP < > ademola.abodunrin at us.af.mil>; Buffaloe, > Christopher ; Molina, > Toby ; > Crace, Jared E ; SANCHEZ, MARK > GG-13 USAF AFMC AFLCMC/HNCP > Cc: > tmiller at mitre.org < > tmiller at mitre.org> > Subject: [Non-DoD Source] Fw: Unified Platform Pod Deploy > Errors > > > > Gentlemen, > > The application development teams working in the new > GovCloud OCP environment (unified-platform.io > > > ) > are currently blocked in efforts to deploy new pods for > testing, development, and UAT. > > Red Hat and RogueOne SMEs have been notified and have > attempted some fixes starting on Monday 11/25, but at this point have not > been able to provision resources > sufficient to host CCAT and AAM. > > We have taken steps to minimize our footprint (eliminating > demonstration environment, deleting developer namespaces), but this is not > a sustainable approach, > and has only resulted in moderate improvements in cluster > performance. > > Our hope is the U-P.io cluster compute resources can be > increased very soon, so that we may resume normal development activities. 
> Our understanding is that > such a scaling requires a complete redeployment of the > cluster, which is unusual, but an acceptable loss to productivity. If the > cluster can be scaled up over the Thanksgiving holiday, the impact will be > minimal to developers and cluster administrators, > alike. > > We are currently collaborating on solutions on the > following MatterMost channel behind the space camp VPN (link below), and > via the email thread forwarded > (further below). > > > > https://chat.spacecamp.ninja/levelup/channels/unified-platform-node < > https://chat.spacecamp.ninja/levelup/channels/unified-platform-node> < > https://chat.spacecamp.ninja/levelup/channels/unified-platform-node> > > > Please keep me posted on developments and I will > coordinate developer activities with any scheduled platform outages. > > V/R, > Russell C Kendall > > ________________________________________ > > From: Curran, Daniel M > Sent: Monday, November 25, 2019 2:47 PM > To: Jonathan Rickard > Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; > dlystra at redhat.com ; Sison, > Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; > Phil > Soliz; > Buffaloe, > Christopher; Torres, Alexander; Crace, Jared E; Middleton, > Joseph J > Subject: Re: Unified Platform Pod Deploy Errors > > > > Sounds great. Appreciate it. > I'll watch email and Mattermost in case you need more from > us. 
> > -Daniel > > ________________________________________ > > From: Jonathan Rickard > Sent: Monday, November 25, 2019 2:44 PM > To: Curran, Daniel M > Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; > dlystra at redhat.com ; Sison, > Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; > Phil > Soliz; > Buffaloe, > Christopher; Torres, Alexander; Crace, Jared E; Middleton, > Joseph J > Subject: Re: Unified Platform Pod Deploy Errors > > > > Thanks Daniel - > > > > I'll continue to look into the resource issue that you're > seeing - I'd like to identify the root cause and then work with the team to > come up with a solution. > > > > Jonathan Rickard, > RHCA > Principal Consultant, NAPS > Red > Hat Remote - Texas > jonny at redhat.com > > M: 210-862-9739 > > > > > > > > > > > > > > > > On Mon, Nov 25, 2019 at 2:17 PM Curran, Daniel M < > Daniel.Curran at mantech.com> > wrote: > > > Yeah we hit the limit then had AAM kill some of their > projects and then our pods got scheduled. > We've hit the limit again though. Here's an example pod > that cannot be scheduled > > > > > https://cluster.unified-platform.io/console/project/ccat-dev/browse/pods/sso-8-2cjth > < > https://cluster.unified-platform.io/console/project/ccat-dev/browse/pods/sso-8-2cjth> > < > https://cluster.unified-platform.io/console/project/ccat-dev/browse/pods/sso-8-2cjth > > > < > https://cluster.unified-platform.io/console/project/ccat-dev/browse/pods/sso-8-2cjth > > > They're seeing it when their jenkins slaves can't deploy > but it's basically any pod after we hit some limit. 
> > -Daniel > ________________________________________ > > From: Jonathan Rickard > Sent: Monday, November 25, 2019 1:26 PM > To: Curran, Daniel M > Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; > dlystra at redhat.com ; Sison, > Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; > Phil > Soliz; > Buffaloe, > Christopher; Torres, Alexander; Crace, Jared E; Middleton, > Joseph J > Subject: Re: Unified Platform Pod Deploy Errors > > > > Daniel, > > > > I can see that you have 3 mongo pods, 1 chatup and 1 upbot > pod running ... is your app good to go? > > > > Looks like there was an issue with memory on 1 pod, then > some node selector being mismatched - just what i could see in the events... > > > > > > > Jonathan Rickard, > RHCA > Principal Consultant, NAPS > Red > Hat Remote - Texas > jonny at redhat.com > > M: 210-862-9739 > > > > > > > > > > > > > > > > On Mon, Nov 25, 2019 at 12:50 PM Curran, Daniel M < > Daniel.Curran at mantech.com> > wrote: > > > Also, AAM was having similar issues. Looks like they had a > lot of namespaces and scaling down the pods on their deployments didn't > help but actually deleting the namespaces > did. > We have pods scheduling now but I'm adding them and we'd > still like to work through what resource limit we were hitting to avoid > this in the future. > > -Daniel > > ________________________________________ > > From: Curran, Daniel M > Sent: Monday, November 25, 2019 12:25 PM > To: Jonathan Rickard > Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; > dlystra at redhat.com ; Sison, > Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; > Phil > Soliz; > Buffaloe, > Christopher; Torres, Alexander > Subject: Re: Unified Platform Pod Deploy Errors > > > > Thanks, sir. > Most important for us to get working is "ccat-demo" but > it's also happening in "ccat-dev" and "ccat-ci-cd". 
> > -Daniel > ________________________________________ > > From: Jonathan Rickard > Sent: Monday, November 25, 2019 12:22 PM > To: Curran, Daniel M > Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; > dlystra at redhat.com ; Sison, > Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; > Phil > Soliz; > Buffaloe, > Christopher; Torres, Alexander > Subject: Re: Unified Platform Pod Deploy Errors > > > > What's the name of the project you're working in? I'm > going to be back at my laptop in about 30 and will take a look when I get > there. > > > > Is it just the Jenkins pods failing? > > > > > > > > On Mon, Nov 25, 2019, 12:20 PM Curran, Daniel M < > Daniel.Curran at mantech.com> > wrote: > > > Adding Dean and Alex. > Also, sitting in mattermost if anyone needs to get online > and chat for more information. > > -Daniel > > ________________________________________ > > From: Curran, Daniel M > Sent: Monday, November 25, 2019 12:07 PM > To: > jonny at redhat.com ; > > ckuperst at redhat.com ; Mark > Nissley > Cc: Sison, Mark Anthony; Cepeda, Rolando; Kendall, Russell > C; Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher > Subject: Re: Unified Platform Pod Deploy Errors > > > > Adding Kupe and Mark. > > -Daniel > ________________________________________ > > From: Curran, Daniel M > Sent: Monday, November 25, 2019 11:43 AM > To: > jonny at redhat.com > Cc: Sison, Mark Anthony; Cepeda, Rolando; Kendall, Russell > C; Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher > Subject: Unified Platform Pod Deploy Errors > > > > Hey Jonny, > > We met briefly at SpaceCAMP a couple weeks ago when > > > > > cluster.unified-platform.io < > http://cluster.unified-platform.io> > > was stood up. We've been trying to deploy some apps today and so > far today we're getting errors on most (if > not all) of our pods. > > 0/9 nodes are available: 3 Insufficient pods, 6 node(s) > didn't match node selector. > > Is what we're seeing. 
We were thinking some volume > types weren't correct, but some of our pods don't even have volumes attached > and still give us this error (i.e. Jenkins > slaves or web frontends without persistent storage). > Any idea what this could be? We're not running out of > space on the nodes themselves, are we? > We have a demo scheduled for tomorrow at 9:30 AM CST and > are hoping to get a demo env up for them today, but this error came up > unexpectedly. Also, we're here at 500 Navarro > St. in San Antonio if working through this in person is > better/easier. > > Thanks, > Daniel Curran > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed...
Name: image001.png Type: image/png Size: 2127 bytes Desc: not available URL: From mnissley at redhat.com Fri Dec 6 18:13:04 2019 From: mnissley at redhat.com (Mark Nissley) Date: Fri, 6 Dec 2019 12:13:04 -0600 Subject: [Platformone] [EXT] Re: Unified Platform Pod Deploy Errors In-Reply-To: References: <1574703796843.94672@ManTech.com> <1574705271520.5593@ManTech.com> <1574705967080.55870@ManTech.com> <1574706319228.45425@ManTech.com> <1574707851362.16520@ManTech.com> <1574713055636.69810@ManTech.com> <1574714851887.51023@ManTech.com> <1574869597001.92851@ManTech.com> <1575303080504.65964@ManTech.com> <1575311511650.96981@ManTech.com> <6277_1575312368_5DE55BEF_6277_803_1_CA+EGyAB6=xB9Na5Wf-nWY52dwt1200Rk8uh6krw1CXojEaO1PQ@mail.gmail.com> <7E65F8DF-F96B-4657-82A2-4A95715FAC0B@mitre.org> <5A89B24C-3C36-4205-B083-CBFBF71ED122@mitre.org> <3052516616bb44caa350c3c6897bc122@DCCPRDEXCH02.ManTech.com> <406B92B0-26F9-48FD-B533-549CC36B4FB2@mitre.org> <1575472568190.35402@ManTech.com> <1575561314717.40064@ManTech.com> Message-ID: Screenshots are also very helpful! Mark NISSLEY, PMP, CSM, LEAN PROGRAM MaNAGER & SR technical Project Manager North American Consulting, Public Sector M: 850-530-3234 *Scheduled Training: October 14-18* On Fri, Dec 6, 2019 at 12:07 PM Jonathan Rickard wrote: > Ade, > > What does that mean? You can't login, you can't deploy? > > Jonathan Rickard, RHCE, RHCA > > Consulting Architect > > Red Hat Public Sector > > jonny at redhat.com > M: 210.862.9739 > @redhatjobs redhatjobs > @redhatjobs > > > > > On Fri, Dec 6, 2019 at 12:02 PM ABODUNRIN, ADEMOLA A GG-12 USAF AFMC > AFLCMC/HNCP wrote: > >> ALCON, >> >> >> >> The cluster is down again. Please assist. 
>> >> >> >> Most Sincerely, >> >> >> >> Ade Abodunrin, GG-12, USAF >> >> Product Owner (Cybertron & Ginyu Force), Unified Platform >> >> >> >> [image: cid:image001.png at 01D4F814.4AA552D0] >> >> LevelUP Code Works >> >> Commercial: (210) 890-2113 >> >> NIPR email: *ademola.abodunrin at us.af.mil * >> >> >> >> *From:* Kendall, Russell C >> *Sent:* Thursday, December 5, 2019 9:55 AM >> *To:* Jonathan Rickard >> *Cc:* Miller, Timothy J. ; Keegan Reap < >> kreap at redhat.com>; Bubb, Mike ; platformONE at redhat.com; >> RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP ; >> ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP < >> ademola.abodunrin at us.af.mil>; Jonathan Rickard >> *Subject:* [Non-DoD Source] Re: [Platformone] [EXT] Re: Unified Platform >> Pod Deploy Errors >> >> >> >> Jonny, >> >> I'll see you Friday at 500 Nav. Travel safe. >> >> >> >> V/R, >> >> Russell C Kendall? >> >> >> ------------------------------ >> >> *From:* Jonathan Rickard >> *Sent:* Wednesday, December 4, 2019 5:29 PM >> *To:* Kendall, Russell C >> *Cc:* Miller, Timothy J.; Keegan Reap; Bubb, Mike; platformONE at redhat.com; >> RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP; ABODUNRIN, ADEMOLA A GG-12 USAF >> AFMC AFLCMC/HNCP; Jonathan Rickard >> *Subject:* Re: [Platformone] [EXT] Re: Unified Platform Pod Deploy Errors >> >> >> >> Russell, >> >> >> >> I have definitely been terrible with email lately and I apologize for the >> slow response times. I get back to San Antonio tomorrow but I have a pretty >> full afternoon. I can stop by Friday if you'd like. 
>> >> >> >> Thanks, >> >> jonny >> >> >> >> *Jonathan Rickard**, RHCA* >> >> Principal Consultant, NAPS >> >> Red Hat Remote - Texas >> >> jonny at redhat.com >> M: 210-862-9739 >> >> >> >> >> >> >> >> On Wed, Dec 4, 2019 at 10:16 AM Kendall, Russell C < >> Russell.Kendall at mantech.com> wrote: >> >> Jonny, >> I'd like to suggest you come to 500 to wrap this up, since it seems there >> are significant delays in communication that are contributing to downtime. >> V/R, >> Russell C Kendall >> ________________________________________ >> From: Miller, Timothy J. >> Sent: Wednesday, December 4, 2019 7:02 AM >> To: Jonathan Rickard; Keegan Reap >> Cc: Bubb, Mike; platformONE at redhat.com; Kendall, Russell C; RAMIREZ, >> JOSE A CTR USAF AFMC AFLCMC/HNCP; ABODUNRIN, ADEMOLA A GG-12 USAF AFMC >> AFLCMC/HNCP; Jonathan Rickard >> Subject: Re: [Platformone] [EXT] Re: Unified Platform Pod Deploy Errors >> >> Johnny-- >> >> Update the issue, if you would be so kind. >> >> -- T >> >> ?On 12/3/19, 18:00, "platformone-bounces at redhat.com on behalf of >> Jonathan Rickard" > jrickard at redhat.com> wrote: >> >> Hey Guys - Sorry for taking so long - this has been completed. Please >> run your builds and let us know if you're having any problems. >> jonny >> Jonathan Rickard, RHCA >> Principal Consultant, NAPS >> Red Hat Remote - Texas >> >> jonny at redhat.com >> M: 210-862-9739 > >> >> >> >> >> >> >> >> >> >> >> >> >> >> On Tue, Dec 3, 2019 at 3:47 PM Jonathan Rickard >> wrote: >> >> >> Russell / Team, >> >> >> We believe we've identified the issue with your application >> deploying. In order to rectify the issue I need to evacuate pods so you >> will probably see some hiccups while deploying. I will update when this is >> resolved. 
>> >> >> Thanks, >> jonny >> >> Jonathan Rickard, RHCA >> Principal Consultant, NAPS >> Red Hat Remote - Texas >> >> jonny at redhat.com >> M: 210-862-9739 > >> >> >> >> >> >> >> >> >> >> >> >> >> >> On Tue, Dec 3, 2019 at 12:53 PM Keegan Reap wrote: >> >> >> Hey all, we have opened an issue below, that we believe to be the >> cause, we are currently investigating: >> >> >> https://dccscr.dsop.io/ginyu-force/up-iatt/issues/32 >> >> >> >> On Tue, Dec 3, 2019 at 11:27 AM Jonathan Rickard >> wrote: >> >> >> Russell, >> >> >> Getting more eyes on this @platformONE at redhat.com > platformONE at redhat.com> >> >> >> We'll keep you posted. >> jonny >> Jonathan Rickard, RHCA >> Principal Consultant, NAPS >> Red Hat Remote - Texas >> >> jonny at redhat.com >> M: 210-862-9739 > >> >> >> >> >> >> >> >> >> >> >> >> >> >> On Tue, Dec 3, 2019 at 11:59 AM Kendall, Russell C < >> Russell.Kendall at mantech.com> wrote: >> >> >> Kevin, >> >> Unfortunately we are receiving deployment errors again. This is the >> event: >> >> 0/9 nodes are available: 1 node(s) had disk pressure, 2 node(s) had >> taints that the pod didn't tolerate, 6 node(s) didn't match node selector. >> >> This is the deployment: >> >> >> https://cluster.unified-platform.io/console/project/ccat-dev/browse/dc/driveup >> >> >> V/R, >> Russell C Kendall >> ________________________________________ >> From: Miller, Timothy J. >> Sent: Monday, December 2, 2019 2:44:21 PM >> To: Kevin O'Donnell >> Cc: Kendall, Russell C; Mark Nissley; ABODUNRIN, ADEMOLA A GG-12 USAF >> AFMC AFLCMC/HNCP; Jonathan Rickard; Chris Kuperstein; BRYAN, AUSTEN R Capt >> USAF AFMC AFLCMC/HNCP; Brenna Gordon; Bubb, Mike; DIROCCO, ROGER E GG-13 >> USAF AFMC ESC/AFLCMC/HNCP; RAMIREZ, JOSE >> A CTR USAF AFMC AFLCMC/HNCP >> Subject: Re: [EXT] Re: Unified Platform Pod Deploy Errors >> >> Tagged you on it. >> >> -- T >> >> On 12/2/19, 14:03, "Kevin O'Donnell" wrote: >> >> Hello, >> >> >> Autoscaling is on our future IAC roadmap. 
Tim, the additional >> ticket would be appreciated. >> >> >> We have swapped out the app/worker instances with m5a.8xlarge 32 >> cores, 128gb of ram. Please let us know if you have any other issues. >> >> >> Thanks, >> >> KEVIN O'DONNELL >> ARCHITECT MANAGER >> Red Hat Red Hat NA Public Sector Consulting < >> https://www.redhat.com/> >> >> kodonnell at redhat.com >> M: 240-605-4654 >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> On Mon, Dec 2, 2019 at 2:29 PM Miller, Timothy J. < >> tmiller at mitre.org> wrote: >> >> >> I'll open an issue. IaC needs to have instance size as a >> host_var to facilitate scaling. >> >> -- T >> >> On 12/2/19, 13:15, "Kevin O'Donnell" wrote: >> >> Tim, >> >> >> Thanks for the information. We are undersized on the >> app/worker nodes, the cluster has 3 and they are m5.xlarge with only 16gb >> of ram. From what I have read each Labs engagement operated on a 3 node >> worker cluster with each node having 6core's and 28gb >> of ram. We will need to swap out the existing instances with >> larger spec's. >> >> >> We are going to try to flush the existing workload out on one >> of the workers to see if we can swap them out one at a time. >> >> >> Thanks, >> >> KEVIN O'DONNELL >> ARCHITECT MANAGER >> Red Hat Red Hat NA Public Sector Consulting < >> https://www.redhat.com/> >> >> kodonnell at redhat.com >> M: 240-605-4654 >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> On Mon, Dec 2, 2019 at 2:07 PM Miller, Timothy J. < >> tmiller at mitre.org> wrote: >> >> >> Here's what I can see, given the perm limits I seem to be >> under: >> >> - NS:develop-misp-app and NS:lp-develop-misp-app both have >> several sync jobs in CrashLoopBackoff b/c of an HTTP 401 error returned >> while trying to fetch something from somewhere (URL isn't recorded in the >> stack trace). >> >> - NS:minishift-misp-app has most of its pods/jobs stuck in >> ImagePullBackoff. No detail there in the event stream so I'll see if I can >> dig deeper. 
>> >> - NS:aam-ci-cd has Jenkins trying to spin up three workers, >> those are coming back as unschedulable. >> >> I can't see into NS:aam-bases or NS:dsop-images b/c of perm >> limits. >> >> I see no DAS-related project(s). >> >> The MISP stuff needs debugging before calling "blocked" since >> it looks like an internal error from this perspective. >> >> >> >> In re: AAM Jenkins: If this deployment is coming out of the >> OCP storefront, then maybe it should be ephemeral rather than persistent. >> If it's a custom deployment, then it probably needs a rethink. >> >> I'm also not sure why there are two MISP dev projects. >> >> -- T >> >> >> >> On 12/2/19, 12:46, "Kevin O'Donnell" >> wrote: >> >> Russell, >> >> >> Thank you for the information. We can switch out the >> instance type for the worker nodes. How much memory is required by the apps? >> >> >> >> Thanks, >> >> KEVIN O'DONNELL >> ARCHITECT MANAGER >> Red Hat Red Hat NA Public Sector Consulting < >> https://www.redhat.com/> >> >> kodonnell at redhat.com >> M: 240-605-4654 >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> On Mon, Dec 2, 2019 at 1:32 PM Kendall, Russell C < >> Russell.Kendall at mantech.com> wrote: >> >> >> Kevin, >> The lack of resources on >> u-p.io < >> http://u-p.io> >> cluster is hindering development, >> testing, and integration of the apps from CCAT AAM DAS, which is >> putting one >> of our PI goals at risk. >> >> >> We are blocked by the fact that we (CCAT and AAM) cannot >> deploy additional pods to the >> >> unified-platform.io < >> http://unified-platform.io> < >> http://unified-platform.io> >> cluster. We have a subset of containers deployed, but rolling >> deployments and new deployments fail. This means that we are >> not able to execute integration testing or peer reviews. >> We are temporarily working around by NOT >> testing/reviewing our code changes live, something that no one likes. 
Also, >> we are now running weeks-old instances of our containers, so we are very >> likely producing some technical debt. We currently have >> developers >> approaching idle or doing non-priority work until the >> resource issue is resolved. >> >> >> >> Here is the particular error from the OSP cluster I >> received while attempting a redeploy of one of our apps. >> >> >> >> 0/9 nodes are available: 1 node(s) had taints that the >> pod didn't tolerate, 2 Insufficient pods, 6 node(s) didn't match node >> selector.11 times in the last minute >> >> Since we do not have any cluster permissions, I cannot >> verify which resource is running out, but from experience, I assess it is a >> memory issue. >> >> >> >> It appears the cluster has been provisioned with a silly >> allocation of node types. Without knowing exactly what was deployed, it >> appears only 3 of the 9 hosts are suitable worker nodes. We would expect >> the cluster to respond to resource limitations >> and >> scale, >> but if a scheduled downtime is required, please work >> with us so we can anticipate. As it stands, the cluster does not support >> resources required by CCAT and the other dev teams (AAM, DAS, etc.). We >> would accept any downtime if it will improve the >> situation, >> as we are blocked from progressing under the current >> constraints. My hope was we could get the cluster redeployed over the TG >> holiday to eliminate developer impact, but as Mark pointed out, there were >> limited support folks available. Now I am just >> trying >> to >> minimize the losses. 
>> >> >> >> V/R, >> >> Russell C Kendall >> >> >> >> >> >> ________________________________________ >> From: Kevin O'Donnell >> Sent: Monday, December 2, 2019 11:52 AM >> To: Kendall, Russell C >> Cc: Mark Nissley; ABODUNRIN, ADEMOLA A GG-12 USAF AFMC >> AFLCMC/HNCP; Jonathan Rickard; Chris Kuperstein; BRYAN, AUSTEN R Capt USAF >> AFMC AFLCMC/HNCP; Brenna Gordon; Bubb, Mike (mbubb at mitre.org); >> DIROCCO, >> ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP; Miller, Timothy >> J.; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP >> Subject: Re: Unified Platform Pod Deploy Errors >> >> Hello Russell, >> >> >> Can you elaborate on the term Blocked? What specific >> issues are the blockers? >> >> >> >> Thanks, >> >> KEVIN O'DONNELL >> ARCHITECT MANAGER >> Red Hat Red Hat NA Public Sector Consulting < >> https://www.redhat.com/> >> >> kodonnell at redhat.com >> M: 240-605-4654 >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> On Mon, Dec 2, 2019 at 11:11 AM Kendall, Russell C < >> Russell.Kendall at mantech.com> wrote: >> >> >> Mark, >> >> Thank for acknowledging, please be aware the San Antonio >> dev teams working in >> >> >> unified-platform.io < >> http://unified-platform.io> < >> http://unified-platform.io> >> are currently blocked. >> >> V/R, >> >> Russell C Kendall >> >> ________________________________________ >> From: Mark Nissley >> Sent: Monday, December 2, 2019 9:36 AM >> To: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP; >> Jonathan Rickard; Chris Kuperstein >> Cc: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP; Kevin >> O'Donnell; Brenna Gordon; Kendall, Russell C; Bubb, Mike (mbubb at mitre.org); >> DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP; Miller, Timothy >> J.; >> RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP >> Subject: Re: Unified Platform Pod Deploy Errors >> >> As noted, I don't suspect much got done on this over the >> holiday weekend. I did see the ticket, as dropped some details into it. 
I >> also assigned it to @Jonathan >> Rickard and @Chris Kuperstein >> . >> >> >> >> It looks like short term solutions have been easy but the >> issue is recurring. >> >> >> >> >> Mark NISSLEY, PMP, >> CSM, LEAN >> >> PROGRAM MaNAGER & SR technical Project Manager >> North American Consulting, Public Sector >> >> M: >> 850-530-3234 >> >> >> Scheduled Training: October 14-18 >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> On Mon, Dec 2, 2019 at 10:21 AM ABODUNRIN, ADEMOLA A >> GG-12 USAF AFMC AFLCMC/HNCP wrote: >> >> >> Mark/Kevin, >> >> >> I just heard at the team stand up that we are still >> blocked. This is also affecting the AAM team from my investigations. >> >> >> Please let me know if there is something we need to do to >> move this forward. >> >> Most Sincerely, >> >> >> Ade Abodunrin, GG-12, USAF >> Product Owner (Cybertron & Ginyu Force), Unified Platform >> >> >> >> LevelUP Code Works >> Commercial: >> (210) 890-2113 >> NIPR email: >> ademola.abodunrin at us.af.mil >> >> >> >> >> >> >> >> >> ________________________________________ >> From: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP >> Sent: Wednesday, November 27, 2019 12:58 PM >> To: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP < >> austen.bryan.1 at us.af.mil>; Mark Nissley ; Kevin >> O'Donnell >> ; >> Brenna Gordon >> Cc: Kendall, Russell C ; >> Bubb, Mike (mbubb at mitre.org) ; DIROCCO, ROGER E GG-13 >> USAF AFMC ESC/AFLCMC/HNCP >> ; Miller, Timothy J. < >> tmiller at mitre.org>; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP < >> jose.ramirez.50.ctr at us.af.mil> >> Subject: Re: Unified Platform Pod Deploy Errors >> >> Thanks a lot Capt Bryan! Russell created the ticket on >> GitLab UP Node Project. 
>> >> Most Sincerely, >> >> Ade Abodunrin, GG-12, USAF >> Product Owner (Cybertron & Ginyu Force), Unified Platform >> >> LevelUP Code Works >> Commercial: (210) 890-2113 >> NIPR email: ademola.abodunrin at us.af.mil >> >> ________________________________________ >> From: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP < >> austen.bryan.1 at us.af.mil> >> Sent: Wednesday, November 27, 2019 12:56 PM >> To: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP < >> ademola.abodunrin at us.af.mil>; Mark Nissley; Kevin O'Donnell; Brenna Gordon >> Cc: Kendall, Russell C; >> Bubb, Mike (mbubb at mitre.org); DIROCCO, ROGER E GG-13 >> USAF AFMC ESC/AFLCMC/HNCP; Miller, Timothy J. < >> tmiller at mitre.org>; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP < >> jose.ramirez.50.ctr at us.af.mil> >> Subject: RE: Unified Platform Pod Deploy Errors >> >> Thanks Ade. The team is thin until next week due to the >> holidays but I will make sure it is addressed. Were there any issues >> submitted to GitLab's UP Node Project on DCCSCR? >> >> @Mark/Kevin - can we address? >> >> -Austen >> >> From: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP < >> ademola.abodunrin at us.af.mil> >> Sent: Wednesday, November 27, 2019 9:51 AM >> To: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP < >> austen.bryan.1 at us.af.mil> >> Cc: Kendall, Russell C; >> Bubb, Mike (mbubb at mitre.org) >> Subject: Fw: Unified Platform Pod Deploy Errors >> >> Capt Bryan, >> >> Please see the explanation on the issue that Ginyu Force >> is currently experiencing below. 
>> >> >> >> Most Sincerely, >> >> Ade Abodunrin, GG-12, USAF >> Product Owner (Cybertron & Ginyu Force), Unified Platform >> >> >> LevelUP Code Works >> Commercial: (210) 890-2113 >> NIPR email: >> ademola.abodunrin at us.af.mil >> >> >> >> >> >> ________________________________________ >> >> From: Kendall, Russell C >> Sent: Wednesday, November 27, 2019 9:46 AM >> To: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP < >> ademola.abodunrin at us.af.mil>; Buffaloe, >> Christopher ; Molina, >> Toby ; >> Crace, Jared E ; SANCHEZ, MARK >> GG-13 USAF AFMC AFLCMC/HNCP >> Cc: >> tmiller at mitre.org < >> tmiller at mitre.org> >> Subject: [Non-DoD Source] Fw: Unified Platform Pod Deploy >> Errors >> >> >> >> Gentlemen, >> >> The application development teams working in the new >> GovCloud OCP environment (unified-platform.io >> >> >> ) >> are currently blocked in efforts to deploy new pods for >> testing, development, and UAT. >> >> Red Hat and RogueOne SMEs have been notified and have >> attempted some fixes starting on Monday 11/25, but at this point have not >> been able to provision resources >> sufficient to host CCAT and AAM. >> >> We have taken steps to minimize our footprint >> (eliminating demonstration environment, deleting developer namespaces), but >> this is not a sustainable approach, >> and has only resulted in moderate improvements in >> cluster performance. >> >> Our hope is the U-P.io cluster compute resources can be >> increased very soon, so that we may resume normal development activities. >> Our understanding is that >> such a scaling requires a complete redeployment of the >> cluster, which is unusual, but an acceptable loss to productivity. If the >> cluster can be scaled up over the Thanksgiving holiday, the impact will be >> minimal to developers and cluster administrators, >> alike. 
>> >> We are currently collaborating on solutions on the >> following MatterMost channel behind the space camp VPN (link below), and >> via the email thread forwarded >> (further below). >> >> >> >> https://chat.spacecamp.ninja/levelup/channels/unified-platform-node < >> https://chat.spacecamp.ninja/levelup/channels/unified-platform-node> < >> https://chat.spacecamp.ninja/levelup/channels/unified-platform-node> >> > > >> >> Please keep me posted on developments and I will >> coordinate developer activities with any scheduled platform outages. >> >> V/R, >> Russell C Kendall >> >> ________________________________________ >> >> From: Curran, Daniel M >> Sent: Monday, November 25, 2019 2:47 PM >> To: Jonathan Rickard >> Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; >> dlystra at redhat.com ; Sison, >> Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; >> Phil >> Soliz; >> Buffaloe, >> Christopher; Torres, Alexander; Crace, Jared E; Middleton, >> Joseph J >> Subject: Re: Unified Platform Pod Deploy Errors >> >> >> >> Sounds great. Appreciate it. >> I'll watch email and Mattermost in case you need more >> from us. >> >> -Daniel >> >> ________________________________________ >> >> From: Jonathan Rickard >> Sent: Monday, November 25, 2019 2:44 PM >> To: Curran, Daniel M >> Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; >> dlystra at redhat.com ; Sison, >> Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; >> Phil >> Soliz; >> Buffaloe, >> Christopher; Torres, Alexander; Crace, Jared E; Middleton, >> Joseph J >> Subject: Re: Unified Platform Pod Deploy Errors >> >> >> >> Thanks Daniel - >> >> >> >> I'll continue to look into the resource issue that you're >> seeing - I'd like to identify the root cause and then work with the team to >> come up with a solution. 
>> >> >> >> Jonathan Rickard, >> RHCA >> Principal Consultant, NAPS >> Red >> Hat Remote - Texas >> jonny at redhat.com >> >> M: 210-862-9739 > >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> On Mon, Nov 25, 2019 at 2:17 PM Curran, Daniel M < >> Daniel.Curran at mantech.com> >> wrote: >> >> >> Yeah we hit the limit then had AAM kill some of their >> projects and then our pods got scheduled. >> We've hit the limit again though. Here's an example pod >> that cannot be scheduled >> >> >> >> >> https://cluster.unified-platform.io/console/project/ccat-dev/browse/pods/sso-8-2cjth >> < >> https://cluster.unified-platform.io/console/project/ccat-dev/browse/pods/sso-8-2cjth> >> < >> https://cluster.unified-platform.io/console/project/ccat-dev/browse/pods/sso-8-2cjth >> > >> < >> https://cluster.unified-platform.io/console/project/ccat-dev/browse/pods/sso-8-2cjth >> > >> They're seeing it when their jenkins slaves can't deploy >> but it's basically any pod after we hit some limit. >> >> -Daniel >> ________________________________________ >> >> From: Jonathan Rickard >> Sent: Monday, November 25, 2019 1:26 PM >> To: Curran, Daniel M >> Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; >> dlystra at redhat.com ; Sison, >> Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; >> Phil >> Soliz; >> Buffaloe, >> Christopher; Torres, Alexander; Crace, Jared E; Middleton, >> Joseph J >> Subject: Re: Unified Platform Pod Deploy Errors >> >> >> >> Daniel, >> >> >> >> I can see that you have 3 mongo pods, 1 chatup and 1 >> upbot pod running ... is your app good to go? >> >> >> >> Looks like there was an issue with memory on 1 pod, then >> some node selector being mismatched - just what i could see in the events... 
>> >> >> >> >> >> >> Jonathan Rickard, >> RHCA >> Principal Consultant, NAPS >> Red >> Hat Remote - Texas >> jonny at redhat.com >> >> M: 210-862-9739 > >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> On Mon, Nov 25, 2019 at 12:50 PM Curran, Daniel M < >> Daniel.Curran at mantech.com> >> wrote: >> >> >> Also, AAM was having similar issues. Looks like they had >> a lot of namespaces and scaling down the pods on their deployments didn't >> help but actually deleting the namespaces >> did. >> We have pods scheduling now but I'm adding them and we'd >> still like to work through what resource limit we were hitting to avoid >> this in the future. >> >> -Daniel >> >> ________________________________________ >> >> From: Curran, Daniel M >> Sent: Monday, November 25, 2019 12:25 PM >> To: Jonathan Rickard >> Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; >> dlystra at redhat.com ; Sison, >> Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; >> Phil >> Soliz; >> Buffaloe, >> Christopher; Torres, Alexander >> Subject: Re: Unified Platform Pod Deploy Errors >> >> >> >> Thanks, sir. >> Most important for us to get working is "ccat-demo" but >> it's also happening in "ccat-dev" and "ccat-ci-cd". >> >> -Daniel >> ________________________________________ >> >> From: Jonathan Rickard >> Sent: Monday, November 25, 2019 12:22 PM >> To: Curran, Daniel M >> Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; >> dlystra at redhat.com ; Sison, >> Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; >> Phil >> Soliz; >> Buffaloe, >> Christopher; Torres, Alexander >> Subject: Re: Unified Platform Pod Deploy Errors >> >> >> >> What's the name of the project you're working in? I'm >> going to be back at my laptop in about 30 and will take a look when I get >> there. >> >> >> >> Is it just the Jenkins pods failing? 
>> >> >> >> >> >> >> >> On Mon, Nov 25, 2019, 12:20 PM Curran, Daniel M < >> Daniel.Curran at mantech.com> >> wrote: >> >> >> Adding Dean and Alex. >> Also, sitting in mattermost if anyone needs to get online >> and chat for more information. >> >> -Daniel >> >> ________________________________________ >> >> From: Curran, Daniel M >> Sent: Monday, November 25, 2019 12:07 PM >> To: >> jonny at redhat.com ; >> >> ckuperst at redhat.com ; Mark >> Nissley >> Cc: Sison, Mark Anthony; Cepeda, Rolando; Kendall, >> Russell C; Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher >> Subject: Re: Unified Platform Pod Deploy Errors >> >> >> >> Adding Kupe and Mark. >> >> -Daniel >> ________________________________________ >> >> From: Curran, Daniel M >> Sent: Monday, November 25, 2019 11:43 AM >> To: >> jonny at redhat.com >> Cc: Sison, Mark Anthony; Cepeda, Rolando; Kendall, >> Russell C; Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher >> Subject: Unified Platform Pod Deploy Errors >> >> >> >> Hey Jonny, >> >> We met briefly at SpaceCAMP a couple weeks ago when >> >> >> >> >> cluster.unified-platform.io < >> http://cluster.unified-platform.io> >> >> was stood up. We've been trying to deploy some apps today and so >> far today we're getting errors on most (if >> not all) of our pods. >> >> 0/9 nodes are available: 3 Insufficient pods, 6 node(s) >> didn't match node selector. >> >> Is what we're seeing. We were thinking it was some volume >> types weren't correct but some of our pods don't even have volumes attached >> and still give us this error (i.e. Jenkins >> slaves or web frontends without persistent storage). >> Any idea what this could be? We're not running out of >> space on the nodes themselves are we? >> We have a demo scheduled for tomorrow at 9:30 AM CST and >> are hoping to get a demo env up for them today but this error came up >> unexpectedly. Also, we're here at 500 Navarro >> St. in San Antonio working through this in person is >> better/easier. 
>> >> Thanks, >> Daniel Curran >> >> ________________________________________ >> >> This e-mail and any attachments are intended only for the >> use of the addressee(s) named herein and may contain proprietary >> information. If you are not the intended recipient of this e-mail or >> believe that you received this email in error, please take immediate >> action to notify the sender of the apparent error by >> reply e-mail; permanently delete the e-mail and any attachments from your >> computer; and do not disseminate, distribute, use, or copy this message and >> any attachments. >> >> _______________________________________________ >> platformONE mailing list >> platformONE at redhat.com >> https://www.redhat.com/mailman/listinfo/platformone >> >> _______________________________________________ > platformONE mailing list > platformONE at redhat.com > https://www.redhat.com/mailman/listinfo/platformone > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: image001.png Type: image/png Size: 2127 bytes Desc: not available URL: From jrickard at redhat.com Fri Dec 6 18:13:29 2019 From: jrickard at redhat.com (Jonathan Rickard) Date: Fri, 6 Dec 2019 12:13:29 -0600 Subject: [Platformone] [EXT] Re: Unified Platform Pod Deploy Errors In-Reply-To: References: <1574703796843.94672@ManTech.com> <1574705271520.5593@ManTech.com> <1574705967080.55870@ManTech.com> <1574706319228.45425@ManTech.com> <1574707851362.16520@ManTech.com> <1574713055636.69810@ManTech.com> <1574714851887.51023@ManTech.com> <1574869597001.92851@ManTech.com> <1575303080504.65964@ManTech.com> <1575311511650.96981@ManTech.com> <6277_1575312368_5DE55BEF_6277_803_1_CA+EGyAB6=xB9Na5Wf-nWY52dwt1200Rk8uh6krw1CXojEaO1PQ@mail.gmail.com> <7E65F8DF-F96B-4657-82A2-4A95715FAC0B@mitre.org> <5A89B24C-3C36-4205-B083-CBFBF71ED122@mitre.org> <3052516616bb44caa350c3c6897bc122@DCCPRDEXCH02.ManTech.com> <406B92B0-26F9-48FD-B533-549CC36B4FB2@mitre.org> <1575472568190.35402@ManTech.com> <1575561314717.40064@ManTech.com> Message-ID: Also, is every application having problems or a specific? Jonathan Rickard, RHCE, RHCA Consulting Architect Red Hat Public Sector jonny at redhat.com M: 210.862.9739 @redhatjobs redhatjobs @redhatjobs On Fri, Dec 6, 2019 at 12:06 PM Jonathan Rickard wrote: > Ade, > > What does that mean? You can't login, you can't deploy? > > Jonathan Rickard, RHCE, RHCA > > Consulting Architect > > Red Hat Public Sector > > jonny at redhat.com > M: 210.862.9739 > @redhatjobs redhatjobs > @redhatjobs > > > > > On Fri, Dec 6, 2019 at 12:02 PM ABODUNRIN, ADEMOLA A GG-12 USAF AFMC > AFLCMC/HNCP wrote: > >> ALCON, >> >> >> >> The cluster is down again. Please assist. 
>> >> >> >> Most Sincerely, >> >> >> >> Ade Abodunrin, GG-12, USAF >> >> Product Owner (Cybertron & Ginyu Force), Unified Platform >> >> >> >> LevelUP Code Works >> >> Commercial: (210) 890-2113 >> >> NIPR email: ademola.abodunrin at us.af.mil >> >> >> >> *From:* Kendall, Russell C >> *Sent:* Thursday, December 5, 2019 9:55 AM >> *To:* Jonathan Rickard >> *Cc:* Miller, Timothy J.; Keegan Reap < >> kreap at redhat.com>; Bubb, Mike; platformONE at redhat.com; >> RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP; >> ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP < >> ademola.abodunrin at us.af.mil>; Jonathan Rickard >> *Subject:* [Non-DoD Source] Re: [Platformone] [EXT] Re: Unified Platform >> Pod Deploy Errors >> >> >> >> Jonny, >> >> I'll see you Friday at 500 Nav. Travel safe. >> >> >> >> V/R, >> >> Russell C Kendall >> >> >> ------------------------------ >> >> *From:* Jonathan Rickard >> *Sent:* Wednesday, December 4, 2019 5:29 PM >> *To:* Kendall, Russell C >> *Cc:* Miller, Timothy J.; Keegan Reap; Bubb, Mike; platformONE at redhat.com; >> RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP; ABODUNRIN, ADEMOLA A GG-12 USAF >> AFMC AFLCMC/HNCP; Jonathan Rickard >> *Subject:* Re: [Platformone] [EXT] Re: Unified Platform Pod Deploy Errors >> >> >> >> Russell, >> >> >> >> I have definitely been terrible with email lately and I apologize for the >> slow response times. I get back to San Antonio tomorrow but I have a pretty >> full afternoon. I can stop by Friday if you'd like. 
>> >> >> >> Thanks, >> >> jonny >> >> >> >> *Jonathan Rickard, RHCA* >> >> Principal Consultant, NAPS >> >> Red Hat Remote - Texas >> >> jonny at redhat.com >> M: 210-862-9739 >> >> On Wed, Dec 4, 2019 at 10:16 AM Kendall, Russell C < >> Russell.Kendall at mantech.com> wrote: >> >> Jonny, >> I'd like to suggest you come to 500 to wrap this up, since it seems there >> are significant delays in communication that are contributing to downtime. >> V/R, >> Russell C Kendall >> ________________________________________ >> From: Miller, Timothy J. >> Sent: Wednesday, December 4, 2019 7:02 AM >> To: Jonathan Rickard; Keegan Reap >> Cc: Bubb, Mike; platformONE at redhat.com; Kendall, Russell C; RAMIREZ, >> JOSE A CTR USAF AFMC AFLCMC/HNCP; ABODUNRIN, ADEMOLA A GG-12 USAF AFMC >> AFLCMC/HNCP; Jonathan Rickard >> Subject: Re: [Platformone] [EXT] Re: Unified Platform Pod Deploy Errors >> >> Johnny-- >> >> Update the issue, if you would be so kind. >> >> -- T >> >> On 12/3/19, 18:00, "platformone-bounces at redhat.com on behalf of >> Jonathan Rickard" > jrickard at redhat.com> wrote: >> >> Hey Guys - Sorry for taking so long - this has been completed. Please >> run your builds and let us know if you're having any problems. >> jonny >> Jonathan Rickard, RHCA >> Principal Consultant, NAPS >> Red Hat Remote - Texas >> >> jonny at redhat.com >> M: 210-862-9739 >> >> On Tue, Dec 3, 2019 at 3:47 PM Jonathan Rickard >> wrote: >> >> Russell / Team, >> >> We believe we've identified the issue with your application >> deploying. In order to rectify the issue I need to evacuate pods so you >> will probably see some hiccups while deploying. I will update when this is >> resolved. 
>> >> >> Thanks, >> jonny >> >> Jonathan Rickard, RHCA >> Principal Consultant, NAPS >> Red Hat Remote - Texas >> >> jonny at redhat.com >> M: 210-862-9739 > >> >> >> >> >> >> >> >> >> >> >> >> >> >> On Tue, Dec 3, 2019 at 12:53 PM Keegan Reap wrote: >> >> >> Hey all, we have opened an issue below, that we believe to be the >> cause, we are currently investigating: >> >> >> https://dccscr.dsop.io/ginyu-force/up-iatt/issues/32 >> >> >> >> On Tue, Dec 3, 2019 at 11:27 AM Jonathan Rickard >> wrote: >> >> >> Russell, >> >> >> Getting more eyes on this @platformONE at redhat.com > platformONE at redhat.com> >> >> >> We'll keep you posted. >> jonny >> Jonathan Rickard, RHCA >> Principal Consultant, NAPS >> Red Hat Remote - Texas >> >> jonny at redhat.com >> M: 210-862-9739 > >> >> >> >> >> >> >> >> >> >> >> >> >> >> On Tue, Dec 3, 2019 at 11:59 AM Kendall, Russell C < >> Russell.Kendall at mantech.com> wrote: >> >> >> Kevin, >> >> Unfortunately we are receiving deployment errors again. This is the >> event: >> >> 0/9 nodes are available: 1 node(s) had disk pressure, 2 node(s) had >> taints that the pod didn't tolerate, 6 node(s) didn't match node selector. >> >> This is the deployment: >> >> >> https://cluster.unified-platform.io/console/project/ccat-dev/browse/dc/driveup >> >> >> V/R, >> Russell C Kendall >> ________________________________________ >> From: Miller, Timothy J. >> Sent: Monday, December 2, 2019 2:44:21 PM >> To: Kevin O'Donnell >> Cc: Kendall, Russell C; Mark Nissley; ABODUNRIN, ADEMOLA A GG-12 USAF >> AFMC AFLCMC/HNCP; Jonathan Rickard; Chris Kuperstein; BRYAN, AUSTEN R Capt >> USAF AFMC AFLCMC/HNCP; Brenna Gordon; Bubb, Mike; DIROCCO, ROGER E GG-13 >> USAF AFMC ESC/AFLCMC/HNCP; RAMIREZ, JOSE >> A CTR USAF AFMC AFLCMC/HNCP >> Subject: Re: [EXT] Re: Unified Platform Pod Deploy Errors >> >> Tagged you on it. >> >> -- T >> >> On 12/2/19, 14:03, "Kevin O'Donnell" wrote: >> >> Hello, >> >> >> Autoscaling is on our future IAC roadmap. 
Tim, the additional >> ticket would be appreciated. >> >> >> We have swapped out the app/worker instances with m5a.8xlarge 32 >> cores, 128gb of ram. Please let us know if you have any other issues. >> >> >> Thanks, >> >> KEVIN O'DONNELL >> ARCHITECT MANAGER >> Red Hat Red Hat NA Public Sector Consulting < >> https://www.redhat.com/> >> >> kodonnell at redhat.com >> M: 240-605-4654 >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> On Mon, Dec 2, 2019 at 2:29 PM Miller, Timothy J. < >> tmiller at mitre.org> wrote: >> >> >> I'll open an issue. IaC needs to have instance size as a >> host_var to facilitate scaling. >> >> -- T >> >> On 12/2/19, 13:15, "Kevin O'Donnell" wrote: >> >> Tim, >> >> >> Thanks for the information. We are undersized on the >> app/worker nodes, the cluster has 3 and they are m5.xlarge with only 16gb >> of ram. From what I have read each Labs engagement operated on a 3 node >> worker cluster with each node having 6core's and 28gb >> of ram. We will need to swap out the existing instances with >> larger spec's. >> >> >> We are going to try to flush the existing workload out on one >> of the workers to see if we can swap them out one at a time. >> >> >> Thanks, >> >> KEVIN O'DONNELL >> ARCHITECT MANAGER >> Red Hat Red Hat NA Public Sector Consulting < >> https://www.redhat.com/> >> >> kodonnell at redhat.com >> M: 240-605-4654 >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> On Mon, Dec 2, 2019 at 2:07 PM Miller, Timothy J. < >> tmiller at mitre.org> wrote: >> >> >> Here's what I can see, given the perm limits I seem to be >> under: >> >> - NS:develop-misp-app and NS:lp-develop-misp-app both have >> several sync jobs in CrashLoopBackoff b/c of an HTTP 401 error returned >> while trying to fetch something from somewhere (URL isn't recorded in the >> stack trace). >> >> - NS:minishift-misp-app has most of its pods/jobs stuck in >> ImagePullBackoff. No detail there in the event stream so I'll see if I can >> dig deeper. 
>> >> - NS:aam-ci-cd has Jenkins trying to spin up three workers, >> those are coming back as unschedulable. >> >> I can't see into NS:aam-bases or NS:dsop-images b/c of perm >> limits. >> >> I see no DAS-related project(s). >> >> The MISP stuff needs debugging before calling "blocked" since >> it looks like an internal error from this perspective. >> >> >> >> In re: AAM Jenkins: If this deployment is coming out of the >> OCP storefront, then maybe it should be ephemeral rather than persistent. >> If it's a custom deployment, then it probably needs a rethink. >> >> I'm also not sure why there are two MISP dev projects. >> >> -- T >> >> >> >> On 12/2/19, 12:46, "Kevin O'Donnell" >> wrote: >> >> Russell, >> >> >> Thank you for the information. We can switch out the >> instance type for the worker nodes. How much memory is required by the apps? >> >> >> >> Thanks, >> >> KEVIN O'DONNELL >> ARCHITECT MANAGER >> Red Hat Red Hat NA Public Sector Consulting < >> https://www.redhat.com/> >> >> kodonnell at redhat.com >> M: 240-605-4654 >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> On Mon, Dec 2, 2019 at 1:32 PM Kendall, Russell C < >> Russell.Kendall at mantech.com> wrote: >> >> >> Kevin, >> The lack of resources on >> u-p.io < >> http://u-p.io> >> cluster is hindering development, >> testing, and integration of the apps from CCAT AAM DAS, which is >> putting one >> of our PI goals at risk. >> >> >> We are blocked by the fact that we (CCAT and AAM) cannot >> deploy additional pods to the >> >> unified-platform.io < >> http://unified-platform.io> < >> http://unified-platform.io> >> cluster. We have a subset of containers deployed, but rolling >> deployments and new deployments fail. This means that we are >> not able to execute integration testing or peer reviews. >> We are temporarily working around by NOT >> testing/reviewing our code changes live, something that no one likes. 
Also, >> we are now running weeks-old instances of our containers, so we are very >> likely producing some technical debt. We currently have >> developers >> approaching idle or doing non-priority work until the >> resource issue is resolved. >> >> >> >> Here is the particular error from the OSP cluster I >> received while attempting a redeploy of one of our apps. >> >> >> >> 0/9 nodes are available: 1 node(s) had taints that the >> pod didn't tolerate, 2 Insufficient pods, 6 node(s) didn't match node >> selector.11 times in the last minute >> >> Since we do not have any cluster permissions, I cannot >> verify which resource is running out, but from experience, I assess it is a >> memory issue. >> >> >> >> It appears the cluster has been provisioned with a silly >> allocation of node types. Without knowing exactly what was deployed, it >> appears only 3 of the 9 hosts are suitable worker nodes. We would expect >> the cluster to respond to resource limitations >> and >> scale, >> but if a scheduled downtime is required, please work >> with us so we can anticipate. As it stands, the cluster does not support >> resources required by CCAT and the other dev teams (AAM, DAS, etc.). We >> would accept any downtime if it will improve the >> situation, >> as we are blocked from progressing under the current >> constraints. My hope was we could get the cluster redeployed over the TG >> holiday to eliminate developer impact, but as Mark pointed out, there were >> limited support folks available. Now I am just >> trying >> to >> minimize the losses. 
>> >> >> >> V/R, >> >> Russell C Kendall >> >> >> >> >> >> ________________________________________ >> From: Kevin O'Donnell >> Sent: Monday, December 2, 2019 11:52 AM >> To: Kendall, Russell C >> Cc: Mark Nissley; ABODUNRIN, ADEMOLA A GG-12 USAF AFMC >> AFLCMC/HNCP; Jonathan Rickard; Chris Kuperstein; BRYAN, AUSTEN R Capt USAF >> AFMC AFLCMC/HNCP; Brenna Gordon; Bubb, Mike (mbubb at mitre.org); >> DIROCCO, >> ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP; Miller, Timothy >> J.; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP >> Subject: Re: Unified Platform Pod Deploy Errors >> >> Hello Russell, >> >> >> Can you elaborate on the term Blocked? What specific >> issues are the blockers? >> >> >> >> Thanks, >> >> KEVIN O'DONNELL >> ARCHITECT MANAGER >> Red Hat Red Hat NA Public Sector Consulting < >> https://www.redhat.com/> >> >> kodonnell at redhat.com >> M: 240-605-4654 >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> On Mon, Dec 2, 2019 at 11:11 AM Kendall, Russell C < >> Russell.Kendall at mantech.com> wrote: >> >> >> Mark, >> >> Thank for acknowledging, please be aware the San Antonio >> dev teams working in >> >> >> unified-platform.io < >> http://unified-platform.io> < >> http://unified-platform.io> >> are currently blocked. >> >> V/R, >> >> Russell C Kendall >> >> ________________________________________ >> From: Mark Nissley >> Sent: Monday, December 2, 2019 9:36 AM >> To: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP; >> Jonathan Rickard; Chris Kuperstein >> Cc: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP; Kevin >> O'Donnell; Brenna Gordon; Kendall, Russell C; Bubb, Mike (mbubb at mitre.org); >> DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP; Miller, Timothy >> J.; >> RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP >> Subject: Re: Unified Platform Pod Deploy Errors >> >> As noted, I don't suspect much got done on this over the >> holiday weekend. I did see the ticket, as dropped some details into it. 
I >> also assigned it to @Jonathan >> Rickard and @Chris Kuperstein >> . >> >> >> >> It looks like short term solutions have been easy but the >> issue is recurring. >> >> >> >> >> Mark NISSLEY, PMP, >> CSM, LEAN >> >> PROGRAM MaNAGER & SR technical Project Manager >> North American Consulting, Public Sector >> >> M: >> 850-530-3234 >> >> >> Scheduled Training: October 14-18 >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> On Mon, Dec 2, 2019 at 10:21 AM ABODUNRIN, ADEMOLA A >> GG-12 USAF AFMC AFLCMC/HNCP wrote: >> >> >> Mark/Kevin, >> >> >> I just heard at the team stand up that we are still >> blocked. This is also affecting the AAM team from my investigations. >> >> >> Please let me know if there is something we need to do to >> move this forward. >> >> Most Sincerely, >> >> >> Ade Abodunrin, GG-12, USAF >> Product Owner (Cybertron & Ginyu Force), Unified Platform >> >> >> >> LevelUP Code Works >> Commercial: >> (210) 890-2113 >> NIPR email: >> ademola.abodunrin at us.af.mil >> >> >> >> >> >> >> >> >> ________________________________________ >> From: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP >> Sent: Wednesday, November 27, 2019 12:58 PM >> To: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP < >> austen.bryan.1 at us.af.mil>; Mark Nissley ; Kevin >> O'Donnell >> ; >> Brenna Gordon >> Cc: Kendall, Russell C ; >> Bubb, Mike (mbubb at mitre.org) ; DIROCCO, ROGER E GG-13 >> USAF AFMC ESC/AFLCMC/HNCP >> ; Miller, Timothy J. < >> tmiller at mitre.org>; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP < >> jose.ramirez.50.ctr at us.af.mil> >> Subject: Re: Unified Platform Pod Deploy Errors >> >> Thanks a lot Capt Bryan! Russell created the ticket on >> GitLab UP Node Project. 
>> >> >> >> >> Most Sincerely, >> >> >> Ade Abodunrin, GG-12, USAF >> Product Owner (Cybertron & Ginyu Force), Unified Platform >> >> >> >> LevelUP Code Works >> Commercial: >> (210) 890-2113 >> NIPR email: >> ademola.abodunrin at us.af.mil >> >> >> >> >> >> >> >> >> ________________________________________ >> From: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP < >> austen.bryan.1 at us.af.mil> >> Sent: Wednesday, November 27, 2019 12:56 PM >> To: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP < >> ademola.abodunrin at us.af.mil>; Mark Nissley ; Kevin >> O'Donnell >> ; Brenna Gordon > > >> Cc: Kendall, Russell C ; >> Bubb, Mike (mbubb at mitre.org) ; DIROCCO, ROGER E GG-13 >> USAF AFMC ESC/AFLCMC/HNCP >> ; Miller, Timothy J. < >> tmiller at mitre.org>; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP < >> jose.ramirez.50.ctr at us.af.mil> >> Subject: RE: Unified Platform Pod Deploy Errors >> >> Thanks Ade. The team is thin until next week due to the >> holidays but I will make sure it is addressed. Were there any issues >> submitted to Gitlab?s UP Node Project on DCCSCR? >> >> @Mark/Kevin ? can we address? >> >> -Austen >> >> From: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP < >> ademola.abodunrin at us.af.mil> >> >> Sent: Wednesday, November 27, 2019 9:51 AM >> To: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP < >> austen.bryan.1 at us.af.mil> >> Cc: Kendall, Russell C ; >> Bubb, Mike (mbubb at mitre.org) >> Subject: Fw: Unified Platform Pod Deploy Errors >> >> >> >> Capt Bryan, >> >> Please see the explanation on the issue that Ginyu Force >> is currently experiencing below. 
>> >> >> >> Most Sincerely, >> >> Ade Abodunrin, GG-12, USAF >> Product Owner (Cybertron & Ginyu Force), Unified Platform >> >> >> LevelUP Code Works >> Commercial: (210) 890-2113 >> NIPR email: >> ademola.abodunrin at us.af.mil >> >> >> >> >> >> ________________________________________ >> >> From: Kendall, Russell C >> Sent: Wednesday, November 27, 2019 9:46 AM >> To: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP < >> ademola.abodunrin at us.af.mil>; Buffaloe, >> Christopher ; Molina, >> Toby ; >> Crace, Jared E ; SANCHEZ, MARK >> GG-13 USAF AFMC AFLCMC/HNCP >> Cc: >> tmiller at mitre.org < >> tmiller at mitre.org> >> Subject: [Non-DoD Source] Fw: Unified Platform Pod Deploy >> Errors >> >> >> >> Gentlemen, >> >> The application development teams working in the new >> GovCloud OCP environment (unified-platform.io >> >> >> ) >> are currently blocked in efforts to deploy new pods for >> testing, development, and UAT. >> >> Red Hat and RogueOne SMEs have been notified and have >> attempted some fixes starting on Monday 11/25, but at this point have not >> been able to provision resources >> sufficient to host CCAT and AAM. >> >> We have taken steps to minimize our footprint >> (eliminating demonstration environment, deleting developer namespaces), but >> this is not a sustainable approach, >> and has only resulted in moderate improvements in >> cluster performance. >> >> Our hope is the U-P.io cluster compute resources can be >> increased very soon, so that we may resume normal development activities. >> Our understanding is that >> such a scaling requires a complete redeployment of the >> cluster, which is unusual, but an acceptable loss to productivity. If the >> cluster can be scaled up over the Thanksgiving holiday, the impact will be >> minimal to developers and cluster administrators, >> alike. 
>> >> We are currently collaborating on solutions on the >> following MatterMost channel behind the space camp VPN (link below), and >> via the email thread forwarded >> (further below). >> >> >> >> https://chat.spacecamp.ninja/levelup/channels/unified-platform-node < >> https://chat.spacecamp.ninja/levelup/channels/unified-platform-node> < >> https://chat.spacecamp.ninja/levelup/channels/unified-platform-node> >> > > >> >> Please keep me posted on developments and I will >> coordinate developer activities with any scheduled platform outages. >> >> V/R, >> Russell C Kendall >> >> ________________________________________ >> >> From: Curran, Daniel M >> Sent: Monday, November 25, 2019 2:47 PM >> To: Jonathan Rickard >> Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; >> dlystra at redhat.com ; Sison, >> Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; >> Phil >> Soliz; >> Buffaloe, >> Christopher; Torres, Alexander; Crace, Jared E; Middleton, >> Joseph J >> Subject: Re: Unified Platform Pod Deploy Errors >> >> >> >> Sounds great. Appreciate it. >> I'll watch email and Mattermost in case you need more >> from us. >> >> -Daniel >> >> ________________________________________ >> >> From: Jonathan Rickard >> Sent: Monday, November 25, 2019 2:44 PM >> To: Curran, Daniel M >> Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; >> dlystra at redhat.com ; Sison, >> Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; >> Phil >> Soliz; >> Buffaloe, >> Christopher; Torres, Alexander; Crace, Jared E; Middleton, >> Joseph J >> Subject: Re: Unified Platform Pod Deploy Errors >> >> >> >> Thanks Daniel - >> >> >> >> I'll continue to look into the resource issue that you're >> seeing - I'd like to identify the root cause and then work with the team to >> come up with a solution. 
>> >> >> >> Jonathan Rickard, >> RHCA >> Principal Consultant, NAPS >> Red >> Hat Remote - Texas >> jonny at redhat.com >> >> M: 210-862-9739 > >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> On Mon, Nov 25, 2019 at 2:17 PM Curran, Daniel M < >> Daniel.Curran at mantech.com> >> wrote: >> >> >> Yeah we hit the limit then had AAM kill some of their >> projects and then our pods got scheduled. >> We've hit the limit again though. Here's an example pod >> that cannot be scheduled >> >> >> >> >> https://cluster.unified-platform.io/console/project/ccat-dev/browse/pods/sso-8-2cjth >> < >> https://cluster.unified-platform.io/console/project/ccat-dev/browse/pods/sso-8-2cjth> >> < >> https://cluster.unified-platform.io/console/project/ccat-dev/browse/pods/sso-8-2cjth >> > >> < >> https://cluster.unified-platform.io/console/project/ccat-dev/browse/pods/sso-8-2cjth >> > >> They're seeing it when their jenkins slaves can't deploy >> but it's basically any pod after we hit some limit. >> >> -Daniel >> ________________________________________ >> >> From: Jonathan Rickard >> Sent: Monday, November 25, 2019 1:26 PM >> To: Curran, Daniel M >> Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; >> dlystra at redhat.com ; Sison, >> Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; >> Phil >> Soliz; >> Buffaloe, >> Christopher; Torres, Alexander; Crace, Jared E; Middleton, >> Joseph J >> Subject: Re: Unified Platform Pod Deploy Errors >> >> >> >> Daniel, >> >> >> >> I can see that you have 3 mongo pods, 1 chatup and 1 >> upbot pod running ... is your app good to go? >> >> >> >> Looks like there was an issue with memory on 1 pod, then >> some node selector being mismatched - just what i could see in the events... 
>> >> >> >> >> >> >> Jonathan Rickard, >> RHCA >> Principal Consultant, NAPS >> Red >> Hat Remote - Texas >> jonny at redhat.com >> >> M: 210-862-9739 > >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> On Mon, Nov 25, 2019 at 12:50 PM Curran, Daniel M < >> Daniel.Curran at mantech.com> >> wrote: >> >> >> Also, AAM was having similar issues. Looks like they had >> a lot of namespaces and scaling down the pods on their deployments didn't >> help but actually deleting the namespaces >> did. >> We have pods scheduling now but I'm adding them and we'd >> still like to work through what resource limit we were hitting to avoid >> this in the future. >> >> -Daniel >> >> ________________________________________ >> >> From: Curran, Daniel M >> Sent: Monday, November 25, 2019 12:25 PM >> To: Jonathan Rickard >> Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; >> dlystra at redhat.com ; Sison, >> Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; >> Phil >> Soliz; >> Buffaloe, >> Christopher; Torres, Alexander >> Subject: Re: Unified Platform Pod Deploy Errors >> >> >> >> Thanks, sir. >> Most important for us to get working is "ccat-demo" but >> it's also happening in "ccat-dev" and "ccat-ci-cd". >> >> -Daniel >> ________________________________________ >> >> From: Jonathan Rickard >> Sent: Monday, November 25, 2019 12:22 PM >> To: Curran, Daniel M >> Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; >> dlystra at redhat.com ; Sison, >> Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; >> Phil >> Soliz; >> Buffaloe, >> Christopher; Torres, Alexander >> Subject: Re: Unified Platform Pod Deploy Errors >> >> >> >> What's the name of the project you're working in? I'm >> going to be back at my laptop in about 30 and will take a look when I get >> there. >> >> >> >> Is it just the Jenkins pods failing? 
>> >> >> >> >> >> >> >> On Mon, Nov 25, 2019, 12:20 PM Curran, Daniel M < >> Daniel.Curran at mantech.com> >> wrote: >> >> >> Adding Dean and Alex. >> Also, sitting in mattermost if anyone needs to get online >> and chat for more information. >> >> -Daniel >> >> ________________________________________ >> >> From: Curran, Daniel M >> Sent: Monday, November 25, 2019 12:07 PM >> To: >> jonny at redhat.com ; >> >> ckuperst at redhat.com ; Mark >> Nissley >> Cc: Sison, Mark Anthony; Cepeda, Rolando; Kendall, >> Russell C; Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher >> Subject: Re: Unified Platform Pod Deploy Errors >> >> >> >> Adding Kupe and Mark. >> >> -Daniel >> ________________________________________ >> >> From: Curran, Daniel M >> Sent: Monday, November 25, 2019 11:43 AM >> To: >> jonny at redhat.com >> Cc: Sison, Mark Anthony; Cepeda, Rolando; Kendall, >> Russell C; Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher >> Subject: Unified Platform Pod Deploy Errors >> >> >> >> Hey Jonny, >> >> We met briefly at SpaceCAMP a couple weeks ago when >> >> >> >> >> cluster.unified-platform.io < >> http://cluster.unified-platform.io> >> >> was stood up. We've been trying to deploy some apps today and so >> far today we're getting errors on most (if >> not all) of our pods. >> >> 0/9 nodes are available: 3 Insufficient pods, 6 node(s) >> didn't match node selector. >> >> Is what we're seeing. We were thinking it was some volume >> types weren't correct but some of our pods don't even have volumes attached >> and still give us this error (i.e. Jenkins >> slaves or web frontends without persistent storage). >> Any idea what this could be? We're not running out of >> space on the nodes themselves are we? >> We have a demo scheduled for tomorrow at 9:30 AM CST and >> are hoping to get a demo env up for them today but this error came up >> unexpectedly. Also, we're here at 500 Navarro >> St. in San Antonio working through this in person is >> better/easier. 
>> >> Thanks, >> Daniel Curran >> >> ________________________________________ >> >> This e-mail and any attachments are intended only for the >> use of the addressee(s) named herein and may contain proprietary >> information. If you are not the intended recipient of this e-mail or >> believe that you received this email in error, please take immediate >> action to notify the sender of the apparent error by >> reply e-mail; permanently delete the e-mail and any attachments from your >> computer; and do not disseminate, distribute, use, or copy this message and >> any attachments. >> >> _______________________________________________ >> platformONE mailing list >> platformONE at redhat.com >> https://www.redhat.com/mailman/listinfo/platformone
From Russell.Kendall at mantech.com Fri Dec 6 18:20:58 2019 From: Russell.Kendall at mantech.com (Kendall, Russell C) Date: Fri, 6 Dec 2019 18:20:58 +0000 Subject: [Platformone] [EXT] Re: Unified Platform Pod Deploy Errors In-Reply-To: References: , Message-ID: <1575656455684.39471@ManTech.com>
Nine tainted pods. Running apps seem to be okay, where they happened to be running at the time the taint flood occurred. This will block IATT efforts, since we cannot deploy our apps once we have remediated the vulnerabilities, and so cannot confirm remediation with TL and Anchore (there is no local scanning capability).
V/R, Russell C Kendall ________________________________ From: Jonathan Rickard Sent: Friday, December 6, 2019 12:13:29 PM To: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP Cc: Kendall, Russell C; platformONE at redhat.com; Miller, Timothy J.; Keegan Reap; Bubb, Mike; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP; Jonathan Rickard Subject: Re: [Platformone] [EXT] Re: Unified Platform Pod Deploy Errors Also, is every application having problems or a specific one? Jonathan Rickard, RHCE, RHCA Consulting Architect Red Hat Public Sector jonny at redhat.com M: 210.862.9739 On Fri, Dec 6, 2019 at 12:06 PM Jonathan Rickard wrote: Ade, What does that mean? You can't log in, you can't deploy? Jonathan Rickard, RHCE, RHCA Consulting Architect Red Hat Public Sector jonny at redhat.com M: 210.862.9739 On Fri, Dec 6, 2019 at 12:02 PM ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP wrote: ALCON, The cluster is down again. Please assist. Most Sincerely, Ade Abodunrin, GG-12, USAF Product Owner (Cybertron & Ginyu Force), Unified Platform LevelUP Code Works Commercial: (210) 890-2113 NIPR email: ademola.abodunrin at us.af.mil From: Kendall, Russell C Sent: Thursday, December 5, 2019 9:55 AM To: Jonathan Rickard Cc: Miller, Timothy J.; Keegan Reap; Bubb, Mike; platformONE at redhat.com; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP; ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP; Jonathan Rickard Subject: [Non-DoD Source] Re: [Platformone] [EXT] Re: Unified Platform Pod Deploy Errors Jonny, I'll see you Friday at 500 Nav. Travel safe. V/R, Russell C Kendall
________________________________ From: Jonathan Rickard Sent: Wednesday, December 4, 2019 5:29 PM To: Kendall, Russell C Cc: Miller, Timothy J.; Keegan Reap; Bubb, Mike; platformONE at redhat.com; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP; ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP; Jonathan Rickard Subject: Re: [Platformone] [EXT] Re: Unified Platform Pod Deploy Errors Russell, I have definitely been terrible with email lately and I apologize for the slow response times. I get back to San Antonio tomorrow but I have a pretty full afternoon. I can stop by Friday if you'd like. Thanks, jonny Jonathan Rickard, RHCA Principal Consultant, NAPS Red Hat Remote - Texas jonny at redhat.com M: 210-862-9739 On Wed, Dec 4, 2019 at 10:16 AM Kendall, Russell C wrote: Jonny, I'd like to suggest you come to 500 to wrap this up, since it seems there are significant delays in communication that are contributing to downtime. V/R, Russell C Kendall ________________________________________ From: Miller, Timothy J. Sent: Wednesday, December 4, 2019 7:02 AM To: Jonathan Rickard; Keegan Reap Cc: Bubb, Mike; platformONE at redhat.com; Kendall, Russell C; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP; ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP; Jonathan Rickard Subject: Re: [Platformone] [EXT] Re: Unified Platform Pod Deploy Errors Jonny-- Update the issue, if you would be so kind. -- T On 12/3/19, 18:00, "platformone-bounces at redhat.com on behalf of Jonathan Rickard" <jrickard at redhat.com> wrote: Hey Guys - Sorry for taking so long - this has been completed. Please run your builds and let us know if you're having any problems.
jonny Jonathan Rickard, RHCA Principal Consultant, NAPS Red Hat Remote - Texas jonny at redhat.com M: 210-862-9739 On Tue, Dec 3, 2019 at 3:47 PM Jonathan Rickard wrote: Russell / Team, We believe we've identified the issue with your application deployments. In order to rectify the issue, I need to evacuate pods, so you will probably see some hiccups while deploying. I will update when this is resolved. Thanks, jonny Jonathan Rickard, RHCA Principal Consultant, NAPS Red Hat Remote - Texas jonny at redhat.com M: 210-862-9739 On Tue, Dec 3, 2019 at 12:53 PM Keegan Reap wrote: Hey all, we have opened an issue (below) for what we believe to be the cause; we are currently investigating: https://dccscr.dsop.io/ginyu-force/up-iatt/issues/32 On Tue, Dec 3, 2019 at 11:27 AM Jonathan Rickard wrote: Russell, Getting more eyes on this: @platformONE at redhat.com. We'll keep you posted. jonny Jonathan Rickard, RHCA Principal Consultant, NAPS Red Hat Remote - Texas jonny at redhat.com M: 210-862-9739 On Tue, Dec 3, 2019 at 11:59 AM Kendall, Russell C wrote: Kevin, Unfortunately we are receiving deployment errors again. This is the event: 0/9 nodes are available: 1 node(s) had disk pressure, 2 node(s) had taints that the pod didn't tolerate, 6 node(s) didn't match node selector. This is the deployment: https://cluster.unified-platform.io/console/project/ccat-dev/browse/dc/driveup V/R, Russell C Kendall ________________________________________ From: Miller, Timothy J. Sent: Monday, December 2, 2019 2:44:21 PM To: Kevin O'Donnell Cc: Kendall, Russell C; Mark Nissley; ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP; Jonathan Rickard; Chris Kuperstein; BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP; Brenna Gordon; Bubb, Mike; DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP Subject: Re: [EXT] Re: Unified Platform Pod Deploy Errors Tagged you on it.
-- T On 12/2/19, 14:03, "Kevin O'Donnell" wrote: Hello, Autoscaling is on our future IaC roadmap. Tim, the additional ticket would be appreciated. We have swapped out the app/worker instances with m5a.8xlarge (32 cores, 128 GB of RAM). Please let us know if you have any other issues. Thanks, KEVIN O'DONNELL ARCHITECT MANAGER Red Hat NA Public Sector Consulting kodonnell at redhat.com M: 240-605-4654 On Mon, Dec 2, 2019 at 2:29 PM Miller, Timothy J. wrote: I'll open an issue. IaC needs to have instance size as a host_var to facilitate scaling. -- T On 12/2/19, 13:15, "Kevin O'Donnell" wrote: Tim, Thanks for the information. We are undersized on the app/worker nodes; the cluster has 3 and they are m5.xlarge with only 16 GB of RAM. From what I have read, each Labs engagement operated on a 3-node worker cluster with each node having 6 cores and 28 GB of RAM. We will need to swap out the existing instances for larger specs. We are going to try to flush the existing workload out on one of the workers to see if we can swap them out one at a time. Thanks, KEVIN O'DONNELL ARCHITECT MANAGER Red Hat NA Public Sector Consulting kodonnell at redhat.com M: 240-605-4654 On Mon, Dec 2, 2019 at 2:07 PM Miller, Timothy J. wrote: Here's what I can see, given the perm limits I seem to be under: - NS:develop-misp-app and NS:lp-develop-misp-app both have several sync jobs in CrashLoopBackOff b/c of an HTTP 401 error returned while trying to fetch something from somewhere (URL isn't recorded in the stack trace). - NS:minishift-misp-app has most of its pods/jobs stuck in ImagePullBackOff. No detail there in the event stream so I'll see if I can dig deeper. - NS:aam-ci-cd has Jenkins trying to spin up three workers; those are coming back as unschedulable. I can't see into NS:aam-bases or NS:dsop-images b/c of perm limits. I see no DAS-related project(s).
The MISP stuff needs debugging before calling "blocked" since it looks like an internal error from this perspective. In re: AAM Jenkins: If this deployment is coming out of the OCP storefront, then maybe it should be ephemeral rather than persistent. If it's a custom deployment, then it probably needs a rethink. I'm also not sure why there are two MISP dev projects. -- T On 12/2/19, 12:46, "Kevin O'Donnell" wrote: Russell, Thank you for the information. We can switch out the instance type for the worker nodes. How much memory is required by the apps? Thanks, KEVIN O'DONNELL ARCHITECT MANAGER Red Hat NA Public Sector Consulting kodonnell at redhat.com M: 240-605-4654 On Mon, Dec 2, 2019 at 1:32 PM Kendall, Russell C wrote: Kevin, The lack of resources on the u-p.io cluster is hindering development, testing, and integration of the apps from CCAT, AAM, and DAS, which is putting one of our PI goals at risk. We are blocked by the fact that we (CCAT and AAM) cannot deploy additional pods to the unified-platform.io cluster. We have a subset of containers deployed, but rolling deployments and new deployments fail. This means that we are not able to execute integration testing or peer reviews. We are temporarily working around this by NOT testing/reviewing our code changes live, something that no one likes. Also, we are now running weeks-old instances of our containers, so we are very likely producing some technical debt. We currently have developers approaching idle or doing non-priority work until the resource issue is resolved. Here is the particular error from the OCP cluster I received while attempting a redeploy of one of our apps. 0/9 nodes are available: 1 node(s) had taints that the pod didn't tolerate, 2 Insufficient pods, 6 node(s) didn't match node selector. (11 times in the last minute) Since we do not have any cluster permissions, I cannot verify which resource is running out, but from experience, I assess it is a memory issue.
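[Editorial aside on reading the scheduler event quoted above: each comma-separated clause gives a node count and the reason those nodes were rejected. The following is a minimal Python sketch, assuming the message always has the "N/M nodes are available: ..." shape seen in this thread. The function name parse_scheduling_event is hypothetical and not part of any OpenShift or Kubernetes tooling.]

```python
import re


def parse_scheduling_event(message: str) -> dict:
    """Return {reason: node_count} for a scheduler 'nodes are available' message.

    Assumes the shape "0/9 nodes are available: <n> <reason>, <n> <reason>."
    as quoted in this thread; other message formats are not handled.
    """
    # Everything after the first colon is the per-reason breakdown.
    _, _, detail = message.partition(":")
    reasons = {}
    for clause in detail.strip().rstrip(".").split(", "):
        # A clause is "<count> node(s) <reason>" or "<count> <reason>".
        match = re.match(r"(\d+)\s+(?:node\(s\)\s+)?(.+)", clause.strip())
        if match:
            reasons[match.group(2)] = int(match.group(1))
    return reasons


event = (
    "0/9 nodes are available: 1 node(s) had taints that the pod didn't "
    "tolerate, 2 Insufficient pods, 6 node(s) didn't match node selector."
)
reasons = parse_scheduling_event(event)
for reason, count in reasons.items():
    print(f"{count} node(s): {reason}")
```

[In this message the per-reason counts add up to all 9 nodes. As far as I know, "Insufficient pods" refers to a node's pod-count capacity rather than to memory, which the scheduler would normally report as "Insufficient memory", so the per-node pod limit may be worth checking alongside RAM.]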
It appears the cluster has been provisioned with a silly allocation of node types. Without knowing exactly what was deployed, it appears only 3 of the 9 hosts are suitable worker nodes. We would expect the cluster to respond to resource limitations and scale, but if a scheduled downtime is required, please work with us so we can anticipate it. As it stands, the cluster does not support the resources required by CCAT and the other dev teams (AAM, DAS, etc.). We would accept any downtime if it will improve the situation, as we are blocked from progressing under the current constraints. My hope was we could get the cluster redeployed over the TG holiday to eliminate developer impact, but as Mark pointed out, there were limited support folks available. Now I am just trying to minimize the losses. V/R, Russell C Kendall ________________________________________ From: Kevin O'Donnell Sent: Monday, December 2, 2019 11:52 AM To: Kendall, Russell C Cc: Mark Nissley; ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP; Jonathan Rickard; Chris Kuperstein; BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP; Brenna Gordon; Bubb, Mike (mbubb at mitre.org); DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP; Miller, Timothy J.; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP Subject: Re: Unified Platform Pod Deploy Errors Hello Russell, Can you elaborate on the term Blocked? What specific issues are the blockers? Thanks, KEVIN O'DONNELL ARCHITECT MANAGER Red Hat NA Public Sector Consulting kodonnell at redhat.com M: 240-605-4654 On Mon, Dec 2, 2019 at 11:11 AM Kendall, Russell C wrote: Mark, Thanks for acknowledging. Please be aware the San Antonio dev teams working in unified-platform.io are currently blocked.
V/R, Russell C Kendall ________________________________________ From: Mark Nissley Sent: Monday, December 2, 2019 9:36 AM To: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP; Jonathan Rickard; Chris Kuperstein Cc: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP; Kevin O'Donnell; Brenna Gordon; Kendall, Russell C; Bubb, Mike (mbubb at mitre.org); DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP; Miller, Timothy J.; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP Subject: Re: Unified Platform Pod Deploy Errors As noted, I don't suspect much got done on this over the holiday weekend. I did see the ticket, and dropped some details into it. I also assigned it to @Jonathan Rickard and @Chris Kuperstein. It looks like short-term solutions have been easy but the issue is recurring. Mark NISSLEY, PMP, CSM, LEAN PROGRAM MaNAGER & SR technical Project Manager North American Consulting, Public Sector M: 850-530-3234 Scheduled Training: October 14-18 On Mon, Dec 2, 2019 at 10:21 AM ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP wrote: Mark/Kevin, I just heard at the team stand-up that we are still blocked. From my investigations, this is also affecting the AAM team. Please let me know if there is something we need to do to move this forward. Most Sincerely, Ade Abodunrin, GG-12, USAF Product Owner (Cybertron & Ginyu Force), Unified Platform LevelUP Code Works Commercial: (210) 890-2113 NIPR email: ademola.abodunrin at us.af.mil ________________________________________ From: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP Sent: Wednesday, November 27, 2019 12:58 PM To: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP; Mark Nissley; Kevin O'Donnell; Brenna Gordon Cc: Kendall, Russell C; Bubb, Mike (mbubb at mitre.org); DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP; Miller, Timothy J.; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP Subject: Re: Unified Platform Pod Deploy Errors Thanks a lot Capt Bryan! Russell created the ticket on the GitLab UP Node Project.
Most Sincerely, Ade Abodunrin, GG-12, USAF Product Owner (Cybertron & Ginyu Force), Unified Platform LevelUP Code Works Commercial: (210) 890-2113 NIPR email: ademola.abodunrin at us.af.mil ________________________________________ From: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP Sent: Wednesday, November 27, 2019 12:56 PM To: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP; Mark Nissley; Kevin O'Donnell; Brenna Gordon Cc: Kendall, Russell C; Bubb, Mike (mbubb at mitre.org); DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP; Miller, Timothy J.; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP Subject: RE: Unified Platform Pod Deploy Errors Thanks Ade. The team is thin until next week due to the holidays but I will make sure it is addressed. Were there any issues submitted to GitLab's UP Node Project on DCCSCR? @Mark/Kevin - can we address? -Austen From: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP Sent: Wednesday, November 27, 2019 9:51 AM To: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP Cc: Kendall, Russell C; Bubb, Mike (mbubb at mitre.org) Subject: Fw: Unified Platform Pod Deploy Errors Capt Bryan, Please see below the explanation of the issue that Ginyu Force is currently experiencing. Most Sincerely, Ade Abodunrin, GG-12, USAF Product Owner (Cybertron & Ginyu Force), Unified Platform LevelUP Code Works Commercial: (210) 890-2113 NIPR email: ademola.abodunrin at us.af.mil ________________________________________ From: Kendall, Russell C Sent: Wednesday, November 27, 2019 9:46 AM To: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP; Buffaloe, Christopher; Molina, Toby; Crace, Jared E; SANCHEZ, MARK GG-13 USAF AFMC AFLCMC/HNCP Cc: tmiller at mitre.org Subject: [Non-DoD Source] Fw: Unified Platform Pod Deploy Errors Gentlemen, The application development teams working in the new GovCloud OCP environment (unified-platform.io) are currently blocked in their efforts to deploy new pods for testing, development, and UAT.
Red Hat and RogueOne SMEs have been notified and have attempted some fixes starting on Monday 11/25, but at this point have not been able to provision resources sufficient to host CCAT and AAM. We have taken steps to minimize our footprint (eliminating the demonstration environment, deleting developer namespaces), but this is not a sustainable approach, and has only resulted in moderate improvements in cluster performance. Our hope is the U-P.io cluster compute resources can be increased very soon, so that we may resume normal development activities. Our understanding is that such scaling requires a complete redeployment of the cluster, which is unusual, but an acceptable loss to productivity. If the cluster can be scaled up over the Thanksgiving holiday, the impact will be minimal to developers and cluster administrators alike. We are currently collaborating on solutions on the following Mattermost channel behind the SpaceCAMP VPN (link below), and via the email thread forwarded (further below). https://chat.spacecamp.ninja/levelup/channels/unified-platform-node Please keep me posted on developments and I will coordinate developer activities with any scheduled platform outages. V/R, Russell C Kendall ________________________________________ From: Curran, Daniel M Sent: Monday, November 25, 2019 2:47 PM To: Jonathan Rickard Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; dlystra at redhat.com; Sison, Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher; Torres, Alexander; Crace, Jared E; Middleton, Joseph J Subject: Re: Unified Platform Pod Deploy Errors Sounds great. Appreciate it. I'll watch email and Mattermost in case you need more from us.
-Daniel ________________________________________ From: Jonathan Rickard Sent: Monday, November 25, 2019 2:44 PM To: Curran, Daniel M Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; dlystra at redhat.com; Sison, Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher; Torres, Alexander; Crace, Jared E; Middleton, Joseph J Subject: Re: Unified Platform Pod Deploy Errors Thanks Daniel - I'll continue to look into the resource issue that you're seeing - I'd like to identify the root cause and then work with the team to come up with a solution. Jonathan Rickard, RHCA Principal Consultant, NAPS Red Hat Remote - Texas jonny at redhat.com M: 210-862-9739 On Mon, Nov 25, 2019 at 2:17 PM Curran, Daniel M wrote: Yeah, we hit the limit, then had AAM kill some of their projects, and then our pods got scheduled. We've hit the limit again though. Here's an example pod that cannot be scheduled: https://cluster.unified-platform.io/console/project/ccat-dev/browse/pods/sso-8-2cjth They're seeing it when their Jenkins slaves can't deploy but it's basically any pod after we hit some limit. -Daniel ________________________________________ From: Jonathan Rickard Sent: Monday, November 25, 2019 1:26 PM To: Curran, Daniel M Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; dlystra at redhat.com; Sison, Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher; Torres, Alexander; Crace, Jared E; Middleton, Joseph J Subject: Re: Unified Platform Pod Deploy Errors Daniel, I can see that you have 3 mongo pods, 1 chatup and 1 upbot pod running ... is your app good to go? Looks like there was an issue with memory on 1 pod, then some node selector being mismatched - just what I could see in the events...
Jonathan Rickard, RHCA Principal Consultant, NAPS Red Hat Remote - Texas jonny at redhat.com M: 210-862-9739 On Mon, Nov 25, 2019 at 12:50 PM Curran, Daniel M wrote: Also, AAM was having similar issues. Looks like they had a lot of namespaces, and scaling down the pods on their deployments didn't help but actually deleting the namespaces did. We have pods scheduling now, but I'm adding them here and we'd still like to work through what resource limit we were hitting to avoid this in the future. -Daniel ________________________________________ From: Curran, Daniel M Sent: Monday, November 25, 2019 12:25 PM To: Jonathan Rickard Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; dlystra at redhat.com; Sison, Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher; Torres, Alexander Subject: Re: Unified Platform Pod Deploy Errors Thanks, sir. Most important for us to get working is "ccat-demo" but it's also happening in "ccat-dev" and "ccat-ci-cd". -Daniel ________________________________________ From: Jonathan Rickard Sent: Monday, November 25, 2019 12:22 PM To: Curran, Daniel M Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; dlystra at redhat.com; Sison, Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher; Torres, Alexander Subject: Re: Unified Platform Pod Deploy Errors What's the name of the project you're working in? I'm going to be back at my laptop in about 30 minutes and will take a look when I get there. Is it just the Jenkins pods failing? On Mon, Nov 25, 2019, 12:20 PM Curran, Daniel M wrote: Adding Dean and Alex. Also, sitting in Mattermost if anyone needs to get online and chat for more information.
-Daniel

________________________________________
From: Curran, Daniel M
Sent: Monday, November 25, 2019 12:07 PM
To: jonny at redhat.com; ckuperst at redhat.com; Mark Nissley
Cc: Sison, Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher
Subject: Re: Unified Platform Pod Deploy Errors

Adding Kupe and Mark.

-Daniel

________________________________________
From: Curran, Daniel M
Sent: Monday, November 25, 2019 11:43 AM
To: jonny at redhat.com
Cc: Sison, Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher
Subject: Unified Platform Pod Deploy Errors

Hey Jonny,

We met briefly at SpaceCAMP a couple weeks ago when cluster.unified-platform.io was stood up. We've been trying to deploy some apps today and so far today we're getting errors on most (if not all) of our pods.

0/9 nodes are available: 3 Insufficient pods, 6 node(s) didn't match node selector.

Is what we're seeing. We were thinking it was some volume types weren't correct but some of our pods don't even have volumes attached and still give us this error (i.e. Jenkins slaves or web frontends without persistent storage).
Any idea what this could be? We're not running out of space on the nodes themselves are we?
We have a demo scheduled for tomorrow at 9:30 AM CST and are hoping to get a demo env up for them today but this error came up unexpectedly. Also, we're here at 500 Navarro St. in San Antonio working through this in person is better/easier.

Thanks,
Daniel Curran

________________________________________

This e-mail and any attachments are intended only for the use of the addressee(s) named herein and may contain proprietary information.
If you are not the intended recipient of this e-mail or believe that you received this email in error, please take immediate action to notify the sender of the apparent error by reply e-mail; permanently delete the e-mail and any attachments from your computer; and do not disseminate, distribute, use, or copy this message and any attachments.

_______________________________________________
platformONE mailing list
platformONE at redhat.com
https://www.redhat.com/mailman/listinfo/platformone
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.png
Type: image/png
Size: 2127 bytes
Desc: image001.png
URL:

From mnissley at redhat.com Fri Dec 6 18:26:08 2019
From: mnissley at redhat.com (Mark Nissley)
Date: Fri, 6 Dec 2019 12:26:08 -0600
Subject: [Platformone] [EXT] Re: Unified Platform Pod Deploy Errors
In-Reply-To: <1575656455684.39471@ManTech.com>
References: <1575656455684.39471@ManTech.com>
Message-ID:

Issue created here:
https://dccscr.dsop.io/ginyu-force/ginyu-force/issues/1

Mark NISSLEY, PMP, CSM, LEAN
PROGRAM MaNAGER & SR technical Project Manager
North American Consulting, Public Sector
M: 850-530-3234

*Scheduled Training: October 14-18*

On Fri, Dec 6, 2019 at 12:21 PM Kendall, Russell C <Russell.Kendall at mantech.com> wrote:

> Nine tainted pods. Running apps seem to be okay, where they happened to be
> running at time the taint flood occurred. This will block IATT efforts,
> since we can not deploy our apps once we have remediated the
> vulnerabilities and to confirm remediation with TL and Anchore (there is
> not local scanning capability).
> V/R, > Russell C Kendall > ------------------------------ > *From:* Jonathan Rickard > *Sent:* Friday, December 6, 2019 12:13:29 PM > *To:* ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP > *Cc:* Kendall, Russell C; platformONE at redhat.com; Miller, Timothy J.; > Keegan Reap; Bubb, Mike; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP; > Jonathan Rickard > *Subject:* Re: [Platformone] [EXT] Re: Unified Platform Pod Deploy Errors > > Also, is every application having problems or a specific? > > Jonathan Rickard, RHCE, RHCA > > Consulting Architect > > Red Hat Public Sector > > jonny at redhat.com > M: 210.862.9739 > @redhatjobs redhatjobs > @redhatjobs > > > > > On Fri, Dec 6, 2019 at 12:06 PM Jonathan Rickard > wrote: > >> Ade, >> >> What does that mean? You can't login, you can't deploy? >> >> Jonathan Rickard, RHCE, RHCA >> >> Consulting Architect >> >> Red Hat Public Sector >> >> jonny at redhat.com >> M: 210.862.9739 >> @redhatjobs redhatjobs >> @redhatjobs >> >> >> >> >> On Fri, Dec 6, 2019 at 12:02 PM ABODUNRIN, ADEMOLA A GG-12 USAF AFMC >> AFLCMC/HNCP wrote: >> >>> ALCON, >>> >>> >>> >>> The cluster is down again. Please assist. >>> >>> >>> >>> Most Sincerely, >>> >>> >>> >>> Ade Abodunrin, GG-12, USAF >>> >>> Product Owner (Cybertron & Ginyu Force), Unified Platform >>> >>> >>> >>> [image: cid:image001.png at 01D4F814.4AA552D0] >>> >>> LevelUP Code Works >>> >>> Commercial: (210) 890-2113 >>> >>> NIPR email: *ademola.abodunrin at us.af.mil * >>> >>> >>> >>> *From:* Kendall, Russell C >>> *Sent:* Thursday, December 5, 2019 9:55 AM >>> *To:* Jonathan Rickard >>> *Cc:* Miller, Timothy J. 
; Keegan Reap < >>> kreap at redhat.com>; Bubb, Mike ; platformONE at redhat.com; >>> RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP ; >>> ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP < >>> ademola.abodunrin at us.af.mil>; Jonathan Rickard >>> *Subject:* [Non-DoD Source] Re: [Platformone] [EXT] Re: Unified >>> Platform Pod Deploy Errors >>> >>> >>> >>> Jonny, >>> >>> I'll see you Friday at 500 Nav. Travel safe. >>> >>> >>> >>> V/R, >>> >>> Russell C Kendall? >>> >>> >>> ------------------------------ >>> >>> *From:* Jonathan Rickard >>> *Sent:* Wednesday, December 4, 2019 5:29 PM >>> *To:* Kendall, Russell C >>> *Cc:* Miller, Timothy J.; Keegan Reap; Bubb, Mike; >>> platformONE at redhat.com; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP; >>> ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP; Jonathan Rickard >>> *Subject:* Re: [Platformone] [EXT] Re: Unified Platform Pod Deploy >>> Errors >>> >>> >>> >>> Russell, >>> >>> >>> >>> I have definitely been terrible with email lately and I apologize for >>> the slow response times. I get back to San Antonio tomorrow but I have a >>> pretty full afternoon. I can stop by Friday if you'd like. >>> >>> >>> >>> Thanks, >>> >>> jonny >>> >>> >>> >>> *Jonathan Rickard**, RHCA* >>> >>> Principal Consultant, NAPS >>> >>> Red Hat Remote - Texas >>> >>> jonny at redhat.com >>> M: 210-862-9739 >>> >>> >>> >>> >>> >>> >>> >>> On Wed, Dec 4, 2019 at 10:16 AM Kendall, Russell C < >>> Russell.Kendall at mantech.com> wrote: >>> >>> Jonny, >>> I'd like to suggest you come to 500 to wrap this up, since it seems >>> there are significant delays in communication that are contributing to >>> downtime. >>> V/R, >>> Russell C Kendall >>> ________________________________________ >>> From: Miller, Timothy J. 
>>> Sent: Wednesday, December 4, 2019 7:02 AM >>> To: Jonathan Rickard; Keegan Reap >>> Cc: Bubb, Mike; platformONE at redhat.com; Kendall, Russell C; RAMIREZ, >>> JOSE A CTR USAF AFMC AFLCMC/HNCP; ABODUNRIN, ADEMOLA A GG-12 USAF AFMC >>> AFLCMC/HNCP; Jonathan Rickard >>> Subject: Re: [Platformone] [EXT] Re: Unified Platform Pod Deploy Errors >>> >>> Johnny-- >>> >>> Update the issue, if you would be so kind. >>> >>> -- T >>> >>> ?On 12/3/19, 18:00, "platformone-bounces at redhat.com on behalf of >>> Jonathan Rickard" >> jrickard at redhat.com> wrote: >>> >>> Hey Guys - Sorry for taking so long - this has been completed. >>> Please run your builds and let us know if you're having any problems. >>> jonny >>> Jonathan Rickard, RHCA >>> Principal Consultant, NAPS >>> Red Hat Remote - Texas >>> >>> jonny at redhat.com >>> M: 210-862-9739 > >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> On Tue, Dec 3, 2019 at 3:47 PM Jonathan Rickard >>> wrote: >>> >>> >>> Russell / Team, >>> >>> >>> We believe we've identified the issue with your application >>> deploying. In order to rectify the issue I need to evacuate pods so you >>> will probably see some hiccups while deploying. I will update when this is >>> resolved. >>> >>> >>> Thanks, >>> jonny >>> >>> Jonathan Rickard, RHCA >>> Principal Consultant, NAPS >>> Red Hat Remote - Texas >>> >>> jonny at redhat.com >>> M: 210-862-9739 > >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> On Tue, Dec 3, 2019 at 12:53 PM Keegan Reap >>> wrote: >>> >>> >>> Hey all, we have opened an issue below, that we believe to be the >>> cause, we are currently investigating: >>> >>> >>> https://dccscr.dsop.io/ginyu-force/up-iatt/issues/32 >>> >>> >>> >>> On Tue, Dec 3, 2019 at 11:27 AM Jonathan Rickard < >>> jrickard at redhat.com> wrote: >>> >>> >>> Russell, >>> >>> >>> Getting more eyes on this @platformONE at redhat.com >> platformONE at redhat.com> >>> >>> >>> We'll keep you posted. 
>>> jonny >>> Jonathan Rickard, RHCA >>> Principal Consultant, NAPS >>> Red Hat Remote - Texas >>> >>> jonny at redhat.com >>> M: 210-862-9739 > >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> On Tue, Dec 3, 2019 at 11:59 AM Kendall, Russell C < >>> Russell.Kendall at mantech.com> wrote: >>> >>> >>> Kevin, >>> >>> Unfortunately we are receiving deployment errors again. This is the >>> event: >>> >>> 0/9 nodes are available: 1 node(s) had disk pressure, 2 node(s) had >>> taints that the pod didn't tolerate, 6 node(s) didn't match node selector. >>> >>> This is the deployment: >>> >>> >>> https://cluster.unified-platform.io/console/project/ccat-dev/browse/dc/driveup >>> >>> >>> V/R, >>> Russell C Kendall >>> ________________________________________ >>> From: Miller, Timothy J. >>> Sent: Monday, December 2, 2019 2:44:21 PM >>> To: Kevin O'Donnell >>> Cc: Kendall, Russell C; Mark Nissley; ABODUNRIN, ADEMOLA A GG-12 >>> USAF AFMC AFLCMC/HNCP; Jonathan Rickard; Chris Kuperstein; BRYAN, AUSTEN R >>> Capt USAF AFMC AFLCMC/HNCP; Brenna Gordon; Bubb, Mike; DIROCCO, ROGER E >>> GG-13 USAF AFMC ESC/AFLCMC/HNCP; RAMIREZ, JOSE >>> A CTR USAF AFMC AFLCMC/HNCP >>> Subject: Re: [EXT] Re: Unified Platform Pod Deploy Errors >>> >>> Tagged you on it. >>> >>> -- T >>> >>> On 12/2/19, 14:03, "Kevin O'Donnell" wrote: >>> >>> Hello, >>> >>> >>> Autoscaling is on our future IAC roadmap. Tim, the additional >>> ticket would be appreciated. >>> >>> >>> We have swapped out the app/worker instances with m5a.8xlarge 32 >>> cores, 128gb of ram. Please let us know if you have any other issues. >>> >>> >>> Thanks, >>> >>> KEVIN O'DONNELL >>> ARCHITECT MANAGER >>> Red Hat Red Hat NA Public Sector Consulting < >>> https://www.redhat.com/> >>> >>> kodonnell at redhat.com >>> M: 240-605-4654 >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> On Mon, Dec 2, 2019 at 2:29 PM Miller, Timothy J. < >>> tmiller at mitre.org> wrote: >>> >>> >>> I'll open an issue. 
IaC needs to have instance size as a host_var to facilitate scaling.
>>>
>>> -- T
>>>
>>> On 12/2/19, 13:15, "Kevin O'Donnell" wrote:
>>>
>>> Tim,
>>>
>>> Thanks for the information. We are undersized on the app/worker nodes,
>>> the cluster has 3 and they are m5.xlarge with only 16gb of ram. From
>>> what I have read each Labs engagement operated on a 3 node worker
>>> cluster with each node having 6core's and 28gb of ram. We will need to
>>> swap out the existing instances with larger spec's.
>>>
>>> We are going to try to flush the existing workload out on one of the
>>> workers to see if we can swap them out one at a time.
>>>
>>> Thanks,
>>>
>>> KEVIN O'DONNELL
>>> ARCHITECT MANAGER
>>> Red Hat Red Hat NA Public Sector Consulting <https://www.redhat.com/>
>>>
>>> kodonnell at redhat.com
>>> M: 240-605-4654
>>>
>>> On Mon, Dec 2, 2019 at 2:07 PM Miller, Timothy J. <tmiller at mitre.org> wrote:
>>>
>>> Here's what I can see, given the perm limits I seem to be under:
>>>
>>> - NS:develop-misp-app and NS:lp-develop-misp-app both have several
>>>   sync jobs in CrashLoopBackoff b/c of an HTTP 401 error returned while
>>>   trying to fetch something from somewhere (URL isn't recorded in the
>>>   stack trace).
>>>
>>> - NS:minishift-misp-app has most of its pods/jobs stuck in
>>>   ImagePullBackoff. No detail there in the event stream so I'll see if
>>>   I can dig deeper.
>>>
>>> - NS:aam-ci-cd has Jenkins trying to spin up three workers, those are
>>>   coming back as unschedulable.
>>>
>>> I can't see into NS:aam-bases or NS:dsop-images b/c of perm limits.
>>>
>>> I see no DAS-related project(s).
>>>
>>> The MISP stuff needs debugging before calling "blocked" since it looks
>>> like an internal error from this perspective.
>>>
>>> In re: AAM Jenkins: If this deployment is coming out of the OCP
>>> storefront, then maybe it should be ephemeral rather than persistent.
>>> If it's a custom deployment, then it probably needs a rethink.
>>>
>>> I'm also not sure why there are two MISP dev projects.
>>>
>>> -- T
>>>
>>> On 12/2/19, 12:46, "Kevin O'Donnell" wrote:
>>>
>>> Russell,
>>>
>>> Thank you for the information. We can switch out the instance type for
>>> the worker nodes. How much memory is required by the apps?
>>>
>>> Thanks,
>>>
>>> KEVIN O'DONNELL
>>> ARCHITECT MANAGER
>>> Red Hat Red Hat NA Public Sector Consulting <https://www.redhat.com/>
>>>
>>> kodonnell at redhat.com
>>> M: 240-605-4654
>>>
>>> On Mon, Dec 2, 2019 at 1:32 PM Kendall, Russell C <Russell.Kendall at mantech.com> wrote:
>>>
>>> Kevin,
>>> The lack of resources on u-p.io <http://u-p.io> cluster is hindering
>>> development, testing, and integration of the apps from CCAT AAM DAS,
>>> which is putting one of our PI goals at risk.
>>>
>>> We are blocked by the fact that we (CCAT and AAM) cannot deploy
>>> additional pods to the unified-platform.io <http://unified-platform.io>
>>> cluster. We have a subset of containers deployed, but rolling
>>> deployments and new deployments fail. This means that we are not able
>>> to execute integration testing or peer reviews. We are temporarily
>>> working around by NOT testing/reviewing our code changes live,
>>> something that no one likes. Also, we are now running weeks-old
>>> instances of our containers, so we are very likely producing some
>>> technical debt. We currently have developers approaching idle or doing
>>> non-priority work until the resource issue is resolved.
>>>
>>> Here is the particular error from the OSP cluster I received while
>>> attempting a redeploy of one of our apps.
>>>
>>> 0/9 nodes are available: 1 node(s) had taints that the pod didn't
>>> tolerate, 2 Insufficient pods, 6 node(s) didn't match node selector.
>>> 11 times in the last minute
>>>
>>> Since we do not have any cluster permissions, I cannot verify which
>>> resource is running out, but from experience, I assess it is a memory
>>> issue.
>>>
>>> It appears the cluster has been provisioned with a silly allocation of
>>> node types. Without knowing exactly what was deployed, it appears only
>>> 3 of the 9 hosts are suitable worker nodes. We would expect the cluster
>>> to respond to resource limitations and scale, but if a scheduled
>>> downtime is required, please work with us so we can anticipate. As it
>>> stands, the cluster does not support resources required by CCAT and the
>>> other dev teams (AAM, DAS, etc.). We would accept any downtime if it
>>> will improve the situation, as we are blocked from progressing under
>>> the current constraints. My hope was we could get the cluster
>>> redeployed over the TG holiday to eliminate developer impact, but as
>>> Mark pointed out, there were limited support folks available. Now I am
>>> just trying to minimize the losses.
>>> >>> >>> >>> V/R, >>> >>> Russell C Kendall >>> >>> >>> >>> >>> >>> ________________________________________ >>> From: Kevin O'Donnell >>> Sent: Monday, December 2, 2019 11:52 AM >>> To: Kendall, Russell C >>> Cc: Mark Nissley; ABODUNRIN, ADEMOLA A GG-12 USAF AFMC >>> AFLCMC/HNCP; Jonathan Rickard; Chris Kuperstein; BRYAN, AUSTEN R Capt USAF >>> AFMC AFLCMC/HNCP; Brenna Gordon; Bubb, Mike (mbubb at mitre.org); >>> DIROCCO, >>> ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP; Miller, >>> Timothy J.; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP >>> Subject: Re: Unified Platform Pod Deploy Errors >>> >>> Hello Russell, >>> >>> >>> Can you elaborate on the term Blocked? What specific >>> issues are the blockers? >>> >>> >>> >>> Thanks, >>> >>> KEVIN O'DONNELL >>> ARCHITECT MANAGER >>> Red Hat Red Hat NA Public Sector Consulting < >>> https://www.redhat.com/> >>> >>> kodonnell at redhat.com >>> M: 240-605-4654 >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> On Mon, Dec 2, 2019 at 11:11 AM Kendall, Russell C < >>> Russell.Kendall at mantech.com> wrote: >>> >>> >>> Mark, >>> >>> Thank for acknowledging, please be aware the San Antonio >>> dev teams working in >>> >>> >>> unified-platform.io < >>> http://unified-platform.io> < >>> http://unified-platform.io> >>> are currently blocked. >>> >>> V/R, >>> >>> Russell C Kendall >>> >>> ________________________________________ >>> From: Mark Nissley >>> Sent: Monday, December 2, 2019 9:36 AM >>> To: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP; >>> Jonathan Rickard; Chris Kuperstein >>> Cc: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP; Kevin >>> O'Donnell; Brenna Gordon; Kendall, Russell C; Bubb, Mike ( >>> mbubb at mitre.org); DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP; >>> Miller, Timothy >>> J.; >>> RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP >>> Subject: Re: Unified Platform Pod Deploy Errors >>> >>> As noted, I don't suspect much got done on this over the >>> holiday weekend. 
I did see the ticket, as dropped some details into it. I >>> also assigned it to @Jonathan >>> Rickard and @Chris >>> Kuperstein . >>> >>> >>> >>> It looks like short term solutions have been easy but >>> the issue is recurring. >>> >>> >>> >>> >>> Mark NISSLEY, PMP, >>> CSM, LEAN >>> >>> PROGRAM MaNAGER & SR technical Project Manager >>> North American Consulting, Public Sector >>> >>> M: >>> 850-530-3234 >>> >>> >>> Scheduled Training: October 14-18 >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> On Mon, Dec 2, 2019 at 10:21 AM ABODUNRIN, ADEMOLA A >>> GG-12 USAF AFMC AFLCMC/HNCP wrote: >>> >>> >>> Mark/Kevin, >>> >>> >>> I just heard at the team stand up that we are still >>> blocked. This is also affecting the AAM team from my investigations. >>> >>> >>> Please let me know if there is something we need to do >>> to move this forward. >>> >>> Most Sincerely, >>> >>> >>> Ade Abodunrin, GG-12, USAF >>> Product Owner (Cybertron & Ginyu Force), Unified Platform >>> >>> >>> >>> LevelUP Code Works >>> Commercial: >>> (210) 890-2113 >>> NIPR email: >>> ademola.abodunrin at us.af.mil >>> >>> >>> >>> >>> >>> >>> >>> >>> ________________________________________ >>> From: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP >>> Sent: Wednesday, November 27, 2019 12:58 PM >>> To: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP < >>> austen.bryan.1 at us.af.mil>; Mark Nissley ; Kevin >>> O'Donnell >>> ; >>> Brenna Gordon >>> Cc: Kendall, Russell C ; >>> Bubb, Mike (mbubb at mitre.org) ; DIROCCO, ROGER E GG-13 >>> USAF AFMC ESC/AFLCMC/HNCP >>> ; Miller, Timothy J. < >>> tmiller at mitre.org>; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP < >>> jose.ramirez.50.ctr at us.af.mil> >>> Subject: Re: Unified Platform Pod Deploy Errors >>> >>> Thanks a lot Capt Bryan! Russell created the ticket on >>> GitLab UP Node Project. 
>>> >>> >>> >>> >>> Most Sincerely, >>> >>> >>> Ade Abodunrin, GG-12, USAF >>> Product Owner (Cybertron & Ginyu Force), Unified Platform >>> >>> >>> >>> LevelUP Code Works >>> Commercial: >>> (210) 890-2113 >>> NIPR email: >>> ademola.abodunrin at us.af.mil >>> >>> >>> >>> >>> >>> >>> >>> >>> ________________________________________ >>> From: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP < >>> austen.bryan.1 at us.af.mil> >>> Sent: Wednesday, November 27, 2019 12:56 PM >>> To: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP < >>> ademola.abodunrin at us.af.mil>; Mark Nissley ; Kevin >>> O'Donnell >>> ; Brenna Gordon < >>> bgordon at redhat.com> >>> Cc: Kendall, Russell C ; >>> Bubb, Mike (mbubb at mitre.org) ; DIROCCO, ROGER E GG-13 >>> USAF AFMC ESC/AFLCMC/HNCP >>> ; Miller, Timothy J. < >>> tmiller at mitre.org>; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP < >>> jose.ramirez.50.ctr at us.af.mil> >>> Subject: RE: Unified Platform Pod Deploy Errors >>> >>> Thanks Ade. The team is thin until next week due to the >>> holidays but I will make sure it is addressed. Were there any issues >>> submitted to Gitlab?s UP Node Project on DCCSCR? >>> >>> @Mark/Kevin ? can we address? >>> >>> -Austen >>> >>> From: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP < >>> ademola.abodunrin at us.af.mil> >>> >>> Sent: Wednesday, November 27, 2019 9:51 AM >>> To: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP < >>> austen.bryan.1 at us.af.mil> >>> Cc: Kendall, Russell C ; >>> Bubb, Mike (mbubb at mitre.org) >>> Subject: Fw: Unified Platform Pod Deploy Errors >>> >>> >>> >>> Capt Bryan, >>> >>> Please see the explanation on the issue that Ginyu Force >>> is currently experiencing below. 
>>> >>> >>> >>> Most Sincerely, >>> >>> Ade Abodunrin, GG-12, USAF >>> Product Owner (Cybertron & Ginyu Force), Unified Platform >>> >>> >>> LevelUP Code Works >>> Commercial: (210) 890-2113 >>> NIPR email: >>> ademola.abodunrin at us.af.mil >>> >>> >>> >>> >>> >>> ________________________________________ >>> >>> From: Kendall, Russell C >>> Sent: Wednesday, November 27, 2019 9:46 AM >>> To: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP < >>> ademola.abodunrin at us.af.mil>; Buffaloe, >>> Christopher ; >>> Molina, Toby ; >>> Crace, Jared E ; SANCHEZ, >>> MARK GG-13 USAF AFMC AFLCMC/HNCP >>> Cc: >>> tmiller at mitre.org < >>> tmiller at mitre.org> >>> Subject: [Non-DoD Source] Fw: Unified Platform Pod >>> Deploy Errors >>> >>> >>> >>> Gentlemen, >>> >>> The application development teams working in the new >>> GovCloud OCP environment (unified-platform.io < >>> http://unified-platform.io> >>> >>> ) >>> are currently blocked in efforts to deploy new pods for >>> testing, development, and UAT. >>> >>> Red Hat and RogueOne SMEs have been notified and have >>> attempted some fixes starting on Monday 11/25, but at this point have not >>> been able to provision resources >>> sufficient to host CCAT and AAM. >>> >>> We have taken steps to minimize our footprint >>> (eliminating demonstration environment, deleting developer namespaces), but >>> this is not a sustainable approach, >>> and has only resulted in moderate improvements in >>> cluster performance. >>> >>> Our hope is the U-P.io cluster compute resources can be >>> increased very soon, so that we may resume normal development activities. >>> Our understanding is that >>> such a scaling requires a complete redeployment of the >>> cluster, which is unusual, but an acceptable loss to productivity. If the >>> cluster can be scaled up over the Thanksgiving holiday, the impact will be >>> minimal to developers and cluster administrators, >>> alike. 
>>> >>> We are currently collaborating on solutions on the >>> following MatterMost channel behind the space camp VPN (link below), and >>> via the email thread forwarded >>> (further below). >>> >>> >>> >>> https://chat.spacecamp.ninja/levelup/channels/unified-platform-node >>> < >>> https://chat.spacecamp.ninja/levelup/channels/unified-platform-node> >>> < >>> https://chat.spacecamp.ninja/levelup/channels/unified-platform-node> >>> >>> Please keep me posted on developments and I will >>> coordinate developer activities with any scheduled platform outages. >>> >>> V/R, >>> Russell C Kendall >>> >>> ________________________________________ >>> >>> From: Curran, Daniel M >>> Sent: Monday, November 25, 2019 2:47 PM >>> To: Jonathan Rickard >>> Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; >>> dlystra at redhat.com ; Sison, >>> Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; >>> Phil >>> Soliz; >>> Buffaloe, >>> Christopher; Torres, Alexander; Crace, Jared E; Middleton, >>> Joseph J >>> Subject: Re: Unified Platform Pod Deploy Errors >>> >>> >>> >>> Sounds great. Appreciate it. >>> I'll watch email and Mattermost in case you need more >>> from us. >>> >>> -Daniel >>> >>> ________________________________________ >>> >>> From: Jonathan Rickard >>> Sent: Monday, November 25, 2019 2:44 PM >>> To: Curran, Daniel M >>> Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; >>> dlystra at redhat.com ; Sison, >>> Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; >>> Phil >>> Soliz; >>> Buffaloe, >>> Christopher; Torres, Alexander; Crace, Jared E; Middleton, >>> Joseph J >>> Subject: Re: Unified Platform Pod Deploy Errors >>> >>> >>> >>> Thanks Daniel - >>> >>> >>> >>> I'll continue to look into the resource issue that >>> you're seeing - I'd like to identify the root cause and then work with the >>> team to come up with a solution. 
>>> >>> >>> >>> Jonathan Rickard, >>> RHCA >>> Principal Consultant, NAPS >>> Red >>> Hat Remote - Texas >>> jonny at redhat.com >>> >>> M: 210-862-9739 > >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> On Mon, Nov 25, 2019 at 2:17 PM Curran, Daniel M < >>> Daniel.Curran at mantech.com> >>> wrote: >>> >>> >>> Yeah we hit the limit then had AAM kill some of their >>> projects and then our pods got scheduled. >>> We've hit the limit again though. Here's an example pod >>> that cannot be scheduled >>> >>> >>> >>> >>> https://cluster.unified-platform.io/console/project/ccat-dev/browse/pods/sso-8-2cjth >>> < >>> https://cluster.unified-platform.io/console/project/ccat-dev/browse/pods/sso-8-2cjth> >>> < >>> https://cluster.unified-platform.io/console/project/ccat-dev/browse/pods/sso-8-2cjth >>> > >>> < >>> https://cluster.unified-platform.io/console/project/ccat-dev/browse/pods/sso-8-2cjth >>> > >>> They're seeing it when their jenkins slaves can't deploy >>> but it's basically any pod after we hit some limit. >>> >>> -Daniel >>> ________________________________________ >>> >>> From: Jonathan Rickard >>> Sent: Monday, November 25, 2019 1:26 PM >>> To: Curran, Daniel M >>> Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; >>> dlystra at redhat.com ; Sison, >>> Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; >>> Phil >>> Soliz; >>> Buffaloe, >>> Christopher; Torres, Alexander; Crace, Jared E; Middleton, >>> Joseph J >>> Subject: Re: Unified Platform Pod Deploy Errors >>> >>> >>> >>> Daniel, >>> >>> >>> >>> I can see that you have 3 mongo pods, 1 chatup and 1 >>> upbot pod running ... is your app good to go? >>> >>> >>> >>> Looks like there was an issue with memory on 1 pod, then >>> some node selector being mismatched - just what i could see in the events... 
>>> >>> >>> >>> >>> >>> >>> Jonathan Rickard, >>> RHCA >>> Principal Consultant, NAPS >>> Red >>> Hat Remote - Texas >>> jonny at redhat.com >>> >>> M: 210-862-9739 > >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> On Mon, Nov 25, 2019 at 12:50 PM Curran, Daniel M < >>> Daniel.Curran at mantech.com> >>> wrote: >>> >>> >>> Also, AAM was having similar issues. Looks like they had >>> a lot of namespaces and scaling down the pods on their deployments didn't >>> help but actually deleting the namespaces >>> did. >>> We have pods scheduling now but I'm adding them and we'd >>> still like to work through what resource limit we were hitting to avoid >>> this in the future. >>> >>> -Daniel >>> >>> ________________________________________ >>> >>> From: Curran, Daniel M >>> Sent: Monday, November 25, 2019 12:25 PM >>> To: Jonathan Rickard >>> Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; >>> dlystra at redhat.com ; Sison, >>> Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; >>> Phil >>> Soliz; >>> Buffaloe, >>> Christopher; Torres, Alexander >>> Subject: Re: Unified Platform Pod Deploy Errors >>> >>> >>> >>> Thanks, sir. >>> Most important for us to get working is "ccat-demo" but >>> it's also happening in "ccat-dev" and "ccat-ci-cd". >>> >>> -Daniel >>> ________________________________________ >>> >>> From: Jonathan Rickard >>> Sent: Monday, November 25, 2019 12:22 PM >>> To: Curran, Daniel M >>> Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; >>> dlystra at redhat.com ; Sison, >>> Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; >>> Phil >>> Soliz; >>> Buffaloe, >>> Christopher; Torres, Alexander >>> Subject: Re: Unified Platform Pod Deploy Errors >>> >>> >>> >>> What's the name of the project you're working in? I'm >>> going to be back at my laptop in about 30 and will take a look when I get >>> there. >>> >>> >>> >>> Is it just the Jenkins pods failing? 
>>> >>> >>> >>> >>> >>> >>> >>> On Mon, Nov 25, 2019, 12:20 PM Curran, Daniel M < >>> Daniel.Curran at mantech.com> >>> wrote: >>> >>> >>> Adding Dean and Alex. >>> Also, sitting in mattermost if anyone needs to get >>> online and chat for more information. >>> >>> -Daniel >>> >>> ________________________________________ >>> >>> From: Curran, Daniel M >>> Sent: Monday, November 25, 2019 12:07 PM >>> To: >>> jonny at redhat.com ; >>> >>> ckuperst at redhat.com ; Mark >>> Nissley >>> Cc: Sison, Mark Anthony; Cepeda, Rolando; Kendall, >>> Russell C; Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher >>> Subject: Re: Unified Platform Pod Deploy Errors >>> >>> >>> >>> Adding Kupe and Mark. >>> >>> -Daniel >>> ________________________________________ >>> >>> From: Curran, Daniel M >>> Sent: Monday, November 25, 2019 11:43 AM >>> To: >>> jonny at redhat.com >>> Cc: Sison, Mark Anthony; Cepeda, Rolando; Kendall, >>> Russell C; Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher >>> Subject: Unified Platform Pod Deploy Errors >>> >>> >>> >>> Hey Jonny, >>> >>> We met briefly at SpaceCAMP a couple weeks ago when >>> >>> >>> >>> >>> cluster.unified-platform.io < >>> http://cluster.unified-platform.io> >>> >>> was stood up. We've been trying to deploy some apps today and >>> so far today we're getting errors on most (if >>> not all) of our pods. >>> >>> 0/9 nodes are available: 3 Insufficient pods, 6 node(s) >>> didn't match node selector. >>> >>> Is what we're seeing. We were thinking it was some >>> volume types weren't correct but some of our pods don't even have volumes >>> attached and still give us this error (i.e. Jenkins >>> slaves or web frontends without persistent storage). >>> Any idea what this could be? We're not running out of >>> space on the nodes themselves are we? >>> We have a demo scheduled for tomorrow at 9:30 AM CST and >>> are hoping to get a demo env up for them today but this error came up >>> unexpectedly. 
Also, we're here at 500 Navarro St. in San Antonio working through this in person is better/easier.
>>>
>>> Thanks,
>>> Daniel Curran
>>>
>>> ________________________________________
>>>
>>> This e-mail and any attachments are intended only for the use of the
>>> addressee(s) named herein and may contain proprietary information. If
>>> you are not the intended recipient of this e-mail or believe that you
>>> received this email in error, please take immediate action to notify
>>> the sender of the apparent error by reply e-mail; permanently delete
>>> the e-mail and any attachments from your computer; and do not
>>> disseminate, distribute, use, or copy this message and any attachments.
>>>
>>> _______________________________________________
>>> platformONE mailing list
>>> platformONE at redhat.com
>>> https://www.redhat.com/mailman/listinfo/platformone
>>>
>>> _______________________________________________
> platformONE mailing list
> platformONE at redhat.com
> https://www.redhat.com/mailman/listinfo/platformone
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.png
Type: image/png
Size: 2127 bytes
Desc: not available
URL:

From jrickard at redhat.com Fri Dec 6 18:37:26 2019
From: jrickard at redhat.com (Jonathan Rickard)
Date: Fri, 6 Dec 2019 12:37:26 -0600
Subject: [Platformone] [EXT] Re: Unified Platform Pod Deploy Errors
In-Reply-To:
References: <1575656455684.39471@ManTech.com>
Message-ID:

Russell,

Is CCAT the only application having problems? I see your project has a few failed pv's.
The taints appear to be Jonathan Rickard, RHCE, RHCA Consulting Architect Red Hat Public Sector jonny at redhat.com M: 210.862.9739 @redhatjobs redhatjobs @redhatjobs On Fri, Dec 6, 2019 at 12:26 PM Mark Nissley wrote: > Issue created here: > https://dccscr.dsop.io/ginyu-force/ginyu-force/issues/1 > > > Mark NISSLEY, PMP, CSM, LEAN > > PROGRAM MaNAGER & SR technical Project Manager > > North American Consulting, Public Sector > > > M: 850-530-3234 > > > > *Scheduled Training: October 14-18* > > > On Fri, Dec 6, 2019 at 12:21 PM Kendall, Russell C < > Russell.Kendall at mantech.com> wrote: > >> Nine tainted pods. Running apps seem to be okay, where they happened to >> be running at time the taint flood occurred. This will block IATT efforts, >> since we can not deploy our apps once we have remediated the >> vulnerabilities and to confirm remediation with TL and Anchore (there is >> not local scanning capability). >> V/R, >> Russell C Kendall >> ------------------------------ >> *From:* Jonathan Rickard >> *Sent:* Friday, December 6, 2019 12:13:29 PM >> *To:* ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP >> *Cc:* Kendall, Russell C; platformONE at redhat.com; Miller, Timothy J.; >> Keegan Reap; Bubb, Mike; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP; >> Jonathan Rickard >> *Subject:* Re: [Platformone] [EXT] Re: Unified Platform Pod Deploy Errors >> >> Also, is every application having problems or a specific? >> >> Jonathan Rickard, RHCE, RHCA >> >> Consulting Architect >> >> Red Hat Public Sector >> >> jonny at redhat.com >> M: 210.862.9739 >> @redhatjobs redhatjobs >> @redhatjobs >> >> >> >> >> On Fri, Dec 6, 2019 at 12:06 PM Jonathan Rickard >> wrote: >> >>> Ade, >>> >>> What does that mean? You can't login, you can't deploy? 
>>> >>> Jonathan Rickard, RHCE, RHCA >>> Consulting Architect >>> Red Hat Public Sector >>> jonny at redhat.com >>> M: 210.862.9739 >>> >>> On Fri, Dec 6, 2019 at 12:02 PM ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP wrote: >>> >>>> ALCON, >>>> >>>> The cluster is down again. Please assist. >>>> >>>> Most Sincerely, >>>> >>>> Ade Abodunrin, GG-12, USAF >>>> Product Owner (Cybertron & Ginyu Force), Unified Platform >>>> LevelUP Code Works >>>> Commercial: (210) 890-2113 >>>> NIPR email: ademola.abodunrin at us.af.mil >>>> >>>> *From:* Kendall, Russell C >>>> *Sent:* Thursday, December 5, 2019 9:55 AM >>>> *To:* Jonathan Rickard >>>> *Cc:* Miller, Timothy J.; Keegan Reap <kreap at redhat.com>; Bubb, Mike; platformONE at redhat.com; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP <jose.ramirez.50.ctr at us.af.mil>; ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP; Jonathan Rickard <jonny at redhat.com> >>>> *Subject:* [Non-DoD Source] Re: [Platformone] [EXT] Re: Unified Platform Pod Deploy Errors >>>> >>>> Jonny, >>>> >>>> I'll see you Friday at 500 Nav. Travel safe. >>>> >>>> V/R, >>>> Russell C Kendall >>>> >>>> ------------------------------ >>>> >>>> *From:* Jonathan Rickard >>>> *Sent:* Wednesday, December 4, 2019 5:29 PM >>>> *To:* Kendall, Russell C >>>> *Cc:* Miller, Timothy J.; Keegan Reap; Bubb, Mike; platformONE at redhat.com; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP; ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP; Jonathan Rickard >>>> *Subject:* Re: [Platformone] [EXT] Re: Unified Platform Pod Deploy Errors >>>> >>>> Russell, >>>> >>>> I have definitely been terrible with email lately and I apologize for >>>> the slow response times.
I get back to San Antonio tomorrow but I have a >>>> pretty full afternoon. I can stop by Friday if you'd like. >>>> >>>> Thanks, >>>> jonny >>>> >>>> Jonathan Rickard, RHCA >>>> Principal Consultant, NAPS >>>> Red Hat Remote - Texas >>>> jonny at redhat.com >>>> M: 210-862-9739 >>>> >>>> On Wed, Dec 4, 2019 at 10:16 AM Kendall, Russell C <Russell.Kendall at mantech.com> wrote: >>>> >>>> Jonny, >>>> I'd like to suggest you come to 500 to wrap this up, since it seems >>>> there are significant delays in communication that are contributing to >>>> downtime. >>>> V/R, >>>> Russell C Kendall >>>> ________________________________________ >>>> From: Miller, Timothy J. >>>> Sent: Wednesday, December 4, 2019 7:02 AM >>>> To: Jonathan Rickard; Keegan Reap >>>> Cc: Bubb, Mike; platformONE at redhat.com; Kendall, Russell C; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP; ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP; Jonathan Rickard >>>> Subject: Re: [Platformone] [EXT] Re: Unified Platform Pod Deploy Errors >>>> >>>> Johnny-- >>>> >>>> Update the issue, if you would be so kind. >>>> >>>> -- T >>>> >>>> On 12/3/19, 18:00, "platformone-bounces at redhat.com on behalf of Jonathan Rickard" <jrickard at redhat.com> wrote: >>>> >>>> Hey Guys - Sorry for taking so long - this has been completed. >>>> Please run your builds and let us know if you're having any problems. >>>> jonny >>>> Jonathan Rickard, RHCA >>>> Principal Consultant, NAPS >>>> Red Hat Remote - Texas >>>> jonny at redhat.com >>>> M: 210-862-9739 >>>> >>>> On Tue, Dec 3, 2019 at 3:47 PM Jonathan Rickard <jrickard at redhat.com> wrote: >>>> >>>> Russell / Team, >>>> >>>> We believe we've identified the issue with your application >>>> deploying.
In order to rectify the issue I need to evacuate pods so you >>>> will probably see some hiccups while deploying. I will update when this is >>>> resolved. >>>> >>>> >>>> Thanks, >>>> jonny >>>> >>>> Jonathan Rickard, RHCA >>>> Principal Consultant, NAPS >>>> Red Hat Remote - Texas >>>> >>>> jonny at redhat.com >>>> M: 210-862-9739 > >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> On Tue, Dec 3, 2019 at 12:53 PM Keegan Reap >>>> wrote: >>>> >>>> >>>> Hey all, we have opened an issue below, that we believe to be the >>>> cause, we are currently investigating: >>>> >>>> >>>> https://dccscr.dsop.io/ginyu-force/up-iatt/issues/32 >>>> >>>> >>>> >>>> On Tue, Dec 3, 2019 at 11:27 AM Jonathan Rickard < >>>> jrickard at redhat.com> wrote: >>>> >>>> >>>> Russell, >>>> >>>> >>>> Getting more eyes on this @platformONE at redhat.com >>> platformONE at redhat.com> >>>> >>>> >>>> We'll keep you posted. >>>> jonny >>>> Jonathan Rickard, RHCA >>>> Principal Consultant, NAPS >>>> Red Hat Remote - Texas >>>> >>>> jonny at redhat.com >>>> M: 210-862-9739 > >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> On Tue, Dec 3, 2019 at 11:59 AM Kendall, Russell C < >>>> Russell.Kendall at mantech.com> wrote: >>>> >>>> >>>> Kevin, >>>> >>>> Unfortunately we are receiving deployment errors again. This is the >>>> event: >>>> >>>> 0/9 nodes are available: 1 node(s) had disk pressure, 2 node(s) had >>>> taints that the pod didn't tolerate, 6 node(s) didn't match node selector. >>>> >>>> This is the deployment: >>>> >>>> >>>> https://cluster.unified-platform.io/console/project/ccat-dev/browse/dc/driveup >>>> >>>> >>>> V/R, >>>> Russell C Kendall >>>> ________________________________________ >>>> From: Miller, Timothy J. 
>>>> Sent: Monday, December 2, 2019 2:44:21 PM >>>> To: Kevin O'Donnell >>>> Cc: Kendall, Russell C; Mark Nissley; ABODUNRIN, ADEMOLA A GG-12 >>>> USAF AFMC AFLCMC/HNCP; Jonathan Rickard; Chris Kuperstein; BRYAN, AUSTEN R >>>> Capt USAF AFMC AFLCMC/HNCP; Brenna Gordon; Bubb, Mike; DIROCCO, ROGER E >>>> GG-13 USAF AFMC ESC/AFLCMC/HNCP; RAMIREZ, JOSE >>>> A CTR USAF AFMC AFLCMC/HNCP >>>> Subject: Re: [EXT] Re: Unified Platform Pod Deploy Errors >>>> >>>> Tagged you on it. >>>> >>>> -- T >>>> >>>> On 12/2/19, 14:03, "Kevin O'Donnell" wrote: >>>> >>>> Hello, >>>> >>>> >>>> Autoscaling is on our future IAC roadmap. Tim, the additional >>>> ticket would be appreciated. >>>> >>>> >>>> We have swapped out the app/worker instances with m5a.8xlarge >>>> 32 cores, 128gb of ram. Please let us know if you have any other issues. >>>> >>>> >>>> Thanks, >>>> >>>> KEVIN O'DONNELL >>>> ARCHITECT MANAGER >>>> Red Hat Red Hat NA Public Sector Consulting < >>>> https://www.redhat.com/> >>>> >>>> kodonnell at redhat.com >>>> M: 240-605-4654 >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> On Mon, Dec 2, 2019 at 2:29 PM Miller, Timothy J. < >>>> tmiller at mitre.org> wrote: >>>> >>>> >>>> I'll open an issue. IaC needs to have instance size as a >>>> host_var to facilitate scaling. >>>> >>>> -- T >>>> >>>> On 12/2/19, 13:15, "Kevin O'Donnell" >>>> wrote: >>>> >>>> Tim, >>>> >>>> >>>> Thanks for the information. We are undersized on the >>>> app/worker nodes, the cluster has 3 and they are m5.xlarge with only 16gb >>>> of ram. From what I have read each Labs engagement operated on a 3 node >>>> worker cluster with each node having 6core's and 28gb >>>> of ram. We will need to swap out the existing instances >>>> with larger spec's. >>>> >>>> >>>> We are going to try to flush the existing workload out on >>>> one of the workers to see if we can swap them out one at a time. 
>>>> >>>> >>>> Thanks, >>>> >>>> KEVIN O'DONNELL >>>> ARCHITECT MANAGER >>>> Red Hat Red Hat NA Public Sector Consulting < >>>> https://www.redhat.com/> >>>> >>>> kodonnell at redhat.com >>>> M: 240-605-4654 >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> On Mon, Dec 2, 2019 at 2:07 PM Miller, Timothy J. < >>>> tmiller at mitre.org> wrote: >>>> >>>> >>>> Here's what I can see, given the perm limits I seem to be >>>> under: >>>> >>>> - NS:develop-misp-app and NS:lp-develop-misp-app both have >>>> several sync jobs in CrashLoopBackoff b/c of an HTTP 401 error returned >>>> while trying to fetch something from somewhere (URL isn't recorded in the >>>> stack trace). >>>> >>>> - NS:minishift-misp-app has most of its pods/jobs stuck in >>>> ImagePullBackoff. No detail there in the event stream so I'll see if I can >>>> dig deeper. >>>> >>>> - NS:aam-ci-cd has Jenkins trying to spin up three workers, >>>> those are coming back as unschedulable. >>>> >>>> I can't see into NS:aam-bases or NS:dsop-images b/c of perm >>>> limits. >>>> >>>> I see no DAS-related project(s). >>>> >>>> The MISP stuff needs debugging before calling "blocked" >>>> since it looks like an internal error from this perspective. >>>> >>>> >>>> >>>> In re: AAM Jenkins: If this deployment is coming out of >>>> the OCP storefront, then maybe it should be ephemeral rather than >>>> persistent. If it's a custom deployment, then it probably needs a rethink. >>>> >>>> I'm also not sure why there are two MISP dev projects. >>>> >>>> -- T >>>> >>>> >>>> >>>> On 12/2/19, 12:46, "Kevin O'Donnell" >>>> wrote: >>>> >>>> Russell, >>>> >>>> >>>> Thank you for the information. We can switch out the >>>> instance type for the worker nodes. How much memory is required by the apps? 
>>>> >>>> >>>> >>>> Thanks, >>>> >>>> KEVIN O'DONNELL >>>> ARCHITECT MANAGER >>>> Red Hat Red Hat NA Public Sector Consulting < >>>> https://www.redhat.com/> >>>> >>>> kodonnell at redhat.com >>>> M: 240-605-4654 >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> On Mon, Dec 2, 2019 at 1:32 PM Kendall, Russell C < >>>> Russell.Kendall at mantech.com> wrote: >>>> >>>> >>>> Kevin, >>>> The lack of resources on >>>> u-p.io >>>> >>>> cluster is hindering development, >>>> testing, and integration of the apps from CCAT AAM DAS, which >>>> is putting one >>>> of our PI goals at risk. >>>> >>>> >>>> We are blocked by the fact that we (CCAT and AAM) >>>> cannot deploy additional pods to the >>>> >>>> unified-platform.io < >>>> http://unified-platform.io> < >>>> http://unified-platform.io> >>>> cluster. We have a subset of containers deployed, but rolling >>>> deployments and new deployments fail. This means that we >>>> are not able to execute integration testing or peer reviews. >>>> We are temporarily working around by NOT >>>> testing/reviewing our code changes live, something that no one likes. Also, >>>> we are now running weeks-old instances of our containers, so we are very >>>> likely producing some technical debt. We currently have >>>> developers >>>> approaching idle or doing non-priority work until the >>>> resource issue is resolved. >>>> >>>> >>>> >>>> Here is the particular error from the OSP cluster I >>>> received while attempting a redeploy of one of our apps. >>>> >>>> >>>> >>>> 0/9 nodes are available: 1 node(s) had taints that the >>>> pod didn't tolerate, 2 Insufficient pods, 6 node(s) didn't match node >>>> selector.11 times in the last minute >>>> >>>> Since we do not have any cluster permissions, I cannot >>>> verify which resource is running out, but from experience, I assess it is a >>>> memory issue. >>>> >>>> >>>> >>>> It appears the cluster has been provisioned with a >>>> silly allocation of node types. 
Without knowing exactly what was deployed, >>>> it appears only 3 of the 9 hosts are suitable worker nodes. We would expect >>>> the cluster to respond to resource limitations >>>> and >>>> scale, >>>> but if a scheduled downtime is required, please work >>>> with us so we can anticipate. As it stands, the cluster does not support >>>> resources required by CCAT and the other dev teams (AAM, DAS, etc.). We >>>> would accept any downtime if it will improve the >>>> situation, >>>> as we are blocked from progressing under the current >>>> constraints. My hope was we could get the cluster redeployed over the TG >>>> holiday to eliminate developer impact, but as Mark pointed out, there were >>>> limited support folks available. Now I am just >>>> trying >>>> to >>>> minimize the losses. >>>> >>>> >>>> >>>> V/R, >>>> >>>> Russell C Kendall >>>> >>>> >>>> >>>> >>>> >>>> ________________________________________ >>>> From: Kevin O'Donnell >>>> Sent: Monday, December 2, 2019 11:52 AM >>>> To: Kendall, Russell C >>>> Cc: Mark Nissley; ABODUNRIN, ADEMOLA A GG-12 USAF AFMC >>>> AFLCMC/HNCP; Jonathan Rickard; Chris Kuperstein; BRYAN, AUSTEN R Capt USAF >>>> AFMC AFLCMC/HNCP; Brenna Gordon; Bubb, Mike (mbubb at mitre.org); >>>> DIROCCO, >>>> ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP; Miller, >>>> Timothy J.; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP >>>> Subject: Re: Unified Platform Pod Deploy Errors >>>> >>>> Hello Russell, >>>> >>>> >>>> Can you elaborate on the term Blocked? What specific >>>> issues are the blockers? 
>>>> >>>> >>>> >>>> Thanks, >>>> >>>> KEVIN O'DONNELL >>>> ARCHITECT MANAGER >>>> Red Hat Red Hat NA Public Sector Consulting < >>>> https://www.redhat.com/> >>>> >>>> kodonnell at redhat.com >>>> M: 240-605-4654 >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> On Mon, Dec 2, 2019 at 11:11 AM Kendall, Russell C < >>>> Russell.Kendall at mantech.com> wrote: >>>> >>>> >>>> Mark, >>>> >>>> Thank for acknowledging, please be aware the San >>>> Antonio dev teams working in >>>> >>>> >>>> unified-platform.io < >>>> http://unified-platform.io> < >>>> http://unified-platform.io> >>>> are currently blocked. >>>> >>>> V/R, >>>> >>>> Russell C Kendall >>>> >>>> ________________________________________ >>>> From: Mark Nissley >>>> Sent: Monday, December 2, 2019 9:36 AM >>>> To: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP; >>>> Jonathan Rickard; Chris Kuperstein >>>> Cc: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP; Kevin >>>> O'Donnell; Brenna Gordon; Kendall, Russell C; Bubb, Mike ( >>>> mbubb at mitre.org); DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP; >>>> Miller, Timothy >>>> J.; >>>> RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP >>>> Subject: Re: Unified Platform Pod Deploy Errors >>>> >>>> As noted, I don't suspect much got done on this over >>>> the holiday weekend. I did see the ticket, as dropped some details into it. >>>> I also assigned it to @Jonathan >>>> Rickard and @Chris >>>> Kuperstein . >>>> >>>> >>>> >>>> It looks like short term solutions have been easy but >>>> the issue is recurring. 
>>>> >>>> Mark NISSLEY, PMP, CSM, LEAN >>>> PROGRAM MaNAGER & SR technical Project Manager >>>> North American Consulting, Public Sector >>>> M: 850-530-3234 >>>> Scheduled Training: October 14-18 >>>> >>>> On Mon, Dec 2, 2019 at 10:21 AM ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP wrote: >>>> >>>> Mark/Kevin, >>>> >>>> I just heard at the team stand up that we are still >>>> blocked. This is also affecting the AAM team from my investigations. >>>> >>>> Please let me know if there is something we need to do >>>> to move this forward. >>>> >>>> Most Sincerely, >>>> >>>> Ade Abodunrin, GG-12, USAF >>>> Product Owner (Cybertron & Ginyu Force), Unified Platform >>>> LevelUP Code Works >>>> Commercial: (210) 890-2113 >>>> NIPR email: ademola.abodunrin at us.af.mil >>>> >>>> ________________________________________ >>>> From: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP >>>> Sent: Wednesday, November 27, 2019 12:58 PM >>>> To: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP <austen.bryan.1 at us.af.mil>; Mark Nissley; Kevin O'Donnell; Brenna Gordon >>>> Cc: Kendall, Russell C; Bubb, Mike (mbubb at mitre.org); DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP; Miller, Timothy J. <tmiller at mitre.org>; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP <jose.ramirez.50.ctr at us.af.mil> >>>> Subject: Re: Unified Platform Pod Deploy Errors >>>> >>>> Thanks a lot Capt Bryan! Russell created the ticket on >>>> GitLab UP Node Project.
>>>> >>>> Most Sincerely, >>>> >>>> Ade Abodunrin, GG-12, USAF >>>> Product Owner (Cybertron & Ginyu Force), Unified Platform >>>> LevelUP Code Works >>>> Commercial: (210) 890-2113 >>>> NIPR email: ademola.abodunrin at us.af.mil >>>> >>>> ________________________________________ >>>> From: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP <austen.bryan.1 at us.af.mil> >>>> Sent: Wednesday, November 27, 2019 12:56 PM >>>> To: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP <ademola.abodunrin at us.af.mil>; Mark Nissley; Kevin O'Donnell; Brenna Gordon <bgordon at redhat.com> >>>> Cc: Kendall, Russell C; Bubb, Mike (mbubb at mitre.org); DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP; Miller, Timothy J. <tmiller at mitre.org>; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP <jose.ramirez.50.ctr at us.af.mil> >>>> Subject: RE: Unified Platform Pod Deploy Errors >>>> >>>> Thanks Ade. The team is thin until next week due to the >>>> holidays but I will make sure it is addressed. Were there any issues >>>> submitted to Gitlab's UP Node Project on DCCSCR? >>>> >>>> @Mark/Kevin - can we address? >>>> >>>> -Austen >>>> >>>> From: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP <ademola.abodunrin at us.af.mil> >>>> Sent: Wednesday, November 27, 2019 9:51 AM >>>> To: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP <austen.bryan.1 at us.af.mil> >>>> Cc: Kendall, Russell C; Bubb, Mike (mbubb at mitre.org) >>>> Subject: Fw: Unified Platform Pod Deploy Errors >>>> >>>> Capt Bryan, >>>> >>>> Please see the explanation on the issue that Ginyu >>>> Force is currently experiencing below.
>>>> >>>> >>>> >>>> Most Sincerely, >>>> >>>> Ade Abodunrin, GG-12, USAF >>>> Product Owner (Cybertron & Ginyu Force), Unified >>>> Platform >>>> >>>> >>>> LevelUP Code Works >>>> Commercial: (210) 890-2113 >>>> NIPR email: >>>> ademola.abodunrin at us.af.mil >>>> >>>> >>>> >>>> >>>> >>>> ________________________________________ >>>> >>>> From: Kendall, Russell C >>>> Sent: Wednesday, November 27, 2019 9:46 AM >>>> To: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP < >>>> ademola.abodunrin at us.af.mil>; Buffaloe, >>>> Christopher ; >>>> Molina, Toby ; >>>> Crace, Jared E ; SANCHEZ, >>>> MARK GG-13 USAF AFMC AFLCMC/HNCP >>>> Cc: >>>> tmiller at mitre.org < >>>> tmiller at mitre.org> >>>> Subject: [Non-DoD Source] Fw: Unified Platform Pod >>>> Deploy Errors >>>> >>>> >>>> >>>> Gentlemen, >>>> >>>> The application development teams working in the new >>>> GovCloud OCP environment (unified-platform.io < >>>> http://unified-platform.io> >>>> >>>> ) >>>> are currently blocked in efforts to deploy new pods >>>> for testing, development, and UAT. >>>> >>>> Red Hat and RogueOne SMEs have been notified and have >>>> attempted some fixes starting on Monday 11/25, but at this point have not >>>> been able to provision resources >>>> sufficient to host CCAT and AAM. >>>> >>>> We have taken steps to minimize our footprint >>>> (eliminating demonstration environment, deleting developer namespaces), but >>>> this is not a sustainable approach, >>>> and has only resulted in moderate improvements in >>>> cluster performance. >>>> >>>> Our hope is the U-P.io cluster compute resources can be >>>> increased very soon, so that we may resume normal development activities. >>>> Our understanding is that >>>> such a scaling requires a complete redeployment of the >>>> cluster, which is unusual, but an acceptable loss to productivity. If the >>>> cluster can be scaled up over the Thanksgiving holiday, the impact will be >>>> minimal to developers and cluster administrators, >>>> alike. 
>>>> >>>> We are currently collaborating on solutions on the >>>> following MatterMost channel behind the space camp VPN (link below), and >>>> via the email thread forwarded >>>> (further below). >>>> >>>> >>>> >>>> https://chat.spacecamp.ninja/levelup/channels/unified-platform-node >>>> < >>>> https://chat.spacecamp.ninja/levelup/channels/unified-platform-node> >>>> < >>>> https://chat.spacecamp.ninja/levelup/channels/unified-platform-node> >>>> >>>> Please keep me posted on developments and I will >>>> coordinate developer activities with any scheduled platform outages. >>>> >>>> V/R, >>>> Russell C Kendall >>>> >>>> ________________________________________ >>>> >>>> From: Curran, Daniel M >>>> Sent: Monday, November 25, 2019 2:47 PM >>>> To: Jonathan Rickard >>>> Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; >>>> dlystra at redhat.com ; Sison, >>>> Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; >>>> Phil >>>> Soliz; >>>> Buffaloe, >>>> Christopher; Torres, Alexander; Crace, Jared E; Middleton, >>>> Joseph J >>>> Subject: Re: Unified Platform Pod Deploy Errors >>>> >>>> >>>> >>>> Sounds great. Appreciate it. >>>> I'll watch email and Mattermost in case you need more >>>> from us. >>>> >>>> -Daniel >>>> >>>> ________________________________________ >>>> >>>> From: Jonathan Rickard >>>> Sent: Monday, November 25, 2019 2:44 PM >>>> To: Curran, Daniel M >>>> Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; >>>> dlystra at redhat.com ; Sison, >>>> Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; >>>> Phil >>>> Soliz; >>>> Buffaloe, >>>> Christopher; Torres, Alexander; Crace, Jared E; Middleton, >>>> Joseph J >>>> Subject: Re: Unified Platform Pod Deploy Errors >>>> >>>> >>>> >>>> Thanks Daniel - >>>> >>>> >>>> >>>> I'll continue to look into the resource issue that >>>> you're seeing - I'd like to identify the root cause and then work with the >>>> team to come up with a solution. 
>>>> >>>> >>>> >>>> Jonathan Rickard, >>>> RHCA >>>> Principal Consultant, NAPS >>>> Red >>>> Hat Remote - Texas >>>> jonny at redhat.com >>>> >>>> M: 210-862-9739 > >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> On Mon, Nov 25, 2019 at 2:17 PM Curran, Daniel M < >>>> Daniel.Curran at mantech.com> >>>> wrote: >>>> >>>> >>>> Yeah we hit the limit then had AAM kill some of their >>>> projects and then our pods got scheduled. >>>> We've hit the limit again though. Here's an example pod >>>> that cannot be scheduled >>>> >>>> >>>> >>>> >>>> https://cluster.unified-platform.io/console/project/ccat-dev/browse/pods/sso-8-2cjth >>>> < >>>> https://cluster.unified-platform.io/console/project/ccat-dev/browse/pods/sso-8-2cjth> >>>> < >>>> https://cluster.unified-platform.io/console/project/ccat-dev/browse/pods/sso-8-2cjth >>>> > >>>> < >>>> https://cluster.unified-platform.io/console/project/ccat-dev/browse/pods/sso-8-2cjth >>>> > >>>> They're seeing it when their jenkins slaves can't >>>> deploy but it's basically any pod after we hit some limit. >>>> >>>> -Daniel >>>> ________________________________________ >>>> >>>> From: Jonathan Rickard >>>> Sent: Monday, November 25, 2019 1:26 PM >>>> To: Curran, Daniel M >>>> Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; >>>> dlystra at redhat.com ; Sison, >>>> Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; >>>> Phil >>>> Soliz; >>>> Buffaloe, >>>> Christopher; Torres, Alexander; Crace, Jared E; Middleton, >>>> Joseph J >>>> Subject: Re: Unified Platform Pod Deploy Errors >>>> >>>> >>>> >>>> Daniel, >>>> >>>> >>>> >>>> I can see that you have 3 mongo pods, 1 chatup and 1 >>>> upbot pod running ... is your app good to go? >>>> >>>> >>>> >>>> Looks like there was an issue with memory on 1 pod, >>>> then some node selector being mismatched - just what i could see in the >>>> events... 
>>>> >>>> >>>> >>>> >>>> >>>> >>>> Jonathan Rickard, >>>> RHCA >>>> Principal Consultant, NAPS >>>> Red >>>> Hat Remote - Texas >>>> jonny at redhat.com >>>> >>>> M: 210-862-9739 > >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> On Mon, Nov 25, 2019 at 12:50 PM Curran, Daniel M < >>>> Daniel.Curran at mantech.com> >>>> wrote: >>>> >>>> >>>> Also, AAM was having similar issues. Looks like they >>>> had a lot of namespaces and scaling down the pods on their deployments >>>> didn't help but actually deleting the namespaces >>>> did. >>>> We have pods scheduling now but I'm adding them and >>>> we'd still like to work through what resource limit we were hitting to >>>> avoid this in the future. >>>> >>>> -Daniel >>>> >>>> ________________________________________ >>>> >>>> From: Curran, Daniel M >>>> Sent: Monday, November 25, 2019 12:25 PM >>>> To: Jonathan Rickard >>>> Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; >>>> dlystra at redhat.com ; Sison, >>>> Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; >>>> Phil >>>> Soliz; >>>> Buffaloe, >>>> Christopher; Torres, Alexander >>>> Subject: Re: Unified Platform Pod Deploy Errors >>>> >>>> >>>> >>>> Thanks, sir. >>>> Most important for us to get working is "ccat-demo" but >>>> it's also happening in "ccat-dev" and "ccat-ci-cd". >>>> >>>> -Daniel >>>> ________________________________________ >>>> >>>> From: Jonathan Rickard >>>> Sent: Monday, November 25, 2019 12:22 PM >>>> To: Curran, Daniel M >>>> Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; >>>> dlystra at redhat.com ; Sison, >>>> Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; >>>> Phil >>>> Soliz; >>>> Buffaloe, >>>> Christopher; Torres, Alexander >>>> Subject: Re: Unified Platform Pod Deploy Errors >>>> >>>> >>>> >>>> What's the name of the project you're working in? I'm >>>> going to be back at my laptop in about 30 and will take a look when I get >>>> there. 
>>>> >>>> >>>> >>>> Is it just the Jenkins pods failing? >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> On Mon, Nov 25, 2019, 12:20 PM Curran, Daniel M < >>>> Daniel.Curran at mantech.com> >>>> wrote: >>>> >>>> >>>> Adding Dean and Alex. >>>> Also, sitting in mattermost if anyone needs to get >>>> online and chat for more information. >>>> >>>> -Daniel >>>> >>>> ________________________________________ >>>> >>>> From: Curran, Daniel M >>>> Sent: Monday, November 25, 2019 12:07 PM >>>> To: >>>> jonny at redhat.com ; >>>> >>>> ckuperst at redhat.com ; Mark >>>> Nissley >>>> Cc: Sison, Mark Anthony; Cepeda, Rolando; Kendall, >>>> Russell C; Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher >>>> Subject: Re: Unified Platform Pod Deploy Errors >>>> >>>> >>>> >>>> Adding Kupe and Mark. >>>> >>>> -Daniel >>>> ________________________________________ >>>> >>>> From: Curran, Daniel M >>>> Sent: Monday, November 25, 2019 11:43 AM >>>> To: >>>> jonny at redhat.com >>>> Cc: Sison, Mark Anthony; Cepeda, Rolando; Kendall, >>>> Russell C; Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher >>>> Subject: Unified Platform Pod Deploy Errors >>>> >>>> >>>> >>>> Hey Jonny, >>>> >>>> We met briefly at SpaceCAMP a couple weeks ago when >>>> >>>> >>>> >>>> >>>> cluster.unified-platform.io < >>>> http://cluster.unified-platform.io> >>> > >>>> >>>> was stood up. We've been trying to deploy some apps today and >>>> so far today we're getting errors on most (if >>>> not all) of our pods. >>>> >>>> 0/9 nodes are available: 3 Insufficient pods, 6 node(s) >>>> didn't match node selector. >>>> >>>> Is what we're seeing. We were thinking it was some >>>> volume types weren't correct but some of our pods don't even have volumes >>>> attached and still give us this error (i.e. Jenkins >>>> slaves or web frontends without persistent storage). >>>> Any idea what this could be? We're not running out of >>>> space on the nodes themselves are we? 
>>>> We have a demo scheduled for tomorrow at 9:30 AM CST >>>> and are hoping to get a demo env up for them today but this error came up >>>> unexpectedly. Also, we're here at 500 Navarro >>>> St. in San Antonio working through this in person is >>>> better/easier. >>>> >>>> Thanks, >>>> Daniel Curran -------------- next part -------------- A non-text attachment was scrubbed...
Name: image001.png Type: image/png Size: 2127 bytes Desc: not available URL: From Russell.Kendall at mantech.com Fri Dec 6 18:54:56 2019 From: Russell.Kendall at mantech.com (Kendall, Russell C) Date: Fri, 6 Dec 2019 18:54:56 +0000 Subject: [Platformone] [EXT] Re: Unified Platform Pod Deploy Errors In-Reply-To: References: <1575656455684.39471@ManTech.com> , Message-ID: <1575658495602.19595@ManTech.com> AAM, CCAT, and DAS are all impacted. V/R, Russell C Kendall ________________________________ From: Jonathan Rickard Sent: Friday, December 6, 2019 12:37 PM To: Mark Nissley Cc: Kendall, Russell C; ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP; Bubb, Mike; platformONE at redhat.com; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP; Jonathan Rickard Subject: Re: [Platformone] [EXT] Re: Unified Platform Pod Deploy Errors Russell, Is CCAT the only application having problems? I see your project has a few failed pv's. The taints appear to be Jonathan Rickard, RHCE, RHCA Consulting Architect Red Hat Public Sector jonny at redhat.com M: 210.862.9739 On Fri, Dec 6, 2019 at 12:26 PM Mark Nissley wrote: Issue created here: https://dccscr.dsop.io/ginyu-force/ginyu-force/issues/1 Mark NISSLEY, PMP, CSM, LEAN PROGRAM MaNAGER & SR technical Project Manager North American Consulting, Public Sector M: 850-530-3234 Scheduled Training: October 14-18 On Fri, Dec 6, 2019 at 12:21 PM Kendall, Russell C wrote: Nine tainted pods. Running apps seem to be okay, where they happened to be running at time the taint flood occurred. This will block IATT efforts, since we can not deploy our apps once we have remediated the vulnerabilities and to confirm remediation with TL and Anchore (there is not local scanning capability).
V/R,
Russell C Kendall
________________________________
From: Jonathan Rickard
Sent: Friday, December 6, 2019 12:13:29 PM
To: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP
Cc: Kendall, Russell C; platformONE at redhat.com; Miller, Timothy J.; Keegan Reap; Bubb, Mike; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP; Jonathan Rickard
Subject: Re: [Platformone] [EXT] Re: Unified Platform Pod Deploy Errors

Also, is every application having problems or a specific one?

Jonathan Rickard, RHCE, RHCA
Consulting Architect
Red Hat Public Sector
jonny at redhat.com
M: 210.862.9739

On Fri, Dec 6, 2019 at 12:06 PM Jonathan Rickard wrote:

Ade,

What does that mean? You can't log in, you can't deploy?

Jonathan Rickard, RHCE, RHCA
Consulting Architect
Red Hat Public Sector
jonny at redhat.com
M: 210.862.9739

On Fri, Dec 6, 2019 at 12:02 PM ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP wrote:

ALCON,

The cluster is down again. Please assist.

Most Sincerely,

Ade Abodunrin, GG-12, USAF
Product Owner (Cybertron & Ginyu Force), Unified Platform
LevelUP Code Works
Commercial: (210) 890-2113
NIPR email: ademola.abodunrin at us.af.mil

From: Kendall, Russell C
Sent: Thursday, December 5, 2019 9:55 AM
To: Jonathan Rickard
Cc: Miller, Timothy J.; Keegan Reap; Bubb, Mike; platformONE at redhat.com; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP; ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP; Jonathan Rickard
Subject: [Non-DoD Source] Re: [Platformone] [EXT] Re: Unified Platform Pod Deploy Errors

Jonny,

I'll see you Friday at 500 Nav. Travel safe.

V/R,
Russell C Kendall
________________________________
From: Jonathan Rickard
Sent: Wednesday, December 4, 2019 5:29 PM
To: Kendall, Russell C
Cc: Miller, Timothy J.; Keegan Reap; Bubb, Mike; platformONE at redhat.com; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP; ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP; Jonathan Rickard
Subject: Re: [Platformone] [EXT] Re: Unified Platform Pod Deploy Errors

Russell,

I have definitely been terrible with email lately and I apologize for the slow response times. I get back to San Antonio tomorrow but I have a pretty full afternoon. I can stop by Friday if you'd like.

Thanks,
jonny

Jonathan Rickard, RHCA
Principal Consultant, NAPS
Red Hat Remote - Texas
jonny at redhat.com
M: 210-862-9739

On Wed, Dec 4, 2019 at 10:16 AM Kendall, Russell C wrote:

Jonny,

I'd like to suggest you come to 500 to wrap this up, since it seems there are significant delays in communication that are contributing to downtime.

V/R,
Russell C Kendall
________________________________________
From: Miller, Timothy J.
Sent: Wednesday, December 4, 2019 7:02 AM
To: Jonathan Rickard; Keegan Reap
Cc: Bubb, Mike; platformONE at redhat.com; Kendall, Russell C; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP; ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP; Jonathan Rickard
Subject: Re: [Platformone] [EXT] Re: Unified Platform Pod Deploy Errors

Johnny--

Update the issue, if you would be so kind.

-- T

On 12/3/19, 18:00, "platformone-bounces at redhat.com on behalf of Jonathan Rickard (jrickard at redhat.com)" wrote:

Hey Guys - Sorry for taking so long - this has been completed. Please run your builds and let us know if you're having any problems.
jonny Jonathan Rickard, RHCA Principal Consultant, NAPS Red Hat Remote - Texas jonny at redhat.com M: 210-862-9739 On Tue, Dec 3, 2019 at 3:47 PM Jonathan Rickard > wrote: Russell / Team, We believe we've identified the issue with your application deploying. In order to rectify the issue I need to evacuate pods so you will probably see some hiccups while deploying. I will update when this is resolved. Thanks, jonny Jonathan Rickard, RHCA Principal Consultant, NAPS Red Hat Remote - Texas jonny at redhat.com M: 210-862-9739 On Tue, Dec 3, 2019 at 12:53 PM Keegan Reap > wrote: Hey all, we have opened an issue below, that we believe to be the cause, we are currently investigating: https://dccscr.dsop.io/ginyu-force/up-iatt/issues/32 On Tue, Dec 3, 2019 at 11:27 AM Jonathan Rickard > wrote: Russell, Getting more eyes on this @platformONE at redhat.com > We'll keep you posted. jonny Jonathan Rickard, RHCA Principal Consultant, NAPS Red Hat Remote - Texas jonny at redhat.com M: 210-862-9739 On Tue, Dec 3, 2019 at 11:59 AM Kendall, Russell C > wrote: Kevin, Unfortunately we are receiving deployment errors again. This is the event: 0/9 nodes are available: 1 node(s) had disk pressure, 2 node(s) had taints that the pod didn't tolerate, 6 node(s) didn't match node selector. This is the deployment: https://cluster.unified-platform.io/console/project/ccat-dev/browse/dc/driveup V/R, Russell C Kendall ________________________________________ From: Miller, Timothy J. > Sent: Monday, December 2, 2019 2:44:21 PM To: Kevin O'Donnell Cc: Kendall, Russell C; Mark Nissley; ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP; Jonathan Rickard; Chris Kuperstein; BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP; Brenna Gordon; Bubb, Mike; DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP Subject: Re: [EXT] Re: Unified Platform Pod Deploy Errors Tagged you on it. 
-- T On 12/2/19, 14:03, "Kevin O'Donnell" > wrote: Hello, Autoscaling is on our future IAC roadmap. Tim, the additional ticket would be appreciated. We have swapped out the app/worker instances with m5a.8xlarge 32 cores, 128gb of ram. Please let us know if you have any other issues. Thanks, KEVIN O'DONNELL ARCHITECT MANAGER Red Hat Red Hat NA Public Sector Consulting kodonnell at redhat.com %20M:240-605-4654> M: 240-605-4654 On Mon, Dec 2, 2019 at 2:29 PM Miller, Timothy J. > wrote: I'll open an issue. IaC needs to have instance size as a host_var to facilitate scaling. -- T On 12/2/19, 13:15, "Kevin O'Donnell" > wrote: Tim, Thanks for the information. We are undersized on the app/worker nodes, the cluster has 3 and they are m5.xlarge with only 16gb of ram. From what I have read each Labs engagement operated on a 3 node worker cluster with each node having 6core's and 28gb of ram. We will need to swap out the existing instances with larger spec's. We are going to try to flush the existing workload out on one of the workers to see if we can swap them out one at a time. Thanks, KEVIN O'DONNELL ARCHITECT MANAGER Red Hat Red Hat NA Public Sector Consulting kodonnell at redhat.com %20M:240-605-4654> M: 240-605-4654 On Mon, Dec 2, 2019 at 2:07 PM Miller, Timothy J. > wrote: Here's what I can see, given the perm limits I seem to be under: - NS:develop-misp-app and NS:lp-develop-misp-app both have several sync jobs in CrashLoopBackoff b/c of an HTTP 401 error returned while trying to fetch something from somewhere (URL isn't recorded in the stack trace). - NS:minishift-misp-app has most of its pods/jobs stuck in ImagePullBackoff. No detail there in the event stream so I'll see if I can dig deeper. - NS:aam-ci-cd has Jenkins trying to spin up three workers, those are coming back as unschedulable. I can't see into NS:aam-bases or NS:dsop-images b/c of perm limits. I see no DAS-related project(s). 
The MISP stuff needs debugging before calling "blocked" since it looks like an internal error from this perspective. In re: AAM Jenkins: If this deployment is coming out of the OCP storefront, then maybe it should be ephemeral rather than persistent. If it's a custom deployment, then it probably needs a rethink. I'm also not sure why there are two MISP dev projects. -- T On 12/2/19, 12:46, "Kevin O'Donnell" > wrote: Russell, Thank you for the information. We can switch out the instance type for the worker nodes. How much memory is required by the apps? Thanks, KEVIN O'DONNELL ARCHITECT MANAGER Red Hat Red Hat NA Public Sector Consulting kodonnell at redhat.com %20M:240-605-4654> M: 240-605-4654 On Mon, Dec 2, 2019 at 1:32 PM Kendall, Russell C > wrote: Kevin, The lack of resources on u-p.io cluster is hindering development, testing, and integration of the apps from CCAT AAM DAS, which is putting one of our PI goals at risk. We are blocked by the fact that we (CCAT and AAM) cannot deploy additional pods to the unified-platform.io cluster. We have a subset of containers deployed, but rolling deployments and new deployments fail. This means that we are not able to execute integration testing or peer reviews. We are temporarily working around by NOT testing/reviewing our code changes live, something that no one likes. Also, we are now running weeks-old instances of our containers, so we are very likely producing some technical debt. We currently have developers approaching idle or doing non-priority work until the resource issue is resolved. Here is the particular error from the OSP cluster I received while attempting a redeploy of one of our apps. 0/9 nodes are available: 1 node(s) had taints that the pod didn't tolerate, 2 Insufficient pods, 6 node(s) didn't match node selector.11 times in the last minute Since we do not have any cluster permissions, I cannot verify which resource is running out, but from experience, I assess it is a memory issue. 
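The scheduler event quoted above packs one node count per failure reason into a single line, and the counts add up to the cluster's nine nodes. As a minimal sketch only, the Python below splits such a message into its reasons so the counts can be compared across the events reported in this thread. The function name `parse_unschedulable` is hypothetical, and the parser assumes the exact `N/M nodes are available: ...` wording shown in these emails. One hedged observation: in upstream Kubernetes, "Insufficient pods" generally refers to a node's allocatable pod count rather than to memory, which would need to be confirmed against this cluster's node configuration.

```python
import re

# One reason clause, e.g. "2 node(s) had taints that the pod didn't tolerate"
# or "3 Insufficient pods". The "node(s)" prefix is optional.
_REASON = re.compile(r"(\d+)\s+(?:node\(s\)\s+)?(.+)")


def parse_unschedulable(message):
    """Split a scheduling failure message into (available, total, reasons).

    Assumes the "N/M nodes are available: <count> <reason>, ..." wording
    quoted in this thread; any other wording raises ValueError.
    """
    head, _, tail = message.partition(":")
    m = re.match(r"\s*(\d+)/(\d+) nodes are available", head)
    if m is None:
        raise ValueError("not a scheduler availability message")
    available, total = int(m.group(1)), int(m.group(2))

    reasons = {}
    for clause in tail.split(","):
        clause = clause.strip().rstrip(".")
        if not clause:
            continue
        rm = _REASON.fullmatch(clause)
        if rm:
            reasons[rm.group(2)] = int(rm.group(1))
    return available, total, reasons


if __name__ == "__main__":
    event = ("0/9 nodes are available: 1 node(s) had disk pressure, "
             "2 node(s) had taints that the pod didn't tolerate, "
             "6 node(s) didn't match node selector.")
    available, total, reasons = parse_unschedulable(event)
    for reason, count in reasons.items():
        print(f"{count}/{total} nodes: {reason}")
```

In every event quoted in this thread, six of the nine nodes are excluded by the node selector alone, which is consistent with only three nodes being eligible for application pods.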
It appears the cluster has been provisioned with a silly allocation of node types. Without knowing exactly what was deployed, it appears only 3 of the 9 hosts are suitable worker nodes. We would expect the cluster to respond to resource limitations and scale, but if a scheduled downtime is required, please work with us so we can anticipate. As it stands, the cluster does not support resources required by CCAT and the other dev teams (AAM, DAS, etc.). We would accept any downtime if it will improve the situation, as we are blocked from progressing under the current constraints. My hope was we could get the cluster redeployed over the TG holiday to eliminate developer impact, but as Mark pointed out, there were limited support folks available. Now I am just trying to minimize the losses. V/R, Russell C Kendall ________________________________________ From: Kevin O'Donnell > Sent: Monday, December 2, 2019 11:52 AM To: Kendall, Russell C Cc: Mark Nissley; ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP; Jonathan Rickard; Chris Kuperstein; BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP; Brenna Gordon; Bubb, Mike (mbubb at mitre.org); DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP; Miller, Timothy J.; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP Subject: Re: Unified Platform Pod Deploy Errors Hello Russell, Can you elaborate on the term Blocked? What specific issues are the blockers? Thanks, KEVIN O'DONNELL ARCHITECT MANAGER Red Hat Red Hat NA Public Sector Consulting kodonnell at redhat.com %20M:240-605-4654> M: 240-605-4654 On Mon, Dec 2, 2019 at 11:11 AM Kendall, Russell C > wrote: Mark, Thank for acknowledging, please be aware the San Antonio dev teams working in unified-platform.io are currently blocked. 
V/R, Russell C Kendall ________________________________________ From: Mark Nissley > Sent: Monday, December 2, 2019 9:36 AM To: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP; Jonathan Rickard; Chris Kuperstein Cc: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP; Kevin O'Donnell; Brenna Gordon; Kendall, Russell C; Bubb, Mike (mbubb at mitre.org); DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP; Miller, Timothy J.; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP Subject: Re: Unified Platform Pod Deploy Errors As noted, I don't suspect much got done on this over the holiday weekend. I did see the ticket, as dropped some details into it. I also assigned it to @Jonathan Rickard > and @Chris Kuperstein > . It looks like short term solutions have been easy but the issue is recurring. Mark NISSLEY, PMP, CSM, LEAN PROGRAM MaNAGER & SR technical Project Manager North American Consulting, Public Sector M: 850-530-3234 Scheduled Training: October 14-18 On Mon, Dec 2, 2019 at 10:21 AM ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP > wrote: Mark/Kevin, I just heard at the team stand up that we are still blocked. This is also affecting the AAM team from my investigations. Please let me know if there is something we need to do to move this forward. Most Sincerely, Ade Abodunrin, GG-12, USAF Product Owner (Cybertron & Ginyu Force), Unified Platform LevelUP Code Works Commercial: (210) 890-2113 NIPR email: ademola.abodunrin at us.af.mil ________________________________________ From: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP Sent: Wednesday, November 27, 2019 12:58 PM To: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP >; Mark Nissley >; Kevin O'Donnell >; Brenna Gordon > Cc: Kendall, Russell C >; Bubb, Mike (mbubb at mitre.org) >; DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP >; Miller, Timothy J. >; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP > Subject: Re: Unified Platform Pod Deploy Errors Thanks a lot Capt Bryan! Russell created the ticket on GitLab UP Node Project. 
Most Sincerely,

Ade Abodunrin, GG-12, USAF
Product Owner (Cybertron & Ginyu Force), Unified Platform
LevelUP Code Works
Commercial: (210) 890-2113
NIPR email: ademola.abodunrin at us.af.mil
________________________________________
From: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP
Sent: Wednesday, November 27, 2019 12:56 PM
To: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP; Mark Nissley; Kevin O'Donnell; Brenna Gordon
Cc: Kendall, Russell C; Bubb, Mike (mbubb at mitre.org); DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP; Miller, Timothy J.; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP
Subject: RE: Unified Platform Pod Deploy Errors

Thanks Ade. The team is thin until next week due to the holidays but I will make sure it is addressed. Were there any issues submitted to GitLab's UP Node Project on DCCSCR?

@Mark/Kevin - can we address?

-Austen

From: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP
Sent: Wednesday, November 27, 2019 9:51 AM
To: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP
Cc: Kendall, Russell C; Bubb, Mike (mbubb at mitre.org)
Subject: Fw: Unified Platform Pod Deploy Errors

Capt Bryan,

Please see the explanation on the issue that Ginyu Force is currently experiencing below.

Most Sincerely,

Ade Abodunrin, GG-12, USAF
Product Owner (Cybertron & Ginyu Force), Unified Platform
LevelUP Code Works
Commercial: (210) 890-2113
NIPR email: ademola.abodunrin at us.af.mil
________________________________________
From: Kendall, Russell C
Sent: Wednesday, November 27, 2019 9:46 AM
To: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP; Buffaloe, Christopher; Molina, Toby; Crace, Jared E; SANCHEZ, MARK GG-13 USAF AFMC AFLCMC/HNCP
Cc: tmiller at mitre.org
Subject: [Non-DoD Source] Fw: Unified Platform Pod Deploy Errors

Gentlemen,

The application development teams working in the new GovCloud OCP environment (unified-platform.io) are currently blocked in efforts to deploy new pods for testing, development, and UAT.
Red Hat and RogueOne SMEs have been notified and have attempted some fixes starting on Monday 11/25, but at this point have not been able to provision resources sufficient to host CCAT and AAM. We have taken steps to minimize our footprint (eliminating demonstration environment, deleting developer namespaces), but this is not a sustainable approach, and has only resulted in moderate improvements in cluster performance. Our hope is the U-P.io cluster compute resources can be increased very soon, so that we may resume normal development activities. Our understanding is that such a scaling requires a complete redeployment of the cluster, which is unusual, but an acceptable loss to productivity. If the cluster can be scaled up over the Thanksgiving holiday, the impact will be minimal to developers and cluster administrators, alike. We are currently collaborating on solutions on the following MatterMost channel behind the space camp VPN (link below), and via the email thread forwarded (further below). https://chat.spacecamp.ninja/levelup/channels/unified-platform-node Please keep me posted on developments and I will coordinate developer activities with any scheduled platform outages. V/R, Russell C Kendall ________________________________________ From: Curran, Daniel M Sent: Monday, November 25, 2019 2:47 PM To: Jonathan Rickard Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; dlystra at redhat.com >; Sison, Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher; Torres, Alexander; Crace, Jared E; Middleton, Joseph J Subject: Re: Unified Platform Pod Deploy Errors Sounds great. Appreciate it. I'll watch email and Mattermost in case you need more from us. 
-Daniel ________________________________________ From: Jonathan Rickard > Sent: Monday, November 25, 2019 2:44 PM To: Curran, Daniel M Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; dlystra at redhat.com >; Sison, Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher; Torres, Alexander; Crace, Jared E; Middleton, Joseph J Subject: Re: Unified Platform Pod Deploy Errors Thanks Daniel - I'll continue to look into the resource issue that you're seeing - I'd like to identify the root cause and then work with the team to come up with a solution. Jonathan Rickard, RHCA Principal Consultant, NAPS Red Hat Remote - Texas jonny at redhat.com M: 210-862-9739 On Mon, Nov 25, 2019 at 2:17 PM Curran, Daniel M > wrote: Yeah we hit the limit then had AAM kill some of their projects and then our pods got scheduled. We've hit the limit again though. Here's an example pod that cannot be scheduled https://cluster.unified-platform.io/console/project/ccat-dev/browse/pods/sso-8-2cjth They're seeing it when their jenkins slaves can't deploy but it's basically any pod after we hit some limit. -Daniel ________________________________________ From: Jonathan Rickard > Sent: Monday, November 25, 2019 1:26 PM To: Curran, Daniel M Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; dlystra at redhat.com >; Sison, Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher; Torres, Alexander; Crace, Jared E; Middleton, Joseph J Subject: Re: Unified Platform Pod Deploy Errors Daniel, I can see that you have 3 mongo pods, 1 chatup and 1 upbot pod running ... is your app good to go? Looks like there was an issue with memory on 1 pod, then some node selector being mismatched - just what i could see in the events... 
Jonathan Rickard, RHCA Principal Consultant, NAPS Red Hat Remote - Texas jonny at redhat.com M: 210-862-9739 On Mon, Nov 25, 2019 at 12:50 PM Curran, Daniel M > wrote: Also, AAM was having similar issues. Looks like they had a lot of namespaces and scaling down the pods on their deployments didn't help but actually deleting the namespaces did. We have pods scheduling now but I'm adding them and we'd still like to work through what resource limit we were hitting to avoid this in the future. -Daniel ________________________________________ From: Curran, Daniel M Sent: Monday, November 25, 2019 12:25 PM To: Jonathan Rickard Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; dlystra at redhat.com >; Sison, Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher; Torres, Alexander Subject: Re: Unified Platform Pod Deploy Errors Thanks, sir. Most important for us to get working is "ccat-demo" but it's also happening in "ccat-dev" and "ccat-ci-cd". -Daniel ________________________________________ From: Jonathan Rickard > Sent: Monday, November 25, 2019 12:22 PM To: Curran, Daniel M Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; dlystra at redhat.com >; Sison, Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher; Torres, Alexander Subject: Re: Unified Platform Pod Deploy Errors What's the name of the project you're working in? I'm going to be back at my laptop in about 30 and will take a look when I get there. Is it just the Jenkins pods failing? On Mon, Nov 25, 2019, 12:20 PM Curran, Daniel M > wrote: Adding Dean and Alex. Also, sitting in mattermost if anyone needs to get online and chat for more information. 
-Daniel
________________________________________
From: Curran, Daniel M
Sent: Monday, November 25, 2019 12:07 PM
To: jonny at redhat.com; ckuperst at redhat.com; Mark Nissley
Cc: Sison, Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher
Subject: Re: Unified Platform Pod Deploy Errors

Adding Kupe and Mark.

-Daniel
________________________________________
From: Curran, Daniel M
Sent: Monday, November 25, 2019 11:43 AM
To: jonny at redhat.com
Cc: Sison, Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher
Subject: Unified Platform Pod Deploy Errors

Hey Jonny,

We met briefly at SpaceCAMP a couple weeks ago when cluster.unified-platform.io was stood up. We've been trying to deploy some apps today and so far we're getting errors on most (if not all) of our pods. This is what we're seeing:

0/9 nodes are available: 3 Insufficient pods, 6 node(s) didn't match node selector.

We were thinking some volume types weren't correct, but some of our pods don't even have volumes attached and still give us this error (e.g. Jenkins slaves or web frontends without persistent storage). Any idea what this could be? We're not running out of space on the nodes themselves, are we?

We have a demo scheduled for tomorrow at 9:30 AM CST and are hoping to get a demo env up for them today but this error came up unexpectedly. Also, we're here at 500 Navarro St. in San Antonio if working through this in person is better/easier.

Thanks,
Daniel Curran
________________________________________
This e-mail and any attachments are intended only for the use of the addressee(s) named herein and may contain proprietary information.
If you are not the intended recipient of this e-mail or believe that you received this email in error, please take immediate action to notify the sender of the apparent error by reply e-mail; permanently delete the e-mail and any attachments from your computer; and do not disseminate, distribute, use, or copy this message and any attachments.
_______________________________________________
platformONE mailing list
platformONE at redhat.com
https://www.redhat.com/mailman/listinfo/platformone
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.png
Type: image/png
Size: 2127 bytes
Desc: image001.png
URL: 

From jrickard at redhat.com Fri Dec 6 19:15:02 2019
From: jrickard at redhat.com (Jonathan Rickard)
Date: Fri, 6 Dec 2019 13:15:02 -0600
Subject: [Platformone] [EXT] Re: Unified Platform Pod Deploy Errors
In-Reply-To: <1575658495602.19595@ManTech.com>
References: <1575656455684.39471@ManTech.com> <1575658495602.19595@ManTech.com>
Message-ID: 

Russell,

Thank you - I removed the taints. Let's give it a shot now.

The problem appears to be that the PV couldn't detach from the EBS volume, and once it hit the failure threshold it created the taint.

Do you know if you are adding any toleration specs in your builds or statefulsets?

Thanks and I'm finally on my way.
jonny

Jonathan Rickard, RHCE, RHCA
Consulting Architect
Red Hat Public Sector
jonny at redhat.com
M: 210.862.9739

On Fri, Dec 6, 2019 at 12:55 PM Kendall, Russell C <Russell.Kendall at mantech.com> wrote:

> AAM, CCAT, and DAS are all impacted.
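The question about toleration specs in the message above turns on how a pod's tolerations are compared with a node's taints. As a rough sketch only, the Python below models the matching rule as it is generally described for upstream Kubernetes: an empty toleration effect matches any effect, an empty key matches any key, `Exists` ignores the value, and `Equal` compares values. The class and function names are hypothetical, this is not the scheduler's actual code, and the `node.kubernetes.io/disk-pressure` key is used purely for illustration; the thread does not say which taint key was set on the affected nodes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taint:
    key: str
    effect: str            # "NoSchedule", "PreferNoSchedule" or "NoExecute"
    value: str = ""


@dataclass(frozen=True)
class Toleration:
    key: str = ""           # empty key matches every taint key
    operator: str = "Equal"  # "Equal" or "Exists"
    value: str = ""
    effect: str = ""        # empty effect matches every taint effect


def tolerates(toleration, taint):
    """Return True if this single toleration covers this single taint."""
    if toleration.effect and toleration.effect != taint.effect:
        return False
    if toleration.key and toleration.key != taint.key:
        return False
    if toleration.operator == "Exists":
        return True
    if toleration.operator == "Equal":
        return toleration.value == taint.value
    return False


def blocking_taints(node_taints, tolerations):
    """List the taints that would keep a pod off the node.

    Only NoSchedule and NoExecute block placement; PreferNoSchedule is a
    soft preference and is skipped here.
    """
    return [
        taint for taint in node_taints
        if taint.effect in ("NoSchedule", "NoExecute")
        and not any(tolerates(tol, taint) for tol in tolerations)
    ]


if __name__ == "__main__":
    taint = Taint("node.kubernetes.io/disk-pressure", "NoSchedule")
    print(blocking_taints([taint], []))
    print(blocking_taints([taint], [Toleration(key=taint.key, operator="Exists")]))
```

A pod with no tolerations is blocked by any `NoSchedule` taint, which matches the "had taints that the pod didn't tolerate" wording in the scheduler events quoted earlier in the thread.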
> > > V/R, > > Russell C Kendall > ------------------------------ > *From:* Jonathan Rickard > *Sent:* Friday, December 6, 2019 12:37 PM > *To:* Mark Nissley > *Cc:* Kendall, Russell C; ABODUNRIN, ADEMOLA A GG-12 USAF AFMC > AFLCMC/HNCP; Bubb, Mike; platformONE at redhat.com; RAMIREZ, JOSE A CTR USAF > AFMC AFLCMC/HNCP; Jonathan Rickard > *Subject:* Re: [Platformone] [EXT] Re: Unified Platform Pod Deploy Errors > > Russell, > > Is CCAT the only application having problems? I see your project has a few > failed pv's. > > The taints appear to be > > Jonathan Rickard, RHCE, RHCA > > Consulting Architect > > Red Hat Public Sector > > jonny at redhat.com > M: 210.862.9739 > @redhatjobs redhatjobs > @redhatjobs > > > > > On Fri, Dec 6, 2019 at 12:26 PM Mark Nissley wrote: > >> Issue created here: >> https://dccscr.dsop.io/ginyu-force/ginyu-force/issues/1 >> >> >> Mark NISSLEY, PMP, CSM, LEAN >> >> PROGRAM MaNAGER & SR technical Project Manager >> >> North American Consulting, Public Sector >> >> >> M: 850-530-3234 >> >> >> >> *Scheduled Training: October 14-18* >> >> >> On Fri, Dec 6, 2019 at 12:21 PM Kendall, Russell C < >> Russell.Kendall at mantech.com> wrote: >> >>> Nine tainted pods. Running apps seem to be okay, where they happened to >>> be running at time the taint flood occurred. This will block IATT efforts, >>> since we can not deploy our apps once we have remediated the >>> vulnerabilities and to confirm remediation with TL and Anchore (there is >>> not local scanning capability). 
>>> V/R, >>> Russell C Kendall >>> ------------------------------ >>> *From:* Jonathan Rickard >>> *Sent:* Friday, December 6, 2019 12:13:29 PM >>> *To:* ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP >>> *Cc:* Kendall, Russell C; platformONE at redhat.com; Miller, Timothy J.; >>> Keegan Reap; Bubb, Mike; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP; >>> Jonathan Rickard >>> *Subject:* Re: [Platformone] [EXT] Re: Unified Platform Pod Deploy >>> Errors >>> >>> Also, is every application having problems or a specific? >>> >>> Jonathan Rickard, RHCE, RHCA >>> >>> Consulting Architect >>> >>> Red Hat Public Sector >>> >>> jonny at redhat.com >>> M: 210.862.9739 >>> @redhatjobs redhatjobs >>> @redhatjobs >>> >>> >>> >>> >>> On Fri, Dec 6, 2019 at 12:06 PM Jonathan Rickard >>> wrote: >>> >>>> Ade, >>>> >>>> What does that mean? You can't login, you can't deploy? >>>> >>>> Jonathan Rickard, RHCE, RHCA >>>> >>>> Consulting Architect >>>> >>>> Red Hat Public Sector >>>> >>>> jonny at redhat.com >>>> M: 210.862.9739 >>>> @redhatjobs redhatjobs >>>> @redhatjobs >>>> >>>> >>>> >>>> >>>> On Fri, Dec 6, 2019 at 12:02 PM ABODUNRIN, ADEMOLA A GG-12 USAF AFMC >>>> AFLCMC/HNCP wrote: >>>> >>>>> ALCON, >>>>> >>>>> >>>>> >>>>> The cluster is down again. Please assist. >>>>> >>>>> >>>>> >>>>> Most Sincerely, >>>>> >>>>> >>>>> >>>>> Ade Abodunrin, GG-12, USAF >>>>> >>>>> Product Owner (Cybertron & Ginyu Force), Unified Platform >>>>> >>>>> >>>>> >>>>> [image: cid:image001.png at 01D4F814.4AA552D0] >>>>> >>>>> LevelUP Code Works >>>>> >>>>> Commercial: (210) 890-2113 >>>>> >>>>> NIPR email: *ademola.abodunrin at us.af.mil >>>>> * >>>>> >>>>> >>>>> >>>>> *From:* Kendall, Russell C >>>>> *Sent:* Thursday, December 5, 2019 9:55 AM >>>>> *To:* Jonathan Rickard >>>>> *Cc:* Miller, Timothy J. 
; Keegan Reap < >>>>> kreap at redhat.com>; Bubb, Mike ; >>>>> platformONE at redhat.com; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP < >>>>> jose.ramirez.50.ctr at us.af.mil>; ABODUNRIN, ADEMOLA A GG-12 USAF AFMC >>>>> AFLCMC/HNCP ; Jonathan Rickard < >>>>> jonny at redhat.com> >>>>> *Subject:* [Non-DoD Source] Re: [Platformone] [EXT] Re: Unified >>>>> Platform Pod Deploy Errors >>>>> >>>>> >>>>> >>>>> Jonny, >>>>> >>>>> I'll see you Friday at 500 Nav. Travel safe. >>>>> >>>>> >>>>> >>>>> V/R, >>>>> >>>>> Russell C Kendall? >>>>> >>>>> >>>>> ------------------------------ >>>>> >>>>> *From:* Jonathan Rickard >>>>> *Sent:* Wednesday, December 4, 2019 5:29 PM >>>>> *To:* Kendall, Russell C >>>>> *Cc:* Miller, Timothy J.; Keegan Reap; Bubb, Mike; >>>>> platformONE at redhat.com; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP; >>>>> ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP; Jonathan Rickard >>>>> *Subject:* Re: [Platformone] [EXT] Re: Unified Platform Pod Deploy >>>>> Errors >>>>> >>>>> >>>>> >>>>> Russell, >>>>> >>>>> >>>>> >>>>> I have definitely been terrible with email lately and I apologize for >>>>> the slow response times. I get back to San Antonio tomorrow but I have a >>>>> pretty full afternoon. I can stop by Friday if you'd like. >>>>> >>>>> >>>>> >>>>> Thanks, >>>>> >>>>> jonny >>>>> >>>>> >>>>> >>>>> *Jonathan Rickard**, RHCA* >>>>> >>>>> Principal Consultant, NAPS >>>>> >>>>> Red Hat Remote - Texas >>>>> >>>>> jonny at redhat.com >>>>> M: 210-862-9739 >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> On Wed, Dec 4, 2019 at 10:16 AM Kendall, Russell C < >>>>> Russell.Kendall at mantech.com> wrote: >>>>> >>>>> Jonny, >>>>> I'd like to suggest you come to 500 to wrap this up, since it seems >>>>> there are significant delays in communication that are contributing to >>>>> downtime. >>>>> V/R, >>>>> Russell C Kendall >>>>> ________________________________________ >>>>> From: Miller, Timothy J. 
>>>>> Sent: Wednesday, December 4, 2019 7:02 AM >>>>> To: Jonathan Rickard; Keegan Reap >>>>> Cc: Bubb, Mike; platformONE at redhat.com; Kendall, Russell C; RAMIREZ, >>>>> JOSE A CTR USAF AFMC AFLCMC/HNCP; ABODUNRIN, ADEMOLA A GG-12 USAF AFMC >>>>> AFLCMC/HNCP; Jonathan Rickard >>>>> Subject: Re: [Platformone] [EXT] Re: Unified Platform Pod Deploy Errors >>>>> >>>>> Johnny-- >>>>> >>>>> Update the issue, if you would be so kind. >>>>> >>>>> -- T >>>>> >>>>> ?On 12/3/19, 18:00, "platformone-bounces at redhat.com on behalf of >>>>> Jonathan Rickard" >>>> jrickard at redhat.com> wrote: >>>>> >>>>> Hey Guys - Sorry for taking so long - this has been completed. >>>>> Please run your builds and let us know if you're having any problems. >>>>> jonny >>>>> Jonathan Rickard, RHCA >>>>> Principal Consultant, NAPS >>>>> Red Hat Remote - Texas >>>>> >>>>> jonny at redhat.com >>>>> M: 210-862-9739 > >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> On Tue, Dec 3, 2019 at 3:47 PM Jonathan Rickard < >>>>> jrickard at redhat.com> wrote: >>>>> >>>>> >>>>> Russell / Team, >>>>> >>>>> >>>>> We believe we've identified the issue with your application >>>>> deploying. In order to rectify the issue I need to evacuate pods so you >>>>> will probably see some hiccups while deploying. I will update when this is >>>>> resolved. 
>>>>> >>>>> >>>>> Thanks, >>>>> jonny >>>>> >>>>> Jonathan Rickard, RHCA >>>>> Principal Consultant, NAPS >>>>> Red Hat Remote - Texas >>>>> >>>>> jonny at redhat.com >>>>> M: 210-862-9739 > >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> On Tue, Dec 3, 2019 at 12:53 PM Keegan Reap >>>>> wrote: >>>>> >>>>> >>>>> Hey all, we have opened an issue below, that we believe to be the >>>>> cause, we are currently investigating: >>>>> >>>>> >>>>> https://dccscr.dsop.io/ginyu-force/up-iatt/issues/32 >>>>> >>>>> >>>>> >>>>> On Tue, Dec 3, 2019 at 11:27 AM Jonathan Rickard < >>>>> jrickard at redhat.com> wrote: >>>>> >>>>> >>>>> Russell, >>>>> >>>>> >>>>> Getting more eyes on this @platformONE at redhat.com >>>> platformONE at redhat.com> >>>>> >>>>> >>>>> We'll keep you posted. >>>>> jonny >>>>> Jonathan Rickard, RHCA >>>>> Principal Consultant, NAPS >>>>> Red Hat Remote - Texas >>>>> >>>>> jonny at redhat.com >>>>> M: 210-862-9739 > >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> On Tue, Dec 3, 2019 at 11:59 AM Kendall, Russell C < >>>>> Russell.Kendall at mantech.com> wrote: >>>>> >>>>> >>>>> Kevin, >>>>> >>>>> Unfortunately we are receiving deployment errors again. This is >>>>> the event: >>>>> >>>>> 0/9 nodes are available: 1 node(s) had disk pressure, 2 node(s) >>>>> had taints that the pod didn't tolerate, 6 node(s) didn't match node >>>>> selector. >>>>> >>>>> This is the deployment: >>>>> >>>>> >>>>> https://cluster.unified-platform.io/console/project/ccat-dev/browse/dc/driveup >>>>> >>>>> >>>>> V/R, >>>>> Russell C Kendall >>>>> ________________________________________ >>>>> From: Miller, Timothy J. 
>>>>> Sent: Monday, December 2, 2019 2:44:21 PM >>>>> To: Kevin O'Donnell >>>>> Cc: Kendall, Russell C; Mark Nissley; ABODUNRIN, ADEMOLA A GG-12 >>>>> USAF AFMC AFLCMC/HNCP; Jonathan Rickard; Chris Kuperstein; BRYAN, AUSTEN R >>>>> Capt USAF AFMC AFLCMC/HNCP; Brenna Gordon; Bubb, Mike; DIROCCO, ROGER E >>>>> GG-13 USAF AFMC ESC/AFLCMC/HNCP; RAMIREZ, JOSE >>>>> A CTR USAF AFMC AFLCMC/HNCP >>>>> Subject: Re: [EXT] Re: Unified Platform Pod Deploy Errors >>>>> >>>>> Tagged you on it. >>>>> >>>>> -- T >>>>> >>>>> On 12/2/19, 14:03, "Kevin O'Donnell" wrote: >>>>> >>>>> Hello, >>>>> >>>>> >>>>> Autoscaling is on our future IAC roadmap. Tim, the additional >>>>> ticket would be appreciated. >>>>> >>>>> >>>>> We have swapped out the app/worker instances with m5a.8xlarge >>>>> 32 cores, 128gb of ram. Please let us know if you have any other issues. >>>>> >>>>> >>>>> Thanks, >>>>> >>>>> KEVIN O'DONNELL >>>>> ARCHITECT MANAGER >>>>> Red Hat Red Hat NA Public Sector Consulting < >>>>> https://www.redhat.com/> >>>>> >>>>> kodonnell at redhat.com >>>>> M: 240-605-4654 >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> On Mon, Dec 2, 2019 at 2:29 PM Miller, Timothy J. < >>>>> tmiller at mitre.org> wrote: >>>>> >>>>> >>>>> I'll open an issue. IaC needs to have instance size as a >>>>> host_var to facilitate scaling. >>>>> >>>>> -- T >>>>> >>>>> On 12/2/19, 13:15, "Kevin O'Donnell" >>>>> wrote: >>>>> >>>>> Tim, >>>>> >>>>> >>>>> Thanks for the information. We are undersized on the >>>>> app/worker nodes, the cluster has 3 and they are m5.xlarge with only 16gb >>>>> of ram. From what I have read each Labs engagement operated on a 3 node >>>>> worker cluster with each node having 6core's and 28gb >>>>> of ram. We will need to swap out the existing instances >>>>> with larger spec's. >>>>> >>>>> >>>>> We are going to try to flush the existing workload out on >>>>> one of the workers to see if we can swap them out one at a time. 
>>>>> >>>>> >>>>> Thanks, >>>>> >>>>> KEVIN O'DONNELL >>>>> ARCHITECT MANAGER >>>>> Red Hat Red Hat NA Public Sector Consulting < >>>>> https://www.redhat.com/> >>>>> >>>>> kodonnell at redhat.com >>>>> M: 240-605-4654 >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> On Mon, Dec 2, 2019 at 2:07 PM Miller, Timothy J. < >>>>> tmiller at mitre.org> wrote: >>>>> >>>>> >>>>> Here's what I can see, given the perm limits I seem to be >>>>> under: >>>>> >>>>> - NS:develop-misp-app and NS:lp-develop-misp-app both have >>>>> several sync jobs in CrashLoopBackoff b/c of an HTTP 401 error returned >>>>> while trying to fetch something from somewhere (URL isn't recorded in the >>>>> stack trace). >>>>> >>>>> - NS:minishift-misp-app has most of its pods/jobs stuck in >>>>> ImagePullBackoff. No detail there in the event stream so I'll see if I can >>>>> dig deeper. >>>>> >>>>> - NS:aam-ci-cd has Jenkins trying to spin up three >>>>> workers, those are coming back as unschedulable. >>>>> >>>>> I can't see into NS:aam-bases or NS:dsop-images b/c of >>>>> perm limits. >>>>> >>>>> I see no DAS-related project(s). >>>>> >>>>> The MISP stuff needs debugging before calling "blocked" >>>>> since it looks like an internal error from this perspective. >>>>> >>>>> >>>>> >>>>> In re: AAM Jenkins: If this deployment is coming out of >>>>> the OCP storefront, then maybe it should be ephemeral rather than >>>>> persistent. If it's a custom deployment, then it probably needs a rethink. >>>>> >>>>> I'm also not sure why there are two MISP dev projects. >>>>> >>>>> -- T >>>>> >>>>> >>>>> >>>>> On 12/2/19, 12:46, "Kevin O'Donnell" >>>>> wrote: >>>>> >>>>> Russell, >>>>> >>>>> >>>>> Thank you for the information. We can switch out the >>>>> instance type for the worker nodes. How much memory is required by the apps? 
>>>>> >>>>> >>>>> >>>>> Thanks, >>>>> >>>>> KEVIN O'DONNELL >>>>> ARCHITECT MANAGER >>>>> Red Hat Red Hat NA Public Sector Consulting < >>>>> https://www.redhat.com/> >>>>> >>>>> kodonnell at redhat.com >>>>> M: 240-605-4654 >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> On Mon, Dec 2, 2019 at 1:32 PM Kendall, Russell C < >>>>> Russell.Kendall at mantech.com> wrote: >>>>> >>>>> >>>>> Kevin, >>>>> The lack of resources on >>>>> u-p.io >>>>> >>>>> cluster is hindering development, >>>>> testing, and integration of the apps from CCAT AAM DAS, which >>>>> is putting one >>>>> of our PI goals at risk. >>>>> >>>>> >>>>> We are blocked by the fact that we (CCAT and AAM) >>>>> cannot deploy additional pods to the >>>>> >>>>> unified-platform.io < >>>>> http://unified-platform.io> < >>>>> http://unified-platform.io> >>>>> cluster. We have a subset of containers deployed, but rolling >>>>> deployments and new deployments fail. This means that we >>>>> are not able to execute integration testing or peer reviews. >>>>> We are temporarily working around by NOT >>>>> testing/reviewing our code changes live, something that no one likes. Also, >>>>> we are now running weeks-old instances of our containers, so we are very >>>>> likely producing some technical debt. We currently have >>>>> developers >>>>> approaching idle or doing non-priority work until the >>>>> resource issue is resolved. >>>>> >>>>> >>>>> >>>>> Here is the particular error from the OSP cluster I >>>>> received while attempting a redeploy of one of our apps. >>>>> >>>>> >>>>> >>>>> 0/9 nodes are available: 1 node(s) had taints that the >>>>> pod didn't tolerate, 2 Insufficient pods, 6 node(s) didn't match node >>>>> selector.11 times in the last minute >>>>> >>>>> Since we do not have any cluster permissions, I cannot >>>>> verify which resource is running out, but from experience, I assess it is a >>>>> memory issue. 
>>>>> >>>>> >>>>> >>>>> It appears the cluster has been provisioned with a >>>>> silly allocation of node types. Without knowing exactly what was deployed, >>>>> it appears only 3 of the 9 hosts are suitable worker nodes. We would expect >>>>> the cluster to respond to resource limitations >>>>> and >>>>> scale, >>>>> but if a scheduled downtime is required, please work >>>>> with us so we can anticipate. As it stands, the cluster does not support >>>>> resources required by CCAT and the other dev teams (AAM, DAS, etc.). We >>>>> would accept any downtime if it will improve the >>>>> situation, >>>>> as we are blocked from progressing under the current >>>>> constraints. My hope was we could get the cluster redeployed over the TG >>>>> holiday to eliminate developer impact, but as Mark pointed out, there were >>>>> limited support folks available. Now I am just >>>>> trying >>>>> to >>>>> minimize the losses. >>>>> >>>>> >>>>> >>>>> V/R, >>>>> >>>>> Russell C Kendall >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> ________________________________________ >>>>> From: Kevin O'Donnell >>>>> Sent: Monday, December 2, 2019 11:52 AM >>>>> To: Kendall, Russell C >>>>> Cc: Mark Nissley; ABODUNRIN, ADEMOLA A GG-12 USAF AFMC >>>>> AFLCMC/HNCP; Jonathan Rickard; Chris Kuperstein; BRYAN, AUSTEN R Capt USAF >>>>> AFMC AFLCMC/HNCP; Brenna Gordon; Bubb, Mike (mbubb at mitre.org); >>>>> DIROCCO, >>>>> ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP; Miller, >>>>> Timothy J.; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP >>>>> Subject: Re: Unified Platform Pod Deploy Errors >>>>> >>>>> Hello Russell, >>>>> >>>>> >>>>> Can you elaborate on the term Blocked? What specific >>>>> issues are the blockers? 
>>>>> >>>>> >>>>> >>>>> Thanks, >>>>> >>>>> KEVIN O'DONNELL >>>>> ARCHITECT MANAGER >>>>> Red Hat Red Hat NA Public Sector Consulting < >>>>> https://www.redhat.com/> >>>>> >>>>> kodonnell at redhat.com >>>>> M: 240-605-4654 >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> On Mon, Dec 2, 2019 at 11:11 AM Kendall, Russell C < >>>>> Russell.Kendall at mantech.com> wrote: >>>>> >>>>> >>>>> Mark, >>>>> >>>>> Thank for acknowledging, please be aware the San >>>>> Antonio dev teams working in >>>>> >>>>> >>>>> unified-platform.io < >>>>> http://unified-platform.io> < >>>>> http://unified-platform.io> >>>>> are currently blocked. >>>>> >>>>> V/R, >>>>> >>>>> Russell C Kendall >>>>> >>>>> ________________________________________ >>>>> From: Mark Nissley >>>>> Sent: Monday, December 2, 2019 9:36 AM >>>>> To: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP; >>>>> Jonathan Rickard; Chris Kuperstein >>>>> Cc: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP; Kevin >>>>> O'Donnell; Brenna Gordon; Kendall, Russell C; Bubb, Mike ( >>>>> mbubb at mitre.org); DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP; >>>>> Miller, Timothy >>>>> J.; >>>>> RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP >>>>> Subject: Re: Unified Platform Pod Deploy Errors >>>>> >>>>> As noted, I don't suspect much got done on this over >>>>> the holiday weekend. I did see the ticket, as dropped some details into it. >>>>> I also assigned it to @Jonathan >>>>> Rickard and @Chris >>>>> Kuperstein . >>>>> >>>>> >>>>> >>>>> It looks like short term solutions have been easy but >>>>> the issue is recurring. 
>>>>> >>>>> >>>>> >>>>> >>>>> Mark NISSLEY, PMP, >>>>> CSM, LEAN >>>>> >>>>> PROGRAM MaNAGER & SR technical Project Manager >>>>> North American Consulting, Public Sector >>>>> >>>>> M: >>>>> 850-530-3234 >>>>> >>>>> >>>>> Scheduled Training: October 14-18 >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> On Mon, Dec 2, 2019 at 10:21 AM ABODUNRIN, ADEMOLA A >>>>> GG-12 USAF AFMC AFLCMC/HNCP wrote: >>>>> >>>>> >>>>> Mark/Kevin, >>>>> >>>>> >>>>> I just heard at the team stand up that we are still >>>>> blocked. This is also affecting the AAM team from my investigations. >>>>> >>>>> >>>>> Please let me know if there is something we need to do >>>>> to move this forward. >>>>> >>>>> Most Sincerely, >>>>> >>>>> >>>>> Ade Abodunrin, GG-12, USAF >>>>> Product Owner (Cybertron & Ginyu Force), Unified >>>>> Platform >>>>> >>>>> >>>>> >>>>> LevelUP Code Works >>>>> Commercial: >>>>> (210) 890-2113 >>>>> NIPR email: >>>>> ademola.abodunrin at us.af.mil >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> ________________________________________ >>>>> From: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP >>>>> Sent: Wednesday, November 27, 2019 12:58 PM >>>>> To: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP < >>>>> austen.bryan.1 at us.af.mil>; Mark Nissley ; Kevin >>>>> O'Donnell >>>>> ; >>>>> Brenna Gordon >>>>> Cc: Kendall, Russell C ; >>>>> Bubb, Mike (mbubb at mitre.org) ; DIROCCO, ROGER E >>>>> GG-13 USAF AFMC ESC/AFLCMC/HNCP >>>>> ; Miller, Timothy J. < >>>>> tmiller at mitre.org>; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP < >>>>> jose.ramirez.50.ctr at us.af.mil> >>>>> Subject: Re: Unified Platform Pod Deploy Errors >>>>> >>>>> Thanks a lot Capt Bryan! Russell created the ticket on >>>>> GitLab UP Node Project. 
>>>>>
>>>>> Most Sincerely,
>>>>>
>>>>> Ade Abodunrin, GG-12, USAF
>>>>> Product Owner (Cybertron & Ginyu Force), Unified Platform
>>>>>
>>>>> LevelUP Code Works
>>>>> Commercial: (210) 890-2113
>>>>> NIPR email: ademola.abodunrin at us.af.mil
>>>>>
>>>>> ________________________________________
>>>>> From: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP <austen.bryan.1 at us.af.mil>
>>>>> Sent: Wednesday, November 27, 2019 12:56 PM
>>>>> To: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP
>>>>> <ademola.abodunrin at us.af.mil>; Mark Nissley; Kevin O'Donnell;
>>>>> Brenna Gordon <bgordon at redhat.com>
>>>>> Cc: Kendall, Russell C; Bubb, Mike (mbubb at mitre.org); DIROCCO, ROGER E
>>>>> GG-13 USAF AFMC ESC/AFLCMC/HNCP; Miller, Timothy J. <tmiller at mitre.org>;
>>>>> RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP <jose.ramirez.50.ctr at us.af.mil>
>>>>> Subject: RE: Unified Platform Pod Deploy Errors
>>>>>
>>>>> Thanks Ade. The team is thin until next week due to
>>>>> the holidays but I will make sure it is addressed. Were there any issues
>>>>> submitted to Gitlab's UP Node Project on DCCSCR?
>>>>>
>>>>> @Mark/Kevin - can we address?
>>>>>
>>>>> -Austen
>>>>>
>>>>> From: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP
>>>>> Sent: Wednesday, November 27, 2019 9:51 AM
>>>>> To: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP <austen.bryan.1 at us.af.mil>
>>>>> Cc: Kendall, Russell C; Bubb, Mike (mbubb at mitre.org)
>>>>> Subject: Fw: Unified Platform Pod Deploy Errors
>>>>>
>>>>> Capt Bryan,
>>>>>
>>>>> Please see the explanation on the issue that Ginyu
>>>>> Force is currently experiencing below.
>>>>> >>>>> >>>>> >>>>> Most Sincerely, >>>>> >>>>> Ade Abodunrin, GG-12, USAF >>>>> Product Owner (Cybertron & Ginyu Force), Unified >>>>> Platform >>>>> >>>>> >>>>> LevelUP Code Works >>>>> Commercial: (210) 890-2113 >>>>> NIPR email: >>>>> ademola.abodunrin at us.af.mil >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> ________________________________________ >>>>> >>>>> From: Kendall, Russell C >>>>> Sent: Wednesday, November 27, 2019 9:46 AM >>>>> To: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP < >>>>> ademola.abodunrin at us.af.mil>; Buffaloe, >>>>> Christopher ; >>>>> Molina, Toby ; >>>>> Crace, Jared E ; SANCHEZ, >>>>> MARK GG-13 USAF AFMC AFLCMC/HNCP >>>>> Cc: >>>>> tmiller at mitre.org < >>>>> tmiller at mitre.org> >>>>> Subject: [Non-DoD Source] Fw: Unified Platform Pod >>>>> Deploy Errors >>>>> >>>>> >>>>> >>>>> Gentlemen, >>>>> >>>>> The application development teams working in the new >>>>> GovCloud OCP environment (unified-platform.io < >>>>> http://unified-platform.io> >>>>> >>>>> ) >>>>> are currently blocked in efforts to deploy new pods >>>>> for testing, development, and UAT. >>>>> >>>>> Red Hat and RogueOne SMEs have been notified and have >>>>> attempted some fixes starting on Monday 11/25, but at this point have not >>>>> been able to provision resources >>>>> sufficient to host CCAT and AAM. >>>>> >>>>> We have taken steps to minimize our footprint >>>>> (eliminating demonstration environment, deleting developer namespaces), but >>>>> this is not a sustainable approach, >>>>> and has only resulted in moderate improvements in >>>>> cluster performance. >>>>> >>>>> Our hope is the U-P.io cluster compute resources can >>>>> be increased very soon, so that we may resume normal development >>>>> activities. Our understanding is that >>>>> such a scaling requires a complete redeployment of >>>>> the cluster, which is unusual, but an acceptable loss to productivity. 
If >>>>> the cluster can be scaled up over the Thanksgiving holiday, the impact will >>>>> be minimal to developers and cluster administrators, >>>>> alike. >>>>> >>>>> We are currently collaborating on solutions on the >>>>> following MatterMost channel behind the space camp VPN (link below), and >>>>> via the email thread forwarded >>>>> (further below). >>>>> >>>>> >>>>> >>>>> >>>>> https://chat.spacecamp.ninja/levelup/channels/unified-platform-node < >>>>> https://chat.spacecamp.ninja/levelup/channels/unified-platform-node> < >>>>> https://chat.spacecamp.ninja/levelup/channels/unified-platform-node> >>>>> < >>>>> https://chat.spacecamp.ninja/levelup/channels/unified-platform-node> >>>>> >>>>> Please keep me posted on developments and I will >>>>> coordinate developer activities with any scheduled platform outages. >>>>> >>>>> V/R, >>>>> Russell C Kendall >>>>> >>>>> ________________________________________ >>>>> >>>>> From: Curran, Daniel M >>>>> Sent: Monday, November 25, 2019 2:47 PM >>>>> To: Jonathan Rickard >>>>> Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; >>>>> dlystra at redhat.com ; >>>>> Sison, Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, >>>>> John J; Phil >>>>> Soliz; >>>>> Buffaloe, >>>>> Christopher; Torres, Alexander; Crace, Jared E; >>>>> Middleton, Joseph J >>>>> Subject: Re: Unified Platform Pod Deploy Errors >>>>> >>>>> >>>>> >>>>> Sounds great. Appreciate it. >>>>> I'll watch email and Mattermost in case you need more >>>>> from us. 
>>>>> >>>>> -Daniel >>>>> >>>>> ________________________________________ >>>>> >>>>> From: Jonathan Rickard >>>>> Sent: Monday, November 25, 2019 2:44 PM >>>>> To: Curran, Daniel M >>>>> Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; >>>>> dlystra at redhat.com ; >>>>> Sison, Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, >>>>> John J; Phil >>>>> Soliz; >>>>> Buffaloe, >>>>> Christopher; Torres, Alexander; Crace, Jared E; >>>>> Middleton, Joseph J >>>>> Subject: Re: Unified Platform Pod Deploy Errors >>>>> >>>>> >>>>> >>>>> Thanks Daniel - >>>>> >>>>> >>>>> >>>>> I'll continue to look into the resource issue that >>>>> you're seeing - I'd like to identify the root cause and then work with the >>>>> team to come up with a solution. >>>>> >>>>> >>>>> >>>>> Jonathan Rickard, >>>>> RHCA >>>>> Principal Consultant, NAPS >>>>> Red >>>>> Hat Remote - Texas >>>>> jonny at redhat.com >>>>> >>>>> M: 210-862-9739 > >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> On Mon, Nov 25, 2019 at 2:17 PM Curran, Daniel M < >>>>> Daniel.Curran at mantech.com> >>>>> wrote: >>>>> >>>>> >>>>> Yeah we hit the limit then had AAM kill some of their >>>>> projects and then our pods got scheduled. >>>>> We've hit the limit again though. Here's an example >>>>> pod that cannot be scheduled >>>>> >>>>> >>>>> >>>>> >>>>> https://cluster.unified-platform.io/console/project/ccat-dev/browse/pods/sso-8-2cjth >>>>> < >>>>> https://cluster.unified-platform.io/console/project/ccat-dev/browse/pods/sso-8-2cjth> >>>>> < >>>>> https://cluster.unified-platform.io/console/project/ccat-dev/browse/pods/sso-8-2cjth >>>>> > >>>>> < >>>>> https://cluster.unified-platform.io/console/project/ccat-dev/browse/pods/sso-8-2cjth >>>>> > >>>>> They're seeing it when their jenkins slaves can't >>>>> deploy but it's basically any pod after we hit some limit. 
>>>>> >>>>> -Daniel >>>>> ________________________________________ >>>>> >>>>> From: Jonathan Rickard >>>>> Sent: Monday, November 25, 2019 1:26 PM >>>>> To: Curran, Daniel M >>>>> Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; >>>>> dlystra at redhat.com ; >>>>> Sison, Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, >>>>> John J; Phil >>>>> Soliz; >>>>> Buffaloe, >>>>> Christopher; Torres, Alexander; Crace, Jared E; >>>>> Middleton, Joseph J >>>>> Subject: Re: Unified Platform Pod Deploy Errors >>>>> >>>>> >>>>> >>>>> Daniel, >>>>> >>>>> >>>>> >>>>> I can see that you have 3 mongo pods, 1 chatup and 1 >>>>> upbot pod running ... is your app good to go? >>>>> >>>>> >>>>> >>>>> Looks like there was an issue with memory on 1 pod, >>>>> then some node selector being mismatched - just what i could see in the >>>>> events... >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> Jonathan Rickard, >>>>> RHCA >>>>> Principal Consultant, NAPS >>>>> Red >>>>> Hat Remote - Texas >>>>> jonny at redhat.com >>>>> >>>>> M: 210-862-9739 > >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> On Mon, Nov 25, 2019 at 12:50 PM Curran, Daniel M < >>>>> Daniel.Curran at mantech.com> >>>>> wrote: >>>>> >>>>> >>>>> Also, AAM was having similar issues. Looks like they >>>>> had a lot of namespaces and scaling down the pods on their deployments >>>>> didn't help but actually deleting the namespaces >>>>> did. >>>>> We have pods scheduling now but I'm adding them and >>>>> we'd still like to work through what resource limit we were hitting to >>>>> avoid this in the future. 
>>>>> >>>>> -Daniel >>>>> >>>>> ________________________________________ >>>>> >>>>> From: Curran, Daniel M >>>>> Sent: Monday, November 25, 2019 12:25 PM >>>>> To: Jonathan Rickard >>>>> Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; >>>>> dlystra at redhat.com ; >>>>> Sison, Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, >>>>> John J; Phil >>>>> Soliz; >>>>> Buffaloe, >>>>> Christopher; Torres, Alexander >>>>> Subject: Re: Unified Platform Pod Deploy Errors >>>>> >>>>> >>>>> >>>>> Thanks, sir. >>>>> Most important for us to get working is "ccat-demo" >>>>> but it's also happening in "ccat-dev" and "ccat-ci-cd". >>>>> >>>>> -Daniel >>>>> ________________________________________ >>>>> >>>>> From: Jonathan Rickard >>>>> Sent: Monday, November 25, 2019 12:22 PM >>>>> To: Curran, Daniel M >>>>> Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; >>>>> dlystra at redhat.com ; >>>>> Sison, Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, >>>>> John J; Phil >>>>> Soliz; >>>>> Buffaloe, >>>>> Christopher; Torres, Alexander >>>>> Subject: Re: Unified Platform Pod Deploy Errors >>>>> >>>>> >>>>> >>>>> What's the name of the project you're working in? I'm >>>>> going to be back at my laptop in about 30 and will take a look when I get >>>>> there. >>>>> >>>>> >>>>> >>>>> Is it just the Jenkins pods failing? >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> On Mon, Nov 25, 2019, 12:20 PM Curran, Daniel M < >>>>> Daniel.Curran at mantech.com> >>>>> wrote: >>>>> >>>>> >>>>> Adding Dean and Alex. >>>>> Also, sitting in mattermost if anyone needs to get >>>>> online and chat for more information. 
>>>>> >>>>> -Daniel >>>>> >>>>> ________________________________________ >>>>> >>>>> From: Curran, Daniel M >>>>> Sent: Monday, November 25, 2019 12:07 PM >>>>> To: >>>>> jonny at redhat.com ; >>>>> >>>>> ckuperst at redhat.com ; >>>>> Mark Nissley >>>>> Cc: Sison, Mark Anthony; Cepeda, Rolando; Kendall, >>>>> Russell C; Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher >>>>> Subject: Re: Unified Platform Pod Deploy Errors >>>>> >>>>> >>>>> >>>>> Adding Kupe and Mark. >>>>> >>>>> -Daniel >>>>> ________________________________________ >>>>> >>>>> From: Curran, Daniel M >>>>> Sent: Monday, November 25, 2019 11:43 AM >>>>> To: >>>>> jonny at redhat.com >>>>> Cc: Sison, Mark Anthony; Cepeda, Rolando; Kendall, >>>>> Russell C; Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher >>>>> Subject: Unified Platform Pod Deploy Errors >>>>> >>>>> >>>>> >>>>> Hey Jonny, >>>>> >>>>> We met briefly at SpaceCAMP a couple weeks ago when >>>>> >>>>> >>>>> >>>>> >>>>> cluster.unified-platform.io < >>>>> http://cluster.unified-platform.io> < >>>>> http://cluster.unified-platform.io> >>>>> >>>>> was stood up. We've been trying to deploy some apps today and >>>>> so far today we're getting errors on most (if >>>>> not all) of our pods. >>>>> >>>>> 0/9 nodes are available: 3 Insufficient pods, 6 >>>>> node(s) didn't match node selector. >>>>> >>>>> Is what we're seeing. We were thinking it was some >>>>> volume types weren't correct but some of our pods don't even have volumes >>>>> attached and still give us this error (i.e. Jenkins >>>>> slaves or web frontends without persistent storage). >>>>> Any idea what this could be? We're not running out of >>>>> space on the nodes themselves are we? >>>>> We have a demo scheduled for tomorrow at 9:30 AM CST >>>>> and are hoping to get a demo env up for them today but this error came up >>>>> unexpectedly. Also, we're here at 500 Navarro >>>>> St. in San Antonio working through this in person is >>>>> better/easier. 
>>>>>
>>>>> Thanks,
>>>>> Daniel Curran
>>>>>
>>>>> ________________________________________
>>>>>
>>>>> This e-mail and any attachments are intended only for
>>>>> the use of the addressee(s) named herein and may contain proprietary
>>>>> information. If you are not the intended recipient of this e-mail or
>>>>> believe that you received this email in error, please take immediate
>>>>> action to notify the sender of the apparent error by
>>>>> reply e-mail; permanently delete the e-mail and any attachments from your
>>>>> computer; and do not disseminate, distribute, use, or copy this message and
>>>>> any attachments.
>>>>>
>>>>> _______________________________________________
>>>>> platformONE mailing list
>>>>> platformONE at redhat.com
>>>>> https://www.redhat.com/mailman/listinfo/platformone
>>>>>
>>>>> _______________________________________________
>>> platformONE mailing list
>>> platformONE at redhat.com
>>> https://www.redhat.com/mailman/listinfo/platformone
>>>
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.png
Type: image/png
Size: 2127 bytes
Desc: not available
URL:

From Russell.Kendall at mantech.com Fri Dec 6 19:18:06 2019
From: Russell.Kendall at mantech.com (Kendall, Russell C)
Date: Fri, 6 Dec 2019 19:18:06 +0000
Subject: [Platformone] [EXT] Re: Unified Platform Pod Deploy Errors
In-Reply-To:
References: <1575656455684.39471@ManTech.com> <1575658495602.19595@ManTech.com>,
Message-ID: <1575659885693.3130@ManTech.com>

Thanks Jonny,

Let's sync up and confirm the storage is being specified correctly to meet the cluster capabilities. In particular, please seek out Dan Curran on CCAT when you arrive, unless you see me first.

V/R,
Russell C Kendall
________________________________
From: Jonathan Rickard
Sent: Friday, December 6, 2019 1:15 PM
To: Kendall, Russell C
Cc: Mark Nissley; ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP; Bubb, Mike; platformONE at redhat.com; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP; Jonathan Rickard
Subject: Re: [Platformone] [EXT] Re: Unified Platform Pod Deploy Errors

Russell,

Thank you - I removed the taints. Let's give it a shot now. The problem appears to be because the pv couldn't detach from the ebs volume, and once it met the failure threshold it created the taint. Do you know if you are adding any toleration specs in your builds or statefulsets?

Thanks and I'm finally on my way.
jonny

Jonathan Rickard, RHCE, RHCA
Consulting Architect
Red Hat Public Sector
jonny at redhat.com
M: 210.862.9739

On Fri, Dec 6, 2019 at 12:55 PM Kendall, Russell C wrote:

AAM, CCAT, and DAS are all impacted.
V/R,
Russell C Kendall
________________________________
From: Jonathan Rickard
Sent: Friday, December 6, 2019 12:37 PM
To: Mark Nissley
Cc: Kendall, Russell C; ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP; Bubb, Mike; platformONE at redhat.com; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP; Jonathan Rickard
Subject: Re: [Platformone] [EXT] Re: Unified Platform Pod Deploy Errors

Russell,

Is CCAT the only application having problems? I see your project has a few failed pv's. The taints appear to be

Jonathan Rickard, RHCE, RHCA
Consulting Architect
Red Hat Public Sector
jonny at redhat.com
M: 210.862.9739

On Fri, Dec 6, 2019 at 12:26 PM Mark Nissley wrote:

Issue created here:
https://dccscr.dsop.io/ginyu-force/ginyu-force/issues/1

Mark NISSLEY, PMP, CSM, LEAN
PROGRAM MaNAGER & SR technical Project Manager
North American Consulting, Public Sector
M: 850-530-3234

Scheduled Training: October 14-18

On Fri, Dec 6, 2019 at 12:21 PM Kendall, Russell C wrote:

Nine tainted pods. Running apps seem to be okay, where they happened to be running at time the taint flood occurred. This will block IATT efforts, since we can not deploy our apps once we have remediated the vulnerabilities and to confirm remediation with TL and Anchore (there is not local scanning capability).

V/R,
Russell C Kendall
________________________________
From: Jonathan Rickard
Sent: Friday, December 6, 2019 12:13:29 PM
To: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP
Cc: Kendall, Russell C; platformONE at redhat.com; Miller, Timothy J.; Keegan Reap; Bubb, Mike; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP; Jonathan Rickard
Subject: Re: [Platformone] [EXT] Re: Unified Platform Pod Deploy Errors

Also, is every application having problems or a specific?

Jonathan Rickard, RHCE, RHCA
Consulting Architect
Red Hat Public Sector
jonny at redhat.com
M: 210.862.9739

On Fri, Dec 6, 2019 at 12:06 PM Jonathan Rickard wrote:

Ade,

What does that mean? You can't login, you can't deploy?

Jonathan Rickard, RHCE, RHCA
Consulting Architect
Red Hat Public Sector
jonny at redhat.com
M: 210.862.9739

On Fri, Dec 6, 2019 at 12:02 PM ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP wrote:

ALCON,

The cluster is down again. Please assist.

Most Sincerely,

Ade Abodunrin, GG-12, USAF
Product Owner (Cybertron & Ginyu Force), Unified Platform
LevelUP Code Works
Commercial: (210) 890-2113
NIPR email: ademola.abodunrin at us.af.mil

From: Kendall, Russell C
Sent: Thursday, December 5, 2019 9:55 AM
To: Jonathan Rickard
Cc: Miller, Timothy J.; Keegan Reap; Bubb, Mike; platformONE at redhat.com; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP; ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP; Jonathan Rickard
Subject: [Non-DoD Source] Re: [Platformone] [EXT] Re: Unified Platform Pod Deploy Errors

Jonny,

I'll see you Friday at 500 Nav. Travel safe.

V/R,
Russell C Kendall
________________________________
From: Jonathan Rickard
Sent: Wednesday, December 4, 2019 5:29 PM
To: Kendall, Russell C
Cc: Miller, Timothy J.; Keegan Reap; Bubb, Mike; platformONE at redhat.com; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP; ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP; Jonathan Rickard
Subject: Re: [Platformone] [EXT] Re: Unified Platform Pod Deploy Errors

Russell,

I have definitely been terrible with email lately and I apologize for the slow response times. I get back to San Antonio tomorrow but I have a pretty full afternoon. I can stop by Friday if you'd like.

Thanks,
jonny

Jonathan Rickard, RHCA
Principal Consultant, NAPS
Red Hat Remote - Texas
jonny at redhat.com
M: 210-862-9739

On Wed, Dec 4, 2019 at 10:16 AM Kendall, Russell C wrote:

Jonny,

I'd like to suggest you come to 500 to wrap this up, since it seems there are significant delays in communication that are contributing to downtime.

V/R,
Russell C Kendall
________________________________________
From: Miller, Timothy J.
Sent: Wednesday, December 4, 2019 7:02 AM
To: Jonathan Rickard; Keegan Reap
Cc: Bubb, Mike; platformONE at redhat.com; Kendall, Russell C; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP; ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP; Jonathan Rickard
Subject: Re: [Platformone] [EXT] Re: Unified Platform Pod Deploy Errors

Johnny--

Update the issue, if you would be so kind.

-- T

On 12/3/19, 18:00, "platformone-bounces at redhat.com on behalf of Jonathan Rickard" <jrickard at redhat.com> wrote:

Hey Guys - Sorry for taking so long - this has been completed. Please run your builds and let us know if you're having any problems.
jonny

Jonathan Rickard, RHCA
Principal Consultant, NAPS
Red Hat Remote - Texas
jonny at redhat.com
M: 210-862-9739

On Tue, Dec 3, 2019 at 3:47 PM Jonathan Rickard wrote:

Russell / Team,

We believe we've identified the issue with your application deploying. In order to rectify the issue I need to evacuate pods so you will probably see some hiccups while deploying. I will update when this is resolved.
Thanks, jonny Jonathan Rickard, RHCA Principal Consultant, NAPS Red Hat Remote - Texas jonny at redhat.com M: 210-862-9739 On Tue, Dec 3, 2019 at 12:53 PM Keegan Reap > wrote: Hey all, we have opened an issue below, that we believe to be the cause, we are currently investigating: https://dccscr.dsop.io/ginyu-force/up-iatt/issues/32 On Tue, Dec 3, 2019 at 11:27 AM Jonathan Rickard > wrote: Russell, Getting more eyes on this @platformONE at redhat.com > We'll keep you posted. jonny Jonathan Rickard, RHCA Principal Consultant, NAPS Red Hat Remote - Texas jonny at redhat.com M: 210-862-9739 On Tue, Dec 3, 2019 at 11:59 AM Kendall, Russell C > wrote: Kevin, Unfortunately we are receiving deployment errors again. This is the event: 0/9 nodes are available: 1 node(s) had disk pressure, 2 node(s) had taints that the pod didn't tolerate, 6 node(s) didn't match node selector. This is the deployment: https://cluster.unified-platform.io/console/project/ccat-dev/browse/dc/driveup V/R, Russell C Kendall ________________________________________ From: Miller, Timothy J. > Sent: Monday, December 2, 2019 2:44:21 PM To: Kevin O'Donnell Cc: Kendall, Russell C; Mark Nissley; ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP; Jonathan Rickard; Chris Kuperstein; BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP; Brenna Gordon; Bubb, Mike; DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP Subject: Re: [EXT] Re: Unified Platform Pod Deploy Errors Tagged you on it. -- T On 12/2/19, 14:03, "Kevin O'Donnell" > wrote: Hello, Autoscaling is on our future IAC roadmap. Tim, the additional ticket would be appreciated. We have swapped out the app/worker instances with m5a.8xlarge 32 cores, 128gb of ram. Please let us know if you have any other issues. Thanks, KEVIN O'DONNELL ARCHITECT MANAGER Red Hat Red Hat NA Public Sector Consulting kodonnell at redhat.com %20M:240-605-4654> M: 240-605-4654 On Mon, Dec 2, 2019 at 2:29 PM Miller, Timothy J. 
> wrote: I'll open an issue. IaC needs to have instance size as a host_var to facilitate scaling. -- T On 12/2/19, 13:15, "Kevin O'Donnell" > wrote: Tim, Thanks for the information. We are undersized on the app/worker nodes, the cluster has 3 and they are m5.xlarge with only 16gb of ram. From what I have read each Labs engagement operated on a 3 node worker cluster with each node having 6core's and 28gb of ram. We will need to swap out the existing instances with larger spec's. We are going to try to flush the existing workload out on one of the workers to see if we can swap them out one at a time. Thanks, KEVIN O'DONNELL ARCHITECT MANAGER Red Hat Red Hat NA Public Sector Consulting kodonnell at redhat.com %20M:240-605-4654> M: 240-605-4654 On Mon, Dec 2, 2019 at 2:07 PM Miller, Timothy J. > wrote: Here's what I can see, given the perm limits I seem to be under: - NS:develop-misp-app and NS:lp-develop-misp-app both have several sync jobs in CrashLoopBackoff b/c of an HTTP 401 error returned while trying to fetch something from somewhere (URL isn't recorded in the stack trace). - NS:minishift-misp-app has most of its pods/jobs stuck in ImagePullBackoff. No detail there in the event stream so I'll see if I can dig deeper. - NS:aam-ci-cd has Jenkins trying to spin up three workers, those are coming back as unschedulable. I can't see into NS:aam-bases or NS:dsop-images b/c of perm limits. I see no DAS-related project(s). The MISP stuff needs debugging before calling "blocked" since it looks like an internal error from this perspective. In re: AAM Jenkins: If this deployment is coming out of the OCP storefront, then maybe it should be ephemeral rather than persistent. If it's a custom deployment, then it probably needs a rethink. I'm also not sure why there are two MISP dev projects. -- T On 12/2/19, 12:46, "Kevin O'Donnell" > wrote: Russell, Thank you for the information. We can switch out the instance type for the worker nodes. How much memory is required by the apps? 
Thanks, KEVIN O'DONNELL ARCHITECT MANAGER Red Hat Red Hat NA Public Sector Consulting kodonnell at redhat.com M: 240-605-4654 On Mon, Dec 2, 2019 at 1:32 PM Kendall, Russell C wrote: Kevin, The lack of resources on the u-p.io cluster is hindering development, testing, and integration of the apps from CCAT, AAM, and DAS, which is putting one of our PI goals at risk. We are blocked by the fact that we (CCAT and AAM) cannot deploy additional pods to the unified-platform.io cluster. We have a subset of containers deployed, but rolling deployments and new deployments fail. This means that we are not able to execute integration testing or peer reviews. We are temporarily working around this by NOT testing/reviewing our code changes live, something that no one likes. Also, we are now running weeks-old instances of our containers, so we are very likely producing some technical debt. We currently have developers approaching idle or doing non-priority work until the resource issue is resolved. Here is the particular error from the OSP cluster I received while attempting a redeploy of one of our apps. 0/9 nodes are available: 1 node(s) had taints that the pod didn't tolerate, 2 Insufficient pods, 6 node(s) didn't match node selector. 11 times in the last minute. Since we do not have any cluster permissions, I cannot verify which resource is running out, but from experience, I assess it is a memory issue. It appears the cluster has been provisioned with a silly allocation of node types. Without knowing exactly what was deployed, it appears only 3 of the 9 hosts are suitable worker nodes. We would expect the cluster to respond to resource limitations and scale, but if a scheduled downtime is required, please work with us so we can anticipate it. As it stands, the cluster does not support the resources required by CCAT and the other dev teams (AAM, DAS, etc.).
We would accept any downtime if it will improve the situation, as we are blocked from progressing under the current constraints. My hope was we could get the cluster redeployed over the Thanksgiving holiday to eliminate developer impact, but as Mark pointed out, there were limited support folks available. Now I am just trying to minimize the losses. V/R, Russell C Kendall ________________________________________ From: Kevin O'Donnell Sent: Monday, December 2, 2019 11:52 AM To: Kendall, Russell C Cc: Mark Nissley; ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP; Jonathan Rickard; Chris Kuperstein; BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP; Brenna Gordon; Bubb, Mike (mbubb at mitre.org); DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP; Miller, Timothy J.; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP Subject: Re: Unified Platform Pod Deploy Errors Hello Russell, Can you elaborate on the term Blocked? What specific issues are the blockers? Thanks, KEVIN O'DONNELL ARCHITECT MANAGER Red Hat Red Hat NA Public Sector Consulting kodonnell at redhat.com M: 240-605-4654 On Mon, Dec 2, 2019 at 11:11 AM Kendall, Russell C wrote: Mark, Thanks for acknowledging. Please be aware the San Antonio dev teams working in unified-platform.io are currently blocked. V/R, Russell C Kendall ________________________________________ From: Mark Nissley Sent: Monday, December 2, 2019 9:36 AM To: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP; Jonathan Rickard; Chris Kuperstein Cc: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP; Kevin O'Donnell; Brenna Gordon; Kendall, Russell C; Bubb, Mike (mbubb at mitre.org); DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP; Miller, Timothy J.; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP Subject: Re: Unified Platform Pod Deploy Errors As noted, I don't suspect much got done on this over the holiday weekend. I did see the ticket, and dropped some details into it. I also assigned it to @Jonathan Rickard and @Chris Kuperstein.
It looks like short term solutions have been easy but the issue is recurring. Mark NISSLEY, PMP, CSM, LEAN PROGRAM MaNAGER & SR technical Project Manager North American Consulting, Public Sector M: 850-530-3234 Scheduled Training: October 14-18 On Mon, Dec 2, 2019 at 10:21 AM ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP > wrote: Mark/Kevin, I just heard at the team stand up that we are still blocked. This is also affecting the AAM team from my investigations. Please let me know if there is something we need to do to move this forward. Most Sincerely, Ade Abodunrin, GG-12, USAF Product Owner (Cybertron & Ginyu Force), Unified Platform LevelUP Code Works Commercial: (210) 890-2113 NIPR email: ademola.abodunrin at us.af.mil ________________________________________ From: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP Sent: Wednesday, November 27, 2019 12:58 PM To: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP >; Mark Nissley >; Kevin O'Donnell >; Brenna Gordon > Cc: Kendall, Russell C >; Bubb, Mike (mbubb at mitre.org) >; DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP >; Miller, Timothy J. >; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP > Subject: Re: Unified Platform Pod Deploy Errors Thanks a lot Capt Bryan! Russell created the ticket on GitLab UP Node Project. Most Sincerely, Ade Abodunrin, GG-12, USAF Product Owner (Cybertron & Ginyu Force), Unified Platform LevelUP Code Works Commercial: (210) 890-2113 NIPR email: ademola.abodunrin at us.af.mil ________________________________________ From: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP > Sent: Wednesday, November 27, 2019 12:56 PM To: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP >; Mark Nissley >; Kevin O'Donnell >; Brenna Gordon > Cc: Kendall, Russell C >; Bubb, Mike (mbubb at mitre.org) >; DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP >; Miller, Timothy J. >; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP > Subject: RE: Unified Platform Pod Deploy Errors Thanks Ade. 
The team is thin until next week due to the holidays but I will make sure it is addressed. Were there any issues submitted to GitLab's UP Node Project on DCCSCR? @Mark/Kevin - can we address? -Austen From: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP Sent: Wednesday, November 27, 2019 9:51 AM To: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP Cc: Kendall, Russell C; Bubb, Mike (mbubb at mitre.org) Subject: Fw: Unified Platform Pod Deploy Errors Capt Bryan, Please see below the explanation of the issue that Ginyu Force is currently experiencing. Most Sincerely, Ade Abodunrin, GG-12, USAF Product Owner (Cybertron & Ginyu Force), Unified Platform LevelUP Code Works Commercial: (210) 890-2113 NIPR email: ademola.abodunrin at us.af.mil ________________________________________ From: Kendall, Russell C Sent: Wednesday, November 27, 2019 9:46 AM To: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP; Buffaloe, Christopher; Molina, Toby; Crace, Jared E; SANCHEZ, MARK GG-13 USAF AFMC AFLCMC/HNCP Cc: tmiller at mitre.org Subject: [Non-DoD Source] Fw: Unified Platform Pod Deploy Errors Gentlemen, The application development teams working in the new GovCloud OCP environment (unified-platform.io) are currently blocked in efforts to deploy new pods for testing, development, and UAT. Red Hat and RogueOne SMEs have been notified and have attempted some fixes starting on Monday 11/25, but at this point have not been able to provision resources sufficient to host CCAT and AAM. We have taken steps to minimize our footprint (eliminating the demonstration environment, deleting developer namespaces), but this is not a sustainable approach, and has only resulted in moderate improvements in cluster performance. Our hope is the U-P.io cluster compute resources can be increased very soon, so that we may resume normal development activities.
Our understanding is that such a scaling requires a complete redeployment of the cluster, which is unusual, but an acceptable loss to productivity. If the cluster can be scaled up over the Thanksgiving holiday, the impact will be minimal to developers and cluster administrators, alike. We are currently collaborating on solutions on the following MatterMost channel behind the space camp VPN (link below), and via the email thread forwarded (further below). https://chat.spacecamp.ninja/levelup/channels/unified-platform-node Please keep me posted on developments and I will coordinate developer activities with any scheduled platform outages. V/R, Russell C Kendall ________________________________________ From: Curran, Daniel M Sent: Monday, November 25, 2019 2:47 PM To: Jonathan Rickard Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; dlystra at redhat.com >; Sison, Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher; Torres, Alexander; Crace, Jared E; Middleton, Joseph J Subject: Re: Unified Platform Pod Deploy Errors Sounds great. Appreciate it. I'll watch email and Mattermost in case you need more from us. -Daniel ________________________________________ From: Jonathan Rickard > Sent: Monday, November 25, 2019 2:44 PM To: Curran, Daniel M Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; dlystra at redhat.com >; Sison, Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher; Torres, Alexander; Crace, Jared E; Middleton, Joseph J Subject: Re: Unified Platform Pod Deploy Errors Thanks Daniel - I'll continue to look into the resource issue that you're seeing - I'd like to identify the root cause and then work with the team to come up with a solution. 
Jonathan Rickard, RHCA Principal Consultant, NAPS Red Hat Remote - Texas jonny at redhat.com M: 210-862-9739 On Mon, Nov 25, 2019 at 2:17 PM Curran, Daniel M > wrote: Yeah we hit the limit then had AAM kill some of their projects and then our pods got scheduled. We've hit the limit again though. Here's an example pod that cannot be scheduled https://cluster.unified-platform.io/console/project/ccat-dev/browse/pods/sso-8-2cjth They're seeing it when their jenkins slaves can't deploy but it's basically any pod after we hit some limit. -Daniel ________________________________________ From: Jonathan Rickard > Sent: Monday, November 25, 2019 1:26 PM To: Curran, Daniel M Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; dlystra at redhat.com >; Sison, Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher; Torres, Alexander; Crace, Jared E; Middleton, Joseph J Subject: Re: Unified Platform Pod Deploy Errors Daniel, I can see that you have 3 mongo pods, 1 chatup and 1 upbot pod running ... is your app good to go? Looks like there was an issue with memory on 1 pod, then some node selector being mismatched - just what i could see in the events... Jonathan Rickard, RHCA Principal Consultant, NAPS Red Hat Remote - Texas jonny at redhat.com M: 210-862-9739 On Mon, Nov 25, 2019 at 12:50 PM Curran, Daniel M > wrote: Also, AAM was having similar issues. Looks like they had a lot of namespaces and scaling down the pods on their deployments didn't help but actually deleting the namespaces did. We have pods scheduling now but I'm adding them and we'd still like to work through what resource limit we were hitting to avoid this in the future. 
-Daniel ________________________________________ From: Curran, Daniel M Sent: Monday, November 25, 2019 12:25 PM To: Jonathan Rickard Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; dlystra at redhat.com >; Sison, Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher; Torres, Alexander Subject: Re: Unified Platform Pod Deploy Errors Thanks, sir. Most important for us to get working is "ccat-demo" but it's also happening in "ccat-dev" and "ccat-ci-cd". -Daniel ________________________________________ From: Jonathan Rickard > Sent: Monday, November 25, 2019 12:22 PM To: Curran, Daniel M Cc: Jonathan Rickard; Chris Kuperstein; Mark Nissley; dlystra at redhat.com >; Sison, Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher; Torres, Alexander Subject: Re: Unified Platform Pod Deploy Errors What's the name of the project you're working in? I'm going to be back at my laptop in about 30 and will take a look when I get there. Is it just the Jenkins pods failing? On Mon, Nov 25, 2019, 12:20 PM Curran, Daniel M > wrote: Adding Dean and Alex. Also, sitting in mattermost if anyone needs to get online and chat for more information. -Daniel ________________________________________ From: Curran, Daniel M Sent: Monday, November 25, 2019 12:07 PM To: jonny at redhat.com >; ckuperst at redhat.com >; Mark Nissley Cc: Sison, Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher Subject: Re: Unified Platform Pod Deploy Errors Adding Kupe and Mark. 
-Daniel ________________________________________ From: Curran, Daniel M Sent: Monday, November 25, 2019 11:43 AM To: jonny at redhat.com > Cc: Sison, Mark Anthony; Cepeda, Rolando; Kendall, Russell C; Andrichak IV, John J; Phil Soliz; Buffaloe, Christopher Subject: Unified Platform Pod Deploy Errors Hey Jonny, We met briefly at SpaceCAMP a couple weeks ago when cluster.unified-platform.io was stood up. We've been trying to deploy some apps today and so far today we're getting errors on most (if not all) of our pods. 0/9 nodes are available: 3 Insufficient pods, 6 node(s) didn't match node selector. Is what we're seeing. We were thinking it was some volume types weren't correct but some of our pods don't even have volumes attached and still give us this error (i.e. Jenkins slaves or web frontends without persistent storage). Any idea what this could be? We're not running out of space on the nodes themselves are we? We have a demo scheduled for tomorrow at 9:30 AM CST and are hoping to get a demo env up for them today but this error came up unexpectedly. Also, we're here at 500 Navarro St. in San Antonio working through this in person is better/easier. Thanks, Daniel Curran ________________________________________ This e-mail and any attachments are intended only for the use of the addressee(s) named herein and may contain proprietary information. If you are not the intended recipient of this e-mail or believe that you received this email in error, please take immediate action to notify the sender of the apparent error by reply e-mail; permanently delete the e-mail and any attachments from your computer; and do not disseminate, distribute, use, or copy this message and any attachments. 
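The scheduler events quoted in this thread ("0/9 nodes are available: ...") pack several per-node reasons into one line. A minimal Python sketch, assuming only the message layout shown in the thread, splits such an event into counts so it is easier to see which reason accounts for which nodes. The function name `parse_scheduler_event` is hypothetical and is not part of any OpenShift or Kubernetes tooling. Note that the "Insufficient pods" reason generally refers to a node's pod-count capacity rather than memory, so it may be worth checking that limit separately.

```python
import re


def parse_scheduler_event(message):
    """Split a 'N/M nodes are available: ...' event into (available, total, reasons).

    Assumes the layout quoted in the thread: a header, a colon, then
    comma-separated '<count> <reason>' parts ending in a period.
    """
    header, _, reasons_text = message.partition(":")
    header_match = re.match(r"\s*(\d+)/(\d+) nodes are available", header)
    if header_match is None:
        raise ValueError("not a scheduler availability event: %r" % message)
    available = int(header_match.group(1))
    total = int(header_match.group(2))

    reasons = {}
    for part in reasons_text.strip().rstrip(".").split(", "):
        part_match = re.match(r"(\d+)\s+(.*)", part.strip())
        if part_match:  # skip anything that does not start with a count
            reasons[part_match.group(2)] = int(part_match.group(1))
    return available, total, reasons


# Event text copied from the original report in this thread.
event = ("0/9 nodes are available: 3 Insufficient pods, "
         "6 node(s) didn't match node selector.")
available, total, reasons = parse_scheduler_event(event)
for reason, count in reasons.items():
    print(f"{count} of {total} nodes: {reason}")
```

Read this way, the six "didn't match node selector" nodes are never candidates for these pods, so the whole workload lands on the three remaining worker nodes.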
_______________________________________________ platformONE mailing list platformONE at redhat.com https://www.redhat.com/mailman/listinfo/platformone _______________________________________________ platformONE mailing list platformONE at redhat.com https://www.redhat.com/mailman/listinfo/platformone -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 2127 bytes Desc: image001.png URL: From jrickard at redhat.com Sun Dec 8 04:57:14 2019 From: jrickard at redhat.com (Jonathan Rickard) Date: Sat, 7 Dec 2019 22:57:14 -0600 Subject: [Platformone] UP-Node Support Message-ID: Team, In order to better facilitate any assistance that may be needed during cluster disruptions, we wanted to provide you some guidelines for reporting and handling issues. This helps us triage faster which ultimately helps us resolve the problem faster. The first thing you should do is submit a gitlab issue to: https://dccscr.dsop.io ; if you're blocked, then use the "blocked" label. Then send an email to platformONE at redhat.com and provide the issue number and some details about the problem. Ideally we'd like to know the following information. We understand that you may not be able to provide everything, but please provide as much information as possible. What isn't working? Logins, Builds, Deployments, pods crashing .. What is the scope of the problem? Are all app teams experiencing the same issues, or is it isolated to your app? What is the project name having problems? thanks, jonny Jonathan Rickard, RHCE, RHCA Consulting Architect Red Hat Public Sector jonny at redhat.com M: 210.862.9739 @redhatjobs redhatjobs @redhatjobs -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jrickard at redhat.com Mon Dec 9 21:41:49 2019 From: jrickard at redhat.com (Jonathan Rickard) Date: Mon, 9 Dec 2019 15:41:49 -0600 Subject: [Platformone] AWS Access Slowdowns Message-ID: Adrian / Andrew, It was reported that there's some significant slowness while sshing into the EC2's - would either of you be able to take a look? Dino Arachchi and Dean Lystra are running point on this. Thanks! jonny Jonathan Rickard, RHCE, RHCA Consulting Architect Red Hat Public Sector jonny at redhat.com M: 210.862.9739 @redhatjobs redhatjobs @redhatjobs -------------- next part -------------- An HTML attachment was scrubbed... URL: From mnissley at redhat.com Mon Dec 9 22:04:15 2019 From: mnissley at redhat.com (Mark Nissley) Date: Mon, 9 Dec 2019 16:04:15 -0600 Subject: [Platformone] AWS Access Slowdowns In-Reply-To: References: Message-ID: Just wanted to add that we are blocked on this right now. We continue to work it, but could use a call from Andrew or Adrian to Dino at 848-203-1809 at your early convenience! Mark NISSLEY, PMP, CSM, LEAN PROGRAM MaNAGER & SR technical Project Manager North American Consulting, Public Sector M: 850-530-3234 *Scheduled Training: October 14-18* On Mon, Dec 9, 2019 at 3:42 PM Jonathan Rickard wrote: > Adrian / Andrew, > > It was reported that there's some significant slowness while sshing into > the EC2's - would either of you be able to take a look? > > Dino Arachchi and Dean Lystra are running point on this. > > Thanks! > jonny > > Jonathan Rickard, RHCE, RHCA > > Consulting Architect > > Red Hat Public Sector > > jonny at redhat.com > M: 210.862.9739 > @redhatjobs redhatjobs > @redhatjobs > > > _______________________________________________ > platformONE mailing list > platformONE at redhat.com > https://www.redhat.com/mailman/listinfo/platformone > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From adrian.nunez at bylight.com Mon Dec 9 22:06:11 2019 From: adrian.nunez at bylight.com (Adrian Nunez) Date: Mon, 9 Dec 2019 22:06:11 +0000 Subject: [Platformone] AWS Access Slowdowns In-Reply-To: References: , Message-ID: Call me. What EC2's? Get Outlook for Android ________________________________ From: Mark Nissley Sent: Monday, December 9, 2019 5:04:15 PM To: Jonathan Rickard Cc: Adrian Nunez ; Goss, Andrew [Semper Valens Solutions (SVS)] ; platformONE at redhat.com Subject: Re: [Platformone] AWS Access Slowdowns [EXTERNAL EMAIL] Just wanted to add that we are blocked on this right now. We continue to work it, but could use a call from Andrew or Adrian to Dino at 848-203-1809 at your early convenience! Mark NISSLEY, PMP, CSM, LEAN PROGRAM MaNAGER & SR technical Project Manager North American Consulting, Public Sector M: 850-530-3234 [https://marketing-outfit-prod-images.s3-us-west-2.amazonaws.com/f5445ae0c9ddafd5b2f1836854d7416a/Logo-RedHat-Email.png] Scheduled Training: October 14-18 On Mon, Dec 9, 2019 at 3:42 PM Jonathan Rickard > wrote: Adrian / Andrew, It was reported that there's some significant slowness while sshing into the EC2's - would either of you be able to take a look? Dino Arachchi and Dean Lystra are running point on this. Thanks! jonny Jonathan Rickard, RHCE, RHCA Consulting Architect Red Hat Public Sector jonny at redhat.com M: 210.862.9739 @redhatjobs redhatjobs @redhatjobs [https://static.redhat.com/libs/redhat/brand-assets/2/corp/logo--200.png] _______________________________________________ platformONE mailing list platformONE at redhat.com https://www.redhat.com/mailman/listinfo/platformone This communication (including any attachments) may contain information that is proprietary, confidential or exempt from disclosure. If you are not the intended recipient, please note that further dissemination, distribution, use or copying of this communication is strictly prohibited. 
Anyone who received this message in error should notify the sender immediately by telephone or by return email and delete it from his or her computer. -------------- next part -------------- An HTML attachment was scrubbed... URL: From darachch at redhat.com Mon Dec 9 22:11:08 2019 From: darachch at redhat.com (Dino Arachchi) Date: Mon, 9 Dec 2019 16:11:08 -0600 Subject: [Platformone] AWS Access Slowdowns In-Reply-To: References: Message-ID: Adrian, What number can I reach you at? Alternatively, we can jump on a BlueJeans so I can share my screen: https://bluejeans.com/189190191 Just let me know what works best for you. Best Regards, DINO ARACHCHI SENIOR CONSULTANT darachch at redhat.com M: 848-203-1809 On Mon, Dec 9, 2019 at 4:06 PM Adrian Nunez wrote: > Call me. What EC2's? > > Get Outlook for Android > > ------------------------------ > *From:* Mark Nissley > *Sent:* Monday, December 9, 2019 5:04:15 PM > *To:* Jonathan Rickard > *Cc:* Adrian Nunez ; Goss, Andrew [Semper > Valens Solutions (SVS)] ; > platformONE at redhat.com > *Subject:* Re: [Platformone] AWS Access Slowdowns > > > [EXTERNAL EMAIL] > Just wanted to add that we are blocked on this right now. We continue to > work it, but could use a call from Andrew or Adrian to Dino at 848-203-1809 > at your early convenience! > > > Mark NISSLEY, PMP, CSM, LEAN > > PROGRAM MaNAGER & SR technical Project Manager > > North American Consulting, Public Sector > > > M: 850-530-3234 > > > > *Scheduled Training: October 14-18* > > > On Mon, Dec 9, 2019 at 3:42 PM Jonathan Rickard > wrote: > > Adrian / Andrew, > > It was reported that there's some significant slowness while sshing into > the EC2's - would either of you be able to take a look? > > Dino Arachchi and Dean Lystra are running point on this. > > Thanks!
> jonny > > Jonathan Rickard, RHCE, RHCA > > Consulting Architect > > Red Hat Public Sector > > jonny at redhat.com > M: 210.862.9739 > @redhatjobs redhatjobs > @redhatjobs > > > _______________________________________________ > platformONE mailing list > platformONE at redhat.com > https://www.redhat.com/mailman/listinfo/platformone > > This communication (including any attachments) may contain information > that is proprietary, confidential or exempt from disclosure. If you are not > the intended recipient, please note that further dissemination, > distribution, use or copying of this communication is strictly prohibited. > Anyone who received this message in error should notify the sender > immediately by telephone or by return email and delete it from his or her > computer. > _______________________________________________ > platformONE mailing list > platformONE at redhat.com > https://www.redhat.com/mailman/listinfo/platformone > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dlystra at redhat.com Mon Dec 9 22:12:04 2019 From: dlystra at redhat.com (Dean Lystra) Date: Mon, 9 Dec 2019 16:12:04 -0600 Subject: [Platformone] AWS Access Slowdowns In-Reply-To: References: Message-ID: Connections between instances in the staging VPC are slow when they are in different subnets. Any instance in 10.10.0.0/20 hangs for a minute when a ping or SSH is attempted to a server in 10.10.96.0/20. On Mon, Dec 9, 2019, 4:06 PM Adrian Nunez wrote: > Call me. What EC2's? > > Get Outlook for Android > > ------------------------------ > *From:* Mark Nissley > *Sent:* Monday, December 9, 2019 5:04:15 PM > *To:* Jonathan Rickard > *Cc:* Adrian Nunez ; Goss, Andrew [Semper > Valens Solutions (SVS)] ; > platformONE at redhat.com > *Subject:* Re: [Platformone] AWS Access Slowdowns > > > [EXTERNAL EMAIL] > Just wanted to add that we are blocked on this right now. 
We continue to > work it, but could use a call from Andrew or Adrian to Dino at 848-203-1809 > at your early convenience! > > > Mark NISSLEY, PMP, CSM, LEAN > > PROGRAM MaNAGER & SR technical Project Manager > > North American Consulting, Public Sector > > > M: 850-530-3234 > > > > *Scheduled Training: October 14-18* > > > On Mon, Dec 9, 2019 at 3:42 PM Jonathan Rickard > wrote: > > Adrian / Andrew, > > It was reported that there's some significant slowness while sshing into > the EC2's - would either of you be able to take a look? > > Dino Arachchi and Dean Lystra are running point on this. > > Thanks! > jonny > > Jonathan Rickard, RHCE, RHCA > > Consulting Architect > > Red Hat Public Sector > > jonny at redhat.com > M: 210.862.9739 > @redhatjobs redhatjobs > @redhatjobs > > > _______________________________________________ > platformONE mailing list > platformONE at redhat.com > https://www.redhat.com/mailman/listinfo/platformone > > This communication (including any attachments) may contain information > that is proprietary, confidential or exempt from disclosure. If you are not > the intended recipient, please note that further dissemination, > distribution, use or copying of this communication is strictly prohibited. > Anyone who received this message in error should notify the sender > immediately by telephone or by return email and delete it from his or her > computer. > _______________________________________________ > platformONE mailing list > platformONE at redhat.com > https://www.redhat.com/mailman/listinfo/platformone > -------------- next part -------------- An HTML attachment was scrubbed... URL: From adrian.nunez at bylight.com Mon Dec 9 22:13:39 2019 From: adrian.nunez at bylight.com (Adrian Nunez) Date: Mon, 9 Dec 2019 22:13:39 +0000 Subject: [Platformone] AWS Access Slowdowns In-Reply-To: References: , Message-ID: Ok. I'm in the Blue Jeans channel. 
Slow EC2 SSH is usually from a service running on the EC2 Get Outlook for Android ________________________________ From: Dean Lystra Sent: Monday, December 9, 2019 5:12:04 PM To: Adrian Nunez Cc: Mark Nissley ; Jonathan Rickard ; platformONE at redhat.com Subject: Re: [Platformone] AWS Access Slowdowns [EXTERNAL EMAIL] Connections between instances in the staging VPC are slow when they are in different subnets. Any instance in 10.10.0.0/20 hangs for a minute when a ping or SSH is attempted to a server in 10.10.96.0/20. On Mon, Dec 9, 2019, 4:06 PM Adrian Nunez > wrote: Call me. What EC2's? Get Outlook for Android ________________________________ From: Mark Nissley > Sent: Monday, December 9, 2019 5:04:15 PM To: Jonathan Rickard > Cc: Adrian Nunez >; Goss, Andrew [Semper Valens Solutions (SVS)] >; platformONE at redhat.com > Subject: Re: [Platformone] AWS Access Slowdowns [EXTERNAL EMAIL] Just wanted to add that we are blocked on this right now. We continue to work it, but could use a call from Andrew or Adrian to Dino at 848-203-1809 at your early convenience! Mark NISSLEY, PMP, CSM, LEAN PROGRAM MaNAGER & SR technical Project Manager North American Consulting, Public Sector M: 850-530-3234 [https://marketing-outfit-prod-images.s3-us-west-2.amazonaws.com/f5445ae0c9ddafd5b2f1836854d7416a/Logo-RedHat-Email.png] Scheduled Training: October 14-18 On Mon, Dec 9, 2019 at 3:42 PM Jonathan Rickard > wrote: Adrian / Andrew, It was reported that there's some significant slowness while sshing into the EC2's - would either of you be able to take a look? Dino Arachchi and Dean Lystra are running point on this. Thanks! 
jonny

Jonathan Rickard, RHCE, RHCA
Consulting Architect
Red Hat Public Sector
jonny at redhat.com
M: 210.862.9739

_______________________________________________
platformONE mailing list
platformONE at redhat.com
https://www.redhat.com/mailman/listinfo/platformone

This communication (including any attachments) may contain information that is proprietary, confidential or exempt from disclosure. If you are not the intended recipient, please note that further dissemination, distribution, use or copying of this communication is strictly prohibited. Anyone who received this message in error should notify the sender immediately by telephone or by return email and delete it from his or her computer.

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From dlystra at redhat.com Mon Dec 9 22:15:20 2019
From: dlystra at redhat.com (Dean Lystra)
Date: Mon, 9 Dec 2019 16:15:20 -0600
Subject: [Platformone] AWS Access Slowdowns
In-Reply-To:
References:
Message-ID:

This behavior only occurs when they are not in the same subnet. We attempted connections from another EC2 in 10.10.0.0/20 and they were connecting instantly.

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From taylor at redhat.com Mon Dec 9 22:24:29 2019
From: taylor at redhat.com (Taylor Biggs)
Date: Mon, 9 Dec 2019 17:24:29 -0500
Subject: [Platformone] AWS Access Slowdowns
In-Reply-To:
References:
Message-ID:

Often I'll see this because SSH likes to reverse-DNS lookup people before they get all the way logged in. So you might have a DNS lookup issue going on (just look at the output of "w" command).
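
A minimal sketch of how the reverse-DNS theory could be checked on the target host. The helper name usedns_setting is hypothetical, and the function only reads a single sshd_config-style file: it assumes whitespace-separated "keyword value" lines and does not follow Include files or Match blocks. sshd uses the first value it reads for a keyword, so the first uncommented UseDNS line is the one reported.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the first uncommented UseDNS value found in the
# given sshd_config-style file, or "unset" when the keyword is absent.
# Commented lines such as "#UseDNS yes" are skipped because their first
# field is not exactly "usedns".
usedns_setting() {
    awk 'tolower($1) == "usedns" { print tolower($2); found = 1; exit }
         END { if (!found) print "unset" }' "$1"
}

# Example use on the host being logged in to:
#   usedns_setting /etc/ssh/sshd_config
#   sudo sshd -T | grep -i usedns    # effective value as sshd itself reports it
#   w                                # FROM column shows how logins were resolved
```

If the value is already "no" and logins are still slow, name resolution in sshd is less likely to be the cause and the network path is worth a closer look.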
Thanks,
Taylor

----
Taylor Biggs
taylor at redhat.com
850-449-2220

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From darachch at redhat.com Mon Dec 9 23:16:46 2019
From: darachch at redhat.com (Dino Arachchi)
Date: Mon, 9 Dec 2019 17:16:46 -0600
Subject: [Platformone] AWS Access Slowdowns
In-Reply-To:
References:
Message-ID:

Just so everyone is aware of the status of this issue, I worked with Adrian to get a support ticket created in AWS:

Case ID 6655545661
https://console.amazonaws-us-gov.com/support/cases#/6655545661/en

I'm listed as the primary POC for the case, and Jonny is listed there as well, but anyone is free to monitor, etc.

Best Regards,

DINO ARACHCHI
SENIOR CONSULTANT
darachch at redhat.com
M: 848-203-1809

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From dlystra at redhat.com Mon Dec 9 23:19:46 2019
From: dlystra at redhat.com (Dean Lystra)
Date: Mon, 9 Dec 2019 17:19:46 -0600
Subject: [Platformone] AWS Access Slowdowns
In-Reply-To:
References:
Message-ID:

I had Dino change "useDNS" to no in /etc/ssh/sshd_config, restarted the service, then ssh, but it didn't seem to help.

On Mon, Dec 9, 2019, 4:24 PM Taylor Biggs wrote:

> Often I'll see this because SSH likes to reverse-DNS lookup people before
> they get all the way logged in. So you might have a DNS lookup issue going
> on (just look at the output of "w" command).

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From adrian.nunez at bylight.com Mon Dec 9 23:26:17 2019
From: adrian.nunez at bylight.com (Adrian Nunez)
Date: Mon, 9 Dec 2019 23:26:17 +0000
Subject: [Platformone] AWS Access Slowdowns
In-Reply-To:
References: ,
Message-ID:

Ticket opened. Odd that the EC2 is running at 84% utilization and it's empty.

V/R
Adrian

Get Outlook for Android
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From tmiller at mitre.org Tue Dec 10 13:35:51 2019
From: tmiller at mitre.org (Miller, Timothy J.)
Date: Tue, 10 Dec 2019 13:35:51 +0000
Subject: [Platformone] Who's managing Keycloak for dcar.dsop.io?
Message-ID: <723947A8-4F32-47DD-9593-45E8749B4C7C@mitre.org>

Need to reset my MFA (new phone) & there's no self-service I can see. This is probably going to come up again for others. :)

-- T

From andrew.goss at accenturefederal.com Tue Dec 10 14:21:49 2019
From: andrew.goss at accenturefederal.com (Goss, Andrew [Semper Valens Solutions (SVS)])
Date: Tue, 10 Dec 2019 14:21:49 +0000
Subject: [Platformone] [External] Who's managing Keycloak for dcar.dsop.io?
In-Reply-To: <723947A8-4F32-47DD-9593-45E8749B4C7C@mitre.org>
References: <723947A8-4F32-47DD-9593-45E8749B4C7C@mitre.org>
Message-ID:

My MFA stopped working last week. I just have not a pressing need to reset it yet.

-----Original Message-----
From: platformone-bounces at redhat.com On Behalf Of Miller, Timothy J.
Sent: Tuesday, December 10, 2019 7:36 AM
To: platformONE at redhat.com
Subject: [External] [Platformone] Who's managing Keycloak for dcar.dsop.io?

This message is from an EXTERNAL SENDER - be CAUTIOUS of links and attachments. THINK BEFORE YOU CLICK.
________________________________

Need to reset my MFA (new phone) & there's no self-service I can see. This is probably going to come up again for others.
:) -- T

From mnissley at redhat.com Tue Dec 10 15:45:21 2019
From: mnissley at redhat.com (Mark Nissley)
Date: Tue, 10 Dec 2019 09:45:21 -0600
Subject: [Platformone] AWS Access Slowdowns
In-Reply-To:
References:
Message-ID:

Team-

Any update on this today. It is my thought that we are now outside our SLA with AWS for a response.

Mark NISSLEY, PMP, CSM, LEAN
PROGRAM MaNAGER & SR technical Project Manager
North American Consulting, Public Sector
M: 850-530-3234
*Scheduled Training: October 14-18*

On Mon, Dec 9, 2019 at 5:26 PM Adrian Nunez wrote:

> Ticket opened. Odd that the EC2 is running at 84% utilization and it's
> empty.
>
> V/R
> Adrian

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From kodonnel at redhat.com Tue Dec 10 15:52:47 2019
From: kodonnel at redhat.com (Kevin O'Donnell)
Date: Tue, 10 Dec 2019 10:52:47 -0500
Subject: [Platformone] Support Assistance
Message-ID:

Hello Team.

We are having some performance issues on one of our EC2 instances. We created a case and would like some assistance in case resolution. Please let us know who can assist us.

https://console.amazonaws-us-gov.com/support/cases?#/6655545661/en

Thanks,

KEVIN O'DONNELL
ARCHITECT MANAGER
Red Hat
Red Hat NA Public Sector Consulting
kodonnell at redhat.com
M: 240-605-4654

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From adrian.nunez at bylight.com Tue Dec 10 15:55:02 2019
From: adrian.nunez at bylight.com (Adrian Nunez)
Date: Tue, 10 Dec 2019 15:55:02 +0000
Subject: [Platformone] AWS Access Slowdowns
In-Reply-To:
References: ,
Message-ID:

AWS replied. You can see their reply in the same place where we put in a ticket. They responded last night at 7pm. Here is their response:

Hello,

My name is Ed and I'll be assisting you with this case today.

I understand you have noticed slowness on SSH connections from your bastion host (i-0fb3eb5a3efa04e18) to another EC2 instance (i-038792be35142d7f6). Since you don't have any services running on that EC2 you would like to know if there is any issues in infrastructure level.

A few notes before sharing more details about my investigation:

- I noticed you mentioned that the NAT is properly configured. The source and target instances are in the same VPC, so it won't use any NAT.
- I agree, Security Group is opening the required ports for your connection. - I also checked the nACLs and I can confirm that there is no rules on your subnet, so no issues here either. Checking the hosts where your instances are running, I don't see any issues in infrastructure level. I have checked drops, packet loss and the whole host healthy metrics. From instance perspective, my visibility is quite low due to the GovCloud restrictions. I have a few recommendations to help you tracking the source of this slowness: ** Please perform the validations mentioned below on your Bastion host (i-0fb3eb5a3efa04e18) and destination instance (i-038792be35142d7f6). ** 1. Size of wtmp files I have seen some customer experiencing slowness when they are not rotating wtmp files. This file is responsible to keep track of logins on your server. This file can be found in the following path: /var/log/wtmp If the file is large and it is not being rotated, please consider rotating the file and trying to open your SSH connection. 2. Check CPU and burst balance on Cloudwatch Since this is a GovCloud instance, I have no access or visibility from your CLoudwatch metrics. Please check the following metrics on Cloudwatch: CPUUtilization CPUCreditBalance Your instance is a T2 instance, which comes with a burstable[1] CPU utilization[2]. A t2.micro can operate under 10% of CPU usage at any time but always when it needs more CPU power, it is going to use CPU burst credits. Please check the CPUUtilization metric on Cloudwatch to see the CPU usage, if it is 10% or more, please also check the CPUCreditBalance metric. If the CPU credit balance is at 0 or it is constantly dropping, you know your instance is doing more workload than the instance type can handle. If this is the case, you have two options: 2.1 Check the applications using CPU: As we don't have access to your instance operating system, you might consider checking the performance metrics in OS level (e.g top, sar and iostat). 
2.2 Resize the instance type: When your instance is constantly using more than its baseline and you can't reduce your workload, you might consider getting a larger instance type.

3. Memory usage, check at the operating system layer

Please check the memory usage at the OS layer using commands such as:

top
free
vmstat

If your bastion is heavily loaded with connections, or if the SSH connections aren't being properly closed, you might see high memory usage.

4. Disk usage

Your instances are using GP2 volumes; this kind of volume works with burst credits as well. I would recommend checking the "BurstBalance" metric on CloudWatch[3].

5. Packet capture during a connection attempt

** Please collect the tcpdump only if you have already performed/validated the steps mentioned above. **

You could start a tcpdump on the source and target instances so we know how long it takes to open the connection (from a network perspective). Just for your reference, here is the command line to start tcpdump to capture all the traffic from the first network adapter, ignoring host name translation:

sudo tcpdump -w /tmp/instance-id-here.pcap -nn

Note that the output will be sent to the path specified with the -w flag.

Once the tcpdump is running, you could try to SSH and reproduce the issue.

The steps above will generate a file on both instances. Please collect the data during an SSH connection.

Lastly, the following links give more information about the EC2 metrics available[1], how CPU burst works on T2/T3 instances[2], and how to monitor the burst balance for EBS volumes[3]:

[1] https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/viewing_metrics_with_cloudwatch.html
[2] https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/burstable-credits-baseline-concepts.html#earning-CPU-credits
[3] https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-volume-types.html#monitoring_burstbucket

I hope this information is helpful, let us know if you have any other questions.
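As a rough illustration of the CPU credit arithmetic in item 2 of the reply, the sketch below estimates how quickly a burstable instance would drain or refill its credit balance at a given average utilization. The defaults are assumptions for a t2.micro (one vCPU, 6 credits earned per hour, which matches the 10% baseline mentioned above); the function name is hypothetical, and the 84% and 13% sample values are simply the utilization figures reported in this thread for an instance whose type is not stated.

```python
def net_credits_per_hour(avg_cpu_percent, vcpus=1, earned_per_hour=6.0):
    """Estimate the hourly change in CPUCreditBalance.

    One CPU credit is one vCPU running at 100% for one minute, so an hour
    at `avg_cpu_percent` on `vcpus` vCPUs spends percent * 60 * vcpus / 100
    credits. The defaults assume a t2.micro (1 vCPU, 6 credits per hour).
    """
    spent = avg_cpu_percent * 60.0 * vcpus / 100.0
    return earned_per_hour - spent


if __name__ == "__main__":
    # Sample utilizations: below baseline, at baseline, and the two
    # figures quoted in the thread (13% and 84%).
    for cpu in (5, 10, 13, 84):
        delta = net_credits_per_hour(cpu)
        if delta > 0:
            trend = "accumulating"
        elif delta < 0:
            trend = "draining"
        else:
            trend = "steady"
        print(f"{cpu:>3}% CPU: {delta:+.1f} credits/hour ({trend})")
```

Under these assumptions, any sustained utilization above the baseline gives a negative result, which would show up on CloudWatch as a CPUCreditBalance trending toward zero.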
Feel free to give us a call or open a chat with us so we can quickly help you on this. Best regards, Ed F. / Cloud Support / Sydney Amazon Web Services C Check out the AWS Support Knowledge Center, a knowledge base of articles and videos that answer customer questions about AWS services: https://aws.amazon.com/premiumsupport/knowledge-center/?icmpid=support_email_category We value your feedback. Please rate my response using the link below. =================================================== To contact us again about this case, please return to the AWS Support Center using the following URL: https://console.amazonaws-us-gov.com/support/cases#/6655545661/en (If you are connecting by federation, log in before following the link.) *Please note: this e-mail was sent from an address that cannot accept incoming e-mail. Please use the link above if you need to contact us again about this same issue. ==================================================================== Learn to work with the AWS Cloud. Get started with free online videos and self-paced labs at http://aws.amazon.com/training/ ==================================================================== Amazon Web Services, Inc. is an affiliate of Amazon.com, Inc. Amazon.com is a registered trademark of Amazon.com, Inc. or its affiliates. ________________________________ From: Mark Nissley Sent: Tuesday, December 10, 2019 10:45 AM To: Adrian Nunez Cc: Dean Lystra ; Taylor Biggs ; platformONE at redhat.com Subject: Re: [Platformone] AWS Access Slowdowns [EXTERNAL EMAIL] Team- Any update on this today. It is my thought that we are now outside our SLA with AWS for a response. 
Mark NISSLEY, PMP, CSM, LEAN
PROGRAM MaNAGER & SR technical Project Manager
North American Consulting, Public Sector
M: 850-530-3234
Scheduled Training: October 14-18

On Mon, Dec 9, 2019 at 5:26 PM Adrian Nunez wrote:

Ticket opened. Odd that the EC2 is running at 84% utilization and it's empty.

V/R
Adrian

________________________________
From: Dean Lystra
Sent: Monday, December 9, 2019 6:19:46 PM
To: Taylor Biggs
Cc: Adrian Nunez; platformONE at redhat.com
Subject: Re: [Platformone] AWS Access Slowdowns

[EXTERNAL EMAIL]
I had Dino change "useDNS" to no in /etc/ssh/sshd_config, restarted the service, then ssh, but it didn't seem to help.

On Mon, Dec 9, 2019, 4:24 PM Taylor Biggs wrote:

Often I'll see this because SSH likes to reverse-DNS lookup people before they get all the way logged in. So you might have a DNS lookup issue going on (just look at the output of "w" command).

Thanks,
Taylor
----
Taylor Biggs
taylor at redhat.com
850-449-2220

On Mon, Dec 9, 2019 at 5:15 PM Dean Lystra wrote:

This behavior only occurs when they are not in the same subnet. We attempted connections from another EC2 in 10.10.0.0/20 and they were connecting instantly.

On Mon, Dec 9, 2019, 4:13 PM Adrian Nunez wrote:

Ok. I'm in the Blue Jeans channel. Slow EC2 SSH is usually from a service running on the EC2

________________________________
From: Dean Lystra
Sent: Monday, December 9, 2019 5:12:04 PM
To: Adrian Nunez
Cc: Mark Nissley; Jonathan Rickard; platformONE at redhat.com
Subject: Re: [Platformone] AWS Access Slowdowns

[EXTERNAL EMAIL]
Connections between instances in the staging VPC are slow when they are in different subnets. Any instance in 10.10.0.0/20 hangs for a minute when a ping or SSH is attempted to a server in 10.10.96.0/20.
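The subnet observation above can be checked mechanically. The following is a minimal sketch using Python's standard ipaddress module to report which of the two /20 ranges named in the thread an address falls in, and whether a source/target pair crosses between them. The function names and the sample addresses are hypothetical, since the thread does not give the instances' private IPs.

```python
import ipaddress

# The two staging subnets named in the thread.
SUBNETS = {
    "10.10.0.0/20": ipaddress.ip_network("10.10.0.0/20"),
    "10.10.96.0/20": ipaddress.ip_network("10.10.96.0/20"),
}


def subnet_of(address):
    """Return the CIDR of the known subnet containing `address`, or None."""
    ip = ipaddress.ip_address(address)
    for cidr, network in SUBNETS.items():
        if ip in network:
            return cidr
    return None


def crosses_subnets(source, target):
    """True when both addresses are in known subnets and the subnets differ."""
    src, dst = subnet_of(source), subnet_of(target)
    return src is not None and dst is not None and src != dst


if __name__ == "__main__":
    # Hypothetical private IPs, for illustration only.
    print(subnet_of("10.10.3.25"))
    print(subnet_of("10.10.100.7"))
    print(crosses_subnets("10.10.3.25", "10.10.100.7"))
```

Tagging each slow and fast connection attempt this way would make it easier to confirm whether the delay really tracks the subnet boundary rather than a particular host.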
On Mon, Dec 9, 2019, 4:06 PM Adrian Nunez wrote:

Call me. What EC2's?

________________________________
From: Mark Nissley
Sent: Monday, December 9, 2019 5:04:15 PM
To: Jonathan Rickard
Cc: Adrian Nunez; Goss, Andrew [Semper Valens Solutions (SVS)]; platformONE at redhat.com
Subject: Re: [Platformone] AWS Access Slowdowns

[EXTERNAL EMAIL]
Just wanted to add that we are blocked on this right now. We continue to work it, but could use a call from Andrew or Adrian to Dino at 848-203-1809 at your early convenience!

Mark NISSLEY, PMP, CSM, LEAN
PROGRAM MaNAGER & SR technical Project Manager
North American Consulting, Public Sector
M: 850-530-3234
Scheduled Training: October 14-18

On Mon, Dec 9, 2019 at 3:42 PM Jonathan Rickard wrote:

Adrian / Andrew,

It was reported that there's some significant slowness while sshing into the EC2's - would either of you be able to take a look?

Dino Arachchi and Dean Lystra are running point on this.

Thanks!
jonny

Jonathan Rickard, RHCE, RHCA
Consulting Architect
Red Hat Public Sector
jonny at redhat.com
M: 210.862.9739

_______________________________________________
platformONE mailing list
platformONE at redhat.com
https://www.redhat.com/mailman/listinfo/platformone

This communication (including any attachments) may contain information that is proprietary, confidential or exempt from disclosure. If you are not the intended recipient, please note that further dissemination, distribution, use or copying of this communication is strictly prohibited. Anyone who received this message in error should notify the sender immediately by telephone or by return email and delete it from his or her computer.
-------------- next part -------------- An HTML attachment was scrubbed...
URL:
From darachch at redhat.com Tue Dec 10 16:00:42 2019
From: darachch at redhat.com (Dino Arachchi)
Date: Tue, 10 Dec 2019 10:00:42 -0600
Subject: [Platformone] AWS Access Slowdowns
In-Reply-To:
References:
Message-ID:

I just responded to the AWS ticket with the following:

Hi Ed,

Thanks for your response! Can we please set up a call to discuss troubleshooting this issue? To expedite the process, we'd like to use a screenshare, which we can set up.

For additional information, I've provided some results from some of the verification steps you recommended:

1. The size of the wtmp files on both hosts does not seem to be overly large. This is expected, as both hosts were recently provisioned (within hours of the issue becoming apparent).

2. The t2.micro instance is the bastion host (i-0fb3eb5a3efa04e18), which does not show any abnormal results when monitoring CPU and memory resources. It's important to note that there are no issues SSHing from this bastion host to other instances in this VPC (i.e. keycloak_server i-0b383b470c41a0302, which was provisioned at the same time, with the same automation, as the keycloak_db i-038792be35142d7f6 instance that is having this issue). However, the keycloak_db i-038792be35142d7f6 instance was showing some unusual CPU usage last night (84% utilization). It dropped this morning to 13%, which is still high, and the issue still persists.

Best Regards,
Dino

I'm still troubleshooting from our side, but monitoring the ticket for any further responses. Hopefully they can join a Bluejeans and walk through it with us.

Best Regards,

DINO ARACHCHI
SENIOR CONSULTANT
darachch at redhat.com
M: 848-203-1809

On Tue, Dec 10, 2019 at 9:56 AM Adrian Nunez wrote:

> AWS replied. You can see their reply in the same place where we put in a
> ticket. They responded last night at 7pm.
>
> Here is their response:
>
> Hello, My name is Ed and I'll be assisting you with this case today.
-------------- next part -------------- An HTML attachment was scrubbed... URL: From kozlowck at amazon.com Tue Dec 10 16:04:53 2019 From: kozlowck at amazon.com (Kozlowski, Chris) Date: Tue, 10 Dec 2019 16:04:53 +0000 Subject: [Platformone] Support Assistance In-Reply-To: References: Message-ID: <80f65916a25f4870bc4f07590f8d4410@EX13D08UEB003.ant.amazon.com> Kevin, Can you give me the account # that you submitted the case from? When was the case submitted? Has support reached out to you? Thanks! Chris Kozlowski | Sr. Technical Account Manager AWS Enterprise Support, National Security Programs kozlowck at amazon.com | m: 703.831.5110 From: Kevin O'Donnell Sent: Tuesday, December 10, 2019 10:53 AM To: Settle, Rob ; Carta, Mike ; Kozlowski, Chris ; platformONE at redhat.com Subject: Support Assistance Hello Team. We are having some performance issues on one of our EC2 instances. We created a case and would like some assistance in case resolution.
Please let us know who can assist us. https://console.amazonaws-us-gov.com/support/cases?#/6655545661/en Thanks, KEVIN O'DONNELL ARCHITECT MANAGER Red Hat Red Hat NA Public Sector Consulting kodonnell at redhat.com M: 240-605-4654 [https://static.redhat.com/libs/redhat/brand-assets/latest/corp/logo.png] -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.jpg Type: image/jpeg Size: 2163 bytes Desc: image001.jpg URL: From kodonnel at redhat.com Tue Dec 10 16:32:20 2019 From: kodonnel at redhat.com (Kevin O'Donnell) Date: Tue, 10 Dec 2019 11:32:20 -0500 Subject: [Platformone] Support Assistance In-Reply-To: <80f65916a25f4870bc4f07590f8d4410@EX13D08UEB003.ant.amazon.com> References: <80f65916a25f4870bc4f07590f8d4410@EX13D08UEB003.ant.amazon.com> Message-ID: levelup-factory. Yes we did get a general response back from an individual from Sydney. But nothing past that. Thanks, KEVIN O'DONNELL ARCHITECT MANAGER Red Hat Red Hat NA Public Sector Consulting kodonnell at redhat.com M: 240-605-4654 On Tue, Dec 10, 2019 at 11:06 AM Kozlowski, Chris wrote: > Kevin, > > > > Can you give me the account # that you submitted the case from? > > > > When was the case submitted? Has support reached out to you? > > > > Thanks! > > > > *[image: cid:image001.jpg at 01D52B59.2C1253E0]* > > *Chris **Kozlowski* | Sr. Technical Account Manager > > AWS Enterprise Support, National Security Programs > > kozlowck at amazon.com | m: 703.831.5110 > > > > Thoughts on our interaction? Provide feedback here > . > > > > *From:* Kevin O'Donnell > *Sent:* Tuesday, December 10, 2019 10:53 AM > *To:* Settle, Rob ; Carta, Mike ; > Kozlowski, Chris ; platformONE at redhat.com > *Subject:* Support Assistance > > > > Hello Team. > > > > We are having some performance issues on one of our EC2 instances. We > created a case and would like some assistance in case resolution. 
Please > let us know who can assist us. > > > > https://console.amazonaws-us-gov.com/support/cases?#/6655545661/en > > > > Thanks, > > > *KEVIN O'DONNELL * > > ARCHITECT MANAGER > > Red Hat Red Hat NA Public Sector Consulting > > kodonnell at redhat.com M: > 240-605-4654 > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.jpg Type: image/jpeg Size: 2163 bytes Desc: not available URL: From taylor at redhat.com Tue Dec 10 19:00:14 2019 From: taylor at redhat.com (Taylor Biggs) Date: Tue, 10 Dec 2019 14:00:14 -0500 Subject: [Platformone] [External] Who's managing Keycloak for dcar.dsop.io? In-Reply-To: References: <723947A8-4F32-47DD-9593-45E8749B4C7C@mitre.org> Message-ID: Hi Tim, Andrew, Currently the "helpdesk" is me (is it making sense as to why I keep banging the Day-2 ops drum?)! I've deleted both of your accounts on the DCAR, since it is self-registration and you can just create a new one now. Thanks, Taylor ---- Taylor Biggs taylor at redhat.com 850-449-2220 On Tue, Dec 10, 2019 at 9:22 AM Goss, Andrew [Semper Valens Solutions (SVS)] wrote: > My MFA stopped working last week. I just have not a pressing need to > reset it yet. > > -----Original Message----- > From: platformone-bounces at redhat.com On > Behalf Of Miller, Timothy J. > Sent: Tuesday, December 10, 2019 7:36 AM > To: platformONE at redhat.com > Subject: [External] [Platformone] Who's managing Keycloak for dcar.dsop.io > ? > > > This message is from an EXTERNAL SENDER - be CAUTIOUS of links and > attachments. THINK BEFORE YOU CLICK. > ________________________________ > > > Need to reset my MFA (new phone) & there's no self-service I can see.? > > This is probably going to come up again for others. 
:) > > -- T > > > _______________________________________________ > platformONE mailing list > platformONE at redhat.com > > https://www.redhat.com/mailman/listinfo/platformone > > _______________________________________________ > platformONE mailing list > platformONE at redhat.com > https://www.redhat.com/mailman/listinfo/platformone > -------------- next part -------------- An HTML attachment was scrubbed... URL: From andrew.goss at accenturefederal.com Tue Dec 10 20:18:06 2019 From: andrew.goss at accenturefederal.com (Goss, Andrew [Semper Valens Solutions (SVS)]) Date: Tue, 10 Dec 2019 20:18:06 +0000 Subject: [Platformone] [External] Who's managing Keycloak for dcar.dsop.io? In-Reply-To: References: <723947A8-4F32-47DD-9593-45E8749B4C7C@mitre.org> Message-ID: Thank you. From: Taylor Biggs Sent: Tuesday, December 10, 2019 1:00 PM To: Goss, Andrew [Semper Valens Solutions (SVS)] Cc: Miller, Timothy J. ; platformONE at redhat.com Subject: Re: [Platformone] [External] Who's managing Keycloak for dcar.dsop.io? Hi Tim, Andrew, Currently the "helpdesk" is me (is it making sense as to why I keep banging the Day-2 ops drum?)! I've deleted both of your accounts on the DCAR, since it is self-registration and you can just create a new one now. Thanks, Taylor ---- Taylor Biggs taylor at redhat.com 850-449-2220 On Tue, Dec 10, 2019 at 9:22 AM Goss, Andrew [Semper Valens Solutions (SVS)] wrote: My MFA stopped working last week. I just have not a pressing need to reset it yet. -----Original Message----- From: platformone-bounces at redhat.com On Behalf Of Miller, Timothy J.
Sent: Tuesday, December 10, 2019 7:36 AM To: platformONE at redhat.com Subject: [External] [Platformone] Who's managing Keycloak for dcar.dsop.io? This message is from an EXTERNAL SENDER - be CAUTIOUS of links and attachments. THINK BEFORE YOU CLICK. ________________________________ Need to reset my MFA (new phone) & there's no self-service I can see. This is probably going to come up again for others. :) -- T _______________________________________________ platformONE mailing list platformONE at redhat.com https://www.redhat.com/mailman/listinfo/platformone _______________________________________________ platformONE mailing list platformONE at redhat.com https://www.redhat.com/mailman/listinfo/platformone -------------- next part -------------- An HTML attachment was scrubbed... URL: From ademola.abodunrin at us.af.mil Tue Dec 10 21:44:06 2019 From: ademola.abodunrin at us.af.mil (ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP) Date: Tue, 10 Dec 2019 21:44:06 +0000 Subject: [Platformone] [Non-DoD Source] twistlock In-Reply-To: References: <1575832561406.44163@ManTech.com> <1575923417736.80723@ManTech.com> <1575945122643.18529@ManTech.com> Message-ID: PlatformONE team, Please assist.
Most Sincerely,

Ade Abodunrin, GG-12, USAF
Product Owner (Cybertron & Ginyu Force), Unified Platform
LevelUP Code Works
Commercial: (210) 890-2113
NIPR email: ademola.abodunrin at us.af.mil

From: Mike Knoth
Sent: Tuesday, December 10, 2019 3:40 PM
To: Curran, Daniel M; Marc Cooper
Cc: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP; Adam Wise
Subject: Re: [Non-DoD Source] twistlock

I made a ticket for this - https://dccscr.dsop.io/dsop/dccscr/issues/231

On Mon, Dec 9, 2019 at 9:35 PM Mike Knoth wrote:

here is where you manage the registries & all their settings - can see it here
https://levelup-twistlock.apps.cluster.unified-platform.io/#!/defend/vulnerabilities/registry

And here you can see none of the scans work ...
https://levelup-twistlock.apps.cluster.unified-platform.io/#!/monitor/vulnerabilities/registry

On Mon, Dec 9, 2019 at 9:32 PM Curran, Daniel M wrote:

I'm on the CCAT team as well. No clue when it would be back up. I was only able to make your account because I happened to have an admin account on that instance. Still looking for the person who created it as well as who would be in charge of running and maintaining the deployment.

Where are you seeing that error?

-Dan

_____
From: Mike Knoth
Sent: Monday, December 9, 2019 6:54 PM
To: Curran, Daniel M
Subject: Re: [Non-DoD Source] twistlock

Do you know when Twistlock will be back up and running? Seems CCAT (who had a few day head start) is running into this issue ... as am I when I copy their configs:

Scanner undefined: Failed to query image details ccat-dev/driveup latest missing secret key in AWS settings

On Mon, Dec 9, 2019 at 3:30 PM Curran, Daniel M wrote:

Okay, he's created in twistlock. Login w/ password1 and change it.
_____
From: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP
Sent: Sunday, December 8, 2019 6:21 PM
To: Curran, Daniel M
Cc: Adam Wise; mike.knoth (mike.knoth at g2-inc.com)
Subject: RE: [Non-DoD Source] twistlock

Thanks Dan. Please let's create one for Mike.Knoth. We can run down Chris Kuperstein tomorrow.

Most Sincerely,
Ade Abodunrin

From: Curran, Daniel M
Sent: Sunday, December 8, 2019 1:16 PM
To: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP
Cc: Adam Wise; mike.knoth (mike.knoth at g2-inc.com)
Subject: Re: [Non-DoD Source] twistlock

Hey Ade,

There isn't a standard one for twistlock, but send me a username that you want created and I'll put them in there so they can have access.

Also, Twistlock seems slightly buggy. It appears to only be picking up containers running on certain hosts. I heard a rumor that a Red Hat guy named Chris Kuperstein set it up but can't be sure. Wondering if we can track that down early tomorrow so we can reliably scan and rescan.

-Dan
_____
From: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP
Sent: Saturday, December 7, 2019 9:37 PM
To: Curran, Daniel M
Cc: Adam Wise; mike.knoth (mike.knoth at g2-inc.com)
Subject: FW: [Non-DoD Source] twistlock

Hello Dan,

Trust that your weekend has been going great so far. Please, what is the twistlock username and pwd? I know the url is https://cluster.unified-platform.io/console/project/levelup-twistlock/overview

Thanks for your help!
Most Sincerely,
Ade Abodunrin

From: Mike Knoth
Sent: Friday, December 6, 2019 3:58 PM
To: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP
Cc: Mike Knoth; Adam Wise
Subject: [Non-DoD Source] twistlock

Ade,

Can you ask what the twistlock URL, username, and password are? We have them for anchore, but don't have them for twistlock.

Mike
_____
This e-mail and any attachments are intended only for the use of the addressee(s) named herein and may contain proprietary information. If you are not the intended recipient of this e-mail or believe that you received this email in error, please take immediate action to notify the sender of the apparent error by reply e-mail; permanently delete the e-mail and any attachments from your computer; and do not disseminate, distribute, use, or copy this message and any attachments.

--
Mike Knoth
Software Engineer
HII Mission Driven Innovative Solutions (HII-MDIS) - formerly G2, Inc.
Technical Solutions Division
302 Sentinel Drive | Annapolis Junction, MD 20701
Email: mike.knoth at g2-inc.com
Mobile: (320) 305-6453

Confidentiality Statement: HUNTINGTON INGALLS INDUSTRIES PROPRIETARY - This e-mail contains information proprietary or private to Huntington Ingalls Industries, Inc., and is not to be disclosed to, copied by, or used in any manner by others without the prior express, written permission. If you are not the intended recipient, please delete without copying and kindly advise the sender by e-mail of the mistake in delivery.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.png
Type: image/png
Size: 2127 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 5490 bytes
Desc: not available
URL:

From darachch at redhat.com Wed Dec 11 00:43:48 2019
From: darachch at redhat.com (Dino Arachchi)
Date: Tue, 10 Dec 2019 18:43:48 -0600
Subject: [Platformone] AWS Access Slowdowns
In-Reply-To:
References:
Message-ID:

Here was the resolution to this issue:

Thank you for contacting AWS Premium Support today. It was a pleasure working with you.

During our call we worked through the issues with regard to the instance-level checks failing[1]. The underlying host did not have any failing health checks[2].
This may have been due to the underlying host, as after the reboot it appeared to be passing its health checks. I have marked the underlying host for a review to check whether any issues may have cropped up.

The only item that was changed while the volume was attached to the other instance was /etc/selinux/config, which was set to permissive. Once the instance was booted up, ssh was tested and was no longer slow. We then did a touch /.autorelabel to ensure the correct contexts were in place, changed /etc/selinux/config back to enforcing, and lastly rebooted.

It appears the issue has been fixed after moving to a different underlying host. It may have been related to the underlying host, as the same AMI was used to launch other instances without issue. This instance also appeared to initially be failing its instance-level health checks shortly after launch but then recover.

From the logs currently available, we were not able to see anything that stood out in /var/log/audit/audit.log, /var/log/messages, or /var/log/secure.

Let us know if there are any further questions or concerns.

References
[1] StatusCheckFailed_Instance: https://console.amazonaws-us-gov.com/cloudwatch/home?region=us-gov-west-1#metricsV2:graph=~%28region~%27us-gov-west-1~metrics~%28~%28~%27AWS*2fEC2~%27StatusCheckFailed_Instance~%27InstanceId~%27i-038792be35142d7f6%29%29~period~300~stat~%27Sum~start~%272019-12-09T21*3a48*3a10Z~end~%272019-12-10T21*3a48*3a10Z%29
[2] StatusCheckFailed_System: https://console.amazonaws-us-gov.com/cloudwatch/home?region=us-gov-west-1#metricsV2:graph=~%28region~%27us-gov-west-1~metrics~%28~%28~%27AWS*2fEC2~%27StatusCheckFailed_System~%27InstanceId~%27i-038792be35142d7f6%29%29~period~300~stat~%27Sum~start~%272019-12-09T21*3a48*3a10Z~end~%272019-12-10T21*3a48*3a10Z%29

TL;DR: The issue has been "resolved" and was very likely due to AWS's underlying infrastructure.
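For anyone repeating the SELinux part of the steps above (permissive, boot, touch /.autorelabel, back to enforcing, reboot), the config change comes down to one key in /etc/selinux/config. Below is a minimal Python sketch of that edit, working on the file's text rather than on the live file. The helper name `set_selinux_mode` and the sample config are illustrative assumptions, not something taken from the AWS ticket.

```python
import re

# The three values the SELINUX= key accepts in /etc/selinux/config.
VALID_MODES = ("enforcing", "permissive", "disabled")


def set_selinux_mode(config_text: str, mode: str) -> str:
    """Return config_text with its SELINUX= line set to `mode`.

    Hypothetical helper: it only rewrites text, it does not touch the
    running system and does not create /.autorelabel.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"unknown SELinux mode: {mode!r}")
    # Match the SELINUX= key only, not SELINUXTYPE= or commented-out lines.
    new_text, count = re.subn(r"(?m)^SELINUX=.*$", f"SELINUX={mode}", config_text)
    if count == 0:
        raise ValueError("no SELINUX= line found")
    return new_text


# Sample content, assumed to resemble a stock config file.
SAMPLE = (
    "# This file controls the state of SELinux on the system.\n"
    "SELINUX=enforcing\n"
    "SELINUXTYPE=targeted\n"
)

permissive = set_selinux_mode(SAMPLE, "permissive")  # step before the test boot
restored = set_selinux_mode(permissive, "enforcing")  # step before the final reboot
print(permissive)
```

Keeping the edit as a pure text function makes it easy to dry-run against a copy of the file before writing anything back on the mounted volume.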
Best Regards,

DINO ARACHCHI
SENIOR CONSULTANT
darachch at redhat.com M: 848-203-1809

On Tue, Dec 10, 2019 at 10:00 AM Dino Arachchi wrote:

> I just responded back to the AWS ticket with the following:
>
> Hi Ed,
> Thanks for your response! Can we please set up a call to discuss troubleshooting this issue? To expedite the process, we'd like to use a screenshare, which we can set up. For additional information, I've provided some results from some of the verification steps you recommended:
> 1. The size of the wtmp files on both hosts does not seem to be overly large. This is expected as both hosts were recently provisioned (within hours of the issue becoming apparent).
> 2. The t2.micro instance is the bastion host (i-0fb3eb5a3efa04e18), which does not show any abnormal results when monitoring CPU and memory resources. It's important to note that there are no issues SSHing from this bastion host to other instances in this VPC (i.e. keycloak_server i-0b383b470c41a0302, which was provisioned at the same time, with the same automation, as the keycloak_db i-038792be35142d7f6 instance that is having this issue). However, the keycloak_db i-038792be35142d7f6 instance was showing some unusual CPU usage last night (84% utilization). It's dropped down this morning to 13%, which is still high, and the issue is still persisting.
> Best Regards, Dino
>
> I'm still troubleshooting from our side, but monitoring the ticket for any further responses. Hopefully they can join a Bluejeans and walk through it with us.
>
> Best Regards,
> Dino
>
> On Tue, Dec 10, 2019 at 9:56 AM Adrian Nunez wrote:
>
>> AWS replied. You can see their reply in the same place where we put in a ticket. They responded last night at 7pm.
>>
>> Here is their response:
>>
>> Hello, My name is Ed and I'll be assisting you with this case today.
>> I understand you have noticed slowness on SSH connections from your bastion host (i-0fb3eb5a3efa04e18) to another EC2 instance (i-038792be35142d7f6). Since you don't have any services running on that EC2 you would like to know if there are any issues at the infrastructure level.
>>
>> A few notes before sharing more details about my investigation:
>> - I noticed you mentioned that the NAT is properly configured. The source and target instances are in the same VPC, so it won't use any NAT.
>> - I agree, the Security Group is opening the required ports for your connection.
>> - I also checked the nACLs and I can confirm that there are no rules on your subnet, so no issues here either.
>>
>> Checking the hosts where your instances are running, I don't see any issues at the infrastructure level. I have checked drops, packet loss and the whole set of host health metrics. From the instance perspective, my visibility is quite low due to the GovCloud restrictions. I have a few recommendations to help you track down the source of this slowness:
>>
>> ** Please perform the validations mentioned below on your Bastion host (i-0fb3eb5a3efa04e18) and destination instance (i-038792be35142d7f6). **
>>
>> 1. Size of wtmp files
>> I have seen some customers experiencing slowness when they are not rotating wtmp files. This file is responsible for keeping track of logins on your server. This file can be found in the following path: /var/log/wtmp
>>
>> If the file is large and it is not being rotated, please consider rotating the file and trying to open your SSH connection.
>>
>> 2. Check CPU and burst balance on CloudWatch
>> Since this is a GovCloud instance, I have no access to or visibility into your CloudWatch metrics.
>> Please check the following metrics on CloudWatch:
>> CPUUtilization
>> CPUCreditBalance
>>
>> Your instance is a T2 instance, which comes with a burstable[1] CPU utilization[2].
>> A t2.micro can sustain up to 10% CPU usage at any time, and whenever it needs more CPU power it is going to use CPU burst credits. Please check the CPUUtilization metric on CloudWatch to see the CPU usage; if it is 10% or more, please also check the CPUCreditBalance metric. If the CPU credit balance is at 0 or it is constantly dropping, you know your instance is doing more work than the instance type can handle.
>>
>> If this is the case, you have two options:
>> 2.1 Check the applications using CPU: As we don't have access to your instance operating system, you might consider checking the performance metrics at the OS level (e.g. top, sar and iostat).
>> 2.2 Resize the instance type: When your instance is constantly using more than its baseline and you can't reduce your workload, you might consider getting a larger instance type.
>>
>> 3. Memory usage, check on the operating system layer
>> Please check the memory usage on the OS layer using commands such as:
>> top
>> free
>> vmstat
>>
>> If your bastion is heavily loaded with connections or if the SSH connections aren't being properly closed, you might see high memory usage.
>>
>> 4. Disk usage
>> Your instances are using GP2 volumes; this kind of volume works with burst credits as well. I would recommend checking the "BurstBalance" metric on CloudWatch[3].
>>
>> 5. Packet capture during a connection attempt
>> ** Please collect the tcpdump only if you have already performed/validated the steps mentioned above. **
>> You could start a tcpdump on the source and target instance so we know how long it takes to open the connection (from the network perspective). Just for your reference, here is the command line to start tcpdump to capture all the traffic from the first network adapter, ignoring host name translation:
>> sudo tcpdump -w /tmp/instance-id-here.pcap -nn
>>
>> Note, the output will be sent to the path specified on the -w flag.
>>
>> Once the tcpdump is running, you could try to SSH and reproduce the issue.
>>
>> The steps above will generate a file on both instances. Please collect the data during an SSH connection.
>>
>> Lastly, more information about the EC2 metrics available[1], how CPU burst works on T2/T3 instances[2] and how to monitor the burst balance for EBS volumes[3] can be found at the following links:
>> [1] https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/viewing_metrics_with_cloudwatch.html
>> [2] https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/burstable-credits-baseline-concepts.html#earning-CPU-credits
>> [3] https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-volume-types.html#monitoring_burstbucket
>>
>> I hope this information is helpful; let us know if you have any other questions. Feel free to give us a call or open a chat with us so we can quickly help you on this.
>>
>> Best regards,
>> Ed F. / Cloud Support / Sydney
>> Amazon Web Services
>>
>> Check out the AWS Support Knowledge Center, a knowledge base of articles and videos that answer customer questions about AWS services: https://aws.amazon.com/premiumsupport/knowledge-center/?icmpid=support_email_category
>> We value your feedback. Please rate my response using the link below.
>> ===================================================
>> To contact us again about this case, please return to the AWS Support Center using the following URL: https://console.amazonaws-us-gov.com/support/cases#/6655545661/en
>> (If you are connecting by federation, log in before following the link.)
>> *Please note: this e-mail was sent from an address that cannot accept incoming e-mail. Please use the link above if you need to contact us again about this same issue.
>> ====================================================================
>> Learn to work with the AWS Cloud.
>> Get started with free online videos and self-paced labs at http://aws.amazon.com/training/
>> ====================================================================
>> Amazon Web Services, Inc. is an affiliate of Amazon.com, Inc. Amazon.com is a registered trademark of Amazon.com, Inc. or its affiliates.
>>
>> ------------------------------
>> *From:* Mark Nissley
>> *Sent:* Tuesday, December 10, 2019 10:45 AM
>> *To:* Adrian Nunez
>> *Cc:* Dean Lystra; Taylor Biggs; platformONE at redhat.com
>> *Subject:* Re: [Platformone] AWS Access Slowdowns
>>
>> [EXTERNAL EMAIL]
>> Team-
>>
>> Any update on this today? It is my thought that we are now outside our SLA with AWS for a response.
>>
>> Mark Nissley
>>
>> On Mon, Dec 9, 2019 at 5:26 PM Adrian Nunez wrote:
>>
>> Ticket opened. Odd that the EC2 is running at 84% utilization and it's empty.
>>
>> V/R
>> Adrian
>>
>> ------------------------------
>> *From:* Dean Lystra
>> *Sent:* Monday, December 9, 2019 6:19:46 PM
>> *To:* Taylor Biggs
>> *Cc:* Adrian Nunez; platformONE at redhat.com
>> *Subject:* Re: [Platformone] AWS Access Slowdowns
>>
>> [EXTERNAL EMAIL]
>> I had Dino change "UseDNS" to no in /etc/ssh/sshd_config, restart the service, then ssh, but it didn't seem to help.
>>
>> On Mon, Dec 9, 2019, 4:24 PM Taylor Biggs wrote:
>>
>> Often I'll see this because SSH likes to do a reverse-DNS lookup on people before they get all the way logged in. So you might have a DNS lookup issue going on (just look at the output of the "w" command).
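The reverse-DNS theory above can also be checked directly on the target host by timing a PTR lookup of the client's address. Below is a rough Python sketch of that check. `time_reverse_lookup` is a made-up helper name, and the optional `resolver` parameter exists only so the function can be exercised without a network; neither comes from the thread.

```python
import socket
import time


def time_reverse_lookup(ip, resolver=socket.gethostbyaddr):
    """Return (hostname_or_None, elapsed_seconds) for a reverse lookup of ip.

    Hypothetical helper: `resolver` defaults to the standard library's
    socket.gethostbyaddr and can be swapped out for testing.
    """
    start = time.monotonic()
    try:
        # gethostbyaddr returns (hostname, aliaslist, ipaddrlist).
        hostname = resolver(ip)[0]
    except OSError:
        # socket.herror and socket.gaierror are both OSError subclasses;
        # a failed lookup is still worth timing.
        hostname = None
    return hostname, time.monotonic() - start
```

Run on the slow instance with the bastion's private address as `ip`; a lookup that takes many seconds, whether it succeeds or fails, would point at DNS, while a fast one would be consistent with the UseDNS change making no difference.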
>>
>> Thanks,
>> Taylor
>>
>> ----
>> Taylor Biggs
>> taylor at redhat.com
>> 850-449-2220
>>
>> On Mon, Dec 9, 2019 at 5:15 PM Dean Lystra wrote:
>>
>> This behavior only occurs when they are not in the same subnet. We attempted connections from another EC2 in 10.10.0.0/20 and they were connecting instantly.
>>
>> On Mon, Dec 9, 2019, 4:13 PM Adrian Nunez wrote:
>>
>> Ok. I'm in the Blue Jeans channel. Slow EC2 SSH is usually from a service running on the EC2.
>>
>> ------------------------------
>> *From:* Dean Lystra
>> *Sent:* Monday, December 9, 2019 5:12:04 PM
>> *To:* Adrian Nunez
>> *Cc:* Mark Nissley; Jonathan Rickard <jrickard at redhat.com>; platformONE at redhat.com
>> *Subject:* Re: [Platformone] AWS Access Slowdowns
>>
>> [EXTERNAL EMAIL]
>> Connections between instances in the staging VPC are slow when they are in different subnets. Any instance in 10.10.0.0/20 hangs for a minute when a ping or SSH is attempted to a server in 10.10.96.0/20.
>>
>> On Mon, Dec 9, 2019, 4:06 PM Adrian Nunez wrote:
>>
>> Call me. What EC2's?
>>
>> ------------------------------
>> *From:* Mark Nissley
>> *Sent:* Monday, December 9, 2019 5:04:15 PM
>> *To:* Jonathan Rickard
>> *Cc:* Adrian Nunez; Goss, Andrew [Semper Valens Solutions (SVS)]; platformONE at redhat.com
>> *Subject:* Re: [Platformone] AWS Access Slowdowns
>>
>> [EXTERNAL EMAIL]
>> Just wanted to add that we are blocked on this right now. We continue to work it, but could use a call from Andrew or Adrian to Dino at 848-203-1809 at your earliest convenience!
>>
>> Mark Nissley
>>
>> On Mon, Dec 9, 2019 at 3:42 PM Jonathan Rickard wrote:
>>
>> Adrian / Andrew,
>>
>> It was reported that there's some significant slowness while sshing into the EC2's - would either of you be able to take a look?
>>
>> Dino Arachchi and Dean Lystra are running point on this.
>>
>> Thanks!
>> jonny
>>
>> Jonathan Rickard, RHCE, RHCA
>> Consulting Architect
>> Red Hat Public Sector
>> jonny at redhat.com
>> M: 210.862.9739
>>
>> _______________________________________________
>> platformONE mailing list
>> platformONE at redhat.com
>> https://www.redhat.com/mailman/listinfo/platformone
>>
>> This communication (including any attachments) may contain information that is proprietary, confidential or exempt from disclosure. If you are not the intended recipient, please note that further dissemination, distribution, use or copying of this communication is strictly prohibited. Anyone who received this message in error should notify the sender immediately by telephone or by return email and delete it from his or her computer.
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From kodonnel at redhat.com Wed Dec 11 03:50:34 2019
From: kodonnel at redhat.com (Kevin O'Donnell)
Date: Tue, 10 Dec 2019 22:50:34 -0500
Subject: [Platformone] Support Assistance
In-Reply-To:
References: <80f65916a25f4870bc4f07590f8d4410@EX13D08UEB003.ant.amazon.com>
Message-ID:

Hello Chris,

I am not sure if you worked anything on your side or not, but together with AWS support we were able to identify that the issue was due to an AWS node, and we were able to resolve the issue.
Thank you for your support. Who should we contact moving forward for escalations on support tickets? I would like to ensure that we are following the correct path.

Thanks,

KEVIN O'DONNELL
ARCHITECT MANAGER
Red Hat NA Public Sector Consulting
kodonnell at redhat.com M: 240-605-4654

On Tue, Dec 10, 2019 at 11:32 AM Kevin O'Donnell wrote:

> levelup-factory. Yes we did get a general response back from an individual from Sydney. But nothing past that.
>
> Thanks,
>
> KEVIN O'DONNELL
>
> On Tue, Dec 10, 2019 at 11:06 AM Kozlowski, Chris wrote:
>
>> Kevin,
>>
>> Can you give me the account # that you submitted the case from?
>>
>> When was the case submitted? Has support reached out to you?
>>
>> Thanks!
>>
>> Chris Kozlowski | Sr. Technical Account Manager
>> AWS Enterprise Support, National Security Programs
>> kozlowck at amazon.com | m: 703.831.5110
>>
>> *From:* Kevin O'Donnell
>> *Sent:* Tuesday, December 10, 2019 10:53 AM
>> *To:* Settle, Rob; Carta, Mike; Kozlowski, Chris; platformONE at redhat.com
>> *Subject:* Support Assistance
>>
>> Hello Team.
>>
>> We are having some performance issues on one of our EC2 instances. We created a case and would like some assistance in case resolution. Please let us know who can assist us.
>>
>> https://console.amazonaws-us-gov.com/support/cases?#/6655545661/en
>>
>> Thanks,
>>
>> KEVIN O'DONNELL
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.jpg
Type: image/jpeg
Size: 2163 bytes
Desc: not available
URL:

From kozlowck at amazon.com Wed Dec 11 16:14:03 2019
From: kozlowck at amazon.com (Kozlowski, Chris)
Date: Wed, 11 Dec 2019 16:14:03 +0000
Subject: [Platformone] Support Assistance
In-Reply-To:
References: <80f65916a25f4870bc4f07590f8d4410@EX13D08UEB003.ant.amazon.com>
Message-ID: <77975aad2ddc48568c912ae0e9009528@EX13D08UEB003.ant.amazon.com>

Kevin,

No problem. I actually didn't need to get involved this time myself; support is generally really responsive. Glad they were able to help you quickly!

Getting timely support first starts with classification of the ticket. I've included a link below with the breakdown of targeted response times for Enterprise Support. These are the response times the support team will target when you submit a case. For Urgent and Critical cases, the times are 1 hour and 15 minutes, respectively. I recommend reserving these for times when you have production or mission-critical systems down. In addition, if you submit a case at this severity level, I will be immediately paged. I recommend using these sparingly, but if you have one of the scenarios above, don't hesitate to do so.

For any other levels, if you need to escalate, feel free to reach out to me directly; that's one of the roles your TAM is for. Thanks!

Chris Kozlowski | Sr. Technical Account Manager
AWS Enterprise Support, National Security Programs
kozlowck at amazon.com | m: 703.831.5110
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.jpg Type: image/jpeg Size: 2163 bytes Desc: image001.jpg URL: From jrickard at redhat.com Wed Dec 11 17:30:02 2019 From: jrickard at redhat.com (Jonathan Rickard) Date: Wed, 11 Dec 2019 11:30:02 -0600 Subject: [Platformone] AWS Group Message-ID: Adrian / Andrew, I know y'all are working on an AWS RBAC solution so I wanted to let you know that I created a role called "aws_readonly" which has the policies below attached to it. I gave it the support case policies so folks could monitor tickets...etc that we may have open. [image: image.png] Let me know if you have any questions/concerns. jonny Jonathan Rickard, RHCE, RHCA Consulting Architect Red Hat Public Sector jonny at redhat.com M: 210.862.9739 @redhatjobs redhatjobs @redhatjobs -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 25049 bytes Desc: not available URL: From adrian.nunez at bylight.com Wed Dec 11 17:33:53 2019 From: adrian.nunez at bylight.com (Adrian Nunez) Date: Wed, 11 Dec 2019 17:33:53 +0000 Subject: [Platformone] AWS Group In-Reply-To: References: Message-ID: Sounds good. Thanks for the heads up. I will add that to the read only for RH in the new RBAC. Get Outlook for Android ________________________________ From: Jonathan Rickard Sent: Wednesday, December 11, 2019 12:30:02 PM To: platformONE at redhat.com ; Adrian Nunez ; andrew.goss at accenturefederal.com Subject: AWS Group [EXTERNAL EMAIL] Adrian / Andrew, I know y'all are working on an AWS RBAC solution so I wanted to let you know that I created a role called "aws_readonly" which has the policies below attached to it. I gave it the support case policies so folks could monitor tickets...etc that we may have open. [image.png] Let me know if you have any questions/concerns. 
jonny Jonathan Rickard, RHCE, RHCA Consulting Architect Red Hat Public Sector jonny at redhat.com M: 210.862.9739 @redhatjobs redhatjobs @redhatjobs [https://static.redhat.com/libs/redhat/brand-assets/2/corp/logo--200.png] This communication (including any attachments) may contain information that is proprietary, confidential or exempt from disclosure. If you are not the intended recipient, please note that further dissemination, distribution, use or copying of this communication is strictly prohibited. Anyone who received this message in error should notify the sender immediately by telephone or by return email and delete it from his or her computer. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 25049 bytes Desc: image.png URL: From kodonnel at redhat.com Wed Dec 11 21:05:47 2019 From: kodonnel at redhat.com (Kevin O'Donnell) Date: Wed, 11 Dec 2019 16:05:47 -0500 Subject: [Platformone] Support Assistance In-Reply-To: <77975aad2ddc48568c912ae0e9009528@EX13D08UEB003.ant.amazon.com> References: <80f65916a25f4870bc4f07590f8d4410@EX13D08UEB003.ant.amazon.com> <77975aad2ddc48568c912ae0e9009528@EX13D08UEB003.ant.amazon.com> Message-ID: Hello Chris, Thank you.. As it turns out we are deploying to a new VPC now and we have encountered the same issue again. This new ec2 landed on the same problem host that we had issues with yesterday. We are working with support to move the ec2 to another host, how can we pull this problem host out of the mix? The support engineer asked if you would reach out to him. https://console.amazonaws-us-gov.com/support/cases?#/6655545661/en Thanks, KEVIN O'DONNELL ARCHITECT MANAGER Red Hat Red Hat NA Public Sector Consulting kodonnell at redhat.com M: 240-605-4654 On Wed, Dec 11, 2019 at 11:14 AM Kozlowski, Chris wrote: > Kevin, > > > > No problem. 
I actually didn't need to get involved this time myself; > support is generally really responsive. Glad they were able to help you > quickly! > > > > Getting timely support first starts with classification of the ticket. > I've included a link below with the breakdown of targeted response times > for Enterprise Support. These are the response times the support team will > target when you submit a case. For Urgent and Critical cases, the times are > 1 hour and 15 minutes, respectively. I recommend reserving these for times > when you have production or mission-critical systems down. In addition, if > you submit a case at this severity level, I will be immediately paged. I > recommend using these sparingly, but if you have one of the scenarios > above, don't hesitate to do so. > > > > For any other levels, if you need to escalate, feel free to reach out to > me directly; that's one of the roles your TAM is for. Thanks! > > > > *[image: cid:image001.jpg at 01D52B59.2C1253E0]* > > *Chris **Kozlowski* | Sr. Technical Account Manager > > AWS Enterprise Support, National Security Programs > > kozlowck at amazon.com | m: 703.831.5110 > > > > Thoughts on our interaction? Provide feedback here > . > > > > *From:* Kevin O'Donnell > *Sent:* Tuesday, December 10, 2019 10:51 PM > *To:* Kozlowski, Chris > *Cc:* Settle, Rob; Carta, Mike; > platformONE at redhat.com > *Subject:* Re: Support Assistance > > > > Hello Chris, > > > > I am not sure if you worked anything on your side or not. But we with AWS > support were able to identify that that issue was due to an AWS node and we > were able to resolve the issue. Thank you for your support. Who should we > contact moving forward for escalations on support tickets? I would like to > ensure that we are following the correct path.
> > > > Thanks, > > > *KEVIN O'DONNELL * > > ARCHITECT MANAGER > > Red Hat Red Hat NA Public Sector Consulting > > kodonnell at redhat.com M: > 240-605-4654 > > > > > > > > On Tue, Dec 10, 2019 at 11:32 AM Kevin O'Donnell > wrote: > > levelup-factory. Yes we did get a general response back from an individual > from Sydney. But nothing past that. > > > > Thanks, > > > *KEVIN O'DONNELL * > > ARCHITECT MANAGER > > Red Hat Red Hat NA Public Sector Consulting > > kodonnell at redhat.com M: > 240-605-4654 > > > > > > > > On Tue, Dec 10, 2019 at 11:06 AM Kozlowski, Chris > wrote: > > Kevin, > > > > Can you give me the account # that you submitted the case from? > > > > When was the case submitted? Has support reached out to you? > > > > Thanks! > > > > *[image: cid:image001.jpg at 01D52B59.2C1253E0]* > > *Chris **Kozlowski* | Sr. Technical Account Manager > > AWS Enterprise Support, National Security Programs > > kozlowck at amazon.com | m: 703.831.5110 > > > > Thoughts on our interaction? Provide feedback here > . > > > > *From:* Kevin O'Donnell > *Sent:* Tuesday, December 10, 2019 10:53 AM > *To:* Settle, Rob ; Carta, Mike ; > Kozlowski, Chris ; platformONE at redhat.com > *Subject:* Support Assistance > > > > Hello Team. > > > > We are having some performance issues on one of our EC2 instances. We > created a case and would like some assistance in case resolution. Please > let us know who can assist us. > > > > https://console.amazonaws-us-gov.com/support/cases?#/6655545661/en > > > > Thanks, > > > *KEVIN O'DONNELL * > > ARCHITECT MANAGER > > Red Hat Red Hat NA Public Sector Consulting > > kodonnell at redhat.com M: > 240-605-4654 > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: image001.jpg Type: image/jpeg Size: 2163 bytes Desc: not available URL: From kodonnel at redhat.com Wed Dec 11 21:52:33 2019 From: kodonnel at redhat.com (Kevin O'Donnell) Date: Wed, 11 Dec 2019 16:52:33 -0500 Subject: [Platformone] Support Assistance In-Reply-To: References: <80f65916a25f4870bc4f07590f8d4410@EX13D08UEB003.ant.amazon.com> <77975aad2ddc48568c912ae0e9009528@EX13D08UEB003.ant.amazon.com> Message-ID: After speaking with my engineer, I have some concerns about how long this took to get a resolution from AWS. " We were able to resolve the case, but this took approximately 3 days of troubleshooting before they decided to pull the physical host. From working with your engineer I understood that he was only able to look at GovCloud metrics and not actually able to do any investigative work without using my screen. I also have a concern that when we submitted the ticket from the support center it is picked up by the next available tech from anywhere in the world and not a US-based one specifically for DoD. That technician was able to read any information I put in there, which is a bit concerning considering what systems we could possibly be calling in issues about. " Please let me know how we can improve this process. KEVIN O'DONNELL ARCHITECT MANAGER Red Hat Red Hat NA Public Sector Consulting kodonnell at redhat.com M: 240-605-4654 On Wed, Dec 11, 2019 at 4:05 PM Kevin O'Donnell wrote: > Hello Chris, > > Thank you.. As it turns out we are deploying to a new VPC now and we have > encountered the same issue again. This new ec2 landed on the same problem > host that we had issues with yesterday. We are working with support to move > the ec2 to another host, how can we pull this problem host out of the mix? > > The support engineer asked if you would reach out to him.
>> >> >> >> https://console.amazonaws-us-gov.com/support/cases?#/6655545661/en >> >> >> >> Thanks, >> >> >> *KEVIN O'DONNELL * >> >> ARCHITECT MANAGER >> >> Red Hat Red Hat NA Public Sector Consulting >> >> kodonnell at redhat.com M: >> 240-605-4654 >> >> >> >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.jpg Type: image/jpeg Size: 2163 bytes Desc: not available URL: From jhultz at redhat.com Wed Dec 11 22:01:40 2019 From: jhultz at redhat.com (Jonathan Hultz) Date: Wed, 11 Dec 2019 16:01:40 -0600 Subject: [Platformone] Fwd: [CASE 6655545661] SSH is Slow to Connect In-Reply-To: <0100016ef6f39db7-32c4b50d-094e-42c5-835e-e3a91aebd333-000000@email.amazonses.com> References: <0100016ef6f39db7-32c4b50d-094e-42c5-835e-e3a91aebd333-000000@email.amazonses.com> Message-ID: ---------- Forwarded message --------- From: no-reply-aws at amazon.com Date: Wed, Dec 11, 2019 at 3:53 PM Subject: RE:[CASE 6655545661] SSH is Slow to Connect To: Cc: Hello Jon, Thank you for contacting AWS Premium Support today. It was a pleasure working with you. During our call we worked through the issues which appeared to be on the underlying host. I could confirm the 3 launches of i-038792be35142d7f6, i-0293e105a4d4ec791, and i-0bf991fc8b83bba3d were launched on the same underlying host. I have reached out to our internal team and the host has been isolated from new launches. I have also reached out to your TAM Chris Kozlowski for situational awareness on the issue. The new launches when launched on another host did not appear to exhibit the same CPU Spiking/Instance level health check failures. I will keep this case locked to myself as an ITAR Employee, if there are issues related to this case please update the case via an email with a good number to reach out at and I will attempt to give a call back as soon as possible. 
I normally work Tuesday-Saturday 10amCST-6pm CST. If immediate attention is needed outside of my availability, please feel free to initiate a call/chat and an available engineer will assist as well. Let us know if there are any further questions or concerns. Have a Great Day Ahead Your feedback is important to us, please share your experience by rating this response. You will find a link to the AWS Support Center at the end of this correspondence to rate us. Best regards, Wayne S. Amazon Web Services Check out the AWS Support Knowledge Center, a knowledge base of articles and videos that answer customer questions about AWS services: https://aws.amazon.com/premiumsupport/knowledge-center/?icmpid=support_email_category We value your feedback. Please rate my response using the link below. =================================================== To contact us again about this case, please return to the AWS Support Center using the following URL: https://console.amazonaws-us-gov.com/support/cases#/6655545661/en (If you are connecting by federation, log in before following the link.) *Please note: this e-mail was sent from an address that cannot accept incoming e-mail. Please use the link above if you need to contact us again about this same issue. ==================================================================== Learn to work with the AWS Cloud. Get started with free online videos and self-paced labs at http://aws.amazon.com/training/ ==================================================================== Amazon Web Services, Inc. is an affiliate of Amazon.com, Inc. Amazon.com is a registered trademark of Amazon.com, Inc. or its affiliates. -- JONATHAN HULTZ, RHCSA SENIOR CONSULTANT Red Hat Remote US CA jhultz at redhat.com M: 609-713-9778 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From kodonnel at redhat.com Wed Dec 11 22:11:13 2019 From: kodonnel at redhat.com (Kevin O'Donnell) Date: Wed, 11 Dec 2019 17:11:13 -0500 Subject: [Platformone] Plan Message-ID:
Hello All,

I wanted to send this out to get everyone on the same page on our plan. Currently, staging has been deployed and we are completing the last steps to fully deploy dev. Starting now we will be executing the below plan.

Today
- STIG staging via CI
- Test, Sat, IdM, KeyCloak, OCP, and apps on OCP
- Identify any application issues and modify STIG
- Scan and report
- Destroy staging

Tonight
- Destroy dev
- Rebuild dev
- promote to staging to rebuild with the current codebase

Tomorrow
- Run the STIG in dev, retest the applications
- Promote STIG to dev
- Promote STIG to up-prod
- Scan and report on up-prod

Thanks, KEVIN O'DONNELL ARCHITECT MANAGER Red Hat Red Hat NA Public Sector Consulting kodonnell at redhat.com M: 240-605-4654 -------------- next part -------------- An HTML attachment was scrubbed... URL: From mnissley at redhat.com Wed Dec 11 22:44:18 2019 From: mnissley at redhat.com (Mark Nissley) Date: Wed, 11 Dec 2019 16:44:18 -0600 Subject: [Platformone] Plan In-Reply-To: References: Message-ID: All - For your awareness. I will load all of these as Issues in the Git Lab IaC group. Please use those tickets to log all obstacles and successes. Please use these Issue tickets to record lessons learned. This is very important to our future success and that of other programs using Platform One. This work is the highest priority. I may move other "Doing" Issues back to To Do Mark NISSLEY, PMP, CSM, LEAN PROGRAM MaNAGER & SR technical Project Manager North American Consulting, Public Sector M: 850-530-3234 *Scheduled PTO: Dec 23 - Jan 03* On Wed, Dec 11, 2019 at 4:17 PM Kevin O'Donnell wrote: > Hello All, > > I wanted to send this out to get everyone on the same page on our plan. > Currently, staging has been deployed and we are completing the last steps > to fully deploy dev.
Starting now we will be executing the below plan. > > Today > - STIG staging via CI > - Test, Sat, IdM, KeyCloak, OCP, and apps on OCP > - Identify any application issues and modify STIG > - Scan and report > - Destroy staging > Tonight > - Destroy dev > - Rebuild dev > - promote to staging to rebuild with the current codebase > Tomorrow > - Run the STIG in dev, retest the applications > - Promote STIG to dev > - Promote STIG to up-prod > - Scan and report on up-prod > > Thanks, > > KEVIN O'DONNELL > > ARCHITECT MANAGER > > Red Hat Red Hat NA Public Sector Consulting > > kodonnell at redhat.com M: > 240-605-4654 > > _______________________________________________ > platformONE mailing list > platformONE at redhat.com > https://www.redhat.com/mailman/listinfo/platformone > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tmiller at mitre.org Thu Dec 12 13:28:54 2019 From: tmiller at mitre.org (Miller, Timothy J.) Date: Thu, 12 Dec 2019 13:28:54 +0000 Subject: [Platformone] [External] Who's managing Keycloak for dcar.dsop.io? In-Reply-To: References: <723947A8-4F32-47DD-9593-45E8749B4C7C@mitre.org> Message-ID: <273FA713-A7E8-4506-9D96-997C8DA5D699@mitre.org> Donkey shorts. -- T ?On 12/10/19, 14:00, "Taylor Biggs" wrote: Hi Tim, Andrew, Currently the "helpdesk" is me (is it making sense as to why I keep banging the Day-2 ops drum?)! I've deleted both of your accounts on the DCAR, since it is self-registration and you can just create a new one now. Thanks, Taylor ---- Taylor Biggs taylor at redhat.com 850-449-2220 On Tue, Dec 10, 2019 at 9:22 AM Goss, Andrew [Semper Valens Solutions (SVS)] wrote: My MFA stopped working last week. I just have not a pressing need to reset it yet. -----Original Message----- From: platformone-bounces at redhat.com On Behalf Of Miller, Timothy J. Sent: Tuesday, December 10, 2019 7:36 AM To: platformONE at redhat.com Subject: [External] [Platformone] Who's managing Keycloak for dcar.dsop.io ? 
This message is from an EXTERNAL SENDER - be CAUTIOUS of links and attachments. THINK BEFORE YOU CLICK. ________________________________ Need to reset my MFA (new phone) & there's no self-service I can see. This is probably going to come up again for others. :) -- T _______________________________________________ platformONE mailing list platformONE at redhat.com https://www.redhat.com/mailman/listinfo/platformone From kozlowck at amazon.com Thu Dec 12 20:25:59 2019 From: kozlowck at amazon.com (Kozlowski, Chris) Date: Thu, 12 Dec 2019 20:25:59 +0000 Subject: [Platformone] Support Assistance In-Reply-To: References: <80f65916a25f4870bc4f07590f8d4410@EX13D08UEB003.ant.amazon.com> <77975aad2ddc48568c912ae0e9009528@EX13D08UEB003.ant.amazon.com> Message-ID: <9d5b756966814451a7132966ac16db26@EX13D08UEB003.ant.amazon.com>
Kevin,

I'm sorry to hear that you and your team felt the case took too long to resolve.

I've reviewed the case with an eye towards seeing what we could have done better. According to the case notes, I've put together a general timeline of what took place.

- The case was opened the evening of the 9th.
  - "In our "staging-up-vpc", the SSH from the EC2 "i-0fb3eb5a3efa04e18" (bastion-dino) is very slow to another EC2 "i-038792be35142d7f6"."
- AWS Support replied an hour later, examining the connection between the two servers. Support did not see any immediate issues at the infrastructure level. Ed, the support technician, noted that he had limited visibility due to the system being on GovCloud, but he put together a list of troubleshooting steps for your engineer to look at on the instance. AWS engineers have no visibility into your buckets, volumes, or instances by design, for security.
- Three hours after the above message, it was clarified in an email that there was actually no issue with SSH between instances, only with SSH to the bastion host itself. A call was requested at this time, and your engineer stated that they'd like to use a screen share.
- Minutes later, AWS reached out to get an ITAR-compliant technician (a US citizen) to work the case further; per GovCloud requirements, only ITAR-compliant engineers can participate in such sessions.
- A day later (Dec 10th), Wayne (a US citizen) took ownership of the case and started a series of remote sessions with your engineer, going over the SSH configuration.
- Later on the 10th, Wayne worked with your engineer to look deeper at the instance, and at some point it was discovered that some of the instance checks were failing even though the host looked healthy. After reviewing the other configuration items, Wayne flagged the host for review by the service team. The bastion seemed to be working fine at this point, as it had launched on a different host.
- On the 11th, your engineer reported a similar issue on a different instance. A non-ITAR engineer fielded the original call, then handed off to Wayne, who took on the case again from there. He identified three other instances already running on the host flagged for examination later. The instances were relaunched on another host and the original host was prevented from accepting new launches.

My takeaways from the case are as follows:

1. Support originally regarded the case as a networking issue, as it originally seemed to be an intra-EC2 communication issue. Once this was clarified to not be the case, it was examined as a single-instance issue.
2. It should be understood that cases for GovCloud may be initially responded to by non-US persons, but will be routed to US persons where ITAR compliance is needed. In this case, the case was routed to a US person the moment your engineers asked for a screen share. As there was no visibility into your data prior to that point (a screen share is treated as potentially exposing us to your data), that's where the handover occurred. You can read more about how GovCloud Support works here: https://docs.aws.amazon.com/govcloud-us/latest/UserGuide/customer-supp.html
3. It is strongly recommended you not place sensitive data into a ticket. That did not occur in this case.

I think that, given the context, the sequence of events led to a resolution in a reasonable time, but we're always striving to do better. I'll talk to the technicians on the case about whether this could have been spotted sooner, but issues can arise anywhere, and it's important we look at all avenues.

Let me know what you think, and feel free to reach out to me further with any questions or concerns.

Thanks!

[cid:image001.jpg at 01D52B59.2C1253E0] Chris Kozlowski | Sr. Technical Account Manager AWS Enterprise Support, National Security Programs kozlowck at amazon.com | m: 703.831.5110 Thoughts on our interaction? Provide feedback here. From: Kevin O'Donnell Sent: Wednesday, December 11, 2019 4:53 PM To: Kozlowski, Chris Cc: Settle, Rob; Carta, Mike; platformONE at redhat.com Subject: Re: Support Assistance After speaking with my engineer, I have some concerns about how long this took to get a resolution from AWS. " We were able to resolve the case, but this took approximately 3 days of troubleshooting before they decided to pull the physical host. From working with your engineer I understood that he was only able to look at GovCloud metrics and not actually able to do any investigative work without using my screen.
I also have a concern that when we submitted the ticket from the support center it is picked up by the next available tech from anywhere in the world and not a US-based one specifically for DoD. That technician was able to read any information I put in there, which is a bit concerning considering what systems we could possibly be calling in issues about. " Please let me know how we can improve this process. KEVIN O'DONNELL ARCHITECT MANAGER Red Hat Red Hat NA Public Sector Consulting kodonnell at redhat.com M: 240-605-4654 [https://static.redhat.com/libs/redhat/brand-assets/latest/corp/logo.png] On Wed, Dec 11, 2019 at 4:05 PM Kevin O'Donnell wrote: Hello Chris, Thank you.. As it turns out we are deploying to a new VPC now and we have encountered the same issue again. This new ec2 landed on the same problem host that we had issues with yesterday. We are working with support to move the ec2 to another host, how can we pull this problem host out of the mix? The support engineer asked if you would reach out to him. https://console.amazonaws-us-gov.com/support/cases?#/6655545661/en Thanks, KEVIN O'DONNELL ARCHITECT MANAGER Red Hat Red Hat NA Public Sector Consulting kodonnell at redhat.com M: 240-605-4654 [https://static.redhat.com/libs/redhat/brand-assets/latest/corp/logo.png] On Wed, Dec 11, 2019 at 11:14 AM Kozlowski, Chris wrote: Kevin, No problem. I actually didn't need to get involved this time myself; support is generally really responsive. Glad they were able to help you quickly! Getting timely support first starts with classification of the ticket. I've included a link below with the breakdown of targeted response times for Enterprise Support. These are the response times the support team will target when you submit a case. For Urgent and Critical cases, the times are 1 hour and 15 minutes, respectively. I recommend reserving these for times when you have production or mission-critical systems down.
In addition, if you submit a case at this severity level, I will be immediately paged. I recommend using these sparingly, but if you have one of the scenarios above, don't hesitate to do so. For any other levels, if you need to escalate, feel free to reach out to me directly; that's one of the roles your TAM is for. Thanks! [cid:image001.jpg at 01D52B59.2C1253E0] Chris Kozlowski | Sr. Technical Account Manager AWS Enterprise Support, National Security Programs kozlowck at amazon.com | m: 703.831.5110 Thoughts on our interaction? Provide feedback here. From: Kevin O'Donnell Sent: Tuesday, December 10, 2019 10:51 PM To: Kozlowski, Chris Cc: Settle, Rob; Carta, Mike; platformONE at redhat.com Subject: Re: Support Assistance Hello Chris, I am not sure if you worked anything on your side or not. But we with AWS support were able to identify that that issue was due to an AWS node and we were able to resolve the issue. Thank you for your support. Who should we contact moving forward for escalations on support tickets? I would like to ensure that we are following the correct path. Thanks, KEVIN O'DONNELL ARCHITECT MANAGER Red Hat Red Hat NA Public Sector Consulting kodonnell at redhat.com M: 240-605-4654 [https://static.redhat.com/libs/redhat/brand-assets/latest/corp/logo.png] On Tue, Dec 10, 2019 at 11:32 AM Kevin O'Donnell wrote: levelup-factory. Yes we did get a general response back from an individual from Sydney. But nothing past that. Thanks, KEVIN O'DONNELL ARCHITECT MANAGER Red Hat Red Hat NA Public Sector Consulting kodonnell at redhat.com M: 240-605-4654 [https://static.redhat.com/libs/redhat/brand-assets/latest/corp/logo.png] On Tue, Dec 10, 2019 at 11:06 AM Kozlowski, Chris wrote: Kevin, Can you give me the account # that you submitted the case from? When was the case submitted? Has support reached out to you? Thanks! [cid:image001.jpg at 01D52B59.2C1253E0] Chris Kozlowski | Sr.
Technical Account Manager AWS Enterprise Support, National Security Programs kozlowck at amazon.com | m: 703.831.5110 Thoughts on our interaction? Provide feedback here. From: Kevin O'Donnell > Sent: Tuesday, December 10, 2019 10:53 AM To: Settle, Rob >; Carta, Mike >; Kozlowski, Chris >; platformONE at redhat.com Subject: Support Assistance Hello Team. We are having some performance issues on one of our EC2 instances. We created a case and would like some assistance in case resolution. Please let us know who can assist us. https://console.amazonaws-us-gov.com/support/cases?#/6655545661/en Thanks, KEVIN O'DONNELL ARCHITECT MANAGER Red Hat Red Hat NA Public Sector Consulting kodonnell at redhat.com M: 240-605-4654 [https://static.redhat.com/libs/redhat/brand-assets/latest/corp/logo.png] -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.jpg Type: image/jpeg Size: 2163 bytes Desc: image001.jpg URL: From ademola.abodunrin at us.af.mil Thu Dec 12 23:10:50 2019 From: ademola.abodunrin at us.af.mil (ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP) Date: Thu, 12 Dec 2019 23:10:50 +0000 Subject: [Platformone] Unified Platform Cluster is Down.....CCAT Team Not Able To Deploy Message-ID: Hello Team, Please note that our cluster is down. Issue: 0/9 nodes are available: 3 node(s) had taints that the pod didn't tolerate, 6 node(s) didn't match node selector. We also logged a gitlab ticket: https://dccscr.dsop.io/dsop/dccscr/issues/237 Most Sincerely, Ade Abodunrin, GG-12, USAF Product Owner (Cybertron & Ginyu Force), Unified Platform LevelUP Code Works Commercial: (210) 890-2113 NIPR email: ademola.abodunrin at us.af.mil -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: image001.png Type: image/png Size: 2127 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 5490 bytes Desc: not available URL: From kodonnel at redhat.com Fri Dec 13 01:16:38 2019 From: kodonnel at redhat.com (Kevin O'Donnell) Date: Thu, 12 Dec 2019 20:16:38 -0500 Subject: [Platformone] Unified Platform Cluster is Down.....CCAT Team Not Able To Deploy In-Reply-To: References: Message-ID: FYI, Found several EBS volumes (including one that is causing a failed pod) that are stuck in the "attaching" state. Created an AWS support ticket marked as "Severity: Production system down" to get them involved as well: https://console.amazonaws-us-gov.com/support/cases#/6666325471/en KEVIN O'DONNELL ARCHITECT MANAGER Red Hat Red Hat NA Public Sector Consulting kodonnell at redhat.com M: 240-605-4654 On Thu, Dec 12, 2019 at 6:11 PM ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP wrote: > Hello Team, > > > > Please note that our cluster is down. Issue: 0/9 nodes are available: 3 > node(s) had taints that the pod didn't tolerate, 6 node(s) didn't match > node selector. > > We also logged a gitlab ticket: > https://dccscr.dsop.io/dsop/dccscr/issues/237 > > > > Most Sincerely, > > > > Ade Abodunrin, GG-12, USAF > > Product Owner (Cybertron & Ginyu Force), Unified Platform > > > > [image: cid:image001.png at 01D4F814.4AA552D0] > > LevelUP Code Works > > Commercial: (210) 890-2113 > > NIPR email: *ademola.abodunrin at us.af.mil * > > > _______________________________________________ > platformONE mailing list > platformONE at redhat.com > https://www.redhat.com/mailman/listinfo/platformone > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: image001.png Type: image/png Size: 2127 bytes Desc: not available URL: From kodonnel at redhat.com Fri Dec 13 01:30:17 2019 From: kodonnel at redhat.com (Kevin O'Donnell) Date: Thu, 12 Dec 2019 20:30:17 -0500 Subject: [Platformone] Support Assistance In-Reply-To: <9d5b756966814451a7132966ac16db26@EX13D08UEB003.ant.amazon.com> References: <80f65916a25f4870bc4f07590f8d4410@EX13D08UEB003.ant.amazon.com> <77975aad2ddc48568c912ae0e9009528@EX13D08UEB003.ant.amazon.com> <9d5b756966814451a7132966ac16db26@EX13D08UEB003.ant.amazon.com> Message-ID: Chris, Thank you for your response and follow-up. Your detail is greatly appreciated. Since the host issues were resolved, our deployments have not faced the same issues. I appreciate your team's dedication to addressing and resolving these issues. This is the second time that our team has had host-related issues. How can we, as a combined team supporting our DoD customers, resolve and/or remediate potential issues? Should we follow a dedicated path and/or a specific comms route? I just want to make sure we are leveraging all the paths forward. Once again, thank you for your time and support. Thanks, KEVIN O'DONNELL ARCHITECT MANAGER Red Hat Red Hat NA Public Sector Consulting kodonnell at redhat.com M: 240-605-4654 On Thu, Dec 12, 2019 at 3:26 PM Kozlowski, Chris wrote: > Kevin, > > > > I'm sorry to hear that you and your team felt the case took too long to > resolve. > > > > I've reviewed the case with an eye towards seeing what we could have done > better. According to the case notes, I've put together a general timeline > of what took place. > > > > - The case was opened the evening of the 9th. > > o "In our "staging-up-vpc", the SSH from the EC2 "i-0fb3eb5a3efa04e18" (bastion-dino) is very slow to another EC2 "i-038792be35142d7f6"." > > - AWS Support replied an hour later, examining the connection > between the two servers. Support did not see any immediate issues at the > infrastructure level.
Ed, the support technician, noted that he had limited > visibility due to the system being on GovCloud. But he put together a list > of troubleshooting steps for your engineer to look at on the instance. AWS > engineers have no visibility into your buckets, volumes, or instances by > design, for security. > > - Three hours after the above message, it's clarified in an email > that there's actually no issue with SSH between instances, but just to the > bastion host itself. A call was requested at this time, and your engineer > states that they'd like to use a screen share. > > - Minutes later, AWS reached out to get an ITAR-compliant > technician to work further (US citizen); per GovCloud requirements, only > ITAR-compliant engineers can participate in such sessions. > > - A day later (Dec 10th), Wayne (US citizen) takes ownership of > the case, and starts a series of remote sessions with your engineer, going > over the SSH configuration. > > - Later on the 10th, Wayne worked with your engineer to look > deeper at the instance, and at some point it was discovered that some of > the instance checks were failing even though the host looked healthy. After > reviewing the other configuration items, Wayne flagged the host for review > by the service team. The bastion seemed to be working fine at this point as > it launched on a different host. > > - On the 11th, your engineer reported a similar issue on a > different instance. A non-ITAR engineer fielded the original call, then > handed off to Wayne who took on the case again from there. He identified > three other instances already running on the host flagged for examination > later. The instances were relaunched on another host and the original host > prevented from new launches. > > > > My takeaways from the case are as follows: > > > > 1. Support originally regarded the case as a networking issue as it > originally seemed to be an intra-EC2 communication issue.
Once this was > clarified to not be the case, it was examined as a single instance issue. > > 2. It should be understood that cases for GovCloud may be initially > responded to by non-US persons, but will be routed to US Persons where ITAR > compliance is needed. In this case, the case was routed to a US person the > moment your engineers asked for a screen share. As there was no visibility > into your data prior to that point (viewing a screen share is viewed as > potentially exposing us to your data), that's where the handover occurred. > You can read more about how GovCloud Support works here: > https://docs.aws.amazon.com/govcloud-us/latest/UserGuide/customer-supp.html > > 3. It is strongly recommended you not place sensitive data into a > ticket. That did not occur in this case. > > > > I thought the sequence of events, in the given context, led to a > resolution in a reasonable time, but we're always striving to do better. I'll > talk to the technicians on the case about whether this could have been > spotted sooner, but issues can arise anywhere, and it's important we look > at all avenues. > > > > Let me know what you think, and feel free to reach out to me further with > any questions or concerns. > > > > Thanks! > > > > > > > > *[image: cid:image001.jpg at 01D52B59.2C1253E0]* > > *Chris **Kozlowski* | Sr. Technical Account Manager > > AWS Enterprise Support, National Security Programs > > kozlowck at amazon.com | m: 703.831.5110 > > > > Thoughts on our interaction? Provide feedback here > . > > > > *From:* Kevin O'Donnell > *Sent:* Wednesday, December 11, 2019 4:53 PM > *To:* Kozlowski, Chris > *Cc:* Settle, Rob ; Carta, Mike ; > platformONE at redhat.com > *Subject:* Re: Support Assistance > > > > After speaking with my engineer I have some concerns about how long this > took to get a resolution from AWS. > > > > " > > We were able to resolve the case; this took approximately 3 days of > troubleshooting before they decided to pull the physical host.
From working > with your engineer I understood that he was only able to look at GovCloud > metrics and not actually able to do any investigative work without using my > screen. I also have a concern that when we submitted the ticket from the > support center it is picked up by the next available tech from anywhere in > the world and not a US-based one specifically for DoD. That technician was > able to read any information I put in there, which is a bit concerning > considering what systems we could possibly be calling in issues about. > > " > > > > Please let me know how we can improve this process. > > > > *KEVIN O'DONNELL * > > ARCHITECT MANAGER > > Red Hat Red Hat NA Public Sector Consulting > > kodonnell at redhat.com M: > 240-605-4654 > > > > > > > > On Wed, Dec 11, 2019 at 4:05 PM Kevin O'Donnell > wrote: > > Hello Chris, > > > > Thank you. As it turns out we are deploying to a new VPC now and we have > encountered the same issue again. This new EC2 landed on the same problem > host that we had issues with yesterday. We are working with support to move > the EC2 to another host. How can we pull this problem host out of the mix? > > > > The support engineer asked if you would reach out to him. > > > > https://console.amazonaws-us-gov.com/support/cases?#/6655545661/en > > > > > > Thanks, > > > *KEVIN O'DONNELL * > > ARCHITECT MANAGER > > Red Hat Red Hat NA Public Sector Consulting > > kodonnell at redhat.com M: > 240-605-4654 > > > > > > > > On Wed, Dec 11, 2019 at 11:14 AM Kozlowski, Chris > wrote: > > Kevin, > > > > No problem. I actually didn't need to get involved this time myself; > support is generally really responsive. Glad they were able to help you > quickly! > > > > Getting timely support first starts with classification of the ticket. > I've included a link below with the breakdown of targeted response times > for Enterprise Support. These are the response times the support team will > target when you submit a case.
For Urgent and Critical cases, the times are > 1 hour and 15 minutes, respectively. I recommend reserving these for times > when you have production or mission-critical systems down. In addition, if > you submit a case at this severity level, I will be immediately paged. I > recommend using these sparingly, but if you have one of the scenarios > above, don't hesitate to do so. > > > > For any other levels, if you need to escalate, feel free to reach out to > me directly; that's one of the roles your TAM is for. Thanks! > > > > *[image: cid:image001.jpg at 01D52B59.2C1253E0]* > > *Chris **Kozlowski* | Sr. Technical Account Manager > > AWS Enterprise Support, National Security Programs > > kozlowck at amazon.com | m: 703.831.5110 > > > > Thoughts on our interaction? Provide feedback here > . > > > > *From:* Kevin O'Donnell > *Sent:* Tuesday, December 10, 2019 10:51 PM > *To:* Kozlowski, Chris > *Cc:* Settle, Rob ; Carta, Mike ; > platformONE at redhat.com > *Subject:* Re: Support Assistance > > > > Hello Chris, > > > > I am not sure if you worked anything on your side or not. But we, with AWS > support, were able to identify that the issue was due to an AWS node and we > were able to resolve the issue. Thank you for your support. Who should we > contact moving forward for escalations on support tickets? I would like to > ensure that we are following the correct path. > > > > Thanks, > > > *KEVIN O'DONNELL * > > ARCHITECT MANAGER > > Red Hat Red Hat NA Public Sector Consulting > > kodonnell at redhat.com M: > 240-605-4654 > > > > > > > > On Tue, Dec 10, 2019 at 11:32 AM Kevin O'Donnell > wrote: > > levelup-factory. Yes we did get a general response back from an individual > from Sydney. But nothing past that.
> > > > Thanks, > > > *KEVIN O'DONNELL * > > ARCHITECT MANAGER > > Red Hat Red Hat NA Public Sector Consulting > > kodonnell at redhat.com M: > 240-605-4654 > > > > > > > > On Tue, Dec 10, 2019 at 11:06 AM Kozlowski, Chris > wrote: > > Kevin, > > > > Can you give me the account # that you submitted the case from? > > > > When was the case submitted? Has support reached out to you? > > > > Thanks! > > > > *[image: cid:image001.jpg at 01D52B59.2C1253E0]* > > *Chris **Kozlowski* | Sr. Technical Account Manager > > AWS Enterprise Support, National Security Programs > > kozlowck at amazon.com | m: 703.831.5110 > > > > Thoughts on our interaction? Provide feedback here > . > > > > *From:* Kevin O'Donnell > *Sent:* Tuesday, December 10, 2019 10:53 AM > *To:* Settle, Rob ; Carta, Mike ; > Kozlowski, Chris ; platformONE at redhat.com > *Subject:* Support Assistance > > > > Hello Team. > > > > We are having some performance issues on one of our EC2 instances. We > created a case and would like some assistance in case resolution. Please > let us know who can assist us. > > > > https://console.amazonaws-us-gov.com/support/cases?#/6655545661/en > > > > Thanks, > > > *KEVIN O'DONNELL * > > ARCHITECT MANAGER > > Red Hat Red Hat NA Public Sector Consulting > > kodonnell at redhat.com M: > 240-605-4654 > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.jpg Type: image/jpeg Size: 2163 bytes Desc: not available URL: From taylor at redhat.com Fri Dec 13 04:23:54 2019 From: taylor at redhat.com (Taylor Biggs) Date: Thu, 12 Dec 2019 23:23:54 -0500 Subject: [Platformone] Unified Platform Cluster is Down.....CCAT Team Not Able To Deploy In-Reply-To: References: Message-ID: Update for this thread: This is due to the fact that the AWS instances we're using only allow for 28 attachments (EBS + NICs, etc).
We're hitting that limit which is preventing additional pods from starting that have storage attached. More to come from people smarter than I on the topic! Thanks, Taylor ---- Taylor Biggs taylor at redhat.com 850-449-2220 On Thu, Dec 12, 2019 at 8:17 PM Kevin O'Donnell wrote: > FYI, > > Found several EBS volumes (including one that is causing a failed pod) > that are stuck in the "attaching" state. > > Created an AWS support ticket marked as "Severity: Production system down" > to get them involved as well: > https://console.amazonaws-us-gov.com/support/cases#/6666325471/en > > > KEVIN O'DONNELL > > ARCHITECT MANAGER > > Red Hat Red Hat NA Public Sector Consulting > > kodonnell at redhat.com M: > 240-605-4654 > > > > On Thu, Dec 12, 2019 at 6:11 PM ABODUNRIN, ADEMOLA A GG-12 USAF AFMC > AFLCMC/HNCP wrote: > >> Hello Team, >> >> >> >> Please note that our cluster is down. Issue: 0/9 nodes are available: 3 >> node(s) had taints that the pod didn't tolerate, 6 node(s) didn't match >> node selector. >> >> We also logged a gitlab ticket: >> https://dccscr.dsop.io/dsop/dccscr/issues/237 >> >> >> >> Most Sincerely, >> >> >> >> Ade Abodunrin, GG-12, USAF >> >> Product Owner (Cybertron & Ginyu Force), Unified Platform >> >> >> >> [image: cid:image001.png at 01D4F814.4AA552D0] >> >> LevelUP Code Works >> >> Commercial: (210) 890-2113 >> >> NIPR email: *ademola.abodunrin at us.af.mil * >> >> >> _______________________________________________ >> platformONE mailing list >> platformONE at redhat.com >> https://www.redhat.com/mailman/listinfo/platformone >> > _______________________________________________ > platformONE mailing list > platformONE at redhat.com > https://www.redhat.com/mailman/listinfo/platformone > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: image001.png Type: image/png Size: 2127 bytes Desc: not available URL: From ademola.abodunrin at us.af.mil Fri Dec 13 19:38:45 2019 From: ademola.abodunrin at us.af.mil (ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP) Date: Fri, 13 Dec 2019 19:38:45 +0000 Subject: [Platformone] FW: [Non-DoD Source] Fwd: Twistlock Image Scanning issue In-Reply-To: References: <1576086567982.65792@ManTech.com> Message-ID: Hello All, Please is anyone able to assist us with a Twistlock image scanning issue? https://dccscr.dsop.io/dsop/dccscr/issues/231 Most Sincerely, Ade Abodunrin, GG-12, USAF Product Owner (Cybertron & Ginyu Force), Unified Platform LevelUP Code Works Commercial: (210) 890-2113 NIPR email: ademola.abodunrin at us.af.mil From: Mike Knoth Sent: Friday, December 13, 2019 1:32 PM To: ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP Subject: [Non-DoD Source] Fwd: Twistlock Image Scanning issue Also - https://dccscr.dsop.io/dsop/dccscr/issues/231 ---------- Forwarded message --------- From: Curran, Daniel M > Date: Wed, Dec 11, 2019 at 1:46 PM Subject: RE: Twistlock Image Scanning issue To: Keegan Reap >, mike.knoth at g2-inc.com > Cc: Khary Mendez >, Mark Nissley > Okay, thanks. I'll head by in a few. Mike Knoth brought up another issue in our chat. I've added him to the thread so he can correct me if I get anything wrong but in essence when we navigate to "Monitor -> Vulnerabilities -> image" some of the images are missing tags. Why is this? Also keep seeing this `Scanner undefined: Failed to retrieve repository das info, error missing secret key in AWS settings` ... but only sometimes ________________________________________ From: Keegan Reap [kreap at redhat.com ] Sent: Wednesday, December 11, 2019 11:53 AM To: Curran, Daniel M Cc: Khary Mendez; Mark Nissley Subject: Re: Twistlock Image Scanning issue Interesting, we were able to scan it on our end just now using the url and image you provided. 
i.e.: https://levelup-anchore.apps.cluster.unified-platform.io/image/docker-registry-default.apps.cluster.unified-platform.io/ccat-prod%2Fchatup/rollback/sha256:de0ce30bdc9fe12df867854e1c65693caf8fca40b1b15a93d9de376efd139f3d Feel free to swing by at some point today and we can troubleshoot this further; it might be an account issue for your user that we need to tackle. Thanks, Keegan On Wed, Dec 11, 2019 at 12:49 PM Curran, Daniel M wrote: Hey Keegan, Trying to get the following scanned in Anchore: docker-registry-default.apps.cluster.unified-platform.io ccat-prod/chatup:latest -Dan ________________________________ From: Keegan Reap Sent: Tuesday, December 10, 2019 5:37 PM To: Curran, Daniel M Cc: Khary Mendez; Mark Nissley Subject: Twistlock Image Scanning issue Hey Daniel, We looked into the Twistlock scanning issue, and it seems something might be wrong with your defender pods. After thoroughly looking through the project, it seems the daemonSet for Twistlock lost its `nodeSelector` at some point. This `nodeSelector` is what allows the Twistlock Defenders to scan images on a specific host. Due to the lost `nodeSelector`, we believe this might be the cause of your scanning issues. We attempted to reattach the `nodeSelector` to the daemonSet to allow it to restore the defender pods, with limited success. We will continue to investigate this issue tomorrow, but it might be best to just redeploy Twistlock if it's more convenient for you. Let's discuss tomorrow in person! Thanks Keegan Reap ________________________________ This e-mail and any attachments are intended only for the use of the addressee(s) named herein and may contain proprietary information.
If you are not the intended recipient of this e-mail or believe that you received this email in error, please take immediate action to notify the sender of the apparent error by reply e-mail; permanently delete the e-mail and any attachments from your computer; and do not disseminate, distribute, use, or copy this message and any attachments. -- Mike Knoth Software Engineer HII Mission Driven Innovative Solutions (HII-MDIS) - formerly G2, Inc. Technical Solutions Division 302 Sentinel Drive | Annapolis Junction, MD 20701 Email: mike.knoth at g2-inc.com Mobile: (320) 305-6453 Confidentiality Statement: HUNTINGTON INGALLS INDUSTRIES PROPRIETARY - This e-mail contains information proprietary or private to Huntington Ingalls Industries, Inc., and is not to be disclosed to, copied by, or used in any manner by others without the prior express, written permission. If you are not the intended recipient, please delete without copying and kindly advise the sender by e-mail of the mistake in delivery. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 2127 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 5490 bytes Desc: not available URL: From kreap at redhat.com Fri Dec 13 22:13:00 2019 From: kreap at redhat.com (Keegan Reap) Date: Fri, 13 Dec 2019 16:13:00 -0600 Subject: [Platformone] FW: [Non-DoD Source] Fwd: Twistlock Image Scanning issue In-Reply-To: References: <1576086567982.65792@ManTech.com> Message-ID: Hey there all, I've added a comment here that might help shed some light on the current issue you are having. Please let us know if there is any way we can assist further!
https://dccscr.dsop.io/dsop/dccscr/issues/231#note_10841 Thanks, Keegan On Fri, Dec 13, 2019 at 1:39 PM ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP wrote: > Hello All, > > > > Please is anyone able to assist us with a Twistlock image scanning issue? > > https://dccscr.dsop.io/dsop/dccscr/issues/231 > > > > Most Sincerely, > > > > Ade Abodunrin, GG-12, USAF > > Product Owner (Cybertron & Ginyu Force), Unified Platform > > > > [image: cid:image001.png at 01D4F814.4AA552D0] > > LevelUP Code Works > > Commercial: (210) 890-2113 > > NIPR email: *ademola.abodunrin at us.af.mil * > > > > *From:* Mike Knoth > *Sent:* Friday, December 13, 2019 1:32 PM > *To:* ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP < > ademola.abodunrin at us.af.mil> > *Subject:* [Non-DoD Source] Fwd: Twistlock Image Scanning issue > > > > > > Also - https://dccscr.dsop.io/dsop/dccscr/issues/231 > > > > > > ---------- Forwarded message --------- > From: *Curran, Daniel M* > Date: Wed, Dec 11, 2019 at 1:46 PM > Subject: RE: Twistlock Image Scanning issue > To: Keegan Reap , mike.knoth at g2-inc.com < > mike.knoth at g2-inc.com> > Cc: Khary Mendez , Mark Nissley > > > > Okay, thanks. I'll head by in a few. > > Mike Knoth brought up another issue in our chat. I've added him to the > thread so he can correct me if I get anything wrong but in essence when we > navigate to "Monitor -> Vulnerabilities -> image" some of the images are > missing tags. Why is this? > > Also keep seeing this `Scanner undefined: Failed to retrieve repository > das info, error missing secret key in AWS settings` ... but only sometimes > ________________________________________ > From: Keegan Reap [kreap at redhat.com] > Sent: Wednesday, December 11, 2019 11:53 AM > To: Curran, Daniel M > Cc: Khary Mendez; Mark Nissley > Subject: Re: Twistlock Image Scanning issue > > Interesting, we were able to scan it on our end just now using the url and > image you provided. 
> > i.e: > > https://levelup-anchore.apps.cluster.unified-platform.io/image/docker-registry-default.apps.cluster.unified-platform.io/ccat-prod%2Fchatup/rollback/sha256:de0ce30bdc9fe12df867854e1c65693caf8fca40b1b15a93d9de376efd139f3d > > Feel free to swing by at some point today and we can troubleshoot this > further, it might be an account issue for your user that we need to tackle. > > Thanks, > Keegan > > > On Wed, Dec 11, 2019 at 12:49 PM Curran, Daniel M < > Daniel.Curran at mantech.com> wrote: > > Hey Keegan, > > > Trying to get the folliwng scanned in anchore: > > > docker-registry-default.apps.cluster.unified-platform.io< > http://docker-registry-default.apps.cluster.unified-platform.io> > > ccat-prod/chatup:latest > > > -Dan > > ________________________________ > From: Keegan Reap > > Sent: Tuesday, December 10, 2019 5:37 PM > To: Curran, Daniel M > Cc: Khary Mendez; Mark Nissley > Subject: Twistlock Image Scanning issue > > Hey Daniel, > > We looked into the Twistlock scanning issue, and it seems something might > be wrong with your defender pods. After thoroughly looking through the > project, it seems the daemonSet for Twistlock lost it's `nodeSelector` at > some point. This nodeSelector is what allows the Twislock Defenders to scan > images on a specific host. Due to the lost `nodeSelector` we believed this > might be the cause of your scanning issues. We attempted to reattach the > `nodeSelector` to the daemonSet to allow it to restore the defender pods > with limited success. We will continue to investigate this issue tomorrow, > but it might be best to just redeploy Twistlock if it's more convenient for > you. Let's discuss tomorrow in person! > > Thanks > Keegan Reap > > ________________________________ > > This e-mail and any attachments are intended only for the use of the > addressee(s) named herein and may contain proprietary information. 
If you > are not the intended recipient of this e-mail or believe that you received > this email in error, please take immediate action to notify the sender of > the apparent error by reply e-mail; permanently delete the e-mail and any > attachments from your computer; and do not disseminate, distribute, use, or > copy this message and any attachments. > > > > > -- > > Mike Knoth > > Software Engineer > > HII Mission Driven Innovative Solutions (HII-MDIS) ? formerly G2, Inc. > > Technical Solutions Division > > 302 Sentinel Drive | Annapolis Junction, MD 20701 > > Email: mike.knoth at g2-inc.com > > Mobile: (320) 305-6453 > > > > Confidentiality Statement: > > HUNTINGTON INGALLS INDUSTRIES PROPRIETARY ? This e-mail contains > information proprietary or private to Huntington Ingalls Industries, Inc., > and is not to be disclosed to, copied by, or used in any manner by others > without the prior express, written permission. If you are not the intended > recipient, please delete without copying and kindly advise the sender by > e-mail of the mistake in delivery. > _______________________________________________ > platformONE mailing list > platformONE at redhat.com > https://www.redhat.com/mailman/listinfo/platformone > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 2127 bytes Desc: not available URL: From mike.knoth at g2-inc.com Sat Dec 14 04:18:47 2019 From: mike.knoth at g2-inc.com (Mike Knoth) Date: Fri, 13 Dec 2019 22:18:47 -0600 Subject: [Platformone] FW: [Non-DoD Source] Fwd: Twistlock Image Scanning issue In-Reply-To: References: <1576086567982.65792@ManTech.com> Message-ID: I added another comment, as we still need assistance. Mike On Fri, Dec 13, 2019 at 4:14 PM Keegan Reap wrote: > Hey there all, > > I've added a comment here that might help shed some light on the current > issue you are having. 
Please let us know if there is any way we can assist > further! > > https://dccscr.dsop.io/dsop/dccscr/issues/231#note_10841 > > Thanks, > > Keegan > > On Fri, Dec 13, 2019 at 1:39 PM ABODUNRIN, ADEMOLA A GG-12 USAF AFMC > AFLCMC/HNCP wrote: > >> Hello All, >> >> >> >> Please is anyone able to assist us with a Twistlock image scanning issue? >> >> https://dccscr.dsop.io/dsop/dccscr/issues/231 >> >> >> >> Most Sincerely, >> >> >> >> Ade Abodunrin, GG-12, USAF >> >> Product Owner (Cybertron & Ginyu Force), Unified Platform >> >> >> >> [image: cid:image001.png at 01D4F814.4AA552D0] >> >> LevelUP Code Works >> >> Commercial: (210) 890-2113 >> >> NIPR email: *ademola.abodunrin at us.af.mil * >> >> >> >> *From:* Mike Knoth >> *Sent:* Friday, December 13, 2019 1:32 PM >> *To:* ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP < >> ademola.abodunrin at us.af.mil> >> *Subject:* [Non-DoD Source] Fwd: Twistlock Image Scanning issue >> >> >> >> >> >> Also - https://dccscr.dsop.io/dsop/dccscr/issues/231 >> >> >> >> >> >> ---------- Forwarded message --------- >> From: *Curran, Daniel M* >> Date: Wed, Dec 11, 2019 at 1:46 PM >> Subject: RE: Twistlock Image Scanning issue >> To: Keegan Reap , mike.knoth at g2-inc.com < >> mike.knoth at g2-inc.com> >> Cc: Khary Mendez , Mark Nissley >> >> >> >> Okay, thanks. I'll head by in a few. >> >> Mike Knoth brought up another issue in our chat. I've added him to the >> thread so he can correct me if I get anything wrong but in essence when we >> navigate to "Monitor -> Vulnerabilities -> image" some of the images are >> missing tags. Why is this? >> >> Also keep seeing this `Scanner undefined: Failed to retrieve repository >> das info, error missing secret key in AWS settings` ... 
but only sometimes >> ________________________________________ >> From: Keegan Reap [kreap at redhat.com] >> Sent: Wednesday, December 11, 2019 11:53 AM >> To: Curran, Daniel M >> Cc: Khary Mendez; Mark Nissley >> Subject: Re: Twistlock Image Scanning issue >> >> Interesting, we were able to scan it on our end just now using the url >> and image you provided. >> >> i.e: >> >> https://levelup-anchore.apps.cluster.unified-platform.io/image/docker-registry-default.apps.cluster.unified-platform.io/ccat-prod%2Fchatup/rollback/sha256:de0ce30bdc9fe12df867854e1c65693caf8fca40b1b15a93d9de376efd139f3d >> >> Feel free to swing by at some point today and we can troubleshoot this >> further, it might be an account issue for your user that we need to tackle. >> >> Thanks, >> Keegan >> >> >> On Wed, Dec 11, 2019 at 12:49 PM Curran, Daniel M < >> Daniel.Curran at mantech.com> wrote: >> >> Hey Keegan, >> >> >> Trying to get the folliwng scanned in anchore: >> >> >> docker-registry-default.apps.cluster.unified-platform.io< >> http://docker-registry-default.apps.cluster.unified-platform.io> >> >> ccat-prod/chatup:latest >> >> >> -Dan >> >> ________________________________ >> From: Keegan Reap > >> Sent: Tuesday, December 10, 2019 5:37 PM >> To: Curran, Daniel M >> Cc: Khary Mendez; Mark Nissley >> Subject: Twistlock Image Scanning issue >> >> Hey Daniel, >> >> We looked into the Twistlock scanning issue, and it seems something might >> be wrong with your defender pods. After thoroughly looking through the >> project, it seems the daemonSet for Twistlock lost it's `nodeSelector` at >> some point. This nodeSelector is what allows the Twislock Defenders to scan >> images on a specific host. Due to the lost `nodeSelector` we believed this >> might be the cause of your scanning issues. We attempted to reattach the >> `nodeSelector` to the daemonSet to allow it to restore the defender pods >> with limited success. 
We will continue to investigate this issue tomorrow, >> but it might be best to just redeploy Twistlock if it's more convenient for >> you. Let's discuss tomorrow in person! >> >> Thanks >> Keegan Reap >> >> ________________________________ >> >> This e-mail and any attachments are intended only for the use of the >> addressee(s) named herein and may contain proprietary information. If you >> are not the intended recipient of this e-mail or believe that you received >> this email in error, please take immediate action to notify the sender of >> the apparent error by reply e-mail; permanently delete the e-mail and any >> attachments from your computer; and do not disseminate, distribute, use, or >> copy this message and any attachments. >> >> >> >> >> -- >> >> Mike Knoth >> >> Software Engineer >> >> HII Mission Driven Innovative Solutions (HII-MDIS) ? formerly G2, Inc. >> >> Technical Solutions Division >> >> 302 Sentinel Drive | Annapolis Junction, MD 20701 >> >> Email: mike.knoth at g2-inc.com >> >> Mobile: (320) 305-6453 >> >> >> >> Confidentiality Statement: >> >> HUNTINGTON INGALLS INDUSTRIES PROPRIETARY ? This e-mail contains >> information proprietary or private to Huntington Ingalls Industries, Inc., >> and is not to be disclosed to, copied by, or used in any manner by others >> without the prior express, written permission. If you are not the intended >> recipient, please delete without copying and kindly advise the sender by >> e-mail of the mistake in delivery. >> _______________________________________________ >> platformONE mailing list >> platformONE at redhat.com >> https://www.redhat.com/mailman/listinfo/platformone >> > -- Mike Knoth Software Engineer HII Mission Driven Innovative Solutions (HII-MDIS) ? formerly G2, Inc. Technical Solutions Division 302 Sentinel Drive | Annapolis Junction, MD 20701 Email: mike.knoth at g2-inc.com Mobile: (320) 305-6453 Confidentiality Statement: HUNTINGTON INGALLS INDUSTRIES PROPRIETARY ? 
This e-mail contains information proprietary or private to Huntington Ingalls Industries, Inc., and is not to be disclosed to, copied by, or used in any manner by others without the prior express, written permission. If you are not the intended recipient, please delete without copying and kindly advise the sender by e-mail of the mistake in delivery. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 2127 bytes Desc: not available URL: From kreap at redhat.com Sat Dec 14 17:48:29 2019 From: kreap at redhat.com (Keegan Reap) Date: Sat, 14 Dec 2019 11:48:29 -0600 Subject: [Platformone] FW: [Non-DoD Source] Fwd: Twistlock Image Scanning issue In-Reply-To: References: <1576086567982.65792@ManTech.com> Message-ID: Hey Mike, I saw the comment and we are investigating thoroughly. Thank you for pointing out the missing images; I have left a detailed comment on a workaround to use in the meantime while we troubleshoot this issue. Thank you for your patience. We know this is a high priority, so we will continue to investigate! Thanks, Keegan Reap On Fri, Dec 13, 2019 at 10:19 PM Mike Knoth wrote: > I added another comment, as we still need assistance. > > Mike > > On Fri, Dec 13, 2019 at 4:14 PM Keegan Reap wrote: > >> Hey there all, >> >> I've added a comment here that might help shed some light on the current >> issue you are having. Please let us know if there is any way we can assist >> further! >> >> https://dccscr.dsop.io/dsop/dccscr/issues/231#note_10841 >> >> Thanks, >> >> Keegan >> >> On Fri, Dec 13, 2019 at 1:39 PM ABODUNRIN, ADEMOLA A GG-12 USAF AFMC >> AFLCMC/HNCP wrote: >> >>> Hello All, >>> >>> >>> >>> Please is anyone able to assist us with a Twistlock image scanning issue?
>>> >>> https://dccscr.dsop.io/dsop/dccscr/issues/231 >>> >>> >>> >>> Most Sincerely, >>> >>> >>> >>> Ade Abodunrin, GG-12, USAF >>> >>> Product Owner (Cybertron & Ginyu Force), Unified Platform >>> >>> >>> >>> [image: cid:image001.png at 01D4F814.4AA552D0] >>> >>> LevelUP Code Works >>> >>> Commercial: (210) 890-2113 >>> >>> NIPR email: *ademola.abodunrin at us.af.mil * >>> >>> >>> >>> *From:* Mike Knoth >>> *Sent:* Friday, December 13, 2019 1:32 PM >>> *To:* ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP < >>> ademola.abodunrin at us.af.mil> >>> *Subject:* [Non-DoD Source] Fwd: Twistlock Image Scanning issue >>> >>> >>> >>> >>> >>> Also - https://dccscr.dsop.io/dsop/dccscr/issues/231 >>> >>> >>> >>> >>> >>> ---------- Forwarded message --------- >>> From: *Curran, Daniel M* >>> Date: Wed, Dec 11, 2019 at 1:46 PM >>> Subject: RE: Twistlock Image Scanning issue >>> To: Keegan Reap , mike.knoth at g2-inc.com < >>> mike.knoth at g2-inc.com> >>> Cc: Khary Mendez , Mark Nissley >> > >>> >>> >>> >>> Okay, thanks. I'll head by in a few. >>> >>> Mike Knoth brought up another issue in our chat. I've added him to the >>> thread so he can correct me if I get anything wrong but in essence when we >>> navigate to "Monitor -> Vulnerabilities -> image" some of the images are >>> missing tags. Why is this? >>> >>> Also keep seeing this `Scanner undefined: Failed to retrieve repository >>> das info, error missing secret key in AWS settings` ... but only sometimes >>> ________________________________________ >>> From: Keegan Reap [kreap at redhat.com] >>> Sent: Wednesday, December 11, 2019 11:53 AM >>> To: Curran, Daniel M >>> Cc: Khary Mendez; Mark Nissley >>> Subject: Re: Twistlock Image Scanning issue >>> >>> Interesting, we were able to scan it on our end just now using the url >>> and image you provided. 
>>> >>> i.e: >>> >>> https://levelup-anchore.apps.cluster.unified-platform.io/image/docker-registry-default.apps.cluster.unified-platform.io/ccat-prod%2Fchatup/rollback/sha256:de0ce30bdc9fe12df867854e1c65693caf8fca40b1b15a93d9de376efd139f3d >>> >>> Feel free to swing by at some point today and we can troubleshoot this >>> further, it might be an account issue for your user that we need to tackle. >>> >>> Thanks, >>> Keegan >>> >>> >>> On Wed, Dec 11, 2019 at 12:49 PM Curran, Daniel M < >>> Daniel.Curran at mantech.com> wrote: >>> >>> Hey Keegan, >>> >>> >>> Trying to get the folliwng scanned in anchore: >>> >>> >>> docker-registry-default.apps.cluster.unified-platform.io< >>> http://docker-registry-default.apps.cluster.unified-platform.io> >>> >>> ccat-prod/chatup:latest >>> >>> >>> -Dan >>> >>> ________________________________ >>> From: Keegan Reap > >>> Sent: Tuesday, December 10, 2019 5:37 PM >>> To: Curran, Daniel M >>> Cc: Khary Mendez; Mark Nissley >>> Subject: Twistlock Image Scanning issue >>> >>> Hey Daniel, >>> >>> We looked into the Twistlock scanning issue, and it seems something >>> might be wrong with your defender pods. After thoroughly looking through >>> the project, it seems the daemonSet for Twistlock lost it's `nodeSelector` >>> at some point. This nodeSelector is what allows the Twislock Defenders to >>> scan images on a specific host. Due to the lost `nodeSelector` we believed >>> this might be the cause of your scanning issues. We attempted to reattach >>> the `nodeSelector` to the daemonSet to allow it to restore the defender >>> pods with limited success. We will continue to investigate this issue >>> tomorrow, but it might be best to just redeploy Twistlock if it's more >>> convenient for you. Let's discuss tomorrow in person! 
>>> >>> Thanks >>> Keegan Reap >>> >>> ________________________________ >>> >>> This e-mail and any attachments are intended only for the use of the >>> addressee(s) named herein and may contain proprietary information. If you >>> are not the intended recipient of this e-mail or believe that you received >>> this email in error, please take immediate action to notify the sender of >>> the apparent error by reply e-mail; permanently delete the e-mail and any >>> attachments from your computer; and do not disseminate, distribute, use, or >>> copy this message and any attachments. >>> >>> >>> >>> >>> -- >>> >>> Mike Knoth >>> >>> Software Engineer >>> >>> HII Mission Driven Innovative Solutions (HII-MDIS) ? formerly G2, Inc. >>> >>> Technical Solutions Division >>> >>> 302 Sentinel Drive | Annapolis Junction, MD 20701 >>> >>> Email: mike.knoth at g2-inc.com >>> >>> Mobile: (320) 305-6453 >>> >>> >>> >>> Confidentiality Statement: >>> >>> HUNTINGTON INGALLS INDUSTRIES PROPRIETARY ? This e-mail contains >>> information proprietary or private to Huntington Ingalls Industries, Inc., >>> and is not to be disclosed to, copied by, or used in any manner by others >>> without the prior express, written permission. If you are not the intended >>> recipient, please delete without copying and kindly advise the sender by >>> e-mail of the mistake in delivery. >>> _______________________________________________ >>> platformONE mailing list >>> platformONE at redhat.com >>> https://www.redhat.com/mailman/listinfo/platformone >>> >> > > -- > Mike Knoth > Software Engineer > > HII Mission Driven Innovative Solutions (HII-MDIS) ? formerly G2, Inc. > > Technical Solutions Division > > 302 Sentinel Drive | Annapolis Junction, MD 20701 > > Email: mike.knoth at g2-inc.com > > Mobile: (320) 305-6453 > > Confidentiality Statement: > > HUNTINGTON INGALLS INDUSTRIES PROPRIETARY ? 
This e-mail contains
> information proprietary or private to Huntington Ingalls Industries, Inc.,
> and is not to be disclosed to, copied by, or used in any manner by others
> without the prior express, written permission. If you are not the intended
> recipient, please delete without copying and kindly advise the sender by
> e-mail of the mistake in delivery.

From mike.knoth at g2-inc.com Sun Dec 15 00:05:26 2019
From: mike.knoth at g2-inc.com (Mike Knoth)
Date: Sat, 14 Dec 2019 19:05:26 -0500
Subject: [Platformone] FW: [Non-DoD Source] Fwd: Twistlock Image Scanning issue
In-Reply-To:
References: <1576086567982.65792@ManTech.com>
Message-ID:

Keegan,

Thank you, I see you put in a "hot fix". Now that this hot fix is in place, Twistlock is working exactly as we expect it to. In particular, I can view https://levelup-twistlock.apps.cluster.unified-platform.io/#!/monitor/vulnerabilities/registry?search=das%2F and see all of the tags there.

If you want, you could keep this hot fix in for a few weeks and make this a medium-priority issue or something, as we're satisfied with what is in there right now.

Mike

On Sat, Dec 14, 2019 at 12:49 PM Keegan Reap wrote:

> Hey Mike,
>
> I saw the comment and we are investigating thoroughly. Thank you for
> pointing out the missing images; I have left a detailed comment on a
> workaround in the meantime while we troubleshoot this issue. Thank you for
> your patience. We know this is a high priority, so we will continue to
> investigate!
>
> Thanks,
> Keegan Reap

--
Mike Knoth
Software Engineer
HII Mission Driven Innovative Solutions (HII-MDIS) - formerly G2, Inc.
Technical Solutions Division
302 Sentinel Drive | Annapolis Junction, MD 20701
Email: mike.knoth at g2-inc.com
Mobile: (320) 305-6453

From tmiller at mitre.org Mon Dec 16 14:21:49 2019
From: tmiller at mitre.org (Miller, Timothy J.)
Date: Mon, 16 Dec 2019 14:21:49 +0000
Subject: [Platformone] [EXT] Re: Unified Platform Cluster is Down.....CCAT Team Not Able To Deploy
In-Reply-To: <12933_1576211059_5DF31273_12933_4387_1_CAE68LrQgzLueqX83qBSv_9RRJNR72jPNmyYhOeGy2Jt0eh=E-Q@mail.gmail.com>
References: <12933_1576211059_5DF31273_12933_4387_1_CAE68LrQgzLueqX83qBSv_9RRJNR72jPNmyYhOeGy2Jt0eh=E-Q@mail.gmail.com>
Message-ID:

https://kubernetes.io/docs/concepts/storage/storage-limits/

-- T

On 12/12/19, 22:25, "platformone-bounces at redhat.com on behalf of Taylor Biggs" wrote:

    Update for this thread:

    This is due to the fact that the AMS Instances we're using only allow for 28 attachments (EBS + NICs, etc.). We're hitting that limit, which is preventing additional pods that have storage attached from starting. More to come from people smarter than I on the topic!

    Thanks,
    Taylor

    ----
    Taylor Biggs
    taylor at redhat.com
    850-449-2220

    On Thu, Dec 12, 2019 at 8:17 PM Kevin O'Donnell wrote:

    FYI,

    Found several EBS volumes (including one that is causing a failed pod) that are stuck in the "attaching" state. Created an AWS support ticket marked as "Severity: Production system down" to get them involved as well:

    https://console.amazonaws-us-gov.com/support/cases#/6666325471/en

    KEVIN O'DONNELL
    ARCHITECT MANAGER
    Red Hat Red Hat NA Public Sector Consulting
    kodonnell at redhat.com
    M: 240-605-4654

    On Thu, Dec 12, 2019 at 6:11 PM ABODUNRIN, ADEMOLA A GG-12 USAF AFMC AFLCMC/HNCP wrote:

    Hello Team,

    Please note that our cluster is down. Issue:
    0/9 nodes are available: 3 node(s) had taints that the pod didn't tolerate, 6 node(s) didn't match node selector.
    We also logged a GitLab ticket:
    https://dccscr.dsop.io/dsop/dccscr/issues/237

    Most Sincerely,

    Ade Abodunrin, GG-12, USAF
    Product Owner (Cybertron & Ginyu Force), Unified Platform
    LevelUP Code Works
    Commercial: (210) 890-2113
    NIPR email: ademola.abodunrin at us.af.mil

    _______________________________________________
    platformONE mailing list
    platformONE at redhat.com
    https://www.redhat.com/mailman/listinfo/platformone

From tmiller at mitre.org Mon Dec 16 14:27:42 2019
From: tmiller at mitre.org (Miller, Timothy J.)
Date: Mon, 16 Dec 2019 14:27:42 +0000
Subject: [Platformone] [EXT] Re: Unified Platform Cluster is Down.....CCAT Team Not Able To Deploy
In-Reply-To:
References: <12933_1576211059_5DF31273_12933_4387_1_CAE68LrQgzLueqX83qBSv_9RRJNR72jPNmyYhOeGy2Jt0eh=E-Q@mail.gmail.com>
Message-ID: <975B3EEC-44C1-4971-A7FE-D2420E6E9DB9@mitre.org>

Interesting that there's a discrepancy here. AWS sayeth 28, but in-tree plugin sayeth 29 (at least for m5 instances).

-- T

On 12/16/19, 08:21, "Miller, Timothy J." wrote:

    https://kubernetes.io/docs/concepts/storage/storage-limits/

    -- T

From taylor at redhat.com Mon Dec 16 14:40:19 2019
From: taylor at redhat.com (Taylor Biggs)
Date: Mon, 16 Dec 2019 09:40:19 -0500
Subject: [Platformone] [EXT] Re: Unified Platform Cluster is Down.....CCAT Team Not Able To Deploy
In-Reply-To: <975B3EEC-44C1-4971-A7FE-D2420E6E9DB9@mitre.org>
References: <12933_1576211059_5DF31273_12933_4387_1_CAE68LrQgzLueqX83qBSv_9RRJNR72jPNmyYhOeGy2Jt0eh=E-Q@mail.gmail.com> <975B3EEC-44C1-4971-A7FE-D2420E6E9DB9@mitre.org>
Message-ID:

Yes, but some docs are talking about connections, while others talk about EBS attachments. The NIC counts as a "connection" in some places.

----
Taylor Biggs
taylor at redhat.com
850-449-2220

On Mon, Dec 16, 2019 at 9:28 AM Miller, Timothy J. wrote:

> Interesting that there's a discrepancy here. AWS sayeth 28, but in-tree
> plugin sayeth 29 (at least for m5 instances).
>
> -- T

From kodonnel at redhat.com Mon Dec 16 14:59:56 2019
From: kodonnel at redhat.com (Kevin O'Donnell)
Date: Mon, 16 Dec 2019 09:59:56 -0500
Subject: [Platformone] [EXT] Re: Unified Platform Cluster is Down.....CCAT Team Not Able To Deploy
In-Reply-To:
References: <12933_1576211059_5DF31273_12933_4387_1_CAE68LrQgzLueqX83qBSv_9RRJNR72jPNmyYhOeGy2Jt0eh=E-Q@mail.gmail.com> <975B3EEC-44C1-4971-A7FE-D2420E6E9DB9@mitre.org>
Message-ID:

The good news is the new prod cluster has 16 m5.2xlarge app nodes.

Thanks,

-Kevin

On Mon, Dec 16, 2019 at 9:40 AM Taylor Biggs wrote:

> Yes, but some docs are talking about connections, while others talk about
> EBS attachments. The NIC counts as a "connection" in some places.

--
KEVIN O'DONNELL
ARCHITECT MANAGER
Red Hat Red Hat NA Public Sector Consulting
kodonnell at redhat.com
M: 240-605-4654

From tmiller at mitre.org Mon Dec 16 15:28:07 2019
From: tmiller at mitre.org (Miller, Timothy J.)
Date: Mon, 16 Dec 2019 15:28:07 +0000
Subject: [Platformone] [EXT] Re: Unified Platform Cluster is Down.....CCAT Team Not Able To Deploy
In-Reply-To:
References: <12933_1576211059_5DF31273_12933_4387_1_CAE68LrQgzLueqX83qBSv_9RRJNR72jPNmyYhOeGy2Jt0eh=E-Q@mail.gmail.com> <975B3EEC-44C1-4971-A7FE-D2420E6E9DB9@mitre.org>
Message-ID: <0F4EA235-B9A0-4F7C-B970-C6C7A83AB825@mitre.org>

And this would be which cluster in particular? All the OCP clusters I see in the GovCloud account are 9 nodes (3+3+3).

-- T

On 12/16/19, 09:00, "Kevin O'Donnell" wrote:

    The good news is the new prod cluster has 16 m5.2xlarge app nodes.

    Thanks,

    -Kevin

From darachch at redhat.com Mon Dec 16 15:48:29 2019
From: darachch at redhat.com (Dino Arachchi)
Date: Mon, 16 Dec 2019 09:48:29 -0600
Subject: [Platformone] [EXT] Re: Unified Platform Cluster is Down.....CCAT Team Not Able To Deploy
In-Reply-To: <0F4EA235-B9A0-4F7C-B970-C6C7A83AB825@mitre.org>
References: <12933_1576211059_5DF31273_12933_4387_1_CAE68LrQgzLueqX83qBSv_9RRJNR72jPNmyYhOeGy2Jt0eh=E-Q@mail.gmail.com> <975B3EEC-44C1-4971-A7FE-D2420E6E9DB9@mitre.org> <0F4EA235-B9A0-4F7C-B970-C6C7A83AB825@mitre.org>
Message-ID:

Tim,

This is on the new cluster in the up-prod-b VPC.

Best Regards,

DINO ARACHCHI
SENIOR CONSULTANT
darachch at redhat.com
M: 848-203-1809

On Mon, Dec 16, 2019 at 9:28 AM Miller, Timothy J. wrote:

> And this would be which cluster in particular? All the OCP clusters I see
> in the govCloud account are 9 nodes (3+3+3).
>
> -- T

From jennifer.starling at accenturefederal.com Mon Dec 16 16:35:50 2019
From: jennifer.starling at accenturefederal.com (Starling, Jennifer [MERIDIAN TECHNOLOGIES I, LLC])
Date: Mon, 16 Dec 2019 16:35:50 +0000
Subject: [Platformone] cluster issue submitted - pod will not deploy - taint errors
Message-ID: <50C4B347-EB8C-408C-8CF3-FB4E3ED1C0A2@contoso.com>

I submitted an issue.
https://dccscr.dsop.io/unified-platform-node-aam/misp-images/issues/1

I do not know how to make it a blocker, but it is a blocker. When this happens, we cannot create new pods or re-deploy in any projects.
Thank you for your assistance, Please let me know if you require further information. Jen? Our pod is getting errors about taint. https://cluster.unified-platform.io/console/project/saml-auth-proto-misp-app/browse/events cluster: https://cluster.unified-platform.io project: saml-auth-proto-misp-app I verified the image exists. It was running and then we tried to re-deploy it. Once one of the pods gets in this state we cannot start up any new pods even in other projects. 10:28:11 AM saml-auth-proto-misp-app-web-2-deploy Pod Warning Failed Scheduling 0/9 nodes are available: 3 node(s) had taints that the pod didn't tolerate, 6 node(s) didn't match node selector. 6021 times in the last 2 hours 10:26:15 AM rf-domain-sync-1576224000-jxqvt Pod Normal Pulling pulling image "docker-registry.default.svc:5000/saml-auth-proto-misp-app/saml-auth-proto-misp-app-rf-feed-image:latest" 4132 times in the last 3 days 10:26:04 AM rf-domain-sync-1576224000-jxqvt Pod Warning Failed Error: ErrImagePull 106 times in the last 2 days 10:26:04 AM rf-domain-sync-1576224000-jxqvt Pod Warning Failed Failed to pull image "docker-registry.default.svc:5000/saml-auth-proto-misp-app/saml-auth-proto-misp-app-rf-feed-image:latest": rpc error: code = Unknown desc = Get https://docker-registry.default.svc:5000/v2/saml-auth-proto-misp-app/saml-auth-proto-misp-app-rf-feed-image/manifests/latest: Get https://docker-registry.default.svc:5000/openshift/token?account=serviceaccount&scope=repository%3Asaml-auth-proto-misp-app%2Fsaml-auth-proto-misp-app-rf-feed-image%3Apull: net/http: request canceled (Client.Timeout exceeded while awaiting headers) 41 times in the last 15 hours -------------- next part -------------- An HTML attachment was scrubbed... 
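A minimal sketch for triaging events like the FailedScheduling warning above: it splits the scheduler summary into per-reason node counts, so it is easy to see that all 9 nodes were excluded by taints or node selectors. The function name parse_scheduling_message is hypothetical, and the parser assumes the exact wording shown in the event above; a reason that itself contains a comma would be split incorrectly.

```python
import re


def parse_scheduling_message(message):
    """Split a scheduler summary into (available, total, {reason: node_count})."""
    head = re.match(r"\s*(\d+)/(\d+) nodes are available:\s*(.*)", message, re.DOTALL)
    if head is None:
        raise ValueError("not a FailedScheduling summary: %r" % message)
    available, total = int(head.group(1)), int(head.group(2))
    reasons = {}
    # Each comma-separated part looks like "3 node(s) had taints that ...".
    for part in head.group(3).strip().rstrip(".").split(","):
        item = re.match(r"\s*(\d+) node\(s\) (.+)", part)
        if item:
            reasons[item.group(2).strip()] = int(item.group(1))
    return available, total, reasons


# Message text copied from the event in this thread.
event = ("0/9 nodes are available: 3 node(s) had taints that the pod "
         "didn't tolerate, 6 node(s) didn't match node selector.")
available, total, reasons = parse_scheduling_message(event)
for reason, count in reasons.items():
    print(f"{count}/{total} nodes: {reason}")
```

When the per-reason counts add up to the node total, as they do here, every node in the cluster has been ruled out for that pod.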
URL: From jennifer.starling at accenturefederal.com Mon Dec 16 16:44:28 2019 From: jennifer.starling at accenturefederal.com (Starling, Jennifer [MERIDIAN TECHNOLOGIES I, LLC]) Date: Mon, 16 Dec 2019 16:44:28 +0000 Subject: [Platformone] cannot build Jenkins slave Message-ID: https://dccscr.dsop.io/unified-platform-node-aam/labs-ci-cd/issues/1 Our build slave for Jenkins will not build. There is a package missing. This is a blocker. cannot build jenkins robot slave The build for the Jenkins slave does not complete. We are using the standard openshift template. This is a blocker! https://cluster.unified-platform.io/console/project/aam-ci-cd/browse/builds/jenkins-slave-python?tab=history cluster: https://cluster.unified-platform.io project: aam-ci-cd I cannot give the log file because it is gone and the cluster is messed up so it will not even do a rebuild. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jrickard at redhat.com Mon Dec 16 17:46:21 2019 From: jrickard at redhat.com (Jonathan Rickard) Date: Mon, 16 Dec 2019 11:46:21 -0600 Subject: [Platformone] cluster issue submitted - pod will not deploy - taint errors In-Reply-To: <50C4B347-EB8C-408C-8CF3-FB4E3ED1C0A2@contoso.com> References: <50C4B347-EB8C-408C-8CF3-FB4E3ED1C0A2@contoso.com> Message-ID: Jennifer, Thank you for providing this information. We are actively engaged to restore service to the cluster. FYI, the issue is related to the number of devices attached to each of the instances. jonny Jonathan Rickard, RHCE, RHCA Consulting Architect Red Hat Public Sector jonny at redhat.com M: 210.862.9739 @redhatjobs redhatjobs @redhatjobs On Mon, Dec 16, 2019 at 10:39 AM Starling, Jennifer [MERIDIAN TECHNOLOGIES I, LLC] wrote: > I submitted an issue. > https://dccscr.dsop.io/unified-platform-node-aam/misp-images/issues/1 > > I do not know how to make it a blocker, but it is a blocker. When this > happens, we cannot create new pods or re-deploy in any projects. 
> > > > Thank you for your assistance, Please let me know if you require further > information. > > > > Jen? > > Our pod is getting errors about taint. > https://cluster.unified-platform.io/console/project/saml-auth-proto-misp-app/browse/events > > cluster: https://cluster.unified-platform.io project: > saml-auth-proto-misp-app > > I verified the image exists. It was running and then we tried to re-deploy > it. Once one of the pods gets in this state we cannot start up any new pods > even in other projects. > > 10:28:11 AM saml-auth-proto-misp-app-web-2-deploy Pod Warning Failed Scheduling 0/9 nodes are available: 3 node(s) had taints that the pod didn't tolerate, 6 node(s) didn't match node selector. > > 6021 times in the last 2 hours > > 10:26:15 AM rf-domain-sync-1576224000-jxqvt Pod Normal Pulling pulling image "docker-registry.default.svc:5000/saml-auth-proto-misp-app/saml-auth-proto-misp-app-rf-feed-image:latest" > > 4132 times in the last 3 days > > 10:26:04 AM rf-domain-sync-1576224000-jxqvt Pod Warning Failed Error: ErrImagePull > > 106 times in the last 2 days > > 10:26:04 AM rf-domain-sync-1576224000-jxqvt Pod Warning Failed Failed to pull image "docker-registry.default.svc:5000/saml-auth-proto-misp-app/saml-auth-proto-misp-app-rf-feed-image:latest": rpc error: code = Unknown desc = Get https://docker-registry.default.svc:5000/v2/saml-auth-proto-misp-app/saml-auth-proto-misp-app-rf-feed-image/manifests/latest: Get https://docker-registry.default.svc:5000/openshift/token?account=serviceaccount&scope=repository%3Asaml-auth-proto-misp-app%2Fsaml-auth-proto-misp-app-rf-feed-image%3Apull: net/http: request canceled (Client.Timeout exceeded while awaiting headers) > > 41 times in the last 15 hours > > > _______________________________________________ > platformONE mailing list > platformONE at redhat.com > https://www.redhat.com/mailman/listinfo/platformone > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jennifer.starling at accenturefederal.com Mon Dec 16 17:54:42 2019 From: jennifer.starling at accenturefederal.com (Starling, Jennifer [MERIDIAN TECHNOLOGIES I, LLC]) Date: Mon, 16 Dec 2019 17:54:42 +0000 Subject: [Platformone] [External] Re: cluster issue submitted - pod will not deploy - taint errors In-Reply-To: References: <50C4B347-EB8C-408C-8CF3-FB4E3ED1C0A2@contoso.com> Message-ID: <015E2FB6-F2E9-42C1-8B37-51288324ED8C@accenturefederal.com> Ok, thanks for letting us know. From: Jonathan Rickard Date: Monday, December 16, 2019 at 11:47 AM To: "Starling, Jennifer [MERIDIAN TECHNOLOGIES I, LLC]" Cc: "platformONE at redhat.com" , "Jared Crace [Mantech]" , "Joseph Middleton (Confluence)" , "Mark Sanchez [Gov]" Subject: [External] Re: [Platformone] cluster issue submitted - pod will not deploy - taint errors This message is from an EXTERNAL SENDER - be CAUTIOUS of links and attachments. THINK BEFORE YOU CLICK. ________________________________ Jennifer, Thank you for providing this information. We are actively engaged to restore service to the cluster. FYI, the issue is related to the number of devices attached to each of the instances. jonny Jonathan Rickard, RHCE, RHCA Consulting Architect Red Hat Public Sector jonny at redhat.com M: 210.862.9739 @redhatjobs redhatjobs @redhatjobs [Image removed by sender.] On Mon, Dec 16, 2019 at 10:39 AM Starling, Jennifer [MERIDIAN TECHNOLOGIES I, LLC] > wrote: I submitted an issue. https://dccscr.dsop.io/unified-platform-node-aam/misp-images/issues/1 I do not know how to make it a blocker, but it is a blocker. When this happens, we cannot create new pods or re-deploy in any projects. Thank you for your assistance, Please let me know if you require further information. Jen? Our pod is getting errors about taint. https://cluster.unified-platform.io/console/project/saml-auth-proto-misp-app/browse/events cluster: https://cluster.unified-platform.io project: saml-auth-proto-misp-app I verified the image exists. 
It was running and then we tried to re-deploy it. Once one of the pods gets in this state we cannot start up any new pods even in other projects. 10:28:11 AM saml-auth-proto-misp-app-web-2-deploy Pod Warning Failed Scheduling 0/9 nodes are available: 3 node(s) had taints that the pod didn't tolerate, 6 node(s) didn't match node selector. 6021 times in the last 2 hours 10:26:15 AM rf-domain-sync-1576224000-jxqvt Pod Normal Pulling pulling image "docker-registry.default.svc:5000/saml-auth-proto-misp-app/saml-auth-proto-misp-app-rf-feed-image:latest" 4132 times in the last 3 days 10:26:04 AM rf-domain-sync-1576224000-jxqvt Pod Warning Failed Error: ErrImagePull 106 times in the last 2 days 10:26:04 AM rf-domain-sync-1576224000-jxqvt Pod Warning Failed Failed to pull image "docker-registry.default.svc:5000/saml-auth-proto-misp-app/saml-auth-proto-misp-app-rf-feed-image:latest": rpc error: code = Unknown desc = Get https://docker-registry.default.svc:5000/v2/saml-auth-proto-misp-app/saml-auth-proto-misp-app-rf-feed-image/manifests/latest: Get https://docker-registry.default.svc:5000/openshift/token?account=serviceaccount&scope=repository%3Asaml-auth-proto-misp-app%2Fsaml-auth-proto-misp-app-rf-feed-image%3Apull: net/http: request canceled (Client.Timeout exceeded while awaiting headers) 41 times in the last 15 hours _______________________________________________ platformONE mailing list platformONE at redhat.com https://www.redhat.com/mailman/listinfo/platformone -------------- next part -------------- An HTML attachment was scrubbed... URL: From john.jung at g2-inc.com Mon Dec 16 20:48:13 2019 From: john.jung at g2-inc.com (John Jung) Date: Mon, 16 Dec 2019 15:48:13 -0500 Subject: [Platformone] Keycloak integration for an application Message-ID: Hi, in this issue I raised, I am asking for any possible integration with Keycloak for our applications in spacecamp. Please let me know if there is any information or POC that you know of. 
https://dccscr.dsop.io/dsop/dccscr/issues/239 Thanks, -- ------------------------- John Jung Principal Software Engineer HII Mission Driven Innovative Solutions (HII-MDIS) - formerly G2, Inc. Technical Solutions Division (e) john.jung at g2.inc.com (o) 301.575.5160 -------------- next part -------------- An HTML attachment was scrubbed... URL: From kozlowck at amazon.com Mon Dec 16 20:59:29 2019 From: kozlowck at amazon.com (Kozlowski, Chris) Date: Mon, 16 Dec 2019 20:59:29 +0000 Subject: [Platformone] Support Assistance In-Reply-To: References: <80f65916a25f4870bc4f07590f8d4410@EX13D08UEB003.ant.amazon.com> <77975aad2ddc48568c912ae0e9009528@EX13D08UEB003.ant.amazon.com> <9d5b756966814451a7132966ac16db26@EX13D08UEB003.ant.amazon.com> Message-ID: Kevin, Certainly. For host-related issues, I would suggest two major things: 1. Architect for resiliency. Wherever possible, utilize multiple AZs to spread applications across hosts and availability zones so that physical infrastructure issues cannot lead to an application outage. We'll attempt to resolve host issues as quickly as we can, but we always advise customers to be well-architected so that these occurrences are minor at worst. 2. Continue to utilize AWS Support for any issues, host or otherwise. When production systems are impacted, do not hesitate to leverage the "Urgent" or "Critical" rating. The Support ticket system is the quickest way to resolution. Let me know if you have any other questions or concerns. I'd be happy to sit down with you and your team some time to look at your architecture and see if there are areas where we can reduce the impact of a host issue. If you have time later this week or next, let me know. Thanks! Chris Kozlowski | Sr. Technical Account Manager AWS Enterprise Support, National Security Programs kozlowck at amazon.com | m: 703.831.5110 Thoughts on our interaction? Provide feedback here.
From: Kevin O'Donnell Sent: Thursday, December 12, 2019 8:30 PM To: Kozlowski, Chris; Jose Simonelli Cc: Settle, Rob; Carta, Mike; platformONE at redhat.com Subject: Re: Support Assistance Chris, Thank you for your response and follow up. Your detail is greatly appreciated. The host issues have been resolved, and our deployments since then have not faced the same issues. I appreciate your team's dedication to addressing and resolving these issues. This is the second time that our team has had host-related issues. How can we as a combined team supporting our DoD customers resolve and/or remediate potential issues? Should we follow a dedicated path and/or a specific comms route? I just want to make sure we are leveraging all the paths forward. Once again, thank you for your time and support. Thanks, KEVIN O'DONNELL ARCHITECT MANAGER Red Hat Red Hat NA Public Sector Consulting kodonnell at redhat.com M: 240-605-4654 On Thu, Dec 12, 2019 at 3:26 PM Kozlowski, Chris wrote: Kevin, I'm sorry to hear that you and your team felt the case took too long to resolve. I've reviewed the case with an eye towards seeing what we could have done better. Based on the case notes, I've put together a general timeline of what took place. - The case was opened the evening of the 9th. o 'In our "staging-up-vpc", the SSH from the EC2 "i-0fb3eb5a3efa04e18" (bastion-dino) is very slow to another EC2 "i-038792be35142d7f6".' - AWS Support replied an hour later, examining the connection between the two servers. Support did not see any immediate issues at the infrastructure level. Ed, the support technician, noted that he had limited visibility due to the system being on GovCloud, but he put together a list of troubleshooting steps for your engineer to look at on the instance. AWS engineers have no visibility into your buckets, volumes, or instances by design, for security. -
Three hours after the above message, it was clarified in an email that there was actually no issue with SSH between instances, only to the bastion host itself. A call was requested at this time, and your engineer stated that they'd like to use a screen share. - Minutes later, AWS reached out to get an ITAR-compliant technician (US citizen) to work further; as per GovCloud requirements, only ITAR-compliant engineers can participate in such sessions. - A day later (Dec 10th), Wayne (US citizen) took ownership of the case and started a series of remote sessions with your engineer, going over the SSH configuration. - Later on the 10th, Wayne worked with your engineer to look deeper at the instance, and at some point it was discovered that some of the instance checks were failing even though the host looked healthy. After reviewing the other configuration items, Wayne flagged the host for review by the service team. The bastion seemed to be working fine at this point as it launched on a different host. - On the 11th, your engineer reported a similar issue on a different instance. A non-ITAR engineer fielded the original call, then handed off to Wayne who took on the case again from there. He identified three other instances already running on the host flagged for examination later. The instances were relaunched on another host and the original host prevented from new launches. My takeaways from the case are as follows: 1. Support originally regarded the case as a networking issue as it originally seemed to be an intra-EC2 communication issue. Once this was clarified to not be the case, it was examined as a single instance issue. 2. It should be understood that cases for GovCloud may be initially responded to by non-US persons, but will be routed to US Persons where ITAR compliance is needed. In this case, the case was routed to a US person the moment your engineers asked for a screen-share.
As there was no visibility into your data prior to that point (viewing a screen share is treated as potentially exposing us to your data), that's where the handover occurred. You can read more about how GovCloud Support works here: https://docs.aws.amazon.com/govcloud-us/latest/UserGuide/customer-supp.html 3. It is strongly recommended you not place sensitive data into a ticket. That did not occur in this case. I thought the sequence of events, in the given context, led to a resolution in a reasonable time, but we're always striving to do better. I'll talk to the technicians on the case about whether this could have been spotted sooner, but issues can arise anywhere, and it's important we look at all avenues. Let me know what you think, and feel free to reach out to me further with any questions or concerns. Thanks! Chris Kozlowski | Sr. Technical Account Manager AWS Enterprise Support, National Security Programs kozlowck at amazon.com | m: 703.831.5110 Thoughts on our interaction? Provide feedback here. From: Kevin O'Donnell Sent: Wednesday, December 11, 2019 4:53 PM To: Kozlowski, Chris Cc: Settle, Rob; Carta, Mike; platformONE at redhat.com Subject: Re: Support Assistance After speaking with my engineer I have some concerns about how long this took to get a resolution from AWS. " We were able to resolve the case; this took approximately 3 days of troubleshooting before they decided to pull the physical host. From working with your engineer I understood that he was only able to look at GovCloud metrics and not actually able to do any investigative work without using my screen. I also have a concern that when we submitted the ticket from the support center it is picked up by the next available tech from anywhere in the world and not a US-based one specifically for DoD.
That technician was able to read any information I put in there, which is a bit concerning considering what systems we could possibly be calling in issues about. " Please let me know how we can improve this process. KEVIN O'DONNELL ARCHITECT MANAGER Red Hat Red Hat NA Public Sector Consulting kodonnell at redhat.com M: 240-605-4654 On Wed, Dec 11, 2019 at 4:05 PM Kevin O'Donnell wrote: Hello Chris, Thank you. As it turns out we are deploying to a new VPC now and we have encountered the same issue again. This new EC2 instance landed on the same problem host that we had issues with yesterday. We are working with support to move the EC2 instance to another host; how can we pull this problem host out of the mix? The support engineer asked if you would reach out to him. https://console.amazonaws-us-gov.com/support/cases?#/6655545661/en Thanks, KEVIN O'DONNELL ARCHITECT MANAGER Red Hat Red Hat NA Public Sector Consulting kodonnell at redhat.com M: 240-605-4654 On Wed, Dec 11, 2019 at 11:14 AM Kozlowski, Chris wrote: Kevin, No problem. I actually didn't need to get involved this time myself; support is generally really responsive. Glad they were able to help you quickly! Getting timely support first starts with classification of the ticket. I've included a link below with the breakdown of targeted response times for Enterprise Support. These are the response times the support team will target when you submit a case. For Urgent and Critical cases, the times are 1 hour and 15 minutes, respectively. I recommend reserving these for times when you have production or mission-critical systems down. In addition, if you submit a case at this severity level, I will be immediately paged. I recommend using these sparingly, but if you have one of the scenarios above, don't hesitate to do so.
For any other levels, if you need to escalate, feel free to reach out to me directly; that's one of the roles your TAM is for. Thanks! Chris Kozlowski | Sr. Technical Account Manager AWS Enterprise Support, National Security Programs kozlowck at amazon.com | m: 703.831.5110 Thoughts on our interaction? Provide feedback here. From: Kevin O'Donnell Sent: Tuesday, December 10, 2019 10:51 PM To: Kozlowski, Chris Cc: Settle, Rob; Carta, Mike; platformONE at redhat.com Subject: Re: Support Assistance Hello Chris, I am not sure if you worked anything on your side or not, but working with AWS Support we were able to identify that the issue was due to an AWS node, and we were able to resolve it. Thank you for your support. Who should we contact moving forward for escalations on support tickets? I would like to ensure that we are following the correct path. Thanks, KEVIN O'DONNELL ARCHITECT MANAGER Red Hat Red Hat NA Public Sector Consulting kodonnell at redhat.com M: 240-605-4654 On Tue, Dec 10, 2019 at 11:32 AM Kevin O'Donnell wrote: levelup-factory. Yes, we did get a general response back from an individual from Sydney, but nothing past that. Thanks, KEVIN O'DONNELL ARCHITECT MANAGER Red Hat Red Hat NA Public Sector Consulting kodonnell at redhat.com M: 240-605-4654 On Tue, Dec 10, 2019 at 11:06 AM Kozlowski, Chris wrote: Kevin, Can you give me the account # that you submitted the case from? When was the case submitted? Has support reached out to you? Thanks! Chris Kozlowski | Sr. Technical Account Manager AWS Enterprise Support, National Security Programs kozlowck at amazon.com | m: 703.831.5110 Thoughts on our interaction? Provide feedback here.
From: Kevin O'Donnell Sent: Tuesday, December 10, 2019 10:53 AM To: Settle, Rob; Carta, Mike; Kozlowski, Chris; platformONE at redhat.com Subject: Support Assistance Hello Team. We are having some performance issues on one of our EC2 instances. We created a case and would like some assistance in case resolution. Please let us know who can assist us. https://console.amazonaws-us-gov.com/support/cases?#/6655545661/en Thanks, KEVIN O'DONNELL ARCHITECT MANAGER Red Hat Red Hat NA Public Sector Consulting kodonnell at redhat.com M: 240-605-4654 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.jpg Type: image/jpeg Size: 2163 bytes Desc: image001.jpg URL: From jrickard at redhat.com Tue Dec 17 03:54:14 2019 From: jrickard at redhat.com (Jonathan Rickard) Date: Mon, 16 Dec 2019 21:54:14 -0600 Subject: [Platformone] App Team RBAC Message-ID: All, Per our discussion this morning - wanted to get something sent out regarding RBAC so we could start game-planning for attacking this. *Recommendation*: Groups are created and managed in IDM, and group_sync is configured on the OpenShift cluster to pull users over. Each application would have an admins group to manage resources within the project, with the admin role (not cluster-admin) assigned, and a users group for operational usage in the project. We (the Royal WE) should develop a role that bootstraps new applications in the cluster, taking Tim's suggestion of using vars, and that creates the project, groups, and users and properly configures access. This is a *post-config operation*, and would be standalone from the deployment, mainly from the perspective of reuse. We know who our mission app teams are today, but we don't necessarily know who will onboard later. Hopefully this makes sense, let me know what you think!
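One way to picture the per-application bootstrap described in the recommendation is as a function of a few vars that returns the project, groups, and role bindings to create. This is only a sketch under assumptions: the function name bootstrap_plan, the "<app>-admins" and "<app>-users" group naming, and the "edit" role for the users group are all hypothetical (the recommendation names only the admin role for the admins group), and the user names are placeholders.

```python
def bootstrap_plan(app_name, admins, users, user_role="edit"):
    """Describe the project, groups, and role bindings for one application.

    Hypothetical naming: "<app>-admins" and "<app>-users". The user_role
    default ("edit") is an assumption, not something the thread specifies.
    """
    admin_group = f"{app_name}-admins"
    user_group = f"{app_name}-users"
    return {
        "project": app_name,
        "groups": {
            admin_group: sorted(admins),
            user_group: sorted(users),
        },
        "role_bindings": [
            # Project-scoped admin, deliberately not cluster-admin.
            {"role": "admin", "group": admin_group, "namespace": app_name},
            {"role": user_role, "group": user_group, "namespace": app_name},
        ],
    }


# Placeholder application and user names, for illustration only.
plan = bootstrap_plan("example-app", admins=["alice"], users=["carol", "bob"])
for binding in plan["role_bindings"]:
    print(binding["namespace"], binding["group"], "->", binding["role"])
```

Keeping the plan as plain data makes it reusable for teams that onboard later: the same vars can drive whatever tooling actually creates the project and bindings.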
jonny Jonathan Rickard, RHCE, RHCA Consulting Architect Red Hat Public Sector jonny at redhat.com M: 210.862.9739 @redhatjobs redhatjobs @redhatjobs -------------- next part -------------- An HTML attachment was scrubbed... URL: From jennifer.starling at accenturefederal.com Tue Dec 17 15:29:36 2019 From: jennifer.starling at accenturefederal.com (Starling, Jennifer [MERIDIAN TECHNOLOGIES I, LLC]) Date: Tue, 17 Dec 2019 15:29:36 +0000 Subject: [Platformone] [External] Re: cluster issue submitted - pod will not deploy - taint errors In-Reply-To: References: <50C4B347-EB8C-408C-8CF3-FB4E3ED1C0A2@contoso.com> Message-ID: Any updates on the cluster? We are blocked. From: Jonathan Rickard Date: Monday, December 16, 2019 at 11:47 AM To: "Starling, Jennifer [MERIDIAN TECHNOLOGIES I, LLC]" Cc: "platformONE at redhat.com" , "Jared Crace [Mantech]" , "Joseph Middleton (Confluence)" , "Mark Sanchez [Gov]" Subject: [External] Re: [Platformone] cluster issue submitted - pod will not deploy - taint errors This message is from an EXTERNAL SENDER - be CAUTIOUS of links and attachments. THINK BEFORE YOU CLICK. ________________________________ Jennifer, Thank you for providing this information. We are actively engaged to restore service to the cluster. FYI, the issue is related to the number of devices attached to each of the instances. jonny Jonathan Rickard, RHCE, RHCA Consulting Architect Red Hat Public Sector jonny at redhat.com M: 210.862.9739 @redhatjobs redhatjobs @redhatjobs [Image removed by sender.] On Mon, Dec 16, 2019 at 10:39 AM Starling, Jennifer [MERIDIAN TECHNOLOGIES I, LLC] > wrote: I submitted an issue. https://dccscr.dsop.io/unified-platform-node-aam/misp-images/issues/1 I do not know how to make it a blocker, but it is a blocker. When this happens, we cannot create new pods or re-deploy in any projects. Thank you for your assistance, Please let me know if you require further information. Jen? Our pod is getting errors about taint. 
https://cluster.unified-platform.io/console/project/saml-auth-proto-misp-app/browse/events cluster: https://cluster.unified-platform.io project: saml-auth-proto-misp-app I verified the image exists. It was running and then we tried to re-deploy it. Once one of the pods gets in this state we cannot start up any new pods even in other projects. 10:28:11 AM saml-auth-proto-misp-app-web-2-deploy Pod Warning Failed Scheduling 0/9 nodes are available: 3 node(s) had taints that the pod didn't tolerate, 6 node(s) didn't match node selector. 6021 times in the last 2 hours 10:26:15 AM rf-domain-sync-1576224000-jxqvt Pod Normal Pulling pulling image "docker-registry.default.svc:5000/saml-auth-proto-misp-app/saml-auth-proto-misp-app-rf-feed-image:latest" 4132 times in the last 3 days 10:26:04 AM rf-domain-sync-1576224000-jxqvt Pod Warning Failed Error: ErrImagePull 106 times in the last 2 days 10:26:04 AM rf-domain-sync-1576224000-jxqvt Pod Warning Failed Failed to pull image "docker-registry.default.svc:5000/saml-auth-proto-misp-app/saml-auth-proto-misp-app-rf-feed-image:latest": rpc error: code = Unknown desc = Get https://docker-registry.default.svc:5000/v2/saml-auth-proto-misp-app/saml-auth-proto-misp-app-rf-feed-image/manifests/latest: Get https://docker-registry.default.svc:5000/openshift/token?account=serviceaccount&scope=repository%3Asaml-auth-proto-misp-app%2Fsaml-auth-proto-misp-app-rf-feed-image%3Apull: net/http: request canceled (Client.Timeout exceeded while awaiting headers) 41 times in the last 15 hours _______________________________________________ platformONE mailing list platformONE at redhat.com https://www.redhat.com/mailman/listinfo/platformone -------------- next part -------------- An HTML attachment was scrubbed... 
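The "number of devices attached" explanation quoted above can be illustrated with a little arithmetic. This is a rough sketch, assuming the per-instance limit of 28 attachments (EBS volumes plus NICs) quoted earlier in this thread; the same thread also notes the in-tree plugin reporting 29 for m5 instances, so the limit is a parameter here. The function name and the interface and volume counts are hypothetical.

```python
def remaining_volume_slots(attachment_limit, network_interfaces, attached_volumes):
    """Return how many more volumes a node can attach before hitting the limit."""
    used = network_interfaces + attached_volumes
    return max(attachment_limit - used, 0)


# 28 is the figure quoted in this thread; the counts below are made up.
ATTACHMENT_LIMIT = 28
slots = remaining_volume_slots(ATTACHMENT_LIMIT, network_interfaces=2, attached_volumes=20)
print(f"volume slots left on this node: {slots}")
```

Once a node reaches zero remaining slots, pods that need persistent storage cannot be placed on it, which is consistent with adding worker nodes as the remedy.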
URL: From kodonnel at redhat.com Tue Dec 17 15:37:50 2019 From: kodonnel at redhat.com (Kevin O'Donnell) Date: Tue, 17 Dec 2019 09:37:50 -0600 Subject: [Platformone] [External] Re: cluster issue submitted - pod will not deploy - taint errors In-Reply-To: References: <50C4B347-EB8C-408C-8CF3-FB4E3ED1C0A2@contoso.com> Message-ID: Hello Jennifer, We are working today to deploy additional worker nodes that will resolve your issues. Thanks, KEVIN O'DONNELL ARCHITECT MANAGER Red Hat Red Hat NA Public Sector Consulting kodonnell at redhat.com M: 240-605-4654 On Tue, Dec 17, 2019 at 9:29 AM Starling, Jennifer [MERIDIAN TECHNOLOGIES I, LLC] wrote: > Any updates on the cluster? We are blocked. > > > > *From: *Jonathan Rickard > *Date: *Monday, December 16, 2019 at 11:47 AM > *To: *"Starling, Jennifer [MERIDIAN TECHNOLOGIES I, LLC]" < > jennifer.starling at accenturefederal.com> > *Cc: *"platformONE at redhat.com" , "Jared Crace > [Mantech]" , "Joseph Middleton (Confluence)" < > confluence-noreply at di2e.net>, "Mark Sanchez [Gov]" < > mark.sanchez.8 at us.af.mil> > *Subject: *[External] Re: [Platformone] cluster issue submitted - pod > will not deploy - taint errors > > > > > > This message is from an EXTERNAL SENDER - be CAUTIOUS of links and > attachments. THINK BEFORE YOU CLICK. > ------------------------------ > > > > Jennifer, > > > > Thank you for providing this information. We are actively engaged to > restore service to the cluster. FYI, the issue is related to the number of > devices attached to each of the instances. > > > > jonny > > *Jonathan Rickard**, RHCE, RHCA* > > Consulting Architect > > Red Hat Public Sector > > > jonny at redhat.com > M: 210.862.9739 > > @redhatjobs > > redhatjobs > > @redhatjobs > > > > [image: Image removed by sender.] > > > > > > > On Mon, Dec 16, 2019 at 10:39 AM Starling, Jennifer [MERIDIAN TECHNOLOGIES > I, LLC] wrote: > > I submitted an issue. 
> https://dccscr.dsop.io/unified-platform-node-aam/misp-images/issues/1 > > > I do not know how to make it a blocker, but it is a blocker. When this > happens, we cannot create new pods or re-deploy in any projects. > > > > Thank you for your assistance, Please let me know if you require further > information. > > > > Jen? > > Our pod is getting errors about taint. > https://cluster.unified-platform.io/console/project/saml-auth-proto-misp-app/browse/events > > > cluster: https://cluster.unified-platform.io > > project: saml-auth-proto-misp-app > > I verified the image exists. It was running and then we tried to re-deploy > it. Once one of the pods gets in this state we cannot start up any new pods > even in other projects. > > 10:28:11 AM saml-auth-proto-misp-app-web-2-deploy Pod Warning Failed Scheduling 0/9 nodes are available: 3 node(s) had taints that the pod didn't tolerate, 6 node(s) didn't match node selector. > > 6021 times in the last 2 hours > > 10:26:15 AM rf-domain-sync-1576224000-jxqvt Pod Normal Pulling pulling image "docker-registry.default.svc:5000/saml-auth-proto-misp-app/saml-auth-proto-misp-app-rf-feed-image:latest" > > 4132 times in the last 3 days > > 10:26:04 AM rf-domain-sync-1576224000-jxqvt Pod Warning Failed Error: ErrImagePull > > 106 times in the last 2 days > > 10:26:04 AM rf-domain-sync-1576224000-jxqvt Pod Warning Failed Failed to pull image "docker-registry.default.svc:5000/saml-auth-proto-misp-app/saml-auth-proto-misp-app-rf-feed-image:latest": rpc error: code = Unknown desc = Get https://docker-registry.default.svc:5000/v2/saml-auth-proto-misp-app/saml-auth-proto-misp-app-rf-feed-image/manifests/latest : Get https://docker-registry.default.svc:5000/openshift/token?account=serviceaccount&scope=repository%3Asaml-auth-proto-misp-app%2Fsaml-auth-proto-misp-app-rf-feed-image%3Apull : net/http: request canceled (Client.Timeout exceeded while awaiting headers) > > 41 times in the last 15 hours > > > > 
_______________________________________________ > platformONE mailing list > platformONE at redhat.com > https://www.redhat.com/mailman/listinfo/platformone > > > _______________________________________________ > platformONE mailing list > platformONE at redhat.com > https://www.redhat.com/mailman/listinfo/platformone > -------------- next part -------------- An HTML attachment was scrubbed... URL: From kodonnel at redhat.com Tue Dec 17 17:23:14 2019 From: kodonnel at redhat.com (Kevin O'Donnell) Date: Tue, 17 Dec 2019 11:23:14 -0600 Subject: [Platformone] AWS Changes Message-ID: Who is making changes in AWS? Specifically the peering connection from prod to up-prod and a bastion instance in up-prod for OSCAP scanning? Thanks, KEVIN O'DONNELL ARCHITECT MANAGER Red Hat Red Hat NA Public Sector Consulting kodonnell at redhat.com M: 240-605-4654 -------------- next part -------------- An HTML attachment was scrubbed... URL: From taylor at redhat.com Tue Dec 17 17:49:42 2019 From: taylor at redhat.com (Taylor Biggs) Date: Tue, 17 Dec 2019 12:49:42 -0500 Subject: [Platformone] AWS Changes In-Reply-To: References: Message-ID: Seeing stuff here too - just lost the ELB to the CHT Satellite. Thanks, Taylor ---- Taylor Biggs taylor at redhat.com 850-449-2220 On Tue, Dec 17, 2019 at 12:26 PM Kevin O'Donnell wrote: > Who is making changes in AWS? Specifically the peering connection from > prod to up-prod and a bastion instance in up-prod for OSCAP scanning? > > Thanks, > > KEVIN O'DONNELL > > ARCHITECT MANAGER > > Red Hat Red Hat NA Public Sector Consulting > > kodonnell at redhat.com M: > 240-605-4654 > > _______________________________________________ > platformONE mailing list > platformONE at redhat.com > https://www.redhat.com/mailman/listinfo/platformone > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tmiller at mitre.org Tue Dec 17 18:33:16 2019 From: tmiller at mitre.org (Miller, Timothy J.) 
Date: Tue, 17 Dec 2019 18:33:16 +0000 Subject: [Platformone] [EXT] Re: AWS Changes In-Reply-To: <7333_1576605074_5DF91591_7333_264_1_CAE68LrQCGan1nbJvySmCKL=ryE75anLpb+DfZBCxd9YS27KY_w@mail.gmail.com> References: <7333_1576605074_5DF91591_7333_264_1_CAE68LrQCGan1nbJvySmCKL=ryE75anLpb+DfZBCxd9YS27KY_w@mail.gmail.com> Message-ID: <26CC513D-0698-4FE9-9637-37E4555CACD3@mitre.org> CloudTrails is enabled for at least some things, so you might start there. -- T ?On 12/17/19, 11:52, "platformone-bounces at redhat.com on behalf of Taylor Biggs" wrote: Seeing stuff here too - just lost the ELB to the CHT Satellite. Thanks, Taylor ---- Taylor Biggs taylor at redhat.com 850-449-2220 On Tue, Dec 17, 2019 at 12:26 PM Kevin O'Donnell wrote: Who is making changes in AWS? Specifically the peering connection from prod to up-prod and a bastion instance in up-prod for OSCAP scanning? Thanks, KEVIN O'DONNELL ARCHITECT MANAGER Red Hat Red Hat NA Public Sector Consulting kodonnell at redhat.com M: 240-605-4654 _______________________________________________ platformONE mailing list platformONE at redhat.com https://www.redhat.com/mailman/listinfo/platformone From tmiller at mitre.org Tue Dec 17 18:37:24 2019 From: tmiller at mitre.org (Miller, Timothy J.) Date: Tue, 17 Dec 2019 18:37:24 +0000 Subject: [Platformone] [EXT] Re: AWS Changes In-Reply-To: <7333_1576605074_5DF91591_7333_264_1_CAE68LrQCGan1nbJvySmCKL=ryE75anLpb+DfZBCxd9YS27KY_w@mail.gmail.com> References: <7333_1576605074_5DF91591_7333_264_1_CAE68LrQCGan1nbJvySmCKL=ryE75anLpb+DfZBCxd9YS27KY_w@mail.gmail.com> Message-ID: To answer the VPC peering question, that looks like Dino. https://console.amazonaws-us-gov.com/cloudtrail/home?region=us-gov-west-1#/events?ResourceType=AWS::EC2::VPCPeeringConnection -- T ?On 12/17/19, 11:52, "platformone-bounces at redhat.com on behalf of Taylor Biggs" wrote: Seeing stuff here too - just lost the ELB to the CHT Satellite. 
Thanks, Taylor ---- Taylor Biggs taylor at redhat.com 850-449-2220 On Tue, Dec 17, 2019 at 12:26 PM Kevin O'Donnell wrote: Who is making changes in AWS? Specifically the peering connection from prod to up-prod and a bastion instance in up-prod for OSCAP scanning? Thanks, KEVIN O'DONNELL ARCHITECT MANAGER Red Hat Red Hat NA Public Sector Consulting kodonnell at redhat.com M: 240-605-4654 _______________________________________________ platformONE mailing list platformONE at redhat.com https://www.redhat.com/mailman/listinfo/platformone From jrickard at redhat.com Tue Dec 17 18:47:36 2019 From: jrickard at redhat.com (Jonathan Rickard) Date: Tue, 17 Dec 2019 12:47:36 -0600 Subject: [Platformone] [EXT] Re: AWS Changes In-Reply-To: References: <7333_1576605074_5DF91591_7333_264_1_CAE68LrQCGan1nbJvySmCKL=ryE75anLpb+DfZBCxd9YS27KY_w@mail.gmail.com> Message-ID: Tim, It looks like Dino created the vpc peering connect - not removed them. Jonathan Rickard, RHCE, RHCA Consulting Architect Red Hat Public Sector jonny at redhat.com M: 210.862.9739 @redhatjobs redhatjobs @redhatjobs On Tue, Dec 17, 2019 at 12:40 PM Miller, Timothy J. wrote: > To answer the VPC peering question, that looks like Dino. > > > https://console.amazonaws-us-gov.com/cloudtrail/home?region=us-gov-west-1#/events?ResourceType=AWS::EC2::VPCPeeringConnection > > -- T > > ?On 12/17/19, 11:52, "platformone-bounces at redhat.com on behalf of Taylor > Biggs" > wrote: > > Seeing stuff here too - just lost the ELB to the CHT Satellite. > > > Thanks, > Taylor > > ---- > Taylor Biggs > taylor at redhat.com > 850-449-2220 > > > > > > > > > > > > > On Tue, Dec 17, 2019 at 12:26 PM Kevin O'Donnell > wrote: > > > Who is making changes in AWS? Specifically the peering connection from > prod to up-prod and a bastion instance in up-prod for OSCAP scanning? 
> > > Thanks, > > KEVIN O'DONNELL > ARCHITECT MANAGER > Red Hat Red Hat NA Public Sector Consulting > > kodonnell at redhat.com > M: 240-605-4654 > > > > > > > > > > > > > > > _______________________________________________ > platformONE mailing list > platformONE at redhat.com > https://www.redhat.com/mailman/listinfo/platformone > > > > > > > _______________________________________________ > platformONE mailing list > platformONE at redhat.com > https://www.redhat.com/mailman/listinfo/platformone > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jrickard at redhat.com Tue Dec 17 19:30:52 2019 From: jrickard at redhat.com (Jonathan Rickard) Date: Tue, 17 Dec 2019 13:30:52 -0600 Subject: [Platformone] EXT :Unable to access Openshift In-Reply-To: References: <1576610742544.59956@ngc.com> Message-ID: Welp, your password was reset. For password issues email platformONE at redhat.com and submit a gitlab issue for the request. Jonathan Rickard, RHCE, RHCA Consulting Architect Red Hat Public Sector jonny at redhat.com M: 210.862.9739 @redhatjobs redhatjobs @redhatjobs On Tue, Dec 17, 2019 at 1:29 PM Mike Knoth wrote: > Actually we were able to log in, sorry about that. > > But can I assume that for future requests related to Openshift accounts, > that I should email Jonnny? > > Mike > > On Tue, Dec 17, 2019 at 2:25 PM Mucker, Edward [US] (MS) (Contr) < > Edward.Mucker at ngc.com> wrote: > >> Jonny, >> >> >> Per our discussion, can you assist with getting user password reset? >> >> >> Thanks, >> >> >> Ed >> >> >> ------------------------------ >> *From:* Mike Knoth >> *Sent:* Tuesday, December 17, 2019 12:48 PM >> *To:* Sanchez, Victor [US] (MS); Mucker, Edward [US] (MS) (Contr) >> *Cc:* Rick Smith >> *Subject:* EXT :Unable to access Openshift >> >> Victor & Edward, >> >> I am on the Cybertron team, and I have my images deployed to >> https://cluster.unified-platform.io/console/project/das/. 
I have a >> developer (Rick Smith) who is trying to access this cluster, but is unable >> to log in to change his password from the default. I've contacted both of >> the below people who say they can't help. Do you know who can help >> with this? >> >> walter.steins at bylight.com >> Eric.Blade at ngc.com >> >> -- >> Mike Knoth >> Software Engineer >> >> HII Mission Driven Innovative Solutions (HII-MDIS) ? formerly G2, Inc. >> >> Technical Solutions Division >> >> 302 Sentinel Drive | Annapolis Junction, MD 20701 >> >> Email: mike.knoth at g2-inc.com >> >> Mobile: (320) 305-6453 >> >> Confidentiality Statement: >> >> HUNTINGTON INGALLS INDUSTRIES PROPRIETARY ? This e-mail contains >> information proprietary or private to Huntington Ingalls Industries, Inc., >> and is not to be disclosed to, copied by, or used in any manner by others >> without the prior express, written permission. If you are not the intended >> recipient, please delete without copying and kindly advise the sender by >> e-mail of the mistake in delivery. >> > > > -- > Mike Knoth > Software Engineer > > HII Mission Driven Innovative Solutions (HII-MDIS) ? formerly G2, Inc. > > Technical Solutions Division > > 302 Sentinel Drive | Annapolis Junction, MD 20701 > > Email: mike.knoth at g2-inc.com > > Mobile: (320) 305-6453 > > Confidentiality Statement: > > HUNTINGTON INGALLS INDUSTRIES PROPRIETARY ? This e-mail contains > information proprietary or private to Huntington Ingalls Industries, Inc., > and is not to be disclosed to, copied by, or used in any manner by others > without the prior express, written permission. If you are not the intended > recipient, please delete without copying and kindly advise the sender by > e-mail of the mistake in delivery. > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From adrian.nunez at bylight.com Tue Dec 17 19:35:56 2019 From: adrian.nunez at bylight.com (Adrian Nunez) Date: Tue, 17 Dec 2019 19:35:56 +0000 Subject: [Platformone] [EXT] Re: AWS Changes In-Reply-To: References: <7333_1576605074_5DF91591_7333_264_1_CAE68LrQCGan1nbJvySmCKL=ryE75anLpb+DfZBCxd9YS27KY_w@mail.gmail.com> , Message-ID: I have not touched any of Platform1's VPC's, peering connections or any other services since we were troubleshooting the slow SSH 2 weeks ago. I also have not implemented the RBAC user restrictions neither. V/R Adrian Get Outlook for Android ________________________________ From: Jonathan Rickard Sent: Tuesday, December 17, 2019 1:47:36 PM To: Miller, Timothy J. Cc: Taylor Biggs ; Kevin O'Donnell ; platformONE at redhat.com ; Adrian Nunez Subject: Re: [Platformone] [EXT] Re: AWS Changes [EXTERNAL EMAIL] Tim, It looks like Dino created the vpc peering connect - not removed them. Jonathan Rickard, RHCE, RHCA Consulting Architect Red Hat Public Sector jonny at redhat.com M: 210.862.9739 @redhatjobs redhatjobs @redhatjobs [https://static.redhat.com/libs/redhat/brand-assets/2/corp/logo--200.png] On Tue, Dec 17, 2019 at 12:40 PM Miller, Timothy J. > wrote: To answer the VPC peering question, that looks like Dino. https://console.amazonaws-us-gov.com/cloudtrail/home?region=us-gov-west-1#/events?ResourceType=AWS::EC2::VPCPeeringConnection -- T ?On 12/17/19, 11:52, "platformone-bounces at redhat.com on behalf of Taylor Biggs" on behalf of taylor at redhat.com> wrote: Seeing stuff here too - just lost the ELB to the CHT Satellite. Thanks, Taylor ---- Taylor Biggs taylor at redhat.com 850-449-2220 On Tue, Dec 17, 2019 at 12:26 PM Kevin O'Donnell > wrote: Who is making changes in AWS? Specifically the peering connection from prod to up-prod and a bastion instance in up-prod for OSCAP scanning? 
Thanks, KEVIN O'DONNELL ARCHITECT MANAGER Red Hat Red Hat NA Public Sector Consulting kodonnell at redhat.com %20M:240-605-4654> M: 240-605-4654 _______________________________________________ platformONE mailing list platformONE at redhat.com https://www.redhat.com/mailman/listinfo/platformone _______________________________________________ platformONE mailing list platformONE at redhat.com https://www.redhat.com/mailman/listinfo/platformone This communication (including any attachments) may contain information that is proprietary, confidential or exempt from disclosure. If you are not the intended recipient, please note that further dissemination, distribution, use or copying of this communication is strictly prohibited. Anyone who received this message in error should notify the sender immediately by telephone or by return email and delete it from his or her computer. -------------- next part -------------- An HTML attachment was scrubbed... URL: From taylor at redhat.com Tue Dec 17 22:34:21 2019 From: taylor at redhat.com (Taylor Biggs) Date: Tue, 17 Dec 2019 17:34:21 -0500 Subject: [Platformone] [EXT] Re: AWS Changes In-Reply-To: References: <7333_1576605074_5DF91591_7333_264_1_CAE68LrQCGan1nbJvySmCKL=ryE75anLpb+DfZBCxd9YS27KY_w@mail.gmail.com> Message-ID: Thanks Adrian - sounds like it's about time we implement your RBAC stuff :) Kevin, powered off these instances: up-ss-gitlab-1 up-ss-gitlab-ci-runner up-ss-sso-1 Working on finding more, and moving out of SS/CHT to free those up completely. Thanks, Taylor ---- Taylor Biggs taylor at redhat.com 850-449-2220 On Tue, Dec 17, 2019 at 2:36 PM Adrian Nunez wrote: > I have not touched any of Platform1's VPC's, peering connections or any > other services since we were troubleshooting the slow SSH 2 weeks ago. > > I also have not implemented the RBAC user restrictions neither. 
> > V/R > Adrian > > > Get Outlook for Android > ------------------------------ > *From:* Jonathan Rickard > *Sent:* Tuesday, December 17, 2019 1:47:36 PM > *To:* Miller, Timothy J. > *Cc:* Taylor Biggs ; Kevin O'Donnell < > kodonnel at redhat.com>; platformONE at redhat.com ; > Adrian Nunez > *Subject:* Re: [Platformone] [EXT] Re: AWS Changes > > > [EXTERNAL EMAIL] > Tim, > > It looks like Dino created the vpc peering connect - not removed them. > > Jonathan Rickard, RHCE, RHCA > > Consulting Architect > > Red Hat Public Sector > > jonny at redhat.com > M: 210.862.9739 > @redhatjobs redhatjobs > @redhatjobs > > > > > On Tue, Dec 17, 2019 at 12:40 PM Miller, Timothy J. > wrote: > > To answer the VPC peering question, that looks like Dino. > > > https://console.amazonaws-us-gov.com/cloudtrail/home?region=us-gov-west-1#/events?ResourceType=AWS::EC2::VPCPeeringConnection > > -- T > > ?On 12/17/19, 11:52, "platformone-bounces at redhat.com on behalf of Taylor > Biggs" > wrote: > > Seeing stuff here too - just lost the ELB to the CHT Satellite. > > > Thanks, > Taylor > > ---- > Taylor Biggs > taylor at redhat.com > 850-449-2220 > > > > > > > > > > > > > On Tue, Dec 17, 2019 at 12:26 PM Kevin O'Donnell > wrote: > > > Who is making changes in AWS? Specifically the peering connection from > prod to up-prod and a bastion instance in up-prod for OSCAP scanning? 
> > > Thanks, > > KEVIN O'DONNELL > ARCHITECT MANAGER > Red Hat Red Hat NA Public Sector Consulting > > kodonnell at redhat.com > M: 240-605-4654 > > > > > > > > > > > > > > > _______________________________________________ > platformONE mailing list > platformONE at redhat.com > https://www.redhat.com/mailman/listinfo/platformone > > > > > > > _______________________________________________ > platformONE mailing list > platformONE at redhat.com > https://www.redhat.com/mailman/listinfo/platformone > > This communication (including any attachments) may contain information > that is proprietary, confidential or exempt from disclosure. If you are not > the intended recipient, please note that further dissemination, > distribution, use or copying of this communication is strictly prohibited. > Anyone who received this message in error should notify the sender > immediately by telephone or by return email and delete it from his or her > computer. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From darachch at redhat.com Wed Dec 18 02:27:45 2019 From: darachch at redhat.com (Dino Arachchi) Date: Tue, 17 Dec 2019 20:27:45 -0600 Subject: [Platformone] [External] Re: cluster issue submitted - pod will not deploy - taint errors In-Reply-To: References: <50C4B347-EB8C-408C-8CF3-FB4E3ED1C0A2@contoso.com> Message-ID: Hi Jennifer, The additional worker nodes were successfully added with no interruption to or loss of data from the cluster. Also, the issue with not being able to pull from the docker-registry has been resolved. As a test, a Jenkins instance with persistent storage was created in the up-prod cluster and confirmed as working: https://jenkins-dino-test3.apps.cluster.unified-platform.io I have marked the Gitlab issue as closed, but please reach out if there are any further questions or concerns! 
Best Regards, DINO ARACHCHI SENIOR CONSULTANT darachch at redhat.com M: 848-203-1809 On Tue, Dec 17, 2019 at 9:40 AM Kevin O'Donnell wrote: > Hello Jennifer, > > We are working today to deploy additional worker nodes that will resolve > your issues. > > Thanks, > > KEVIN O'DONNELL > > ARCHITECT MANAGER > > Red Hat Red Hat NA Public Sector Consulting > > kodonnell at redhat.com M: > 240-605-4654 > > > > On Tue, Dec 17, 2019 at 9:29 AM Starling, Jennifer [MERIDIAN TECHNOLOGIES > I, LLC] wrote: > >> Any updates on the cluster? We are blocked. >> >> >> >> *From: *Jonathan Rickard >> *Date: *Monday, December 16, 2019 at 11:47 AM >> *To: *"Starling, Jennifer [MERIDIAN TECHNOLOGIES I, LLC]" < >> jennifer.starling at accenturefederal.com> >> *Cc: *"platformONE at redhat.com" , "Jared Crace >> [Mantech]" , "Joseph Middleton (Confluence)" < >> confluence-noreply at di2e.net>, "Mark Sanchez [Gov]" < >> mark.sanchez.8 at us.af.mil> >> *Subject: *[External] Re: [Platformone] cluster issue submitted - pod >> will not deploy - taint errors >> >> >> >> >> >> This message is from an EXTERNAL SENDER - be CAUTIOUS of links and >> attachments. THINK BEFORE YOU CLICK. >> ------------------------------ >> >> >> >> Jennifer, >> >> >> >> Thank you for providing this information. We are actively engaged to >> restore service to the cluster. FYI, the issue is related to the number of >> devices attached to each of the instances. >> >> >> >> jonny >> >> *Jonathan Rickard**, RHCE, RHCA* >> >> Consulting Architect >> >> Red Hat Public Sector >> >> >> jonny at redhat.com >> M: 210.862.9739 >> >> @redhatjobs >> >> redhatjobs >> >> @redhatjobs >> >> >> >> [image: Image removed by sender.] >> >> >> >> >> >> >> On Mon, Dec 16, 2019 at 10:39 AM Starling, Jennifer [MERIDIAN >> TECHNOLOGIES I, LLC] wrote: >> >> I submitted an issue. >> https://dccscr.dsop.io/unified-platform-node-aam/misp-images/issues/1 >> >> >> I do not know how to make it a blocker, but it is a blocker. 
When this >> happens, we cannot create new pods or re-deploy in any projects. >> >> >> >> Thank you for your assistance, Please let me know if you require further >> information. >> >> >> >> Jen? >> >> Our pod is getting errors about taint. >> https://cluster.unified-platform.io/console/project/saml-auth-proto-misp-app/browse/events >> >> >> cluster: https://cluster.unified-platform.io >> >> project: saml-auth-proto-misp-app >> >> I verified the image exists. It was running and then we tried to >> re-deploy it. Once one of the pods gets in this state we cannot start up >> any new pods even in other projects. >> >> 10:28:11 AM saml-auth-proto-misp-app-web-2-deploy Pod Warning Failed Scheduling 0/9 nodes are available: 3 node(s) had taints that the pod didn't tolerate, 6 node(s) didn't match node selector. >> >> 6021 times in the last 2 hours >> >> 10:26:15 AM rf-domain-sync-1576224000-jxqvt Pod Normal Pulling pulling image "docker-registry.default.svc:5000/saml-auth-proto-misp-app/saml-auth-proto-misp-app-rf-feed-image:latest" >> >> 4132 times in the last 3 days >> >> 10:26:04 AM rf-domain-sync-1576224000-jxqvt Pod Warning Failed Error: ErrImagePull >> >> 106 times in the last 2 days >> >> 10:26:04 AM rf-domain-sync-1576224000-jxqvt Pod Warning Failed Failed to pull image "docker-registry.default.svc:5000/saml-auth-proto-misp-app/saml-auth-proto-misp-app-rf-feed-image:latest": rpc error: code = Unknown desc = Get https://docker-registry.default.svc:5000/v2/saml-auth-proto-misp-app/saml-auth-proto-misp-app-rf-feed-image/manifests/latest : Get https://docker-registry.default.svc:5000/openshift/token?account=serviceaccount&scope=repository%3Asaml-auth-proto-misp-app%2Fsaml-auth-proto-misp-app-rf-feed-image%3Apull : net/http: request canceled (Client.Timeout exceeded while awaiting headers) >> >> 41 times in the last 15 hours >> >> >> >> _______________________________________________ >> platformONE mailing list >> platformONE at redhat.com >> 
https://www.redhat.com/mailman/listinfo/platformone
>>
>>
>> _______________________________________________
>> platformONE mailing list
>> platformONE at redhat.com
>> https://www.redhat.com/mailman/listinfo/platformone
>>
> _______________________________________________
> platformONE mailing list
> platformONE at redhat.com
> https://www.redhat.com/mailman/listinfo/platformone
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From kodonnel at redhat.com Wed Dec 18 15:20:39 2019
From: kodonnel at redhat.com (Kevin O'Donnell)
Date: Wed, 18 Dec 2019 09:20:39 -0600
Subject: [Platformone] IATT Way Ahead
In-Reply-To:
References:
Message-ID:

+PlatformONE team

KEVIN O'DONNELL
ARCHITECT MANAGER
Red Hat Red Hat NA Public Sector Consulting
kodonnell at redhat.com M: 240-605-4654

On Wed, Dec 18, 2019 at 9:02 AM Lastrilla, Jet wrote:

> +TJ
>
> *From:* Lastrilla, Jet
> *Sent:* Wednesday, December 18, 2019 9:01 AM
> *To:* Kevin O'Donnell; TRAMBLE, ELIJAH Q Capt USAF AFMC AFLCMC/HNC; BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP; Tim Gast; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP
> *Cc:* LOPEZDEURALDE, RICHARD A Lt Col USAF AFMC AFLCMC/HNCP <richard.lopezdeuralde at us.af.mil>; Bubb, Mike; REINHARDT, MELISSA A GG-13 USAF AFMC AFLCMC/HNCP <melissa.reinhardt.2 at us.af.mil>; Leonard, Michael C.
> *Subject:* IATT Way Ahead
>
> All:
>
> Just to re-affirm our way ahead from yesterday afternoon:
>
> In priority order:
>
> 1. Complete deployment of worker nodes on UP-Prod; Coordinate removal of "dev" devices from UP-Prod; Perform complete scan (platform and apps) of UP-Prod. Complete by COB 18 December.
> 2. Complete build out of UP-ProdB; Deploy applications on UP-ProdB; Scan entire environment. Complete by 19 December 1200 Central time. This work will be performed by RH and TJ with LIMITED engagement with the app teams. App team engagement will go through TJ.
>
> Our #1 objective for UP is IATT on 20 December. We have missed all of our deadlines this week. We will provide an update to Nic during our regular Friday call. Let Tim or I know if you have any questions.
>
> R/Jet
>
> *Jet Lastrilla*, CISSP
> MITRE | Systems Security Engineer
> San Antonio, TX
> 210-208-4867
> jlastrilla at mitre.org
> Jethro.lastrilla.ctr at us.af.mil (NIPR)
> Jethro.s.lastrilla.ctr at mail.smil.mil (SIPR)
> Jethro.lastrilla_ctr at af.ic.gov (JWICS)
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From jlastrilla at mitre.org Wed Dec 18 15:32:02 2019
From: jlastrilla at mitre.org (Lastrilla, Jet)
Date: Wed, 18 Dec 2019 15:32:02 +0000
Subject: [Platformone] [EXT] Re: IATT Way Ahead
In-Reply-To: <1683_1576682454_5DFA43D5_1683_29_1_CA+EGyAA4dNquORSXCZLRQoGBmOPiBC5HMRSOyMXFM-1FEgo1Jw@mail.gmail.com>
References: <1683_1576682454_5DFA43D5_1683_29_1_CA+EGyAA4dNquORSXCZLRQoGBmOPiBC5HMRSOyMXFM-1FEgo1Jw@mail.gmail.com>
Message-ID:

+Colleen and Eric? Colleen, lets coordinate on getting environment scans for UP Prod and UP-ProdB accordingly.

R/Jet
619-508-5888

From: Kevin O'Donnell
Sent: Wednesday, December 18, 2019 9:21 AM
To: Lastrilla, Jet; platformONE at redhat.com
Cc: TRAMBLE, ELIJAH Q Capt USAF AFMC AFLCMC/HNC; BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP; Tim Gast; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP; LOPEZDEURALDE, RICHARD A Lt Col USAF AFMC AFLCMC/HNCP; Bubb, Mike; REINHARDT, MELISSA A GG-13 USAF AFMC AFLCMC/HNCP; Leonard, Michael C.
; tj.zimmerman at braingu.com Subject: [EXT] Re: IATT Way Ahead +PlatformONE team KEVIN O'DONNELL ARCHITECT MANAGER Red Hat Red Hat NA Public Sector Consulting kodonnell at redhat.com M: 240-605-4654 [https://static.redhat.com/libs/redhat/brand-assets/latest/corp/logo.png] On Wed, Dec 18, 2019 at 9:02 AM Lastrilla, Jet > wrote: +TJ From: Lastrilla, Jet Sent: Wednesday, December 18, 2019 9:01 AM To: Kevin O'Donnell >; TRAMBLE, ELIJAH Q Capt USAF AFMC AFLCMC/HNC >; BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP >; Tim Gast >; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP > Cc: LOPEZDEURALDE, RICHARD A Lt Col USAF AFMC AFLCMC/HNCP >; Bubb, Mike >; REINHARDT, MELISSA A GG-13 USAF AFMC AFLCMC/HNCP >; Leonard, Michael C. > Subject: IATT Way Ahead All: Just to re-affirm our way ahead from yesterday afternoon: In priority order: 1. Complete deployment of worker nodes on UP-Prod; Coordinate removal of ?dev? devices from UP-Prod; Perform complete scan (platform and apps) of UP-Prod. Complete by COB 18 December. 2. Complete build out of UP-ProdB; Deploy applications on UP-ProdB; Scan entire environment. Complete by 19 December 1200 Central time. This work will be performed by RH and TJ with LIMITED engagement with the app teams. App team engagement will go through TJ. Our #1 objective for UP is IATT on 20 December. We have missed all of our deadlines this week. We will provide an update to Nic during our regular Friday call. Let Tim or I know if you have any questions. R/Jet Jet Lastrilla, CISSP MITRE | Systems Security Engineer San Antonio, TX 210-208-4867 jlastrilla at mitre.org Jethro.lastrilla.ctr at us.af.mil (NIPR) Jethro.s.lastrilla.ctr at mail.smil.mil (SIPR) Jethro.lastrilla_ctr at af.ic.gov (JWICS) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jlastrilla at mitre.org Wed Dec 18 15:33:05 2019 From: jlastrilla at mitre.org (Lastrilla, Jet) Date: Wed, 18 Dec 2019 15:33:05 +0000 Subject: [Platformone] [EXT] Re: IATT Way Ahead In-Reply-To: References: <1683_1576682454_5DFA43D5_1683_29_1_CA+EGyAA4dNquORSXCZLRQoGBmOPiBC5HMRSOyMXFM-1FEgo1Jw@mail.gmail.com> Message-ID: +Roc My bad.. moving too fast. From: Lastrilla, Jet Sent: Wednesday, December 18, 2019 9:32 AM To: Kevin O'Donnell ; platformONE at redhat.com Cc: TRAMBLE, ELIJAH Q Capt USAF AFMC AFLCMC/HNC ; BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP ; Tim Gast ; Feiglstok, Colleen M [US] (MS) ; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP ; LOPEZDEURALDE, RICHARD A Lt Col USAF AFMC AFLCMC/HNCP ; Bubb, Mike ; REINHARDT, MELISSA A GG-13 USAF AFMC AFLCMC/HNCP ; Leonard, Michael C. ; tj.zimmerman at braingu.com; Blade, Eric D [US] (MS) Subject: RE: [EXT] Re: IATT Way Ahead +Colleen and Eric? Colleen, lets coordinate on getting environment scans for UP Prod and UP-ProdB accordingly. R/Jet 619-508-5888 From: Kevin O'Donnell > Sent: Wednesday, December 18, 2019 9:21 AM To: Lastrilla, Jet >; platformONE at redhat.com Cc: TRAMBLE, ELIJAH Q Capt USAF AFMC AFLCMC/HNC >; BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP >; Tim Gast >; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP >; LOPEZDEURALDE, RICHARD A Lt Col USAF AFMC AFLCMC/HNCP >; Bubb, Mike >; REINHARDT, MELISSA A GG-13 USAF AFMC AFLCMC/HNCP >; Leonard, Michael C. 
>; tj.zimmerman at braingu.com Subject: [EXT] Re: IATT Way Ahead +PlatformONE team KEVIN O'DONNELL ARCHITECT MANAGER Red Hat Red Hat NA Public Sector Consulting kodonnell at redhat.com M: 240-605-4654 [https://static.redhat.com/libs/redhat/brand-assets/latest/corp/logo.png] On Wed, Dec 18, 2019 at 9:02 AM Lastrilla, Jet > wrote: +TJ From: Lastrilla, Jet Sent: Wednesday, December 18, 2019 9:01 AM To: Kevin O'Donnell >; TRAMBLE, ELIJAH Q Capt USAF AFMC AFLCMC/HNC >; BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP >; Tim Gast >; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP > Cc: LOPEZDEURALDE, RICHARD A Lt Col USAF AFMC AFLCMC/HNCP >; Bubb, Mike >; REINHARDT, MELISSA A GG-13 USAF AFMC AFLCMC/HNCP >; Leonard, Michael C. > Subject: IATT Way Ahead All: Just to re-affirm our way ahead from yesterday afternoon: In priority order: 1. Complete deployment of worker nodes on UP-Prod; Coordinate removal of ?dev? devices from UP-Prod; Perform complete scan (platform and apps) of UP-Prod. Complete by COB 18 December. 2. Complete build out of UP-ProdB; Deploy applications on UP-ProdB; Scan entire environment. Complete by 19 December 1200 Central time. This work will be performed by RH and TJ with LIMITED engagement with the app teams. App team engagement will go through TJ. Our #1 objective for UP is IATT on 20 December. We have missed all of our deadlines this week. We will provide an update to Nic during our regular Friday call. Let Tim or I know if you have any questions. R/Jet Jet Lastrilla, CISSP MITRE | Systems Security Engineer San Antonio, TX 210-208-4867 jlastrilla at mitre.org Jethro.lastrilla.ctr at us.af.mil (NIPR) Jethro.s.lastrilla.ctr at mail.smil.mil (SIPR) Jethro.lastrilla_ctr at af.ic.gov (JWICS) -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From jennifer.starling at accenturefederal.com Wed Dec 18 16:05:25 2019
From: jennifer.starling at accenturefederal.com (Starling, Jennifer [MERIDIAN TECHNOLOGIES I, LLC])
Date: Wed, 18 Dec 2019 16:05:25 +0000
Subject: [Platformone] please add team member to cluster
Message-ID:

Hi,

I have opened a ticket to get creds for our new team member.
https://dccscr.dsop.io/dsop/dccscr/issues/241

We have a new team member that needs access to the cluster https://cluster.unified-platform.io

She will need to be added to the openshift groups
c27-levelup-ocp-admins
c27-levelup-ocp-devs
and need read access to the default project.

Email: ayeesha.subhan at mantech.com

Thank you
Jen
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From roger.dirocco.4 at us.af.mil Wed Dec 18 16:17:29 2019
From: roger.dirocco.4 at us.af.mil (DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP)
Date: Wed, 18 Dec 2019 16:17:29 +0000
Subject: [Platformone] [EXT] Re: IATT Way Ahead
In-Reply-To:
References: <1683_1576682454_5DFA43D5_1683_29_1_CA+EGyAA4dNquORSXCZLRQoGBmOPiBC5HMRSOyMXFM-1FEgo1Jw@mail.gmail.com>
Message-ID:

All, here are the notes I captured from this morning's meeting with Nic:

- Is Twistlock in runtime in Prod-B (and what about current Prod)? If not, then it needs to be. (recommend for RH P1 Team)
- AWS Account Hardening - CloudWatch, CloudTrail, RBAC... (in progress with Adrian's team)
- DCAR S3 Bucket - Validate Proxy in place and no direct external access (recommend for Taylor's DSOP Team)
- Need Encryption on open Ports (recommend for RH P1 Team to look into)
- Need better diagram showing both internal and external ports/protocols right on the diagram (no IPs or become Classified Document) with encryption, and what's internal/external to AWS account, VPC, inside/outside cluster, what's public facing and what's not, application; for IATT focus on what's outside the cluster: what goes in/out of cluster boundary and identify/define what goes in/out (which team will take lead?)
- Action Item: Taylor send DSOP scans of apps to Nic, focus on the delta (the findings not covered by UBI)
- Need to share with Nic the findings on facility scan from CYBERCOM/Darkwolf: (Cybersecurity Team taking lead?)
  1. Wi-fi: WPA2, needs to be WPA3
  2. Facility Badging system has known vulnerabilities
  3. Third thing?

Mark/Kevin, please add the RH P1 and DSOP teams actions above to our GitLab Backlog.

--v/r,
Roc

From: Lastrilla, Jet
Sent: Wednesday, December 18, 2019 9:33 AM
To: Kevin O'Donnell; platformONE at redhat.com; DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP
Cc: TRAMBLE, ELIJAH Q Capt USAF AFMC AFLCMC/HNC; BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP; Tim Gast; Feiglstok, Colleen M [US] (MS); RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP; LOPEZDEURALDE, RICHARD A Lt Col USAF AFMC AFLCMC/HNCP; Bubb, Mike; REINHARDT, MELISSA A GG-13 USAF AFMC AFLCMC/HNCP; Leonard, Michael C.; tj.zimmerman at braingu.com; Blade, Eric D [US] (MS)
Subject: [Non-DoD Source] RE: [EXT] Re: IATT Way Ahead

+Roc

My bad.. moving too fast.

From: Lastrilla, Jet
Sent: Wednesday, December 18, 2019 9:32 AM
To: Kevin O'Donnell; platformONE at redhat.com
Cc: TRAMBLE, ELIJAH Q Capt USAF AFMC AFLCMC/HNC; BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP; Tim Gast; Feiglstok, Colleen M [US] (MS); RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP; LOPEZDEURALDE, RICHARD A Lt Col USAF AFMC AFLCMC/HNCP; Bubb, Mike; REINHARDT, MELISSA A GG-13 USAF AFMC AFLCMC/HNCP; Leonard, Michael C.
>; tj.zimmerman at braingu.com ; Blade, Eric D [US] (MS) > Subject: RE: [EXT] Re: IATT Way Ahead +Colleen and Eric? Colleen, lets coordinate on getting environment scans for UP Prod and UP-ProdB accordingly. R/Jet 619-508-5888 From: Kevin O'Donnell > Sent: Wednesday, December 18, 2019 9:21 AM To: Lastrilla, Jet >; platformONE at redhat.com Cc: TRAMBLE, ELIJAH Q Capt USAF AFMC AFLCMC/HNC >; BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP >; Tim Gast >; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP >; LOPEZDEURALDE, RICHARD A Lt Col USAF AFMC AFLCMC/HNCP >; Bubb, Mike >; REINHARDT, MELISSA A GG-13 USAF AFMC AFLCMC/HNCP >; Leonard, Michael C. >; tj.zimmerman at braingu.com Subject: [EXT] Re: IATT Way Ahead +PlatformONE team KEVIN O'DONNELL ARCHITECT MANAGER Red Hat Red Hat NA Public Sector Consulting kodonnell at redhat.com M: 240-605-4654 On Wed, Dec 18, 2019 at 9:02 AM Lastrilla, Jet > wrote: +TJ From: Lastrilla, Jet Sent: Wednesday, December 18, 2019 9:01 AM To: Kevin O'Donnell >; TRAMBLE, ELIJAH Q Capt USAF AFMC AFLCMC/HNC >; BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP >; Tim Gast >; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP > Cc: LOPEZDEURALDE, RICHARD A Lt Col USAF AFMC AFLCMC/HNCP >; Bubb, Mike >; REINHARDT, MELISSA A GG-13 USAF AFMC AFLCMC/HNCP >; Leonard, Michael C. > Subject: IATT Way Ahead All: Just to re-affirm our way ahead from yesterday afternoon: In priority order: 1. Complete deployment of worker nodes on UP-Prod; Coordinate removal of ?dev? devices from UP-Prod; Perform complete scan (platform and apps) of UP-Prod. Complete by COB 18 December. 2. Complete build out of UP-ProdB; Deploy applications on UP-ProdB; Scan entire environment. Complete by 19 December 1200 Central time. This work will be performed by RH and TJ with LIMITED engagement with the app teams. App team engagement will go through TJ. Our #1 objective for UP is IATT on 20 December. We have missed all of our deadlines this week. We will provide an update to Nic during our regular Friday call. 
Let Tim or I know if you have any questions. R/Jet Jet Lastrilla, CISSP MITRE | Systems Security Engineer San Antonio, TX 210-208-4867 jlastrilla at mitre.org Jethro.lastrilla.ctr at us.af.mil (NIPR) Jethro.s.lastrilla.ctr at mail.smil.mil (SIPR) Jethro.lastrilla_ctr at af.ic.gov (JWICS) -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 5532 bytes Desc: not available URL: From cmckee at redhat.com Wed Dec 18 17:09:01 2019 From: cmckee at redhat.com (Cory McKee) Date: Wed, 18 Dec 2019 12:09:01 -0500 Subject: [Platformone] Stig Scans Message-ID: Steve, Are we only concerned with the "High" findings from the scans? All, We are scanning all nodes in the UP-PROD vpc. If there are nodes that are not part of IAC code please power them down. -- Cory McKee Senior Consultant Red Hat Public Sector 8260 Greensboro Drive McLean, VA 22102 cmckee at redhat.com M: 5714090193 TRIED. TESTED. TRUSTED. @redhatjobs redhatjobs @redhatjobs -------------- next part -------------- An HTML attachment was scrubbed... URL: From jrickard at redhat.com Wed Dec 18 18:13:10 2019 From: jrickard at redhat.com (Jonathan Rickard) Date: Wed, 18 Dec 2019 12:13:10 -0600 Subject: [Platformone] AAM upstream dependencies In-Reply-To: <62CF2BD3-1FC1-4047-A39C-778B952F8923@braingu.com> References: <62CF2BD3-1FC1-4047-A39C-778B952F8923@braingu.com> Message-ID: Thanks Tim and Sam - Adding +platformONE at redhat.com for situational awareness. Jonathan Rickard, RHCE, RHCA Consulting Architect Red Hat Public Sector jonny at redhat.com M: 210.862.9739 @redhatjobs redhatjobs @redhatjobs On Wed, Dec 18, 2019 at 12:06 PM Tim Gast wrote: > Jonny, > > Sam (copied) got the information from Jay on the upstream packages we need > added to satellite. 
> > The package names are: > *atlas* > *atlas-devel* > > We didn?t see a Red Hat specific build of it, but there are Centos and > Fedora builds. > Can you snarf these into Satellite? > > Thanks, > -Tim > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mholmes at redhat.com Wed Dec 18 19:06:29 2019 From: mholmes at redhat.com (Michael Holmes) Date: Wed, 18 Dec 2019 13:06:29 -0600 Subject: [Platformone] AAM upstream dependencies In-Reply-To: References: <62CF2BD3-1FC1-4047-A39C-778B952F8923@braingu.com> Message-ID: Tim, Which machine are you trying to pull these packages on to? On Wed, Dec 18, 2019 at 12:13 PM Jonathan Rickard wrote: > Thanks Tim and Sam - > > Adding +platformONE at redhat.com for situational > awareness. > > > Jonathan Rickard, RHCE, RHCA > > Consulting Architect > > Red Hat Public Sector > > jonny at redhat.com > M: 210.862.9739 > @redhatjobs redhatjobs > @redhatjobs > > > > > On Wed, Dec 18, 2019 at 12:06 PM Tim Gast wrote: > >> Jonny, >> >> Sam (copied) got the information from Jay on the upstream packages we >> need added to satellite. >> >> The package names are: >> *atlas* >> *atlas-devel* >> >> We didn?t see a Red Hat specific build of it, but there are Centos and >> Fedora builds. >> Can you snarf these into Satellite? >> >> Thanks, >> -Tim >> >> _______________________________________________ > platformONE mailing list > platformONE at redhat.com > https://www.redhat.com/mailman/listinfo/platformone > -- Michael Holmes, RHCSA Senior Consultant Red Hat Remote - Texas mholmes at redhat.com M: 808-780-4877 -------------- next part -------------- An HTML attachment was scrubbed... URL: From steven.bogue.1.ctr at us.af.mil Wed Dec 18 19:11:41 2019 From: steven.bogue.1.ctr at us.af.mil (BOGUE, STEVEN E CTR USAF AFMC AFLCMC/HNCP) Date: Wed, 18 Dec 2019 19:11:41 +0000 Subject: [Platformone] [Non-DoD Source] Stig Scans In-Reply-To: References: Message-ID: For the IATT and ATO the Highs are a must to address. 
The others are still captured and POA&M'd, but the HIGHs need to be remediated. From: Cory McKee Sent: Wednesday, December 18, 2019 11:09 AM To: platformone at redhat.com; BOGUE, STEVEN E CTR USAF AFMC AFLCMC/HNCP; Feiglstok, Colleen M [US] (MS); carlos.nunez at bylight.com Subject: [Non-DoD Source] Stig Scans Steve, Are we only concerned with the "High" findings from the scans? All, We are scanning all nodes in the UP-PROD vpc. If there are nodes that are not part of IAC code please power them down. -- Cory McKee Senior Consultant Red Hat Public Sector 8260 Greensboro Drive McLean, VA 22102 cmckee at redhat.com M: 5714090193 TRIED. TESTED. TRUSTED. @redhatjobs redhatjobs @redhatjobs -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 5492 bytes Desc: not available URL: From sam at braingu.com Wed Dec 18 19:45:51 2019 From: sam at braingu.com (Samuel James) Date: Wed, 18 Dec 2019 14:45:51 -0500 Subject: [Platformone] AAM upstream dependencies In-Reply-To: References: <62CF2BD3-1FC1-4047-A39C-778B952F8923@braingu.com> Message-ID: Michael and Jonny, I think this is the repository they need for the atlas packages: http://mirror.centos.org/centos/7/os/x86_64/Packages/ The build job they are running cluster.unifiedplatform.io Thanks! Sam On Wed, Dec 18, 2019 at 2:06 PM Michael Holmes wrote: > Tim, > > Which machine are you trying to pull these packages on to? > > On Wed, Dec 18, 2019 at 12:13 PM Jonathan Rickard > wrote: > >> Thanks Tim and Sam - >> >> Adding +platformONE at redhat.com for situational >> awareness.
>> >> >> Jonathan Rickard, RHCE, RHCA >> >> Consulting Architect >> >> Red Hat Public Sector >> >> jonny at redhat.com >> M: 210.862.9739 >> @redhatjobs redhatjobs >> @redhatjobs >> >> >> >> >> On Wed, Dec 18, 2019 at 12:06 PM Tim Gast wrote: >> >>> Jonny, >>> >>> Sam (copied) got the information from Jay on the upstream packages we >>> need added to satellite. >>> >>> The package names are: >>> *atlas* >>> *atlas-devel* >>> >>> We didn?t see a Red Hat specific build of it, but there are Centos and >>> Fedora builds. >>> Can you snarf these into Satellite? >>> >>> Thanks, >>> -Tim >>> >>> _______________________________________________ >> platformONE mailing list >> platformONE at redhat.com >> https://www.redhat.com/mailman/listinfo/platformone >> > > > -- > > Michael Holmes, RHCSA > > Senior Consultant > > Red Hat Remote - Texas > > mholmes at redhat.com > M: 808-780-4877 > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Colleen.Feiglstok at ngc.com Wed Dec 18 19:47:47 2019 From: Colleen.Feiglstok at ngc.com (Feiglstok, Colleen M [US] (MS)) Date: Wed, 18 Dec 2019 19:47:47 +0000 Subject: [Platformone] up-prod and up-prod-b Message-ID: <7b57ae3e561546e39a41aaa6992438da@XCGVAG22.northgrum.com> Uploaded the new oscap scans to the confluence page. From first glance, up-prod's numbers are lower than up-prod-b but have the same amount of high findings. Let me know when I'm free to start testing and which VPC. I think the plan is up-prod today and up-prod-b tomorrow? Or is it the opposite? Platform1 team - I need the account information, too -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From cmckee at redhat.com Wed Dec 18 20:02:09 2019 From: cmckee at redhat.com (Cory McKee) Date: Wed, 18 Dec 2019 15:02:09 -0500 Subject: [Platformone] up-prod and up-prod-b In-Reply-To: <7b57ae3e561546e39a41aaa6992438da@XCGVAG22.northgrum.com> References: <7b57ae3e561546e39a41aaa6992438da@XCGVAG22.northgrum.com> Message-ID: I am done doing my scanning so you should be free to do your scans On Wed, Dec 18, 2019 at 2:47 PM Feiglstok, Colleen M [US] (MS) < Colleen.Feiglstok at ngc.com> wrote: > Uploaded the new oscap scans to the confluence page. From first glance, > up-prod's numbers are lower than up-prod-b but have the same amount of high > findings. > > > > Let me know when I'm free to start testing and which VPC. I think the plan > is up-prod today and up-prod-b tomorrow? Or is it the opposite? > > > > Platform1 team - I need the account information, too > _______________________________________________ > platformONE mailing list > platformONE at redhat.com > https://www.redhat.com/mailman/listinfo/platformone > -- Cory McKee Senior Consultant Red Hat Public Sector 8260 Greensboro Drive McLean, VA 22102 cmckee at redhat.com M: 5714090193 TRIED. TESTED. TRUSTED. @redhatjobs redhatjobs @redhatjobs -------------- next part -------------- An HTML attachment was scrubbed... URL: From taylor at redhat.com Wed Dec 18 20:07:14 2019 From: taylor at redhat.com (Taylor Biggs) Date: Wed, 18 Dec 2019 15:07:14 -0500 Subject: [Platformone] [EXT] Re: AWS Changes In-Reply-To: References: <7333_1576605074_5DF91591_7333_264_1_CAE68LrQCGan1nbJvySmCKL=ryE75anLpb+DfZBCxd9YS27KY_w@mail.gmail.com> Message-ID: Welp, that didn't work out since I had to turn most of those back on. But I did kill the UP-CHT VPC instances with the exception of the Satellite. So that's a bit of space gained back...
Thanks, Taylor ---- Taylor Biggs taylor at redhat.com 850-449-2220 On Tue, Dec 17, 2019 at 5:34 PM Taylor Biggs wrote: > Thanks Adrian - sounds like it's about time we implement your RBAC stuff :) > > Kevin, powered off these instances: > up-ss-gitlab-1 > up-ss-gitlab-ci-runner > up-ss-sso-1 > > Working on finding more, and moving out of SS/CHT to free those up > completely. > > Thanks, > Taylor > > ---- > Taylor Biggs > taylor at redhat.com > 850-449-2220 > > > > On Tue, Dec 17, 2019 at 2:36 PM Adrian Nunez > wrote: > >> I have not touched any of Platform1's VPC's, peering connections or any >> other services since we were troubleshooting the slow SSH 2 weeks ago. >> >> I also have not implemented the RBAC user restrictions neither. >> >> V/R >> Adrian >> >> >> Get Outlook for Android >> ------------------------------ >> *From:* Jonathan Rickard >> *Sent:* Tuesday, December 17, 2019 1:47:36 PM >> *To:* Miller, Timothy J. >> *Cc:* Taylor Biggs ; Kevin O'Donnell < >> kodonnel at redhat.com>; platformONE at redhat.com ; >> Adrian Nunez >> *Subject:* Re: [Platformone] [EXT] Re: AWS Changes >> >> >> [EXTERNAL EMAIL] >> Tim, >> >> It looks like Dino created the vpc peering connect - not removed them. >> >> Jonathan Rickard, RHCE, RHCA >> >> Consulting Architect >> >> Red Hat Public Sector >> >> jonny at redhat.com >> M: 210.862.9739 >> @redhatjobs redhatjobs >> @redhatjobs >> >> >> >> >> On Tue, Dec 17, 2019 at 12:40 PM Miller, Timothy J. >> wrote: >> >> To answer the VPC peering question, that looks like Dino. >> >> >> https://console.amazonaws-us-gov.com/cloudtrail/home?region=us-gov-west-1#/events?ResourceType=AWS::EC2::VPCPeeringConnection >> >> -- T >> >> ?On 12/17/19, 11:52, "platformone-bounces at redhat.com on behalf of Taylor >> Biggs" >> wrote: >> >> Seeing stuff here too - just lost the ELB to the CHT Satellite. 
>> >> >> Thanks, >> Taylor >> >> ---- >> Taylor Biggs >> taylor at redhat.com >> 850-449-2220 >> >> >> >> >> >> >> >> >> >> >> >> >> On Tue, Dec 17, 2019 at 12:26 PM Kevin O'Donnell >> wrote: >> >> >> Who is making changes in AWS? Specifically the peering connection >> from prod to up-prod and a bastion instance in up-prod for OSCAP scanning? >> >> >> Thanks, >> >> KEVIN O'DONNELL >> ARCHITECT MANAGER >> Red Hat Red Hat NA Public Sector Consulting >> >> kodonnell at redhat.com >> M: 240-605-4654 >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> _______________________________________________ >> platformONE mailing list >> platformONE at redhat.com >> https://www.redhat.com/mailman/listinfo/platformone >> >> >> >> >> >> >> _______________________________________________ >> platformONE mailing list >> platformONE at redhat.com >> https://www.redhat.com/mailman/listinfo/platformone >> >> This communication (including any attachments) may contain information >> that is proprietary, confidential or exempt from disclosure. If you are not >> the intended recipient, please note that further dissemination, >> distribution, use or copying of this communication is strictly prohibited. >> Anyone who received this message in error should notify the sender >> immediately by telephone or by return email and delete it from his or her >> computer. >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: Screenshot from 2019-12-18 14-06-39.png Type: image/png Size: 293772 bytes Desc: not available URL: From kodonnel at redhat.com Wed Dec 18 20:10:27 2019 From: kodonnel at redhat.com (Kevin O'Donnell) Date: Wed, 18 Dec 2019 14:10:27 -0600 Subject: [Platformone] [EXT] Re: AWS Changes In-Reply-To: References: <7333_1576605074_5DF91591_7333_264_1_CAE68LrQCGan1nbJvySmCKL=ryE75anLpb+DfZBCxd9YS27KY_w@mail.gmail.com> Message-ID: Thanks Taylor, We should plan to terminate the instances in the new year. Thanks, KEVIN O'DONNELL ARCHITECT MANAGER Red Hat Red Hat NA Public Sector Consulting kodonnell at redhat.com M: 240-605-4654 On Wed, Dec 18, 2019 at 2:07 PM Taylor Biggs wrote: > Welp, that didn't work out since I had to turn most of those back on. But > I did kill the UP-CHT VPC instances with the exception of the Satellite. > So that's a bit of space gained back... > > Thanks, > Taylor > > ---- > Taylor Biggs > taylor at redhat.com > 850-449-2220 > > > > On Tue, Dec 17, 2019 at 5:34 PM Taylor Biggs wrote: > >> Thanks Adrian - sounds like it's about time we implement your RBAC stuff >> :) >> >> Kevin, powered off these instances: >> up-ss-gitlab-1 >> up-ss-gitlab-ci-runner >> up-ss-sso-1 >> >> Working on finding more, and moving out of SS/CHT to free those up >> completely. >> >> Thanks, >> Taylor >> >> ---- >> Taylor Biggs >> taylor at redhat.com >> 850-449-2220 >> >> >> >> On Tue, Dec 17, 2019 at 2:36 PM Adrian Nunez >> wrote: >> >>> I have not touched any of Platform1's VPC's, peering connections or any >>> other services since we were troubleshooting the slow SSH 2 weeks ago. >>> >>> I also have not implemented the RBAC user restrictions neither. >>> >>> V/R >>> Adrian >>> >>> >>> Get Outlook for Android >>> ------------------------------ >>> *From:* Jonathan Rickard >>> *Sent:* Tuesday, December 17, 2019 1:47:36 PM >>> *To:* Miller, Timothy J. 
>>> *Cc:* Taylor Biggs ; Kevin O'Donnell < >>> kodonnel at redhat.com>; platformONE at redhat.com ; >>> Adrian Nunez >>> *Subject:* Re: [Platformone] [EXT] Re: AWS Changes >>> >>> >>> [EXTERNAL EMAIL] >>> Tim, >>> >>> It looks like Dino created the vpc peering connect - not removed them. >>> >>> Jonathan Rickard, RHCE, RHCA >>> >>> Consulting Architect >>> >>> Red Hat Public Sector >>> >>> jonny at redhat.com >>> M: 210.862.9739 >>> @redhatjobs redhatjobs >>> @redhatjobs >>> >>> >>> >>> >>> On Tue, Dec 17, 2019 at 12:40 PM Miller, Timothy J. >>> wrote: >>> >>> To answer the VPC peering question, that looks like Dino. >>> >>> >>> https://console.amazonaws-us-gov.com/cloudtrail/home?region=us-gov-west-1#/events?ResourceType=AWS::EC2::VPCPeeringConnection >>> >>> -- T >>> >>> ?On 12/17/19, 11:52, "platformone-bounces at redhat.com on behalf of >>> Taylor Biggs" >> taylor at redhat.com> wrote: >>> >>> Seeing stuff here too - just lost the ELB to the CHT Satellite. >>> >>> >>> Thanks, >>> Taylor >>> >>> ---- >>> Taylor Biggs >>> taylor at redhat.com >>> 850-449-2220 >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> On Tue, Dec 17, 2019 at 12:26 PM Kevin O'Donnell < >>> kodonnel at redhat.com> wrote: >>> >>> >>> Who is making changes in AWS? Specifically the peering connection >>> from prod to up-prod and a bastion instance in up-prod for OSCAP scanning? 
>>> >>> >>> Thanks, >>> >>> KEVIN O'DONNELL >>> ARCHITECT MANAGER >>> Red Hat Red Hat NA Public Sector Consulting >> > >>> >>> kodonnell at redhat.com >>> M: 240-605-4654 >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> _______________________________________________ >>> platformONE mailing list >>> platformONE at redhat.com >>> https://www.redhat.com/mailman/listinfo/platformone >>> >>> >>> >>> >>> >>> >>> _______________________________________________ >>> platformONE mailing list >>> platformONE at redhat.com >>> https://www.redhat.com/mailman/listinfo/platformone >>> >>> This communication (including any attachments) may contain information >>> that is proprietary, confidential or exempt from disclosure. If you are not >>> the intended recipient, please note that further dissemination, >>> distribution, use or copying of this communication is strictly prohibited. >>> Anyone who received this message in error should notify the sender >>> immediately by telephone or by return email and delete it from his or her >>> computer. >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From jlastrilla at mitre.org Wed Dec 18 20:12:55 2019 From: jlastrilla at mitre.org (Lastrilla, Jet) Date: Wed, 18 Dec 2019 20:12:55 +0000 Subject: [Platformone] [EXT] Re: IATT Way Ahead In-Reply-To: References: <1683_1576682454_5DFA43D5_1683_29_1_CA+EGyAA4dNquORSXCZLRQoGBmOPiBC5HMRSOyMXFM-1FEgo1Jw@mail.gmail.com> Message-ID: All: Please inform this thread once the UP-Prod VPC is ready for Colleen to scan. Please remember that peering connections will expand the scope of Colleen's scans into that environment.
R/Jet From: Lastrilla, Jet Sent: Wednesday, December 18, 2019 9:33 AM To: Kevin O'Donnell; platformONE at redhat.com; DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP Cc: TRAMBLE, ELIJAH Q Capt USAF AFMC AFLCMC/HNC; BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP; Tim Gast; Feiglstok, Colleen M [US] (MS); RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP; LOPEZDEURALDE, RICHARD A Lt Col USAF AFMC AFLCMC/HNCP; Bubb, Mike; REINHARDT, MELISSA A GG-13 USAF AFMC AFLCMC/HNCP; Leonard, Michael C.; tj.zimmerman at braingu.com; Blade, Eric D [US] (MS) Subject: RE: [EXT] Re: IATT Way Ahead +Roc My bad.. moving too fast. From: Lastrilla, Jet Sent: Wednesday, December 18, 2019 9:32 AM To: Kevin O'Donnell; platformONE at redhat.com Cc: TRAMBLE, ELIJAH Q Capt USAF AFMC AFLCMC/HNC; BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP; Tim Gast; Feiglstok, Colleen M [US] (MS); RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP; LOPEZDEURALDE, RICHARD A Lt Col USAF AFMC AFLCMC/HNCP; Bubb, Mike; REINHARDT, MELISSA A GG-13 USAF AFMC AFLCMC/HNCP; Leonard, Michael C.; tj.zimmerman at braingu.com; Blade, Eric D [US] (MS) Subject: RE: [EXT] Re: IATT Way Ahead +Colleen and Eric? Colleen, let's coordinate on getting environment scans for UP Prod and UP-ProdB accordingly. R/Jet 619-508-5888 From: Kevin O'Donnell Sent: Wednesday, December 18, 2019 9:21 AM To: Lastrilla, Jet; platformONE at redhat.com Cc: TRAMBLE, ELIJAH Q Capt USAF AFMC AFLCMC/HNC; BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP; Tim Gast; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP; LOPEZDEURALDE, RICHARD A Lt Col USAF AFMC AFLCMC/HNCP; Bubb, Mike; REINHARDT, MELISSA A GG-13 USAF AFMC AFLCMC/HNCP; Leonard, Michael C.
; tj.zimmerman at braingu.com Subject: [EXT] Re: IATT Way Ahead +PlatformONE team KEVIN O'DONNELL ARCHITECT MANAGER Red Hat Red Hat NA Public Sector Consulting kodonnell at redhat.com M: 240-605-4654 On Wed, Dec 18, 2019 at 9:02 AM Lastrilla, Jet wrote: +TJ From: Lastrilla, Jet Sent: Wednesday, December 18, 2019 9:01 AM To: Kevin O'Donnell; TRAMBLE, ELIJAH Q Capt USAF AFMC AFLCMC/HNC; BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP; Tim Gast; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP Cc: LOPEZDEURALDE, RICHARD A Lt Col USAF AFMC AFLCMC/HNCP; Bubb, Mike; REINHARDT, MELISSA A GG-13 USAF AFMC AFLCMC/HNCP; Leonard, Michael C. Subject: IATT Way Ahead All: Just to re-affirm our way ahead from yesterday afternoon: In priority order: 1. Complete deployment of worker nodes on UP-Prod; Coordinate removal of "dev" devices from UP-Prod; Perform complete scan (platform and apps) of UP-Prod. Complete by COB 18 December. 2. Complete build out of UP-ProdB; Deploy applications on UP-ProdB; Scan entire environment. Complete by 19 December 1200 Central time. This work will be performed by RH and TJ with LIMITED engagement with the app teams. App team engagement will go through TJ. Our #1 objective for UP is IATT on 20 December. We have missed all of our deadlines this week. We will provide an update to Nic during our regular Friday call. Let Tim or me know if you have any questions. R/Jet Jet Lastrilla, CISSP MITRE | Systems Security Engineer San Antonio, TX 210-208-4867 jlastrilla at mitre.org Jethro.lastrilla.ctr at us.af.mil (NIPR) Jethro.s.lastrilla.ctr at mail.smil.mil (SIPR) Jethro.lastrilla_ctr at af.ic.gov (JWICS) -------------- next part -------------- An HTML attachment was scrubbed...
URL: From jlastrilla at mitre.org Wed Dec 18 20:24:16 2019 From: jlastrilla at mitre.org (Lastrilla, Jet) Date: Wed, 18 Dec 2019 20:24:16 +0000 Subject: [Platformone] [EXT] Re: up-prod and up-prod-b In-Reply-To: <23652_1576699345_5DFA85D1_23652_313_1_CAKQ_jXVtDU+AUBwqNE1c6yzje07GG9xPq=G6EVMW2Wiwdphz4A@mail.gmail.com> References: <7b57ae3e561546e39a41aaa6992438da@XCGVAG22.northgrum.com> <23652_1576699345_5DFA85D1_23652_313_1_CAKQ_jXVtDU+AUBwqNE1c6yzje07GG9xPq=G6EVMW2Wiwdphz4A@mail.gmail.com> Message-ID: Are all the apps landed in UP Prod? From: Cory McKee Sent: Wednesday, December 18, 2019 2:02 PM To: Feiglstok, Colleen M [US] (MS) Cc: Lastrilla, Jet; BOGUE, STEVEN E CTR USAF AFMC AFLCMC/HNCP; CRISP, JOSHUA M GS-09 USAF AFMC AFLCMC/HNCP; platformONE at redhat.com; Bubb, Mike Subject: [EXT] Re: [Platformone] up-prod and up-prod-b I am done doing my scanning so you should be free to do your scans On Wed, Dec 18, 2019 at 2:47 PM Feiglstok, Colleen M [US] (MS) wrote: Uploaded the new oscap scans to the confluence page. From first glance, up-prod's numbers are lower than up-prod-b but have the same amount of high findings. Let me know when I'm free to start testing and which VPC. I think the plan is up-prod today and up-prod-b tomorrow? Or is it the opposite? Platform1 team - I need the account information, too _______________________________________________ platformONE mailing list platformONE at redhat.com https://www.redhat.com/mailman/listinfo/platformone -- Cory McKee Senior Consultant Red Hat Public Sector 8260 Greensboro Drive McLean, VA 22102 cmckee at redhat.com M: 5714090193 TRIED. TESTED. TRUSTED. @redhatjobs redhatjobs @redhatjobs -------------- next part -------------- An HTML attachment was scrubbed...
URL: From mholmes at redhat.com Wed Dec 18 20:40:23 2019 From: mholmes at redhat.com (Michael Holmes) Date: Wed, 18 Dec 2019 14:40:23 -0600 Subject: [Platformone] AAM upstream dependencies In-Reply-To: References: <62CF2BD3-1FC1-4047-A39C-778B952F8923@braingu.com> Message-ID: I've gone ahead and added rhel-7 options to the worker nodes, which has the devel package. Please run the build job and let me know if it works. Thank you! On Wed, Dec 18, 2019 at 1:46 PM Samuel James wrote: > Michael and Jonny, > > I think this is the repository they need for the atlas packages: > http://mirror.centos.org/centos/7/os/x86_64/Packages/ > > The build job they are running cluster.unifiedplatform.io > > Thanks! > Sam > > On Wed, Dec 18, 2019 at 2:06 PM Michael Holmes wrote: > >> Tim, >> >> Which machine are you trying to pull these packages on to? >> >> On Wed, Dec 18, 2019 at 12:13 PM Jonathan Rickard >> wrote: >> >>> Thanks Tim and Sam - >>> >>> Adding +platformONE at redhat.com for situational >>> awareness. >>> >>> >>> Jonathan Rickard, RHCE, RHCA >>> >>> Consulting Architect >>> >>> Red Hat Public Sector >>> >>> jonny at redhat.com >>> M: 210.862.9739 >>> @redhatjobs redhatjobs >>> @redhatjobs >>> >>> >>> >>> >>> On Wed, Dec 18, 2019 at 12:06 PM Tim Gast wrote: >>> >>>> Jonny, >>>> >>>> Sam (copied) got the information from Jay on the upstream packages we >>>> need added to satellite. >>>> >>>> The package names are: >>>> *atlas* >>>> *atlas-devel* >>>> >>>> We didn?t see a Red Hat specific build of it, but there are Centos and >>>> Fedora builds. >>>> Can you snarf these into Satellite? 
>>>> >>>> Thanks, >>>> -Tim >>>> >>>> _______________________________________________ >>> platformONE mailing list >>> platformONE at redhat.com >>> https://www.redhat.com/mailman/listinfo/platformone >>> >> >> >> -- >> >> Michael Holmes, RHCSA >> >> Senior Consultant >> >> Red Hat Remote - Texas >> >> mholmes at redhat.com >> M: 808-780-4877 >> >> > -- Michael Holmes, RHCSA Senior Consultant Red Hat Remote - Texas mholmes at redhat.com M: 808-780-4877 -------------- next part -------------- An HTML attachment was scrubbed... URL: From tmiller at mitre.org Wed Dec 18 20:41:24 2019 From: tmiller at mitre.org (Miller, Timothy J.) Date: Wed, 18 Dec 2019 20:41:24 +0000 Subject: [Platformone] [EXT] Re: IATT Way Ahead Message-ID: > • Is Twistlock in runtime in Prod-B (and what about current Prod)? If not, then it > needs to be. (recommend for RH P1 Team) Twistlock is deployed in up-prod w/ runtime defense enabled. There's no custom content and it's still in learning mode, but running containers are being scanned and runtime events are being generated. The compliance report is so-so but the vulnerability reports are fugly. I'm waiting on access to up-prod-b to verify, but I expect it's the same. > • DCAR S3 Bucket - Validate Proxy in place and no direct external access (recommend for Taylor's DSOP Team) Cybersec needs to be part of this. The DSOP S3 bucket may be ACL'd but it is reachable by anything in the peered VPCs--production-vpc, staging-up-vpc, dev-up-vpc, and up-prod-vpc. > • Need Encryption on open Ports (recommend for RH P1 Team to look into) There's nothing answering on 80 AFAICT, but having 80 open is useful for TLS redirect. If I can get cert-manager off the ground (still working w/ AF PKI SPO on this), 80 is required for the ACME HTTP01 challenge. > •
Need better diagram showing both internal and external ports/protocols right on the > diagram (no IPs or become Classified Document) with encryption, and what's > internal/external to AWS account, VPC, inside/outside cluster, what's public facing > and what's not, application; for IATT focus on what's outside the cluster: what goes > in/out of cluster boundary and identify/define what goes in/out (which team will take > lead?) I might be able to do much of this w/ cloudmapper, but the result is (a) a freakin' eyechart (I need to work on the filtering feature), and (b) intended to be interactive. I might be able to generate a standalone version I can just host from S3. However, there's no way to do this without IP addresses. AWS internal addresses are encoded into the internal DNS name, which has to be reported or nothing makes sense. > • Action Item: Taylor send DSOP scans of apps to Nic, focus on the delta (the findings not covered by UBI) Twistlock can report CVEs by layer, but IDK about compliance. That might be a useful source. -- T From sam at braingu.com Wed Dec 18 20:45:20 2019 From: sam at braingu.com (Samuel James) Date: Wed, 18 Dec 2019 15:45:20 -0500 Subject: [Platformone] AAM upstream dependencies In-Reply-To: References: <62CF2BD3-1FC1-4047-A39C-778B952F8923@braingu.com> Message-ID: Thanks! Jonny, Did that CA get added? Sam On Wed, Dec 18, 2019 at 3:40 PM Michael Holmes wrote: > I've gone ahead and added rhel-7 options to the worker nodes, which has > the devel package. Please run the build job and let me know if it works. > Thank you! > > On Wed, Dec 18, 2019 at 1:46 PM Samuel James wrote: > >> Michael and Jonny, >> >> I think this is the repository they need for the atlas packages: >> http://mirror.centos.org/centos/7/os/x86_64/Packages/ >> >> The build job they are running cluster.unifiedplatform.io >> >> Thanks!
>> Sam >> >> On Wed, Dec 18, 2019 at 2:06 PM Michael Holmes >> wrote: >> >>> Tim, >>> >>> Which machine are you trying to pull these packages on to? >>> >>> On Wed, Dec 18, 2019 at 12:13 PM Jonathan Rickard >>> wrote: >>> >>>> Thanks Tim and Sam - >>>> >>>> Adding +platformONE at redhat.com for >>>> situational awareness. >>>> >>>> >>>> Jonathan Rickard, RHCE, RHCA >>>> >>>> Consulting Architect >>>> >>>> Red Hat Public Sector >>>> >>>> jonny at redhat.com >>>> M: 210.862.9739 >>>> @redhatjobs redhatjobs >>>> @redhatjobs >>>> >>>> >>>> >>>> >>>> On Wed, Dec 18, 2019 at 12:06 PM Tim Gast wrote: >>>> >>>>> Jonny, >>>>> >>>>> Sam (copied) got the information from Jay on the upstream packages we >>>>> need added to satellite. >>>>> >>>>> The package names are: >>>>> *atlas* >>>>> *atlas-devel* >>>>> >>>>> We didn?t see a Red Hat specific build of it, but there are Centos and >>>>> Fedora builds. >>>>> Can you snarf these into Satellite? >>>>> >>>>> Thanks, >>>>> -Tim >>>>> >>>>> _______________________________________________ >>>> platformONE mailing list >>>> platformONE at redhat.com >>>> https://www.redhat.com/mailman/listinfo/platformone >>>> >>> >>> >>> -- >>> >>> Michael Holmes, RHCSA >>> >>> Senior Consultant >>> >>> Red Hat Remote - Texas >>> >>> mholmes at redhat.com >>> M: 808-780-4877 >>> >>> >> > > -- > > Michael Holmes, RHCSA > > Senior Consultant > > Red Hat Remote - Texas > > mholmes at redhat.com > M: 808-780-4877 > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jlastrilla at mitre.org Wed Dec 18 20:45:31 2019 From: jlastrilla at mitre.org (Lastrilla, Jet) Date: Wed, 18 Dec 2019 20:45:31 +0000 Subject: [Platformone] [EXT] Re: IATT Way Ahead In-Reply-To: References: Message-ID: Tim, Good get, but lets focus this thread on UP Prod until we decide to pivot. -----Original Message----- From: Miller, Timothy J. 
Sent: Wednesday, December 18, 2019 2:41 PM To: DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP; Lastrilla, Jet; Kevin O'Donnell; platformONE at redhat.com Cc: Tim Gast; Bubb, Mike; TRAMBLE, ELIJAH Q Capt USAF AFMC AFLCMC/HNC; tj.zimmerman at braingu.com; LOPEZDEURALDE, RICHARD A Lt Col USAF AFMC AFLCMC/HNCP; Blade, Eric D [US] (MS); RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP; Leonard, Michael C.; Feiglstok, Colleen M [US] (MS); REINHARDT, MELISSA A GG-13 USAF AFMC AFLCMC/HNCP Subject: Re: [Platformone] [EXT] Re: IATT Way Ahead > • Is Twistlock in runtime in Prod-B (and what about current Prod)? If > not, then it needs to be. (recommend for RH P1 Team) Twistlock is deployed in up-prod w/ runtime defense enabled. There's no custom content and it's still in learning mode, but running containers are being scanned and runtime events are being generated. The compliance report is so-so but the vulnerability reports are fugly. I'm waiting on access to up-prod-b to verify, but I expect it's the same. > • DCAR S3 Bucket - Validate Proxy in place and no direct external > access (recommend for Taylor's DSOP Team) Cybersec needs to be part of this. The DSOP S3 bucket may be ACL'd but it is reachable by anything in the peered VPCs--production-vpc, staging-up-vpc, dev-up-vpc, and up-prod-vpc. > • Need Encryption on open Ports (recommend for RH P1 Team to look > into) There's nothing answering on 80 AFAICT, but having 80 open is useful for TLS redirect. If I can get cert-manager off the ground (still working w/ AF PKI SPO on this), 80 is required for the ACME HTTP01 challenge. > •
Need better diagram showing both internal and external > ports/protocols right on the diagram (no IPs or become Classified > Document) with encryption, and what's internal/external to AWS > account, VPC, inside/outside cluster, what's public facing and what's > not, application; for IATT focus on what's outside the cluster: what > goes in/out of cluster boundary and identify/define what goes in/out > (which team will take > lead?) I might be able to do much of this w/ cloudmapper, but the result is (a) a freakin' eyechart (I need to work on the filtering feature), and (b) intended to be interactive. I might be able to generate a standalone version I can just host from S3. However, there's no way to do this without IP addresses. AWS internal addresses are encoded into the internal DNS name, which has to be reported or nothing makes sense. > • Action Item: Taylor send DSOP scans of apps to Nic, focus on the > delta (the findings not covered by UBI) Twistlock can report CVEs by layer, but IDK about compliance. That might be a useful source. -- T From tmiller at mitre.org Wed Dec 18 20:50:47 2019 From: tmiller at mitre.org (Miller, Timothy J.) Date: Wed, 18 Dec 2019 20:50:47 +0000 Subject: [Platformone] [EXT] Re: up-prod and up-prod-b In-Reply-To: References: <7b57ae3e561546e39a41aaa6992438da@XCGVAG22.northgrum.com> <23652_1576699345_5DFA85D1_23652_313_1_CAKQ_jXVtDU+AUBwqNE1c6yzje07GG9xPq=G6EVMW2Wiwdphz4A@mail.gmail.com> Message-ID: <85D0AE8A-60A5-4521-BE8D-C8259D34BF99@mitre.org> AFAICT, AAM is missing. Only the aam-ci-cd namespace has any pods running--jenkins and sonarqube. All other namespaces are empty or have only imagestreams. -- T On 12/18/19, 14:25, "platformone-bounces at redhat.com on behalf of Lastrilla, Jet" wrote: Are all the apps landed in UP Prod?
From: Cory McKee
Sent: Wednesday, December 18, 2019 2:02 PM
To: Feiglstok, Colleen M [US] (MS)
Cc: Lastrilla, Jet; BOGUE, STEVEN E CTR USAF AFMC AFLCMC/HNCP; CRISP, JOSHUA M GS-09 USAF AFMC AFLCMC/HNCP; platformONE at redhat.com; Bubb, Mike
Subject: [EXT] Re: [Platformone] up-prod and up-prod-b

I am done doing my scanning so you should be free to do your scans

On Wed, Dec 18, 2019 at 2:47 PM Feiglstok, Colleen M [US] (MS) wrote:

Uploaded the new oscap scans to the confluence page. From first glance, up-prod's numbers are lower than up-prod-b but have the same amount of high findings.

Let me know when I'm free to start testing and which VPC. I think the plan is up-prod today and up-prod-b tomorrow? Or is it the opposite?

Platform1 team - I need the account information, too

_______________________________________________
platformONE mailing list
platformONE at redhat.com
https://www.redhat.com/mailman/listinfo/platformone

--
Cory McKee
Senior Consultant
Red Hat Public Sector
8260 Greensboro Drive
McLean, VA 22102
cmckee at redhat.com
M: 5714090193
TRIED. TESTED. TRUSTED.

From jrickard at redhat.com Wed Dec 18 21:10:18 2019
From: jrickard at redhat.com (Jonathan Rickard)
Date: Wed, 18 Dec 2019 15:10:18 -0600
Subject: [Platformone] AAM upstream dependencies
In-Reply-To:
References: <62CF2BD3-1FC1-4047-A39C-778B952F8923@braingu.com>
Message-ID:

Sam - Sorry for the delay!

Please try now.

Jonathan Rickard, RHCE, RHCA
Consulting Architect
Red Hat Public Sector
jonny at redhat.com
M: 210.862.9739

On Wed, Dec 18, 2019 at 2:45 PM Samuel James wrote:

> Thanks!
>
> Jonny,
> Did that CA get added?
>
> Sam
>
> On Wed, Dec 18, 2019 at 3:40 PM Michael Holmes wrote:
>
>> I've gone ahead and added rhel-7 options to the worker nodes, which has
>> the devel package. Please run the build job and let me know if it works.
>> Thank you!
>> >> On Wed, Dec 18, 2019 at 1:46 PM Samuel James wrote: >> >>> Michael and Jonny, >>> >>> I think this is the repository they need for the atlas packages: >>> http://mirror.centos.org/centos/7/os/x86_64/Packages/ >>> >>> The build job they are running cluster.unifiedplatform.io >>> >>> Thanks! >>> Sam >>> >>> On Wed, Dec 18, 2019 at 2:06 PM Michael Holmes >>> wrote: >>> >>>> Tim, >>>> >>>> Which machine are you trying to pull these packages on to? >>>> >>>> On Wed, Dec 18, 2019 at 12:13 PM Jonathan Rickard >>>> wrote: >>>> >>>>> Thanks Tim and Sam - >>>>> >>>>> Adding +platformONE at redhat.com for >>>>> situational awareness. >>>>> >>>>> >>>>> Jonathan Rickard, RHCE, RHCA >>>>> >>>>> Consulting Architect >>>>> >>>>> Red Hat Public Sector >>>>> >>>>> jonny at redhat.com >>>>> M: 210.862.9739 >>>>> @redhatjobs redhatjobs >>>>> @redhatjobs >>>>> >>>>> >>>>> >>>>> >>>>> On Wed, Dec 18, 2019 at 12:06 PM Tim Gast wrote: >>>>> >>>>>> Jonny, >>>>>> >>>>>> Sam (copied) got the information from Jay on the upstream packages we >>>>>> need added to satellite. >>>>>> >>>>>> The package names are: >>>>>> *atlas* >>>>>> *atlas-devel* >>>>>> >>>>>> We didn?t see a Red Hat specific build of it, but there are Centos >>>>>> and Fedora builds. >>>>>> Can you snarf these into Satellite? >>>>>> >>>>>> Thanks, >>>>>> -Tim >>>>>> >>>>>> _______________________________________________ >>>>> platformONE mailing list >>>>> platformONE at redhat.com >>>>> https://www.redhat.com/mailman/listinfo/platformone >>>>> >>>> >>>> >>>> -- >>>> >>>> Michael Holmes, RHCSA >>>> >>>> Senior Consultant >>>> >>>> Red Hat Remote - Texas >>>> >>>> mholmes at redhat.com >>>> M: 808-780-4877 >>>> >>>> >>> >> >> -- >> >> Michael Holmes, RHCSA >> >> Senior Consultant >> >> Red Hat Remote - Texas >> >> mholmes at redhat.com >> M: 808-780-4877 >> >> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sam at braingu.com Wed Dec 18 21:12:26 2019 From: sam at braingu.com (Samuel James) Date: Wed, 18 Dec 2019 16:12:26 -0500 Subject: [Platformone] AAM upstream dependencies In-Reply-To: References: <62CF2BD3-1FC1-4047-A39C-778B952F8923@braingu.com> Message-ID: Thanks! It seems there's issues with DI2E Bitbucket currently, so Jay won't be able to deploy again until that is resolved. Sam On Wed, Dec 18, 2019 at 4:10 PM Jonathan Rickard wrote: > Sam - Sorry for the delay! > > Please try now. > > Jonathan Rickard, RHCE, RHCA > > Consulting Architect > > Red Hat Public Sector > > jonny at redhat.com > M: 210.862.9739 > @redhatjobs redhatjobs > @redhatjobs > > > > > On Wed, Dec 18, 2019 at 2:45 PM Samuel James wrote: > >> Thanks! >> >> Jonny, >> Did that CA get added? >> >> Sam >> >> On Wed, Dec 18, 2019 at 3:40 PM Michael Holmes >> wrote: >> >>> I've gone ahead and added rhel-7 options to the worker nodes, which has >>> the devel package. Please run the build job and let me know if it works. >>> Thank you! >>> >>> On Wed, Dec 18, 2019 at 1:46 PM Samuel James wrote: >>> >>>> Michael and Jonny, >>>> >>>> I think this is the repository they need for the atlas packages: >>>> http://mirror.centos.org/centos/7/os/x86_64/Packages/ >>>> >>>> The build job they are running cluster.unifiedplatform.io >>>> >>>> Thanks! >>>> Sam >>>> >>>> On Wed, Dec 18, 2019 at 2:06 PM Michael Holmes >>>> wrote: >>>> >>>>> Tim, >>>>> >>>>> Which machine are you trying to pull these packages on to? >>>>> >>>>> On Wed, Dec 18, 2019 at 12:13 PM Jonathan Rickard >>>>> wrote: >>>>> >>>>>> Thanks Tim and Sam - >>>>>> >>>>>> Adding +platformONE at redhat.com for >>>>>> situational awareness. 
>>>>>> >>>>>> >>>>>> Jonathan Rickard, RHCE, RHCA >>>>>> >>>>>> Consulting Architect >>>>>> >>>>>> Red Hat Public Sector >>>>>> >>>>>> jonny at redhat.com >>>>>> M: 210.862.9739 >>>>>> @redhatjobs redhatjobs >>>>>> @redhatjobs >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> On Wed, Dec 18, 2019 at 12:06 PM Tim Gast wrote: >>>>>> >>>>>>> Jonny, >>>>>>> >>>>>>> Sam (copied) got the information from Jay on the upstream packages >>>>>>> we need added to satellite. >>>>>>> >>>>>>> The package names are: >>>>>>> *atlas* >>>>>>> *atlas-devel* >>>>>>> >>>>>>> We didn?t see a Red Hat specific build of it, but there are Centos >>>>>>> and Fedora builds. >>>>>>> Can you snarf these into Satellite? >>>>>>> >>>>>>> Thanks, >>>>>>> -Tim >>>>>>> >>>>>>> _______________________________________________ >>>>>> platformONE mailing list >>>>>> platformONE at redhat.com >>>>>> https://www.redhat.com/mailman/listinfo/platformone >>>>>> >>>>> >>>>> >>>>> -- >>>>> >>>>> Michael Holmes, RHCSA >>>>> >>>>> Senior Consultant >>>>> >>>>> Red Hat Remote - Texas >>>>> >>>>> mholmes at redhat.com >>>>> M: 808-780-4877 >>>>> >>>>> >>>> >>> >>> -- >>> >>> Michael Holmes, RHCSA >>> >>> Senior Consultant >>> >>> Red Hat Remote - Texas >>> >>> mholmes at redhat.com >>> M: 808-780-4877 >>> >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From jrickard at redhat.com Wed Dec 18 21:13:11 2019 From: jrickard at redhat.com (Jonathan Rickard) Date: Wed, 18 Dec 2019 15:13:11 -0600 Subject: [Platformone] AAM upstream dependencies In-Reply-To: References: <62CF2BD3-1FC1-4047-A39C-778B952F8923@braingu.com> Message-ID: Any reason they can't use GITLAB? Jonathan Rickard, RHCE, RHCA Consulting Architect Red Hat Public Sector jonny at redhat.com M: 210.862.9739 @redhatjobs redhatjobs @redhatjobs On Wed, Dec 18, 2019 at 3:12 PM Samuel James wrote: > Thanks! > > It seems there's issues with DI2E Bitbucket currently, so Jay won't be > able to deploy again until that is resolved. 
> > Sam > > On Wed, Dec 18, 2019 at 4:10 PM Jonathan Rickard > wrote: > >> Sam - Sorry for the delay! >> >> Please try now. >> >> Jonathan Rickard, RHCE, RHCA >> >> Consulting Architect >> >> Red Hat Public Sector >> >> jonny at redhat.com >> M: 210.862.9739 >> @redhatjobs redhatjobs >> @redhatjobs >> >> >> >> >> On Wed, Dec 18, 2019 at 2:45 PM Samuel James wrote: >> >>> Thanks! >>> >>> Jonny, >>> Did that CA get added? >>> >>> Sam >>> >>> On Wed, Dec 18, 2019 at 3:40 PM Michael Holmes >>> wrote: >>> >>>> I've gone ahead and added rhel-7 options to the worker nodes, which has >>>> the devel package. Please run the build job and let me know if it works. >>>> Thank you! >>>> >>>> On Wed, Dec 18, 2019 at 1:46 PM Samuel James wrote: >>>> >>>>> Michael and Jonny, >>>>> >>>>> I think this is the repository they need for the atlas packages: >>>>> http://mirror.centos.org/centos/7/os/x86_64/Packages/ >>>>> >>>>> The build job they are running cluster.unifiedplatform.io >>>>> >>>>> Thanks! >>>>> Sam >>>>> >>>>> On Wed, Dec 18, 2019 at 2:06 PM Michael Holmes >>>>> wrote: >>>>> >>>>>> Tim, >>>>>> >>>>>> Which machine are you trying to pull these packages on to? >>>>>> >>>>>> On Wed, Dec 18, 2019 at 12:13 PM Jonathan Rickard < >>>>>> jrickard at redhat.com> wrote: >>>>>> >>>>>>> Thanks Tim and Sam - >>>>>>> >>>>>>> Adding +platformONE at redhat.com for >>>>>>> situational awareness. >>>>>>> >>>>>>> >>>>>>> Jonathan Rickard, RHCE, RHCA >>>>>>> >>>>>>> Consulting Architect >>>>>>> >>>>>>> Red Hat Public Sector >>>>>>> >>>>>>> jonny at redhat.com >>>>>>> M: 210.862.9739 >>>>>>> @redhatjobs redhatjobs >>>>>>> @redhatjobs >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Wed, Dec 18, 2019 at 12:06 PM Tim Gast wrote: >>>>>>> >>>>>>>> Jonny, >>>>>>>> >>>>>>>> Sam (copied) got the information from Jay on the upstream packages >>>>>>>> we need added to satellite. 
>>>>>>>> >>>>>>>> The package names are: >>>>>>>> *atlas* >>>>>>>> *atlas-devel* >>>>>>>> >>>>>>>> We didn?t see a Red Hat specific build of it, but there are Centos >>>>>>>> and Fedora builds. >>>>>>>> Can you snarf these into Satellite? >>>>>>>> >>>>>>>> Thanks, >>>>>>>> -Tim >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>> platformONE mailing list >>>>>>> platformONE at redhat.com >>>>>>> https://www.redhat.com/mailman/listinfo/platformone >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> >>>>>> Michael Holmes, RHCSA >>>>>> >>>>>> Senior Consultant >>>>>> >>>>>> Red Hat Remote - Texas >>>>>> >>>>>> mholmes at redhat.com >>>>>> M: 808-780-4877 >>>>>> >>>>>> >>>>> >>>> >>>> -- >>>> >>>> Michael Holmes, RHCSA >>>> >>>> Senior Consultant >>>> >>>> Red Hat Remote - Texas >>>> >>>> mholmes at redhat.com >>>> M: 808-780-4877 >>>> >>>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From sam at braingu.com Wed Dec 18 21:15:37 2019 From: sam at braingu.com (Samuel James) Date: Wed, 18 Dec 2019 16:15:37 -0500 Subject: [Platformone] AAM upstream dependencies In-Reply-To: References: <62CF2BD3-1FC1-4047-A39C-778B952F8923@braingu.com> Message-ID: As far as I'm aware we are planning on moving them to a Gitlab instance, but currently their project is in DI2E. Sam On Wed, Dec 18, 2019 at 4:13 PM Jonathan Rickard wrote: > Any reason they can't use GITLAB? > > Jonathan Rickard, RHCE, RHCA > > Consulting Architect > > Red Hat Public Sector > > jonny at redhat.com > M: 210.862.9739 > @redhatjobs redhatjobs > @redhatjobs > > > > > On Wed, Dec 18, 2019 at 3:12 PM Samuel James wrote: > >> Thanks! >> >> It seems there's issues with DI2E Bitbucket currently, so Jay won't be >> able to deploy again until that is resolved. >> >> Sam >> >> On Wed, Dec 18, 2019 at 4:10 PM Jonathan Rickard >> wrote: >> >>> Sam - Sorry for the delay! >>> >>> Please try now. 
>>> >>> Jonathan Rickard, RHCE, RHCA >>> >>> Consulting Architect >>> >>> Red Hat Public Sector >>> >>> jonny at redhat.com >>> M: 210.862.9739 >>> @redhatjobs redhatjobs >>> @redhatjobs >>> >>> >>> >>> >>> On Wed, Dec 18, 2019 at 2:45 PM Samuel James wrote: >>> >>>> Thanks! >>>> >>>> Jonny, >>>> Did that CA get added? >>>> >>>> Sam >>>> >>>> On Wed, Dec 18, 2019 at 3:40 PM Michael Holmes >>>> wrote: >>>> >>>>> I've gone ahead and added rhel-7 options to the worker nodes, which >>>>> has the devel package. Please run the build job and let me know if it >>>>> works. Thank you! >>>>> >>>>> On Wed, Dec 18, 2019 at 1:46 PM Samuel James wrote: >>>>> >>>>>> Michael and Jonny, >>>>>> >>>>>> I think this is the repository they need for the atlas packages: >>>>>> http://mirror.centos.org/centos/7/os/x86_64/Packages/ >>>>>> >>>>>> The build job they are running cluster.unifiedplatform.io >>>>>> >>>>>> Thanks! >>>>>> Sam >>>>>> >>>>>> On Wed, Dec 18, 2019 at 2:06 PM Michael Holmes >>>>>> wrote: >>>>>> >>>>>>> Tim, >>>>>>> >>>>>>> Which machine are you trying to pull these packages on to? >>>>>>> >>>>>>> On Wed, Dec 18, 2019 at 12:13 PM Jonathan Rickard < >>>>>>> jrickard at redhat.com> wrote: >>>>>>> >>>>>>>> Thanks Tim and Sam - >>>>>>>> >>>>>>>> Adding +platformONE at redhat.com for >>>>>>>> situational awareness. >>>>>>>> >>>>>>>> >>>>>>>> Jonathan Rickard, RHCE, RHCA >>>>>>>> >>>>>>>> Consulting Architect >>>>>>>> >>>>>>>> Red Hat Public Sector >>>>>>>> >>>>>>>> jonny at redhat.com >>>>>>>> M: 210.862.9739 >>>>>>>> @redhatjobs redhatjobs >>>>>>>> @redhatjobs >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Wed, Dec 18, 2019 at 12:06 PM Tim Gast wrote: >>>>>>>> >>>>>>>>> Jonny, >>>>>>>>> >>>>>>>>> Sam (copied) got the information from Jay on the upstream packages >>>>>>>>> we need added to satellite. 
>>>>>>>>> >>>>>>>>> The package names are: >>>>>>>>> *atlas* >>>>>>>>> *atlas-devel* >>>>>>>>> >>>>>>>>> We didn?t see a Red Hat specific build of it, but there are Centos >>>>>>>>> and Fedora builds. >>>>>>>>> Can you snarf these into Satellite? >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> -Tim >>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>> platformONE mailing list >>>>>>>> platformONE at redhat.com >>>>>>>> https://www.redhat.com/mailman/listinfo/platformone >>>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> >>>>>>> Michael Holmes, RHCSA >>>>>>> >>>>>>> Senior Consultant >>>>>>> >>>>>>> Red Hat Remote - Texas >>>>>>> >>>>>>> mholmes at redhat.com >>>>>>> M: 808-780-4877 >>>>>>> >>>>>>> >>>>>> >>>>> >>>>> -- >>>>> >>>>> Michael Holmes, RHCSA >>>>> >>>>> Senior Consultant >>>>> >>>>> Red Hat Remote - Texas >>>>> >>>>> mholmes at redhat.com >>>>> M: 808-780-4877 >>>>> >>>>> >>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From jlastrilla at mitre.org Wed Dec 18 21:32:05 2019 From: jlastrilla at mitre.org (Lastrilla, Jet) Date: Wed, 18 Dec 2019 21:32:05 +0000 Subject: [Platformone] [EXT] Re: IATT Way Ahead In-Reply-To: References: Message-ID: All: Great job to the collective team on getting this done together! Here are the actions, in order, that need to be completed: 1. Complete AAM build in UP Prod. Blocker being worked by RH and Gu 2. Colleen scans UP Prod VPC in conjunction with Taylor scanning the VPC 3. Identify delta between testing provided earlier this week and new environment scans 4. Update external interface diagram per Nic's request (no dependencies on others on this list) 5. Send updated IATT package to Nic/Lauren. Let me know if you have any questions. R/Jet 619-508-5888 -----Original Message----- From: Miller, Timothy J. 
Sent: Wednesday, December 18, 2019 2:41 PM
To: DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP; Lastrilla, Jet; Kevin O'Donnell; platformONE at redhat.com
Cc: Tim Gast; Bubb, Mike; TRAMBLE, ELIJAH Q Capt USAF AFMC AFLCMC/HNC; tj.zimmerman at braingu.com; LOPEZDEURALDE, RICHARD A Lt Col USAF AFMC AFLCMC/HNCP; Blade, Eric D [US] (MS); RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP; Leonard, Michael C.; Feiglstok, Colleen M [US] (MS); REINHARDT, MELISSA A GG-13 USAF AFMC AFLCMC/HNCP
Subject: Re: [Platformone] [EXT] Re: IATT Way Ahead

> - Is Twistlock in runtime in Prod-B (and what about current Prod)? If
> not, then it needs to be. (recommend for RH P1 Team)

Twistlock is deployed in up-prod w/ runtime defense enabled. There's no custom content and it's still in learning mode, but running containers are being scanned and runtime events are being generated. The compliance report is so-so but the vulnerability reports are fugly. I'm waiting on access to up-prod-b to verify, but I expect it's the same.

> - DCAR S3 Bucket - Validate Proxy in place and no direct external
> access (recommend for Taylor's DSOP Team)

Cybersec needs to be part of this. The DSOP S3 bucket may be ACL'd but it is reachable by anything in the peered VPCs--production-vpc, staging-up-vpc, dev-up-vpc, and up-prod-vpc.

> - Need Encryption on open Ports (recommend for RH P1 Team to look
> into)

There's nothing answering on 80 AFAICT, but having 80 open is useful for TLS redirect. If I can get cert-manager off the ground (still working w/ AF PKI SPO on this), 80 is required for the ACME HTTP01 challenge.

> - Need better diagram showing both internal and external
> ports/protocols right on the diagram (no IPs or become Classified
> Document) with encryption, and what's internal/external to AWS
> account, VPC, inside/outside cluster, what's public facing and what's
> not, application; for IATT focus on what's outside the cluster - what
> goes in/out of cluster boundary and identify/define what goes in/out
> (which team will take
> lead?)

I might be able to do much of this w/ cloudmapper, but the result is (a) a freakin' eyechart (I need to work on the filtering feature), and (b) intended to be interactive. I might be able to generate a standalone version I can just host from S3. However, there's no way to do this without IP addresses. AWS internal addresses are encoded into the internal DNS name, which has to be reported or nothing makes sense.

> - Action Item: Taylor send DSOP scans of apps to Nic, focus on the
> delta (the findings not covered by UBI)

Twistlock can report CVEs by layer, but IDK about compliance. That might be a useful source.

-- T

From jlastrilla at mitre.org Wed Dec 18 21:35:05 2019
From: jlastrilla at mitre.org (Lastrilla, Jet)
Date: Wed, 18 Dec 2019 21:35:05 +0000
Subject: [Platformone] [EXT] Re: IATT Way Ahead
In-Reply-To:
References:
Message-ID:

UPDATE. Colleen/Taylor, please proceed with scans. Let the team know when you are complete.

R/Jet

-----Original Message-----
From: Lastrilla, Jet
Sent: Wednesday, December 18, 2019 3:32 PM
To: Miller, Timothy J.; DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP; Kevin O'Donnell; platformONE at redhat.com
Cc: Tim Gast; Bubb, Mike; TRAMBLE, ELIJAH Q Capt USAF AFMC AFLCMC/HNC; tj.zimmerman at braingu.com; LOPEZDEURALDE, RICHARD A Lt Col USAF AFMC AFLCMC/HNCP; Blade, Eric D [US] (MS); RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP; Leonard, Michael C.; Feiglstok, Colleen M [US] (MS); REINHARDT, MELISSA A GG-13 USAF AFMC AFLCMC/HNCP
Subject: RE: [Platformone] [EXT] Re: IATT Way Ahead

All:

Great job to the collective team on getting this done together! Here are the actions, in order, that need to be completed:

1. Complete AAM build in UP Prod. Blocker being worked by RH and Gu
2. Colleen scans UP Prod VPC in conjunction with Taylor scanning the VPC
3. Identify delta between testing provided earlier this week and new environment scans
4. Update external interface diagram per Nic's request (no dependencies on others on this list)
5. Send updated IATT package to Nic/Lauren.

Let me know if you have any questions.

R/Jet
619-508-5888

-----Original Message-----
From: Miller, Timothy J.
Sent: Wednesday, December 18, 2019 2:41 PM
To: DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP; Lastrilla, Jet; Kevin O'Donnell; platformONE at redhat.com
Cc: Tim Gast; Bubb, Mike; TRAMBLE, ELIJAH Q Capt USAF AFMC AFLCMC/HNC; tj.zimmerman at braingu.com; LOPEZDEURALDE, RICHARD A Lt Col USAF AFMC AFLCMC/HNCP; Blade, Eric D [US] (MS); RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP; Leonard, Michael C.; Feiglstok, Colleen M [US] (MS); REINHARDT, MELISSA A GG-13 USAF AFMC AFLCMC/HNCP
Subject: Re: [Platformone] [EXT] Re: IATT Way Ahead

> - Is Twistlock in runtime in Prod-B (and what about current Prod)? If
> not, then it needs to be. (recommend for RH P1 Team)

Twistlock is deployed in up-prod w/ runtime defense enabled. There's no custom content and it's still in learning mode, but running containers are being scanned and runtime events are being generated. The compliance report is so-so but the vulnerability reports are fugly. I'm waiting on access to up-prod-b to verify, but I expect it's the same.

> - DCAR S3 Bucket - Validate Proxy in place and no direct external
> access (recommend for Taylor's DSOP Team)

Cybersec needs to be part of this. The DSOP S3 bucket may be ACL'd but it is reachable by anything in the peered VPCs--production-vpc, staging-up-vpc, dev-up-vpc, and up-prod-vpc.

> - Need Encryption on open Ports (recommend for RH P1 Team to look
> into)

There's nothing answering on 80 AFAICT, but having 80 open is useful for TLS redirect. If I can get cert-manager off the ground (still working w/ AF PKI SPO on this), 80 is required for the ACME HTTP01 challenge.

> - Need better diagram showing both internal and external
> ports/protocols right on the diagram (no IPs or become Classified
> Document) with encryption, and what's internal/external to AWS
> account, VPC, inside/outside cluster, what's public facing and what's
> not, application; for IATT focus on what's outside the cluster - what
> goes in/out of cluster boundary and identify/define what goes in/out
> (which team will take
> lead?)

I might be able to do much of this w/ cloudmapper, but the result is (a) a freakin' eyechart (I need to work on the filtering feature), and (b) intended to be interactive. I might be able to generate a standalone version I can just host from S3. However, there's no way to do this without IP addresses. AWS internal addresses are encoded into the internal DNS name, which has to be reported or nothing makes sense.

> - Action Item: Taylor send DSOP scans of apps to Nic, focus on the
> delta (the findings not covered by UBI)

Twistlock can report CVEs by layer, but IDK about compliance. That might be a useful source.

-- T

From austen.bryan.1 at us.af.mil Thu Dec 19 02:34:53 2019
From: austen.bryan.1 at us.af.mil (BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP)
Date: Thu, 19 Dec 2019 02:34:53 +0000
Subject: [Platformone] [EXT] Re: IATT Way Ahead
In-Reply-To:
References:
Message-ID:

Jet,

Thanks for the quick rundown. Really excited by the team's progress. We need to capture the goodness in this coordination from the past few days and institutionalize it. Not sure when, where and who yet but we need to track that too.
In #2, did you mean Taylor scan the apps again since the latest round of code changes and builds?

Is the task complete, or is it at a later date, to whitelist DODIN IP addresses?

Are we tagging up again tomorrow? If so, please put something on the calendar for end of day-ish.

Lastly, we should track an open action for a scheduled meeting with Ms K. Nic and I continue to reach out.

-Austen

-----Original Message-----
From: platformone-bounces at redhat.com On Behalf Of Lastrilla, Jet
Sent: Wednesday, December 18, 2019 3:32 PM
To: Miller, Timothy J.; DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP; Kevin O'Donnell; platformONE at redhat.com
Cc: Tim Gast; Bubb, Mike; TRAMBLE, ELIJAH Q Capt USAF AFMC AFLCMC/HNC; tj.zimmerman at braingu.com; LOPEZDEURALDE, RICHARD A Lt Col USAF AFMC AFLCMC/HNCP; Blade, Eric D [US] (MS); RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP; Leonard, Michael C.; Feiglstok, Colleen M [US] (MS); REINHARDT, MELISSA A GG-13 USAF AFMC AFLCMC/HNCP
Subject: [Non-DoD Source] Re: [Platformone] [EXT] Re: IATT Way Ahead

All:

Great job to the collective team on getting this done together! Here are the actions, in order, that need to be completed:

1. Complete AAM build in UP Prod. Blocker being worked by RH and Gu
2. Colleen scans UP Prod VPC in conjunction with Taylor scanning the VPC
3. Identify delta between testing provided earlier this week and new environment scans
4. Update external interface diagram per Nic's request (no dependencies on others on this list)
5. Send updated IATT package to Nic/Lauren.

Let me know if you have any questions.

R/Jet
619-508-5888

-----Original Message-----
From: Miller, Timothy J.
Sent: Wednesday, December 18, 2019 2:41 PM
To: DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP; Lastrilla, Jet; Kevin O'Donnell; platformONE at redhat.com
Cc: Tim Gast; Bubb, Mike; TRAMBLE, ELIJAH Q Capt USAF AFMC AFLCMC/HNC; tj.zimmerman at braingu.com; LOPEZDEURALDE, RICHARD A Lt Col USAF AFMC AFLCMC/HNCP; Blade, Eric D [US] (MS); RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP; Leonard, Michael C.; Feiglstok, Colleen M [US] (MS); REINHARDT, MELISSA A GG-13 USAF AFMC AFLCMC/HNCP
Subject: Re: [Platformone] [EXT] Re: IATT Way Ahead

> - Is Twistlock in runtime in Prod-B (and what about current Prod)? If
> not, then it needs to be. (recommend for RH P1 Team)

Twistlock is deployed in up-prod w/ runtime defense enabled. There's no custom content and it's still in learning mode, but running containers are being scanned and runtime events are being generated. The compliance report is so-so but the vulnerability reports are fugly. I'm waiting on access to up-prod-b to verify, but I expect it's the same.

> - DCAR S3 Bucket - Validate Proxy in place and no direct external
> access (recommend for Taylor's DSOP Team)

Cybersec needs to be part of this. The DSOP S3 bucket may be ACL'd but it is reachable by anything in the peered VPCs--production-vpc, staging-up-vpc, dev-up-vpc, and up-prod-vpc.

> - Need Encryption on open Ports (recommend for RH P1 Team to look
> into)

There's nothing answering on 80 AFAICT, but having 80 open is useful for TLS redirect. If I can get cert-manager off the ground (still working w/ AF PKI SPO on this), 80 is required for the ACME HTTP01 challenge.

> - Need better diagram showing both internal and external
> ports/protocols right on the diagram (no IPs or become Classified
> Document) with encryption, and what's internal/external to AWS
> account, VPC, inside/outside cluster, what's public facing and what's
> not, application; for IATT focus on what's outside the cluster - what
> goes in/out of cluster boundary and identify/define what goes in/out
> (which team will take
> lead?)

I might be able to do much of this w/ cloudmapper, but the result is (a) a freakin' eyechart (I need to work on the filtering feature), and (b) intended to be interactive. I might be able to generate a standalone version I can just host from S3. However, there's no way to do this without IP addresses. AWS internal addresses are encoded into the internal DNS name, which has to be reported or nothing makes sense.

> - Action Item: Taylor send DSOP scans of apps to Nic, focus on the
> delta (the findings not covered by UBI)

Twistlock can report CVEs by layer, but IDK about compliance. That might be a useful source.

-- T

_______________________________________________
platformONE mailing list
platformONE at redhat.com
https://www.redhat.com/mailman/listinfo/platformone
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 5444 bytes
Desc: not available
URL:

From jlastrilla at mitre.org Thu Dec 19 02:42:49 2019
From: jlastrilla at mitre.org (Lastrilla, Jet)
Date: Thu, 19 Dec 2019 02:42:49 +0000
Subject: [Platformone] [EXT] Re: IATT Way Ahead
In-Reply-To:
References: ,
Message-ID:

Austen,

It was exciting to see the progress and all the collaboration between the platform and app teams. One of the things we mentioned to Nic is the current RMF requires scans in the production environment. Taylor needs to rescan the apps while inside the UP prod VPC to show that there is no change between stages of build. This was the proof of DSOP that I told Nic we needed.
We still need to work on the ingress whitelisting from DODIN.

I don't have a tag up scheduled, but I'll put one in the calendar for tomorrow afternoon. I will be out of pocket in the afternoon for a preschool play. Tina would kill me if I missed it.

Get Outlook for iOS
________________________________
From: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP
Sent: Wednesday, December 18, 2019 8:37 PM
To: Lastrilla, Jet; Miller, Timothy J.; DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP; Kevin O'Donnell; platformONE at redhat.com
Cc: Tim Gast; Bubb, Mike; TRAMBLE, ELIJAH Q Capt USAF AFMC AFLCMC/HNC; tj.zimmerman at braingu.com; LOPEZDEURALDE, RICHARD A Lt Col USAF AFMC AFLCMC/HNCP; Blade, Eric D [US] (MS); RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP; Leonard, Michael C.; Feiglstok, Colleen M [US] (MS); REINHARDT, MELISSA A GG-13 USAF AFMC AFLCMC/HNCP
Subject: RE: [Platformone] [EXT] Re: IATT Way Ahead

Jet,

Thanks for the quick rundown. Really excited by the team's progress. We need to capture the goodness in this coordination from the past few days and institutionalize it. Not sure when, where and who yet but we need to track that too.

In #2, did you mean Taylor scan the apps again since the latest round of code changes and builds?

Is the task complete, or is it at a later date, to whitelist DODIN IP addresses?

Are we tagging up again tomorrow? If so, please put something on the calendar for end of day-ish.

Lastly, we should track an open action for a scheduled meeting with Ms K. Nic and I continue to reach out.

-Austen

-----Original Message-----
From: platformone-bounces at redhat.com On Behalf Of Lastrilla, Jet
Sent: Wednesday, December 18, 2019 3:32 PM
To: Miller, Timothy J.; DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP; Kevin O'Donnell; platformONE at redhat.com
Cc: Tim Gast; Bubb, Mike; TRAMBLE, ELIJAH Q Capt USAF AFMC AFLCMC/HNC; tj.zimmerman at braingu.com; LOPEZDEURALDE, RICHARD A Lt Col USAF AFMC AFLCMC/HNCP; Blade, Eric D [US] (MS); RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP; Leonard, Michael C.; Feiglstok, Colleen M [US] (MS); REINHARDT, MELISSA A GG-13 USAF AFMC AFLCMC/HNCP
Subject: [Non-DoD Source] Re: [Platformone] [EXT] Re: IATT Way Ahead

All:

Great job to the collective team on getting this done together! Here are the actions, in order, that need to be completed:

1. Complete AAM build in UP Prod. Blocker being worked by RH and Gu
2. Colleen scans UP Prod VPC in conjunction with Taylor scanning the VPC
3. Identify delta between testing provided earlier this week and new environment scans
4. Update external interface diagram per Nic's request (no dependencies on others on this list)
5. Send updated IATT package to Nic/Lauren.

Let me know if you have any questions.

R/Jet
619-508-5888

-----Original Message-----
From: Miller, Timothy J.
Sent: Wednesday, December 18, 2019 2:41 PM
To: DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP; Lastrilla, Jet; Kevin O'Donnell; platformONE at redhat.com
Cc: Tim Gast; Bubb, Mike; TRAMBLE, ELIJAH Q Capt USAF AFMC AFLCMC/HNC; tj.zimmerman at braingu.com; LOPEZDEURALDE, RICHARD A Lt Col USAF AFMC AFLCMC/HNCP; Blade, Eric D [US] (MS); RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP; Leonard, Michael C.; Feiglstok, Colleen M [US] (MS); REINHARDT, MELISSA A GG-13 USAF AFMC AFLCMC/HNCP
Subject: Re: [Platformone] [EXT] Re: IATT Way Ahead

> - Is Twistlock in runtime in Prod-B (and what about current Prod)? If
> not, then it needs to be. (recommend for RH P1 Team)

Twistlock is deployed in up-prod w/ runtime defense enabled. There's no custom content and it's still in learning mode, but running containers are being scanned and runtime events are being generated. The compliance report is so-so but the vulnerability reports are fugly. I'm waiting on access to up-prod-b to verify, but I expect it's the same.

> - DCAR S3 Bucket - Validate Proxy in place and no direct external
> access (recommend for Taylor's DSOP Team)

Cybersec needs to be part of this. The DSOP S3 bucket may be ACL'd but it is reachable by anything in the peered VPCs--production-vpc, staging-up-vpc, dev-up-vpc, and up-prod-vpc.

> - Need Encryption on open Ports (recommend for RH P1 Team to look
> into)

There's nothing answering on 80 AFAICT, but having 80 open is useful for TLS redirect. If I can get cert-manager off the ground (still working w/ AF PKI SPO on this), 80 is required for the ACME HTTP01 challenge.

> - Need better diagram showing both internal and external
> ports/protocols right on the diagram (no IPs or become Classified
> Document) with encryption, and what's internal/external to AWS
> account, VPC, inside/outside cluster, what's public facing and what's
> not, application; for IATT focus on what's outside the cluster - what
> goes in/out of cluster boundary and identify/define what goes in/out
> (which team will take
> lead?)

I might be able to do much of this w/ cloudmapper, but the result is (a) a freakin' eyechart (I need to work on the filtering feature), and (b) intended to be interactive. I might be able to generate a standalone version I can just host from S3. However, there's no way to do this without IP addresses. AWS internal addresses are encoded into the internal DNS name, which has to be reported or nothing makes sense.

> - Action Item: Taylor send DSOP scans of apps to Nic, focus on the
> delta (the findings not covered by UBI)

Twistlock can report CVEs by layer, but IDK about compliance. That might be a useful source.
-- T _______________________________________________ platformONE mailing list platformONE at redhat.com https://www.redhat.com/mailman/listinfo/platformone -------------- next part -------------- An HTML attachment was scrubbed... URL: From jrickard at redhat.com Thu Dec 19 13:27:26 2019 From: jrickard at redhat.com (Jonathan Rickard) Date: Thu, 19 Dec 2019 07:27:26 -0600 Subject: [Platformone] The Unicorn Project Message-ID: Good Morning, Sorry to spam you - but I thought this would be a good share - One of our Architects shared this gem today - the Unicorn Project is free today on Amazon: https://www.amazon.com/gp/product/B078Y98RG8/ref=as_li_tl?ie=UTF8&tag=itrevpre-20&camp=1789&creative=9325&linkCode=as2&creativeASIN=B078Y98RG8&linkId=deec657dd0b2215d33db7c148a712446 If you enjoyed The Phoenix Project - you will enjoy this book. The timing is paralleled to Phoenix Project but is from the perspective of a Developer before, during and post Phoenix Project deployment. thanks, jonny Jonathan Rickard, RHCE, RHCA Consulting Architect Red Hat Public Sector jonny at redhat.com M: 210.862.9739 @redhatjobs redhatjobs @redhatjobs -------------- next part -------------- An HTML attachment was scrubbed... URL: From jrickard at redhat.com Thu Dec 19 13:30:43 2019 From: jrickard at redhat.com (Jonathan Rickard) Date: Thu, 19 Dec 2019 07:30:43 -0600 Subject: [Platformone] The Unicorn Project In-Reply-To: References: Message-ID: LOL - The link he shared was for the Phoenix Project - that's what I get for blindly copy/pasting The audio book for Unicorn Project is free (with audible trial) here - link verified :) Sorry about that! 
jonny Jonathan Rickard, RHCE, RHCA Consulting Architect Red Hat Public Sector jonny at redhat.com M: 210.862.9739 @redhatjobs redhatjobs @redhatjobs On Thu, Dec 19, 2019 at 7:27 AM Jonathan Rickard wrote: > Good Morning, > > Sorry to spam you - but I thought this would be a good share - > > One of our Architects shared this gem today - the Unicorn Project is free > today on Amazon: > https://www.amazon.com/gp/product/B078Y98RG8/ref=as_li_tl?ie=UTF8&tag=itrevpre-20&camp=1789&creative=9325&linkCode=as2&creativeASIN=B078Y98RG8&linkId=deec657dd0b2215d33db7c148a712446 > > If you enjoyed The Phoenix Project - you will enjoy this book. The timing > is paralleled to Phoenix Project but is from the perspective of a Developer > before, during and post Phoenix Project deployment. > > thanks, > jonny > > Jonathan Rickard, RHCE, RHCA > > Consulting Architect > > Red Hat Public Sector > > jonny at redhat.com > M: 210.862.9739 > @redhatjobs redhatjobs > @redhatjobs > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tmiller at mitre.org Thu Dec 19 14:56:20 2019 From: tmiller at mitre.org (Miller, Timothy J.) Date: Thu, 19 Dec 2019 14:56:20 +0000 Subject: [Platformone] [EXT] Re: IATT Way Ahead In-Reply-To: References: Message-ID: Do we have tooling to do deltas between OVAL results files? This would be nice to have. -- T On 12/18/19, 20:42, "Lastrilla, Jet" wrote: Austen, It was exciting to see the progress and all the collaboration between the platform and app teams. One of the things we mentioned to Nic is that the current RMF requires scans in the production environment. Taylor needs to rescan the apps while inside the UP prod VPC to show that there is no change between stages of build. This was the proof of DSOP that I told Nic we needed. We still need to work on the ingress whitelisting from DODIN. I don't have a tag up scheduled, but I'll put one in the calendar for tomorrow afternoon.
I will be out of pocket in the afternoon for a preschool play. Tina would kill me if I missed it. ________________________________________ From: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP Sent: Wednesday, December 18, 2019 8:37 PM To: Lastrilla, Jet; Miller, Timothy J.; DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP; Kevin O'Donnell; platformONE at redhat.com Cc: Tim Gast; Bubb, Mike; TRAMBLE, ELIJAH Q Capt USAF AFMC AFLCMC/HNC; tj.zimmerman at braingu.com; LOPEZDEURALDE, RICHARD A Lt Col USAF AFMC AFLCMC/HNCP; Blade, Eric D [US] (MS); RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP; Leonard, Michael C.; Feiglstok, Colleen M [US] (MS); REINHARDT, MELISSA A GG-13 USAF AFMC AFLCMC/HNCP Subject: RE: [Platformone] [EXT] Re: IATT Way Ahead Jet, Thanks for the quick rundown. Really excited by the team's progress. We need to capture the goodness in this coordination from the past few days and institutionalize it. Not sure when, where and who yet but we need to track that too. In #2, did you mean that Taylor should scan the apps again since the latest round of code changes and builds? Is the task to whitelist DODIN IP addresses complete, or is it scheduled for a later date? Are we tagging up again tomorrow? If so, please put something on the calendar for end of day-ish. Lastly, we should track an open action for a scheduled meeting with Ms K. Nic and I continue to reach out. -Austen -----Original Message----- From: platformone-bounces at redhat.com On Behalf Of Lastrilla, Jet Sent: Wednesday, December 18, 2019 3:32 PM To: Miller, Timothy J.; DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP; Kevin O'Donnell; platformONE at redhat.com Cc: Tim Gast; Bubb, Mike; TRAMBLE, ELIJAH Q Capt USAF AFMC AFLCMC/HNC; tj.zimmerman at braingu.com; LOPEZDEURALDE, RICHARD A Lt Col USAF AFMC AFLCMC/HNCP; Blade, Eric D [US] (MS); RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP; Leonard, Michael C.
; Feiglstok, Colleen M [US] (MS) ; REINHARDT, MELISSA A GG-13 USAF AFMC AFLCMC/HNCP Subject: [Non-DoD Source] Re: [Platformone] [EXT] Re: IATT Way Ahead All: Great job to the collective team on getting this done together! Here are the actions, in order, that need to be completed: 1. Complete AAM build in UP Prod. Blocker being worked by RH and Gu 2. Colleen scans UP Prod VPC in conjunction with Taylor scanning the VPC 3. Identify delta between testing provided earlier this week and new environment scans 4. Update external interface diagram per Nic's request (no dependencies on others on this list) 5. Send updated IATT package to Nic/Lauren. Let me know if you have any questions. R/Jet 619-508-5888 -----Original Message----- From: Miller, Timothy J. Sent: Wednesday, December 18, 2019 2:41 PM To: DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP ; Lastrilla, Jet ; Kevin O'Donnell ; platformONE at redhat.com Cc: Tim Gast ; Bubb, Mike ; TRAMBLE, ELIJAH Q Capt USAF AFMC AFLCMC/HNC ; tj.zimmerman at braingu.com; LOPEZDEURALDE, RICHARD A Lt Col USAF AFMC AFLCMC/HNCP ; Blade, Eric D [US] (MS) ; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP ; Leonard, Michael C. ; Feiglstok, Colleen M [US] (MS) ; REINHARDT, MELISSA A GG-13 USAF AFMC AFLCMC/HNCP Subject: Re: [Platformone] [EXT] Re: IATT Way Ahead > " Is Twistlock in runtime in Prod-B (and what about current Prod)? If > not, then it needs to be. (recommend for RH P1 Team) Twistlock is deployed in up-prod w/ runtime defense enabled. There's no custom content and it's still in learning mode, but running containers are being scanned and runtime events are being generated. The compliance report is so-so but the vulnerability reports are fugly. I'm waiting on access to up-prod-b to verify, but I expect it's the same. > " DCAR S3 Bucket  Validate Proxy in place and no direct external > access (recommend for Taylor s DSOP Team) Cybersec needs to be part of this. 
The DSOP S3 bucket may be ACL'd but it is reachable by anything in the peered VPCs--production-vpc, staging-up-vpc, dev-up-vpc, and up-prod-vpc. > " Need Encryption on open Ports (recommend for RH P1 Team to look > into) There's nothing answering on 80 AFAICT, but having 80 open is useful for TLS redirect. If I can get cert-manager off the ground (still working w/ AF PKI SPO on this), 80 is required for the ACME HTTP01 challenge. > " Need better diagram showing both internal and external > ports/protocols right on the diagram (no IPs or become Classified > Document) with encryption, and what s internal/external to AWS > account, VPC, inside/outside cluster, what s public facing and what s > not, application; for IATT focus on what s outside the cluster what > goes in/out of cluster boundary and identify/define what goes in/out > (which team will take > lead?) I might be able to do much of this w/ cloudmapper, but the result is (a) a freakin' eyechart (I need to work on the filtering feature), and (b) intended to be interactive. I might be able to generate a standalone version I can just host from S3. However, there's no way to do this without IP addresses. AWS internal addresses are encoded into the internal DNS name, which has to be reported or nothing makes sense. > " Action Item: Taylor send DSOP scans of apps to Nic, focus on the > delta (the findings not covered by UBI) Twistlock can report CVEs by layer, but IDK about compliance. That might be a useful source. -- T _______________________________________________ platformONE mailing list platformONE at redhat.com https://www.redhat.com/mailman/listinfo/platformone From mnissley at redhat.com Thu Dec 19 15:02:02 2019 From: mnissley at redhat.com (Mark Nissley) Date: Thu, 19 Dec 2019 09:02:02 -0600 Subject: [Platformone] Switch over to new AWS RBAC Message-ID: Team - There is some urgency to switch over to the new AWS RBAC model. While this holds some risk, it is necessary and must be done as soon as possible. 
We will not initiate this action until AFTER the meeting with Nic tomorrow in which we will review the IATT package. If significant and immediate actions come out of that we will reevaluate the RBAC timeline. In the meantime, please take a few minutes to review the notes and actions below from Adrian, who is leading that effort: *"Unfortunately the state of the AWS Account and Users are going to require me to disable accounts and have people call me to set their new account up. There are a lot of users without an MFA, never logged in, no point of contact email, no tags, and I think some users are no longer on the program. * *What we can do is leave Jonny and Chris *[Chris is on PTO so I'd propose Dino in this spot as he will be working over the break]* with full admin temporarily in case there is something needed by the Platform1 team in the short term. I can also set a target date of 3 Jan to remove their full admin privileges. As full admins they will be able to utilize any AWS service needed by the Platform1 Team. I would ask them to not change anything in IAM as far as the RBAC is concerned. If the Platform1 team needs access to a specific AWS service I can review and possibly change their permissions in the RBAC policies."* PLEASE NOTE: this will require action from everyone to set up a new account. Mark NISSLEY, PMP, CSM, LEAN PROGRAM MaNAGER & SR technical Project Manager North American Consulting, Public Sector M: 850-530-3234 *Scheduled PTO: Dec 23 - Jan 03* -------------- next part -------------- An HTML attachment was scrubbed... URL: From adrian.nunez at bylight.com Thu Dec 19 15:17:54 2019 From: adrian.nunez at bylight.com (Adrian Nunez) Date: Thu, 19 Dec 2019 15:17:54 +0000 Subject: [Platformone] Switch over to new AWS RBAC In-Reply-To: References: Message-ID: I am going to disable everyone's AWS accounts after the meeting Friday. Please contact me in order to set up your account again. This setup will include MFA and POC information. 
My contact info is Adrian.nunez at bylight.com and my phone is 571-230-5289. Please be ready to answer the following questions: - What team are you on? - What is your Role in UP? - What will your daily activities consist of? - How long will you need access? - What is your experience with AWS? V/R Adrian ________________________________ From: Mark Nissley Sent: Thursday, December 19, 2019 10:02:02 AM To: platformONE at redhat.com Cc: Adrian Nunez Subject: Switch over to new AWS RBAC [EXTERNAL EMAIL] Team - There is some urgency to switch over to the new AWS RBAC model. While this holds some risk, it is necessary and must be done as soon as possible. We will not initiate this action until AFTER the meeting with Nic tomorrow in which we will review the IATT package. If significant and immediate actions come out of that we will reevaluate the RBAC timeline. In the meantime, please take a few minutes to review the notes and actions below from Adrian, who is leading that effort: "Unfortunately the state of the AWS Account and Users are going to require me to disable accounts and have people call me to set their new account up. There are a lot of users without an MFA, never logged in, no point of contact email, no tags, and I think some users are no longer on the program. What we can do is leave Jonny and Chris [Chris is on PTO so I'd propose Dino in this spot as he will be working over the break] with full admin temporarily in case there is something needed by the Platform1 team in the short term. I can also set a target date of 3 Jan to remove their full admin privileges. As full admins they will be able to utilize any AWS service needed by the Platform1 Team. I would ask them to not change anything in IAM as far as the RBAC is concerned. If the Platform1 team needs access to a specific AWS service I can review and possibly change their permissions in the RBAC policies." PLEASE NOTE: this will require action from everyone to set up a new account.
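[Editor's sketch, not part of the original thread] The account review described in the quoted note (users with no MFA, or who have never logged in) could be approximated from an IAM credential report. This is a minimal sketch, assuming the report has already been downloaded and decoded to CSV text, for example via `aws iam generate-credential-report` followed by `aws iam get-credential-report`. The column names used are those of the AWS credential report; the function name `flag_users` and the sample users are hypothetical, and nothing here is claimed to be the tooling actually used on the program. It does not look at tags, point-of-contact data, or access keys.

```python
import csv
import io


def flag_users(report_csv: str) -> dict[str, list[str]]:
    """Return {user: [reasons]} for console users with no MFA or no recorded login.

    Assumes report_csv is the decoded text of an IAM credential report.
    Users without a console password are skipped; their access keys are
    NOT examined here.
    """
    flagged: dict[str, list[str]] = {}
    for row in csv.DictReader(io.StringIO(report_csv)):
        if row["user"] == "<root_account>":
            continue  # the root account row is reported differently
        if row.get("password_enabled") != "true":
            continue  # no console password, so MFA/login checks do not apply
        reasons = []
        if row.get("mfa_active") != "true":
            reasons.append("no MFA")
        if row.get("password_last_used") == "no_information":
            reasons.append("never logged in")
        if reasons:
            flagged[row["user"]] = reasons
    return flagged


# Hypothetical sample rows; a real report has many more columns.
SAMPLE = """user,password_enabled,password_last_used,mfa_active
alice,true,2019-12-01T10:00:00+00:00,true
bob,true,no_information,false
carol,true,2019-11-20T08:30:00+00:00,false
"""

if __name__ == "__main__":
    for user, reasons in flag_users(SAMPLE).items():
        print(f"{user}: {', '.join(reasons)}")
```

A list like this would only be a starting point for deciding which accounts to follow up on; whether a flagged user is still on the program has to be confirmed with the user, as the thread describes.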
Mark NISSLEY, PMP, CSM, LEAN PROGRAM MaNAGER & SR technical Project Manager North American Consulting, Public Sector M: 850-530-3234 [https://marketing-outfit-prod-images.s3-us-west-2.amazonaws.com/f5445ae0c9ddafd5b2f1836854d7416a/Logo-RedHat-Email.png] Scheduled PTO: Dec 23 - Jan 03 This communication (including any attachments) may contain information that is proprietary, confidential or exempt from disclosure. If you are not the intended recipient, please note that further dissemination, distribution, use or copying of this communication is strictly prohibited. Anyone who received this message in error should notify the sender immediately by telephone or by return email and delete it from his or her computer. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mnissley at redhat.com Thu Dec 19 15:29:06 2019 From: mnissley at redhat.com (Mark Nissley) Date: Thu, 19 Dec 2019 09:29:06 -0600 Subject: [Platformone] Switch over to new AWS RBAC In-Reply-To: References: Message-ID: Adrian - Could team members go ahead and call you in advance to get their new roles set up? Mark NISSLEY, PMP, CSM, LEAN PROGRAM MaNAGER & SR technical Project Manager North American Consulting, Public Sector M: 850-530-3234 *Scheduled PTO: Dec 23 - Jan 03* On Thu, Dec 19, 2019 at 9:18 AM Adrian Nunez wrote: > I am going to disable everyone's AWS accounts after the meeting Friday. > Please contact me in order to set up your account again. This setup will > include MFA and POC information. My contact info is > Adrian.nunez at bylight.com and my phone is 571-230-5289. > > > Please be ready to answer the following questions: > > ? What team are you on? > > ? What is your Role in UP? > > ? What will your daily activities consist of? > > ? How long will you need access? > > ? What is your experience with AWS? 
> > V/R > Adrian > > Get Outlook for Android > ------------------------------ > *From:* Mark Nissley > *Sent:* Thursday, December 19, 2019 10:02:02 AM > *To:* platformONE at redhat.com > *Cc:* Adrian Nunez > *Subject:* Switch over to new AWS RBAC > > > [EXTERNAL EMAIL] > Team - > > There is some urgency to switch over to the new AWS RBAC model. While this > holds some risk, it is necessary and must be done as soon as possible. We > will not initiate this action until AFTER the meeting with Nic tomorrow in > which we will review the IATT package. If significant and immediate actions > come out of that we will reevaluate the RBAC timeline. > > In the meantime, please take a few minutes to review the notes and actions > below from Adrian, who is leading that effort: > > *"Unfortunately the state of the AWS Account and Users are going to > require me to disable accounts and have people call me to set their new > account up. There are a lot of users without an MFA, never logged in, no > point of contact email, no tags, and I think some users are no longer on > the program. * > > *What we can do is leave Jonny and Chris *[Chris is on PTO so I'd propose > Dino in this spot as he will be working over the break]* with full admin > temporarily in case there is something needed by the Platform1 team in the > short term. I can also set a target date of 3 Jan to remove their full > admin privileges. As full admins they will be able to utilize any AWS > service needed by the Platform1 Team. I would ask them to not change > anything in IAM as far as the RBAC is concerned. If the Platform1 team > needs access to a specific AWS service I can review and possibly change > their permissions in the RBAC policies."* > > > PLEASE NOTE: this will require action from everyone to set up a new > account. 
> > > Mark NISSLEY, PMP, CSM, LEAN > > PROGRAM MaNAGER & SR technical Project Manager > > North American Consulting, Public Sector > > > M: 850-530-3234 > > > > *Scheduled PTO: Dec 23 - Jan 03* > > This communication (including any attachments) may contain information > that is proprietary, confidential or exempt from disclosure. If you are not > the intended recipient, please note that further dissemination, > distribution, use or copying of this communication is strictly prohibited. > Anyone who received this message in error should notify the sender > immediately by telephone or by return email and delete it from his or her > computer. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From adrian.nunez at bylight.com Thu Dec 19 15:45:32 2019 From: adrian.nunez at bylight.com (Adrian Nunez) Date: Thu, 19 Dec 2019 15:45:32 +0000 Subject: [Platformone] Switch over to new AWS RBAC In-Reply-To: References: , Message-ID: I'd rather wait until I disable access to all the accounts. I see a lot of people that I suspect are no longer on the program. By disabling all the accounts those people will be purged. We are better off waiting until tomorrow. Get Outlook for Android ________________________________ From: Mark Nissley Sent: Thursday, December 19, 2019 10:29:06 AM To: Adrian Nunez Cc: platformONE at redhat.com ; LASTRILLA, JETHRO S CTR USAF AFMC AFLCMC/HNCP ; BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP ; DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP ; Goss, Andrew [Semper Valens Solutions (SVS)] ; CRISP, JOSHUA M GS-09 USAF AFMC AFLCMC/HNCP Subject: Re: Switch over to new AWS RBAC [EXTERNAL EMAIL] Adrian - Could team members go ahead and call you in advance to get their new roles set up? 
Mark NISSLEY, PMP, CSM, LEAN PROGRAM MaNAGER & SR technical Project Manager North American Consulting, Public Sector M: 850-530-3234 [https://marketing-outfit-prod-images.s3-us-west-2.amazonaws.com/f5445ae0c9ddafd5b2f1836854d7416a/Logo-RedHat-Email.png] Scheduled PTO: Dec 23 - Jan 03 On Thu, Dec 19, 2019 at 9:18 AM Adrian Nunez > wrote: I am going to disable everyone's AWS accounts after the meeting Friday. Please contact me in order to set up your account again. This setup will include MFA and POC information. My contact info is Adrian.nunez at bylight.com and my phone is 571-230-5289. Please be ready to answer the following questions: ? What team are you on? ? What is your Role in UP? ? What will your daily activities consist of? ? How long will you need access? ? What is your experience with AWS? V/R Adrian Get Outlook for Android ________________________________ From: Mark Nissley > Sent: Thursday, December 19, 2019 10:02:02 AM To: platformONE at redhat.com > Cc: Adrian Nunez > Subject: Switch over to new AWS RBAC [EXTERNAL EMAIL] Team - There is some urgency to switch over to the new AWS RBAC model. While this holds some risk, it is necessary and must be done as soon as possible. We will not initiate this action until AFTER the meeting with Nic tomorrow in which we will review the IATT package. If significant and immediate actions come out of that we will reevaluate the RBAC timeline. In the meantime, please take a few minutes to review the notes and actions below from Adrian, who is leading that effort: "Unfortunately the state of the AWS Account and Users are going to require me to disable accounts and have people call me to set their new account up. There are a lot of users without an MFA, never logged in, no point of contact email, no tags, and I think some users are no longer on the program. 
What we can do is leave Jonny and Chris [Chris is on PTO so I'd propose Dino in this spot as he will be working over the break] with full admin temporarily in case there is something needed by the Platform1 team in the short term. I can also set a target date of 3 Jan to remove their full admin privileges. As full admins they will be able to utilize any AWS service needed by the Platform1 Team. I would ask them to not change anything in IAM as far as the RBAC is concerned. If the Platform1 team needs access to a specific AWS service I can review and possibly change their permissions in the RBAC policies." PLEASE NOTE: this will require action from everyone to set up a new account. Mark NISSLEY, PMP, CSM, LEAN PROGRAM MaNAGER & SR technical Project Manager North American Consulting, Public Sector M: 850-530-3234 [https://marketing-outfit-prod-images.s3-us-west-2.amazonaws.com/f5445ae0c9ddafd5b2f1836854d7416a/Logo-RedHat-Email.png] Scheduled PTO: Dec 23 - Jan 03 This communication (including any attachments) may contain information that is proprietary, confidential or exempt from disclosure. If you are not the intended recipient, please note that further dissemination, distribution, use or copying of this communication is strictly prohibited. Anyone who received this message in error should notify the sender immediately by telephone or by return email and delete it from his or her computer. This communication (including any attachments) may contain information that is proprietary, confidential or exempt from disclosure. If you are not the intended recipient, please note that further dissemination, distribution, use or copying of this communication is strictly prohibited. Anyone who received this message in error should notify the sender immediately by telephone or by return email and delete it from his or her computer. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From taylor at redhat.com Thu Dec 19 17:41:47 2019 From: taylor at redhat.com (Taylor Biggs) Date: Thu, 19 Dec 2019 12:41:47 -0500 Subject: [Platformone] [EXT] Re: IATT Way Ahead In-Reply-To: References: Message-ID: Built into the pipeline as Jenkins/Python, but pretty much only for the purposes of comparing to whitelists. Would be very nice to have a tool to use ad-hoc. Thanks, Taylor ---- Taylor Biggs taylor at redhat.com 850-449-2220 On Thu, Dec 19, 2019 at 10:02 AM Miller, Timothy J. wrote: > Do we have tooling to do deltas between OVAL results files? This would be > nice to have. > > -- T > > ?On 12/18/19, 20:42, "Lastrilla, Jet" wrote: > > > Austen, > > > It was exciting to see the progress and all the collaboration between > the platform and app teams. One of the things we mentioned to Nic is the > current RMf requires scans in the production environment. Taylor needs to > rescan the apps while inside the UP prod VPc to show that there is no > change between stages of build. This was the proof of DSOP that I told Nic > we needed. > > > We still need to work on the ingress waitlisting from DODIN. I don?t > have a tag up scheduled, but I?ll put one in the calendar for tomorrow > afternoon. I will be out of pocket in the afternoon for a preschool play. > Tina > would kill me if I missed it. 
> Get > Outlook for iOS > > ________________________________________ > From: BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP < > austen.bryan.1 at us.af.mil> > Sent: Wednesday, December 18, 2019 8:37 PM > To: Lastrilla, Jet; Miller, Timothy J.; DIROCCO, ROGER E GG-13 USAF > AFMC ESC/AFLCMC/HNCP; Kevin O'Donnell; platformONE at redhat.com > Cc: Tim Gast; Bubb, Mike; TRAMBLE, ELIJAH Q Capt USAF AFMC AFLCMC/HNC; > tj.zimmerman at braingu.com; LOPEZDEURALDE, RICHARD A Lt Col USAF AFMC > AFLCMC/HNCP; Blade, Eric D [US] (MS); RAMIREZ, JOSE A CTR USAF AFMC > AFLCMC/HNCP; Leonard, Michael C.; Feiglstok, > Colleen M [US] (MS); REINHARDT, MELISSA A GG-13 USAF AFMC AFLCMC/HNCP > Subject: RE: [Platformone] [EXT] Re: IATT Way Ahead > > > Jet, Thanks for the quick rundown. Really excited by the teams > progress. We need to capture the goodness in this coordination from the > past few days and instutionalize it. Not sure when, where and who yet but > we need to track that too. In #2, did you mean Taylor > scan the apps again since the latest round of code changes and > builds? Is the task complete, or is it at a later date, to whitelist DODIN > IP addresses? Are we tagging up again tomorrow? If so, please put something > on the calendar for end of day-ish. Lastly, > we should track an open action for a scheduled meeting with Ms K. Nic > and I continue to reach out. -Austen -----Original Message----- From: > platformone-bounces at redhat.com On > Behalf Of Lastrilla, Jet Sent: Wednesday, December > 18, 2019 3:32 PM To: Miller, Timothy J. ; > DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP roger.dirocco.4 at us.af.mil>; Kevin O'Donnell ; > platformONE at redhat.com Cc: Tim Gast ; Bubb, Mike mbubb at mitre.org>; > TRAMBLE, ELIJAH Q Capt USAF AFMC AFLCMC/HNC elijah.tramble.1 at us.af.mil>; tj.zimmerman at braingu.com; LOPEZDEURALDE, > RICHARD A Lt Col USAF AFMC AFLCMC/HNCP ; > Blade, Eric D [US] (MS) ; RAMIREZ, JOSE A CTR > USAF AFMC AFLCMC/HNCP ; Leonard, > Michael C. 
; Feiglstok, Colleen M [US] (MS) Colleen.Feiglstok at ngc.com>; REINHARDT, MELISSA A GG-13 USAF AFMC > AFLCMC/HNCP Subject: [Non-DoD > Source] Re: [Platformone] [EXT] Re: IATT Way Ahead All: Great job to > the collective team on getting this done together! Here are the actions, in > order, that need to be completed: 1. Complete AAM build in UP Prod. Blocker > being worked by RH and Gu 2. Colleen > scans UP Prod VPC in conjunction with Taylor scanning the VPC 3. > Identify delta between testing provided earlier this week and new > environment scans 4. Update external interface diagram per Nic's request > (no dependencies on others on this list) 5. Send updated > IATT package to Nic/Lauren. Let me know if you have any questions. > R/Jet 619-508-5888 -----Original Message----- From: Miller, Timothy J. tmiller at mitre.org> Sent: Wednesday, December 18, 2019 2:41 PM To: > DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP roger.dirocco.4 at us.af.mil>; > Lastrilla, Jet ; Kevin O'Donnell kodonnel at redhat.com>; platformONE at redhat.com Cc: Tim Gast ; > Bubb, Mike ; TRAMBLE, ELIJAH Q Capt USAF AFMC > AFLCMC/HNC ; tj.zimmerman at braingu.com; > LOPEZDEURALDE, RICHARD A Lt Col USAF AFMC AFLCMC/HNCP richard.lopezdeuralde at us.af.mil>; Blade, Eric D [US] (MS) Eric.Blade at ngc.com>; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP jose.ramirez.50.ctr at us.af.mil>; Leonard, Michael C. ; > Feiglstok, Colleen M [US] (MS) ; > REINHARDT, MELISSA A GG-13 USAF AFMC AFLCMC/HNCP melissa.reinhardt.2 at us.af.mil> Subject: Re: [Platformone] [EXT] Re: IATT > Way Ahead > " Is Twistlock in runtime in Prod-B (and what about current > Prod)? If > not, then it needs to be. (recommend for RH P1 Team) > Twistlock is deployed in up-prod w/ runtime defense enabled. There's no > custom content and it's still in learning mode, but running containers are > being scanned and runtime events are being generated. > The compliance report is so-so but the vulnerability reports are > fugly. 
I'm waiting on access to up-prod-b to verify, but I expect it's the > same. > " DCAR S3 Bucket Validate Proxy in place and no direct external > > access (recommend for Taylor s DSOP Team) > Cybersec needs to be part of this. The DSOP S3 bucket may be ACL'd > but it is reachable by anything in the peered VPCs--production-vpc, > staging-up-vpc, dev-up-vpc, and up-prod-vpc. > " Need Encryption on open > Ports (recommend for RH P1 Team to look > into) > There's nothing answering on 80 AFAICT, but having 80 open is useful > for TLS redirect. If I can get cert-manager off the ground (still working > w/ AF PKI SPO on this), 80 is required for the ACME HTTP01 challenge. > " > Need better diagram showing both internal > and external > ports/protocols right on the diagram (no IPs or become > Classified > Document) with encryption, and what s internal/external to > AWS > account, VPC, inside/outside cluster, what s public facing and what > s > not, application; for IATT focus > on what s outside the cluster what > goes in/out of cluster > boundary and identify/define what goes in/out > (which team will take > > lead?) I might be able to do much of this w/ cloudmapper, but the result is > (a) a freakin' eyechart (I need to work on the > filtering feature), and (b) intended to be interactive. I might be > able to generate a standalone version I can just host from S3. However, > there's no way to do this without IP addresses. AWS internal addresses are > encoded into the internal DNS name, which > has to be reported or nothing makes sense. > " Action Item: Taylor > send DSOP scans of apps to Nic, focus on the > delta (the findings not > covered by UBI) Twistlock can report CVEs by layer, but IDK about > compliance. That might be a useful source. 
-- T > _______________________________________________ > platformONE mailing list platformONE at redhat.com > https://www.redhat.com/mailman/listinfo/platformone > > > > > > _______________________________________________ > platformONE mailing list > platformONE at redhat.com > https://www.redhat.com/mailman/listinfo/platformone > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jrickard at redhat.com Thu Dec 19 18:56:14 2019 From: jrickard at redhat.com (Jonathan Rickard) Date: Thu, 19 Dec 2019 12:56:14 -0600 Subject: [Platformone] Anchore in Unified-Platform Message-ID: Keegan / Hayden / Khary, Jay reported that he's been creating user accounts/groups within anchore ( unified-platform.io) but they're not there any longer. Would one of you guys please take a look? Thanks, jonny Jonathan Rickard, RHCE, RHCA Consulting Architect Red Hat Public Sector jonny at redhat.com M: 210.862.9739 @redhatjobs redhatjobs @redhatjobs -------------- next part -------------- An HTML attachment was scrubbed... URL: From kmendez at redhat.com Thu Dec 19 19:45:05 2019 From: kmendez at redhat.com (Khary Mendez) Date: Thu, 19 Dec 2019 14:45:05 -0500 Subject: [Platformone] Anchore in Unified-Platform In-Reply-To: References: Message-ID: Thanks Jonny - we are actively investigating Khary A. Mendez, RHCA (150-047-298) Senior Principal Consultant Red Hat Public Sector khary at redhat.com M: (240)888-9170 On Thu, Dec 19, 2019 at 1:56 PM Jonathan Rickard wrote: > Keegan / Hayden / Khary, > > Jay reported that he's been creating user accounts/groups within anchore ( > unified-platform.io) but they're not there any longer. Would one of you > guys please take a look? 
> > Thanks, > jonny > > Jonathan Rickard, RHCE, RHCA > > Consulting Architect > > Red Hat Public Sector > > jonny at redhat.com > M: 210.862.9739 > @redhatjobs redhatjobs > @redhatjobs > > > _______________________________________________ > platformONE mailing list > platformONE at redhat.com > https://www.redhat.com/mailman/listinfo/platformone > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mnissley at redhat.com Thu Dec 19 20:17:17 2019 From: mnissley at redhat.com (Mark Nissley) Date: Thu, 19 Dec 2019 14:17:17 -0600 Subject: [Platformone] Update on IATT Scans and POA&M Message-ID: Team - Could someone provide a quick update on the readiness for this afternoon's meeting with Nic? are we good to go? Mark NISSLEY, PMP, CSM, LEAN PROGRAM MaNAGER & SR technical Project Manager North American Consulting, Public Sector M: 850-530-3234 *Scheduled PTO: Dec 23 - Jan 03* -------------- next part -------------- An HTML attachment was scrubbed... URL: From steven.bogue.1.ctr at us.af.mil Thu Dec 19 20:22:52 2019 From: steven.bogue.1.ctr at us.af.mil (BOGUE, STEVEN E CTR USAF AFMC AFLCMC/HNCP) Date: Thu, 19 Dec 2019 20:22:52 +0000 Subject: [Platformone] [Non-DoD Source] Update on IATT Scans and POA&M In-Reply-To: References: Message-ID: I?m still waiting on the latest POA&M from the scans. From: Mark Nissley Sent: Thursday, December 19, 2019 2:17 PM To: platformONE at redhat.com; Lastrilla, Jet (jlastrilla at mitre.org) ; BOGUE, STEVEN E CTR USAF AFMC AFLCMC/HNCP Subject: [Non-DoD Source] Update on IATT Scans and POA&M Team - Could someone provide a quick update on the readiness for this afternoon's meeting with Nic? are we good to go? 
Mark NISSLEY, PMP, CSM, LEAN PROGRAM MaNAGER & SR technical Project Manager North American Consulting, Public Sector M: 850-530-3234 [https://marketing-outfit-prod-images.s3-us-west-2.amazonaws.com/f5445ae0c9ddafd5b2f1836854d7416a/Logo-RedHat-Email.png] Scheduled PTO: Dec 23 - Jan 03 -------------- next part -------------- An HTML attachment was scrubbed... URL: From mnissley at redhat.com Thu Dec 19 20:24:03 2019 From: mnissley at redhat.com (Mark Nissley) Date: Thu, 19 Dec 2019 14:24:03 -0600 Subject: [Platformone] [Non-DoD Source] Update on IATT Scans and POA&M In-Reply-To: References: Message-ID: Who is responsible for compiling this latest POA&M? Mark NISSLEY, PMP, CSM, LEAN PROGRAM MaNAGER & SR technical Project Manager North American Consulting, Public Sector M: 850-530-3234 *Scheduled PTO: Dec 23 - Jan 03* On Thu, Dec 19, 2019 at 2:23 PM BOGUE, STEVEN E CTR USAF AFMC AFLCMC/HNCP < steven.bogue.1.ctr at us.af.mil> wrote: > I'm still waiting on the latest POA&M from the scans. > > > > *From:* Mark Nissley > *Sent:* Thursday, December 19, 2019 2:17 PM > *To:* platformONE at redhat.com; Lastrilla, Jet (jlastrilla at mitre.org) < > jlastrilla at mitre.org>; BOGUE, STEVEN E CTR USAF AFMC AFLCMC/HNCP < > steven.bogue.1.ctr at us.af.mil> > *Subject:* [Non-DoD Source] Update on IATT Scans and POA&M > > > > Team - > > > > Could someone provide a quick update on the readiness for this afternoon's > meeting with Nic? are we good to go? > > > > *Mark NISSLEY*, PMP, CSM, LEAN > > PROGRAM MaNAGER & SR technical Project Manager > > North American Consulting, Public Sector > > > M: 850-530-3234 > > > > *Scheduled PTO: Dec 23 - Jan 03* > -------------- next part -------------- An HTML attachment was scrubbed...
URL: From austen.bryan.1 at us.af.mil Thu Dec 19 20:30:36 2019 From: austen.bryan.1 at us.af.mil (BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP) Date: Thu, 19 Dec 2019 20:30:36 +0000 Subject: [Platformone] [Non-DoD Source] Update on IATT Scans and POA&M In-Reply-To: References: Message-ID: A non-text attachment was scrubbed... Name: smime.p7m Type: application/pkcs7-mime Size: 14316 bytes Desc: not available URL: From steven.bogue.1.ctr at us.af.mil Thu Dec 19 20:31:43 2019 From: steven.bogue.1.ctr at us.af.mil (BOGUE, STEVEN E CTR USAF AFMC AFLCMC/HNCP) Date: Thu, 19 Dec 2019 20:31:43 +0000 Subject: [Platformone] [Non-DoD Source] Update on IATT Scans and POA&M In-Reply-To: References: Message-ID: Cory was working it, but he's traveling. I think Brenna may be working it as well. From: Mark Nissley Sent: Thursday, December 19, 2019 2:24 PM To: BOGUE, STEVEN E CTR USAF AFMC AFLCMC/HNCP Cc: platformONE at redhat.com; Lastrilla, Jet (jlastrilla at mitre.org) Subject: Re: [Non-DoD Source] Update on IATT Scans and POA&M Who is responsible for compiling this latest POA&M? Mark NISSLEY, PMP, CSM, LEAN PROGRAM MaNAGER & SR technical Project Manager North American Consulting, Public Sector M: 850-530-3234 Scheduled PTO: Dec 23 - Jan 03 On Thu, Dec 19, 2019 at 2:23 PM BOGUE, STEVEN E CTR USAF AFMC AFLCMC/HNCP wrote: I'm still waiting on the latest POA&M from the scans. From: Mark Nissley Sent: Thursday, December 19, 2019 2:17 PM To: platformONE at redhat.com; Lastrilla, Jet (jlastrilla at mitre.org); BOGUE, STEVEN E CTR USAF AFMC AFLCMC/HNCP Subject: [Non-DoD Source] Update on IATT Scans and POA&M Team - Could someone provide a quick update on the readiness for this afternoon's meeting with Nic? are we good to go?
Mark NISSLEY, PMP, CSM, LEAN PROGRAM MaNAGER & SR technical Project Manager North American Consulting, Public Sector M: 850-530-3234 [https://marketing-outfit-prod-images.s3-us-west-2.amazonaws.com/f5445ae0c9ddafd5b2f1836854d7416a/Logo-RedHat-Email.png] Scheduled PTO: Dec 23 - Jan 03 -------------- next part -------------- An HTML attachment was scrubbed... URL: From roger.dirocco.4 at us.af.mil Thu Dec 19 20:33:49 2019 From: roger.dirocco.4 at us.af.mil (DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP) Date: Thu, 19 Dec 2019 20:33:49 +0000 Subject: [Platformone] [Non-DoD Source] Update on IATT Scans and POA&M In-Reply-To: References: Message-ID: A non-text attachment was scrubbed... Name: smime.p7m Type: application/pkcs7-mime Size: 14825 bytes Desc: not available URL: From bgordon at redhat.com Thu Dec 19 20:52:46 2019 From: bgordon at redhat.com (Brenna Gordon) Date: Thu, 19 Dec 2019 15:52:46 -0500 Subject: [Platformone] [Non-DoD Source] Update on IATT Scans and POA&M In-Reply-To: References: Message-ID: Steven, Have there been updated scans run today? If so, can we get a copy? Most recent I have is from yesterday and the POA&Ms for those are already in draft. Highs: - Two completed/false positives - One not applicable - Two outstanding: AV and FIPS in GRUB2 Thanks, Brenna On Thu, Dec 19, 2019 at 3:32 PM BOGUE, STEVEN E CTR USAF AFMC AFLCMC/HNCP < steven.bogue.1.ctr at us.af.mil> wrote: > Cory was working it, but he?s traveling. I think Brenna may be working it > as well. > > > > *From:* Mark Nissley > *Sent:* Thursday, December 19, 2019 2:24 PM > *To:* BOGUE, STEVEN E CTR USAF AFMC AFLCMC/HNCP < > steven.bogue.1.ctr at us.af.mil> > *Cc:* platformONE at redhat.com; Lastrilla, Jet (jlastrilla at mitre.org) < > jlastrilla at mitre.org> > *Subject:* Re: [Non-DoD Source] Update on IATT Scans and POA&M > > > > Who is responsible for compiling this latest POA&M? 
> > > > *Mark NISSLEY*, PMP, CSM, LEAN > > PROGRAM MaNAGER & SR technical Project Manager > > North American Consulting, Public Sector > > > M: 850-530-3234 > > > > *Scheduled PTO: Dec 23 - Jan 03* > > > > > > On Thu, Dec 19, 2019 at 2:23 PM BOGUE, STEVEN E CTR USAF AFMC AFLCMC/HNCP < > steven.bogue.1.ctr at us.af.mil> wrote: > > I'm still waiting on the latest POA&M from the scans. > > > > *From:* Mark Nissley > *Sent:* Thursday, December 19, 2019 2:17 PM > *To:* platformONE at redhat.com; Lastrilla, Jet (jlastrilla at mitre.org) < > jlastrilla at mitre.org>; BOGUE, STEVEN E CTR USAF AFMC AFLCMC/HNCP < > steven.bogue.1.ctr at us.af.mil> > *Subject:* [Non-DoD Source] Update on IATT Scans and POA&M > > > > Team - > > > > Could someone provide a quick update on the readiness for this afternoon's > meeting with Nic? are we good to go? > > > > *Mark NISSLEY*, PMP, CSM, LEAN > > PROGRAM MaNAGER & SR technical Project Manager > > North American Consulting, Public Sector > > > M: 850-530-3234 > > > > *Scheduled PTO: Dec 23 - Jan 03* > > _______________________________________________ > platformONE mailing list > platformONE at redhat.com > https://www.redhat.com/mailman/listinfo/platformone > -- Brenna Gordon Client Manager, NAPS Red Hat bgordon at redhat.com M: 703-650-8755 -------------- next part -------------- An HTML attachment was scrubbed... URL: From steven.bogue.1.ctr at us.af.mil Thu Dec 19 21:39:41 2019 From: steven.bogue.1.ctr at us.af.mil (BOGUE, STEVEN E CTR USAF AFMC AFLCMC/HNCP) Date: Thu, 19 Dec 2019 21:39:41 +0000 Subject: [Platformone] [Non-DoD Source] Update on IATT Scans and POA&M In-Reply-To: References: Message-ID: This is the POA&M that I have. I just want to make sure I have the latest version.
Steve From: Brenna Gordon Sent: Thursday, December 19, 2019 2:53 PM To: BOGUE, STEVEN E CTR USAF AFMC AFLCMC/HNCP Cc: Mark Nissley; platformONE at redhat.com; Lastrilla, Jet (jlastrilla at mitre.org) Subject: Re: [Platformone] [Non-DoD Source] Update on IATT Scans and POA&M Steven, Have there been updated scans run today? If so, can we get a copy? Most recent I have is from yesterday and the POA&Ms for those are already in draft. Highs: * Two completed/false positives * One not applicable * Two outstanding: AV and FIPS in GRUB2 Thanks, Brenna On Thu, Dec 19, 2019 at 3:32 PM BOGUE, STEVEN E CTR USAF AFMC AFLCMC/HNCP wrote: Cory was working it, but he's traveling. I think Brenna may be working it as well. From: Mark Nissley Sent: Thursday, December 19, 2019 2:24 PM To: BOGUE, STEVEN E CTR USAF AFMC AFLCMC/HNCP Cc: platformONE at redhat.com; Lastrilla, Jet (jlastrilla at mitre.org) Subject: Re: [Non-DoD Source] Update on IATT Scans and POA&M Who is responsible for compiling this latest POA&M? Mark NISSLEY, PMP, CSM, LEAN PROGRAM MaNAGER & SR technical Project Manager North American Consulting, Public Sector M: 850-530-3234 Scheduled PTO: Dec 23 - Jan 03 On Thu, Dec 19, 2019 at 2:23 PM BOGUE, STEVEN E CTR USAF AFMC AFLCMC/HNCP wrote: I'm still waiting on the latest POA&M from the scans. From: Mark Nissley Sent: Thursday, December 19, 2019 2:17 PM To: platformONE at redhat.com; Lastrilla, Jet (jlastrilla at mitre.org); BOGUE, STEVEN E CTR USAF AFMC AFLCMC/HNCP Subject: [Non-DoD Source] Update on IATT Scans and POA&M Team - Could someone provide a quick update on the readiness for this afternoon's meeting with Nic? are we good to go? Mark NISSLEY, PMP, CSM, LEAN PROGRAM MaNAGER & SR technical Project Manager North American Consulting, Public Sector M: 850-530-3234
Scheduled PTO: Dec 23 - Jan 03 _______________________________________________ platformONE mailing list platformONE at redhat.com https://www.redhat.com/mailman/listinfo/platformone -- Brenna Gordon Client Manager, NAPS Red Hat bgordon at redhat.com M: 703-650-8755 [Image removed by sender.] -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.jpg Type: image/jpeg Size: 440 bytes Desc: image001.jpg URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: UP-PROD-POA&M .xlsx Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet Size: 115051 bytes Desc: UP-PROD-POA&M .xlsx URL: From Colleen.Feiglstok at ngc.com Thu Dec 19 22:35:15 2019 From: Colleen.Feiglstok at ngc.com (Feiglstok, Colleen M [US] (MS)) Date: Thu, 19 Dec 2019 22:35:15 +0000 Subject: [Platformone] Platform1 SAR Message-ID: <666ebf0de51644a89e809ab03bd8be27@XCGVAG22.northgrum.com> All, The SAR and raw results from the new security testing will be sent through NGSafe in a few moments. As usual, I felt very rushed with the testing, and feel like I have not done as thorough of a job as required. I was unable to log into the Web UIs, as no one from the Platform1 team gave me the account information. I had issues with Nessus, so the CVE's were found through OSCAP this time. A lot is the same as the last report, but please read through it, because there is some new information. I had to test as ec2-user again, which is another big issue that needs to be resolved ASAP. The more I use it and find out how it is being used, the more extremely concerned I am. It has multiple keys throughout the platform located in the .ssh directory, one of which is world readable. On some hosts, a real user is using the ec2-user account to create accounts, groups, and pull docker files. 
The account is non-attributable, so we have no way of knowing who is doing this. Someone could do serious damage with no consequence. I understand that the ec2-user is needed for standing up an ec2-image, but this account should only be used for implementing IAC, so that the changes implemented by ec2-user are codified. If manual admin is required, that IAC should provision the appropriate attributable accounts, and those accounts should be used from then on. In my opinion, this is a critical finding and needs to be addressed ASAP. I will be available during the day tomorrow for any questions. Thanks Colleen -------------- next part -------------- An HTML attachment was scrubbed... URL: From jlastrilla at mitre.org Thu Dec 19 22:51:47 2019 From: jlastrilla at mitre.org (Lastrilla, Jet) Date: Thu, 19 Dec 2019 22:51:47 +0000 Subject: [Platformone] [EXT] Platform1 SAR In-Reply-To: <14445_1576794925_5DFBFB2D_14445_540_1_666ebf0de51644a89e809ab03bd8be27@XCGVAG22.northgrum.com> References: <14445_1576794925_5DFBFB2D_14445_540_1_666ebf0de51644a89e809ab03bd8be27@XCGVAG22.northgrum.com> Message-ID: Thanks Colleen. Sorry for the rushed feeling. If you want to take more time, please use tomorrow to do your testing. Thank you for all you do!!!! Get Outlook for iOS ________________________________ From: Feiglstok, Colleen M [US] (MS) Sent: Thursday, December 19, 2019 4:35:15 PM To: Lastrilla, Jet ; BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP ; DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP ; Kevin O'Donnell ; platformONE at redhat.com ; Tim Gast ; Bubb, Mike ; TRAMBLE, ELIJAH Q Capt USAF AFMC AFLCMC/HNC ; tj.zimmerman at braingu.com ; LOPEZDEURALDE, RICHARD A Lt Col USAF AFMC AFLCMC/HNCP ; Blade, Eric D [US] (MS) ; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP ; Leonard, Michael C. ; REINHARDT, MELISSA A GG-13 USAF AFMC AFLCMC/HNCP ; Taylor Biggs ; Miller, Timothy J. ; CRISP, JOSHUA M GS-09 USAF AFMC AFLCMC/HNCP ; BOGUE, STEVEN E CTR USAF AFMC AFLCMC/HNCP ; Wilcox, John R. 
(San Antonio, TX) [US] (MS) Subject: [EXT] Platform1 SAR All, The SAR and raw results from the new security testing will be sent through NGSafe in a few moments. As usual, I felt very rushed with the testing, and feel like I have not done as thorough of a job as required. I was unable to log into the Web UIs, as no one from the Platform1 team gave me the account information. I had issues with Nessus, so the CVE's were found through OSCAP this time. A lot is the same as the last report, but please read through it, because there is some new information. I had to test as ec2-user again, which is another big issue that needs to be resolved ASAP. The more I use it and find out how it is being used, the more extremely concerned I am. It has multiple keys throughout the platform located in the .ssh directory, one of which is world readable. On some hosts, a real user is using the ec2-user account to create accounts, groups, and pull docker files. The account is non-attributable, so we have no way of knowing who is doing this. Someone could do serious damage with no consequence. I understand that the ec2-user is needed for standing up an ec2-image, but this account should only be used for implementing IAC, so that the changes implemented by ec2-user are codified. If manual admin is required, that IAC should provision the appropriate attributable accounts, and those accounts should be used from then on. In my opinion, this is a critical finding and needs to be addressed ASAP. I will be available during the day tomorrow for any questions. Thanks Colleen -------------- next part -------------- An HTML attachment was scrubbed... URL: From noreply at ngc.com Thu Dec 19 22:44:21 2019 From: noreply at ngc.com (Colleen Feiglstok) Date: Thu, 19 Dec 2019 17:44:21 -0500 Subject: [Platformone] Northrop Grumman Safe Access File Exchange Notice Message-ID: <9cc2147a22e04e1599d3b287269639e5@XCSVAG02.northgrum.com> An HTML attachment was scrubbed...
URL: From kodonnel at redhat.com Thu Dec 19 23:22:10 2019 From: kodonnel at redhat.com (Kevin O'Donnell) Date: Thu, 19 Dec 2019 17:22:10 -0600 Subject: [Platformone] [EXT] Platform1 SAR In-Reply-To: References: <14445_1576794925_5DFBFB2D_14445_540_1_666ebf0de51644a89e809ab03bd8be27@XCGVAG22.northgrum.com> Message-ID: Colleen, Thank you for the results and recommendations. We will get Git issues created for your findings, prioritize the mitigations, and implement them as code in our future IAC deployments. Many of the findings in the current VPC have been mitigated in up-prod-b with our current code release. Please let us know when you have finished and we can power down the host that you have been using for scanning. Note for everyone: Once we power down the EC2 instance, SSH (port 22) will no longer be externally accessible, which mitigates many of the risks associated with the ec2-user and the keys. Thanks, KEVIN O'DONNELL ARCHITECT MANAGER Red Hat Red Hat NA Public Sector Consulting kodonnell at redhat.com M: 240-605-4654 On Thu, Dec 19, 2019 at 4:52 PM Lastrilla, Jet wrote: > Thanks Colleen. Sorry for the rushed feeling. If you want to take more > time, please use tomorrow to do your testing. > > Thank you for all you do!!!! > > Get Outlook for iOS > ------------------------------ > *From:* Feiglstok, Colleen M [US] (MS) > *Sent:* Thursday, December 19, 2019 4:35:15 PM > *To:* Lastrilla, Jet ; BRYAN, AUSTEN R Capt USAF > AFMC AFLCMC/HNCP ; DIROCCO, ROGER E GG-13 USAF > AFMC ESC/AFLCMC/HNCP ; Kevin O'Donnell < > kodonnel at redhat.com>; platformONE at redhat.com ; > Tim Gast ; Bubb, Mike ; TRAMBLE, ELIJAH > Q Capt USAF AFMC AFLCMC/HNC ; > tj.zimmerman at braingu.com ; LOPEZDEURALDE, > RICHARD A Lt Col USAF AFMC AFLCMC/HNCP ; > Blade, Eric D [US] (MS) ; RAMIREZ, JOSE A CTR USAF > AFMC AFLCMC/HNCP ; Leonard, Michael C.
< > leonardm at mitre.org>; REINHARDT, MELISSA A GG-13 USAF AFMC AFLCMC/HNCP < > melissa.reinhardt.2 at us.af.mil>; Taylor Biggs ; Miller, > Timothy J. ; CRISP, JOSHUA M GS-09 USAF AFMC > AFLCMC/HNCP ; BOGUE, STEVEN E CTR USAF AFMC > AFLCMC/HNCP ; Wilcox, John R. (San Antonio, > TX) [US] (MS) > *Subject:* [EXT] Platform1 SAR > > > All, > > > > The SAR and raw results from the new security testing will be sent through > NGSafe in a few moments. > > > > As usual, I felt very rushed with the testing, and feel like I have not > done as thorough of a job as required. I was unable to log into the Web > UIs, as no one from the Platform1 team gave me the account information. I > had issues with Nessus, so the CVE's were found through OSCAP this time. > > > > A lot is the same as the last report, but please read through it, because > there is some new information. I had to test as ec2-user again, which is > another big issue that needs to be resolved ASAP. The more I use it and > find out how it is being used, the more extremely concerned I am. It has > multiple keys throughout the platform located in the .ssh directory, one of > which is world readable. On some hosts, a real user is using the ec2-user > account to create accounts, groups, and pull docker files. The account is > non-attributable, so we have no way of knowing who is doing this. Someone > could do serious damage with no consequence. I understand that the ec2-user > is needed for standing up an ec2-image, but this account should only be > used for implementing IAC, so that the changes implemented by ec2-user are > codified. If manual admin is required, that IAC should provision the > appropriate attributable accounts, and those accounts should be used from > then on. In my opinion, this is a critical finding and needs to be > addressed ASAP. > > > > I will be available during the day tomorrow for any questions.
> > > > Thanks > > Colleen > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From taylor at redhat.com Fri Dec 20 00:20:47 2019 From: taylor at redhat.com (Taylor Biggs) Date: Thu, 19 Dec 2019 19:20:47 -0500 Subject: [Platformone] Anchore in Unified-Platform In-Reply-To: References: Message-ID: Khary, In case ya'll haven't found it already - this is due to the Anchore PGSQL PVC not being used by the DB - all data is lost when it reboots. I'm pretty sure that part is in their helm charts, and gave Hayden the deets. Thanks, Taylor ---- Taylor Biggs taylor at redhat.com 850-449-2220 On Thu, Dec 19, 2019 at 2:45 PM Khary Mendez wrote: > Thanks Jonny - we are actively investigating > > Khary A. Mendez, RHCA (150-047-298) > > Senior Principal Consultant > > Red Hat Public Sector > > khary at redhat.com > M: (240)888-9170 > > > > On Thu, Dec 19, 2019 at 1:56 PM Jonathan Rickard > wrote: > >> Keegan / Hayden / Khary, >> >> Jay reported that he's been creating user accounts/groups within anchore ( >> unified-platform.io) but they're not there any longer. Would one of you >> guys please take a look? >> >> Thanks, >> jonny >> >> Jonathan Rickard, RHCE, RHCA >> >> Consulting Architect >> >> Red Hat Public Sector >> >> jonny at redhat.com >> M: 210.862.9739 >> @redhatjobs redhatjobs >> @redhatjobs >> >> >> _______________________________________________ >> platformONE mailing list >> platformONE at redhat.com >> https://www.redhat.com/mailman/listinfo/platformone >> > _______________________________________________ > platformONE mailing list > platformONE at redhat.com > https://www.redhat.com/mailman/listinfo/platformone > -------------- next part -------------- An HTML attachment was scrubbed... 
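Taylor's diagnosis in the Anchore thread above is that the PostgreSQL database is not using its PVC, so accounts and groups disappear when the pod restarts. A minimal sketch of how that condition could be checked is below. It assumes the pod spec has been exported as a dictionary (for example, the `spec` object from `kubectl get pod <name> -o json`). The function name, the claim name `anchore-postgresql`, and the sample spec are hypothetical; only the standard Kubernetes pod fields `volumes`, `persistentVolumeClaim.claimName`, and `volumeMounts` are relied on, and nothing here is taken from the Anchore Helm chart itself.

```python
from typing import Any, Dict


def pod_mounts_claim(pod_spec: Dict[str, Any], claim_name: str) -> bool:
    """Return True if some container in the pod mounts a volume backed by claim_name."""
    # Names of the pod-level volumes that point at the given PVC.
    volume_names = {
        vol["name"]
        for vol in pod_spec.get("volumes", [])
        if vol.get("persistentVolumeClaim", {}).get("claimName") == claim_name
    }
    # The claim only protects data if a container actually mounts one of those volumes.
    for container in pod_spec.get("containers", []):
        for mount in container.get("volumeMounts", []):
            if mount["name"] in volume_names:
                return True
    return False


if __name__ == "__main__":
    # Hypothetical spec: the claim is declared, but the database container never mounts it.
    spec = {
        "volumes": [
            {"name": "pgdata", "persistentVolumeClaim": {"claimName": "anchore-postgresql"}}
        ],
        "containers": [{"name": "postgresql", "volumeMounts": []}],
    }
    if not pod_mounts_claim(spec, "anchore-postgresql"):
        print("PVC is not mounted; database files would live on ephemeral storage")
```

A check like this only confirms that the volume is wired into the pod; whether the database's data directory actually sits under the mount path would still need to be verified against the chart values.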
URL: From Eric.Blade at ngc.com Fri Dec 20 14:12:51 2019 From: Eric.Blade at ngc.com (Blade, Eric D [US] (MS)) Date: Fri, 20 Dec 2019 14:12:51 +0000 Subject: [Platformone] EXT :Re: [EXT] Platform1 SAR In-Reply-To: References: <14445_1576794925_5DFBFB2D_14445_540_1_666ebf0de51644a89e809ab03bd8be27@XCGVAG22.northgrum.com> Message-ID: <05de1c3d47f34cc2811beea332efbfaf@XCGVAG22.northgrum.com> I just want to clarify on the topic of the EC2-user. The key is attribution. We need to know who or what is configuring the systems. We are repeatedly seeing the ec2-user account used to manually configure systems. Ec2-user should only be used to perform programmatic tasks from committed and reviewed code as part of initial provisioning. If the code to provision and configure systems is unavailable, then you must create attributable accounts to perform changes to the systems. Any use of ec2-user remote login or su to ec2-user will be flagged as a violation and the source IP or user account logged as the violating source. Any manual/command line activities MUST be performed using an attributable user account. If that is not clear, please let me know and I will attempt to elaborate further. Thank you Eric Blade, GXPN, GPEN Unified Platform System Coordinator Northrop Grumman Mission Systems Work: 410.649.0706 Mobile: 240.258.8089 From: Kevin O'Donnell Sent: Thursday, December 19, 2019 6:22 PM To: Lastrilla, Jet Cc: Feiglstok, Colleen M [US] (MS) ; BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP ; DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP ; platformONE at redhat.com; Tim Gast ; Bubb, Mike ; TRAMBLE, ELIJAH Q Capt USAF AFMC AFLCMC/HNC ; tj.zimmerman at braingu.com; LOPEZDEURALDE, RICHARD A Lt Col USAF AFMC AFLCMC/HNCP ; Blade, Eric D [US] (MS) ; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP ; Leonard, Michael C. ; REINHARDT, MELISSA A GG-13 USAF AFMC AFLCMC/HNCP ; Taylor Biggs ; Miller, Timothy J. 
; CRISP, JOSHUA M GS-09 USAF AFMC AFLCMC/HNCP; BOGUE, STEVEN E CTR USAF AFMC AFLCMC/HNCP; Wilcox, John R. (San Antonio, TX) [US] (MS) Subject: EXT :Re: [EXT] Platform1 SAR Colleen, Thank you for the results and recommendations. We will get Git issues created for your findings, prioritize the mitigations, and implement them as code in our future IAC deployments. Many of the findings in the current VPC have been mitigated in up-prod-b with our current code release. Please let us know when you have finished and we can power down the host that you have been using for scanning. Note for everyone: Once we power down the EC2 instance, SSH (port 22) will no longer be externally accessible, which mitigates many of the risks associated with the ec2-user and the keys. Thanks, KEVIN O'DONNELL ARCHITECT MANAGER Red Hat Red Hat NA Public Sector Consulting kodonnell at redhat.com M: 240-605-4654 On Thu, Dec 19, 2019 at 4:52 PM Lastrilla, Jet wrote: Thanks Colleen. Sorry for the rushed feeling. If you want to take more time, please use tomorrow to do your testing. Thank you for all you do!!!! Get Outlook for iOS ________________________________ From: Feiglstok, Colleen M [US] (MS) Sent: Thursday, December 19, 2019 4:35:15 PM To: Lastrilla, Jet; BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP; DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP; Kevin O'Donnell; platformONE at redhat.com; Tim Gast; Bubb, Mike; TRAMBLE, ELIJAH Q Capt USAF AFMC AFLCMC/HNC; tj.zimmerman at braingu.com; LOPEZDEURALDE, RICHARD A Lt Col USAF AFMC AFLCMC/HNCP; Blade, Eric D [US] (MS); RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP; Leonard, Michael C.; REINHARDT, MELISSA A GG-13 USAF AFMC AFLCMC/HNCP; Taylor Biggs; Miller, Timothy J.; CRISP, JOSHUA M GS-09 USAF AFMC AFLCMC/HNCP; BOGUE, STEVEN E CTR USAF AFMC AFLCMC/HNCP; Wilcox, John R.
(San Antonio, TX) [US] (MS) Subject: [EXT] Platform1 SAR All, The SAR and raw results from the new security testing will be sent through NGSafe in a few moments. As usual, I felt very rushed with the testing, and feel like I have not done as thorough of a job as required. I was unable to log into the Web UIs, as no one from the Platform1 team gave me the account information. I had issues with Nessus, so the CVE's were found through OSCAP this time. A lot is the same as the last report, but please read through it, because there is some new information. I had to test as ec2-user again, which is another big issue that needs to be resolved ASAP. The more I use it and find out how it is being used, the more extremely concerned I am. It has multiple keys throughout the platform located in the .ssh directory, one of which is world readable. On some hosts, a real user is using the ec2-user account to create accounts, groups, and pull docker files. The account is non-attributable, so we have no way of knowing who is doing this. Someone could do serious damage with no consequence. I understand that the ec2-user is needed for standing up an ec2-image, but this account should only be used for implementing IAC, so that the changes implemented by ec2-user are codified. If manual admin is required, that IAC should provision the appropriate attributable accounts, and those accounts should be used from then on. In my opinion, this is a critical finding and needs to be addressed ASAP. I will be available during the day tomorrow for any questions. Thanks Colleen -------------- next part -------------- An HTML attachment was scrubbed...
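The world-readable key that Colleen reports in the SAR summary quoted above is the kind of finding that can be looked for mechanically. The sketch below is one hedged way to do it, assuming a POSIX host. The function name and the demo file names are hypothetical, and it only inspects permission bits; it says nothing about who used the account, which is the attribution problem Eric describes.

```python
import os
import stat
import tempfile
from pathlib import Path
from typing import List


def world_readable_files(directory: str) -> List[str]:
    """Return sorted paths of regular files under directory that any user can read."""
    findings = []
    for path in Path(directory).rglob("*"):
        # lstat does not follow symlinks, so only real regular files are reported.
        mode = path.lstat().st_mode
        if stat.S_ISREG(mode) and mode & stat.S_IROTH:
            findings.append(str(path))
    return sorted(findings)


if __name__ == "__main__":
    # Demo on a throwaway directory standing in for a hypothetical ~/.ssh.
    with tempfile.TemporaryDirectory() as tmp:
        private = Path(tmp, "id_rsa")
        exposed = Path(tmp, "id_rsa_shared")
        private.write_text("dummy key material")
        exposed.write_text("dummy key material")
        os.chmod(private, 0o600)  # owner read/write only
        os.chmod(exposed, 0o644)  # readable by everyone
        for finding in world_readable_files(tmp):
            print("world-readable:", finding)
```

Public keys and files such as `known_hosts` are often world-readable by design, so the output is a list to review rather than a list of violations.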
URL: From mnissley at redhat.com Fri Dec 20 18:24:56 2019 From: mnissley at redhat.com (Mark Nissley) Date: Fri, 20 Dec 2019 12:24:56 -0600 Subject: [Platformone] Execution of AWS RBAC In-Reply-To: <4aa147cc4457414eb522835423d9e49a@XCGVAG21.northgrum.com> References: <4aa147cc4457414eb522835423d9e49a@XCGVAG21.northgrum.com> Message-ID: + platformONE at redhat.com FYSA. Mark NISSLEY, PMP, CSM, LEAN PROGRAM MaNAGER & SR technical Project Manager North American Consulting, Public Sector M: 850-530-3234 *Scheduled PTO: Dec 23 - Jan 03* On Fri, Dec 20, 2019 at 12:18 PM Nunez, Carlos A [US] (MS) (Contr) < Carlos.Nunez2 at ngc.com> wrote: > Team, > > > > The New AWS RBAC has been implemented. I did place users that I > personally know who are still on the program in their proper groups for > minimal disruption (for example, Taylor, Jonny, Dino, and Victor). For some > users that I do not know, who have not accessed AWS for a while, and who did not > have their MFA active, I took away all their permissions. I also deleted > users that have never accessed their AWS account. > > > > Security users like Colleen and John have also been moved to the > "Security_Write" role and also the roles for developers of each team. This > should provide them all the access needed to perform their scans. If > there are any problems, please contact me. > > > > My cell is (571)230-5289 and my email is adrian.nunez at bylight.com > > > > The following people are the only ones with Full Admin access: In the > wise words of Uncle Ben, "with great power comes great responsibility". > Please do not add any users or change any permissions without speaking to > me first. Thanks.
> > > > > > V/R > > Adrian Nuñez > > Senior Cloud Architect > > [image: > http://www.bylight.com/wp-content/uploads/ByLightLogoPaddedTransparentCropped.png] > > *In support of Unified Platform Cyber Factory* > > NG Email: carlos.nunez2 at ngc.com > > Bylight email: *adrian.nunez at bylight.com * > > Cell: (571)230-5289 > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.jpg Type: image/jpeg Size: 8062 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image002.png Type: image/png Size: 4333 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image003.jpg Type: image/jpeg Size: 1172 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image004.png Type: image/png Size: 1962 bytes Desc: not available URL: From jlastrilla at mitre.org Fri Dec 20 20:00:09 2019 From: jlastrilla at mitre.org (Lastrilla, Jet) Date: Fri, 20 Dec 2019 20:00:09 +0000 Subject: [Platformone] [EXT] Platform1 SAR In-Reply-To: <14445_1576794925_5DFBFB2D_14445_540_1_666ebf0de51644a89e809ab03bd8be27@XCGVAG22.northgrum.com> References: <14445_1576794925_5DFBFB2D_14445_540_1_666ebf0de51644a89e809ab03bd8be27@XCGVAG22.northgrum.com> Message-ID: Adding Adrian.
From: Feiglstok, Colleen M [US] (MS) Sent: Thursday, December 19, 2019 4:35 PM To: Lastrilla, Jet ; BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP ; DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP ; Kevin O'Donnell ; platformONE at redhat.com; Tim Gast ; Bubb, Mike ; TRAMBLE, ELIJAH Q Capt USAF AFMC AFLCMC/HNC ; tj.zimmerman at braingu.com; LOPEZDEURALDE, RICHARD A Lt Col USAF AFMC AFLCMC/HNCP ; Blade, Eric D [US] (MS) ; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP ; Leonard, Michael C. ; REINHARDT, MELISSA A GG-13 USAF AFMC AFLCMC/HNCP ; Taylor Biggs ; Miller, Timothy J. ; CRISP, JOSHUA M GS-09 USAF AFMC AFLCMC/HNCP ; BOGUE, STEVEN E CTR USAF AFMC AFLCMC/HNCP ; Wilcox, John R. (San Antonio, TX) [US] (MS) Subject: [EXT] Platform1 SAR All, The SAR and raw results from the new security testing will be sent through NGSafe in a few moments. As usual, I felt very rushed with the testing, and feel like I have not done as thorough of a job as required. I was unable to log into the Web UIs, as no one from the Platform1 team gave me the account information. I had issues with Nessus, so the CVE's were found through OSCAP this time. A lot is the same as the last report, but please read through it, because there is some new information. I had to test as ec2-user again, which is another big issue that needs to be resolved ASAP. The more I use it and find out how it is being used, the more extremely concerned I am. It has multiple keys throughout the platform located in the .ssh directory, one of which is world readable. On some hosts, a real user is using the ec2-user account to create accounts, groups, and pull docker files. The account is non-attributable, so we have no way of knowing who is doing this. Someone could do serious damage with no consequence. I understand that the ec2-user is needed for standing up an ec2-image, but this account should only be used for implementing IAC, so that the changes implemented by ec2-user are codified. 
If manual admin is required, that IAC should provision the appropriate attributable accounts, and those accounts should be used from then on. In my opinion, this is a critical finding and needs to be addressed ASAP. I will be available during the day tomorrow for any questions. Thanks Colleen -------------- next part -------------- An HTML attachment was scrubbed... URL: From jlastrilla at mitre.org Fri Dec 20 23:29:22 2019 From: jlastrilla at mitre.org (Lastrilla, Jet) Date: Fri, 20 Dec 2019 23:29:22 +0000 Subject: [Platformone] [EXT] Re: Execution of AWS RBAC In-Reply-To: <4970_1576866336_5DFD121F_4970_663_10_CAPeAGCcXX6h8RVZ4WZ1J7kpb93b7U0XiKM_a8a+dZH-oq_OWJg@mail.gmail.com> References: <4aa147cc4457414eb522835423d9e49a@XCGVAG21.northgrum.com> <4970_1576866336_5DFD121F_4970_663_10_CAPeAGCcXX6h8RVZ4WZ1J7kpb93b7U0XiKM_a8a+dZH-oq_OWJg@mail.gmail.com> Message-ID: #GSD Jet Lastrilla Systems Security Engineer The MITRE Corporation O: 210.208.4867 M: 619.508.5888 From: platformone-bounces at redhat.com On Behalf Of Mark Nissley Sent: Friday, December 20, 2019 12:25 PM To: Nunez, Carlos A [US] (MS) (Contr) ; platformONE at redhat.com Cc: Wilcox, John R. (San Antonio, TX) [US] (MS) ; joshua.crisp.2 at us.af.mil; PENA, HOMERO L GG-13 USAF AFMC AFLCMC/HNCP ; Feiglstok, Colleen M [US] (MS) ; LOPEZDEURALDE, RICHARD A Lt Col USAF AFMC AFLCMC/HNCP ; LASTRILLA, JETHRO S CTR USAF AFMC AFLCMC/HNCP ; Adrian Nunez ; OZUNA, LETICIA D CTR USAF AFMC AFLCMC/HNCP ; Matthew Huston Subject: [EXT] Re: [Platformone] Execution of AWS RBAC + platformONE at redhat.com FYSA. Mark NISSLEY, PMP, CSM, LEAN PROGRAM MaNAGER & SR technical Project Manager North American Consulting, Public Sector M: 850-530-3234 [https://marketing-outfit-prod-images.s3-us-west-2.amazonaws.com/f5445ae0c9ddafd5b2f1836854d7416a/Logo-RedHat-Email.png] Scheduled PTO: Dec 23 - Jan 03 On Fri, Dec 20, 2019 at 12:18 PM Nunez, Carlos A [US] (MS) (Contr) > wrote: Team, The New AWS RBAC has been implemented. 
I did place users that I personally know who are still on the program in their proper groups for minimal disruption (for example, Taylor, Jonny, Dino, and Victor). For users whom I do not know, who have not accessed AWS for a while, and who did not have MFA active, I took away all permissions. I also deleted users that have never accessed their AWS account. Security users like Colleen and John have also been moved to the "Security_Write" role and also the roles for developers of each team. This should provide them all the access needed to perform their scans. If there are any problems please contact me. My cell is (571)230-5289 and my email is adrian.nunez at bylight.com The following people are the only ones with Full Admin access: In the wise words of Uncle Ben, "with great power comes great responsibility". Please do not add any users or change any permissions without speaking to me first. Thanks. V/R Adrian Nuñez Senior Cloud Architect In support of Unified Platform Cyber Factory NG Email: carlos.nunez2 at ngc.com Bylight email: adrian.nunez at bylight.com Cell: (571)230-5289 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.jpg Type: image/jpeg Size: 8062 bytes Desc: image001.jpg URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image002.png Type: image/png Size: 4333 bytes Desc: image002.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image003.jpg Type: image/jpeg Size: 1172 bytes Desc: image003.jpg URL: -------------- next part -------------- A non-text attachment was scrubbed...
Name: image004.png Type: image/png Size: 1962 bytes Desc: image004.png URL: From kodonnel at redhat.com Sat Dec 21 01:45:28 2019 From: kodonnel at redhat.com (Kevin O'Donnell) Date: Fri, 20 Dec 2019 20:45:28 -0500 Subject: [Platformone] EXT :Re: [EXT] Platform1 SAR In-Reply-To: <05de1c3d47f34cc2811beea332efbfaf@XCGVAG22.northgrum.com> References: <14445_1576794925_5DFBFB2D_14445_540_1_666ebf0de51644a89e809ab03bd8be27@XCGVAG22.northgrum.com> <05de1c3d47f34cc2811beea332efbfaf@XCGVAG22.northgrum.com> Message-ID: Hello Eric, To be honest, if we have anyone authenticating to the systems in any way we are not following our IAC standards. This current VPC is a snowflake and it was treated as one. Things were manually done and configurations were changed. And yes that should be tracked by specific users. The current IDM users should be the users that are used for authentication in the snowflake situations. If we continue to auth to these VPCs and make one-off modifications we will continue to have the issues that we do today. If we capture the issues and/or features and get them prioritized and developed as code the user issue will become a moot point. We also have an issue in git to automate the scans and to have the artifacts shipped to S3 for review. Once this is in place we will no longer need to make the modification and allow SSH or port 22 access into the systems. I would argue that opening the system for security scanning is our biggest security risk currently. Let's take these next couple of weeks and really lock down the system. If we work together and outline the correct requirements and get them implemented as code we will all sleep better at night. Oh and it will make cybercom happy. The next major task will be to leverage IAC to build a more secure VPC with our current code base that currently addresses some of the concerns and move the apps into that VPC. We will continue this pattern as we evolve the IAC and the process. i.e.
let's build the feedback loop and continue to develop. Thanks, KEVIN O'DONNELL ARCHITECT MANAGER Red Hat Red Hat NA Public Sector Consulting kodonnell at redhat.com M: 240-605-4654 On Fri, Dec 20, 2019 at 9:13 AM Blade, Eric D [US] (MS) wrote: > I just want to clarify on the topic of the ec2-user. The key is attribution. We need to know who or what is configuring the systems. > We are repeatedly seeing the ec2-user account used to manually configure systems. The ec2-user account should only be used to perform programmatic tasks from committed and reviewed code as part of initial provisioning. If the code to provision and configure systems is unavailable, then you must create attributable accounts to perform changes to the systems. > Any use of ec2-user remote login or su to ec2-user will be flagged as a violation and the source IP or user account logged as the violating source. Any manual/command line activities MUST be performed using an attributable user account. > If that is not clear, please let me know and I will attempt to elaborate further. > Thank you > Eric Blade, GXPN, GPEN > Unified Platform System Coordinator > Northrop Grumman Mission Systems > Work: 410.649.0706 > Mobile: 240.258.8089 > *From:* Kevin O'Donnell > *Sent:* Thursday, December 19, 2019 6:22 PM > *To:* Lastrilla, Jet > *Cc:* Feiglstok, Colleen M [US] (MS) ; BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP ; DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP ; platformONE at redhat.com; Tim Gast ; Bubb, Mike <mbubb at mitre.org>; TRAMBLE, ELIJAH Q Capt USAF AFMC AFLCMC/HNC <elijah.tramble.1 at us.af.mil>; tj.zimmerman at braingu.com; LOPEZDEURALDE, RICHARD A Lt Col USAF AFMC AFLCMC/HNCP ; Blade, Eric D [US] (MS) ; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP ; Leonard, Michael C.
<leonardm at mitre.org>; REINHARDT, MELISSA A GG-13 USAF AFMC AFLCMC/HNCP <melissa.reinhardt.2 at us.af.mil>; Taylor Biggs ; Miller, Timothy J. ; CRISP, JOSHUA M GS-09 USAF AFMC AFLCMC/HNCP ; BOGUE, STEVEN E CTR USAF AFMC AFLCMC/HNCP ; Wilcox, John R. (San Antonio, TX) [US] (MS) > *Subject:* EXT :Re: [EXT] Platform1 SAR > Colleen, > Thank you for the results and recommendations. We will get GIT issues created for your findings and will prioritize the mitigation and implement them as code in our future IAC deployments. Many of the findings in the current VPC have been mitigated in up-prod-b with our current code release. > Please let us know when you have finished and we can power down the host that you have been using for scanning. > Note for everyone: Once we power down the ec2 instance ssh or port 22 will not be externally accessible, thus mitigating many of the risks associated with the ec2-user and the keys. > Thanks, > *KEVIN O'DONNELL * > ARCHITECT MANAGER > Red Hat Red Hat NA Public Sector Consulting > kodonnell at redhat.com M: 240-605-4654 > On Thu, Dec 19, 2019 at 4:52 PM Lastrilla, Jet wrote: > Thanks Colleen. Sorry for the rushed feeling. If you want to take more time, please use tomorrow to do your testing. > Thank you for all you do!!!! > Get Outlook for iOS > ------------------------------ > *From:* Feiglstok, Colleen M [US] (MS) > *Sent:* Thursday, December 19, 2019 4:35:15 PM > *To:* Lastrilla, Jet ; BRYAN, AUSTEN R Capt USAF AFMC AFLCMC/HNCP ; DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP ; Kevin O'Donnell <kodonnel at redhat.com>; platformONE at redhat.com ; Tim Gast ; Bubb, Mike ; TRAMBLE, ELIJAH Q Capt USAF AFMC AFLCMC/HNC ; tj.zimmerman at braingu.com ; LOPEZDEURALDE, RICHARD A Lt Col USAF AFMC AFLCMC/HNCP ; Blade, Eric D [US] (MS) ; RAMIREZ, JOSE A CTR USAF AFMC AFLCMC/HNCP ; Leonard, Michael C.
<leonardm at mitre.org>; REINHARDT, MELISSA A GG-13 USAF AFMC AFLCMC/HNCP <melissa.reinhardt.2 at us.af.mil>; Taylor Biggs ; Miller, Timothy J. ; CRISP, JOSHUA M GS-09 USAF AFMC AFLCMC/HNCP ; BOGUE, STEVEN E CTR USAF AFMC AFLCMC/HNCP ; Wilcox, John R. (San Antonio, TX) [US] (MS) > *Subject:* [EXT] Platform1 SAR > All, > The SAR and raw results from the new security testing will be sent through NGSafe in a few moments. > As usual, I felt very rushed with the testing, and feel like I have not done as thorough of a job as required. I was unable to log into the Web UIs, as no one from the Platform1 team gave me the account information. I had issues with Nessus, so the CVE's were found through OSCAP this time. > A lot is the same as the last report, but please read through it, because there is some new information. I had to test as ec2-user again, which is another big issue that needs to be resolved ASAP. The more I use it and find out how it is being used, the more extremely concerned I am. It has multiple keys throughout the platform located in the .ssh directory, one of which is world readable. On some hosts, a real user is using the ec2-user account to create accounts, groups, and pull docker files. The account is non-attributable, so we have no way of knowing who is doing this. Someone could do serious damage with no consequence. I understand that the ec2-user is needed for standing up an ec2-image, but this account should only be used for implementing IAC, so that the changes implemented by ec2-user are codified. If manual admin is required, that IAC should provision the appropriate attributable accounts, and those accounts should be used from then on. In my opinion, this is a critical finding and needs to be addressed ASAP. > I will be available during the day tomorrow for any questions.
> Thanks > Colleen -------------- next part -------------- An HTML attachment was scrubbed... URL: From richard.lopezdeuralde at us.af.mil Mon Dec 23 13:54:03 2019 From: richard.lopezdeuralde at us.af.mil (LOPEZDEURALDE, RICHARD A Lt Col USAF AFMC AFLCMC/HNCP) Date: Mon, 23 Dec 2019 13:54:03 +0000 Subject: [Platformone] [EXT] Re: Execution of AWS RBAC In-Reply-To: References: <4aa147cc4457414eb522835423d9e49a@XCGVAG21.northgrum.com> <4970_1576866336_5DFD121F_4970_663_10_CAPeAGCcXX6h8RVZ4WZ1J7kpb93b7U0XiKM_a8a+dZH-oq_OWJg@mail.gmail.com> Message-ID: A non-text attachment was scrubbed... Name: smime.p7m Type: application/pkcs7-mime Size: 141631 bytes Desc: not available URL: From darachch at redhat.com Mon Dec 23 20:47:04 2019 From: darachch at redhat.com (Dino Arachchi) Date: Mon, 23 Dec 2019 14:47:04 -0600 Subject: [Platformone] Account for Dirty Dev In-Reply-To: References: Message-ID: All, While provisioning the dev accounts, an issue was identified in the newly created up-appdev-a cluster. Code changes are being made now, after which the stack will be rebuilt. Once this is completed and verified, I will proceed with the creation of the new accounts. These tasks are estimated to be completed today, and an email will be sent out to confirm. Best Regards, DINO ARACHCHI SENIOR CONSULTANT darachch at redhat.com M: 848-203-1809 On Mon, Dec 23, 2019 at 1:46 PM Kevin O'Donnell wrote: > Thanks Dino > -Kevin > On Mon, Dec 23, 2019 at 2:38 PM Tim Gast wrote: >> Thanks Dino. >> -Tim >> On Dec 23, 2019, at 1:37 PM, Dino Arachchi wrote: >> Hi Tim, >> I'm on PTO but am working on creating the dev accounts at the moment. The team should expect to receive credentials within the next couple hours.
>> Best Regards, >> DINO ARACHCHI >> SENIOR CONSULTANT >> darachch at redhat.com M: 848-203-1809 >> On Mon, Dec 23, 2019 at 12:53 PM Tim Gast wrote: >>> Hi Kevin, >>> Any word on when to be looking for those dev accounts? >>> Thanks, >>> -Tim >>> On Dec 21, 2019, at 10:22 AM, Kevin O'Donnell wrote: >>> Build completed in 4 hours and 40 min. On Monday we will get the accounts created and also add some code into our build to automate the account creation process. >>> Everyone enjoy your weekend. >>> KEVIN O'DONNELL >>> ARCHITECT MANAGER >>> Red Hat Red Hat NA Public Sector Consulting >>> kodonnell at redhat.com M: 240-605-4654 >>> On Fri, Dec 20, 2019 at 9:19 PM Tim Gast wrote: >>>> Thanks for the update Kevin. >>>> -Tim >>>> On Dec 20, 2019, at 7:26 PM, Kevin O'Donnell wrote: >>>> We're about 3 hours in. >>>> KEVIN O'DONNELL >>>> ARCHITECT MANAGER >>>> Red Hat Red Hat NA Public Sector Consulting >>>> kodonnell at redhat.com M: 240-605-4654 >>>> On Fri, Dec 20, 2019 at 8:04 PM Kevin O'Donnell wrote: >>>>> Hello Tim, >>>>> The aws rbac changes impacted our deployment and it didn't get the deployment running till 5pm est. I'll keep an eye on the job. >>>>> On Fri, Dec 20, 2019 at 5:19 PM Tim Gast wrote: >>>>>> Hi. >>>>>> Checking in since it's been a few hours in flight. >>>>>> Do we have a UP-DEV-A cluster and user access now? >>>>>> Thanks, >>>>>> -Tim >>>>>> On Dec 20, 2019, at 11:49 AM, Tim Gast wrote: >>>>>> Kevin, Mark, & Roc, >>>>>> Below is the request for dev accounts for UP-APPDEV-A. >>>>>> Can we please mirror what these folks have in the current UP-Prod-A cluster. >>>>>> Also, is it possible to provide an "elevated read" role for certain users (dcurrian) for discovery/troubleshooting?
>>>>>> Please let me know when the environment is live and ready for the team to begin working with. >>>>>> Thanks, >>>>>> -Tim >>>>>> Begin forwarded message: >>>>>> *From:* "Curran, Daniel M" >>>>>> *Date:* December 20, 2019 at 11:15:39 AM CST >>>>>> *To:* "tg at braingu.com" , "nino at braingu.com" <nino at braingu.com> >>>>>> *Cc:* "Buffaloe, Christopher" >>>>>> *Subject:* *Account for Dirty Dev* >>>>>> Hey Tim and Nino, >>>>>> Here's the account list for the Ginyu Force (CCAT) team >>>>>> dcurran >>>>>> rcepeda >>>>>> msison >>>>>> psoliz >>>>>> atorres >>>>>> dpalmer >>>>>> rho >>>>>> kmacias >>>>>> jandrichak >>>>>> Obviously we'll take all the privileges you're willing to give us but if we can keep our dev privileges from the unified-platform cluster and put us in a ginyu/ccat group so we can see each other's projects that would be enough. >>>>>> It might also be advantageous to have one admin account (dcurran-adm or ccat-adm) or at least one with elevated read privileges for discovery purposes but I wouldn't call that a hard requirement. >>>>>> Thanks, >>>>>> Daniel >>>>>> -- >>>>> KEVIN O'DONNELL >>>>> ARCHITECT MANAGER >>>>> Red Hat Red Hat NA Public Sector Consulting >>>>> kodonnell at redhat.com M: 240-605-4654 > -- > KEVIN O'DONNELL > ARCHITECT MANAGER > Red Hat Red Hat NA Public Sector Consulting > kodonnell at redhat.com M: 240-605-4654 -------------- next part -------------- An HTML attachment was scrubbed... URL: From tg at braingu.com Mon Dec 23 20:49:42 2019 From: tg at braingu.com (Tim Gast) Date: Mon, 23 Dec 2019 14:49:42 -0600 Subject: [Platformone] Account for Dirty Dev In-Reply-To: References: Message-ID: <1114349A-1B60-4C16-B453-A44057965097@braingu.com> Thanks for the update. What was the issue you found that would cause the entire stack to need a rebuild?
Are we looking at another 5 hour build? Thanks, -Tim -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 3905 bytes Desc: not available URL:
From darachch at redhat.com Mon Dec 23 20:56:45 2019 From: darachch at redhat.com (Dino Arachchi) Date: Mon, 23 Dec 2019 14:56:45 -0600 Subject: [Platformone] Account for Dirty Dev In-Reply-To: <1114349A-1B60-4C16-B453-A44057965097@braingu.com> References: <1114349A-1B60-4C16-B453-A44057965097@braingu.com> Message-ID: Tim, There were some security features that didn't get merged into the branch used to build the stack. This will require the same 4-5 hours as the last build due to the fully disconnected nature of the build process. We do have an open issue for connected environments that will cut down build time, but that feature is still in progress. Best Regards, DINO ARACHCHI SENIOR CONSULTANT darachch at redhat.com M: 848-203-1809 -------------- next part -------------- An HTML attachment was scrubbed... URL:
From tg at braingu.com Mon Dec 23 21:22:16 2019 From: tg at braingu.com (Tim Gast) Date: Mon, 23 Dec 2019 15:22:16 -0600 Subject: [Platformone] Account for Dirty Dev In-Reply-To: References: <1114349A-1B60-4C16-B453-A44057965097@braingu.com> Message-ID: <59C8F222-04A5-4223-B301-769FD98D5341@braingu.com> So we shouldn't expect anything before 7pm this evening? Accounts tonight after build, or in the morning? Are we building from something other than Stable? Thanks, -Tim -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 3905 bytes Desc: not available URL:
From kodonnel at redhat.com Mon Dec 23 21:38:41 2019 From: kodonnel at redhat.com (Kevin O'Donnell) Date: Mon, 23 Dec 2019 16:38:41 -0500 Subject: [Platformone] Account for Dirty Dev In-Reply-To: <59C8F222-04A5-4223-B301-769FD98D5341@braingu.com> References: <1114349A-1B60-4C16-B453-A44057965097@braingu.com> <59C8F222-04A5-4223-B301-769FD98D5341@braingu.com> Message-ID: We are planning on creating the accounts tonight. And yes this is from stable with some additional items that were not merged. We would have caught this earlier but most are on PTO for the holiday. And this is the perfect use case for the testing modules that we need to code out.
I could have run the tests over the weekend but I was with my family. Thanks for being patient. Thanks, KEVIN O'DONNELL ARCHITECT MANAGER Red Hat Red Hat NA Public Sector Consulting kodonnell at redhat.com M: 240-605-4654 On Mon, Dec 23, 2019 at 4:22 PM Tim Gast wrote: > So we shouldn?t expect anything before 7pm this evening? Accounts tonight > after build, or in the morning? > Are we building from something other than Stable? > > Thanks, > -Tim > > On Dec 23, 2019, at 2:56 PM, Dino Arachchi wrote: > > Tim, > > There were some security features that didn't get merged into the branch > used to build the stack. This will require the same 4-5 hours as the last > build due to the fully disconnected nature of the build process. We do have > an open issue for connected environments that will cut down build time, but > that feature is still in progress. > > Best Regards, > > DINO ARACHCHI > > SENIOR CONSULTANT > > darachch at redhat.com M: 848-203-1809 > > > > On Mon, Dec 23, 2019 at 2:49 PM Tim Gast wrote: > >> Thanks for the update. >> What was the issue you found that would cause the entire stack to need a >> rebuild? >> Are we looking at another 5 hour build? >> >> >> Thanks, >> -Tim >> >> On Dec 23, 2019, at 2:47 PM, Dino Arachchi wrote: >> >> All, >> >> While provisionin the dev accounts, an issue was identified in the newly >> created up-appdev-a cluster. Code changes are being made now, after which >> the stack will be rebuilt. Once this is completed and verified, I will >> proceed with the creation of the new accounts. These tasks are estimated to >> be completed today, and an email will be sent out to confirm. >> >> Best Regards, >> >> DINO ARACHCHI >> >> SENIOR CONSULTANT >> >> darachch at redhat.com M: 848-203-1809 >> >> >> >> On Mon, Dec 23, 2019 at 1:46 PM Kevin O'Donnell >> wrote: >> >>> Thanks Dino >>> >>> -Kevin >>> >>> On Mon, Dec 23, 2019 at 2:38 PM Tim Gast wrote: >>> >>>> Thanks Dino. 
>>>> >>>> -Tim >>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From darachch at redhat.com Tue Dec 24 03:13:18 2019 From: darachch at redhat.com (Dino Arachchi) Date: Mon, 23 Dec 2019 21:13:18 -0600 Subject: [Platformone] Account for Dirty Dev In-Reply-To: References: <1114349A-1B60-4C16-B453-A44057965097@braingu.com> <59C8F222-04A5-4223-B301-769FD98D5341@braingu.com> Message-ID: Hi, The rebuild of the stack has completed, and its functionality verified. Additionally, I have sent out login information to the following accounts as requested: dcurran rcepeda msison psoliz atorres dpalmer rho jandrichak However I was not able to find a name or email address for the user "kmacias" so the account was created but not distributed. Thanks very much for your patience, and please let me know if there are any questions or issues logging in. Best Regards, DINO ARACHCHI SENIOR CONSULTANT darachch at redhat.com M: 848-203-1809 On Mon, Dec 23, 2019 at 3:38 PM Kevin O'Donnell wrote: > We are planning on creating the accounts tonight. And yes this is from > stable with some additional items that were not merged. We would have > caught this earlier but most are on PTO for the holiday. And this is the > perfect use case for the testing modules that we need to code out. > > I could have run the tests over the weekend but I was with my family. > > Thanks for being patient.
> > Thanks, > > KEVIN O'DONNELL > > ARCHITECT MANAGER > > Red Hat Red Hat NA Public Sector Consulting > > kodonnell at redhat.com M: > 240-605-4654 > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tg at braingu.com Tue Dec 24 03:26:13 2019 From: tg at braingu.com (Tim Gast) Date: Mon, 23 Dec 2019 21:26:13 -0600 Subject: [Platformone] Account for Dirty Dev In-Reply-To: References: Message-ID: <975280B1-3CD7-48D5-B90A-057DFFDD2B05@braingu.com> Thanks Dino! -Tim > On Dec 23, 2019, at 9:13 PM, Dino Arachchi wrote: > > Hi, > > The rebuild of the stack has completed, and its functionality verified. Additionally, I have sent out login information to the following accounts as requested: > > dcurran > rcepeda > msison > psoliz > atorres > dpalmer > rho > jandrichak > > However I was not able to find a name or email address for the user "kmacias" so the account was created but not distributed. > > Thanks very much for your patience, and please let me know if there are any questions or issues logging in. > > Best Regards, > > DINO ARACHCHI > SENIOR CONSULTANT > darachch at redhat.com M: 848-203-1809 > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 2349 bytes Desc: not available URL: From mnissley at redhat.com Tue Dec 24 03:57:24 2019 From: mnissley at redhat.com (Mark Nissley) Date: Mon, 23 Dec 2019 21:57:24 -0600 Subject: [Platformone] Account for Dirty Dev In-Reply-To: <975280B1-3CD7-48D5-B90A-057DFFDD2B05@braingu.com> References: <975280B1-3CD7-48D5-B90A-057DFFDD2B05@braingu.com> Message-ID: Nice work! Mark Nissley Sr.
Technical Project Manager 850-530-3234 On Mon, Dec 23, 2019, 9:26 PM Tim Gast wrote: > Thanks Dino! > > -Tim > -------------- next part -------------- An HTML attachment was scrubbed... URL: From roger.dirocco.4 at us.af.mil Fri Dec 27 18:09:55 2019 From: roger.dirocco.4 at us.af.mil (DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP) Date: Fri, 27 Dec 2019 18:09:55 +0000 Subject: [Platformone] Cannot Access DCCSCR GitLab - Is it Down/Turned Off? Message-ID: A non-text attachment was scrubbed... Name: smime.p7m Type: application/pkcs7-mime Size: 9236 bytes Desc: not available URL: From roger.dirocco.4 at us.af.mil Thu Dec 19 22:35:12 2019 From: roger.dirocco.4 at us.af.mil (DIROCCO, ROGER E GG-13 USAF AFMC ESC/AFLCMC/HNCP) Date: Thu, 19 Dec 2019 22:35:12 -0000 Subject: [Platformone] On-Call Holiday (Next 2 Weeks) POCs for DSOP/DCAR and P1 Message-ID: A non-text attachment was scrubbed... Name: smime.p7m Type: application/pkcs7-mime Size: 15485 bytes Desc: not available URL: