[katello-devel] Organization deletion bug, orchestration, testing

Fri Dec 14 08:06:33 UTC 2012

----- Original Message -----
> Guys,
> 
> For two days now I have been working on a nasty bug. When you delete an
> organization, its Red Hat provider (created by default with the name "Red
> Hat") is not deleted. If you have imported any manifests there or synced
> content, that is not deleted either. But the organization itself gets
> deleted from our database. And there is more: orchestration is somehow
> broken, so things are also not deleted in the backend engines. Even if I
> fix the provider deletion, deletion orchestration just does not work,
> mainly because organization deletion has not been working for a while.
> 
> I think this is a typical error that exposes a major weak point: our
> orchestration code is tightly coupled with models. Organization deletion
> was refactored to be a background job, because it can take a long time
> to delete.
> 
> The implementation is a bit hacky: each organization has a task_id flag,
> and when it is set to a non-nil value, the organization is hidden by a
> default_scope. That means that once the background task starts, the
> organization is immediately invisible to both the UI and the CLI.
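To make the hiding mechanism concrete, here is a plain-Ruby sketch. It is a stand-in built on assumptions: the in-memory `OrganizationStore` and its methods are hypothetical and only imitate what a `default_scope` plus `unscoped` do in ActiveRecord. It shows why the approach is fragile: once `task_id` is set, the record vanishes from every normal lookup, so if the background job stops halfway, nothing in the UI or CLI reveals the leftover data.

```ruby
# Hypothetical in-memory stand-in for the Organization model. In Katello the
# filter is an ActiveRecord default_scope; here it is a plain Ruby method.
Organization = Struct.new(:id, :name, :task_id)

class OrganizationStore
  def initialize
    @rows = []
  end

  def create(id, name)
    @rows << Organization.new(id, name, nil)
  end

  # Imitates the default_scope: every normal lookup skips rows with a task_id.
  def all
    @rows.select { |org| org.task_id.nil? }
  end

  # Imitates ActiveRecord's unscoped: returns every row, hidden or not.
  def unscoped
    @rows.dup
  end

  # Starting the background deletion only sets the flag.
  def start_deletion(id, task_id)
    @rows.find { |org| org.id == id }.task_id = task_id
  end
end

store = OrganizationStore.new
store.create(1, "ExampleOrg")
store.start_deletion(1, 42)
store.all.map(&:name)      # => [] (hidden from UI and CLI)
store.unscoped.map(&:name) # => ["ExampleOrg"] (still in the database)
```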
> 
> This should be a separate process (a method or something similar in our
> business logic code) that would be started either directly or via
> delayed_jobs. When quality engineers were testing Katello as a "black
> box", everything looked good.
> 
> But inside there was a background job that (due to our bug) deleted only
> the organization from the Katello database, leaving all providers,
> products, repositories and so on behind. Basically it deleted only one
> record from our database and then it stopped.

I'm not 100% sure this happened every time an org was deleted. If it had,
we wouldn't be able to run the system tests several times in a row.

> 
> We have our unit tests, which are able to reveal errors in units, and QA
> have their system tests, which test Katello as a project. But we are
> missing one important thing: integration tests. Something similar to the
> PulpV2 VCR test suite, which is able to verify that all required HTTP
> REST calls were made. The de facto standard in enterprise integration is
> a very similar approach of "recording" interactions between systems, then
> making stubs and comparing against the results. By the way, I have been
> trained on software called Green Hat (it's proprietary, but a funny name,
> right?).
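A minimal sketch of the record-and-compare idea, assuming hypothetical `RecordingClient` and `delete_organization` names and illustrative REST paths (they are not taken from Katello, Candlepin or Pulp). The fake client records each call instead of sending it, and the test compares what was recorded with the expected interactions, roughly what a recorded cassette provides in a VCR-style suite.

```ruby
# Hypothetical fake backend client: instead of sending HTTP requests it
# records each (verb, path) pair. The idea resembles record/replay tools
# such as the VCR gem, but this is not VCR's API.
class RecordingClient
  attr_reader :calls

  def initialize
    @calls = []
  end

  def delete(path)
    @calls << [:delete, path]
  end
end

# Hypothetical code under test; the REST paths are illustrative only.
def delete_organization(label, candlepin:, pulp:)
  pulp.delete("/pulp/api/v2/repositories/#{label}-repo/")
  candlepin.delete("/candlepin/owners/#{label}")
end

# The expected interactions, playing the role of a recorded "cassette".
EXPECTED = {
  candlepin: [[:delete, "/candlepin/owners/ExampleOrg"]],
  pulp: [[:delete, "/pulp/api/v2/repositories/ExampleOrg-repo/"]]
}.freeze

clients = { candlepin: RecordingClient.new, pulp: RecordingClient.new }
delete_organization("ExampleOrg", **clients)

# Any expected call that was never made is reported as missing.
missing = EXPECTED.flat_map { |backend, calls| calls - clients[backend].calls }
puts missing.empty? ? "all expected calls were made" : "missing: #{missing.inspect}"
```

With this in place, a deletion routine that forgets the Pulp call (as in the bug described above) would show up as a non-empty `missing` list rather than as silently leftover data.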
> 
> User story:
> 
> As a dev, I want a decent integration test suite for all backend engines

+1

> 
> The bug also points to a problem in our orchestration: because it is
> tightly coupled with models, we designed the deletion orchestration in
> that hacky way. The proper and logical approach is to start a process (or
> at least a bit of Ruby code) that has a procedural structure and does all
> the necessary things in simple steps, like one function or several
> sub-function calls. In the EI (enterprise integration) world, these are
> processes and sub-processes.
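A minimal sketch of such a procedural deletion, kept outside the models. The class, the step names and their order are hypothetical; each step only appends to a log where real code would call Pulp, Candlepin or the Katello database. Because delayed_job can enqueue any object that responds to `perform`, the same object could be run directly or as a background job.

```ruby
# Sketch: organization deletion as one procedure made of sub-steps, kept
# outside the models. Step names, order and bodies are hypothetical; each
# step only logs where real code would call a backend engine.
class OrganizationDeletion
  attr_reader :log

  def initialize(org_label)
    @org_label = org_label
    @log = []
  end

  # The whole process reads top to bottom, one sub-function per step.
  def run
    delete_repositories_in_pulp
    delete_owner_in_candlepin
    delete_providers_in_katello
    delete_organization_in_katello
    log
  end

  # delayed_job can enqueue any object that responds to #perform.
  alias_method :perform, :run

  private

  def delete_repositories_in_pulp
    @log << "pulp: delete repositories of #{@org_label}"
  end

  def delete_owner_in_candlepin
    @log << "candlepin: delete owner #{@org_label}"
  end

  def delete_providers_in_katello
    @log << "katello: delete providers of #{@org_label}"
  end

  def delete_organization_in_katello
    @log << "katello: delete organization #{@org_label}"
  end
end

deletion = OrganizationDeletion.new("ExampleOrg")
deletion.run # or, in the background: Delayed::Job.enqueue(deletion)
puts deletion.log
```

The design choice is that the whole process can be read in `run`, instead of being scattered across model callbacks.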
> 
> But since our orchestration is hooked into the Katello database, we tend
> to rely on it for things that should definitely not be written as updates
> or deletes in our database.
> 
> This example also shows how important it is to be able to write some
> orchestration in a one-way messaging pattern (Katello integration is
> nothing but message handling between backend systems). For example,
> deletion is a typical one-way process that should either finish or be
> suspended until someone investigates what is wrong and then resumes or
> cancels it. This makes recovery much easier. This is not my invention,
> but the standard approach in most integration projects.
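A sketch of a one-way process that either finishes or suspends on the failing step, under the assumption that steps are plain Ruby callables (the `SuspendableProcess` class and the step names are hypothetical). The process remembers the position of the failed step, so resuming continues from there instead of starting over; an operator could call `cancel` instead.

```ruby
# Sketch of a one-way process that either finishes or suspends on the failing
# step until an operator resumes or cancels it. The class and the steps used
# below are hypothetical; steps are [name, callable] pairs.
class SuspendableProcess
  attr_reader :state, :error

  def initialize(steps)
    @steps = steps
    @position = 0 # index of the next step to run
    @state = :pending
    @error = nil
  end

  # Runs from the current position; on failure it stops there and suspends.
  def run
    raise "cannot run a #{@state} process" if %i[finished cancelled].include?(@state)

    @state = :running
    @error = nil
    while @position < @steps.size
      name, step = @steps[@position]
      begin
        step.call
      rescue StandardError => e
        @error = "#{name}: #{e.message}"
        return @state = :suspended
      end
      @position += 1
    end
    @state = :finished
  end

  # Resuming is running again from the step that failed.
  alias_method :resume, :run

  def cancel
    @state = :cancelled
  end
end

pulp_up = false
done = []
process = SuspendableProcess.new([
  ["candlepin", -> { done << :candlepin }],
  ["pulp", -> { raise "connection refused" unless pulp_up; done << :pulp }],
  ["katello", -> { done << :katello }]
])

first_attempt = process.run     # => :suspended (error: "pulp: connection refused")
pulp_up = true                  # someone investigates and fixes Pulp
second_attempt = process.resume # => :finished; the candlepin step is not repeated
```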
> 
> In short, Katello orchestration should be a separate component/service
> with independent parties: Katello, Candlepin, Pulp and Foreman. It should
> be able to work online or as a background service, allowing request-reply
> or one-way MEPs (message exchange patterns).
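A sketch of the two MEPs using Ruby's built-in `Queue` and a single worker thread. The `Orchestrator` class and its message format are hypothetical; in a real deployment the queue would be a message broker or a job queue rather than an in-process object.

```ruby
# Sketch of the two message exchange patterns with Ruby's built-in Queue and
# one worker thread. The Orchestrator class and message format are
# hypothetical; a real service would sit behind a broker or job queue.
class Orchestrator
  attr_reader :log

  def initialize
    @inbox = Queue.new
    @log = []
    @worker = Thread.new { process_messages }
  end

  # One-way: enqueue and return immediately; the caller never waits.
  def send_one_way(action, payload)
    @inbox << [action, payload, nil]
    nil
  end

  # Request-reply: enqueue with a private reply queue and block on it.
  def request(action, payload)
    reply = Queue.new
    @inbox << [action, payload, reply]
    reply.pop
  end

  # Stops the worker after it has handled everything already enqueued.
  def shutdown
    @inbox << :stop
    @worker.join
  end

  private

  def process_messages
    while (message = @inbox.pop) != :stop
      action, payload, reply = message
      result = "#{action} #{payload}: done"
      @log << result
      reply << result if reply
    end
  end
end

orchestrator = Orchestrator.new
orchestrator.send_one_way(:delete_org, "ExampleOrg") # returns nil at once
answer = orchestrator.request(:ping, "candlepin")    # blocks for the reply
orchestrator.shutdown
orchestrator.log # => ["delete_org ExampleOrg: done", "ping candlepin: done"]
```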
> 
> There are existing solutions like JBoss Drools, Apache Camel and Apache
> ServiceMix, all Java based. As I don't see integrating with those as
> feasible, I must insist on adding tasks that would change the way
> orchestration works today.

With JRuby, reusing these tools might be feasible, right?

> 
> Recovery from data inconsistency bugs is _very_ expensive. Actually,
> there are enterprises with offerings dedicated solely to this topic.
> 
> User stories:
> 
> As a dev, I want to detach orchestration from models
> As a dev, I want to have clean and consistent orchestration code
> Design out: Process-like orchestration with various MEPs
> 
> We are not done yet! There is more. We have lots of database hooks and
> validations. For this particular deletion, there are before_delete hooks,
> and in those it is not sufficient to return false if there is a problem
> (a validation issue or a general error). We must raise an exception,
> otherwise Rails will not roll back the whole transaction.
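A plain-Ruby simulation of the problem, not Rails code: the hypothetical `FakeDatabase#transaction` restores a snapshot only when an exception escapes the block, which is the property that matters here. A provider hook that returns false only stops the provider row from being deleted, so the transaction commits with the organization gone and the provider orphaned; a hook that raises rolls everything back.

```ruby
# Plain-Ruby simulation, not Rails code: FakeDatabase#transaction restores a
# snapshot only when an exception escapes the block.
class FakeDatabase
  attr_reader :rows

  def initialize(rows)
    @rows = rows
  end

  def transaction
    snapshot = @rows.dup
    yield
  rescue StandardError
    @rows = snapshot # roll back, then let the error propagate
    raise
  end
end

# Deletes the organization, then its provider, in one transaction. A hook
# returning false only skips the provider row; nothing aborts the transaction.
def destroy_organization(db, provider_hook)
  db.transaction do
    db.rows.delete(:organization)
    db.rows.delete(:provider) unless provider_hook.call == false
  end
end

db_false = FakeDatabase.new([:organization, :provider])
destroy_organization(db_false, -> { false })
db_false.rows # => [:provider] (committed: provider orphaned, org gone)

db_raise = FakeDatabase.new([:organization, :provider])
begin
  destroy_organization(db_raise, -> { raise "provider still has products" })
rescue RuntimeError
  # the caller sees the error and can report it
end
db_raise.rows # => [:organization, :provider] (rolled back)
```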
> 
> User stories:
> 
> As a dev, I want all callbacks to be reviewed so that they raise errors
> when the transaction should be rolled back
> 
> I will continue investigating what went wrong and what data was or was
> not deleted in each backend engine, and I will prepare some kind of
> migration script to correct the data inconsistencies. But since I will
> only fix this particular BZ (the org deletion case), we should add the
> stories above to our backlog.

We also need better testing for edge cases, e.g. what happens when I delete
an organization and restart all the services in the meantime? How can I recover?
Trying to solve these issues can point us to the weak points (which will probably be
very similar to those described in this thread).
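One way to make the restart scenario recoverable is to persist progress after every step. The sketch below is hypothetical: the journal format, the step names and the `crash_after:` switch used to simulate a restart are all assumptions, and the journal is a JSON string that would live in a file or database column in practice.

```ruby
require "json"

# Hypothetical deletion steps, in order.
STEPS = %w[pulp_repositories candlepin_owner katello_providers katello_organization].freeze

# Runs the remaining steps, persisting progress (a JSON journal) after each
# one. crash_after: simulates the process dying right after the given step.
def run_deletion(journal_json, crash_after: nil)
  journal = journal_json ? JSON.parse(journal_json) : { "completed" => [] }
  executed = []
  STEPS.each do |step|
    next if journal["completed"].include?(step) # done before the restart

    # ... the real backend call for this step would go here ...
    executed << step
    journal["completed"] << step
    journal_json = JSON.generate(journal) # persist before moving on
    if crash_after == step
      return { journal: journal_json, status: :interrupted, executed: executed }
    end
  end
  { journal: JSON.generate(journal), status: :finished, executed: executed }
end

before_restart = run_deletion(nil, crash_after: "candlepin_owner")
before_restart[:executed] # => ["pulp_repositories", "candlepin_owner"]
# Services restart; only the persisted journal string survives.
after_restart = run_deletion(before_restart[:journal])
after_restart[:executed]  # => ["katello_providers", "katello_organization"]
after_restart[:status]    # => :finished
```

Because a crash can also happen after a backend call but before the journal is written, each step should be idempotent (deleting something that is already gone should not fail).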

I agree that neither black box testing nor unit tests will help us much here: unit tests because
of the isolation, and black box tests because everything might seem to be working at the moment,
whereas knowledge of the code base can help find the issue.

Stories:

As an integrator, I want to have the edge cases automated.
As an integrator, I want to provide standardized ways for recovering from various non-standard situations.

When it comes to keeping the systems consistent with each other, the first obvious step, as Mirek pointed out, is to finally introduce foreign keys
to keep our own database consistent. This should be the #1 priority if we don't want to spend the rest of our lives fixing bugs like this
one.

-- Ivan
> 
> LZ
> 
> --
> Later,
> 
>  Lukas "lzap" Zapletal
>  #katello #systemengine
> 
> _______________________________________________
> katello-devel mailing list
> katello-devel at redhat.com
> https://www.redhat.com/mailman/listinfo/katello-devel
>