Request for test data based off of obfuscated live data

John Palmieri johnp at redhat.com
Wed Nov 19 15:29:36 UTC 2008


----- "Toshio Kuratomi" <a.badger at gmail.com> wrote:

> Mike McGrath wrote:

> > We're actually in a pretty unique situation in that most of our
> > data is public anyway; replicating pkgdb and bodhi data, for
> > example, should be fairly easy.  Replicating the fas stuff should
> > be easy too.
> > 
> > We're going to need to replicate not only the data but access to
> > the data, and this, to me at least, sounds like another development
> > environment that is more mature than the pt setup but still not as
> > strict as the staging environment.
> > 
> > 
> Yep, that seems to be where the need fits in.
> 
> > What do others think on this?  I like the low overhead of the pt
> > servers, since people are kind of on their own in getting stuff
> > done and it doesn't cause extra work for the sysadmin-web guys.
> > But there are drawbacks to it.
> > 
> I'm not sure what's best.  There are a lot of problems with doing
> this in a shared development environment.  Even if we're controlling
> the access to the data, we'd still be more open with it here than in
> production or staging.  For instance, people who are not primary fas
> authors or system admins would have access to make modifications to
> fas.  So I think we'd still end up wanting to modify the data before
> it hits this environment.  We'd also have to devote resources to
> it... another db server, a host to run koji-web, hub, builder, etc.
> We'd have to update them.  We'd have to work out conflicts between
> different developers; for instance, work on CSRF fixes in this
> environment might make developing other apps like myfedora just flat
> out fail for a while.
> 
> If we can munge the data enough to be comfortable releasing it to
> the public, it seems like that would cost us fewer man hours.
> However, it isn't entirely free.  We'd still have to make new dumps
> of data, modify it for changes in the data model, etc.  Then the
> developer would become responsible for downloading the sanitised
> data and running it on their network.  Which is good because it
> isn't us, but bad because it's not trivial to set all this up.

I would be willing to write scripts and a kickstart file to make this trivial: getting a qemu image or test machine up and running would take a couple of hours (mostly waiting for downloads and installs to happen).  What I was thinking of is an environment that sets up a stable Fedora infrastructure stack, complete with puppet scripts to configure the services to work with one another, a set of scripts for pulling fresh data and modifying common pieces of the various dbs (like changing dates to stay current or setting up one of the users as your test user), and scripts for pulling down code from the various source trees, for hacking on particular pieces of the infrastructure while integrating them into the environment.
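
To make the db-munging piece concrete, here is a minimal sketch of what one of those scripts could look like, assuming a PostgreSQL copy of the data loaded locally.  The table and column names (bodhi_updates, fas_people) and the DAYS_STALE figure are made up for illustration and would have to match each app's real schema:

#!/usr/bin/env python
"""Shift timestamps in a scrubbed copy of the data so it stays
current, and turn one sanitised account into a known test user.
A rough sketch only -- the table and column names below are
hypothetical, not the real pkgdb/bodhi/fas schemas."""
import psycopg2

DAYS_STALE = 30  # hypothetical: how far behind the dump has fallen

conn = psycopg2.connect(dbname='infra_test', user='devel')
cur = conn.cursor()

# Age every update's timestamp forward so date-based queries behave
# the way they would against live data.
cur.execute(
    "UPDATE bodhi_updates "
    "SET date_submitted = date_submitted + %s * interval '1 day'",
    (DAYS_STALE,))

# Rename one sanitised account so the developer can log in as a
# predictable test user.
cur.execute(
    "UPDATE fas_people SET username = %s, email = %s WHERE id = %s",
    ('testuser', 'testuser@example.com', 1))

conn.commit()
conn.close()

Each app would get its own small script along these lines, re-run whenever a fresh sanitised dump is pulled.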

A note on the code drop: by requiring the author to modify a spec file, if needed, in order to deploy their changes into their environment (revision numbers would be automated), patches would include spec file changes instead of the maintainer having to sync them by hand.  This would also keep the build files up to date, since the author would have to make their changes work in an RPM environment just to test them, as opposed to installing straight from their source tree, which often leads to annoying bugs (like missing files in a distributed tarball).  Also, by making it easy to generate a patch and submit it to trac, we would get more consistently formatted patches (such as using the VCS's patch format) and most likely more people getting involved as the overhead shrinks a bit (how many people want to go to individual trac instances to file a patch?).
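
To make the automated revision numbers concrete, here is a small sketch of the kind of helper I have in mind.  bump_release() and the rpmbuild invocation are illustrative, not an existing tool; a real version would also want to handle %changelog entries:

#!/usr/bin/env python
"""Bump the Release tag in a spec file and rebuild, so changes get
tested as RPMs instead of straight from the source tree."""
import re
import subprocess
import sys

def bump_release(spec_path):
    """Increment the numeric part of 'Release: N%{?dist}' in place."""
    spec = open(spec_path).read()
    def bump(match):
        return '%s%d%s' % (match.group(1),
                           int(match.group(2)) + 1,
                           match.group(3))
    spec = re.sub(r'(?m)^(Release:\s*)(\d+)(.*)$', bump, spec, count=1)
    open(spec_path, 'w').write(spec)

if __name__ == '__main__':
    spec_file = sys.argv[1]
    bump_release(spec_file)
    # A failed build catches problems like files missing from the
    # tarball before the patch ever leaves the developer's machine.
    subprocess.check_call(['rpmbuild', '-ba', spec_file])

Run against the spec in the developer's checkout (e.g. "python bump-and-build.py fas.spec", to pick a hypothetical name), a failed rpmbuild surfaces packaging problems before the patch is ever attached to trac.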
 
--
John (J5) Palmieri
Software Engineer
Red Hat, Inc.



