Meeting Log - 2009-07-02

Ricky Zhou ricky at fedoraproject.org
Thu Jul 2 20:50:51 UTC 2009


20:00 < mmcgrath> #startmeeting
20:00 < fedbot> Meeting started Thu Jul  2 20:00:29 2009 UTC.  The chair is mmcgrath.
20:00 < fedbot> Information about MeetBot at http://wiki.debian.org/MeetBot , Useful Commands: #action #agreed #halp #info #idea #link #topic.
20:00 < dgilmore> gday mmcgrath
20:00  * ricky 
20:00 < mmcgrath> #topic Infrastructure -- Who's here?
20:00 -!- fedbot changed the topic of #fedora-meeting to: Infrastructure -- Who's here?
20:00  * johe|home takes a seat
20:00 < mmcgrath> dgilmore: how's it going?
20:00  * SmootherFrOgZ is
20:01  * sijis is here.
20:01  * ke4qqq is
20:01 < dgilmore> mmcgrath: 2 builders to go
20:01 < SmootherFrOgZ> dgilmore: for stg ?
20:01 < smooge> hello
20:01 < mmcgrath> dgilmore: excellent, happy to hear it.
20:01 < mmcgrath> Well lets get started
20:01 < mmcgrath> #topic Infrastructure -- Tickets
20:01 -!- fedbot changed the topic of #fedora-meeting to: Infrastructure -- Tickets
20:01 < mmcgrath> .tiny https://fedorahosted.org/fedora-infrastructure/query?status=new&status=assigned&status=reopened&group=milestone&keywords=~Meeting&order=priority
20:01 < zodbot> mmcgrath: http://tinyurl.com/47e37y
20:02 < mmcgrath> .ticket 1503
20:02 < mmcgrath> abadger1999: take it
20:02 < zodbot> mmcgrath: #1503 (Licensing Guidelines for apps we write) - Fedora Infrastructure - Trac - https://fedorahosted.org/fedora-infrastructure/ticket/1503
20:02 < dgilmore> SmootherFrOgZ: nope
20:02 < abadger1999> So we've had a new license pop up in apps we've written recently
20:02 < abadger1999> AGPLv3+
20:02 < abadger1999> That's incompatible with GPLv2 which is what the majority of our apps use presently.
20:03 < abadger1999> After looking over the situation with spot, it seems like it would be good to move everything to AGPLv3+.
20:03 < dgilmore> im ok with the move
20:03 < abadger1999> (With libraries going to LGPLv2+)
20:03 < smooge> abadger1999, when you say we use.. do you mean we right or other stuff
20:03 < abadger1999> We write.
20:03 < smooge> s/right/write/
20:03 < smooge> thanks
20:04 < abadger1999> smooge: This would not affect code that we don't write.
20:04 < abadger1999> And it's a recommendation rather than a hard and fast rule.
20:04 < mmcgrath> abadger1999: have you run into anyone saying "ehh, I don't think we should do this." ?
20:04 < abadger1999> ie: mdomsch wants mirrormanager to be MIT; mediawiki plugins should follow mediawiki's license
20:04 < abadger1999> mmcgrath: So far everyone's been positive.
20:04 < mmcgrath> abadger1999: ok, so how do we actually _do_ it?
20:05 < mmcgrath> sed?
20:05 < abadger1999> yeah, we have to replace COPYING files with AGPL/LGPL and then change the headers in source files.
20:05 < smooge> well you need to look at each app and see if it's something we wrote or pulled in from somewhere else
20:06 < sijis> do you need to get written proof from author before changing?
20:06 < ricky> How urgent is this time-wise?
20:06 < smooge> if it's pulled in we need to deal with it.. if it's something we wrote 100% we should be able to replace COPYING/headers
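A minimal sketch of a first audit step, before any COPYING files or headers are replaced: list the Python files under an app's source tree whose header carries the standard FSF "GPL version 2 or later" notice. It assumes '#'-style comment headers within the first 30 lines of each file and matches on phrases from the FSF notice text; the function names are hypothetical and not part of any existing Fedora tool. Anything it flags still needs a human check that the code was written in-house rather than pulled in from elsewhere.

```python
import os
import re

# Phrases taken from the standard FSF "GPL version 2 or later" notice.
# "GNU Lesser General Public License" and "GNU Affero General Public License"
# do not contain GPL_NAME as a substring, so LGPL/AGPL headers are not flagged.
GPL_NAME = "GNU General Public License"
V2_PHRASE = "version 2 of the License"


def header_text(path, max_lines=30):
    """Return the first max_lines of a file as one normalized string,
    with leading '#' comment markers and runs of whitespace collapsed,
    so that a notice wrapped across several comment lines still matches."""
    with open(path, encoding="utf-8", errors="replace") as f:
        lines = [next(f, "") for _ in range(max_lines)]
    stripped = (line.lstrip().lstrip("#") for line in lines)
    return re.sub(r"\s+", " ", " ".join(stripped))


def find_gplv2_files(root, suffix=".py"):
    """List files under root (hypothetical helper) whose header looks
    like a GPLv2(+) notice and therefore needs relicensing review."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(suffix):
                continue
            path = os.path.join(dirpath, name)
            text = header_text(path)
            if GPL_NAME in text and V2_PHRASE in text:
                hits.append(path)
    return sorted(hits)
```

Flagging files rather than rewriting them in place keeps the "did we actually write this?" decision with a person, which is the concern raised above.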
20:06 < abadger1999> sijis: for the majority of things no, but I am going to notify authors of pkgdb and python-fedora before I make changes.
20:06 < mmcgrath> ricky: I'd say not real urgent, but the longer we wait... the longer we're going to wait I suspect.
20:06 < ricky> For example, with FAS, I'd like to eventually rewrite the OpenID provider part instead of dealing with licensing pain because of samadhi or anything.
20:06 < abadger1999> sijis: The CLA gives us the ability to do a relicense if the contribution was made without an explicit license.
20:07 < mmcgrath> abadger1999: some seemed timid about that on f-a-b.  I'm less timid.
20:07 < abadger1999> <nod> ricky the other option is to find out what jcollie thinks about AGPLv3+
20:07 < mmcgrath> but we should ask
20:07 < mmcgrath> abadger1999: lets take an app like fas first.
20:07 < mmcgrath> just see how it goes.
20:08 < abadger1999> yeah, it's common courtesy and also gives people a chance to holler "Oh wait, I actually didn't own the copyright to that code.. sorry."
20:08 < mmcgrath> abadger1999: are you going to lead the effort on this?
20:08 < abadger1999> I'd like to do python-fedora soon. It's moving to LGPLv2+ which is more permissive
20:08 < mmcgrath> should we open a ticket for each app?
20:08 < sijis> how many apps are we talking about for this? +/-15?
20:08 < abadger1999> mmcgrath: I can.  Yes, each app.
20:08 < mmcgrath> sijis: less than 15
20:08 < abadger1999> sijis: Less than 15
20:09 < mmcgrath> abadger1999: sounds good, so anything else?
20:09 < abadger1999> A ticket for each app will let us come back next week and say -- half of our app authors like a licensing policy but don't want to change *their* app.
20:09 < abadger1999> Which would mean we need to rethink.
20:10 < abadger1999> I think that's all unless someone wants to shout that it's a bad idea now :-)
20:10 < mmcgrath> anyone have anything to say?  If not now, take it to the list.
20:10 < mmcgrath> and do it sooner, not later.
20:10 < mmcgrath> Ok, so next topic
20:10 < mmcgrath> #topic Infrastructure -- The merge, outages and issues.
20:10 -!- fedbot changed the topic of #fedora-meeting to: Infrastructure -- The merge, outages and issues.
20:11 < mmcgrath> So we had a merge last week.
20:11 < mmcgrath> and since the merge we've had some issues
20:11 < mmcgrath> and it's not something obvious.
20:11 < smooge> define merge for me?
20:11 < mmcgrath> and, in fact, could be completely unrelated.
20:11 < mmcgrath> smooge: merge from staging to master branches in puppet.
20:11 < ricky> smooge: We made a ton of changes in the staging branch and merged them to production :-)
20:11 < mmcgrath> Which basically involved refactoring a bunch of puppet code, cleaning things up, creating some new modules, etc, etc.
20:11 < mmcgrath> I've not seen a wiki outage since yesterday.
20:12 < mmcgrath> I need to go through the logs and look.
20:12 < mmcgrath> while doing some digging we, just in general, found strange issues in our environment.
20:13 < smooge> mmcgrath, ricky thanks..
20:13 < smooge> what have been the strange ones
20:13 < mmcgrath> for example - http://mmcgrath.fedorapeople.org/proxy-errors.html
20:13 < mmcgrath> 200,000+ 502's per day.
20:13 < mmcgrath> just seems massive to me.
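One hedged way to reproduce a per-day 502 count like the one behind that report, assuming the proxies write Apache common or combined log format to their access log. The regex and function name are illustrative only; this is not the script that generated the graph.

```python
import re
from collections import Counter

# Assumes Apache common/combined log format, e.g.
# 1.2.3.4 - - [02/Jul/2009:20:13:05 +0000] "GET /wiki/ HTTP/1.1" 502 412 "-" "Mozilla/5.0"
# Request lines containing escaped quotes are rare and simply won't match.
LOG_RE = re.compile(
    r'\[(?P<day>\d{2}/\w{3}/\d{4}):[^\]]*\] "[^"]*" (?P<status>\d{3}) '
)


def count_status_per_day(lines, status="502"):
    """Count requests with the given HTTP status, grouped by day
    (hypothetical helper; works on any iterable of log lines)."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.search(line)
        if m and m.group("status") == status:
            counts[m.group("day")] += 1
    return counts
```

Passing an open log file works because file objects iterate line by line; the log path on each proxy is an assumption left to the caller. Running it per proxy host would also show the proxy2 versus proxy1 split discussed next.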
20:14 < ricky> In terms of the big outages, they've all seemed to happen during mysql database backups (which lock tables) or smolt render stats jobs.
20:14 < ricky> The proxy errors and 500s seem to be something else though.
20:14 < mmcgrath> <nod>
20:14 < mmcgrath> and our current lead on the 500 errors for fas is a new mod_wsgi
20:14 < ricky> Have the 500 errors stayed normal?
20:14 < mmcgrath> jbowes is working on that.
20:15 < ricky> (As in, have they gone up after the merge or not?)
20:15 < mmcgrath> ricky: hard to say
20:15 < mmcgrath> http://mmcgrath.fedorapeople.org/JuneErrors.html
20:15 < mmcgrath> I'll re-check today now that it's been a few more days.
20:15 < mmcgrath> clearly we had a major spike
20:16 < sijis> mmcgrath: the first graph shows it being mostly proxy2
20:16 < mmcgrath> but it seems to have gone back down.
20:16 < ricky> Strange.
20:16 < mmcgrath> sijis: yeah, and proxy2 is an odd beast.
20:16 < mmcgrath> proxy2 is load balanced with proxy1 behind the PHX balancer.
20:16 < mmcgrath> _however_
20:16 < mmcgrath> anything in phx uses proxy2 directly to get to the account system.
20:16 < mmcgrath> which not only includes shell accounts.
20:17 < mmcgrath> but also includes our web applications contacting fas for session, auth, etc.
20:17 < mmcgrath> which is a significant amount of traffic.
20:17 < smooge> interesting.. is there a reason for just proxy2?
20:17 < ricky> Funny that proxy1 seems fine.
20:17 < mmcgrath> ricky: well it does get a lot less traffic.
20:17 < ricky> Like it didn't jump significantly at all.
20:17 < ricky> I guess.
20:17 < mmcgrath> smooge: the network team won't let us contact the balancer IP directly.
20:18 < sijis> so you are forced to pick a proxy?
20:18 < smooge> ah ok could we setup another proxy?
20:18 < mmcgrath> smooge: we have two of them there.
20:18 < mmcgrath> but no good way to balance between the two of them.
20:18 < mmcgrath> we could put a load balancer in there, but it'd be just another box, and would need to be rebooted as often as proxy2 is anyway
20:18 < ricky> Is the problem really coming from our PHX admin.fp.o setup though?
20:19 < smooge> mmcgrath, no what I meant was one that was just for that so we could cut down on what might be causing the errors?
20:19 < ricky> The 502s really jumped everywhere, so that's what I want to know the root cause of.
20:20 < smooge> so if it's a brute-force attack on stuff we could get an idea of what app is being targeted or something
20:20 < mmcgrath> I think the errors are on our end, I need to do more log checking to know for sure though
20:20 < ricky> But the brute force shouldn't be causing 502, it should be working :-)
20:20 < mmcgrath> but yeah we can add and remove more proxy servers in PHX if we want to
20:20 < ricky> mmcgrath: Can we separate that graph into apache 502s and haproxy 502s?
20:21 < ricky> Right now they're lumped together in the source where you're getting it from, right?
20:21 < mmcgrath> ricky: I don't think so, because if haproxy or the app server returned a 502, apache would log a 502.
20:21 < mmcgrath> so proxyX will always have our largest number of 502's
20:21 < mmcgrath> then haproxy (if we're logging that, not even sure)
20:21 < mmcgrath> then the app server
20:22 < mmcgrath> although the app servers probably don't throw 502
20:22 < ricky> mmcgrath: But some 502s are coming from apache, as in proxy1 couldn't contact localhost:10009
20:22 < ricky> Those are the strangest ones to me.
20:22 < mmcgrath> I'll have to look closer then.
20:22 < sijis> firewall?
20:23 < ricky> sijis: I don't think so - it definitely works a large percent of the time
20:23 < mmcgrath> sijis: I'd actually think that's the app server not responding to haproxy, and thus not responding to the proxy server.
20:23 < ricky> But that should strictly cause haproxy 502s not apache 502s, correct?
20:23 < mmcgrath> and I'm not seeing us hitting our haproxy limit.
20:23 < ricky> and we've seen both :-(
20:23 < mmcgrath> ricky: when looking at the logs, how can you tell the difference?
20:24 < mmcgrath> oh from it saying it couldn't contact localhost:10009
20:24 < ricky> I'm not sure.  I'd expect the apache 502s to show up in the apache error log and both types of 502s to show up in the error log.
20:24 < ricky> I'll have to verify that tohugh.
20:24 < ricky> **though
20:24 < mmcgrath> hm
20:24 < mmcgrath> hm
20:24 < mmcgrath> hmmmm
20:24 < ricky> Was your source for these graphs the error log or the access log?
20:25 < mmcgrath> access I believe
20:25 < sijis> is haproxy on a different server or on proxy2?
20:25  * mmcgrath looks
20:25 < mmcgrath> sijis: each proxy server has its own haproxy service on the same host
20:25 < mmcgrath> ricky: access.log
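Because Apache records a 502 in access.log whether the failure happened in Apache itself or further back in haproxy or the app server, the access log alone cannot separate the two. A sketch of the error-log side, assuming Apache 2.2 mod_proxy logs connect failures in a form like "attempt to connect to 127.0.0.1:10009 (localhost) failed" (the exact wording is an assumption to check against the proxies' real error_log). Counting failures per backend would show how many 502s came from Apache being unable to reach the local haproxy; the function name is hypothetical.

```python
import re
from collections import Counter

# Assumed shape of an Apache 2.2 mod_proxy connect failure in error_log, e.g.
# [Thu Jul 02 20:22:01 2009] [error] (111)Connection refused: proxy: HTTP:
#     attempt to connect to 127.0.0.1:10009 (localhost) failed
CONNECT_FAIL_RE = re.compile(r"attempt to connect to (?P<backend>[\w.\-]+:\d+)")


def failed_backends(error_lines):
    """Count mod_proxy connect failures per backend host:port."""
    counts = Counter()
    for line in error_lines:
        m = CONNECT_FAIL_RE.search(line)
        if m and "failed" in line:
            counts[m.group("backend")] += 1
    return counts
```

Subtracting these Apache-side failures from the access-log 502 total for the same host and day would leave the 502s that originated in haproxy or beyond, which is the split asked for above, as long as the assumed log wording holds.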
20:26 < mmcgrath> perhaps we should continue discussing this after the meeting.
20:26 < ricky> Ah, OK.
20:26 < mmcgrath> any objections?
20:26 < ricky> Sure thing
20:26 < sijis> nope.
20:27 < mmcgrath> # topic Infrastructure -- Eye in know db.  - INNODB
20:27 < mmcgrath> #topic Infrastructure -- Eye in know db.  - INNODB
20:27 -!- fedbot changed the topic of #fedora-meeting to: Infrastructure -- Eye in know db.  - INNODB
20:27 < mmcgrath> ricky: this one's you.  Talk about your plans, what's going on, what's going wrong, etc.
20:27 < smooge> is that a rock band?
20:27 < ricky> Any MySQL experts around, by the way?  :-)
20:27 < mmcgrath> ricky: abadger1999 is a mysql expert
20:27 < Jeff_S> ricky: for some definition of expert
20:27 < mmcgrath> :-P
20:28 < ricky> Part of the big outages we've seen since the merge seems to be due to mysql backups (and smolt's stats refresh script, which might be a separate problem)
20:28 < ricky> We've seen this behavior with the zabbix database, where the backup would lock entire tables
20:28 < abadger1999> ricky: Yep, of the yum erase '*ysql' ; yum install 'postgres*' variety
20:28 < ricky> abadger1999: Hehe
20:28  * mmcgrath notes we've always had a small problem with backups and outages.  But they've been tiny blips.  Lately they've been throwing nagios alerts.
20:29 < smooge> how many mysql databases do we have?
20:29 < ricky> We'd like to move to using the --single-transaction option to mysqldump, which combined with InnoDB, should make backups not lock the entire table
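A minimal sketch of that backup invocation, assuming the backups are driven from Python (they may well be a shell cron job instead) and that each database is dumped separately. --single-transaction makes mysqldump read everything from one consistent InnoDB snapshot instead of locking tables, and --quick streams rows instead of buffering whole tables in memory. MyISAM tables in the same dump are neither locked nor guaranteed consistent. The function names are hypothetical.

```python
import subprocess


def mysqldump_cmd(database, user=None):
    """Build a mysqldump argv that dumps from a single consistent
    snapshot without table locks (only effective for InnoDB tables)."""
    cmd = ["mysqldump", "--single-transaction", "--quick"]
    if user:
        cmd.append("--user=" + user)
    cmd.append(database)
    return cmd


def dump_to_file(database, path, user=None):
    """Run mysqldump and write the SQL dump to path; raises
    CalledProcessError if mysqldump exits non-zero."""
    with open(path, "wb") as out:
        subprocess.run(mysqldump_cmd(database, user), stdout=out, check=True)
```

Keeping the argv as a list (rather than a shell string) avoids quoting problems with database names, and check=True makes a failed backup visible instead of leaving a silently truncated file.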
20:30 < Jeff_S> ricky: yes!
20:30 < ricky> The main mysql usage we have is mediawiki, smolt, and zabbix
20:30 < ricky> Although we have a few others for stuff like cacti, prelude/prewikka, etc.
20:30 < Jeff_S> ricky: FWIW, we've also had good luck with http://www.zmanda.com/backup-mysql.html (community edition)
20:30 < smooge> ricky, are they separate servers or one single one
20:30 < ricky> Jeff_S: Thanks, I'll take a look at that later
20:30 < ricky> smooge: They're all on db1
20:30 < mmcgrath> smooge: all mysql db's are on db1
20:31 < ricky> So far, the biggest pain we've had is the host_links table in smolt
20:31 < mmcgrath> ricky: and how big is it?
20:31 < mmcgrath> O:-)
20:31 < ricky> It has above 70M rows, and I haven't gotten a single successful conversion to InnoDB yet.
20:31 < ricky> And the thing with --single-transaction is that the tables need to be InnoDB to be sure that everything gets dumped in a consistent state
20:31 < Jeff_S> but single-transaction will probably solve your main problem of locking the table(s)
20:32 < abadger1999> ricky: We're able to dump that table?  Are we able to reload it except as innodb?
20:32 < smooge> wow thats quite a bit
20:32 < mmcgrath> ricky: and what are the downsides to innodb?  (space, etc, etc)
20:32 < abadger1999> slower
20:32 < Jeff_S> mmcgrath: slower at certain operations
20:32 < ricky> So the approaches that we've tried so far are: converting using alter table, and sedding a dump to change the table type, and loading it.
20:32 < mmcgrath> how much slower?
20:32 < ricky> The first didn't finish after some large number of hours, and the second is going now.
20:32 < ricky> mmcgrath: I'm actually not that sure about the downsides yet.  Apparently loading huge tables is a huge pain.
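The "sed a dump" approach can be sketched in Python as a line filter, assuming mysqldump output where each CREATE TABLE closes with a line like ") ENGINE=MyISAM DEFAULT CHARSET=utf8;" (older dumps use TYPE= instead of ENGINE=). Restricting the rewrite to those closing lines leaves row data that happens to contain the same string untouched. The ALTER TABLE form, the other approach tried, is shown alongside; both function names are hypothetical.

```python
import re

# Matches the table-options line that closes a CREATE TABLE in a dump,
# e.g. ") ENGINE=MyISAM DEFAULT CHARSET=utf8;" or ") TYPE=MyISAM;".
ENGINE_RE = re.compile(r"^(\).*?\b(?:ENGINE|TYPE)=)MyISAM\b")


def convert_dump_line(line):
    """Rewrite MyISAM to InnoDB on CREATE TABLE option lines only;
    INSERT lines and other data are returned unchanged."""
    return ENGINE_RE.sub(r"\1InnoDB", line)


def alter_statement(table):
    """The in-place alternative: convert one table with ALTER TABLE."""
    return "ALTER TABLE `%s` ENGINE=InnoDB;" % table
```

Processing a large dump one line at a time (read a line, write convert_dump_line(line)) keeps memory use flat no matter how big the dump is. Either way the expensive part is the same: InnoDB has to rebuild the table and its indexes while loading.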
20:33 < mmcgrath> ricky: I'm going to want render-stats metrics too
20:33 < Jeff_S> mmcgrath: depends on the dataset & queries.  the locking though more than makes up for it IMO
20:33 < ricky> Also, some tables needed MyISAM for full text search - the only table affected by this is mediawiki's searchindex table
20:33 < abadger1999> :-(
20:33 < ricky> (Which is just a copy of another InnoDB table, I believe)
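To see what is left to convert, information_schema.TABLES (MySQL 5.0 and later) reports each table's storage engine. A sketch, assuming a DB-API connection to db1 runs the query and that any table whose name ends in searchindex must stay MyISAM for FULLTEXT search (the suffix match allows for a MediaWiki table prefix). The constant and function names are hypothetical.

```python
# Lists every base table on the server with its storage engine.
ENGINE_QUERY = """
SELECT TABLE_SCHEMA, TABLE_NAME, ENGINE
FROM information_schema.TABLES
WHERE TABLE_TYPE = 'BASE TABLE'
  AND TABLE_SCHEMA NOT IN ('mysql', 'information_schema')
"""

# Tables that must stay MyISAM because they rely on FULLTEXT indexes
# (in this discussion, only MediaWiki's searchindex).
KEEP_MYISAM = {"searchindex"}


def tables_to_convert(rows):
    """Given (schema, table, engine) rows from ENGINE_QUERY, return the
    (schema, table) pairs that still need converting to InnoDB."""
    return sorted(
        (schema, table)
        for schema, table, engine in rows
        if engine != "InnoDB"
        and not any(table.endswith(name) for name in KEEP_MYISAM)
    )
```

With a live connection this would be cursor.execute(ENGINE_QUERY) followed by tables_to_convert(cursor.fetchall()); keeping the filtering in a plain function lets the exclusion list be checked without touching the database.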
20:33 < mmcgrath> ricky: and, in theory, we'll be able to get rid of that when we have a fedora search engine.
20:33 < ricky> Hopefully.
20:34 < ricky> Anyway, we'll probably have a mysql outage some time in the future once we get a successful test in staging.
20:34 < Jeff_S> mmcgrath: one of our past employees wrote this, I think it explains the reasons for using InnoDB pretty well http://tag1consulting.com/MySQL_Engines_MyISAM_vs_InnoDB
20:34 < mmcgrath> ricky: yeah, how have the other conversions gone?
20:34 < ricky> what might be the case now is that maybe our configs aren't tuned for large innodb tables.
20:34 < smooge> ok what books/sites should I read to catch up how to help this. (DB's are not my specialty :/)
20:35 < ricky> mmcgrath: All of the other tables in the smolt db other than host_links have finished in <20 minutes
20:35 < ricky> Apart from the smolt db, most of the mediawiki db is already innodb
20:36 < ricky> The other databases that need conversions are: cacti, prelude._format, prewikka, and transifex (which isn't used anymore anyway)
20:36 < mmcgrath> ricky: I believe I went through and did some innodb conversions back in the day on some of those.
20:36 < ricky> prelude and prewikka are pretty much dispensable since that stuff is still being tested (lmacken even purged and recreated some of those dbs recently)
20:37 < mmcgrath> ricky: how big were those dumps?
20:37 < ricky> So smolt is basically the big hurdle - although I have some questions about the smolt upgrade and the db changes there
20:37 < ricky> The dump of the smolt database is 2.5G
20:37  * lmacken looks at the time, and rolls in late
20:37 < mmcgrath> ricky:
20:38 < mmcgrath> alter table host modify column cpu_model varchar(80);
20:38 < mmcgrath> alter table host add column cpu_stepping int(11) DEFAULT NULL;
20:38 < mmcgrath> alter table host add column cpu_family int(11) DEFAULT NULL;
20:38 < mmcgrath> alter table host add column cpu_model_num int(11) DEFAULT NULL;
20:38 < mmcgrath> that's the smolt upgrade.
20:38 < ricky> mmcgrath: Oh, OK - that's no problem at all then.
20:38 < ricky> The host table took <20 minutes, so we can do that before or after, and it's fine
20:38  * mmcgrath doesn't really even know what "int(11)" means
20:38 < lmacken> have you guys been using SQLAlchemy-migrate for that stuff? or doing it by hand?
20:38 < mmcgrath> I need to look that up :)
20:39 < mmcgrath> lmacken: honestly I can't stand alchemy-migrate so I've been doing it by hand.
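Doing it by hand can still be scripted. A minimal sketch that applies the four smolt statements quoted above, in order, through any DB-API cursor and logs each step; the runner and its names are hypothetical, not part of smolt or SQLAlchemy-migrate. On the int(11) question: in MySQL the (11) is only a display width, so INT(11) is still a 4-byte signed integer, and 11 characters is just wide enough to show -2147483648. MySQL also commits DDL implicitly, so if one statement fails, the earlier ones stay applied.

```python
# The four statements quoted above for the smolt upgrade.
SMOLT_UPGRADE = [
    "alter table host modify column cpu_model varchar(80)",
    "alter table host add column cpu_stepping int(11) DEFAULT NULL",
    "alter table host add column cpu_family int(11) DEFAULT NULL",
    "alter table host add column cpu_model_num int(11) DEFAULT NULL",
]


def apply_migration(cursor, statements, log=print):
    """Apply schema changes one at a time with any DB-API cursor.
    Each ALTER is committed implicitly by MySQL, so the log line
    printed before each statement shows how far a failed run got."""
    for i, stmt in enumerate(statements, 1):
        log("[%d/%d] %s" % (i, len(statements), stmt))
        cursor.execute(stmt)
```

Because there is no rollback for DDL, rerunning after a partial failure means starting from the statement that failed rather than from the top.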
20:39 < lmacken> mmcgrath: heh.  I've never used it before
20:39 < mmcgrath> :)
20:39 < mmcgrath> ricky: ok, so anything else on the db front?
20:40 < ricky> Nope, but if anybody knows a lot about MySQL, let us know about your experiences with stuff like this
20:40 < ricky> Jeff_S: Thanks again for the links!
20:40 < mmcgrath> k
20:40 < Jeff_S> ricky: np.  I'm glad to have our current DBA lend a hand if needed
20:40 < mmcgrath> #topic Infrastructure -- Posse
20:40 -!- fedbot changed the topic of #fedora-meeting to: Infrastructure -- Posse
20:41 < mmcgrath> So I haven't been as transparent with this as I should be
20:41 < mmcgrath> It's basically this
20:41 < mmcgrath> #link http://teachingopensource.org/index.php/POSSE_2009
20:41 < mmcgrath> we're providing some guests for a week for them to use.
20:41 < mmcgrath> +1 to open source :)
20:41 < ricky> Is it going to be on fasClient?  :-)
20:41 < mmcgrath> ricky: nope, they're completely disconnected atm.
20:42 < mmcgrath> this is their first time through this.
20:42 < ricky> Ah, OK
20:42 < mmcgrath> maybe next year.
20:42 < mmcgrath> but all of these guests are on cnode1
20:42 < mmcgrath> part of the cloud stuff.
20:42 < smooge> what servers are their guests on
20:42 < ricky> Hehe
20:42 < smooge> ah
20:42 < mmcgrath> I ended up not using osuosl1
20:42 < mmcgrath> since it's RHEL5 and for some reason xen+fedora 11 seems to be my white whale.
20:42 < mmcgrath> but cnode1 was F10, and using KVM worked just fine
20:43 < mmcgrath> Anyone have any other questions on that?
20:43 < mmcgrath> Ok
20:43 < mmcgrath> #topic Infrastructure -- Open Floor
20:43 -!- fedbot changed the topic of #fedora-meeting to: Infrastructure -- Open Floor
20:43 < mmcgrath> anyone have anything they'd like to discuss?
20:44 < lmacken> I'm going to be deploying a new version of bodhi tonight/tomorrow to support EPEL :)
20:44 < lmacken> hopefully we'll be able to start queueing updates up tonight
20:44 < smooge> yeah
20:44 < lmacken> and ideally mashing repos tomorrow
20:45 < mmcgrath> lmacken: sounds good
20:45 < mmcgrath> and on a related note, I need to rebuild relepel1
20:45  * mmcgrath fail built it
20:45 < mmcgrath> anyone have anything else?
20:45 < mmcgrath> smooge: ?
20:46 < smooge> sorry
20:46 < smooge> keyboard problems
20:46 < smooge> I am checking to see what boxes need updates and I am working on seeing what ones I can do
20:46 < smooge> I should have that done by tonight/tomorrow.
20:47 < smooge> After that I am checking to see that func and puppet are working on the boxes
20:47 < smooge> and then finding out all the secret handshakes and such
20:47 < mmcgrath> heheh
20:47 < mmcgrath> fun times
20:47 < smooge> I should have the func done by friday and then it will be time to work on zabbix
20:48 < mmcgrath> smooge: excellent.
20:48 < mmcgrath> Ok, and with that if no one has anything else we'll close in 30
20:48 < smooge> zabbix will be next week's project
20:48 < smooge> done
20:49 < mmcgrath> ok everyone, thanks for coming!
20:49 < mmcgrath> #endmeeting
20:49 -!- fedbot changed the topic of #fedora-meeting to: Channel is used by various Fedora groups and committees for their regular meetings | Note that meetings often get logged | For questions about using Fedora please ask in #fedora | See http://fedoraproject.org/wiki/Meeting_channel for meeting schedule
20:49 < fedbot> Meeting ended Thu Jul  2 20:49:12 2009 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot .
20:49 < fedbot> Minutes: http://www.scrye.com/~kevin/fedora/fedora-meeting/2009/fedora-meeting.2009-07-02-20.00.html
20:49 < fedbot> Log:     http://www.scrye.com/~kevin/fedora/fedora-meeting/2009/fedora-meeting.2009-07-02-20.00.log.html