16:30:03 <djmitche> #startmeeting weekly
16:30:03 <bb-supy> Meeting started Tue Sep 20 16:30:03 2016 UTC and is due to finish in 60 minutes.  The chair is djmitche. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:30:03 <bb-supy> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:30:03 <bb-supy> The meeting name has been set to 'weekly'
16:30:08 <djmitche> #topic Introduction
16:30:15 <djmitche> Agenda: https://titanpad.com/buildbot-agenda
16:30:35 <djmitche> https://titanpad.com/ep/pad/view/buildbot-agenda/DD7hIIU56p
16:30:47 <djmitche> I see InitHello and gracinet around
16:30:56 <InitHello> I'm alive
16:31:01 <gracinet> I am too
16:31:07 <InitHello> at work, so my latency might be higher
16:31:09 <djmitche> it's good to be alive :)
16:31:12 <djmitche> no worries
16:31:22 <gracinet> We were born to that
16:32:13 <djmitche> #topic Week In Review
16:32:57 <djmitche> #info Pierre made a bunch of improvements to the nine buildbot - https://github.com/buildbot/metabbotcfg/pull/65
16:33:26 <gracinet> for a while it was testing buildbot PRs, right ?
16:33:36 <skelly> still is I think
16:33:43 <skelly> but now it's not spamming irc
16:33:45 <djmitche> I don't think that's changed, right
16:33:46 <djmitche> right
16:34:15 <djmitche> #info Pierre made a bunch of changes to buildbot-infra to make IP management more automatic, including generating DNS
16:34:22 <djmitche> as part of the vagrant work
16:34:52 <djmitche> #info https://syslog.buildbot.net/ now using ELK to show infra syslogs
16:34:56 <djmitche> authenticated by membership in the BB org
16:35:37 <skelly> I think that leaves just me as part of infra but not part of the org
16:35:57 <djmitche> oh, we can fix that :)
16:36:04 <djmitche> what else has landed this week?
16:36:44 <skelly> almost service3
16:37:06 <djmitche> haha, we'll get to that in a bit
16:37:10 <djmitche> you're in the org now
16:37:26 <djmitche> #info BuildbotNetData is landed and rc3 release made
16:38:07 <djmitche> #info Chris Laws built a prometheus plugin for 0.9.0
16:38:09 <djmitche> https://github.com/claws/buildbot-prometheus
16:38:26 <djmitche> that's all I know about -- have I missed anything?
16:39:03 <bdbaddog> will the BuildbotNetData info show in the syslog…. ?
16:39:08 <verm__> hmm
16:39:20 <djmitche> bdbaddog: it has a separate ELK instance
16:39:47 <verm__> is it protected to only buildbot team members?
16:39:57 <djmitche> bdbaddog: https://events.buildbot.net/
16:39:58 <djmitche> yes
16:40:28 <djmitche> #info BuildbotNetData is also available via ELK - https://events.buildbot.net/
16:40:51 <verm__> it doesn't appear to know about the hosts or jails
16:41:07 <djmitche> syslog doesn't?
16:41:18 <verm__> yeah it's just IPs
16:41:31 <verm__> it's not splitting into per-host just per IP
16:41:49 <verm__> also for events.buildbot.net i get: 403 Permission Denied
16:41:50 <verm__> Invalid Account
16:42:01 <djmitche> events may be limited to just botherders
16:42:05 <djmitche> pierre would know for sure
16:42:17 <verm__> i thought i was part of that group?
16:42:20 <djmitche> regarding syslog, though -- yes, good point
16:42:30 <bdbaddog> ditto on events.. get 403.
16:42:31 <djmitche> oh, yeah, hm
16:43:10 <djmitche> it's limited to the "committers" team
16:43:14 <djmitche> let me see if you're in that :)
16:44:20 <djmitche> bdbaddog: added to committers
16:44:50 <djmitche> syslog is limited to "core"
16:44:55 <djmitche> maybe we should change those to purpose-specific groups
16:45:00 <bdbaddog> djmitche: still 403, reload gets 500.
16:45:16 <verm__> hmm i should be unless that's changed?
16:45:25 <skelly> sounds like an action item to refine the org groups
16:45:27 <djmitche> verm__: you are in "committers"
16:45:28 <djmitche> yeah
16:45:43 <djmitche> #action djmitche to refine access to both ELK stacks (events and syslog)
16:45:48 <djmitche> thanks :)
16:45:50 <djmitche> ok
16:46:15 <djmitche> #action tardyp to check that all hosts/jails are reporting to syslog, and that reporting is by hostname, not just IP
16:46:27 <verm__> it is definitely reporting by hostname
16:46:33 <verm__> i mean the syslog daemons are
16:46:42 <verm__> because that is how syslogng was breaking it into separate files
16:46:43 <tardyp> oups. I forgot the meeting..
16:46:46 <djmitche> right, I used the wrong word
16:46:47 * djmitche waves
16:46:55 <tardyp> hi hi
16:46:58 <djmitche> hi!
16:47:02 <gracinet> hi
16:47:21 <verm__> tardyp: i fixed service3 this morning so ssh is working again
16:47:24 <tardyp> reporting by hostname I think will require reverse dns to work for the internal network
16:47:29 <djmitche> so we have some access issues around events and syslog -- we can fix those up after
16:47:31 <djmitche> ah
16:47:43 <verm__> tardyp: really? so it just drops it from the syslog packet?
16:47:59 <tardyp> I am not sure actually
16:48:03 <verm__> that is strange, because reverse dns won't work in all cases since there are services that can report directly to syslog
16:48:04 <tardyp> but I guessed that
16:48:08 <verm__> which will come from the same IP
16:48:24 <djmitche> the hostname is included in the syslog info.. just a matter of getting it out and into the right slot in kibana :)
16:48:42 <tardyp> maybe this is a bad configuration, and the host tag is only the one that syslog server sees
16:48:47 <tardyp> I'll dig this
16:49:04 <djmitche> ok
16:49:16 <verm__> cool
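The point that the hostname already travels inside the syslog payload can be illustrated with a small sketch. It assumes RFC 3164-style lines and is not the project's Logstash configuration; `SYSLOG_RE`, `extract_host`, and the sample host and IP values are hypothetical names chosen for illustration.

```python
import re

# Assumed RFC 3164-style layout: "<PRI>Mmm dd hh:mm:ss HOSTNAME message"
SYSLOG_RE = re.compile(
    r"^<(?P<pri>\d{1,3})>"
    r"(?P<timestamp>[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}) "
    r"(?P<hostname>\S+) "
    r"(?P<message>.*)$"
)


def extract_host(line, peer_ip):
    """Return the hostname carried in the syslog line itself.

    Falls back to the sender's IP (what the receiving server sees)
    when the line does not match the assumed layout.
    """
    match = SYSLOG_RE.match(line)
    if match:
        return match.group("hostname")
    return peer_ip


if __name__ == "__main__":
    sample = "<34>Sep 20 16:48:24 service3 sshd[123]: Accepted publickey"
    print(extract_host(sample, "192.0.2.10"))
```

Indexing on the parsed hostname rather than the peer address matters when several jails or services report from the same IP, which is the case verm__ raises.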
16:49:24 <djmitche> #topic Releases
16:49:31 <djmitche> #info 0.9.0rc3 released last Wednesday
16:49:39 <verm__> tardyp: if you could look at events.buildbot.net too that would be great :)
16:49:41 <verm__> login doesn't work
16:49:42 <djmitche> any news about that?
16:50:06 <djmitche> (I get a 403 from events too, btw)
16:50:11 <tardyp> events.buildbot.net works
16:50:22 <tardyp> I verified that there is just a config needed
16:50:31 <tardyp> the right github team needs to be setup
16:51:01 <tardyp> I have a pending syslog patch that will contain that
16:51:07 <djmitche> cool
16:51:16 <djmitche> you've been working your tail off, tardyp
16:51:37 <djmitche> (sorry if that's too culturally specific a reference.. you've been doing a lot of work :)
16:51:48 <verm__> agreed you rock
16:52:02 <bdbaddog> +1 to Tardyp
16:52:10 <djmitche> #agreed tardyp is awesome
16:52:35 <gracinet> :-)
16:52:45 <djmitche> #topic host upgrades and service3 downtime
16:52:53 <tardyp> :)
16:53:14 <djmitche> verm__ / skelly -- can you give a quick rundown of what happened here?  I didn't totally follow as it was happening
16:53:34 <skelly> I broke it, verm__ then had to shave a herd of yaks
16:53:43 <verm__> i don't know the details but it was painful
16:53:50 <verm__> haha yeah yaks sounds about right
16:53:53 <djmitche> lol
16:54:09 <skelly> based on what koobs said, I should have targeted 10.2-RELEASE instead of 10.1
16:54:17 <skelly> or just jumped to 10.3
16:54:24 <djmitche> ok
16:54:28 <verm__> basically i was out of town this weekend, spent the better part of monday tracking down several (separate) issues and finally got into the machine
16:54:31 <djmitche> so part of the issue was downgrading instead of upgrading
16:54:37 <skelly> yeah
16:54:39 <verm__> freebsd-update broke, it updated the kernel and 3 modules
16:54:46 <verm__> and it also only did a few libs in /lib/
16:54:49 <skelly> the hosts are old but I think 10.1 was a bit too old
16:54:57 <verm__> i built a custom libthr, libc and got it via 'fetch' so sshd could be started
16:55:02 <verm__> the rest is skelly's problem :)
16:55:16 <skelly> whereas an upgrade would have changed everything
16:55:25 <skelly> what kernel is it running? your custom or GENERIC?
16:55:33 <verm__> no idea i never checked
16:55:50 <verm__> generic, it looks like, a whole lot of modules are loaded
16:55:50 <skelly> GENERIC
16:56:06 <verm__> i tried to keep as many of the changes it made in place
16:56:37 <verm__> for the next upgrade let's schedule a time where we can be here together so it can be fixed right away
16:56:40 <skelly> it's back at -STABLE, so I think essentially you undid the damage and got it back to where it was when I started
16:57:02 <verm__> skelly: sort of there are still a lot of partial upgrades all over the system
16:57:10 <verm__> it 'works' but only by pure coincidence.
16:57:24 <skelly> sounds like good to get the upgrade done soon then
16:57:26 <verm__> also the jails weren't updated so they're safe and jails are mostly kernel
16:58:04 <djmitche> so the idea is to get us onto a more "mainline" upgrade process (via freebsd-update), but that failed here and we're largely back to the src-based approach we've been on for a while?
16:58:15 <skelly> we can try again with freebsd-update
16:58:27 <skelly> I am pretty sure it needs to be told to do an upgrade
16:58:38 <skelly> a full upgrade
16:58:45 <skelly> rather than scan the system and do a minimal upgrade
16:58:56 <djmitche> gotcha
16:58:57 <skelly> verm__: when would be a good time to try again?
16:59:04 <verm__> what times are good for you?
16:59:09 <djmitche> haha
16:59:13 <skelly> evenings and weekends
16:59:14 <verm__> i am in the EST timezone
16:59:22 <skelly> I can try during the day if it's largely handsoff
16:59:35 <djmitche> #info the idea is to get us onto a more "mainline" upgrade process (via freebsd-update), but that failed here and we're largely back to the src-based approach we've been on for a while
16:59:36 <skelly> which depends on how much in /etc needs merging
16:59:53 <verm__> what timezone are you in?
16:59:56 <skelly> CDT
16:59:59 <djmitche> #agreed Amar and Sean will schedule a time when they can both be around to try a full freebsd-update run
17:00:22 <verm__> skelly: ok so that works in my favour.. i don't know pick a time?
17:00:36 <skelly> tonight at 8:30 it is then!
17:00:37 <verm__> i'm usually out for two hours from 6pm-8pm every day
17:00:43 <verm__> EST?
17:00:46 <verm__> or CDT?
17:00:47 <skelly> (my time, so 9:30 yours)
17:00:52 <verm__> ok that works for me
17:00:56 <verm__> i've put it in my calendar
17:01:00 <djmitche> awesome :)
17:01:10 <skelly> that's a good range to be out as I am out for a superset of that every day
17:01:19 <djmitche> tardyp: do you want to talk a bit about the vagrant work?
17:01:24 <verm__> heh cool
17:01:58 <tardyp> Basically, vagrant is re-creating the prod in VMs
17:02:07 <tardyp> so we have 3 VMs that start freebsd
17:02:16 <djmitche> #topic Vagrant setup for Buildbot Infra
17:02:26 <tardyp> and then runs the ansible setup for those 3 hosts
17:02:36 <tardyp> which means creating the BSD jails for each of the services
17:02:39 <djmitche> including creating all of the jails?
17:02:58 <tardyp> those jails initially only contain sshd in vagrant mode
17:03:17 <tardyp> so that after you can run ansible against each of those jail as you wish
17:03:30 <tardyp> the whole setup would take too much time
17:04:02 <djmitche> gotcha
17:04:04 <tardyp> then there is nothing very complicated. It took me a huge amount of time to fight against the proxies
17:04:21 <verm__> proxies?
17:04:27 <djmitche> I kinda want to get you a mifi hotspot for work :)
17:04:37 <djmitche> pierre has mandatory outgoing http proxies at work :(
17:04:39 <tardyp> my corporate proxies
17:04:40 <skelly> ssh tunnel
17:04:44 <gracinet> -D
17:04:49 <tardyp> those are http
17:05:09 <verm__> oh, that sucks :(
17:05:18 <verm__> you want me to setup openvpn for you?
17:05:27 <tardyp> now that is figured out, and does not get too much in my way anymore
17:05:40 <verm__> sorry, wish i had known
17:06:09 <verm__> i've been out of the loop on this what is vagrant going to be used for?
17:06:10 <djmitche> #info buildbot-infra is set up to create development environments, including hosts and jails, using Vagrant
17:06:20 <verm__> nevermind
17:06:22 <djmitche> it's for development of ansible patches
17:06:23 <djmitche> :)
17:06:29 <djmitche> ask and ye shall receive an answer :)
17:06:33 <verm__> hehe
17:06:38 <verm__> very cool and great idea
17:06:45 <verm__> is it worth it to discuss the warranty issue?
17:06:46 <djmitche> gracinet: do you have an update for TLS/Endpoints?
17:06:53 <djmitche> verm__: yep, that's up next after TLS
17:06:55 <verm__> ok
17:06:58 <tardyp> verm__: so you are all setup to create trac roles in ansible!
17:07:11 <djmitche> hint hint :)
17:07:42 <verm__> tardyp: me? hah someone else volunteered to do that years ago :)  i'm against putting trac under ansible or any management system, it is a nightmare to keep stable, but someone is free to give it a shot, i will help any way i can
17:07:45 <gracinet> djmitche: yes, since last week, I got the integration tests to work on Twisted >= 14, fixed the various lints that required it. Next are the windows failures
17:08:20 <tardyp> cool!
17:08:35 <gracinet> last week I was speaking of reaching out to Twisted to upstream the generic parts, but that's not done yet
17:08:40 <djmitche> #topic TLS/Endpoints
17:08:53 <tardyp> verm__: at least the mysql would be interesting to control
17:09:00 <tardyp> and also the backup
17:09:02 <gracinet> I also need to tidy up docstrings etc
17:09:06 <djmitche> #info progress since last week: gracinet got the integration tests to work on Twisted >= 14, fixed the various lints that required it. Next are the windows failures
17:09:32 <djmitche> oh tardyp I totally forgot to highlight that you rebuilt syslog from the ground with ansible -- *very* cool
17:09:38 <verm__> tardyp: agreed that would be very handy to not worry about that
17:09:47 <djmitche> gracinet: so sounds like we're close to ready to land in buildbot
17:09:52 <djmitche> I assume the twisted upstreaming could happen after landing
17:09:54 <gracinet> by the way, if that rings a bell, the windows failure says that privateKey is passed more than once, that looks like what happens with kwargs used positionally, I did not really look closer than that
17:10:05 <gracinet> djmitche: yes, definitely about the upstreaming
17:10:11 <djmitche> cool
17:10:24 <djmitche> yes, positional vs keyword args :( :(
17:10:40 <djmitche> python-3 has some extra syntax to help with that, but of course we can't use it :(
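The kind of failure gracinet describes, and the Python 3 syntax djmitche alludes to, can be sketched as follows. This is an illustration only: `make_options`, `make_options_kwonly`, and their parameters are hypothetical and are not the Buildbot or Twisted API involved in the Windows failure.

```python
def make_options(hostname, privateKey=None, certificate=None):
    """Hypothetical helper whose optional parameters can be filled positionally."""
    return {"hostname": hostname, "privateKey": privateKey,
            "certificate": certificate}


def duplicate_argument_error():
    """Trigger and return the 'passed more than once' style error message."""
    try:
        # "key.pem" fills privateKey positionally, then the keyword
        # supplies it a second time, so Python raises TypeError.
        make_options("worker1", "key.pem", privateKey="other.pem")
    except TypeError as exc:
        return str(exc)
    return None


def make_options_kwonly(hostname, *, privateKey=None, certificate=None):
    """Python 3 only: the bare '*' makes the following parameters keyword-only."""
    return {"hostname": hostname, "privateKey": privateKey,
            "certificate": certificate}


if __name__ == "__main__":
    print(duplicate_argument_error())
    print(make_options_kwonly("worker1", privateKey="key.pem"))
```

With the keyword-only form, a stray positional argument fails immediately at the call site instead of silently landing in the wrong slot, which is presumably the help djmitche has in mind.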
17:10:48 <gracinet> the thing is, I can do guesswork with windows, but I don't have any windows system near me to actually reproduce
17:10:57 <tomprince> gracinet: What is it that you are looking at upstreaming?
17:10:59 <djmitche> #info Will try to upstream the generic parts to twisted after landing
17:11:04 <gracinet> hey tomprince
17:11:19 <djmitche> gracinet: i can help look at the traceback after the meeting if you'd like
17:11:25 <gracinet> it's about the PB factories in the worker
17:11:29 <gracinet> djmitche: that'd be great, thanks
17:12:17 <gracinet> to summarize, most of the features here are generic (applicative keepalives, an additional timeout compared to what Twisted's ClientService provides and autologin)
17:13:15 <gracinet> besides I suspect that integration of PB clients with endpoints might move a bit in Twisted (e.g, they are still on ClientFactory, that's not 100% new, endpoint style)
17:14:47 <verm__> tardyp: FYI there is a logstash.conf.sample which has the bits we need to capture the hostname
17:14:59 <gracinet> on top of that,  the Worker service adds graceful shutdowns, and that's definitely buildbot-specific
17:15:05 <verm__> it hasn't been copied to the running conf
17:16:24 <djmitche> tomprince: any advice on trying to upstream that?
17:17:03 <tomprince> It isn't entirely clear that there is anything that needs upstreaming there.
17:17:09 <tardyp> verm__: I saw that, but this is for syslog files, and here we are using syslog tcp server
17:18:01 <djmitche> I remember that confusing me about logstash too
17:18:22 <djmitche> tomprince: haha, ok, so I read that as "gracinet may need to make a strong case that these should be upstreamed"
17:18:28 <gracinet> tomprince: it's not a need, I mean, we'll be able to maintain it, sure, but on the other hand why not ? It's generic
17:18:37 <djmitche> yeah, worth a conversation anyway
17:18:47 <verm__> tardyp: you need to add a forwarder under "network" and list the IP
17:18:48 <djmitche> ok, let's talk about warranties
17:19:00 <djmitche> #topic Hardware Warranties
17:19:02 <verm__> on the same level as "files" i forget but it's possible
17:19:25 <djmitche> #info Warranties are expiring on our iX hardware; renewal for two years is about $900
17:19:29 <tomprince> djmitche: More like, looking at the code, it isn't clear to me what is being provided on top of ClientService.
17:19:40 <tardyp> verm__: gtg. I'll look at that tomorrow
17:19:47 <verm__> tardyp: no no problem cya
17:19:50 <gracinet> I suppose many other PB applications would use an autologin with auto-relogin (if login depends on the client, not on a user of client)
17:20:11 <verm__> djmitche: we should have money leftover even if we use the money put aside for the mac mini
17:20:24 <verm__> having a warranty for another two years would be really good
17:20:40 <skelly> agreed
17:20:40 <djmitche> i agree
17:20:48 <verm__> RTEMS had a machine fail
17:20:49 <skelly> would we want to replace after that?
17:20:50 <djmitche> that's the maximum extension, right? (3yr to 5yr)
17:20:56 <gracinet> tomprince: discuss later ?
17:20:57 <verm__> it was sent out and back from california, took 2 days
17:21:00 <tomprince> gracinet: Sure.
17:21:09 <djmitche> thanks tomprince gracinet
17:21:23 <verm__> djmitche: no idea good question i can ask but i think they'll let you extend more if your machines are OK
17:21:41 <djmitche> ok
17:21:43 <verm__> skelly: if we can afford it yes, at least our main machines
17:21:53 <verm__> then the current ones become developer boxes or whatever we need them for.. or sell them
17:21:58 <bb-github> [buildbot-infra] tardyp opened pull request #157: Syslogelk (master...syslogelk) https://git.io/viSP6
17:22:07 <bdbaddog> in my experience after a certain point they start to jack up the prices on warranties..
17:22:22 <djmitche> like "you're paying rent for these ancient spare parts on our shelf" :)
17:22:26 <verm__> iXsystems is really good to opensource projects
17:22:48 <verm__> anyway if they don't offer it or it is too expensive we can think about it then
17:22:50 <djmitche> it sounds like there's general agreement that not running un-covered hardware is good
17:22:58 <verm__> we'll have peace of mind for the next two years at least hehe
17:23:11 <tomprince> Does it make sense for us to have physical machines, rather than getting sponsored cloud hosting?
17:23:18 <verm__> djmitche: yes absolutely, not for critical machines
17:23:21 <djmitche> so I think the practical question is, do we have the funding (I agree we probably do, I just want to check)
17:23:53 <skelly> have we asked about sponsored cloud hosting?
17:23:56 <verm__> tomprince: more flexibility and security
17:24:02 <djmitche> tomprince: as we move off this hardware, I think that's definitely worth considering, but building cloud infra requires a different approach to reliability
17:24:14 <verm__> yes and a lot more expensive to maintain that redundancy
17:24:22 <djmitche> depending on how you do it, yeah
17:24:24 <tomprince> verm__: Do we need the flexibility?
17:24:38 <verm__> it's been nice to setup and do what we want on a whim
17:24:41 <djmitche> we've definitely massively underutilized our 3 years of hardware
17:24:50 <verm__> i think that is changing now
17:24:55 <djmitche> yeah
17:25:18 <verm__> and i also believe the only reason it's changing is because we can, with very little effort since we only answer to ourselves
17:25:23 <verm__> i guess we'll talk about it in 2 years?
17:25:24 <tomprince> verm__: You can get that with cloud VMs too.
17:25:43 <djmitche> it sounds like it's worth considering not paying the warranty renewal and planning a cloud migration
17:25:51 <skelly> action item for 18 months from now
17:25:54 <djmitche> at a high level though
17:26:09 <djmitche> 1) someone would need to manage that migration
17:26:09 <verm__> i think there's enough going on right now that if we can pay the warranty to keep what we have let's do it
17:26:12 <gracinet> what's the order of magnitude we're talking about ?
17:26:17 <djmitche> 2) cloud services will require some kind of ongoing income
17:26:22 <gracinet> (of cost)
17:26:31 <djmitche> gracinet: the warranty is about $900
17:26:42 <verm__> <skelly> action item for 18 months from now <-yes please
17:26:45 <gracinet> and the hosting has no recurring fees ?
17:26:52 <verm__> other than the service3 flub everything has been extremely stable
17:26:55 <djmitche> it would -- what those would be is hard to say
17:26:55 <djmitche> haha
17:27:06 <verm__> gracinet: no it's free
17:27:14 <skelly> I'm too used to console access for when I break things :(
17:27:15 <verm__> www.osuosl.org
17:27:22 <djmitche> oh, sorry, I thought you meant the cloud-hosting alternative.. yes - osuosl is free
17:27:22 <verm__> skelly: i will fix that, we have console access
17:27:30 <verm__> but an unrelated problem has happened
17:27:32 <tomprince> djmitche: Or ongoing sponsorship, but cloud sponsorship is probably easier than $$.
17:27:33 <skelly> right
17:27:38 <djmitche> tomprince: right
17:27:47 <skelly> hyper gave us some credits
17:27:49 <verm__> i will bump it up on my priority list; after i move i will get everyone in sysadmin access to the private vlan that has the IPMI hosts
17:27:52 <djmitche> tomprince: so are you OK tabling that until 2018 or so and paying the warranty for now?
17:27:57 <tomprince> I know twisted has something like $1000/month from rackspace.
17:28:09 <tomprince> Yeah, I'm fine with that.
17:28:15 <verm__> yes let's do that please
17:28:47 <verm__> skelly: i have a hack that lets you forward IPMI over SSH
17:28:53 <verm__> i will send an email to bsys about it
17:29:00 <skelly> okay
17:29:00 <verm__> it's annoying and frail but it works
17:29:15 <skelly> can't be worse than bringup on beta hardware that I've done
17:30:01 <djmitche> #agreed but for now, pursue the renewal
17:30:03 <verm__> heh
17:30:14 <djmitche> yeah, I do not miss IPMI
17:30:32 <verm__> oh, IPMI is really stable i've never had problems with it it's just the SSH tunneling that sucks
17:30:49 <djmitche> u funny man :)
17:31:07 <verm__> more likely my brain has learned to ignore all the annoying bits but it gets the job done
17:31:59 <djmitche> haha, yeah
17:32:03 <djmitche> so it seems like the remaining two bits are for botherders:
17:32:05 <djmitche> 2. vote on whether to spend
17:32:07 <djmitche> 1. determine if we have the funds
17:32:11 <verm__> yep
17:32:19 <djmitche> ugh, irc & reordering
17:32:31 <djmitche> any other business or should we wrap up?
17:32:45 <verm__> wrap up i think, great meeting and thank you for chairing it!! awesome job as usual
17:33:43 <djmitche> #endmeeting