16:30:03 <djmitche> #startmeeting weekly
16:30:03 <bb-supy> Meeting started Tue Sep 20 16:30:03 2016 UTC and is due to finish in 60 minutes. The chair is djmitche. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:30:03 <bb-supy> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:30:03 <bb-supy> The meeting name has been set to 'weekly'
16:30:08 <djmitche> #topic Introduction
16:30:15 <djmitche> Agenda: https://titanpad.com/buildbot-agenda
16:30:35 <djmitche> https://titanpad.com/ep/pad/view/buildbot-agenda/DD7hIIU56p
16:30:47 <djmitche> I see InitHello and gracinet around
16:30:56 <InitHello> I'm alive
16:31:01 <gracinet> I am too
16:31:07 <InitHello> at work, so my latency might be higher
16:31:09 <djmitche> it's good to be alive :)
16:31:12 <djmitche> no worries
16:31:22 <gracinet> We were born to that
16:32:13 <djmitche> #topic Week In Review
16:32:57 <djmitche> #info Pierre made a bunch of improvements to the nine buildbot - https://github.com/buildbot/metabbotcfg/pull/65
16:33:26 <gracinet> for a while it was testing buildbot PRs, right ?
16:33:36 <skelly> still is I think
16:33:43 <skelly> but now it's not spamming irc
16:33:45 <djmitche> I don't think that's changed, right
16:33:46 <djmitche> right
16:34:15 <djmitche> #info Pierre made a bunch of changes to buildbot-infra to make IP management more automatic, including generating DNS
16:34:22 <djmitche> as part of the vagrant work
16:34:52 <djmitche> #info https://syslog.buildbot.net/ now using ELK to show infra syslogs
16:34:56 <djmitche> authenticated by membership in the BB org
16:35:37 <skelly> I think that leaves just me as part of infra but not part of the org
16:35:57 <djmitche> oh, we can fix that :)
16:36:04 <djmitche> what else has landed this week?
16:36:44 <skelly> almost service3
16:37:06 <djmitche> haha, we'll get to that in a bit
16:37:10 <djmitche> you're in the org now
16:37:26 <djmitche> #info BuildbotNetData is landed and rc3 release made
16:38:07 <djmitche> #info Chris Laws built a prometheus plugin for 0.9.0
16:38:09 <djmitche> https://github.com/claws/buildbot-prometheus
16:38:26 <djmitche> that's all I know about -- have I missed anything?
16:39:03 <bdbaddog> will the BuildbotNetData info show in the syslog…. ?
16:39:08 <verm__> hmm
16:39:20 <djmitche> bdbaddog: it has a separate ELK instance
16:39:47 <verm__> is it protected to only buildbot team members?
16:39:57 <djmitche> bdbaddog: https://events.buildbot.net/
16:39:58 <djmitche> yes
16:40:28 <djmitche> #info BuildbotNetData is also available via ELK - https://events.buildbot.net/
16:40:51 <verm__> it doesn't appear to know about the hosts or jails
16:41:07 <djmitche> syslog doesn't?
16:41:18 <verm__> yeah it's just IPs
16:41:31 <verm__> it's not splitting into per-host just per IP
16:41:49 <verm__> also for events.buildbot.net i get: 403 Permission Denied
16:41:50 <verm__> Invalid Account
16:42:01 <djmitche> events may be limited to just botherders
16:42:05 <djmitche> pierre would know for sure
16:42:17 <verm__> i thought i was part of that group?
16:42:20 <djmitche> regarding syslog, though -- yes, good point
16:42:30 <bdbaddog> ditto on events.. get 403.
16:42:31 <djmitche> oh, yeah, hm
16:43:10 <djmitche> it's limited to the "committers" team
16:43:14 <djmitche> let me see if you're in that :)
16:44:20 <djmitche> bdbaddog: added to committers
16:44:50 <djmitche> syslog is limited to "core"
16:44:55 <djmitche> maybe we should change those to purpose-specific groups
16:45:00 <bdbaddog> djmitche: still 403, reload gets 500.
16:45:16 <verm__> hmm i should be unless that's changed?
16:45:25 <skelly> sounds like an action item to refine the org groups
16:45:27 <djmitche> verm__: you are in "committers"
16:45:28 <djmitche> yeah
16:45:43 <djmitche> #action djmitche to refine access to both ELK stacks (events and syslog)
16:45:48 <djmitche> thanks :)
16:45:50 <djmitche> ok
16:46:15 <djmitche> #action tardyp to check that all hosts/jails are reporting to syslog, and that reporting is by hostname, not just IP
16:46:27 <verm__> it is definitely reporting by hostname
16:46:33 <verm__> i mean the syslog daemons are
16:46:42 <verm__> because that is how syslogng was breaking it into separate files
16:46:43 <tardyp> oops. I forgot the meeting..
16:46:46 <djmitche> right, I used the wrong word
16:46:47 * djmitche waves
16:46:55 <tardyp> hi hi
16:46:58 <djmitche> hi!
16:47:02 <gracinet> hi
16:47:21 <verm__> tardyp: i fixed service3 this morning so ssh is working again
16:47:24 <tardyp> reporting by hostname I think will require reverse dns to work for the internal network
16:47:29 <djmitche> so we have some access issues around events and syslog -- we can fix those up after
16:47:31 <djmitche> ah
16:47:43 <verm__> tardyp: really? so it just drops it from the syslog packet?
16:47:59 <tardyp> I am not sure actually
16:48:03 <verm__> that is strange, because reverse dns won't work in all cases since there are services that can report directly to syslog
16:48:04 <tardyp> but I guessed that
16:48:08 <verm__> which will come from the same IP
16:48:24 <djmitche> the hostname is included in the syslog info.. just a matter of getting it out and into the right slot in kibana :)
16:48:42 <tardyp> maybe this is a bad configuration, and the host tag is only the one that syslog server sees
16:48:47 <tardyp> I'll dig into this
16:49:04 <djmitche> ok
16:49:16 <verm__> cool
16:49:24 <djmitche> #topic Releases
16:49:31 <djmitche> #info 0.9.0rc3 released last Wednesday
16:49:39 <verm__> tardyp: if you could look at events.buildbot.net too that would be great :)
16:49:41 <verm__> login doesn't work
16:49:42 <djmitche> any news about that?
16:50:06 <djmitche> (I get a 403 from events too, btw)
16:50:11 <tardyp> events.buildbot.net works
16:50:22 <tardyp> I verified that there is just a config needed
16:50:31 <tardyp> the right github team needs to be set up
16:51:01 <tardyp> I have a pending syslog patch that will contain that
16:51:07 <djmitche> cool
16:51:16 <djmitche> you've been working your tail off tardyp
16:51:37 <djmitche> (sorry if that's too culturally specific a reference.. you've been doing a lot of work :)
16:51:48 <verm__> agreed you rock
16:52:02 <bdbaddog> +1 to Tardyp
16:52:10 <djmitche> #agreed tardyp is awesome
16:52:35 <gracinet> :-)
16:52:45 <djmitche> #topic host upgrades and service3 downtime
16:52:53 <tardyp> :)
16:53:14 <djmitche> verm__ / skelly -- can you give a quick rundown of what happened here? I didn't totally follow as it was happening
16:53:34 <skelly> I broke it, verm__ then had to shave a herd of yaks
16:53:43 <verm__> i don't know the details but it was painful
16:53:50 <verm__> haha yeah yaks sounds about right
16:53:53 <djmitche> lol
16:54:09 <skelly> based on what koobs said, I should have targeted 10.2-RELEASE instead of 10.1
16:54:17 <skelly> or just jumped to 10.3
16:54:24 <djmitche> ok
16:54:28 <verm__> basically i was out of town this weekend, spent the better part of monday tracking down several (separate) issues and finally got into the machine
16:54:31 <djmitche> so part of the issue was downgrading instead of upgrading
16:54:37 <skelly> yeah
16:54:39 <verm__> freebsd-update broke, it updated the kernel and 3 modules
16:54:46 <verm__> and it also only did a few libs in /lib/
16:54:49 <skelly> the hosts are old but I think 10.1 was a bit too old
16:54:57 <verm__> i built a custom libthr, libc and got it via 'fetch' so sshd could be started
16:55:02 <verm__> the rest is skelly's problem :)
16:55:16 <skelly> whereas an upgrade would have changed everything
16:55:25 <skelly> what kernel is it running? your custom or GENERIC?
16:55:33 <verm__> no idea i never checked
16:55:50 <verm__> generic it looks like, a whole lot of modules are loaded
16:55:50 <skelly> GENERIC
16:56:06 <verm__> i tried to keep as many of the changes it made in place
16:56:37 <verm__> for the next upgrade let's schedule a time where we can be here together so it can be fixed right away
16:56:40 <skelly> it's back at -STABLE, so I think essentially you undid the damage and got it back to where it was when I started
16:57:02 <verm__> skelly: sort of there are still a lot of partial upgrades all over the system
16:57:10 <verm__> it 'works' but only by pure coincidence.
16:57:24 <skelly> sounds like it'd be good to get the upgrade done soon then
16:57:26 <verm__> also the jails weren't updated so they're safe and jails are mostly kernel
16:58:04 <djmitche> so the idea is to get us onto a more "mainline" upgrade process (via freebsd-update), but that failed here and we're largely back to the src-based approach we've been on for a while?
16:58:15 <skelly> we can try again with freebsd-update
16:58:27 <skelly> I am pretty sure it needs to be told to do an upgrade
16:58:38 <skelly> a full upgrade
16:58:45 <skelly> rather than scan the system and do a minimal upgrade
16:58:56 <djmitche> gotcha
16:58:57 <skelly> verm__: when would be a good time to try again?
16:59:04 <verm__> what times are good for you?
16:59:09 <djmitche> haha
16:59:13 <skelly> evenings and weekends
16:59:14 <verm__> i am in the EST timezone
16:59:22 <skelly> I can try during the day if it's largely hands-off
16:59:35 <djmitche> #info the idea is to get us onto a more "mainline" upgrade process (via freebsd-update), but that failed here and we're largely back to the src-based approach we've been on for a while
16:59:36 <skelly> which depends on how much in /etc needs merging
16:59:53 <verm__> what timezone are you in?
16:59:56 <skelly> CDT
16:59:59 <djmitche> #agreed Amar and Sean will schedule a time when they can both be around to try a full freebsd-update run
17:00:22 <verm__> skelly: ok so that works in my favour.. i don't know pick a time?
17:00:36 <skelly> tonight at 8:30 it is then!
17:00:37 <verm__> i'm usually out for two hours from 6pm-8pm every day
17:00:43 <verm__> EST?
17:00:46 <verm__> or CDT?
17:00:47 <skelly> (my time, so 9:30 yours)
17:00:52 <verm__> ok that works for me
17:00:56 <verm__> i've put it in my calendar
17:01:00 <djmitche> awesome :)
17:01:10 <skelly> that's a good range to be out as I am out for a superset of that every day
17:01:19 <djmitche> tardyp: do you want to talk a bit about the vagrant work?
17:01:24 <verm__> heh cool
17:01:58 <tardyp> Basically, vagrant is re-creating the prod in VMs
17:02:07 <tardyp> so we have 3 VMs that start freebsd
17:02:16 <djmitche> #topic Vagrant setup for Buildbot Infra
17:02:26 <tardyp> and then runs the ansible setup for those 3 hosts
17:02:36 <tardyp> which means creating the BSD jails for each of the services
17:02:39 <djmitche> including creating all of the jails?
17:02:58 <tardyp> those jails initially only contain sshd in vagrant mode
17:03:17 <tardyp> so that after you can run ansible against each of those jails as you wish
17:03:30 <tardyp> the whole setup would take too much time
17:04:02 <djmitche> gotcha
17:04:04 <tardyp> then there is nothing very complicated. It took me a huge amount of time to fight against the proxies
17:04:21 <verm__> proxies?
17:04:27 <djmitche> I kinda want to get you a mifi hotspot for work :)
17:04:37 <djmitche> pierre has mandatory outgoing http proxies at work :(
17:04:39 <tardyp> my corporate proxies
17:04:40 <skelly> ssh tunnel
17:04:44 <gracinet> -D
17:04:49 <tardyp> those are http
17:05:09 <verm__> oh, that sucks :(
17:05:18 <verm__> you want me to set up openvpn for you?
17:05:27 <tardyp> now that is figured out, and does not get too much in my way anymore
17:05:40 <verm__> sorry, wish i had known
17:06:09 <verm__> i've been out of the loop on this what is vagrant going to be used for?
17:06:10 <djmitche> #info buildbot-infra is set up to create development environments, including hosts and jails, using Vagrant
17:06:20 <verm__> nevermind
17:06:22 <djmitche> it's for development of ansible patches
17:06:23 <djmitche> :)
17:06:29 <djmitche> ask and ye shall receive an answer :)
17:06:33 <verm__> hehe
17:06:38 <verm__> very cool and great idea
17:06:45 <verm__> is it worth it to discuss the warranty issue?
17:06:46 <djmitche> gracinet: do you have an update for TLS/Endpoints?
17:06:53 <djmitche> verm__: yep, that's up next after TLS
17:06:55 <verm__> ok
17:06:58 <tardyp> verm__: so you are all set up to create trac roles in ansible!
17:07:11 <djmitche> hint hint :)
17:07:42 <verm__> tardyp: me? hah someone else volunteered to do that years ago :) i'm against putting trac under ansible or any management system it is a nightmare to keep stable but someone is free to give it a shot i will help any way i can
17:07:45 <gracinet> djmitche: yes, since last week, I got the integration tests to work on Twisted >= 14, fixed the various lints that required it. Next are the windows failures
17:08:20 <tardyp> cool!
17:08:35 <gracinet> last week I was speaking of reaching out to Twisted to upstream the generic parts, but that's not done yet
17:08:40 <djmitche> #topic TLS/Endpoints
17:08:53 <tardyp> verm__: at least the mysql would be interesting to control
17:09:00 <tardyp> and also the backup
17:09:02 <gracinet> I also need to tidy up docstrings etc
17:09:06 <djmitche> #info progress since last week: gracinet got the integration tests to work on Twisted >= 14, fixed the various lints that required it. Next are the windows failures
17:09:32 <djmitche> oh tardyp I totally forgot to highlight that you rebuilt syslog from the ground with ansible -- *very* cool
17:09:38 <verm__> tardyp: agreed that would be very handy to not worry about that
17:09:47 <djmitche> gracinet: so sounds like we're close to ready to land in buildbot
17:09:52 <djmitche> I assume the twisted upstreaming could happen after landing
17:09:54 <gracinet> by the way, if that rings a bell, the windows failure says that privateKey is passed more than once, that looks like what happens with kwargs used positionally, I did not really look closer than that
17:10:05 <gracinet> djmitche: yes, definitely about the upstreaming
17:10:11 <djmitche> cool
17:10:24 <djmitche> yes, positional vs keyword args :( :(
17:10:40 <djmitche> python-3 has some extra syntax to help with that, but of course we can't use it :(
17:10:48 <gracinet> the thing is, I can do guesswork with windows, but I don't have any windows system near me to actually reproduce
17:10:57 <tomprince> gracinet: What is it that you are looking at upstreaming?
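
The "passed more than once" failure discussed at 17:09:54 and the Python 3 syntax mentioned at 17:10:40 can be illustrated with a minimal sketch. The function names and arguments below are hypothetical, made up for illustration; this is not the Buildbot or Twisted code involved in the Windows failure, only the general mechanism and the keyword-only syntax that guards against it.

```python
# Hypothetical function for illustration only; not Buildbot or Twisted code.
def open_connection(host, privateKey=None, certificate=None):
    return (host, privateKey, certificate)


def call_and_report(func, *args, **kwargs):
    """Call func and return its result, or the TypeError message if it fails."""
    try:
        return func(*args, **kwargs)
    except TypeError as exc:
        return str(exc)


# The second positional argument already fills `privateKey`, so passing it
# again by keyword supplies the same parameter twice.
clash = call_and_report(
    open_connection, "worker.example", "key.pem", privateKey="key.pem"
)


# Python 3 only: parameters after the bare `*` can be passed by keyword only,
# so a stray positional argument is rejected instead of silently binding.
def open_connection_kwonly(host, *, privateKey=None, certificate=None):
    return (host, privateKey, certificate)


ok = call_and_report(open_connection_kwonly, "worker.example", privateKey="key.pem")
rejected = call_and_report(open_connection_kwonly, "worker.example", "key.pem")

print(clash)
print(ok)
print(rejected)
```

With the keyword-only form, the mistake surfaces at the call site as a complaint about positional arguments rather than as a duplicated keyword further down the call chain.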
17:10:59 <djmitche> #info Will try to upstream the generic parts to twisted after landing
17:11:04 <gracinet> hey tomprince
17:11:19 <djmitche> gracinet: i can help look at the traceback after the meeting if you'd like
17:11:25 <gracinet> it's about the PB factories in the worker
17:11:29 <gracinet> djmitche: that'd be great, thanks
17:12:17 <gracinet> to summarize, most of the features here are generic (applicative keepalives, an additional timeout compared to what Twisted's ClientService provides and autologin)
17:13:15 <gracinet> besides I suspect that integration of PB clients with endpoints might move a bit in Twisted (e.g, they are still on ClientFactory, that's not 100% new, endpoint style)
17:14:47 <verm__> tardyp: FYI there is a logstash.conf.sample which has the bits we need to capture the hostname
17:14:59 <gracinet> on top of that, the Worker service adds graceful shutdowns, and that's definitely buildbot-specific
17:15:05 <verm__> it hasn't been copied to the running conf
17:16:24 <djmitche> tomprince: any advice on trying to upstream that?
17:17:03 <tomprince> It isn't entirely clear that there is anything that needs upstreaming there.
17:17:09 <tardyp> verm__: I saw that, but this is for syslog files, and here we are using syslog tcp server
17:18:01 <djmitche> I remember that confusing me about logstash too
17:18:22 <djmitche> tomprince: haha, ok, so I read that as "gracinet may need to make a strong case that these should be upstreamed"
17:18:28 <gracinet> tomprince: it's not a need, I mean, we'll be able to maintain it, sure, but on the other hand why not ? It's generic
17:18:37 <djmitche> yeah, worth a conversation anyway
17:18:47 <verm__> tardyp: you need to add a forwarder under "network" and list the IP
17:18:48 <djmitche> ok, let's talk about warranties
17:19:00 <djmitche> #topic Hardware Warranties
17:19:02 <verm__> on the same level as "files" i forget but it's possible
17:19:25 <djmitche> #info Warranties are expiring on our iX hardware; renewal for two years is about $900
17:19:29 <tomprince> djmitche: More like, looking at the code, it isn't clear to me what is being provided on top of ClientService.
17:19:40 <tardyp> verm__: gtg. I'll look at that tomorrow
17:19:47 <verm__> tardyp: no no problem cya
17:19:50 <gracinet> I suppose many other PB applications would use an autologin with auto-relogin (if login depends on the client, not on a user of client)
17:20:11 <verm__> djmitche: we should have money leftover even if we use the money put aside for the mac mini
17:20:24 <verm__> having a warranty for another two years would be really good
17:20:40 <skelly> agreed
17:20:40 <djmitche> i agree
17:20:48 <verm__> RTEMS had a machine fail
17:20:49 <skelly> would we want to replace after that?
17:20:50 <djmitche> that's the maximum extension, right? (3yr to 5yr)
17:20:56 <gracinet> tomprince: discuss later ?
17:20:57 <verm__> it was sent out and back from california, took 2 days
17:21:00 <tomprince> gracinet: Sure.
17:21:09 <djmitche> thanks tomprince gracinet
17:21:23 <verm__> djmitche: no idea good question i can ask but i think they'll let you extend more if your machines are OK
17:21:41 <djmitche> ok
17:21:43 <verm__> skelly: if we can afford it yes, at least our main machines
17:21:53 <verm__> then the current ones become developer boxes or whatever we need them for.. or sell them
17:21:58 <bb-github> [buildbot-infra] tardyp opened pull request #157: Syslogelk (master...syslogelk) https://git.io/viSP6
17:22:07 <bdbaddog> in my experience after a certain point they start to jack up the prices on warranties..
17:22:22 <djmitche> like "you're paying rent for these ancient spare parts on our shelf" :)
17:22:26 <verm__> iXsystems is really good to opensource projects
17:22:48 <verm__> anyway if they don't offer it or it is too expensive we can think about it then
17:22:50 <djmitche> it sounds like there's general agreement that not running un-covered hardware is good
17:22:58 <verm__> we'll have peace of mind for the next two years at least hehe
17:23:11 <tomprince> Does it make sense for us to have physical machines, rather than getting sponsored cloud hosting?
17:23:18 <verm__> djmitche: yes absolutely, not for critical machines
17:23:21 <djmitche> so I think the practical question is, do we have the funding (I agree we probably do, I just want to check)
17:23:53 <skelly> have we asked about sponsored cloud hosting?
17:23:56 <verm__> tomprince: more flexibility and security
17:24:02 <djmitche> tomprince: as we move off this hardware, I think that's definitely worth considering, but building cloud infra requires a different approach to reliability
17:24:14 <verm__> yes and a lot more expensive to maintain that redundancy
17:24:22 <djmitche> depending on how you do it, yeah
17:24:24 <tomprince> verm__: Do we need the flexibility?
17:24:38 <verm__> it's been nice to set up and do what we want on a whim
17:24:41 <djmitche> we've definitely massively underutilized our 3 years of hardware
17:24:50 <verm__> i think that is changing now
17:24:55 <djmitche> yeah
17:25:18 <verm__> and i also believe the only reason it's changing is because we can, with very little effort since we only answer to ourselves
17:25:23 <verm__> i guess we'll talk about it in 2 years?
17:25:24 <tomprince> verm__: You can get that with cloud VMs too.
17:25:43 <djmitche> it sounds like it's worth considering not paying the warranty renewal and planning a cloud migration
17:25:51 <skelly> action item for 18 months from now
17:25:54 <djmitche> at a high level though
17:26:09 <djmitche> 1) someone would need to manage that migration
17:26:09 <verm__> i think there's enough going on right now that if we can pay the warranty to keep what we have let's do it
17:26:12 <gracinet> what's the order of magnitude we're talking about ?
17:26:17 <djmitche> 2) cloud services will require some kind of ongoing income
17:26:22 <gracinet> (of cost)
17:26:31 <djmitche> gracinet: the warranty is about $900
17:26:42 <verm__> <skelly> action item for 18 months from now <-yes please
17:26:45 <gracinet> and the hosting has no recurring fees ?
17:26:52 <verm__> other than the service3 flub everything has been extremely stable
17:26:55 <djmitche> it would -- what those would be is hard to say
17:26:55 <djmitche> haha
17:27:06 <verm__> gracinet: no it's free
17:27:14 <skelly> I'm too used to console access for when I break things :(
17:27:15 <verm__> www.osuosl.org
17:27:22 <djmitche> oh, sorry, I thought you meant the cloud-hosting alternative.. yes - osuosl is free
17:27:22 <verm__> skelly: i will fix that, we have console access
17:27:30 <verm__> but an unrelated problem has happened
17:27:32 <tomprince> djmitche: Or ongoing sponsorship, but cloud sponsorship is probably easier than $$.
17:27:33 <skelly> right
17:27:38 <djmitche> tomprince: right
17:27:47 <skelly> hyper gave us some credits
17:27:49 <verm__> i will bump it up on my priority list after i move i will get everyone in sysadmin access to the private vlan that has the IPMI hosts
17:27:52 <djmitche> tomprince: so are you OK tabling that until 2018 or so and paying the warranty for now?
17:27:57 <tomprince> I know twisted has something like $1000/month from rackspace.
17:28:09 <tomprince> Yeah, I'm fine with that.
17:28:15 <verm__> yes let's do that please
17:28:47 <verm__> skelly: i have a hack that lets you forward IPMI over SSH
17:28:53 <verm__> i will send an email to bsys about it
17:29:00 <skelly> okay
17:29:00 <verm__> it's annoying and frail but it works
17:29:15 <skelly> can't be worse than bringup on beta hardware that I've done
17:30:01 <djmitche> #agreed but for now, pursue the renewal
17:30:03 <verm__> heh
17:30:14 <djmitche> yeah, I do not miss IPMI
17:30:32 <verm__> oh, IPMI is really stable i've never had problems with it it's just the SSH tunneling that sucks
17:30:49 <djmitche> u funny man :)
17:31:07 <verm__> more likely my brain has learned to ignore all the annoying bits but it gets the job done
17:31:59 <djmitche> haha, yeah
17:32:03 <djmitche> so it seems like the remaining two bits are for botherders:
17:32:05 <djmitche> 2. vote on whether to spend
17:32:07 <djmitche> 1. determine if we have the funds
17:32:11 <verm__> yep
17:32:19 <djmitche> ugh, irc & reordering
17:32:31 <djmitche> any other business or should we wrap up?
17:32:45 <verm__> wrap up i think, great meeting and thank you for chairing it!! awesome job as usual
17:33:43 <djmitche> #endmeeting