14:00:50 #startmeeting infra weekly
14:00:50 Meeting started Mon Jun 24 14:00:50 2013 UTC. The chair is ewoud. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:50 Useful Commands: #action #agreed #help #info #idea #link #topic.
14:00:53 #chair obasan knesenko
14:00:53 Current chairs: ewoud knesenko obasan
14:01:01 eedri: ?
14:02:44 dcaro not here either?
14:03:00 ewoud, dcaro is not here today
14:03:24 ok
14:03:30 I see I've been slacking with the agenda
14:04:26 let's go guys
14:04:36 #topic hosting
14:05:01 knesenko: any progress on the rackspace servers?
14:05:19 ewoud: yes ... I have installed the ovirt engine service there
14:05:33 there were some issues with PTR records ...
14:06:07 * eedri here
14:06:11 so I have installed a DNS server on rackspace01.ovirt.org that holds the PTR records for rackspace01 and 02
14:06:21 #chair eedri
14:06:21 Current chairs: eedri ewoud knesenko obasan
14:06:51 knesenko: and you have set that up as the recursor for the rackspace machines?
14:07:02 I opened ports 80 and 443 in iptables, but it seems we are blocked by the HW firewall there, so I opened a ticket with the rackspace guys
14:07:11 ewoud: yes
14:07:38 So I think the firewall issue will be solved soon
14:08:04 also I changed the schema a little bit
14:08:10 knesenko: and the DNS issue?
14:08:22 ewoud: DNS issue solved
14:08:31 regarding the schema ...
14:08:53 We will use rackspace01 as engine and NFS server .... instead of using the local storage
14:09:16 knesenko: but I think you don't want to run a DNS server in the long run and just have the PTR records served by rackspace
14:09:19 I mean rackspace01 will be engine/host at the same time but without local storage
14:09:35 how so? won't that be a lot slower?
14:09:35 ewoud: they can't handle it .... we asked them
14:09:36 ewoud, rackspace said they don't support PTR records for private ips
14:09:47 ewoud, only public ips
14:09:51 ah
14:09:59 and you need PTR? /etc/hosts is insufficient?
14:10:02 ewoud: this will be a bit slower, but we will have all HA features
14:10:24 ewoud: PTR is a must ...
14:10:34 the last thing I wanted to do was to install a DNS server
14:10:35 :)
14:10:44 LOL ewoud is a chair ;)
14:10:58 chair == voorzitter (Dutch for chairman)
14:11:06 knesenko: but NFS isn't HA, so what do you gain?
14:11:08 ewoud: chair"man" ;)
14:11:20 ewoud: NFS can be HA
14:11:25 if the backend supports it
14:11:37 ewoud: 2 hosts in the same DC
14:11:55 instead of using 1 host per DC
14:12:08 knesenko: but who is the NFS server?
14:12:10 we will have 1 DC with 2 hosts in it
14:12:16 rackspace01
14:12:42 so if rackspace01 goes down, it all goes down?
14:13:12 ewoud: same with all-in-one
14:13:40 knesenko: not true, with all-in-one rackspace02 will keep running if rackspace01 goes down
14:14:29 ewoud: yes, but you can't manage them
14:14:38 ewoud: engine will be down
14:14:50 knesenko: but that's less of a problem imho
14:15:10 ewoud: there are benefits to using NFS ...
14:15:21 ewoud: what environment is this for? ovirt test?
14:15:29 ewoud: we have two choices here ... NFS or 2 local DCs
14:15:39 knesenko: but why not gluster instead of NFS? then you'd at least have the benefit of HA storage
14:15:55 ewoud: possible
14:15:55 * Yamaksi has some netapps lying around...
14:16:10 Yamaksi: computing power for CI using jenkins
14:16:26 ewoud: CI ?
14:16:29 ewoud: gluster is an option .... we can go with that as well
14:16:30 Code Igniter ?
14:16:38 continuous integration
14:16:59 ewoud: and what is that going to do?
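Since the private-IP PTR records were the sticking point above, here is a minimal sketch, in Python, of the kind of check that would confirm the new DNS server on rackspace01 is answering reverse lookups. The IP addresses are placeholders, not the real rackspace private IPs.

    #!/usr/bin/env python
    # Minimal sketch: confirm PTR records resolve for the rackspace hosts.
    # The private IPs below are placeholders, not the real addresses.
    import socket

    HOSTS = {
        "10.0.0.1": "rackspace01.ovirt.org",  # placeholder IP
        "10.0.0.2": "rackspace02.ovirt.org",  # placeholder IP
    }

    failures = []
    for ip, expected in HOSTS.items():
        try:
            name, _, _ = socket.gethostbyaddr(ip)  # performs the reverse lookup
        except socket.herror as err:
            failures.append("no PTR for %s: %s" % (ip, err))
            continue
        if name != expected:
            failures.append("PTR for %s is %s, expected %s" % (ip, name, expected))

    for msg in failures:
        print(msg)
    raise SystemExit(1 if failures else 0)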
14:17:01 ewoud: using gluster will make our NFS HA ?
14:17:06 Yamaksi, stateless vms for jenkins slaves
14:17:24 ah ok
14:17:53 uhm, guys, why not have a "mirror" somewhere which can provide it ? We have redundant netapps in a cluster that cannot go down
14:18:10 unless you unplug the cable(s)
14:18:55 ewoud: i am sorry
14:18:56 Yamaksi: they're rather stateless so it's all throw-away data, which is why I think HA is less important than uptime
14:19:15 ewoud: i was disconnected .... can you repeat ?
14:19:25 knesenko: you missed nothing
14:19:40 ewoud: I asked if gluster will make NFS HA ?
14:19:42 sorry, got disconnected from the network
14:19:46 ewoud: okay, but you want to "share" data, don't you ?
14:19:49 knesenko: and you DC'ed before I could answer
14:20:27 knesenko: I don't know how production-ready gluster is or what the performance is like, but gluster would replace NFS
14:20:52 knesenko: it does replication, so the data will be on both rackspace01 and rackspace02
14:21:15 ewoud, i don't think we need to invest too much in HA for jenkins slaves
14:21:17 ewoud: want to try gluster ?
14:21:29 ewoud, it's stateless vms that we can always reinstall with foreman
14:21:32 i really don't want to use local storage
14:21:49 ewoud, as long as they will be properly puppetized
14:22:23 eedri: I fully agree, but I don't think NFS is a solution for us
14:22:38 ewoud, and local storage?
14:22:49 ewoud, will be a problem too?
14:22:49 it only gives the illusion of HA while in practice it will double the chance of downtime in this case
14:23:45 eedri: if you use local storage, the VMs on rackspace02 will keep running when rackspace01 is down
14:24:19 but we need to think about the future as well .... what if we keep growing ?
14:24:21 when you use NFS on rackspace01, both hosts will be down while you perform maintenance
14:24:35 we will get one more bare metal host
14:24:53 but gluster solves it ...
14:24:55 right ?
14:25:02 knesenko: then depending on what we want to do, we IMHO either go for gluster or local storage again
14:25:40 ewoud: doesn't it depend on the rackspace backend ? I mean performance
14:25:52 Yamaksi: they're bare metal
14:26:02 I vote for gluster
14:26:50 obasan: eedri ewoud ?
14:26:50 knesenko, what is the process for installing gluster?
14:26:58 knesenko, installing the rpms on one bare metal?
14:27:07 eedri: it's built into the all-in-one installation
14:27:09 knesenko, I heard that gluster is a good solution
14:27:26 knesenko, ok, we're still early in the installation, so no harm
14:27:30 +1 for gluster
14:27:52 ewoud: aha, no local storage then
14:27:53 guys, we can try to use gluster .... if this won't work, installing local storage takes 5 minutes
14:28:09 +1 on trying, if not fall back to local
14:28:22 ok so we decided to go with gluster
14:29:03 #agree we're going to try to set up gluster on rackspace hosts and fall back to local storage if it doesn't work out
14:30:10 knesenko: I also see another action item for you
14:30:20 ewoud: which one please ?
14:30:21 the migration plan for linode resources => alterway
14:30:50 ewoud: haven't touched it yet ... let me finish with the rackspace servers and i will move to the migration plan
14:31:12 sounds good to me
14:31:39 ewoud: still we can't migrate until we have answers for the alterway setup ...
14:31:47 external storage and a VM for the engine
14:32:05 ewoud, i'm waiting for answers on additional resources from rackspace that might help
14:32:21 ewoud, we might get an additional bare metal and some VMs.
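For reference, the replicated layout agreed on above comes down to a couple of gluster CLI calls; below is a minimal Python sketch driving them. The volume name and brick paths are made up for illustration, and it assumes glusterd is already running on both hosts and the peers are probed.

    #!/usr/bin/env python
    # Minimal sketch of the replica-2 volume discussed above, driving the
    # gluster CLI from Python. Volume name and brick paths are made up;
    # assumes glusterd runs on both hosts and they are already peered.
    import subprocess

    VOLUME = "jenkins-slaves"  # hypothetical volume name
    BRICKS = [
        "rackspace01.ovirt.org:/gluster/brick1",  # placeholder brick path
        "rackspace02.ovirt.org:/gluster/brick1",  # placeholder brick path
    ]

    def run(args):
        print("running: %s" % " ".join(args))
        subprocess.check_call(args)  # raises if gluster returns non-zero

    # Replicate every file across both hosts, then start serving the volume.
    run(["gluster", "volume", "create", VOLUME, "replica", "2"] + BRICKS)
    run(["gluster", "volume", "start", VOLUME])

With replica 2 every file lands on both bricks, so losing rackspace01 leaves the data available from rackspace02, which is the HA property NFS on a single host cannot give.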
14:32:42 knesenko: true, and it seems quite stable now so I'd rather focus on installing the jenkins slaves now
14:32:47 ewoud, do you know if there might be an issue running an engine on rackspace that manages alterway servers ?
14:32:48 eedri: ok
14:33:21 eedri: I think you need layer 2 access and I don't know how well it reacts to higher latency
14:34:57 it would be better to use a VM located in the alterway DC
14:35:08 ewoud: i am not sure about L2 ...
14:36:29 knesenko: I don't know either
14:36:40 ewoud: I can ask ...
14:36:42 :)
14:36:49 please do
14:37:24 ok
14:37:49 ewoud, can we ask kevin if that's possible?
14:37:50 so to summarize: we're going to install the rackspace hosts now as a gluster cluster, then think about alterway hosting and the linode migration?
14:38:02 ewoud, +1
14:38:03 ewoud: yes
14:38:04 +1
14:38:45 ok, then let's move on
14:38:56 unless there's more about hosting
14:39:16 no more
14:39:48 quaid: hello
14:40:00 ok
14:40:23 obasan: your action item about monitoring the openshift quota, any progress?
14:40:37 ewoud, yes
14:40:41 ewoud, I have a solution for that
14:41:07 ewoud, all there is to do is ssh to the openshift instance
14:41:17 eedri: Oved fixed - ovirt_engine_find_bugs
14:41:21 eedri: good news
14:41:24 ewoud, ssh foo@bar-ohadbasan.rhcloud.com
14:41:27 knesenko, :)
14:41:38 ewoud, and then run the command "quota"
14:41:54 knesenko, great, now we need to get unit_tests fixed (but let's wait till we reach the jenkins topic)
14:42:13 obasan: I knew that part was possible, but do you know if we can easily hook that into icinga?
14:42:28 ewoud, that won't be any problem.
14:42:35 ewoud, it can be executed by icinga as a command...
14:43:10 ewoud, just a custom script that sends the command, parses the output and alerts if needed...
14:44:06 obasan: cool
14:45:58 ok, anything else on hosting?
14:47:41 ewoud, well
14:47:54 ewoud, about the fedora 17 slaves upgrade to f19
14:48:12 ewoud, we need to ask at the ovirt meeting if it's OK to stop running tests / delivering nightly builds for f17
14:48:19 ewoud, and upgrade the host to f19 instead
14:48:38 ewoud, or we can wait for rackspace to be ready and install an f19 slave there
14:49:07 eedri: then I think that f17 will still be outdated
14:49:18 eedri: can you ask if it's OK to stop?
14:49:42 ewoud, i can send an email to the list, not sure if i'll attend the meeting tomorrow
14:49:46 mburns, ping
14:50:06 mburns, do you know if we can stop supporting f17 in jenkins and upgrade the slave to f19?
14:50:21 eedri: i'd say yes
14:50:34 mburns, so no more nightly builds for f17
14:50:41 eedri: makes sense to me
14:50:48 mburns, would you say it's worth raising in tomorrow's meeting?
14:50:53 though we should definitely have f19 builds
14:50:54 mburns, or to go ahead with it
14:51:21 eedri: probably worth bringing up
14:51:30 mburns, ok
14:51:37 mburns, thanks
14:51:42 eedri: i would think you could move most of the slaves to f19
14:51:51 mburns, what about f18?
14:52:00 mburns, we currently have 2 f18, 1 f17
14:52:04 oh
14:52:06 and one rhel
14:52:20 let's leave it as is for now, and we'll get agreement at the weekly meeting
14:52:26 mburns, ok
14:53:18 eedri: anything else on jenkins?
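A minimal sketch, in Python, of the custom script obasan describes: ssh to the gear, run quota, parse the usage, and exit with the standard Nagios/Icinga return codes. The thresholds are invented and the parsing assumes the usual quota(1) column layout, so treat it as a starting point rather than a finished plugin.

    #!/usr/bin/env python
    # Minimal sketch of the icinga check obasan describes: ssh to the
    # OpenShift gear, run "quota", parse usage and exit with the standard
    # codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). Thresholds invented.
    import subprocess
    import sys

    GEAR = "foo@bar-ohadbasan.rhcloud.com"  # gear address from the log
    WARN, CRIT = 80, 90  # hypothetical usage thresholds, in percent

    try:
        out = subprocess.check_output(["ssh", GEAR, "quota"])
    except subprocess.CalledProcessError as err:
        print("UNKNOWN - quota command failed: %s" % err)
        sys.exit(3)

    # quota(1) usually prints a header, then "<fs> <blocks> <quota> <limit> ..."
    for line in out.decode("utf-8", "replace").splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[1].isdigit() and fields[2].isdigit():
            used, soft = int(fields[1]), int(fields[2])
            pct = 100 * used // max(soft, 1)
            status = 2 if pct >= CRIT else 1 if pct >= WARN else 0
            label = {0: "OK", 1: "WARNING", 2: "CRITICAL"}[status]
            print("%s - %d%% of quota used" % (label, pct))
            sys.exit(status)

    print("UNKNOWN - could not parse quota output")
    sys.exit(3)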
14:53:21 Hi
14:53:31 Sorry I am so late - was on a train
14:53:31 dneary: hi
14:53:41 ewoud, there is an issue with jenkins backups
14:53:43 ewoud, i opened a ticket
14:53:54 ewoud, might be worth going over the trac tickets
14:54:12 dneary, hi
14:54:50 eedri: I didn't see it
14:55:16 but we certainly should go over the issues
14:55:36 RH TLV has been a bit unstable lately
14:56:27 bad network issues here... sorry
14:57:03 eedri: yes, it's been bad for the past week I think
14:57:13 ewoud, for you too?
14:57:19 ewoud, so it's an OFTC issue?
14:57:31 eedri: no, I just see a huge wave of nat-pool-tlv-t1 going offline
14:58:01 mburns: where will the ISO be published, also on the docs or only on gerrit ?
14:58:15 eedri: can you link the ticket you were referring to? I can't find it
14:58:27 Yamaksi: it will be published on ovirt.org
14:58:47 eedri: is it https://fedorahosted.org/ovirt/ticket/59?
14:58:49 Yamaksi: it will go under here: http://resources.ovirt.org/releases/node-base/
14:59:30 ewoud, yep
14:59:51 mburns: ah nice, was looking there already. Will place a nephew on it and tell him to press F5 every second ;)
14:59:57 ewoud, i have another topic on hosting
15:00:13 eedri: do go ahead
15:00:24 ewoud, recently we've been hitting a lot of issues with the wiki on openshift... out of space/slowness
15:00:34 ewoud, and a lack of response on the irc channel as well
15:00:36 Yamaksi: we're probably at least a few hours away from having something posted
15:01:00 mburns: ah, it will keep him busy, he has vacation I guess :)
15:01:02 ewoud, should we consider migrating it out and into another service ? (on one of our vms/rackspace)
15:01:05 keeps him off the street ;)
15:01:21 eedri: possibly
15:01:34 ewoud, worth opening a thread on it on the list
15:01:40 ewoud, see what our options are
15:01:56 ewoud, the wiki has had too much downtime lately, which is not healthy for the project...
15:01:59 eedri: Yes, a ML thread sounds good
15:02:08 and I fully agree with that
15:02:20 dneary, ^^?
15:02:23 dneary, what do you think?
15:02:29 by using a PaaS we shouldn't have to worry about it
15:02:47 eedri, Catching up
15:02:55 ewoud, yea.. but something isn't working, apparently
15:03:03 eedri, Yes, agreed re wiki
15:03:14 dneary, what are our options ?
15:03:37 Garrett is working on an update this week which will make things better wrt disk usage on the PaaS - that's been our main issue
15:03:51 dneary, and the slowness?
15:03:56 There was a major upgrade of infrastructure ~3 weeks ago which is causing this "no email" situation
15:04:03 it will also have other bugfixes and an improved mobile experience
15:04:10 The slowness was another badly behaved app. That just shouldn't happen
15:04:24 dneary, i got a complaint today from the tlv site
15:04:26 I'm chasing it down with the OpenShift guys
15:04:35 dneary, but that might be related to local network issues.. not sure
15:04:45 eedri, Yes, it was very slow this morning, it cleared up ~11:30 CEST
15:05:22 dneary, so you're saying we should give it a chance ? and keep it on openshift for now
15:06:20 eedri, Yes - let us get this update out the door, and we'll re-evaluate in a month
15:06:29 eedri, A report will go to infra@ after that
15:06:34 dneary, ok. thanks
15:06:38 (after the update, that is)
15:07:10 I'm also quite overdue with setting a new meeting time
15:07:56 right, I think we're over time now, so any last items?
15:09:47 going once
15:09:48 eedri, This was probably covered before I arrived, but we talked about putting together a "who has access to what / how to restart or fix service X if it's down/broken" page in the wiki
15:09:52 Does anyone own that?
15:10:08 dneary: I don't really think so
15:10:55 we did discuss it a few times and I think the closest we came was http://lists.ovirt.org/pipermail/infra/2013-April/002625.html
15:11:22 ewoud, Can we put a name and a deadline on it?
15:11:39 If it doesn't get done by then, fair enough - but at least we'll be able to check progress each week
15:11:59 dneary: do I hear a volunteer? :)
15:14:34 eedri: is there a jenkins job that runs the engine junit tests?
15:17:09 ewoud, I wish I could
15:17:14 I don't have most of the information
15:17:23 Nor a decent chunk of time
15:17:44 theron, Do you have some time?
15:17:56 dneary, I do. but we have a call in 15.
15:18:14 theron, I mean, in the next month or so, to put together ^^^
15:18:25 dneary, yes lol :)
15:18:42 It doesn't have to be done in the next 15 mins
15:18:50 Although if it were, that would be cool :-)
15:18:53 dneary: I'm also quite short on time
15:19:12 ewoud, Seems like Theron just "volunteered" :-)
15:19:13 dneary: we need to compile more info from the ML into the wiki
15:19:35 dneary, I can certainly "try"
15:20:25 dneary, we'll certainly need to sort it out.
15:22:34 lhornyak, yes
15:23:04 #action theron compile a list of services and who has access
15:23:09 #endmeeting