15:04:53 <knesenko> #startmeeting oVirt Infra
15:04:53 <ovirtbot> Meeting started Mon Dec 23 15:04:53 2013 UTC.  The chair is knesenko. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:04:53 <ovirtbot> Useful Commands: #action #agreed #help #info #idea #link #topic.
15:05:00 <knesenko> #chair ewoud dcaro eedri
15:05:00 <ovirtbot> Current chairs: dcaro eedri ewoud knesenko
15:05:03 <knesenko> orc_orc: here ?
15:06:42 <knesenko> #topic Hosting
15:06:47 <knesenko> hello guys
15:07:22 <ewoud> so I think rackspace is a big issue to talk about now
15:07:28 <knesenko> ewoud: yes
15:07:52 <knesenko> I'm not even looking at their answer on the ticket they've been trying to handle for a few months
15:08:00 <knesenko> not worth our time
15:08:06 <bkp> knesenko: Here, too. Running late
15:08:13 <ewoud> I think eedri sent a good start, but I'm having a hard time deciding because I don't know the budget constraints etc
15:08:15 <knesenko> since we are planning to move to another infra provider
15:08:25 <knesenko> #chair bkp
15:08:25 <ovirtbot> Current chairs: bkp dcaro eedri ewoud knesenko
15:08:32 <knesenko> hello bkp
15:08:34 <eedri> ewoud, i already have a proposal for engine VM and storage server
15:08:52 <eedri> ewoud, i wanted to wait for other suggestions on the hypervisor specs
15:09:01 <knesenko> bkp: maybe you should introduce yourself to ewoud ? :)
15:09:14 <ewoud> that would be helpful :)
15:09:22 <bkp> ewoud: This is Brian Proffitt, the new Community Manager
15:09:45 <eedri> bkp, welcome brian!
15:09:50 <bkp> Thanks!
15:09:53 <ewoud> bkp: welcome
15:09:59 <knesenko> welcome aboard !
15:10:09 <knesenko> ok lets continue
15:10:19 <nsoffer> bkp, welcome
15:10:20 <knesenko> I think that starting with NFS is good
15:10:48 <eedri> knesenko, i have a quote for storage server
15:10:48 <ewoud> without knowing anything about the budget, there's not much more I can say than 'bigger is better' ;)
15:10:51 * eedri looking for it
15:10:52 <skyy111> are there any updated instructions for installation from source for ovirt 3.3 or later?
15:11:02 <eedri> ewoud, yea, that's always true
15:11:10 <knesenko> :)
15:11:13 <eedri> ewoud, i'm trying to really understand our needs though
15:11:34 <eedri> ewoud, for example i wouldn't go with 96 GB mem, but rather with 64
15:11:39 <knesenko> eedri: we need powerful hypervisors to run jenkins slaves
15:11:40 <eedri> ewoud, if we can upgrade it later
15:11:59 <eedri> ewoud, every cost we incur might affect other services we might not get
15:12:09 <eedri> ewoud, i'm not sure exactly what is the budget
15:12:31 <ewoud> eedri: we could build slaves more dedicated to some tasks to save memory
15:12:45 <knesenko> eedri: maybe we should sync with the guy who knows more about the budget we have, and then we can discuss what is the best we can get with it
15:12:47 <knesenko> ?
15:13:03 <ewoud> eedri: for example, we don't need to build java on all platforms IMHO, so having big slaves for all platforms might not be needed
15:13:15 <ewoud> eedri: where vdsm needs more platforms but less memory
15:13:26 <bkp> skyy111: We're having a meeting right now, can you give us a bit? Thanks!
15:13:36 <eedri> ewoud, yea, i also thought of that
15:14:07 <eedri> ewoud, but i think there is a value in having the same hardware for all hypervisors
15:14:19 <eedri> ewoud, since we will need them in the same cluster for maintenance/migration/etc.
15:14:34 <ewoud> eedri: hypervisors: yes; guests: no
15:14:48 <eedri> ewoud, yes, i'm only talking on hypervisors
15:14:56 <eedri> ewoud, the bare metal hosts
15:15:06 <ewoud> so I like the proposal of having 2 or 3 bare metal hosts + SAN/NAS
15:15:10 <eedri> ewoud, guests we'll handle ourselves after we'll set the hypervisors
15:15:18 <eedri> ewoud, i think 3 is a must
15:15:23 <eedri> ewoud, for maintenance
15:15:27 <knesenko> eedri: +1
15:15:30 <ewoud> given NetApp is a sponsor, could we get them to sponsor some HW?
15:15:59 <eedri> ewoud, that can be an option, but the current proposal for softlayer isn't relevant to them
15:16:06 <eedri> maybe bkp can help with that
15:16:31 <knesenko> ewoud: why NetApp ?
15:16:46 <bkp> As far as getting the budget numbers, or getting budget lined up?
15:16:59 <eedri> ewoud, this is the storage server i got a qoute for : https://www.softlayer.com/Sales/orderQuote/7bff92a098d78da6ed795c5832d99738/1052649
15:17:01 <ewoud> knesenko: a sponsor can generally provide better HW for the same budget
15:17:37 <knesenko> ewoud: yes I understand that ... but why NetApp ... maybe we can ask for EMC as well ?
15:17:43 <ewoud> that said, I have little knowledge of SANs
15:18:13 <eedri> https://www.softlayer.com/services/storagelayer/quantastor-servers
15:18:19 <ewoud> knesenko: NetApp came to mind first because I read most about them and oVirt on blogs, but I have no preference
15:18:23 <eedri> quantastor server, if anyone is familiar with it
15:18:31 <knesenko> ewoud: ah ok :)
15:18:49 <eedri> ewoud, knesenko guys, let's keep to the real and actual provider we have now
15:19:09 <eedri> ewoud, knesenko we don't know if any of those other companies is willing to provide support for ovirt yet,
15:19:30 <eedri> let's focus on migrating from the existing vendor as soon as we can
15:19:36 <knesenko> eedri: so this is a proposal for a storage server ?
15:19:39 <eedri> anyone has comments on the suggested storage server?
15:19:44 <knesenko> the link you sent ?
15:19:52 * knesenko is looking
15:19:53 <eedri> knesenko, yes, initial 2TB storage nfs
15:20:09 <eedri> on quanta store
15:20:38 <knesenko> eedri: I am not a storage expert, but it depends on the disks we have
15:20:42 <eedri> currently runs on SATA disks
15:20:45 <bkp> Newbie question: how does this compare to what we were paying before?
15:21:07 <eedri> of course SSD will be faster, but much more expensive
15:21:11 <ewoud> and SAS?
15:21:13 <eedri> bkp, in terms of cost?
15:21:26 <eedri> bkp, or performance?
15:21:33 <bkp> Yes, to start. Config too...
15:22:41 <knesenko> eedri: SAS = SATA * 2 in price
15:22:44 <knesenko> eedri: +-
15:23:03 * orc_orc rolls in late to the office
15:23:17 <eedri> bkp, softlayer should be cheaper than current vendor afaik
15:23:44 <eedri> bkp, also, service should be better; they have a useful live chat option that proved helpful when i was digging for proposals
15:24:03 <eedri> knesenko, i'm not sure they even offer sas.
15:24:11 <eedri> orc_orc, here?
15:24:12 <ewoud> eedri: more a general question than the storage: suppose we do run into limitations, how easy/fast can we switch?
15:24:20 <bkp> Right, and from what I've picked up, it's going to be better service from the get-go.
15:24:26 <eedri> orc_orc, i remember you wanted to comment on the hardware specs
15:24:48 <eedri> ewoud, i think they offer very flexible upgrades
15:24:49 <knesenko> #chair orc_orc
15:24:49 <ovirtbot> Current chairs: bkp dcaro eedri ewoud knesenko orc_orc
15:25:15 <eedri> ewoud, for example if we choose a server that can support up to 256 GB mem, no issues with upgrading
15:25:28 <eedri> ewoud, also, each storage server supports up to 12-24 disks
15:25:28 <ewoud> eedri: and how long would the contract be?
15:25:52 <ewoud> not that I expect the same thing we have now at rackspace, but then again, we didn't expect it at rackspace either
15:25:52 <eedri> ewoud, so even if we choose one disk, we can monitor it and change to a better one later on
15:26:13 <orc_orc> eedri: I run a public colo / hosting business in a high end datacenter
15:26:19 <eedri> ewoud, from the experience of other groups in $company, it seems they are satisfied
15:26:29 <ewoud> eedri: ok, sounds good
15:26:34 <eedri> orc_orc, did you happen to see the email i sent on the specs?
15:26:55 <eedri> orc_orc, i'm trying to get a ballpark estimation on which servers we should use for the hypervisors
15:27:02 <orc_orc> yes * I did
15:27:05 <eedri> orc_orc, which storage server we should use
15:27:17 <orc_orc> usually a hosting center does not care so much about the hardware as the following:
15:27:20 <orc_orc> the RUs used
15:27:23 <orc_orc> the BW used
15:27:27 <orc_orc> the A used, and
15:27:28 <orc_orc> the 'hands time' needed
15:27:54 <orc_orc> the customer specifies needs and they return a price
15:28:01 <eedri> orc_orc, i'm trying to think about it from a CI point of view, what our slaves will need
15:28:09 <orc_orc> sometimes IP leasing if the customer does not have an ASN block to use
15:28:20 <eedri> orc_orc, not sure i follow
15:28:30 <orc_orc> eedri: how capacity constrained are we presently, from usage stats?
15:28:50 <eedri> orc_orc, pretty constrained from several points
15:28:51 <orc_orc> as I understand it, R'03 was needed for space, not compute strength
15:29:01 <orc_orc> eedri: what other points?
15:29:03 <eedri> orc_orc, 1st i would say that using local storage for all vms pretty much lowers the performance
15:29:15 <eedri> orc_orc, and limits us from adding more vms
15:29:19 <ewoud> orc_orc: I think we're more interested in iops than raw storage, but we are low on storage currently
15:29:28 <eedri> orc_orc, so one of the most important issues is storage imo
15:29:36 <orc_orc> eedri: I have heard that said -- but I do not find a formal study indicating local store is slower than, say, NFS on like loads
15:29:54 <ewoud> I'd expect local storage to be faster tbh
15:29:58 <eedri> orc_orc, so maybe it's worth investing more in a NAS/SAN solution than in taking the best servers for hypervisors
15:29:59 <ewoud> no network latency
15:30:01 <orc_orc> as do I
15:30:12 <orc_orc> but I am engaged in a study of this atm
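As a very rough illustration of the local-versus-NFS question being debated here, a minimal sequential-write benchmark could be run once against a local path and once against an NFS mount and the throughputs compared. This is only a sketch: the paths and sizes are made up, and a real study (as orc_orc notes) would also need read, random-IO and concurrency cases, e.g. with fio or a similar tool.

    #!/usr/bin/env python
    # Sketch: sequential write throughput on a local path vs an NFS mount.
    # Paths below are hypothetical examples, not the actual infra layout.
    import os
    import time

    def write_throughput(path, size_mb=512, block_kb=1024):
        block = b'\0' * (block_kb * 1024)
        start = time.time()
        f = open(path, 'wb')
        try:
            for _ in range(size_mb * 1024 // block_kb):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())   # make sure the data actually hit the backend
        finally:
            f.close()
        os.remove(path)
        return size_mb / (time.time() - start)

    if __name__ == '__main__':
        for label, path in [('local', '/var/tmp/bench.dat'),
                            ('nfs',   '/mnt/nfs/bench.dat')]:
            print('%s: %.1f MB/s' % (label, write_throughput(path)))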
15:30:32 <eedri> ewoud, maybe the disks that were used were not fast enough then
15:30:33 <orc_orc> we are rarely compute constrained
15:30:42 <eedri> orc_orc, ewoud there are also specific jobs on ci that need cpu
15:30:47 <eedri> orc_orc, ewoud like findbugs, for example
15:30:53 <orc_orc> eedri: cpu, or ram to work in?
15:30:55 <eedri> orc_orc, ewoud or any other static analysis
15:31:11 <orc_orc> we find ram constraints are the major chokepoint
15:31:16 <eedri> orc_orc, i think there are some more mem oriented ones, like maven builds with gwt compilation
15:31:32 <knesenko> for sure we need good HDs .... jenkins slaves create a lot of IO
15:31:36 <eedri> and other cpu consuming ones like findbugs
15:32:13 <orc_orc> eedri: as to commercial backend SAN servers, is this saying that iscsi, nfs, and gluster are 'less good' choices, or is it more that the 'brand name' effect is driving the desire?
15:32:45 <eedri> orc_orc, i wouldn't care about any brand.. as long as its performance is good enough for us
15:32:47 <orc_orc> ... so I thought the email specifying hardware was a bit early in the process
15:32:50 <ewoud> though if we're flexible in upgrades I'm leaning towards starting sooner and upgrade in a month or 2 if needed
15:32:50 <eedri> orc_orc, and maintenance is low
15:33:07 <eedri> orc_orc, what do you suggest?
15:33:25 <ewoud> that said, generally upgrading a hypervisor is easier because downtime is more acceptable than your SAN
15:33:30 <orc_orc> eedri: first, I think this interactive discussion is very good, compared to email
15:33:41 <eedri> orc_orc, i agree
15:33:45 <orc_orc> perhaps we should ask for a conference bridge and discuss in real time
15:34:00 <eedri> orc_orc, i can arrange a conf call if needed
15:34:15 <orc_orc> bkp: knows the model of the weekly LSB conference call, and those are very productive in knocking out issues
15:34:35 <orc_orc> eedri: I would ask for that .. the holiday schedule hurts a bit, but
15:34:42 <eedri> orc_orc, i'd really like to make the right choice here for ovirt infra going forward
15:34:50 <orc_orc> eedri: ++
15:35:09 <eedri> orc_orc, and not have to revisit a wrong infra layout again
15:35:14 <bkp> I agree, with the caveat of the holidays
15:35:18 <orc_orc> so testing and surveying where the hurt points are before deciding is a good thing
15:35:21 <knesenko> agree
15:35:45 <eedri> dcaro, can you do a survey of our current bottlenecks?
15:36:02 <orc_orc> is sysstat running on all units to get real stats?
15:36:03 <eedri> dcaro, assuming checking ovirt-engine webadmin + stats from awstats
15:36:16 <orc_orc> web stats may not tell the tale
15:36:17 <eedri> dcaro, or other monitoring tool we have running
15:36:32 <eedri> orc_orc, we need something like cacti/graphite
15:36:48 <orc_orc> we also track traffic in and out, and disk IO, and 'free' load
15:36:59 <dcaro> eedri: setting up some performance monitoring is an open issue, I can try to focus on that
15:37:16 <eedri> orc_orc, i can tell you from current observation that jobs in ci take longer than on other systems we have
15:37:30 <eedri> orc_orc, even on other VMs, not just bare metal
15:37:39 <orc_orc> eedri: but this may imply just that the CI tool is sluggish ;)
15:38:02 <eedri> orc_orc, well.. i'm running very similar jobs on a different env, with much faster results
15:38:08 <knesenko> eedri: I am pretty sure it's because we are running on the local disks
15:38:20 <orc_orc> eedri: great, in that this permits comparing to find 'choke points'
15:38:47 <orc_orc> eedri: can you set up your CI environment in an EL6 environment?
15:39:01 <eedri> orc_orc, we do run it on RHEL6
15:39:03 <orc_orc> I am running a test w/ the LSB atm on this and we can add your load ...
15:39:22 <orc_orc> I will contact you out of band with details then
15:39:22 <eedri> orc_orc, it's not public though
15:39:30 <orc_orc> my tool is quite private
15:39:38 <eedri> orc_orc, ok
15:40:21 <orc_orc> #action conference call to discuss COLO needs to be scheduled by eedri
15:40:33 <eedri> so we agreed that we need to do some research on performance choke points before moving forward?
15:40:41 <eedri> or we'll discuss it on the conf call?
15:40:59 <orc_orc> I will look at graphite and the other later today and know more by then
15:41:58 <eedri> when is a good date to set up the call, with all the holidays?
15:42:36 <orc_orc> post 2 jan, sadly, I think ... isn't RH already on shutdown til EOY?
15:42:45 <ewoud> eedri: that's always hard
15:43:01 <eedri> orc_orc, yea, most of it, excluding israel though
15:43:50 <bkp> orc_orc: Pretty much, starting tomorrow
15:44:27 <eedri> ok
15:45:05 <eedri> so who's taking the lead on finding the choke points for the current ci infra?
15:45:30 <eedri> we'll need, i guess, a week's worth of mem/cpu/io/network stats of ci jobs on the current slaves?
15:45:50 <orc_orc> yes ...
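For the week-long survey eedri describes, a small sampler along these lines could be dropped on each slave and left running from a screen session or cron wrapper. It is only a sketch with made-up output path and interval; sar from the sysstat package, or a proper collector, would give equivalent data with less work. The disk and network figures are cumulative kernel counters, so deltas would be computed at analysis time.

    #!/usr/bin/env python
    # Sketch: append one CSV row per interval with CPU busy %, used memory,
    # and cumulative disk/network counters read from /proc.
    import time

    def read_cpu():
        # first line of /proc/stat: cumulative jiffies per CPU state
        with open('/proc/stat') as f:
            fields = f.readline().split()[1:]
        return sum(int(x) for x in fields), int(fields[3])  # total, idle

    def read_meminfo():
        info = {}
        with open('/proc/meminfo') as f:
            for line in f:
                key, value = line.split(':', 1)
                info[key] = int(value.split()[0])  # kB
        return info

    def read_counters(path, columns):
        # sum selected numeric columns over all lines (all disks / interfaces)
        total = 0
        with open(path) as f:
            for line in f:
                parts = line.replace('|', ' ').replace(':', ' ').split()
                for idx in columns:
                    if len(parts) > idx and parts[idx].isdigit():
                        total += int(parts[idx])
        return total

    INTERVAL = 60  # seconds between samples; stop after a week (or ctrl-c)

    with open('/var/tmp/slave-stats.csv', 'a') as out:
        prev_total, prev_idle = read_cpu()
        while True:
            time.sleep(INTERVAL)
            total, idle = read_cpu()
            busy = 100.0 * (1 - float(idle - prev_idle) / max(total - prev_total, 1))
            prev_total, prev_idle = total, idle
            mem = read_meminfo()
            mem_used_kb = mem['MemTotal'] - mem['MemFree'] - mem.get('Cached', 0)
            disk_io = read_counters('/proc/diskstats', [5, 9])  # sectors read + written
            net_io = read_counters('/proc/net/dev', [1, 9])     # bytes rx + tx
            out.write('%d,%.1f,%d,%d,%d\n' % (time.time(), busy, mem_used_kb, disk_io, net_io))
            out.flush()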
15:45:55 <ewoud> eedri: I converted your mail into an etherpad: http://etherpad.ovirt.org/p/Hardware_requirements_infra
15:46:08 <ewoud> would that be good to use as a working document?
15:46:18 <eedri> ewoud, +1
15:46:21 <knesenko> we probably will need to monitor our slaves and hypervisors here - http://monitoring.ovirt.org/icinga/
15:46:38 <eedri> ewoud, i would add the relevant links there to the current servers/proposals
15:47:22 <orc_orc> google docs and etherpad are poor for revision history .. if simultaneous editing is not needed, perhaps the wiki should be preferred?
15:47:35 <eedri> knesenko, can we make it public?
15:47:49 <eedri> knesenko, it needs a login to view that
15:48:17 <knesenko> eedri: mmmm .... I think we can set public ro permissions
15:48:36 <eedri> orc_orc, personally i find adding/updating the wiki a bit more cumbersome for collaborating on an in-progress issue
15:48:45 <orc_orc> * nod *
15:48:47 <eedri> orc_orc, wiki is more for documenting a final doc imo
15:49:10 <ewoud> agreed: the etherpad is a working document which is then finalized into a wiki page
15:49:52 <ewoud> also, I've seen that in the past we've been discussing some issues for a long time
15:50:05 <ewoud> can we help speeding it up by setting a general time frame with deadlines?
15:51:05 <orc_orc> ewoud: some projects doing time based releases turn out a poor product ... I think a general 'rule' is dangerous
15:51:16 <ewoud> for example, we want to decide on the HW before mid January and have the basics installed by mid February (dates are made up)
15:51:32 <eedri> ewoud, i agree
15:51:45 <eedri> ewoud, we should set some basic deadlines and try to follow up on them
15:51:45 <ewoud> orc_orc: but infra isn't really a product with releases and this is more project management
15:51:49 <eedri> ewoud, and not leave it in the air
15:52:45 <eedri> orc_orc, i understand where ewoud comes from
15:52:59 <eedri> orc_orc, we might have open issues that are taking too long to be resolved sometimes
15:53:12 <orc_orc> oh. I do too -- that is part of why I started logging the r'03 updates weekly, so it would become a barb to action
15:53:13 <ewoud> eedri: exactly
15:53:27 <eedri> and it feels like this weekly meeting might not be enough to push things forward
15:54:00 <eedri> so 1st i don't think that not going over tasks weekly because we've reached 18:00 is good practice
15:54:09 <eedri> this results in forever procrastinating
15:54:37 <eedri> ewoud, we should either appoint someone to make sure there is progress made, or even do a monthly rotation
15:54:55 <ewoud> at $work we've made standard filters on issues to show which ones are open too long
15:55:00 <ewoud> we could do the same with trac reports
15:55:11 <eedri> or set a ground rule of at least going over some trac tickets during the meeting
15:55:13 <orc_orc> eedri: or simply have an agenda where new bugs are triaged to priority, and old open items come first
15:55:23 <ewoud> +1
15:55:40 <eedri> orc_orc, ewoud do you think a different ticketing system would help? or is that not the case
15:55:52 <orc_orc> eedri: the problem is not the tool, it is the process
15:56:02 <ewoud> eedri: I think any ticketing system can be made to work, but as orc_orc said, it's the process
15:56:21 <ewoud> the tool is the implementation of the process
15:57:42 <ewoud> so now we've gone a bit offtopic: what do we decide on the hardware?
15:58:05 <ewoud> I saw some suggestions of monitoring
15:58:23 <orc_orc> and learning the budget we have to work with
15:59:38 <orc_orc> having real stats on local store builds vs network store ones, per eedri's use case
16:01:03 <tal> OVF on any domain feature overview starting now
16:01:21 <laravot> http://www.ovirt.org/Feature/OvfOnWantedDomains
16:01:46 <ewoud> eedri: knesenko it seems our time is up; can we finish it with some action items?
16:01:59 <eedri> ewoud, there was an action item on me setting a conf call
16:02:09 <knesenko> I think we should monitor our current infra ... to see what we have now
16:02:11 <eedri> ewoud, we need an action item on who's doing the stats analysis
16:02:14 <knesenko> what do you think ?
16:02:20 <orc_orc> eedri: and I am composing an OOB email to you atm
16:02:21 <knesenko> I can do
16:02:30 <eedri> knesenko, like i said, we need to run a week long analysis
16:02:41 <eedri> knesenko, on io/net/cpu/mem on our servers
16:02:57 <eedri> knesenko, and find the choke points.. then we can have the conf and talk about the needs of the infra
16:03:11 <knesenko> #action knesenko add jenkins slaves and hypervisors to http://monitoring.ovirt.org/icinga/
16:03:35 <knesenko> ok what else do we need ?
16:04:05 <eedri> knesenko, is icinga more like nagios or cacti?
16:04:12 <ewoud> eedri: nagios
16:04:25 <eedri> knesenko, ewoud so i don't see how that helps us
16:04:34 <eedri> knesenko, ewoud we need to monitor performance..
16:04:41 <eedri> knesenko, ewoud like cacti or graphite do
16:05:31 <eedri> isn't nagios for monitoring services?
16:05:48 <ewoud> eedri: yes, cacti, munin or graphite should be better tools
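To illustrate why a graphite-style tool fits this need better than icinga's up/down service checks: each sample is just a "metric value timestamp" line pushed to carbon's plaintext listener (port 2003 by default), and graphite stores and graphs the history. A minimal sketch, with a made-up metric path and assuming carbon would end up running on the existing monitoring host:

    #!/usr/bin/env python
    # Sketch: push one numeric sample to graphite via carbon's plaintext protocol.
    import socket
    import time

    CARBON_HOST = 'monitoring.ovirt.org'   # assumption: wherever carbon gets installed
    CARBON_PORT = 2003                      # carbon plaintext listener default

    def send_metric(path, value, timestamp=None):
        if timestamp is None:
            timestamp = int(time.time())
        line = '%s %s %d\n' % (path, value, timestamp)
        sock = socket.create_connection((CARBON_HOST, CARBON_PORT), timeout=5)
        try:
            sock.sendall(line.encode('ascii'))
        finally:
            sock.close()

    if __name__ == '__main__':
        # example: report the 1-minute load average for this slave
        load1 = open('/proc/loadavg').read().split()[0]
        send_metric('infra.slaves.example01.load1', load1)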
16:06:15 <knesenko> can we install it on the same machine we are running icinga ?
16:07:34 <knesenko> ok guys , let me handle it ...
16:07:36 <tal> Enter the etherpad for the discussion: http://etherpad.ovirt.org/p/OvfOnAnyDomain
16:07:54 <knesenko> I will install one of these tools and will monitor
16:08:09 <knesenko> any objections? I will handle it
16:08:36 <eedri> knesenko, +1
16:09:46 <knesenko> #action knesenko install one of cacti, munin or graphite
16:10:25 <knesenko> do we need more action items ?
16:11:07 <knesenko> eedri: ?
16:11:10 <knesenko> ewoud: ?
16:11:11 <eedri> knesenko, we do, but we need to revise the way we do the meetings
16:11:17 <eedri> knesenko, like we talked
16:12:10 <eedri> knesenko, we need to think how to make things happen faster
16:12:17 <eedri> knesenko, in terms of open tickets, etc...
16:12:20 <ewoud> it's hard to meet in person, but maybe FOSDEM can be a good place to talk about it?
16:12:47 <ewoud> or cfgmgmtcamp.eu which is 3 & 4 February
16:12:51 <eedri> ewoud, i'd love that, but unfortunately i won't be there due to a test i have on the same day
16:12:55 <eedri> ewoud, i think dcaro will be there
16:13:41 <dcaro> ewoud: yep :)
16:13:54 <ewoud> or http://community.redhat.com/blog/2013/12/announcing-infrastructure-next/ even
16:14:51 <knesenko> ok guys
16:14:56 <knesenko> I think we are done here
16:15:00 <ewoud> agreed
16:15:06 <orc_orc> * nod *
16:15:12 <knesenko> Have a nice holiday !
16:15:29 <knesenko> happy new year !
16:15:30 <knesenko> :)
16:15:51 <knesenko> #endmeeting