15:02:00 #startmeeting
15:02:00 Meeting started Mon Dec 2 15:02:00 2013 UTC. The chair is gchaplik_. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:02:00 Useful Commands: #action #agreed #help #info #idea #link #topic.
15:02:08 #topic oVirt 3.4 SLA & Scheduling features
15:02:27 #info brief description of oVirt 3.4 SLA & Scheduling features
15:03:23 #info agenda
15:03:32 each team member will talk about his feature:
15:03:59 #info time based policy (SLA during the day, power saving at night) (Kobi)
15:04:08 #info application level HA (monitor the service inside the VM) (Jiri)
15:04:15 #info power saving policy moving hosts to sleep (Martin)
15:04:26 #info positive/negative affinity between group of VMs (Gilad)
15:04:33 #info others (Doron)
15:04:53 #topic time based policy (SLA during the day, power saving at night)
15:05:10 let's start, I'd like to invite Kobi to talk about time based policy (SLA during the day, power saving at night)
15:05:57 Hi All
15:06:24 feature:
15:06:33 Time Based Policy BZ=1036729
15:06:45 description:
15:06:56 The Time Based Policy feature allows the user to select a different cluster policy for different configurable periods of time.
15:07:25 in a few words:
15:07:39 In the currently available flow the user can select the policy he wishes to apply to the cluster, for example Power Saving.
15:07:39 The new Time Based Policy will expand the policy capabilities and will allow the user to select multiple policies, each of which will be active at different times.
15:07:39 A good use for this feature would be a Time Based Policy that is configured to use Evenly Distributed on work days and Power Saving on weekends.
15:08:40 for the user interface implementation we have thought about two solutions.
15:08:40 1st:
15:08:40 in the Cluster new/edit popup window, at the Cluster Policy section,
15:08:40 there will be a possibility to select Time Based Policy.
15:08:40 when selected, the Policy's properties will show.
15:08:42 the properties will be as follows:
15:08:44 DefaultPolicy - the default policy to use when no other policy was selected.
15:08:46 FirstPolicyName - a policy from the existing pool of policies (configured under Configure->Cluster policies).
15:08:49 FirstPolicyStartTime - the time that the policy kicks into action; this field accepts cron expressions.
15:08:51 FirstPolicyDuration - the duration that the policy will be active.
15:08:53 SecondPolicyName - same as FirstPolicyName.
15:08:59 SecondPolicyStartTime - same as FirstPolicyStartTime.
15:09:01 SecondPolicyDuration - same as FirstPolicyDuration.
15:09:03 2nd (preferred):
15:09:05 in the Cluster new/edit popup window, at the Cluster Policy section,
15:09:07 there will be a possibility to select Time Based Policy.
15:09:09 when selected, the Policy's properties will show.
15:09:13 the properties will contain a scheduler bar (similar to a mail appointment scheduler bar).
15:09:15 the user will select sections on the bar and assign them to policies from the existing pool of policies (configured under Configure->Cluster policies).
15:10:17 basically that's it. any Qs?
15:10:44 kobi, are there more than 2 policies allowed?
15:11:22 in the second solution there can be more than 2
15:11:37 kobi: but not at the same time, right?
15:11:47 ok... In my opinion, a limit of 2 is unacceptable.
15:11:50 +1 to 2nd option.
15:11:57 much more intuitive.
15:12:02 +1 to 2nd option too.
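To make the first option's properties more concrete, here is a minimal Python sketch (illustrative only, not the engine implementation) of how time-windowed policy slots could be evaluated. The slot layout mirrors the DefaultPolicy / FirstPolicy* / SecondPolicy* properties listed above, and the proposed cron expressions are simplified to weekday/hour start times to keep the example short.

from datetime import datetime, timedelta

# Picks the active cluster policy from a list of time slots, falling back to
# DefaultPolicy. Slot starts are (weekday, hour) pairs instead of full cron
# expressions, purely for brevity.
DEFAULT_POLICY = "none"   # stands in for DefaultPolicy

POLICY_SLOTS = [
    # (policy name, start weekday 0=Mon, start hour, duration)
    ("evenly_distributed", 0, 8, timedelta(hours=10)),   # FirstPolicy*: Monday 08:00, 10h
    ("power_saving",       5, 0, timedelta(days=2)),     # SecondPolicy*: the weekend
]

def slot_start(now, weekday, hour):
    """Most recent occurrence of the given weekday/hour at or before 'now'."""
    days_back = (now.weekday() - weekday) % 7
    start = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    start -= timedelta(days=days_back)
    if start > now:
        start -= timedelta(days=7)
    return start

def active_policy(now=None):
    now = now or datetime.utcnow()
    for name, weekday, hour, duration in POLICY_SLOTS:
        start = slot_start(now, weekday, hour)
        if start <= now < start + duration:
            return name
    return DEFAULT_POLICY

if __name__ == "__main__":
    print(active_policy())   # e.g. "evenly_distributed" on a Monday morning UTC

The scheduler-bar UI from the preferred second option would ultimately produce the same kind of slot list; only the way the user defines the slots differs.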
15:12:16 in the first solution we are limited to a predefined number
15:12:43 hopefully there is an existing widget that allows selection of time intervals.
15:13:02 kobi, not necessarily - there can be solutions to that (e.g. something similar to custom properties, with "+" and "-" buttons); but the 2nd way is simply more intuitive, as Itamar mentioned.
15:13:05 msivak: not at the same time
15:13:27 While unattractive, even start/end time fields with a policy dropdown, plus error validation to prevent overlap, would work
15:14:17 ecohen: you are right, 10x for the idea
15:14:24 question is representation / capabilities. then gui i think.
15:14:28 okay guys, we'll have more time for questions at the end
15:14:34 I'd like to move on...
15:14:55 #topic application level HA (monitor the service inside the VM)
15:15:11 assuming at any time only one policy is possible, doing it time based seems easier (for 24*7). if we need something more sophisticated (first of the month, etc.) it's a bit more tricky.
15:15:12 I'd like to invite Jiri to talk about application level HA (monitor the service inside the VM)
15:15:25 Hi All
15:15:58 the application HA feature should allow users to define a list of applications to be watched inside the VM
15:17:20 and if the application stops working it would do some predefined action: send a notification to the engine, restart the app (tricky), or restart the whole VM
15:17:44 or just show a warning message
15:18:01 my tentative design:
15:18:30 There should be a list of all services somewhere on the Virtual Machine tab; either a new tab "Services" can be added or we can use the "Applications" tab for it
15:19:12 Every service in the list would have a checkbox saying "Watch this service"
15:19:23 if it's marked then the engine would send the name of the service to the guest agent and the agent would send a warning if the service changes its state to "not running"
15:19:56 and there should also be a combo to select the action to take when the service stops working
15:21:18 the guest agent should provide an API to set the list of the watched services and get their states (maybe some tweaks will be needed)
15:22:03 and we need to store the list of watched services and actions in the database
15:22:58 that's basically it
15:23:08 jmoskovc: how does the engine know which services can be monitored by the guest agent? how do you update/validate the guest agent configuration of services to be monitored?
15:23:32 itamar: guest agent would send a list of services which can be monitored
15:23:44 right
15:23:55 not deduced from its version?
15:24:27 itamar: the first version of this feature would just support the system services
15:24:43 jmoskovc: what do you mean by system services?
15:24:46 so the guest agent would just return the list of all the services reported by the OS
15:25:01 ie- standard services
15:25:04 itamar: anything that systemd/windows reports as a service
15:25:05 and their status running/stopped?
15:25:10 yes
15:25:24 Could also be nice to at least have a hook for a user to provide a quick custom script. For example: check HTTP status (returns 200) and return a proper status code saying "Still good", or "httpd is running, but not properly responding"; perhaps for a phase 2.
15:25:50 how do we make sure the design is generic to cover future use cases and not having to code for each? i.e., what's the next request?
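As a rough illustration of the guest-agent side of the design above, here is a minimal sketch (hypothetical, not the actual ovirt-guest-agent code) that checks watched systemd services and collects the ones that are no longer running; the WATCHED_SERVICES list stands in for whatever the engine would send down.

import subprocess

WATCHED_SERVICES = ["httpd", "postgresql"]   # hypothetical watch list sent by the engine

def service_state(name):
    """Return systemd's view of the unit state, e.g. 'active', 'inactive', 'failed'."""
    result = subprocess.run(
        ["systemctl", "is-active", name],
        capture_output=True, text=True,
    )
    return result.stdout.strip() or "unknown"

def check_watched_services():
    """Return a {service: state} dict for every watched service that is not running."""
    report = {}
    for name in WATCHED_SERVICES:
        state = service_state(name)
        if state != "active":
            report[name] = state
    return report

if __name__ == "__main__":
    failed = check_watched_services()
    if failed:
        # In the proposed design this would be reported to the engine as a
        # warning event; here we just print it.
        print("services not running:", failed)

The engine would then apply the configured action (warning, restart the service, or restart the VM) based on such a report.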
15:26:08 we know for ovirt-engine the service is not the interesting part, hence we have a check health servlet
15:26:10 yes, the proper detection of "not working" is tricky
15:26:18 itamar: as long as we use standard system services we should be ok
15:26:21 is it stuck, is it busy?
15:26:56 so for now, it's only running or not running as reported by the system service management (systemd, windows, etc..)
15:27:38 question, if i have several VMs running the same application... how can i aggregate?
15:27:50 btw, for specific cases we may be able to use a watchdog device
15:28:00 srevivo: it's per vm.
15:28:21 no configuration needed from engine to guest agent at later phases?
15:28:29 which port to monitor for a service, etc?
15:28:33 i see, but what is the use of having it per VM while customers want to monitor the application ...
15:28:48 thoughts on integrating with a monitoring system or its agents?
15:29:00 exactly ...
15:29:10 srevivo: how do you identify the app across the cluster?
15:29:15 or thoughts on whether this should be a guest service or an external service monitoring health as visible externally to the vm?
15:29:46 what is the use case we are trying to cover?
15:30:06 sorry for joining a bit late (in case someone already explained)
15:30:10 srevivo: the original request was
15:30:32 for basic service monitoring
15:30:40 as described here.
15:30:44 anything more advanced
15:30:58 should be considered with other platforms / services
15:31:13 such as nagios / heat + ceilometer, etc
15:31:36 and to be honest, 3.4 does not have the time for such a scope.
15:31:47 Exactly, i don't see any data center running without such tools
15:31:51 yet we should be able to come up with something useful.
15:32:10 this is why i am asking about the value of developing it
15:32:35 doron: I agree, although we could teach the guest agent to collect nagios/rrdtool/.. data and display them in the admin console
15:32:47 srevivo: so I do think it has value for basic common cases
15:33:15 and for distributed apps we should think of an advanced feature. or-
15:33:30 as it is becoming today, highly-available apps,
15:33:42 which means the apps should be highly-availability aware
15:34:24 anyway I suggest we discuss it on the ML to allow other features to be presented.
15:34:27 most of the apps today are such, and usually the interesting part is the app SLA and not a specific VM status
15:34:53 agree :-)
15:35:06 srevivo: to a reasonable extent. let's discuss it at length on the ML.
15:35:22 gr8, let's continue, I'd like to invite Martin to talk about 'power saving policy moving hosts to sleep'
15:35:27 #topic power saving policy moving hosts to sleep
15:35:39 Martin go ahead
15:35:46 hi everybody
15:35:48 gchaplik_: sleep or shutdown?
15:35:48 Support for moving hosts to sleep when using Power Saving policy
15:36:02 itamar: ^^^^^
15:36:08 itamar: well, probably shutdown for now
15:36:18 sleep is tricky..
15:36:36 http://www.ovirt.org/Features/HostPowerManagementPolicy
15:36:36 https://bugzilla.redhat.com/show_bug.cgi?id=1035238
15:36:38 so power management needs to be configured to wake them up? wake-on-lan?
15:36:39 although boot will be quicker
15:36:48 Goals:
15:36:48 - shutdown a host when Power Saving is selected and the engine clears the host (migrates all VMs elsewhere)
15:36:48 - wake up a host when the available resources get below a configured level
15:37:00 Design:
15:37:00 - shutdown methods: standard engine fencing methods - IPMI, SSH, Drac
15:37:00 - wake methods: standard methods - IPMI, SSH, ...
- with fallback to WOL when needed and supported (see the WOL sketch below)
15:37:00 - use Start/StopVdsCommand internally
15:37:00 - bogomips used as the CPU power unit (/proc/cpuinfo)
15:37:15 Shutdown rules:
15:37:16 - host is empty
15:37:16 - there is enough available CPU power in the cluster without this host
15:37:16 - make sure the SPM is not killed
15:37:16 - consider keeping an extra host
15:37:27 Wake-up rules:
15:37:27 - not enough free resources in the cluster
15:37:27 - probably a separate thread or a VdsUpdateRuntimeInfo-based periodic check
15:37:33 Support needed from VDSM:
15:37:33 - CPU power information - /proc/cpuinfo's bogomips per CPU
15:37:33 - [WOL] each host has to report MAC and WOL capability for each NIC
15:37:33 - [WOL] locality information to know who can send the proper WOL packet (each host will report its ARP database of visible MAC addresses for each NIC)
15:37:33 UI:
15:37:35 - Cluster policy needs to allow the user to set the minimum amount of free resources (hosts / CPU / RAM) and whether host shutdown is allowed
15:37:55 so to answer the questions
15:38:08 I would start with shutdown first
15:38:23 and use our existing fencing mechanisms to control the hosts
15:39:02 we can implement WOL as an additional method, but that requires some additional support from vdsm
15:39:51 why a separate thread and not part of normal scheduling (every minute iirc)?
15:40:13 that is an option as well
15:40:24 I forgot about the balancing thread
15:40:33 why WOL for each nic and not just the mgmt nic?
15:40:42 msivak: right, the idea is to use it in load balancing.
15:40:59 itamar: to have a better chance of getting a path
15:41:21 why a list of arp, etc. - any host in the same cluster should be able to WOL other hosts - they are on the same layer 2 (more than likely)?
15:43:10 itamar: if this is an assumption we can take, it will make our life easier.
15:43:14 itamar: I do not really have an answer for this, my setup has a host that is outside L2 :)
15:43:42 unless we want to support SDNs ;)
15:44:21 doron: well, think about our env in Brno, we have multiple server rooms
15:44:46 doron: and some of the machines are inside a routed vlan
15:44:56 hosts in the same cluster are expected to be on the same layer 2. otherwise live migration of VMs won't work either.
15:45:09 we can most likely assume that for the mgmt interface as well.
15:45:26 +1 for phase 1.
15:45:39 msivak: implied earlier, but wanted to make sure: only hosts that have fence agents defined would be candidates for sleep, correct (at least while we intend to use existing fencing mechanisms)?
15:45:43 also, WOL is a nice addition. it's basically a new fence agent - [ssh|vdsm]-shutdown / WOL
15:46:07 ecohen: yes, there is no other way of shutting them down atm
15:46:16 so i assume it's worth it to first just start with the existing fence devices (require pm configuration), and only later enhance for soft fencing/wake up.
15:46:31 agreed
15:46:46 #agreed assume all hosts in same L2
15:47:21 #agreed use existing fence devices
15:47:30 moving on
15:47:40 I will talk about positive/negative affinity between group of VMs
15:47:45 #topic positive/negative affinity between group of VMs
15:47:51 VM Affinity: definition
15:47:59 Policy/set of rules that make VMs run together (positive) or separated (negative).
15:47:59 This policy can be mandatory or optional (best effort).
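For reference, the WOL fallback mentioned in the design above relies on the standard wake-on-LAN magic packet (6 x 0xFF followed by the target MAC repeated 16 times, sent over UDP broadcast). A minimal sketch, illustrative only and not the VDSM implementation:

import socket

def send_wol(mac, broadcast="255.255.255.255", port=9):
    """Broadcast a wake-on-LAN magic packet for the given MAC address."""
    mac_bytes = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(mac_bytes) != 6:
        raise ValueError("invalid MAC address: %s" % mac)
    packet = b"\xff" * 6 + mac_bytes * 16
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast, port))

if __name__ == "__main__":
    send_wol("00:11:22:33:44:55")   # hypothetical MAC reported by a host's NIC

Under the #agreed same-L2 assumption, any host in the cluster can broadcast this packet on behalf of a peer's management NIC, which is why the per-NIC ARP locality reporting can be deferred.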
15:48:31 positive VM affinity helps reduce traffic across networks, i.e. if two virtual machines communicate frequently and should share a host, we can create a positive affinity rule to make them run on the same host.
15:48:31 Conversely, loaded VMs can have a negative (anti-)affinity rule that will keep the VMs from sharing a host and exhausting its resources.
15:48:49 currently we decided that the scope of the feature for 3.4 will be VM-VM affinity.
15:49:01 high-level design: we'll add an affinity object, with the following attributes:
15:49:01 - positive/negative affinity
15:49:01 - mandatory/optional
15:49:01 - set of hosts
15:49:15 each VM can be attached to an Affinity object and will act according to its rules when scheduling the VM (running/migrating).
15:49:16 by using the filtering and scoring mechanisms introduced in the Scheduler, it is now possible to add an affinity filter and weight function (a sketch follows below):
15:49:37 Affinity Filter for mandatory affinity - using a hard constraint to filter out hosts that don't meet the mandatory affinity rules.
15:49:45 Affinity Weight function for optional affinity - target hosts that match the optional condition will get a lower weight (higher score) than hosts that do not match the condition, which will ultimately position them higher in the priority list.
15:50:07 my open questions:
15:50:09 - a VM can be attached to several affinity objects, and there may be conflicts which cause the vm to end up with no hosts. we need to decide how to handle that.
15:50:10 - can HA VMs violate affinity rules?
15:50:10 - how load balancing affects affinity, the load balancing isn't aware of it...
15:50:10 - UI: should we show the affinity object? we can just attach VM to VM (with params) and hide the object from the user.
15:50:42 is this an admin level feature or power user as well? are affinity groups managed entities with permissions?
15:51:04 gchaplik_: can you have more affinity rules per vm?
15:51:13 itamar: let's start with admin
15:51:59 itamar, about permissions it depends on whether we want to hide it
15:52:21 msivak: yes
15:52:38 gchaplik_: I would hide the objects, but show a list of relationships on a VM subtab
15:53:15 gchaplik_, can you explain a bit the relation to the 'set of hosts'? if I want to define that VM1 must always run with VM2 - what is the 'set of hosts' involvement in it?
15:53:20 msivak: this can end up being noisy
15:53:50 doron: do we have any VM groups/tags concept?
15:54:04 msivak: we have an rfe for it
15:54:35 doron: we could group the VMs and base the affinity on group membership
15:54:48 ecohen: Some apps may have strict (and ridiculous) licensing requirements (like certain database companies). Ability to pin VMA and VMB to licensed Host1 is a valid use case
15:54:58 doron: that would make it less noisy and in alignment with our policy plan "documents"
15:55:20 msivak: basically the new entity gives you this functionality
15:55:34 sherold, sure - it just doesn't sound related to the VM Affinity feature - this is why we have the "run only on host" feature...
15:55:41 without creating new hierarchies in the system
15:56:10 ecohen: I agree basic vm affinity should be kept simple
15:56:21 ie- hosts are another layer we may add later
15:56:34 but basic functionality is vm-vm relations.
15:56:38 doron, but the use-case is what sherold mentioned above?
15:57:18 ecohen: yep. once vm a is pinned to host1
15:57:23 gchaplik_: so the idea is to create a group with defined affinity and a list of VMs that belong to it?
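A minimal sketch of the hard affinity filter and the optional-affinity weight described above, using hypothetical structures (Host, AffinityGroup) rather than the engine's real scheduling API; lower weight means a more preferred host, matching the description.

from dataclasses import dataclass, field

@dataclass
class Host:
    name: str
    vm_ids: set = field(default_factory=set)   # VMs currently running on the host

@dataclass
class AffinityGroup:
    vm_ids: set        # VMs bound by this rule
    positive: bool     # True = run together, False = keep apart
    mandatory: bool    # True = hard rule (filter), False = best effort (weight)

def filter_hosts(hosts, vm_id, groups):
    """Hard constraint: drop hosts that would violate a mandatory rule for vm_id."""
    kept = []
    for host in hosts:
        violated = False
        for g in groups:
            if not g.mandatory or vm_id not in g.vm_ids:
                continue
            peers = g.vm_ids - {vm_id}
            peers_here = peers & host.vm_ids
            peers_running = any(peers & h.vm_ids for h in hosts)
            if g.positive and peers_running and not peers_here:
                violated = True   # must co-locate with peers already running elsewhere
            if not g.positive and peers_here:
                violated = True   # anti-affinity peer already on this host
        if not violated:
            kept.append(host)
    return kept

def weight_host(host, vm_id, groups):
    """Soft constraint: lower weight = more preferred host."""
    weight = 0
    for g in groups:
        if g.mandatory or vm_id not in g.vm_ids:
            continue
        peers_here = bool((g.vm_ids - {vm_id}) & host.vm_ids)
        if g.positive != peers_here:
            weight += 1           # penalise hosts that break an optional rule
    return weight

# Example: vm3 must run with vm1 (mandatory positive) and away from vm2 (optional negative).
hosts = [Host("host1", {"vm1"}), Host("host2", {"vm2"})]
groups = [AffinityGroup({"vm1", "vm3"}, positive=True, mandatory=True),
          AffinityGroup({"vm2", "vm3"}, positive=False, mandatory=False)]
candidates = filter_hosts(hosts, "vm3", groups)                       # -> [host1]
ranked = sorted(candidates, key=lambda h: weight_host(h, "vm3", groups))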
15:57:27 other vms will follow that
15:57:29 ecohen: Perfect, scenario/requirement satisfied ;-).
15:57:38 doron, sherold: thanks.
15:58:22 msivak: sort of
15:58:45 any other questions..?
15:59:37 #topic additional RFEs
15:59:42 Doron
15:59:53 ok, some additional RFEs we're now designing
15:59:59 doron: what if the pinned vm is not running?
16:00:13 so this is just a brief explanation of them
16:00:21 msivak: let's keep it for the end.
16:00:24 sure
16:00:31 High Availability flag should be included when exporting/importing from Export Domain
16:00:41 this is basically a nit we want to cover
16:01:05 a missing flag we need to update in the OVF file.
16:01:33 nothing dramatic about it.
16:01:41 so I'll proceed to the next one;
16:01:44 Even Distribution Policy by number of VMs
16:02:05 this is a new additional policy we'll add
16:02:28 which basically does distribution by counting VMs instead of considering CPU load,
16:02:54 as the standard even distribution does.
16:03:21 Quite straightforward. Any questions on it?
16:03:54 I think the policy name should clarify the distinction from standard even distribution
16:03:57 I'll take that as a 'no' ;)
16:04:12 mrao: good point
16:04:58 #agreed Even Distribution Policy by number of VMs policy name should clarify the distinction from standard even distribution
16:05:10 I'll proceed to the next one
16:05:14 Make reservations for HA VMs to make sure there's enough capacity to start them if N hosts fail
16:05:26 this is not a trivial one.
16:05:57 I'm looking into implementing it using the new oVirt scheduler, so will provide more details when we have a clear picture of it.
16:06:39 in general it means we should sustain a loss of several hosts and still retain capacity for HA VMs.
16:06:58 Questions?
16:07:48 I'll take that as a 'no' as well.
16:08:06 Next one is Memory capping using cgroups
16:08:17 this is basically closing a gap we have
16:08:32 on forcing a limit on memory consumption of a VM.
16:08:43 currently we can balloon it with some limitations
16:08:58 but we're missing a hard limit, which we'll have in 3.4 using cgroups.
16:09:18 Questions?
16:09:20 All I ask is that we're careful and don't artificially cause the host to massively swap when it's not necessary (aka VMware)
16:09:47 #agreed to be careful when using this limitation.
16:10:06 Other questions / comments?
16:10:08 well, this is a hard cap; it won't affect swapping
16:10:24 ballooning and overcommitment might though :)
16:10:30 Exactly
16:10:32 msivak: it can actually affect swapping
16:10:57 if the guest has insufficient ram, it will swap
16:11:10 Background on what I want to avoid - http://www.vmguru.com/articles/hypervisor/7-memory-behavior-when-vm-limits-are-set-revisited
16:11:16 doron: sure, but sherold talked about hosts
16:11:37 msivak: I suspect it will affect the host.
16:11:55 very well, let's proceed.
16:12:05 rhev-h support for hosted engine nodes
16:12:15 ovirt-node you mean ;)
16:12:17 this one may end up as a setup issue for the integration group
16:12:24 itamar: right ;)
16:12:55 basically now hosted engine supports fedora / rhel only.
16:13:00 (centos, etc)
16:13:16 and we should provide the support from ovirt node.
16:13:29 Questions?
16:13:48 plans for TUI or just packaging?
16:14:03 s/packaging/inclusion/
16:14:14 (and persistence of files)
16:14:19 itamar: depends on capacity. will check with fabiand and others
16:14:48 I assume it will be included, and deploy will run the relevant bits.
16:15:07 with the ovirt node requirements such as persistence.
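Returning to the 'Memory capping using cgroups' RFE above: a minimal sketch of the underlying mechanism, assuming a cgroup v1 memory controller. The cgroup path is hypothetical; in oVirt the per-VM cgroup would be created and managed through libvirt/VDSM rather than written directly like this.

CGROUP_MEMORY_ROOT = "/sys/fs/cgroup/memory"   # assumes cgroup v1 memory controller

def set_hard_memory_limit(cgroup_name, limit_bytes):
    """Write the hard limit; the kernel reclaims or denies allocations above it."""
    path = "%s/%s/memory.limit_in_bytes" % (CGROUP_MEMORY_ROOT, cgroup_name)
    with open(path, "w") as f:
        f.write(str(limit_bytes))

if __name__ == "__main__":
    # Cap a hypothetical VM cgroup at 2 GiB.
    set_hard_memory_limit("machine/vm-example", 2 * 1024 ** 3)

Whether the resulting memory pressure shows up as guest swapping or host-level reclaim was exactly the concern debated above.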
16:15:25 other questions?
16:15:53 Very well, almost there...
16:16:02 hosted engine on san
16:16:15 Currently hosted engine supports NFS only and we need to extend it
16:16:37 since we are using vdsm code, I expect the architecture to be fine, with
16:16:42 nits we'll need to resolve.
16:17:12 (hopefully ;)
16:17:24 Questions?
16:17:41 One more hosted engine item -
16:17:43 Unify maintenance path of hosted engine with host maintenance
16:18:07 basically hosted engine has a special maintenance mode which disarms the HA services.
16:18:29 This feature should unify the standard 'maintenance host' flow with
16:18:42 hosted engine maintenance.
16:19:10 The idea is to extend the VDSM API, so the engine will ask vdsm to move the HA services to maintenance
16:19:34 and it should all be part of the existing host maintenance flow.
16:19:48 That's it for this feature. Any questions?
16:20:33 very well. This finalizes the SLA & Scheduling features. Back to Gilad.
16:20:42 Thanks doron,
16:20:55 Last chance for questions?
16:20:59 on vm affinity - we know vm groups are a requirement for other things (like permissions), maybe separate them from affinity (i.e., add support for (vm) groups, then allow affinity to them)
16:21:42 itamar: vm tagging?
16:22:13 yeah, that is what I had in mind as well
16:22:15 Tagging seems simple as there is a mechanism that exists for it today
16:22:36 The thing is that negative affinity is an 'anti-group'
16:22:46 sherold: only if we make tags first class citizens, and add permissions around tags.
16:22:52 we may get complicated for something we did not plan for
16:22:59 also
16:23:00 also, a bit confusing is that a tag can contain both vms and hosts (and templates, pools, users)
16:23:03 ?
16:23:06 Hmmm
16:23:17 so we need to decide if tags are our way to go in the future before basing more stuff on them.
16:23:18 we may want to mix affinity of vm and network for example
16:23:27 or vm and host
16:23:33 Longer term, yes... Affinity will be more than VM:VM
16:23:59 Separate storage devices/paths, etc
16:24:01 there are no permissions around tags, they may be used for something else.
16:24:05 also, they are hierarchical.
16:24:19 I'm really not sure they are the best fit for this.
16:24:31 doron: negative affinity on network is possible, positive.. is there a use case for it?
16:24:34 too many inconsistencies/complexities
16:24:48 networks are not yet supported by tags. and networks are on all hosts in the cluster...
16:24:55 msivak: let's start with negative vm:vm
16:25:18 doron: vm:vm is technically easy, the question is how to present it to the user
16:25:41 msivak: if we use affinity
16:25:46 as a hidden object
16:25:49 just like MLA
16:26:06 we can work with it while presenting the relevant relations where needed
16:26:47 msivak: so we can represent relations between VMs
16:26:50 doron: i think the more correct approach is to base it on vm groups. an entity we can manage permissions for
16:26:53 when looking at any given VM
16:26:58 since affinity is more than 1:1, could be 1:n, etc.
16:27:18 itamar: right, also
16:27:22 n:n..
16:27:46 ie- a vm may reference several affinity instances
16:27:54 which in turn reference other VMs.
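Since a VM may reference several affinity instances (the 1:n / n:n relations discussed above), the conflict question raised earlier can be sketched as a simple pairwise check over the mandatory groups a VM belongs to. The structures are hypothetical and follow the earlier affinity sketch.

from collections import namedtuple
from itertools import combinations

# Same hypothetical shape as in the earlier affinity sketch.
AffinityGroup = namedtuple("AffinityGroup", "vm_ids positive mandatory")

def find_conflicts(vm_id, groups):
    """Return pairs of mandatory groups that pull the same VM pair in opposite directions."""
    mandatory = [g for g in groups if g.mandatory and vm_id in g.vm_ids]
    conflicts = []
    for a, b in combinations(mandatory, 2):
        shared_peers = (a.vm_ids & b.vm_ids) - {vm_id}
        if shared_peers and a.positive != b.positive:
            conflicts.append((a, b, shared_peers))
    return conflicts

# Example: one rule says "vm1 with vm2", another says "vm1 away from vm2" -> conflict,
# and vm1 would end up with no valid host unless the engine resolves or rejects it.
groups = [AffinityGroup({"vm1", "vm2"}, True, True),
          AffinityGroup({"vm1", "vm2"}, False, True)]
print(find_conflicts("vm1", groups))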
16:28:02 or even networks
16:28:29 the scheduler API is going to get a workout on this one
16:28:39 not necessarily
16:29:03 so I'd try to keep it simple if we do not want to solve equations here
16:30:20 doron: if a vm is part of a single group that references a list of other groups with defined affinity, we just create a scoring func that sums the numbers up based on a list of running vms on each host
16:30:45 but the support in all the structures will touch a lot of apis..
16:30:48 msivak: scoring is optimization. we may need a filter
16:31:10 anyway,
16:31:12 msivak: while filtering out hosts that don't meet an affinity/anti rule
16:31:29 which should be done first I'd think
16:31:34 yeap
16:31:56 we'll work on a design which we can later discuss
16:32:18 I was wondering about our cpu load representation
16:32:23 I mentioned that in my part
16:32:33 msivak: ?
16:32:55 currently we do not distinguish between an old and lowly Pentium I and a strong supercomputer
16:33:09 we just read cpu% and use that for scheduling
16:33:18 is that right?
16:33:25 msivak: yeah, this is a known issue many try to resolve
16:33:50 msivak: basically every platform tries to resolve it differently
16:33:56 doron: all our hosts are linux based, /proc/cpuinfo gives you a bogomips value
16:34:01 so VMware has one solution, ec2 has another and
16:34:13 microsoft has a third.
16:34:38 bogomips is nice but more an indication than something people would rely on
16:34:46 that's where bogo comes from.
16:34:59 right, the kernel computes it during startup
16:35:02 (that's there)
16:35:20 but even if we hide it from the user, we could use it for scheduling (and host power management)
16:35:20 so I have a way to handle it but we'll need to discuss it separately.
16:35:47 cpu% * bogomips = used/free power
16:36:22 msivak: this is an indication and not really accurate. What happens with numa?
16:36:31 you cannot really use it.
16:36:57 but numa is memory, isn't it
16:37:11 msivak: a numa cell is ram+cpu
16:37:28 right, but currently we have separate cpu and memory filters
16:37:44 which we'll need to adapt to numa
16:37:47 that we will have to change probably, but..
16:37:55 but as long as it's simple we can handle it.
16:38:34 msivak: let's discuss the case in a different forum. ok?
16:38:38 sure
16:38:43 thanks guys, I will send a meeting summary later on
16:38:50 safe drive
16:39:00 #endmeeting
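As a follow-up to the bogomips discussion above, a minimal sketch of the 'cpu% * bogomips = used/free power' idea: it sums bogomips per CPU from /proc/cpuinfo (Linux hosts, as noted in the meeting) and combines it with a host's reported CPU usage. The cpu_usage_percent input is assumed to come from existing host monitoring; this is illustrative only, not VDSM or engine code.

def total_bogomips(cpuinfo_path="/proc/cpuinfo"):
    """Sum the per-CPU bogomips values reported by the kernel."""
    total = 0.0
    with open(cpuinfo_path) as f:
        for line in f:
            if line.lower().startswith("bogomips"):
                total += float(line.split(":", 1)[1])
    return total

def free_cpu_power(cpu_usage_percent, cpuinfo_path="/proc/cpuinfo"):
    """Remaining CPU 'power' of a host, in bogomips units."""
    capacity = total_bogomips(cpuinfo_path)
    return capacity * (100.0 - cpu_usage_percent) / 100.0

if __name__ == "__main__":
    # e.g. a host currently at 40% CPU usage
    print("free power (bogomips):", free_cpu_power(40.0))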