VMworld 2014/Other Notes

8/24/14 (VMworld in SFO)
- easy walk from the hotel
- registered using the QR code in the email

software-defined data center (SDDC)
- umbrella term for the underlying technologies: vCloud management, VSAN, NSX
- reduces the overhead of datacenter IT
  - one customer reports its IT department went from 500 people to 39 - runs the business on 6 racks
- vCloud - private cloud
  - "this is seismic - maybe you felt it early this morning"
- vRealize - packages of management components: enterprise, SMB, SaaS (software as a service) - vRealize Air Automation
- new competencies this year: management automation, software-defined storage, networking virtualization, PaaS (question)
- OpenStack APIs provide access to SDDC infrastructure - VMware contributes to this open source project
  - www.openstack.org - "Open source software for building private and public clouds."

hybrid cloud strategy
- vCloud Air is a hybrid strategy: it replaces on-premises services, as needed
- same management, networking, & security
- today 6% of workload is in the cloud
- services
  - DevOps
  - DB as a service: Microsoft SQL Server & MySQL
  - object storage: beta using EMC in Sept - GA about EOY
  - mobility services
  - cloud management: vRealize Air Automation (formerly vCloud Automation Center)

what's new in the vCloud suite
- datacenter virtualization & standardization
  - vCenter Support Assistant: automatic, regular data collection with pattern recognition
  - security controls native to the infrastructure
  - vSphere replication improvements
- HA & resilient infrastructure: vCenter Site Recovery Manager (SRM)
  - disaster recovery, disaster avoidance, planned migration
  - can test a proposed migration/upgrade
  - new vCO plugin; the APIs are also accessible via PowerCLI
  - supports more VMs, faster, using batch processing
  - integrated with the web UI
  - works within a local NSX environment, not across the entire NSX environment
  - does not support vCloud Air, but the cloud will have this type of functionality someday
- app & infrastructure delivery automation
  - vCAC: new interfaces with more flexible workflows
  - NSX - control from vSphere
  - Puppet integration
  - localization - 10+ languages

Storage DRS deep dive
- this stuff is in the vSphere 6.0 beta
- problem: a shared datastore with 2 different workloads
  - you add a backup, but it uses a lot of your bandwidth; you want it to use just enough to finish on time
- storage performance controls
  - shares: each VM is assigned a shares value indicating the relative IOPS load it should get
  - limit: max IOPS allowed per VM
  - reservation: min IOPS per VM
- the ESX 5.5 IO scheduler (mClock) implements scheduling using the above controls
  - breaks large IOs into 32KB chunks for accounting purposes, so the IOPS controls also control bandwidth
- Storage IO Control works across hosts that share a store
  - congestion detection is based on a latency threshold; crossing it causes hosts to be throttled
  - the threshold is a setting
- SDRS overview
  - balances load by moving vdisks between stores in the storage cluster
  - allows vdisks to have affinity for each other, so if one wants to move, the others will too
- SDRS deployment
  - you have to understand how this works when using complex storage features: thin provisioning, dedup, auto-tiering
  - SDRS monitors replication
- Storage IO Control best practices
  - don't mix vSphere LUNs and non-vSphere LUNs
  - set host IO queue size to the highest allowed
  - set the congestion threshold conservatively high
- datastore cluster best practices
  - similar datastore performance, similar capacities
  - datastore & host connectivity: allow max possible connectivity
- vSphere storage policy based management now works with different profiles
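The shares/limit/reservation model above can be sketched as a tiny allocator. This is my own illustration, not the actual mClock algorithm (which tags and schedules individual requests); `allocate_iops` and its water-filling loop are invented for the sketch.

```python
def allocate_iops(capacity, vms):
    """Divide datastore IOPS among VMs by reservation/shares/limit.

    vms: dict name -> {'shares', 'reservation', 'limit'}.
    Returns dict name -> allocated IOPS.
    """
    # Step 1: every VM gets at least its reservation (if reservations
    # exceed capacity, the store is simply oversubscribed).
    alloc = {n: float(v['reservation']) for n, v in vms.items()}
    remaining = capacity - sum(alloc.values())
    active = {n for n, v in vms.items() if alloc[n] < v['limit']}
    # Step 2: water-fill the remainder in proportion to shares,
    # re-distributing whatever spills over a VM's limit.
    while remaining > 1e-9 and active:
        total_shares = sum(vms[n]['shares'] for n in active)
        spilled = set()
        for n in list(active):
            grant = remaining * vms[n]['shares'] / total_shares
            room = vms[n]['limit'] - alloc[n]
            alloc[n] += min(grant, room)
            if grant >= room:
                spilled.add(n)
        remaining = capacity - sum(alloc.values())
        active -= spilled
        if not spilled:          # nothing hit a limit: remainder fully placed
            break
    return alloc

vms = {
    'prod':   {'shares': 2000, 'reservation': 100, 'limit': 800},
    'backup': {'shares': 1000, 'reservation': 0,   'limit': 300},
}
alloc = allocate_iops(1000, vms)
# prod ends up with ~700 IOPS; backup is capped at its 300-IOPS limit
```

This also shows why the 32KB accounting chunking matters: once large IOs are normalized, the same three knobs bound bandwidth, not just request counts.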

How VMware Virtual Volumes (VVols) will provide shared storage with x-ray vision
- challenges in external storage architectures: the hypervisor can help
  - it knows the needs of the apps in real time
  - it has a global view of the infrastructure
- SDS & VVols
  - policy-driven control plane
  - virtual data plane: virtual data services, virtual datastores
- the VASA provider is a new player - an agent for the array; ESX manages the array via VASA APIs
- arrays are logically partitioned into Storage Containers
- VM disks, called virtual volumes, are created natively on the Storage Containers
- IO from ESX to the array goes through an access point called a Protocol Endpoint (PE), so the data path is essentially unchanged
- advertised data services are offloaded to the array
- managed through policies - no need to do LUN management
- HP 3PAR and VMware
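A minimal sketch of the objects involved, with invented class and method names (this is not the VASA API): an array partitioned into Storage Containers, virtual volumes created natively in a container, and IO funneled through a Protocol Endpoint.

```python
class StorageContainer:
    """Logical partition of an array; virtual volumes live here natively."""
    def __init__(self, name, capacity_gb):
        self.name, self.capacity_gb = name, capacity_gb
        self.vvols = {}

    def create_vvol(self, vm, disk, size_gb, policy):
        used = sum(v['size_gb'] for v in self.vvols.values())
        if used + size_gb > self.capacity_gb:
            raise RuntimeError('container out of space')
        key = f'{vm}/{disk}'
        # the policy travels with the volume - no LUN carving involved
        self.vvols[key] = {'size_gb': size_gb, 'policy': policy}
        return key

class ProtocolEndpoint:
    """Single access point: the data path stays one SCSI target while
    each IO is demultiplexed to the right virtual volume."""
    def __init__(self, container):
        self.container = container

    def submit_io(self, vvol_key, op, offset, length):
        if vvol_key not in self.container.vvols:
            raise KeyError('unbound virtual volume')
        return {'vvol': vvol_key, 'op': op, 'offset': offset, 'length': length}

gold = StorageContainer('gold', capacity_gb=100)
pe = ProtocolEndpoint(gold)
key = gold.create_vvol('vm1', 'disk0', size_gb=20,
                       policy={'failures_to_tolerate': 1})
io = pe.submit_io(key, 'read', offset=0, length=4096)
```

The point of the PE split is visible in the sketch: management granularity is per-VM-disk, but the IO path still addresses one endpoint.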

Understanding virtualized memory management performance
- concerns
  - VM's configured memory size: too small -> low performance; too large -> high overhead
  - # VMs per host: too many -> low performance; too few -> wastes host memory
  - memory reclamation method: the proper one -> minimal performance impact
- layered memory management (app, VM, host)
  - each layer assumes it owns all configured memory
  - each layer improves memory utilization by using free memory for optimizations
  - cross-layer knowledge is limited
- memory undercommit: sum of all VM memory sizes <= host memory; no reclamation
- memory overcommit: sum > host memory; ESX may map only a subset of each VM's memory (reclaims the rest)
- memory entitlement & reclamation
  - compute a memory entitlement for each VM & reclaim if entitlement < consumed
  - based on reservation, limit, shares, memory demand
  - ESX classifies memory as active & idle: sample each page each minute & see which were used
  - entitlement parameters: configured memory size (what the guest sees), reservation (min), limit (max), shares (relative priority for the VM), idle memory
- reclamation techniques
  - transparent page sharing - removes duplicate 4K pages in the background using a content hash
  - ballooning - pushes memory pressure from ESX into the VM - used when host free memory > 4% of ESX memory
    - allocates pinned memory from the guest; now that we know the guest can't use that memory, it is reclaimed and given to another VM
    - possible side effect: causes paging in the guest
  - swapping & compression - if ballooning runs out of memory
    - randomly chooses a page to compress/swap - swaps if compression savings < 50%
- best practices
  - performance goals: handle burst memory pressure well; constant memory pressure should be handled by DRS/vMotion, etc.
  - monitoring tools
    - vCenter performance charts, esxtop, memstats: host level, use when isolating a problem
    - vCenter Operations (vCOps): monitor the cluster/datacenter, determine if you have a problem
  - guard against active memory reclamation
    - VM memory size > highest demand during peak loads
    - if necessary, set the reservation above guest demand, using stats from the vCOps Manager GUI
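The transparent-page-sharing idea - hash page contents, back identical pages with a single copy - can be sketched as follows. This is a toy model; real TPS verifies candidate pages byte-for-byte before sharing and breaks the share with copy-on-write when a guest writes.

```python
import hashlib

def share_pages(pages):
    """Collapse duplicate fixed-size pages to a single backing copy.

    Returns (mapping, store): mapping[i] is the index of the machine
    page backing guest page i; store holds the unique machine pages.
    """
    backing = {}   # content hash -> machine page index
    mapping = []   # guest page   -> machine page index
    store = []     # unique machine pages actually kept in RAM
    for page in pages:
        h = hashlib.sha1(page).hexdigest()
        if h not in backing:
            backing[h] = len(store)
            store.append(page)
        mapping.append(backing[h])
    return mapping, store

zero = bytes(4096)                    # the classic win: zero-filled pages
boot = b'kernel'.ljust(4096, b'\0')
mapping, store = share_pages([zero, boot, zero, zero])
# four guest pages end up backed by only two machine pages
```

The example also hints at why homogeneous VMs share well: identical guest OS pages hash identically across VMs.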
- memory savings from page sharing
  - good for homogeneous VMs; both intra- & inter-VM sharing
  - what prevents sharing
    - the guest has ASLR (address space layout randomization)
    - the guest has superfetch (proactive caching)
    - the host uses large pages, because ESXi does not share large pages
  - why large pages: fewer TLB misses, faster page table lookups
  - impact on memory overcommitment: sharing is broken when any small page is ballooned or swapped
- best practices
  - don't disable page sharing
  - don't disable host large pages, except with VDI
  - install VMware Tools & enable ballooning
  - provide sufficient swap space in the guest
  - place the guest swap file/partition on a separate vdisk
  - don't disable memory compression
  - host cache is nice to have - maybe 20% of the SSD - more is potentially wasteful
- optimizations of host swapping: sharing before swap, compressing before swap, swap to host cache SSD
- memory overcommitment guidance
  - configured ratio: sum of all VM memory sizes / host memory size - keep > 1
  - active ratio: sum of active VM memory / host memory size - keep < 1
  - use vCenter Operations to track avg & max memory demand
- monitor performance counters
  - mem.consumed does not mean anything
  - reclamation counters (mem.balloon, swapUsed, compressed, shared) - non-0 values do not mean there is a problem; it just means these mechanisms have done their job at some point in the past
  - mem.swapInRate - constantly non-0 means a problem
  - mem.latency - estimates the perf impact due to compression/swapping
  - mem.active - if low, reclaimed memory is not a problem
  - virtDisk.readRate/writeRate - large values mean more swapping is happening
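The two overcommitment ratios are simple arithmetic; a helper like this (my own, not a VMware tool) makes the guidance concrete:

```python
def overcommit_ratios(vm_configured_mb, vm_active_mb, host_mb):
    """Configured and active memory overcommit ratios for one host.

    Guidance from the session: configured ratio > 1 means you are
    actually overcommitting (otherwise host memory sits stranded),
    while the active ratio should stay < 1 so reclamation never has
    to touch memory the guests are really using.
    """
    configured = sum(vm_configured_mb) / host_mb
    active = sum(vm_active_mb) / host_mb
    return configured, active

# example host: 64 GB RAM, five 16 GB VMs that only keep ~6 GB hot
cfg, act = overcommit_ratios([16384] * 5, [6144] * 5, 65536)
# cfg = 1.25 (overcommitted), act ≈ 0.47 (safe)
```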

IO Filtering
- allows filters to process a VM's IO to its VMDKs: inside ESX, outside the VM; enables 3rd-party data services
- VAIO: filters run in userspace
  - allows out-of-band releases - isolates filters from the kernel
  - extremely performant - ~1us latency for the filter framework; the ESX kernel was modified to make a usermode driver like this this fast
- general-purpose API - a raw IO stream
  - the v1 SDK is limited to 2 use cases (for test considerations): cache and replication
  - only on vSCSI devices; vSCSI turns T10 commands into ioctls - find out more about this (?)
- services
  - high-performance event queue access
  - tight integration with vSphere
  - full access to guest IOs - synchronous access
  - automated deployment
  - flexible - requires the user to add vCenter extensions to manage
- design
  - a filter driver registers with VAIO
  - IO: VM -> VAIO -> filter driver -> VAIO -> hardware; the filter has to send the IO on eventually
  - response: hardware -> VAIO -> filter driver -> VAIO -> VM
  - a filter may initiate its own IOs
  - a filter may talk to flash or "other" storage that is recognized as a block device
  - filters may share kernel-space memory or can use IP sockets
  - events indicate when a snapshot or vMotion occurs
  - C only; need both 32- & 64-bit versions, because ESX is a 32-bit OS with a 64-bit process space
  - one instance per VMX; must be re-entrant
- EMC RecoverPoint is a partner - to be in their 2015 release
- SanDisk VAIO server-side caching: a scalable, distributed r/w cache; beta Q4 2014 with ESX 6.0
- filters must be certified (signed by VMware)
- expect GA early in 2015 (depends on ESX 6)
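The request/response path through the filter chain can be mocked up as follows. The real SDK is C and all names here are invented; the sketch only shows the shape of the flow: every IO goes down through each filter, "hits" the device, and the completion comes back up in reverse order.

```python
class PassThroughFilter:
    """Base filter: forwards requests and completions unchanged."""
    def request(self, io):       # VM -> filter -> hardware
        return io
    def response(self, io):      # hardware -> filter -> VM
        return io

class CountingCacheFilter(PassThroughFilter):
    """Toy write-through cache: remembers written blocks and answers
    repeat reads without the IO going to the device."""
    def __init__(self):
        self.cache, self.hits = {}, 0
    def request(self, io):
        if io['op'] == 'write':
            self.cache[io['lba']] = io['data']
        elif io['lba'] in self.cache:
            self.hits += 1
            io['data'] = self.cache[io['lba']]
            io['served_from_cache'] = True
        return io

def framework_submit(filters, io):
    """VAIO-style dispatch: run the IO down the chain, stand in for
    the device, then run the completion back up in reverse order."""
    for f in filters:
        io = f.request(io)
    io.setdefault('served_from_cache', False)   # stands in for real device IO
    for f in reversed(filters):
        io = f.response(io)
    return io

cache = CountingCacheFilter()
framework_submit([cache], {'op': 'write', 'lba': 7, 'data': b'x'})
result = framework_submit([cache], {'op': 'read', 'lba': 7})
# the repeat read is served from the filter's cache
```

Note the constraint the notes call out: every filter must eventually pass the IO on (or complete it itself), which is why even the caching filter returns the `io` object on both paths.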

SanDisk cache
- Virtual SAN - 3-32 nodes share local storage containing the VMDKs
- Virtual SAN cache: 30% reserved for the write buffer
- storage policy: failures-to-tolerate setting, number of disk drives per object (stripe width)
- design considerations
  - performance
    - # disk groups - speed/capacity tradeoff
    - SSD parameters - ~10% of HDD capacity
    - storage policy
    - disk controller - bandwidth, queue depth, pass-thru vs RAID0
  - capacity: use an SD card to install vSphere & free up 2 disk slots
  - availability
- VSAN monitoring GUI: would like to see historical data added; used esxtop for that

meet the VVol engineering team
- Derek Uluski, tech lead; Patrick Dirks, Sr. Manager
- VVols do not work with SRM yet - a huge shift for SRM

what's next for SDS?
- what are Docker Linux containers (?)
- a control abstraction, collecting storage by how it can be used (policies)

VSAN deep dive
- product goals
  - customer: the vSphere admin
  - reduce total cost of ownership (capex & opex)
  - SDS for VMware
- what is it
  - aggregates local flash & HDDs into a shared datastore for all hosts in the cluster
  - no single point of failure
  - scale-out (add nodes) and scale-up (increase capacity of existing storage)
  - 3-32 nodes, <= 4.4 PB
  - 2M IOPS at 100% reads, 640K IOPS at 70% reads
  - highly fault tolerant; resiliency goals are set in policy
  - a combination of user and kernel code embedded into ESXi 5.5 to reduce latency
- simple cluster config & management: a check box in the new-cluster dialog, then automatic or manual device selection
- simplified provisioning for applications: pick a storage policy for each VM
  - policy parameters: space reservation, # failures to tolerate, # disk stripes, % flash cache
- disk groups: 1 flash device + 1-7 magnetic disks; a host can have up to 5 groups
- flash: 30% write-back buffer, 70% read cache; size flash at ~10% of HDD capacity
- storage controllers: good queue depth helps; pass-through or RAID0 mode supported
- network: layer 2 multicast must be enabled on the physical switches
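The flash sizing rules of thumb above reduce to a few multiplications; `vsan_flash_sizing` is a hypothetical helper (not a VMware tool), with the ~10% ratio exposed as a parameter since it depends on the working set:

```python
def vsan_flash_sizing(hdd_capacity_gb, flash_ratio=0.10):
    """VSAN flash sizing rule of thumb from the session: flash at
    ~10% of magnetic capacity, split 30% write-back buffer /
    70% read cache."""
    flash_gb = hdd_capacity_gb * flash_ratio
    return {
        'flash_gb': flash_gb,
        'write_buffer_gb': flash_gb * 0.30,
        'read_cache_gb': flash_gb * 0.70,
    }

# disk group with 7 x 4 TB magnetic disks -> roughly a 2.8 TB flash device
sizing = vsan_flash_sizing(7 * 4000)
# {'flash_gb': ~2800, 'write_buffer_gb': ~840, 'read_cache_gb': ~1960}
```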

NSX-MH reference design
- 2 flavors of NSX
- cloud:
  - compute - provided by hypervisors
  - storage
  - network & security - provided by NSX
- NSX-MH is for non-ESXi and/or a mix of hypervisors
- any CMS, any compute, any storage, any network fabric

VSAN performance benchmarking
- workloads: Exchange simulation, OLTP simulation, Olio (lots of VMs)
  - kept RAM per VM low to reduce in-VM caching and generate VSAN traffic
- analytics: a single VM per node; separate datastore and inter-VM networks
- VPI 2.0 (beta): a data collection appliance that analyzes live VM IO workloads
  - each VM gets a score as to whether it should be in a VSAN cluster
- configuring for performance
  - SSD:MD ratio, so the SSD holds most of the working set
  - stripe width: adjust if the % of VM IOs being served …
- I wonder if they have looked into the impact of using DCE components?

Quid - augmented intelligence (vCenter)
- SSO
  - ability to view multiple vCenters from one place
  - multiple identity sources
  - ability to use different security policies
- web client
- inventory service
  - caches inventory information from vpxd
  - allows other products to show up in the web client
  - they are working on hiding the fact that this service exists
- vCenter Server (vpxd): communicates with hypervisors, records stats, services client requests
- VCTomcat health services
  - SRS - stats reporting service
  - EAM - ESX Agent Manager
  - log browser
  - PBSM (SMS + policy engines): serves storage views, client requests, resource usage
  - there are Java processes for all these services, except vpxd
- performance
  - biggest issue: resource requirements
  - may need to tune JVM heap sizes according to inventory size; the minimum system configurations are just that
  - the embedded DB for the inventory service requires 2-3K IOPS, depending on load; place it on its own spindles, possibly SSDs
  - heaps must be tuned manually
- DB performance: vc stores statistics at 5-min intervals, saves config changes, answers certain client queries, persists versions
  - rolls up stats - 30 min, 2 hours, 1 day
  - purges stats, purges events (if auto-purge is enabled, which is recommended), purges tasks (...)
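The rollup scheme (5-minute samples aggregated into 30-minute, 2-hour, and 1-day rows) can be sketched with plain averaging. The factors 6, 4, and 12 follow directly from the interval lengths; the function itself is my own illustration - the real rollup runs as DB jobs and keeps more than the mean.

```python
def roll_up(samples, factor):
    """Average consecutive groups of `factor` samples into one row,
    mimicking vCenter's stats rollup: 5-min -> 30-min is factor 6,
    30-min -> 2-hour is factor 4, 2-hour -> 1-day is factor 12."""
    return [
        sum(samples[i:i + factor]) / len(samples[i:i + factor])
        for i in range(0, len(samples), factor)
    ]

day_5min = [50.0] * 288              # one day of 5-minute samples
half_hourly = roll_up(day_5min, 6)   # 48 rows
two_hourly = roll_up(half_hourly, 4) # 12 rows
daily = roll_up(two_hourly, 12)      # 1 row
```

The shrinking row counts (288 -> 48 -> 12 -> 1 per day per counter) are the reason rollups plus purging keep the stats tables from dominating DB growth.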
- topN computation - 10 min, 30 min, 2 hrs, 1 day
- SMS data refresh - 2 hrs
- vc-to-DB latency is important (often more so than esx-to-vc latency): place the DB and vc close together; DB traffic is mostly writes
- manage DB disk growth
  - ~80-85% is stats, events, alarms, tasks
  - ~10-15% is inventory data
- concurrency limits
  - 640 concurrent operations supported; beyond that, operations are queued
  - 2000 concurrent sessions max
  - 8 provisioning operations per host at a time - so when cloning, use multiple identical sources to increase concurrency
  - 128 vMotions per host at a time
  - 8 storage vMotions per host at a time
  - limits can be changed, but that is not officially supported beyond vc 5.0
- 5.1 & 5.5: the stats tables are partitioned
- stats levels
  - level 2 uses 4x more DB activity than level 1
  - level 3 uses 6x more than level 2
  - level 4 uses 1.4x more than level 3
  - use the vc stats calculator
  - vCOps can be used for more advanced stats
- API performance: PowerCLI is simple to use, but involves client-side filtering
- web client
  - the C# client uses aggressive refresh of client data; the web client decouples client requests from vpxd - 3x less load than the C# client
  - make it easier for clients to write plugins - by adding data to the inventory service
  - merge the on-premises and hybrid experiences
  - platform independence
  - reduced refresh frequency
  - leverages Flex; Flex has issues
  - performance issues - login time, ...
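The stats-level multipliers compound, which is easy to miss; this small helper (hypothetical, not the vc stats calculator) shows the arithmetic:

```python
def stats_db_activity(level, base=1.0):
    """Relative vCenter DB activity per stats level, per the session:
    level 2 = 4x level 1, level 3 = 6x level 2, level 4 = 1.4x level 3.
    `base` is level-1 activity in whatever unit you measure."""
    multipliers = {1: 1.0, 2: 4.0, 3: 4.0 * 6.0, 4: 4.0 * 6.0 * 1.4}
    return base * multipliers[level]

# level 4 generates roughly 33.6x the DB activity of level 1
```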
- different nav model (they tried to hide things that were used less)
- resource requirements & performance
  - Chrome/IE are faster than Firefox
  - the browser machine should have 2 CPUs & 4GB
  - browser, app server, & inventory server should be in the same geography; can RDP to a local browser server
  - size the heaps
- looking ahead
  - putting tasks back in their place
  - right click will work like it used to
  - improve lateral nav
- deployment: single vs multiple vCenters
  - single reduces latency but requires a fully-resourced VM
  - vCenter performance is sensitive to vc-to-esx latency
  - sweet spot is 200 hosts / 2000 VMs per vc
  - separate by: departments, PCI/non-PCI, rack server vs desktop workloads, geographies
- SSO and linked mode
  - SSO does not share roles/privileges/licenses; linked mode allows this
  - linked mode uses Windows-only technology
  - slower login, slower search; one slow vc can slow everything
- blogger: #vCenterGuy
- future
  - a Linux appliance with performance/feature parity with Windows
  - HTML5
  - cross-vc vMotion
  - linked SSO convergence
  - performance