Microsoft TechEd Europe 2014 Day 2

This post is probably more for my own records than anything, as it’s fairly session-specific, but those sessions are obviously related to my interests and thus this blog, so here we go anyway.

Next Version of Hyper-V:

Lots and lots of new features in here, pretty much all covered by the post on TechNet. However, there was also some news about changes to the way VMs are backed up, specifically:

  • Backup is decoupled from the underlying infrastructure – not reliant on VSS
  • Not dependent on hardware snapshots on the SAN – good for large LUNs
  • Built-in change tracking rather than relying on a third-party backup agent

Couple of other bits as well:

  • Ability to name NICs in Hyper-V and have the name show up in the OS
  • Can modify static memory allocation for VMs running Technical Preview
  • Hot-add of VHDs doesn’t break replication

No one big feature across all the lists but plenty of polish which shows just how far Hyper-V has come.

System Center OM and Azure Operational Insights

Much was made of protecting the existing investment in Operations Manager, particularly MPs, as OM continues to be developed. Support for OSS platforms, which is becoming more and more important, was also highlighted. There are also a few shiny things including new dashboards in 2012 R2 U2, monitoring O365, a new Exchange MP and so on.

The main focus though is definitely Azure Operational Insights (previously called System Center Advisor). It’s able to take data either directly from systems or from an existing OM deployment and do big-data analytics using the power of Azure.

This opens up a lot of new functionality including the ability to do amazing drill downs into data at great speed from a wide variety of devices. It’s natively multi-tenant and is designed to work with partners, which is particularly useful. At the moment there are only a few integration packs available for it but MS are actively developing this (the system is still in preview) and it’ll be good to see the pace of release of these.

It obviously has a long way to go, particularly as one of the hugely valuable aspects of OM is the third party MP development and integration but MS are building a third party ecosystem into the platform. Definitely one to watch.

I’ve already been asked “Will this replace OM?” and I’m not sure. My feeling at the moment is that customers will be running OM for a long time to come and that new OM deployments are still sensible. There’s a clear commitment to OM and to get Operational Insights to leverage existing on-premises OM deployments so I can’t see any risk there.

Storage Replica:

The new functionality to replicate storage is quite exciting. There’s clearly a lot of work still to do to get the UI (and some of the PowerShell) to where it needs to be and there are a few nasty bugs that people are likely to hit but it’s very promising. I’ll be getting it into a lab when I get back and covering it there so I won’t go into too much detail here.

One thing that did disappoint me though is that although it can do synchronous replication, it’s asymmetric, meaning that only one of the two sides can be active at a time. This is true of both cluster to cluster and server to server. I’m quite keen to find a replacement for the HP VSA-style network RAID to allow small, cost effective shared-nothing clusters to be built in branch offices and this doesn’t quite fit the bill as a result. Still much to like though!

General

So far, a really good event and I’ve been impressed by how well it’s organised and run. (Un)fortunately there’s an awful lot of content to see and I can see myself needing to spend another week just watching videos of all the sessions I’ve missed. Last night there was also some great light relief at Carpe Diem on Barceloneta beach which was a good opportunity to meet some people (including customers as it turned out) and relax.

Microsoft TechEd Europe 2014 Day 1

I’m fortunate to be at TechEd Europe this year (at slightly short notice) and wanted to share some of the things that I’m picking up on at the event. There are a few topics I’ll do specific blog posts about but here’s a general overview of Day 1 and some of the bits I’ve seen.

Themes:

The keynote was good; couple of hours but kept reasonably fast paced with some interesting announcements and demos. What was interesting was the areas that were touched on and what was left out. The key areas are:

  • Data
  • Consistent Device Experience
  • Cloud (Hybrid and Public)
  • Software Defined Datacentre

There was nothing really on traditional on-premises systems such as Exchange, SharePoint, etc., which, depending on your perspective, is either because we’ve just gone through a major release of these products or because of the strong focus on cloud.

Data:

People who don’t understand the value of data often fail to understand why Microsoft is playing in the search space with Bing when Google has so much of the market covered. This is extremely short-sighted. As pointed out in the keynote, there are now more connected devices on the planet than people. With the Internet of Things taking off, the ability for Microsoft to understand the internet and the huge amount of data it contains becomes really crucial to delivering valuable services to users, and this is what Bing enables. Cortana’s brain, for example, is effectively Bing.

Consistent Device Experience:

Obviously there’s a lot of information coming out about Windows 10 and Joe Belfiore did a great job of showing some of the features there. It’s a good middle ground between Windows 7 and 8. It’s also worth noting that security was a huge part of this. The subtext though is that Microsoft is clearly trying to push the experience right across the device spectrum from phones to the largest PCs. This also encompasses other breeds of devices including iOS and Android; Office is a good example.

Cloud:

Microsoft is showing a clear commitment to making every part of the software defined datacentre integrate with the public cloud as much as possible and to make that as easy as possible, whether it’s backup or remote desktop. Here are the areas from the Keynote:

  • Management
  • Virtualisation
  • Identity
  • Networking
  • Data
  • Development

SDDC:

Given my focus, this one has me really excited. The next versions of Windows Server and the System Center suite will have significantly enhanced capabilities in many areas, building on foundations that have been laid already. Of particular note:

I’ll be posting my thoughts about CPS in the next day or so as I think it’s a very strategic play by Microsoft. More to come!

Unable To Delete Hyper-V Root Snapshot in Hyper-V Manager

During a build-out for a customer it became necessary to move some virtual machines between a Hyper-V 2012 cluster and a Hyper-V 2012 R2 cluster but when trying to do so, all sorts of nasty errors came cropping up:

Live Migration Error Due To Differencing Disk

Error (12700)
VMM cannot complete the host operation on the host1.contoso.com server because of the error: Virtual machine migration operation for ‘MachineToMove.contoso.com’ failed at migration destination ‘host2.contoso.com’. (Virtual machine ID 1D5042AA-1A93-4635-9F0A-F7C7B0D10BDD)

Failed to access disk ‘C:\ClusterStorage\Volume2\MachineToMove.contoso.com\Windows Server 2012 DC with SP1_disk_1_3F40B5A6-E8DC-4752-873C-D9742C9419F4.avhdx’: ‘The system cannot find the file specified.'(‘0x80070002’).
Unknown error (0x800b)

Error (23753)
The virtual machine or tier load balancer configuration requires an IP pool and there are no appropriate IP pools accessible from the host.

Recommended Action
Select a host with access to an appropriate IP pool and try the operation again.

Live Migration Error Due To Differencing Disk 2

Error (12700)
VMM cannot complete the host operation on the MachineToMove.contoso.com server because of the error: Virtual machine migration operation for ‘MachineToMove.contoso.com’ failed at migration source ‘Host1’. (Virtual machine ID 1D5042AA-1A93-4635-9F0A-F7C7B0D10BDD)

Virtual machine migration for ‘MachineToMove.contoso.com’ failed because configuration data root cannot be changed for a clustered virtual machine. (Virtual machine ID 1D5042AA-1A93-4635-9F0A-F7C7B0D10BDD)
Unknown error (0x8005)

Recommended Action
Resolve the host issue and then try the operation again.

You may notice in the top error that the disk path is pointing to an odd file name. Looking at the settings for the machine in Hyper-V Manager and inspecting the disk, we find:

Live Machine Properties

Lo and behold, it’s a differencing disk. Let’s try removing the snapshot that created it:

Hyper-V Snapshot Missing Delete

And there’s the problem – no delete option!

Let’s look at the snapshot in PowerShell. To do so, open an elevated PowerShell session on a machine with the Hyper-V PowerShell tools installed and run:

Get-VMSnapshot -VMName MachineToMove.contoso.com -ComputerName host1.contoso.com | fl

Here’s the output for the above VM:

SnapshotType : Recovery
VMId : 1d5042aa-1a93-4635-9f0a-f7c7b0d10bdd
VMName : MachineToMove.contoso.com
State : Off
Key : Microsoft.HyperV.PowerShell.SnapshotObjectKey
IsDeleted : False
ComputerName : host1.contoso.com
Id : 4382dc53-2fdd-476f-91b8-81963c292d24
Name : MachineToMove.contoso.com - Backup - (1/16/2014 - 6:00:19 PM)
Version :
Notes : #CLUSTER-INVARIANT#:{434c76e7-5581-463a-b1b4-71027d39770f}
Generation :
Path : C:\ClusterStorage\Volume2\MachineToMove.contoso.com
CreationTime : 16/01/2014 20:22:24
IsClustered : True
SizeOfSystemFiles : 49254
ParentSnapshotId :
ParentSnapshotName :
MemoryStartup : 8589934592
DynamicMemoryEnabled : False
MemoryMinimum : 536870912
MemoryMaximum : 1099511627776
ProcessorCount : 4
RemoteFxAdapter :
NetworkAdapters : {MachineToMove.contoso.com}
FibreChannelHostBusAdapters : {}
ComPort1 : Microsoft.HyperV.PowerShell.VMComPort
ComPort2 : Microsoft.HyperV.PowerShell.VMComPort
FloppyDrive : Microsoft.HyperV.PowerShell.VMFloppyDiskDrive
DVDDrives : {DVD Drive on IDE controller number 1 at location 0}
HardDrives : {Hard Drive on IDE controller number 0 at location 0, Hard Drive on SCSI controller
number 0 at location 0}
VMIntegrationService : {Time Synchronization, Heartbeat, Key-Value Pair Exchange, Shutdown...}

Time to remove it:

Get-VMSnapshot -VMName MachineToMove.contoso.com -ComputerName host1.contoso.com | Remove-VMSnapshot

You can run this command while the machine is running and if you look in Hyper-V after running this you’ll see that the differencing disk will quickly merge into the parent and then the recovery-point snapshot will be removed. Migrating the VM in this state should go without a hitch.
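If you'd rather check what you're about to delete first, here's a minimal sketch of how that could be wrapped up. Get-RecoverySnapshot and Test-DifferencingDiskPath are hypothetical helper names, not part of the Hyper-V module. The sketch assumes that backup-created snapshots always report a SnapshotType of Recovery, as in the output above, which may not hold for every backup product.

```powershell
# Sketch only: helper names are hypothetical, not part of the Hyper-V module.

# Return only the snapshots of a VM whose SnapshotType is 'Recovery',
# i.e. the property shown in the 'fl' output above.
function Get-RecoverySnapshot {
    param(
        [Parameter(Mandatory)] [string] $VMName,
        [Parameter(Mandatory)] [string] $ComputerName
    )
    Get-VMSnapshot -VMName $VMName -ComputerName $ComputerName |
        Where-Object { $_.SnapshotType -eq 'Recovery' }
}

# Return $true if a disk path uses the .avhd/.avhdx extension that
# Hyper-V gives to snapshot differencing disks.
function Test-DifferencingDiskPath {
    param([Parameter(Mandatory)] [string] $Path)
    return $Path -match '\.avhdx?$'
}
```

Something like `Get-RecoverySnapshot -VMName MachineToMove.contoso.com -ComputerName host1.contoso.com | Remove-VMSnapshot -WhatIf` should then list what would be removed without touching anything; drop -WhatIf to do it for real.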

What caused this?

In this instance, the environment is running HP Data Protector 8.0 which is HP’s incredibly powerful (albeit rather old-looking) backup platform. The environment had been configured to back up the machines in the Hyper-V cluster using the HP StoreVirtual P4000 VSS/VDS Providers along with Application Aware Snapshot Manager. As I understand it, this uses the differencing disks so that incremental backups can be achieved, and they’re merged and renewed during each Full backup. This is why you see the word “Backup” in the snapshot name along with the date and time that Data Protector took the backup.

Hyper-V 2012 -> 2012 R2 Cluster Migration Issues

Quick post more to document an oddity than anything…

I’ve been migrating machines from a 2012 cluster to a 2012 R2 cluster using VMM 2012 R2 with mixed results. In particular, I’m seeing the machines duplicated in Failover Cluster Manager. One of the two seems to be the live machine and the second, prefixed with SCVMM (as all VMM created machines are), seems to be broken with various errors such as ID 21502 “Missing or invalid virtual machine ID resource property”. Simply remove the duplicate starting with SCVMM and all seems to be OK.
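To spot the SCVMM-prefixed entries without clicking through Failover Cluster Manager, something along these lines might help. It's a rough sketch that only lists groups and deliberately removes nothing. Get-ScvmmPrefixedGroup is a hypothetical helper name, and it assumes the FailoverClusters PowerShell module is installed wherever you run it.

```powershell
# Sketch only: lists cluster groups whose name starts with 'SCVMM'.
# Get-ScvmmPrefixedGroup is a hypothetical helper, not a built-in cmdlet.
function Get-ScvmmPrefixedGroup {
    param([Parameter(Mandatory)] [string] $Cluster)
    Get-ClusterGroup -Cluster $Cluster |
        Where-Object { $_.Name -like 'SCVMM*' } |
        Select-Object Name, State, OwnerNode
}
```

Check each result against the live machine by eye before removing anything, since a healthy VMM-created machine will carry the same prefix.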

Odd though.

SCVMM 2012 R2 Bare Metal Deploy WinRM Error

I’ve been getting to know SCVMM much better, in particular the ability to provision new hosts using the iLO port on a fresh HP server and I found this problem that the search engines don’t seem to have an answer for.

Towards the end of the deploy process, after the OS is installed, joined to the domain and the agent is installed, it stops with this error:

VMM Bare Metal Deploy Error

Error (20552)
VMM does not have appropriate permissions to access the resource C:\Windows\system32\qmgr.dll on the server.domain.com server.

Recommended Action
Ensure that Virtual Machine Manager has the appropriate rights to perform this action.

Also, verify that CredSSP authentication is currently enabled on the service configuration of the target computer server.domain.com. To enable the CredSSP on the service configuration of the target computer, run the following command from an elevated command line: winrm set winrm/config/service/auth @{CredSSP="true"}

As a result the network connections and a few other bits don’t correctly apply but the host does appear in VMM.

Looking at the host properties, you can see it’s a WinRM issue:

VMM Bare Metal Deploy Error 2

Error (20506)
Virtual Machine Manager cannot complete the Windows Remote Management (WinRM) request on the computer server.domain.com.

Recommended Action
Ensure that the Windows Remote Management (WinRM) service and the Virtual Machine Manager Agent service are installed and running. If a firewall is enabled on the computer, ensure that the following firewall exceptions have been added: a) Port exceptions for HTTP/HTTPS; b) A program exception for scvmmagent.

Having checked all of the obvious, including that WinRM is enabled as it should be, GPOs aren’t getting in the way and firewall rules are set up to allow the traffic, I took a look at the security log on the new host:

VMM Bare Metal Deploy Error 3

In the Microsoft Documentation, it says very specifically that when creating a Host Profile for the deployment, the Run As account that you use to join the host to the domain should have “very limited privileges” and “should be used only to join computers to the domain”. Hence the dedicated Domain Join account I used.

So why is this account logging into the server after deployment? A quick trip to the host properties reveals the answer:

VMM Bare Metal Deploy Host Properties

D’oh! Nicely done SCVMM.

Go back into the Host Profile:

VMM Bare Metal Deploy Host Profile

And there is our Domain Join account. Create a new Run As account with the appropriate permissions to administer newly created hosts (unfortunately this is possibly Domain Admins, depending on your environment), update the Host Profile and redeploy the host and you should be good. Please note that you cannot use the SCVMM service account for this task; it has to be a separate account.

Multiple Failures – UPS Edition

One of the things that comes up quite a lot in the consulting work I do with customers is the concept of multiple failures in the context of Business Continuity Planning [BCP] and Disaster Recovery [DR] discussions. In simple terms, two or more failures on their own could have relatively benign effects but combined the impact can have a severity far beyond the sum of the individual severities – a sort of negative synergy.

Looking for, understanding and mitigating the risk of these kinds of failures is difficult in simple systems and nigh-on impossible in complex ones. History is littered with examples of these kinds of failures in various settings. Some examples:

These are obviously some rather serious examples but smaller examples occur all the time within businesses (not just within IT) and their impact can be significant for the affected. I come across lots of situations where these kinds of multiple failures have occurred and I hope to share some of these in the future along with some thoughts on how to analyse infrastructures for and mitigate this kind of risk.

One instance I can share occurred this week. A failure of the mains supply occurred at one of a customer’s sites, likely caused by the nasty weather here lately, which caused the APC UPS to send a message to all physical servers telling them to shut down. All servers with the APC software shut down automatically whilst the Hyper-V hosts were shut down manually. All good so far. Several hours later, the power was restored and everything powered on automatically as it was supposed to. DCs booted, VMs restored from their saved states and so on.

That would be the end of a normal story. What happened next was unfortunate.

As a result of everything being off for several hours, the inclement weather caused the temperature in the server room to drop to about 15C, below the minimum temperature setpoint configured for the environmental monitoring probe on the UPS. This temperature out-of-range event caused a subsequent message to the physical servers which triggered another shutdown.

The temperature in the room gradually increased as the other equipment (PBX, SAN, switches, etc) remained powered on – along with the Hyper-V nodes that didn’t have the APC software installed. The physical servers didn’t turn back on again because the automatic power on only triggers when power is restored to the server which a temperature violation doesn’t cause.

The environment contains one physical and one virtual domain controller. The physical domain controller was off at this point as a result of the second shutdown whilst the virtual remained up. Unfortunately, the stop action for the DC’s VM was Save State, not Shutdown. This meant that the DC resumed at the time at which its state was saved, several hours before. This time was then propagated out to the other physical servers and then to the other VMs via Hyper-V’s time sync service, causing a time skew that generated a whole load of Kerberos issues.

To resolve the issues, the physical servers were booted up remotely via their iLO interfaces, the domain controllers were resynched with an external time source followed by the Hyper-V machines and then most servers were rebooted to clear any errors and ensure correct service startup. Obviously this took some diagnosing and a lot of manual work which could have been avoided.

To recap, here are the issues:

  • Initial power failure
  • Low temperature alarm
  • APC software not installed on Hyper-V nodes
  • Incorrect stop action on DC VMs

On their own, not issues – together, several hours of early morning headaches!
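The last item in that recap is one you can check for up front. Here's a rough sketch, assuming the Hyper-V PowerShell module is available. Get-SaveStateVM and Test-ClockSkewWithinTolerance are hypothetical helper names, and the 5 minute figure is the Kerberos default tolerance, which may have been changed by policy in your domain.

```powershell
# Sketch only: helper names are hypothetical, not part of any Microsoft module.

# List VMs on a host whose automatic stop action is Save,
# the setting that caused the DC to resume with a stale clock.
function Get-SaveStateVM {
    param([Parameter(Mandatory)] [string] $ComputerName)
    Get-VM -ComputerName $ComputerName |
        Where-Object { $_.AutomaticStopAction -eq 'Save' } |
        Select-Object Name, State, AutomaticStopAction
}

# Return $true if two clocks are within the Kerberos tolerance
# (default 5 minutes; adjust if your domain policy differs).
function Test-ClockSkewWithinTolerance {
    param(
        [Parameter(Mandatory)] [datetime] $Reference,
        [Parameter(Mandatory)] [datetime] $Candidate,
        [int] $ToleranceMinutes = 5
    )
    $skew = ($Candidate - $Reference).Duration()   # absolute difference
    return $skew.TotalMinutes -le $ToleranceMinutes
}
```

For any domain controller that turns up in the list, `Set-VM -Name <DC VM name> -ComputerName <host> -AutomaticStopAction ShutDown` changes the stop action so the guest shuts down cleanly rather than saving state.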

2012 Core APC PowerChute Network Shutdown

Hit an issue with APC PowerChute Network Shutdown on Windows Server 2012 Core running Hyper-V:

PowerChute cannot communicate with the Network Management Card

PCNS is NOT receiving the data from the NMC.

The client was successfully installing and the IP was registering on the NMC but PCNS wouldn’t connect.

Resolution:

Make sure that the firewall rule “PCNS NMC Communication Port (UDP 3052)” is enabled for all profiles, not just Public which is all that’s selected by default.

Here’s the full set of installation steps:

  1. Install the correct version of PCNS for Windows Server 2012 from the command line.
  2. Connect to the server via an MMC console using the Windows Firewall with Advanced Security snap-in.
  3. For each of the three PCNS rules, open properties, head to the Advanced tab and enable Private and Domain
  4. Connect to https://server:6547/
  5. Run through the configuration wizard as normal
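Steps 2 and 3 can probably also be done locally on the Core box with PowerShell rather than a remote MMC. This is a minimal sketch, not something taken from APC's documentation. Enable-PcnsFirewallRule is a hypothetical helper name, and it assumes all three PCNS rules have display names starting with "PCNS". Only the "PCNS NMC Communication Port (UDP 3052)" name is confirmed above, so check the output of Get-NetFirewallRule first.

```powershell
# Sketch only: Enable-PcnsFirewallRule is a hypothetical helper.
# Assumes every PCNS rule's display name starts with 'PCNS'.
function Enable-PcnsFirewallRule {
    param([string] $DisplayNamePattern = 'PCNS*')

    # Find the matching rules, then enable them for all three profiles.
    Get-NetFirewallRule -DisplayName $DisplayNamePattern |
        Set-NetFirewallRule -Profile Domain, Private, Public -Enabled True

    # Show the result so the change can be verified by eye.
    Get-NetFirewallRule -DisplayName $DisplayNamePattern |
        Select-Object DisplayName, Enabled, Profile
}
```

Run `Enable-PcnsFirewallRule` from an elevated PowerShell session on the host, then carry on from step 4.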