SCVMM 2012 R2 Bare Metal Deploy WinRM Error

I’ve been getting to know SCVMM much better, in particular its ability to provision new hosts using the iLO port on a fresh HP server, and I found a problem that the search engines don’t seem to have an answer for.

Towards the end of the deployment process, after the OS has been installed, the host has joined the domain and the agent has been installed, it stops with this error:

VMM Bare Metal Deploy Error

Error (20552)
VMM does not have appropriate permissions to access the resource C:\Windows\system32\qmgr.dll on the server.domain.com server.

Recommended Action
Ensure that Virtual Machine Manager has the appropriate rights to perform this action.

Also, verify that CredSSP authentication is currently enabled on the service configuration of the target computer server.domain.com. To enable the CredSSP on the service configuration of the target computer, run the following command from an elevated command line: winrm set winrm/config/service/auth @{CredSSP="true"}

As a result, the network connections and a few other settings aren’t applied correctly, but the host does appear in VMM.

Looking at the host properties, you can see it’s a WinRM issue:

VMM Bare Metal Deploy Error 2

Error (20506)
Virtual Machine Manager cannot complete the Windows Remote Management (WinRM) request on the computer server.domain.com.

Recommended Action
Ensure that the Windows Remote Management (WinRM) service and the Virtual Machine Manager Agent service are installed and running. If a firewall is enabled on the computer, ensure that the following firewall exceptions have been added: a) Port exceptions for HTTP/HTTPS; b) A program exception for scvmmagent.

Having checked all of the obvious things (WinRM is enabled as it should be, GPOs aren’t getting in the way and the firewall rules allow the traffic), I took a look at the security log on the new host:

VMM Bare Metal Deploy Error 3

The Microsoft documentation says very specifically that, when creating a Host Profile for the deployment, the Run As account you use to join the host to the domain should have “very limited privileges” and “should be used only to join computers to the domain”. That’s why I used a dedicated Domain Join account.

So why is this account logging into the server after deployment? A quick trip to the host properties reveals the answer:

VMM Bare Metal Deploy Host Properties

D’oh! Nicely done SCVMM.

Go back into the Host Profile:

VMM Bare Metal Deploy Host Profile

And there is our Domain Join account. Create a new Run As account with the appropriate permissions to administer newly created hosts (unfortunately, depending on your environment, this may mean Domain Admins), update the Host Profile, redeploy the host and you should be good. Please note that you cannot use the SCVMM service account for this task; it has to be a separate account.
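If you deploy a lot of hosts, the rule above can be turned into a quick pre-flight check. This is a hypothetical Python sketch, not part of VMM: the `HostProfile` fields and all account names are illustrative assumptions, and you’d fill them in yourself from the Host Profile and your VMM service account.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HostProfile:
    """Hypothetical view of the accounts involved in a bare metal deployment."""
    name: str
    domain_join_account: str  # low-privilege account used only to join the domain
    management_account: str   # Run As account VMM will use to manage the host


def account_problems(profile: HostProfile, vmm_service_account: str) -> List[str]:
    """Return the problems with a profile's account choices (empty list if none)."""
    problems = []
    # Windows account names are case-insensitive, so compare them lowercased.
    join = profile.domain_join_account.lower()
    mgmt = profile.management_account.lower()
    if mgmt == join:
        problems.append(
            f"{profile.name}: host management uses the domain join account "
            f"{profile.domain_join_account}, which should have very limited privileges"
        )
    if mgmt == vmm_service_account.lower():
        problems.append(
            f"{profile.name}: the VMM service account cannot be used to manage the host"
        )
    return problems


# Hypothetical account names for illustration only.
bad = HostProfile("HP-Gen8", r"CORP\svc-domainjoin", r"CORP\svc-domainjoin")
good = HostProfile("HP-Gen8", r"CORP\svc-domainjoin", r"CORP\svc-hostadmin")
print(account_problems(bad, r"CORP\svc-vmm"))   # one problem reported
print(account_problems(good, r"CORP\svc-vmm"))  # []
```

The check only compares names; it can’t tell you whether the management account actually has enough rights on the new host, which still depends on your environment.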

HP MSM720 Wireless Controller Factory Reset & Firmware Bug

I couldn’t find any accurate documentation on how to reset the configuration of an HP MSM720 Wireless Controller without using the web interface, so I had to figure it out for myself. The issue that made this necessary is covered in the second half of this post. Here’s how you do it:

Connect to the console port via serial

Here’s a screenshot from PuTTY with everything you need to know:

MSM 720 Serial Settings

Reset the configuration

Type in the following commands to clear the configuration and reboot the device:

enable
config
factory settings

What doesn’t work

The documentation talks about using the Reset and Clear buttons together to return the device to factory defaults. Here’s a picture of the device:

MSM 720 Front Panel Features

When you actually try it, what you find is that there isn’t a Clear button, only a hole. I’ve seen this on at least two different controllers, so it isn’t a one-off manufacturing fault, but I’ve no idea why. What this means, of course, is that you have to use the CLI method above to reset the device.

The original issue

I install a lot of HP MSM equipment and am used to the more than occasional idiosyncrasy (more on that another time), but by and large the devices do what you tell them to do. This one had me stumped. Consider the following:

Access Network VLAN IP: 10.100.1.10/24
Internet Network VLAN IP: 10.100.99.10/24
Default Gateway IP: 10.100.99.254
Static Route: Destination 10.0.0.0/8, next-hop 10.100.1.254

Here you have the basic information for the initial configuration of an MSM720 with an “inside” and an “outside” network, assuming that the internal LAN/WAN uses 10.x.x.x addresses and that the internet is reachable through the gateway on the Internet VLAN at 10.100.99.254. You need the static route because you can’t have two default gateways, and the controller has to be able to talk to the APs across the internal networks. All completely textbook.

Unfortunately, when I configured these settings, in this order, on a controller that came out of the box running version 5.7.1.1, the controller stopped responding as soon as the static route was applied. Power cycling the box appeared to work, but I couldn’t ping the device on either the LAN or the Internet VLAN, even though the console was perfectly responsive once I’d figured out the very odd serial settings.

After resetting the box, I upgraded it to version 6.0.0.1 and went through the steps above again, this time with no issues. My understanding is that this issue is also fixed in 5.7.3.0, but I haven’t fully tested this.
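To make the reasoning behind the static route concrete, here’s a minimal Python sketch of the longest-prefix-match decision a router normally makes with this configuration. It models only the two connected VLAN subnets, the static route and the default gateway; the AP address 10.20.5.40 is a hypothetical example, and the controller may apply other rules (metrics, policy) that this sketch ignores.

```python
import ipaddress

# Routing table from the example configuration. The two connected routes come
# from the VLAN interface addresses; metrics are ignored, so selection is purely
# longest-prefix match.
ROUTES = [
    ("10.100.1.0/24", "connected (Access VLAN)"),
    ("10.100.99.0/24", "connected (Internet VLAN)"),
    ("10.0.0.0/8", "10.100.1.254"),   # static route to the internal networks
    ("0.0.0.0/0", "10.100.99.254"),   # default gateway to the internet
]


def next_hop(destination: str) -> str:
    """Return the next hop for destination using longest-prefix match."""
    addr = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(prefix), hop)
        for prefix, hop in ROUTES
        if addr in ipaddress.ip_network(prefix)
    ]
    # The default route always matches, so matches is never empty;
    # the most specific (longest) prefix wins.
    _network, hop = max(matches, key=lambda m: m[0].prefixlen)
    return hop


for dest in ["10.20.5.40", "10.100.99.1", "8.8.8.8"]:
    print(dest, "->", next_hop(dest))
# 10.20.5.40 -> 10.100.1.254
# 10.100.99.1 -> connected (Internet VLAN)
# 8.8.8.8 -> 10.100.99.254
```

Note that 10.100.99.0/24 sits inside 10.0.0.0/8, but the connected /24 is more specific, so the Internet VLAN stays directly reachable despite the static route.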

Exchange 2013 PowerShell Unavailable

The problem with the bleeding edge is that there isn’t a huge amount of supporting knowledge to go with it, because not many people have yet hit the corner cases that Google and Bing so helpfully index for us when we have a problem. Here’s one that hit me today.

After completing a migration to Exchange 2013, I wanted to install the Remote Desktop Gateway and Remote Desktop Web Access role services on the same server that was running Exchange 2013. This is only a small environment, and it didn’t make sense to provision separate servers for something that will only be used by a handful of people. After doing this, I couldn’t open the Exchange Management Shell because of multiple occurrences of the following error:

VERBOSE: Connecting to CT-EXCH-01.corp.collective-tech.com.
New-PSSession : [ct-exch-01.corp.collective-tech.com] Connecting to remote server ct-exch-01.corp.collective-tech.com
failed with the following error message : The client cannot connect to the destination specified in the request.
Verify that the service on the destination is running and is accepting requests. Consult the logs and documentation
for the WS-Management service running on the destination, most commonly IIS or WinRM. If the destination is the WinRM
service, run the following command on the destination to analyze and configure the WinRM service: "winrm quickconfig".
For more information, see the about_Remote_Troubleshooting Help topic.
At line:1 char:1
+ New-PSSession -ConnectionURI "$connectionUri" -ConfigurationName Microsoft.Excha …
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : OpenError: (System.Manageme….RemoteRunspace:RemoteRunspace) [New-PSSession], PSRemotingTransportException
+ FullyQualifiedErrorId : CannotConnect,PSSessionOpenFailed

In addition, the authentication settings on multiple virtual directories had been reset, and services such as OWA were flaky at best.

Removing and recreating the PowerShell virtual directory, as some suggestions recommended, didn’t help.

Resolution:

Exchange 2013 uses two websites in IIS: one for the front end (Default Web Site) and one for the back end (Exchange Back End). The front end uses the standard ports (80/443), while the back end normally uses the incremented ports 81/444. For some reason, an additional binding for port 443 had been added to the back-end site, which caused all HTTPS traffic intended for the front end to end up on the back-end site instead. To fix:

  • Open up IIS Manager
  • Navigate to the site “Exchange Back End”
  • Click “Bindings…” under “Actions” on the right-hand pane
  • Click the item with the values https/443 (not 444!)
  • Click “Remove” and then “Close”
  • Restart IIS
  • Make sure that both sites are started
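The root cause is two sites claiming the same port. As a hypothetical Python sketch, given a snapshot of each site’s bindings (typed in by hand here; the data structure is an assumption, not an IIS API), you can flag any port bound by more than one site. It deliberately ignores IP addresses and host names, which IIS also uses to tell bindings apart, so treat a hit as “go and look” rather than proof of a conflict.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical snapshot of IIS site bindings as (protocol, port) pairs.
# The extra https/443 entry on the back end reproduces the fault above.
BINDINGS: Dict[str, List[Tuple[str, int]]] = {
    "Default Web Site": [("http", 80), ("https", 443)],
    "Exchange Back End": [("http", 81), ("https", 444), ("https", 443)],
}


def shared_ports(bindings: Dict[str, List[Tuple[str, int]]]) -> Dict[int, List[str]]:
    """Return each port that is bound by more than one site, with those sites."""
    sites_by_port: Dict[int, List[str]] = defaultdict(list)
    for site, entries in bindings.items():
        for _protocol, port in entries:
            sites_by_port[port].append(site)
    return {port: sites for port, sites in sites_by_port.items() if len(sites) > 1}


print(shared_ports(BINDINGS))
# {443: ['Default Web Site', 'Exchange Back End']}
```

After removing the https/443 binding from Exchange Back End, the same check returns an empty dictionary.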

From this point you should be able to connect to the Exchange Shell as normal and reset the authentication settings on the various virtual directories as required.

Error 1010 during Exchange 2010 SP3 Upgrade

During an upgrade of Exchange 2010 to Service Pack 3 on Windows Server 2008 R2, in preparation for an upcoming migration to Exchange 2013, the installation failed at the Language Files stage and the following was logged in the setup log:

[04/03/2013 00:06:40.0161] [1] [ERROR] Unexpected Error
[04/03/2013 00:06:40.0161] [1] [ERROR] Performance counter names and help text failed to unload. Lodctr exited with error code ‘1010’.
[04/03/2013 00:06:40.0223] [1] Ending processing install-Languages
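Setup logs get long, so before running the fix below it can help to confirm that you’re looking at the same failure. This is a hypothetical Python helper: it scans lines you’ve read from the setup log for the “Lodctr exited with error code” message shown above and returns the exit codes. It accepts straight or curly quotes around the code, since copies of the log may differ.

```python
import re
from typing import Iterable, List

# Matches the lodctr failure message and captures the numeric exit code.
LODCTR_ERROR = re.compile(r"Lodctr exited with error code\s+['\u2018\u2019]?(\d+)")


def lodctr_exit_codes(lines: Iterable[str]) -> List[int]:
    """Return the lodctr exit codes reported in the given setup log lines."""
    codes = []
    for line in lines:
        match = LODCTR_ERROR.search(line)
        if match:
            codes.append(int(match.group(1)))
    return codes


sample = [
    "[04/03/2013 00:06:40.0161] [1] [ERROR] Unexpected Error",
    "[04/03/2013 00:06:40.0161] [1] [ERROR] Performance counter names and help text "
    "failed to unload. Lodctr exited with error code '1010'.",
    "[04/03/2013 00:06:40.0223] [1] Ending processing install-Languages",
]
print(lodctr_exit_codes(sample))  # [1010]
```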

Resolution:

From an elevated command prompt, run the following:

lodctr /r

Re-run the Service Pack and it should complete this time.

Multiple Failures – UPS Edition

One of the things that comes up quite a lot in my consulting work with customers is the concept of multiple failures, in the context of Business Continuity Planning (BCP) and Disaster Recovery (DR) discussions. In simple terms, two or more failures that would each have relatively benign effects on their own can combine into an impact far more severe than the sum of the individual failures: a sort of negative synergy.

Looking for, understanding and mitigating the risk of these kinds of failures is difficult in simple systems and nigh-on impossible in complex ones. History is littered with examples of these kinds of failures in various settings. Some examples:

These are obviously rather serious examples, but smaller ones occur all the time within businesses (not just within IT), and their impact can be significant for those affected. I come across lots of situations where these kinds of multiple failures have occurred, and I hope to share some of them in the future, along with some thoughts on how to analyse infrastructures for this kind of risk and how to mitigate it.

One instance I can share occurred this week. The mains supply failed at one of a customer’s sites, most likely because of the nasty weather here lately, which caused the APC UPS to send a message to all physical servers telling them to shut down. All servers with the APC software shut down automatically, whilst the Hyper-V hosts were shut down manually; all good so far. Several hours later, the power was restored and everything powered on automatically, as it was supposed to. The DCs booted, the VMs resumed from their saved states and so on.

That would be the end of a normal story. What happened next was unfortunate.

Because everything had been off for several hours in the cold weather, the temperature in the server room had dropped to about 15°C, below the minimum temperature setpoint configured for the environmental monitoring probe on the UPS. This out-of-range temperature event caused the UPS to send another message to the physical servers, which triggered a second shutdown.

The temperature in the room gradually rose again, as the other equipment (PBX, SAN, switches, etc.) remained powered on, along with the Hyper-V nodes that didn’t have the APC software installed. The physical servers didn’t turn back on, however, because automatic power-on only triggers when power is restored to a server, and a temperature alarm never removes power in the first place.

The environment contains one physical and one virtual domain controller. The physical domain controller was off at this point as a result of the second shutdown, whilst the virtual one remained up. Unfortunately, the stop action for the DC’s VM was Save State rather than Shut Down. This meant that the DC resumed with its clock at the time its state was saved, several hours earlier. This time was then propagated to the other physical servers and, via Hyper-V’s time synchronization service, to the other VMs, causing a time skew that generated a whole load of Kerberos issues.

To resolve the issues, the physical servers were booted remotely via their iLO interfaces, the domain controllers were resynced with an external time source, followed by the Hyper-V machines, and then most servers were rebooted to clear any errors and ensure that services started correctly. Obviously this took some diagnosing and a lot of manual work that could have been avoided.
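Why does a few hours of drift break things so badly? Kerberos rejects authentication when the clocks involved differ by more than the domain’s maximum clock skew, which defaults to 5 minutes in Active Directory. Here’s a simplified Python model of that check; the timestamps are hypothetical and only illustrate a VM resuming from a state saved hours earlier.

```python
from datetime import datetime, timedelta

# Default Active Directory Kerberos policy: "Maximum tolerance for computer
# clock synchronization" is 5 minutes.
MAX_KERBEROS_SKEW = timedelta(minutes=5)


def kerberos_skew_ok(client_time: datetime, dc_time: datetime,
                     tolerance: timedelta = MAX_KERBEROS_SKEW) -> bool:
    """Return True if the two clocks are within the Kerberos tolerance."""
    return abs(client_time - dc_time) <= tolerance


# Hypothetical times: the DC's state was saved at 01:00 and it resumed
# at 07:30 still believing it was about 01:00.
real_time = datetime(2013, 1, 20, 7, 30)
resumed_dc_time = datetime(2013, 1, 20, 1, 0)
print(kerberos_skew_ok(real_time, resumed_dc_time))                    # False
print(kerberos_skew_ok(real_time, real_time + timedelta(minutes=2)))  # True
```

In the real incident the wrong time spread from machine to machine, so which pairs fell outside the tolerance depended on when each one synced; the model only captures the threshold itself.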

To recap, here are the issues:

  • Initial power failure
  • Low temperature alarm
  • APC software not installed on Hyper-V nodes
  • Incorrect stop action on DC VMs

On their own, none of these would have caused much trouble; together, they caused several hours of early-morning headaches!

2012 Core APC PowerChute Network Shutdown

Hit an issue with APC PowerChute Network Shutdown on Windows Server 2012 Core running Hyper-V:

PowerChute cannot communicate with the Network Management Card

PCNS is NOT receiving the data from the NMC.

The client installed successfully and its IP address registered on the NMC, but PCNS wouldn’t connect.

Resolution:

Make sure that the firewall rule “PCNS NMC Communication Port (UDP 3052)” is enabled for all profiles, not just Public, which is the only profile selected by default.
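The reason this bites is that a Windows Firewall rule only applies on networks whose active profile is one of the rule’s profiles, and a domain-joined Hyper-V host will normally be on the Domain profile. Here’s a simplified, hypothetical Python model of that matching; it ignores block rules, rule precedence and address scoping.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class InboundRule:
    """Simplified, hypothetical model of a Windows Firewall inbound allow rule."""
    name: str
    protocol: str
    port: int
    profiles: FrozenSet[str]
    enabled: bool = True


def is_allowed(rules: Iterable[InboundRule], protocol: str, port: int,
               active_profile: str) -> bool:
    """Return True if any enabled rule allows this traffic on the active profile."""
    return any(
        r.enabled and r.protocol == protocol and r.port == port
        and active_profile in r.profiles
        for r in rules
    )


NAME = "PCNS NMC Communication Port (UDP 3052)"
as_installed = [InboundRule(NAME, "UDP", 3052, frozenset({"Public"}))]
fixed = [InboundRule(NAME, "UDP", 3052, frozenset({"Domain", "Private", "Public"}))]

print(is_allowed(as_installed, "UDP", 3052, "Domain"))  # False: NMC traffic dropped
print(is_allowed(fixed, "UDP", 3052, "Domain"))         # True
```

On a Core install, the same change should also be possible from PowerShell with Set-NetFirewallRule -DisplayName "PCNS NMC Communication Port (UDP 3052)" -Profile Any, though that variant is untested here and the names of the other two PCNS rules would need checking first.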

Here’s the full set of installation steps:

  1. Install the correct version of PCNS for Windows Server 2012 from the command line.
  2. Connect to the server via an MMC console using the Windows Firewall with Advanced Security snap-in.
  3. For each of the three PCNS rules, open Properties, go to the Advanced tab and enable the Private and Domain profiles.
  4. Connect to https://server:6547/
  5. Run through the configuration wizard as normal

SCOM Hyper-V 2008 2012 Event Log Issue

I’ve been doing some work with System Center Operations Manager (SCOM) 2012 SP1 for a customer lately and hit an issue that I couldn’t find an answer to. The environment includes both Hyper-V 2008 R2 and Hyper-V 2012 servers, and for the latter the following alert was being fired:

The Windows Event Log Provider is still unable to open the Microsoft-Windows-Hyper-V-Image-Management-Service-Admin event log on computer ‘vs-01.contoso.com’.
The Provider has been unable to open the Microsoft-Windows-Hyper-V-Image-Management-Service-Admin event log for 720 seconds.

Most recent error details: The specified channel could not be found. Check channel configuration.

SCOM Hyper-V 2008 2012 Alerts

The same alert was also being generated for Microsoft-Windows-Hyper-V-Network-Admin.

Cause

It seems that the management packs (MPs) for Hyper-V 2008 incorrectly look on 2012 servers for event logs that don’t exist in 2012.

Resolution

To solve this, stop the monitors from targeting the 2012 servers:

Head into the Authoring workspace and then under Management Pack Objects click Monitors. From there, click Scope on the toolbar. Select View all targets then click Clear All at the bottom. Enter “Hyper-V” into the box at the top and then click Select All then OK:

SCOM Hyper-V 2008 2012 Scope MPs

Expand Hyper-V Virtual Hard Disk and Hyper-V Virtual Network:

SCOM Hyper-V 2008 2012 Monitors

Open the properties of Mounted Drive Read-only, Port Connectivity and Port Disconnectivity and go to the Event Log tab. This shows the event log targeted by each monitor and, as you’ll see, these are the logs we’re having a problem with:

SCOM Hyper-V 2008 2012 Mounted Drive Read-only Properties
SCOM Hyper-V 2008 2012 Port Connectivity Properties
SCOM Hyper-V 2008 2012 Port Disconnectivity Properties
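Put another way, a monitor needs an override on any host where its event log channel doesn’t exist. This hypothetical Python sketch expresses that rule; the monitor-to-channel mapping is an assumption based on the Virtual Hard Disk and Virtual Network grouping above, and the channel inventories would in practice come from each host (for example, the output of wevtutil el).

```python
from typing import Dict, List, Set

# Assumed mapping of each monitor to the event log channel it reads.
MONITOR_CHANNELS: Dict[str, str] = {
    "Mounted Drive Read-only": "Microsoft-Windows-Hyper-V-Image-Management-Service-Admin",
    "Port Connectivity": "Microsoft-Windows-Hyper-V-Network-Admin",
    "Port Disconnectivity": "Microsoft-Windows-Hyper-V-Network-Admin",
}


def monitors_to_override(available_channels: Set[str]) -> List[str]:
    """Return the monitors whose event log channel does not exist on a host."""
    return sorted(
        monitor for monitor, channel in MONITOR_CHANNELS.items()
        if channel not in available_channels
    )


# Hypothetical channel inventories: the 2008 R2 host has both logs,
# the 2012 host has neither (hence "The specified channel could not be found").
hyperv_2008r2 = set(MONITOR_CHANNELS.values())
hyperv_2012: Set[str] = set()

print(monitors_to_override(hyperv_2008r2))  # []
print(monitors_to_override(hyperv_2012))
# ['Mounted Drive Read-only', 'Port Connectivity', 'Port Disconnectivity']
```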

Each of these three monitors needs to be disabled for the 2012 servers. For Mounted Drive Read-only, first right-click the monitor and choose:

SCOM Hyper-V 2008 2012 Override Menu

You can select the Windows Server 2012 Computer Group for this monitor:

SCOM Hyper-V 2008 2012 Group Selection

For the Port Connectivity and Port Disconnectivity monitors you’ll need to disable them “For a specific object of class: Hyper-V Virtual Network” and pick the objects that relate to each of your 2012 Hyper-V machines. For some reason, picking a group as above doesn’t work.

Reset the unhealthy monitors and clear the alerts and you should be good to go.

My thanks go to Kevin Greene for his post on half of the issue which led me down the right path to solving both alerts.