Download Microsoft Standalone System Sweeper Beta


Microsoft has released a beta version of Microsoft Standalone System Sweeper. You can find more information, and both the x86 and x64 versions to download, on the Connect site.

I know that all our environments and clients are well protected, patched and maintained :-) but unfortunately this is not the case across the board. So this tool can help you address malware issues.

Microsoft describes the product as follows:

A recovery tool that can help you start an infected PC and perform an offline scan to help identify and remove rootkits and other advanced malware. In addition, Microsoft Standalone System Sweeper Beta can be used if you cannot install or start an antivirus solution on your PC, or if the installed solution can’t detect or remove malware on your PC.

Microsoft Standalone System Sweeper Beta is not a replacement for a full antivirus solution providing ongoing protection; it is meant to be used in situations where you cannot start your PC due to a virus or other malware infection. For no-cost, real-time protection that helps guard your home or small business PCs against viruses, spyware, and other malicious software, download Microsoft Security Essentials*.

To get started, please make sure that you have a blank CD, DVD, or USB drive with at least 250 MB of space. Next, download and run the tool – the tool will help you to create the bootable media required to run the software on your PC.

Yet another tool in your box. Happy hunting and if you do use it, please provide your feedback via the connect site. It only helps make the product better.

I’m Attending The E2E Virtualization Conference


Well, I’ve just finished doing the paperwork for attending the Experts 2 Experts conference in London (http://www.pubforum.info/pubforum/E2E2011London.aspx). It runs from 18th to 20th November 2011. I’m looking forward to this one as I’m going to meet up with a lot of people from my online network and have a chance to discuss our virtualization experiences and share information in real life, face to face.

It’s good to get to attend vendor independent events and exchange information, enrich and extend our networks. I already know several people from my twitter/blogging network will be attending and I’m happy to meet up with you if you’re there. Just let me know via e-mail, the feedback option on this blog or via twitter (@workinghardinit). Well, I’ll see you there!

Introducing 10Gbps Networking In Your Hyper-V Failover Cluster Environment


This is the first post in a series of four. Here’s a list of all parts:

  1. Introducing 10Gbps Networking In Your Hyper-V Failover Cluster Environment (Part 1/4)
  2. Introducing 10Gbps With A Dedicated CSV & Live Migration Network (Part 2/4)
  3. Introducing 10Gbps & Thoughts On Network High Availability For Hyper-V (Part 3/4)
  4. Introducing 10Gbps & Integrating It Into Your Network Infrastructure (Part 4/4)

A lot of early and current Hyper-V clusters are built on 1Gbps network architectures. That’s fine and works very well for a large number of environments. Perhaps at this moment in time you’re running solutions using blades with 10Gbps mezzanine cards/switches, with or without carving up the available bandwidth for all the different network needs a Hyper-V cluster has, or should have, for optimal performance and availability. This depends on the vendor and the type of blades you’re using. It also matters when you bought the hardware (W2K8 or W2K8R2 era) and whether you built the solution yourself or bought a fast track or reference architecture kit, perhaps even including all Microsoft software and complete with installation services.

I’ve been looking into some approaches to introducing a 10Gbps network for use with Hyper-V clusters, mainly for Cluster Shared Volume (CSV) and Live Migration (LM) traffic. In brownfield environments that are already running Hyper-V clusters there are several scenarios to achieve this, but I’m not offering the “definitive guide” on how to do this. This is not a best practices story. There is no one size fits all. Depending on your capabilities, needs & budget you’ll approach things differently, reflecting what’s best for your environment. There are some “don’t do this in production whatever your environment is” warnings that you should take note of, but apart from that you’re free to choose what suits you best.

The 10Gbps implementations I’m dealing with are driven by one very strong operational requirement: reduce the live migration time for virtual machines with a lot of memory running under a decent to heavy load. So here it is all about bandwidth and speed. The train of thought we’re following is that we do not want to introduce 10Gbps just to share its bandwidth between 4 or more VLANs, as you might see in some high density blade solutions. There it often has to do with the limited number of NIC/switch ports in environments that also want high availability. In high density scenarios the need to reduce cabling is also more urgent. All this is often driven by a desire to cut costs or keep them down as much as possible. But as technology evolves fast, my guess is that within a few years we won’t be discussing the cost of 10Gbps switches anymore, and even today there are very good deals to be made. The reduction of cabling saves on labor & helps achieve high density in the racks. I do need to stress, however, that discussions around density, cooling and power consumption in existing data centers or server rooms are way too often not as simple as they appear. I would state that to achieve real and optimal results from an investment in blades you have to have the server room, cooling, power and UPS designed around them. I won’t even go into the discussion of when blade servers become a cost effective solution for SMB needs.

So back to 10Gbps networking. You should realize that Live Migration and Redirected Access with CSV absolutely benefit from getting 10Gbps pipes just for their needs. For VMs consuming 16 GB to 32 GB of memory this is significant. Think about it. Bringing 16 seconds back to 4 seconds might not be too big of a deal for a node with 10 to 15 VMs. But when you have a dozen SQL Servers that take 180 to 300 seconds to live migrate and you reduce that to 20 to 30 seconds, that helps. Perhaps not so much during automated maintenance, but when it needs to be done fast (i.e. on a node indicating serious hardware issues) those times add up. To achieve such results we gave the Live Migration & CSV networks each a dedicated 10Gbps network. They consume about 50% of the available bandwidth, so even a failover of the CSV traffic to our Live Migration network or vice versa should be easily handled. On top of the “Big Pipes” you can test jumbo frames, VMQ, …
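To put rough numbers on this, here is a back-of-the-envelope sketch in Python. It assumes the VM memory is copied exactly once at a fixed fraction of line rate. The `efficiency` value of 0.8 is a guess for illustration, not a measurement, and the function name is hypothetical. Memory pages that get dirtied and copied again are ignored, so real migrations will take longer.

```python
def transfer_seconds(memory_gb: float, link_gbps: float, efficiency: float = 0.8) -> float:
    """Estimate the seconds needed to copy a VM's memory once over a network link.

    efficiency is an ASSUMED fraction of line rate that is really achieved;
    it is a placeholder for illustration, not a measured value.
    """
    if memory_gb < 0 or link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("memory must be >= 0, link speed > 0, efficiency in (0, 1]")
    gigabits = memory_gb * 8  # 1 byte = 8 bits
    return gigabits / (link_gbps * efficiency)


if __name__ == "__main__":
    # Compare 1Gbps and 10Gbps for the memory sizes discussed in the text.
    for memory_gb in (16, 32):
        for link_gbps in (1, 10):
            seconds = transfer_seconds(memory_gb, link_gbps)
            print(f"{memory_gb} GB over {link_gbps}Gbps: about {seconds:.0f} s")
```

With the assumed 80% efficiency, a 32 GB VM comes out at roughly 320 seconds on 1Gbps and 32 seconds on 10Gbps, which is the same order of magnitude as the times mentioned above.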

Now the biggest part of that Live Migration time is the “Brown-Out” phase (event id 22508 in the Hyper-V-Worker log) during which the memory transfer happens. Those are the times we reduce significantly by moving to 10Gbps. The “Black-Out” phase, during which the virtual machine is brought online on the other node, consists of creating a snapshot with the last remaining delta of “dirty memory pages”, followed by quiescing the virtual machine for the last memory copy to be performed, and finally unquiescing the virtual machine, which is then running on the other node. This is normally measured in hundreds of milliseconds (event id 22509 in the Hyper-V-Worker log). We do have a couple of very network intensive applications that sometimes have a GUI issue after a live migration (the services are fine but the consoles monitoring those services act up). We plan on moving those VMs to 10Gbps to find out if this reduces the “Black-Out” phase a bit and prevents that GUI from acting up. When I can give you more feedback on this, I’ll let you know how that worked out.

An example of these events in the Hyper-V-Worker event log is listed below:

Event ID 22508:

‘XXXXXXXX-YYYY-ZZZZ-QQQQ-DC12222DE1’ migrated with a Brown-Out time of 64.975 seconds.

Event ID 22509:

‘XXXXXXXX-YYYY-ZZZZ-QQQQ-DC12222DE1’ migrated with a Black-Out time of 0.811 seconds, 842 dirty page and 4841 KB of saved state.

Event ID 22507:

Migration completed successfully for ‘XXXXXXXX-YYYY-ZZZZ-QQQQ-DC12222DE1’ in 66.581 seconds.
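If you collect these messages for a number of migrations, a few lines of Python can pull out the timings so you can compare before and after the move to 10Gbps. This is only a sketch. The regular expressions are derived from the three sample messages above and assume that exact English wording. The function name and the dictionary keys are made up for illustration, and how you export the messages from the event log is left open.

```python
import re

# Patterns derived from the three sample messages above (an assumption:
# other Windows versions or languages may word these events differently).
NUMBER = r"\d+(?:\.\d+)?"
BROWN_OUT = re.compile(rf"Brown-Out time of (?P<seconds>{NUMBER}) seconds")
BLACK_OUT = re.compile(
    rf"Black-Out time of (?P<seconds>{NUMBER}) seconds, "
    r"(?P<pages>\d+) dirty page and (?P<kb>\d+) KB of saved state"
)
COMPLETED = re.compile(
    rf"Migration completed successfully for .+ in (?P<seconds>{NUMBER}) seconds"
)


def summarize_migration(messages):
    """Collect the timings found in the event messages of one live migration."""
    summary = {}
    for message in messages:
        if (m := BROWN_OUT.search(message)):
            summary["brown_out_s"] = float(m["seconds"])
        elif (m := BLACK_OUT.search(message)):
            summary["black_out_s"] = float(m["seconds"])
            summary["dirty_pages"] = int(m["pages"])
            summary["saved_state_kb"] = int(m["kb"])
        elif (m := COMPLETED.search(message)):
            summary["total_s"] = float(m["seconds"])
    return summary


# The sample messages from this post, with the VM name anonymized as above.
SAMPLE_MESSAGES = [
    "'XXXXXXXX-YYYY-ZZZZ-QQQQ-DC12222DE1' migrated with a Brown-Out time of 64.975 seconds.",
    "'XXXXXXXX-YYYY-ZZZZ-QQQQ-DC12222DE1' migrated with a Black-Out time of 0.811 seconds, "
    "842 dirty page and 4841 KB of saved state.",
    "Migration completed successfully for 'XXXXXXXX-YYYY-ZZZZ-QQQQ-DC12222DE1' in 66.581 seconds.",
]

if __name__ == "__main__":
    summary = summarize_migration(SAMPLE_MESSAGES)
    share = summary["brown_out_s"] / summary["total_s"]
    print(summary)
    print(f"Brown-Out share of total migration time: {share:.1%}")
```

For the sample above the Brown-Out phase accounts for well over 90% of the total migration time, which is why the extra bandwidth pays off there.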

In these 10Gbps efforts I also care about high availability, but not when that would mean sacrificing performance because I need to keep costs down and perhaps use approaches that are only really economical in large environments. The scenarios I’m dealing with are not about large hosting environments or cloud providers. We’re talking about providing the best network performance to some Hyper-V clusters that will be running SQL Server, for example, or other high resource applications. These are relatively small environments compared to hosting and cloud providers. The economics and the needs are very different. As a small example of this: saving ten thousand switch ports means that you’ll save 500 times the price of a switch. To them that matters a lot more, not just in volume but also in relation to the other costs. They’re probably running services with an architecture that survives losing servers and doesn’t require clustering. It all runs on cheap hardware with high energy efficiency, as they don’t care about losing nodes when the service has been designed with that in mind. Economies of scale are what they are all about. They’d go broke building all that on highly redundant hardware and fail at achieving their needs. But most of us don’t work in such an environment.

I would also like to remind you that high availability introduces complexity. And complexity that you can’t manage will sink your high availability faster than a torpedo amidships sinks a cruiser. So know what you do, why and when to do it. One final piece of advice: TEST!

So to conclude this part, take note that I’m not discussing the design of a “fast track” setup that I’ll resell for all kinds of environments, for which I’d need a very cost effective rinse & repeat solution in Small, Medium & Large varieties with all bases covered. I’m not saying those aren’t good or valuable, far from it; a lot of people will benefit from those, but I’m serving other needs. If you wonder why they want to virtualize the applications at all, it has to do with disaster recovery & business continuity and replicating the environment to a remote site.

I intend to follow up on this in future blog posts when I have more information and some time to write it all up.

WDeployConfigWriter Account Issues – Trouble Shooting Web Deploy 2.0 With Lessons Learned


Here’s a small recap of a trouble shooting incident we dealt with recently that also served as a coaching exercise in trouble shooting. It seems we have Web Deploy 2.0 in use for in house deployments of web apps. It seems to be a valued asset as well. At least valuable enough to land a help request on the desk of one of the young, eager, smart and upwardly mobile IT Professionals when it stops working and they need some assistance.

Hello ICT,

To deploy our websites remotely we use the web deployment service (see http://technet.microsoft.com/en-us/library/dd569087(WS.10).aspx for more info).

This service runs under the network service account by default. Deploying fails now. In the security log on the server I find  "The specified account’s password has expired".

Does anyone know the password of this account?

Best regards,

Hardworking Web Guy In Trouble

Basically we have enough information to know something went wrong and that they need it to work again. But that’s about it. Password for the network service account expired? They also included an error log, and reading it teaches us something. The lesson to be learned here: investigate yourself, read the logs and interpret them. Don’t let patients give you a diagnosis. Their input is critical, but you need to draw your own conclusions.

An account failed to log on.

Subject:
    Security ID:        LOCAL SERVICE
    Account Name:        LOCAL SERVICE
    Account Domain:        NT AUTHORITY
    Logon ID:        0x3e5

Logon Type:            8

Account For Which Logon Failed:
    Security ID:        NULL SID
    Account Name:        WDeployConfigWriter
    Account Domain:        lab.test

Failure Information:
    Failure Reason:        The specified account’s password has expired.
    Status:            0xc000006e
    Sub Status:        0xc0000071

Process Information:
    Caller Process ID:    0x1f44
    Caller Process Name:    C:\Windows\System32\inetsrv\WMSvc.exe

What did we just read and learn? No, it’s not the Network Service account whose password has expired. That doesn’t happen; it doesn’t work that way … so that was our first indication that something in the support ticket isn’t quite right. As you can see, the real problem account is mentioned in the error log: WDeployConfigWriter. That account is indeed a local account.

[Screenshot: WDeploy accounts]

Cool, now we check what service runs under that account by looking in the Services panel … none! The easy way to check is to sort on the "Log On As" column. You won’t find WDeployConfigWriter. Right …, what else do we learn from the Services panel? Well, we do have a service called Web Deployment Agent Service running under the local Network Service account. We can stop and start it just fine, so there is nothing wrong with the Network Service account, which is as expected, and this service is not our culprit. What we also learn is that this is Web Deploy 2.0.

[Screenshot: the Web Deployment Agent Service in the Services panel]

So the Web Deployment Agent Service has nothing to do with the problem at hand. Where is that WDeployConfigWriter account being used and what is its status? Let’s take a look.

[Screenshot: WDeployConfigWriter account settings]

Hey, how could this account’s password have expired? This is impossible. Unless they changed the settings while trying to fix the error. We check this with a quick phone call and yes, they did exactly that. The good thing is that this web guy is professional and tells us what they did. Some people think this might get them into trouble and won’t do that. It doesn’t change anything, things are what they are, but it does make communication less easy when you discover people act that way… So the lesson here is to double check & verify what happened if at all possible. Originally the settings were:

[Screenshot: original WDeployConfigWriter account settings]

They changed them after they ran into issues, hoping that checking those options might fix it. Well no, expired is expired and you can’t fix it like that. You do indeed need to correct the settings if you don’t want the password to expire, and perhaps even prevent the user from changing it, but you also need to set a new password when it has already expired. After doing so we contact the hardworking web guy in trouble to let him test, and we predict a new error: whatever runs under that account will now fail to run due to an incorrect password. And guess what? “Unknown user name or bad password” in the security log.

Log Name:      Security
Source:        Microsoft-Windows-Security-Auditing
Date:          24/06/2011 10:30:39
Event ID:      4625
Task Category: Logon
Level:         Information
Keywords:      Audit Failure
User:          N/A
Computer:     server1.lab.test
Description:
An account failed to log on.

Subject:
    Security ID:        LOCAL SERVICE
    Account Name:        LOCAL SERVICE
    Account Domain:        NT AUTHORITY
    Logon ID:        0x3e5

Logon Type:            8

Account For Which Logon Failed:
    Security ID:        NULL SID
    Account Name:        WDeployConfigWriter
    Account Domain:        lab.test

Failure Information:
    Failure Reason:        Unknown user name or bad password.
    Status:            0xc000006d
    Sub Status:        0xc000006a

Process Information:
    Caller Process ID:    0x1f44
    Caller Process Name:    C:\Windows\System32\inetsrv\WMSvc.exe
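
The Status and Sub Status values in these two events are what separate “password expired” from “bad password”. As a rough sketch (not part of the original trouble shooting), the Python below pulls the failed account and both codes out of an event description. The four names in the lookup table are the standard NTSTATUS names for these values. The function name, the trimmed sample text and the parsing approach are assumptions for illustration only.

```python
import re

# Standard NTSTATUS names for the codes seen in the two events above.
NTSTATUS_NAMES = {
    0xC000006A: "STATUS_WRONG_PASSWORD",
    0xC000006D: "STATUS_LOGON_FAILURE",
    0xC000006E: "STATUS_ACCOUNT_RESTRICTION",
    0xC0000071: "STATUS_PASSWORD_EXPIRED",
}


def _status_name(code_text):
    """Translate a hex code such as '0xc0000071' into its NTSTATUS name."""
    code = int(code_text, 16)
    return NTSTATUS_NAMES.get(code, f"unknown (0x{code:08x})")


def parse_failed_logon(event_text):
    """Extract the failing account and status codes from a failed-logon event text."""
    marker = "Account For Which Logon Failed:"
    if marker not in event_text:
        raise ValueError("not a failed-logon event description")
    # Skip the Subject block, which has its own 'Account Name' line.
    failed_part = event_text.split(marker, 1)[1]
    account = re.search(r"Account Name:\s*(\S+)", failed_part)
    # Anchor on line start so 'Sub Status:' is not mistaken for 'Status:'.
    status = re.search(r"^\s*Status:\s*(0x[0-9a-fA-F]+)", failed_part, re.MULTILINE)
    sub_status = re.search(r"Sub Status:\s*(0x[0-9a-fA-F]+)", failed_part)
    if not (account and status and sub_status):
        raise ValueError("event description is missing expected fields")
    return {
        "account": account.group(1),
        "status": _status_name(status.group(1)),
        "sub_status": _status_name(sub_status.group(1)),
    }


# Trimmed copy of the first event in this post (hypothetical sample input).
SAMPLE_EVENT = """\
Subject:
    Account Name:        LOCAL SERVICE

Account For Which Logon Failed:
    Account Name:        WDeployConfigWriter
    Account Domain:        lab.test

Failure Information:
    Failure Reason:        The specified account's password has expired.
    Status:            0xc000006e
    Sub Status:        0xc0000071
"""

if __name__ == "__main__":
    print(parse_failed_logon(SAMPLE_EVENT))
```

Splitting on the “Account For Which Logon Failed” header first matters here, because the Subject block also has an Account Name line (LOCAL SERVICE) that is not the account we are after.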

 

The user wants to do a repair install, or uninstall and reinstall the application, to “get a quick fix”, but we do not give in and keep trouble shooting. It’s better to learn what the cause really is and how to fix it instead of relying on wishful reinstalling.

So where is the thing that runs under that account? We start a quick search in the registry and on the file system for the account name, just in case it’s configured in the registry or a configuration file, and let it run while we keep investigating. We also send a tweet into the universe, as perhaps someone out there knows this and can help out. We search the internet for Web Deploy 2.0 and WDeployConfigWriter. This results in very few hits, hmmm, interesting …. One of them is http://blogs.iis.net/msdeploy/archive/2011/04/05/announcing-web-deploy-2-0-refresh.aspx

There we learn a few things; the most important is the one line from that blog post shown right below (I formatted it in bold and red in the snippet). I also enlarged the picture from the blog post to make it readable, so you can see where in IIS to find what we learned here:

Notice that Web Deploy setup created two new local user accounts:

- WDeployConfigWriter, which has Write permissions to the IIS server’s applicationHost.config. This is used by delegation rules for createApp, appPoolNetFx and appPoolPipelineMode.

I’ve included the entire block of text from where this was taken below.

1. Easier setup for non-administrator deployments on IIS7

One of the common requests from our users was to make it easier to setup Web Deploy so non-administrators can publish to their sites. Typically, you will need to do this if you are running a shared hosting environment or if you are administering a build machine and you do not want users to have admin access.

If you launch the Web Deploy installer and choose “Custom”, you will notice a new option, “Configure for Non-administrator Deployments”:

[Screenshot: the “Configure for Non-administrator Deployments” option in the Web Deploy installer]

If you choose this option, Web Deploy will automatically create Management Service Delegation rules for the following providers, as well as the user accounts needed for providers like createApp and recycleApp that need elevated privileges.

These are the rules you will have in the Management Service Delegation UI in IIS Manager after you install this component:

Notice that Web Deploy setup created two new local user accounts:

- WDeployConfigWriter, which has Write permissions to the IIS server’s applicationHost.config. This is used by delegation rules for createApp, appPoolNetFx and appPoolPipelineMode.

- WDeployAdmin, which is an administrator. This is used by delegation rules for recycleApp.

If you prefer to create these rules by hand, uncheck the component in the installer. We also provide a PowerShell script for creating delegation rules (more on this later in the post) if you prefer that route.

Well, armed with this information, we go have a look at the Management Service Delegation:

[Screenshot: Management Service Delegation in IIS Manager]

Where we indeed find createApp, appPoolNetFx and appPoolPipelineMode:

[Screenshot: the Web Deploy rules in Management Service Delegation]

So now we take a look at what we can configure here and, sure enough, double clicking on a rule opens the Edit Rule form:

[Screenshot: the Edit Rule form]

So we click on Edit security credentials and are welcomed by this form:

[Screenshot: the security credentials form]

So we enter the account name and the new password we set before (remember to do this for both providers):

[Screenshot: the security credentials form with the account name and new password entered]

Guess what, end user happy, things are working again. Yay! From service down report at the helpdesk to fully operational again in less than an hour, with a technology new to the service desk. Well done, young, eager, smart and upwardly mobile IT Pro ;-) with lessons learned.

How did this happen, and how did they end up with this funky configuration (an expiring password on an account that no one knows the purpose of or where it is configured)? Aha, operational control => know the configuration of what you use, know why it is configured that way and know where it’s configured. Is it a mistake/assumption in the installer that the accounts WDeployConfigWriter and WDeployAdmin have their passwords set to expire and that these can be changed by the user, or did somebody mess with them after the install? Well, I did the test by setting it up on a test server and found that they are indeed installed with their passwords set to expire and that the password can be changed by the user. It assumes that the person doing the install knows and realizes the implications. I’m not saying either setting is wrong, but you should know why, when and where. There is no documentation on this as far as we could find right now, and perhaps the installer should mention the benefits/risks of both types of configuration and ask what to choose. This, together with better documentation, could help prevent this issue. As always, no guarantees given ;-)

Overall lesson: don’t assume things, trust but verify …

Hotfixes For Hyper-V & Failover Clustering Can Be Confusing KB2496089 & KB2521348


As I’m building or extending a number of Hyper-V clusters in the next 4 months, I’m gathering/updating my list of the Windows 2008 R2 SP1 hotfixes relating to Hyper-V and Failover Clustering. Microsoft has published KB2545685: Recommended hotfixes and updates for Windows Server 2008 R2 SP1 Failover Clusters, but that list is not kept up to date; the two hotfixes it mentions are in the list below. I also intend to update my list for Windows Server 2008 SP2 and Windows 2008 R2 RTM, as I will run into these and it’s nice to have a quick reference list.

I’ll include my current list below. Some of these fixes are purely related to Hyper-V, some to a combination of Hyper-V and clusters, some only to clustering and some to Windows in general. But they are all ones that will bite you when running Hyper-V (in a failover cluster or stand alone). Now for the fun part with some hotfixes I’ll address in this blog post: confusion :-) Take a look at the hotfixes marked * and ** in the list (purple and green text in the original post) and the discussion below. Are there any others like this I don’t know about?

* KB2496089 is included in SP1 according to “Updates in Win7 and WS08R2 SP1.xls”, which can be downloaded here (http://www.microsoft.com/download/en/details.aspx?displaylang=en&id=269), but the Dutch language KB article states it applies to W2K8R2SP1: http://support.microsoft.com/kb/2496089/nl

(Dutch KB article, translated:)

Article ID: 2498472 – Last review: Tuesday, February 10, 2011 – Revision: 1.0

Prerequisites

To apply this hotfix, you must be running one of the following operating systems:

  • Windows Server 2008 R2
  • Service Pack 1 (SP1) for Windows Server 2008 R2

For all supported x64-based versions of Windows Server 2008 R2

File name   File version     File size   Date          Time    Platform
Vmms.exe    6.1.7600.20881   4,507,648   15-Jan-2011   04:10   x64
Vmms.exe    6.1.7601.21642   4,626,944   15-Jan-2011   04:05   x64

When you try to install the hotfix on SP1 it will install. So is it really in there? Compare file versions! Well, after installing the hotfix on a W2K8R2 SP1 Hyper-V server the version of vmms.exe was 6.1.7601.21642, and on a Hyper-V server with just SP1 it was 6.1.7601.17514. By the way, these are English versions of the OS, no language packs installed.
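
Comparing dotted file versions by eye is error prone, so here is a minimal Python sketch of the comparison. It assumes the versions have been read off the file properties by hand. The labels and helper names are made up for illustration; the version numbers are the vmms.exe versions quoted in this post, with 6.1.7601.17514 being the SP1 build.

```python
def parse_version(text):
    """Turn a dotted file version such as '6.1.7601.21642' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


# vmms.exe file versions quoted in this post (labels are illustrative).
VMMS_VERSIONS = {
    "KB2496089 (RTM branch)": "6.1.7600.20881",
    "SP1 without hotfix": "6.1.7601.17514",
    "SP1 with KB2496089": "6.1.7601.21642",
}


def newest(versions):
    """Return the label whose version is the highest."""
    return max(versions, key=lambda label: parse_version(versions[label]))


if __name__ == "__main__":
    for label, version in sorted(VMMS_VERSIONS.items(), key=lambda kv: parse_version(kv[1])):
        print(f"{version:<16} {label}")
    print("Newest:", newest(VMMS_VERSIONS))
```

The versions are compared as tuples of integers rather than as strings, because a plain string comparison would rank a hypothetical 6.1.7601.9999 above 6.1.7601.21642.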

[Screenshot: with hotfix installed on SP1]

[Screenshot: without hotfix installed on SP1]

To make matters even more confusing, the Dutch KB article states it applies to both W2K8R2 RTM and W2K8R2SP1, but the English version of the article has been modified and now only mentions W2K8R2 RTM.

http://support.microsoft.com/kb/2496089/en-us

Article ID: 2496089 – Last Review: February 23, 2011 – Revision: 2.0

For all supported x64-based versions of Windows Server 2008 R2

File name   File version     File size   Date          Time    Platform
Vmms.exe    6.1.7600.20881   4,507,648   15-Jan-2011   04:10   x64

So what gives? Has SP1 for W2K8R2 been updated with the fix included and did the SP1 version I installed (the official one right after it went RTM) in the lab not yet include it? Do the service packs differ by language, i.e. only the English one got updated? Sigh :-/ Now for the good news: ** It’s all very academic because of KB 2521348, A virtual machine online backup fails in Windows Server 2008 R2 when the SAN policy is set to “Offline All”, which brings the vmms.exe version to 6.1.7601.21686; this hotfix supersedes KB2496089 :-) See http://blogs.technet.com/b/yongrhee/archive/2011/05/22/list-of-hyper-v-windows-server-2008-r2-sp1-hotfixes.aspx where this is explicitly mentioned.

Ramazan Can mentions hotfix 2496089 and whether it is included in SP1 in the comments on his blog post http://ramazancan.wordpress.com/2011/06/14/post-sp1-hotfixes-for-windows-2008-r2-sp1-with-failover-clustering-and-hyper-v/, but I’m not very convinced it is indeed included. The machines I tested on are W2K8R2 English RTM updated to SP1, not installations from media including SP1, so perhaps there could also be a difference. It also should not matter whether you install SP1 before adding the Hyper-V role, so that can’t be the cause.

Anyway, keep your systems up to date and running smoothly, but treat your Hyper-V clusters with all due care and attention.

  1. KB2277904: You cannot access an MPIO-controlled storage device in Windows Server 2008 R2 (SP1) after you send the “IOCTL_MPIO_PASS_THROUGH_PATH_DIRECT” control code that has an invalid MPIO path ID
  2. KB2519736: Stop error message in Windows Server 2008 R2 SP1 or in Windows 7 SP1: “STOP: 0x0000007F”
  3. KB2496089: The Hyper-V Virtual Machine Management service stops responding intermittently when the service is stopped in Windows Server 2008 R2
  4. KB2485986: An update is available for Hyper-V Best Practices Analyzer for Windows Server 2008 R2 (SP1)
  5. KB2494162: The Cluster service stops unexpectedly on a Windows Server 2008 R2 (SP1) failover cluster node when you perform multiple backup operations in parallel on a cluster shared volume
  6. KB2496089: The Hyper-V Virtual Machine Management service stops responding intermittently when the service is stopped in Windows Server 2008 R2 (SP1)*
  7. KB2521348: A virtual machine online backup fails in Windows Server 2008 R2 (SP1) when the SAN policy is set to “Offline All”**
  8. KB2531907: Validate SCSI Device Vital Product Data (VPD) test fails after you install Windows Server 2008 R2 SP1
  9. KB2462576: The NFS share cannot be brought online in Windows Server 2008 R2 when you try to create the NFS share as a cluster resource on a third-party storage disk
  10. KB2501763: Read-only pass-through disk after you add the disk to a highly available VM in a Windows Server 2008 R2 SP1 failover cluster
  11. KB2520235: “0x0000009E” Stop error when you add an extra storage disk to a failover cluster in Windows Server 2008 R2 (SP1)
  12. KB2460971: MPIO failover fails on a computer that is running Windows Server 2008 R2 (SP1)
  13. KB2511962: “0x000000D1” Stop error occurs in the Mpio.sys driver in Windows Server 2008 R2 (SP1)
  14. KB2494036: A hotfix is available to let you configure a cluster node that does not have quorum votes in Windows Server 2008 and in Windows Server 2008 R2 (SP1)
  15. KB2519946: Timeout Detection and Recovery (TDR) randomly occurs in a virtual machine that uses the RemoteFX feature in Windows Server 2008 R2 (SP1)
  16. KB2512715: Validate Operating System Installation Option test may identify Windows Server 2008 R2 Server Core installation type incorrectly in Windows Server 2008 R2 (SP1)
  17. KB2523676: GPU is not accessed leads to some VMs that use the RemoteFX feature to not start in Windows Server 2008 R2 SP1
  18. KB2533362: Hyper-V settings hang after installing RemoteFX on Windows 2008 R2 SP1
  19. KB2529956: Windows Server 2008 R2 (SP1) installation may hang if more than 64 logical processors are active
  20. KB2545227: Event ID 10 is logged in the Application log after you install Service Pack 1 for Windows 7 or Windows Server 2008 R2
  21. KB2517329: Performance decreases in Windows Server 2008 R2 (SP1) when the Hyper-V role is installed on a computer that uses Intel Westmere or Sandy Bridge processors
  22. KB2532917: Hyper-V Virtual Machines Exhibit Slow Startup and Shutdown
  23. KB2494016: Stop error 0x0000007a occurs on a virtual machine that is running on a Windows Server 2008 R2-based failover cluster with a cluster shared volume, and the state of the CSV is switched to redirected access
  24. KB2263829: The network connection of a running Hyper-V virtual machine may be lost under heavy outgoing network traffic on a computer that is running Windows Server 2008 R2 SP1
  25. KB2406705: Some I/O requests to a storage device fail on a fault-tolerant system that is running Windows Server 2008 or Windows Server 2008 R2 (SP1) when you perform a surprise removal of one path to the storage device
  26. KB2522766: The MPIO driver fails over all paths incorrectly when a transient single failure occurs in Windows Server 2008 or in Windows Server 2008 R2

KB Article 2522766 & KB Article 2135160 Published Today


At this moment in time I don’t have any more Hyper-V clusters to support that are below Windows Server 2008 R2 SP1. That’s good, as I only have one list of patches to keep up to date for my own use. As for you guys still taking care of Windows 2008 R2 RTM Hyper-V clusters, you might want to take a look at KB article 2135160, FIX: "0x0000009E" Stop error when you host Hyper-V virtual machines in a Windows Server 2008 R2-based failover cluster, that was released today. The issue however is (yet again) an underlying C-State issue that has already been fixed in relation to another issue, published as KB article 983460, Startup takes a long time on a Windows 7 or Windows Server 2008 R2-based computer that has an Intel Nehalem-EX CPU installed.

And for both Windows Server 2008 R2 RTM and SP1 you might take a look at an MPIO issue that was also published today (you are running Hyper-V on a cluster and you are using MPIO for redundant storage access, I bet): KB article 2522766, The MPIO driver fails over all paths incorrectly when a transient single failure occurs in Windows Server 2008 or in Windows Server 2008 R2.

It’s time I add a page to this blog for all the fixes related to Hyper-V and Failover Clustering with Windows Server 2008 R2 SP1 for my own reference :-)

Hyper-V Is Right Up There In Gartner’s Magic Quadrant for x86 Server Virtualization Infrastructure


So how do you like THEM apples? 

Well, take a look at this, people: on June 30th Gartner published the Magic Quadrant for x86 Server Virtualization Infrastructure.


Figure 1: Magic Quadrant for x86 Server Virtualization Infrastructure (Source: Gartner 2011)

That’s not a bad spot if you ask me. And before the “they paid their way up there” remarks flow in: Gartner works how Gartner works, and it works like that for everyone (read: “the other vendors” on there), so that remark could fly right back into your face if you’re not careful. To get there in 3 years’ time is not a bad track record. And if you believe some of the people out there, this can’t be true. Before Hyper-V was available they only had Virtual Server to offer, and I was not using that anywhere. No, not even for non critical production and testing, as the lack of x64 VM support made it a “no go” product for me. So the success of Hyper-V is quite an achievement. Back in 2008 I did go with Hyper-V as a highly available virtualization solution, after having tested and evaluated it during the Beta & Release Candidate time frame. Some people thought I was making a mistake.

But the features in Hyper-V were “good enough” for most needs I had to deal with, and yes, I knew VMware had a richer offering and was proven technology, something people never forgot to mention to me for some reason. I guess they wanted to make sure I hadn’t been living under a rock the last couple of years. They never mentioned the cost and some trends, however, or looked at the customers’ real needs. Hyper-V was a lot better than what most environments I had to serve had running at the time. In 2008 those people I needed to help were using VMware Server or Virtual Server. Both were/are free, but for anything more than lightweight applications on the “not that important” list they are not suitable. If you’re going to do virtualization structurally you need high availability to avoid the risks associated with putting all your eggs in one basket. However, as you might have guessed, these people did not use ESX. Why? In all honesty, the cost associated.

In the 2005-2007 time frame servers were not yet at the cost/performance ratio they reached in 2008, and a far cry from where they are now. Those organizations didn’t do server virtualization because of the cost, both in licensing fees for functionality and in hardware procurement. It just didn’t fit in yet. Then the hardware cost barrier came down, and with Hyper-V 1.0 we got a hypervisor that we knew could deliver something good enough to get the job done at a cost they could cover. We also knew that Live Migration and Dynamic Memory were in the pipeline and that the product would only become better. Having tested Hyper-V, I knew I had a technology to work with at a very reasonable price (or even for free), and that included high availability.

Combine this with the notion at the time that hypervisors are becoming commodities and that people are announcing the era of the cloud. Where do you think the money needs to go? Management & Applications. Where did Microsoft help with that? The System Center suite: System Center Virtual Machine Manager and Operations Manager. Are those perfect in their current incarnations? Nope. But have you looked at SCVMM 2012 Beta? Do you follow the buzz around Hyper-V 3.0 or vNext? Take a peek and you’ll know where this is going. Think private & hybrid cloud. The beef of the MS stack lies in the hypervisor & management combination: management tools and integration capability to help with application delivery, and hence with the delivery of services to the business.

Even if you have no desire or need for the public cloud, do take a look. Having a private cloud capability enhances your internal service delivery. Think of it as “Dynamic IT on Steroids”. Having a private cloud is a prerequisite for having a hybrid cloud, which aids in the use of the public cloud when that time comes for your organization. And if it never does, no problem: you have gotten the best internal environment possible, with no money or time lost.
See my blog for more Private Clouds, Hybrid Clouds & Public Clouds musings on this.

Are Hyper-V and System Center the perfect solution for everyone in every case? No sir. No single product or product stack can be everything to everyone. The entire VMware versus Hyper-V mud-slinging contests are at best amusing when you have the time and are in the mood for it. Most of the time I’m not playing that game. The consultant’s answer is correct: “It depends”. Very few people know all virtualization products very well and have equal experience with them. But when you’re looking to use virtualization to carry your business into the future you should have a look at the Microsoft stack and see if it can help you. True objectivity is very hard. We all have our preferences and monetary incentives, and there are always those who’ll take it to extreme levels. There are still people out there claiming you need to reboot a Windows server daily and that they have BSODs all over the place. If that is really the case they should not be blaming the technology. If the technology were that bad they would not need to try and convince people not to use it; people would run away from it by themselves and I would be asking you if you want fries with your burger. Things go “boink” sometimes with any technology. Really, you’d think it was created by humans, go figure. At BriForum 2011 in London this year it was confirmed that more and more we’re seeing multiple hypervisors in use at large to medium organizations. That means there is a need for different solutions in different areas, and Hyper-V was doing particularly well in green field scenarios.

Am I happy with the choices I made? Yes. We’re getting ready to do some more Hyper-V projects, and those plans even include SCVMM 2012 & SCOM 2012 together with an upgrade path to Hyper-V vNext. I mailed the Gartner link to my manager, pointing out that my obstinate choice back then turned out rather well ;-).

Free Support Rant


<rant>

I blog and help out in newsgroups because I like to share ideas and solutions and to help out when and where I can. I’m active on Twitter because I enjoy the discussions, the thinking out loud and the reflection we all get from just throwing ideas, conclusions, opinions, experiences and knowledge into a pool of diverse but very skilled, passionate IT Professionals and Developers.

It is not always easy to share information. The potential complexity of environments that may well have other issues and restrictions, in combination with the vast number of possible configurations and designs, both valid and ill-advised, makes it nearly impossible to cover all eventualities. If one of my blog posts does not contain the answer to your specific problem or does not apply to your particular situation, do not complain & moan about it, let alone demand that I come up with a solution. What is written here are bits and pieces of information which I choose to share because I think they have some value and can help other people out. I do this in my own time. Really, I am not paid to blog, research technologies or build labs. I do this out of my own interest and because I enjoy it and it has value to me in my own work. I work a lot of hours “for a boss” and those are not always the most esoteric. When you read my “About” page you’ll read the following:

I’m still in the trenches with my boys and gals. Empty suits or hollow bunnies are neither wanted nor needed. In IT you live by the sword and you die by the sword. There is no hiding when you mess up, all our mistakes are in plain sight of everyone using what we build.

That is my reality and I live by it. Perhaps others should try this. I’ve seen too many ICT “gods” come down from heaven for a short while, pushing their latest religion or product and loudly proclaiming it is the truth and the only way forward. Failure to achieve success is always due to a lack of faith among us subjects, our (at best) mediocre skills, or because we have to wait and see the benefits much later in time, but we need to keep the faith. When the shit hits the fan those gods are back on Olympus, pushing daggers into the backs of us infidels who couldn’t make it work. No thank you. I think the people I work with know the strengths and weaknesses of both myself and my solutions. I have, however, never ever left them out in the cold when something didn’t work out as planned or when things failed. Yes, eventually things, big and small, do fail. How you try to prevent that as much as possible and how you deal with it when it happens is what makes a huge difference. That’s where my professional responsibilities lie, not with some Microsoft-bashing, impolite wannabe who thinks insulting me is a good approach to getting me to solve their issues with a Microsoft product. You know the type: they open a pack of “M$ Sucks Quick Mix” to try and get some “Instant credibility” and fail miserably. They even fail at asking for help.

I am not your free support desk, your dedicated Microsoft technology research engineer or your troubleshooter. I’m an IT Pro with a busy job. I think certain people out there need to learn that you can catch more flies with honey than with vinegar. Don’t be a “jerk”.

</rant>

Follow Up on Power Options for Performance When Virtualizing


So some people asked where they can find and configure those power settings we were talking about in a previous blog post, Consider CPU Power Optimization Versus Performance When Virtualizing. In this blog entry I’ll do a quick run-through of this. As I can get my hands on some DELL servers from two different generations (G10/G11), the screenshots are of those servers.

Let’s first look at CPU-Z screenshots from a DELL PE2950 III, where we see two different P-States. So here we see the CPU fluctuating between power states. This CPU supports SpeedStep but not Turbo Boost, for example.

[Screenshots: CPU-Z on a DELL PE2950 III showing two different P-States]

By default, SpeedStep is normally enabled in the BIOS and Windows Server 2008 R2 has the “Balanced” power plan as its default. So it shows up something like this.

[Screenshot: Windows Power Options with the “Balanced” power plan selected]
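If you'd rather check the active plan from a script than from the GUI, the line that `powercfg -getactivescheme` prints can be parsed. What follows is only a minimal sketch: it assumes English output of the form `Power Scheme GUID: <guid>  (<name>)`, the function and variable names are made up for illustration, and the sample line is typed in by hand rather than captured from a real server.

```python
import re

# Well-known GUIDs of the three built-in Windows power plans.
BUILT_IN_PLANS = {
    "381b4222-f694-41f0-9685-ff5bb260df2e": "Balanced",
    "8c5e7fda-e8bf-4a96-9a85-a6e23a8c635c": "High performance",
    "a1841308-3541-4fab-bc81-f71556f20b4a": "Power saver",
}

# Assumed output shape (English Windows): "Power Scheme GUID: <guid>  (<name>)"
_SCHEME_LINE = re.compile(
    r"Power Scheme GUID:\s*([0-9a-fA-F-]{36})\s*\((.+?)\)"
)


def parse_active_scheme(output):
    """Return (guid, name) parsed from powercfg output, or None if no match."""
    match = _SCHEME_LINE.search(output)
    if match is None:
        return None
    guid = match.group(1).lower()
    # Prefer the built-in name for known GUIDs; fall back to the reported name.
    return guid, BUILT_IN_PLANS.get(guid, match.group(2))


# Illustrative sample line, not captured from a real server.
sample = "Power Scheme GUID: 381b4222-f694-41f0-9685-ff5bb260df2e  (Balanced)"
print(parse_active_scheme(sample))
```

On a real server you would feed the function the text returned by running powercfg yourself; on a localized Windows the label text will probably differ, so treat the pattern as a starting point.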

 

This means you can play around and set the power plan in Windows. So far so good. Naturally, when your CPU doesn’t support fancy power management there’s not much Windows can do for you on that front. Depending on the CPU you can also enable features like C-States (core parking), P-States (SpeedStep) and Turbo Boost in the BIOS. Where exactly you find them and what they are called depends a bit on the hardware/BIOS you’re running and the CPUs that are in there. When you disable all power saving settings in the BIOS or set them for maximum performance you can’t use them in Windows anymore. That’s when you’ll see something like this:

[Screenshot: Power Options greyed out, showing “Balanced”]

So on a Windows Server 2008 R2 machine you’ll note that the Power Options in the GUI are disabled when the BIOS options are set to maximum performance. Note that when you install the Hyper-V role it turns Standby & Hibernation off. There’s no need for those, unless it’s your demo machine/laptop, and then you can turn them back on (see Hibernate and Sleep with Hyper-V Role Enabled). Microsoft does state that P-states (SpeedStep) are supported and can be used, but they need to be enabled in the BIOS for this.

To demonstrate the settings, let’s look at the BIOS of a DELL R710, which looks like what you see in the picture below. You disable SpeedStep by setting the option for CPU Power and Performance Management to “Maximum Performance”. For DELL G11 hardware you can find more information on the available options in the article Best Practices in Power Management. I suggest you search for the documentation for the servers you have at hand to see what advice the vendors offer on these settings and how to set them.

[Screenshot: DELL R710 BIOS power management settings]

Possible values here are:

Static MAX Performance
DBPM Disabled (BIOS will set P-State to MAX)
Memory frequency = Maximum Performance
Fan algorithm = Performance

OS Control
Enable OS DBPM Control (BIOS will expose all possible P states to OS)
Memory frequency = Maximum Performance
Fan algorithm = Power

Active Power Controller
Enable Dell System DBPM (BIOS will not make all P states available to OS)
Memory frequency = Maximum Performance
Fan algorithm = Power

Custom
CPU Power and Performance Management: Maximum Performance | Minimum Power | OS DBPM | System DBPM
Memory Power and Performance Management: Maximum Performance | 1333 MHz | 1067 MHz | 800 MHz | Minimum Power
Fan Algorithm: Performance | Power
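To keep the profiles straight, it can help to write them down as data. The sketch below simply transcribes the three fixed profiles from the list above into a Python dictionary; the dictionary, the `dbpm` key and the helper function are hypothetical names of my own, not anything DELL ships. It reads the list as meaning that only “OS Control” hands the P-states over to Windows, which is an assumption based on the descriptions above. “Custom” is left out because its behaviour depends on the individual settings you pick.

```python
# The fixed R710 power profiles as listed above, keyed by profile name.
# "dbpm" records who manages the CPU P-states under that profile.
R710_PROFILES = {
    "Static MAX Performance": {
        "dbpm": "disabled",  # BIOS pins the P-State to MAX
        "memory": "Maximum Performance",
        "fan": "Performance",
    },
    "OS Control": {
        "dbpm": "os",  # BIOS exposes all P-states to the OS
        "memory": "Maximum Performance",
        "fan": "Power",
    },
    "Active Power Controller": {
        "dbpm": "system",  # BIOS keeps control of the P-states
        "memory": "Maximum Performance",
        "fan": "Power",
    },
}


def windows_plan_has_effect(profile_name):
    """True only when the profile lets the operating system drive P-states."""
    return R710_PROFILES[profile_name]["dbpm"] == "os"


for name in R710_PROFILES:
    print(name, "->", windows_plan_has_effect(name))
```

Under that reading, the Windows power plan only really matters with “OS Control”; with the other two fixed profiles whatever Windows shows is mostly for show.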

 

And since I’m a nice guy: for all you people running somewhat older hardware like a PE2950 III, there it is called “Demand-Based Power Management” under CPU Information, and you actually disable it :-).

[Screenshot: DELL PE2950 III BIOS, “Demand-Based Power Management” under CPU Information]

Now when you’re running Hyper-V and you disabled SpeedStep or “Cool’n’Quiet” you’ll see something like this in the GUI:

[Screenshot: Power Options greyed out, with “Balanced” still shown as the selected plan]

There is nothing to configure so it’s greyed out, but it doesn’t really reflect your intentions. If it annoys you that the greyed-out options don’t reflect what you configured in the BIOS, you can change this using the GUI, or you can use powercfg to make them less “contradictionary”. All you need to do is run the following line from the command prompt: “powercfg -setactive 8c5e7fda-e8bf-4a96-9a85-a6e23a8c635c” …

[Screenshot: running powercfg -setactive from the command prompt]

… and immediately you’ll see the greyed out GUI reflect a bit more what you actually set in the BIOS. Mind you this is cosmetics, but hey, we’re inclined that way by evolution.
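That long GUID is simply the identifier of the built-in “High performance” plan. If you have to do this on more than one host, a small wrapper saves you from pasting GUIDs around. This is a minimal sketch: the helper names are hypothetical, and it only builds the command unless you explicitly ask it to run, which of course only makes sense on Windows.

```python
import subprocess

# Well-known GUIDs of the three built-in Windows power plans.
PLAN_GUIDS = {
    "balanced": "381b4222-f694-41f0-9685-ff5bb260df2e",
    "high performance": "8c5e7fda-e8bf-4a96-9a85-a6e23a8c635c",
    "power saver": "a1841308-3541-4fab-bc81-f71556f20b4a",
}


def build_setactive_command(plan_name):
    """Return the powercfg argument list that activates the named plan."""
    key = plan_name.strip().lower()
    if key not in PLAN_GUIDS:
        raise ValueError("unknown power plan: %r" % plan_name)
    return ["powercfg", "-setactive", PLAN_GUIDS[key]]


def set_active_plan(plan_name, dry_run=True):
    """Activate a plan; with dry_run=True only return the command."""
    command = build_setactive_command(plan_name)
    if not dry_run:
        # Only meaningful on a Windows host where powercfg is available.
        subprocess.run(command, check=True)
    return command


# Dry run: show the command that would be executed.
print(" ".join(set_active_plan("High performance")))
```

The dry run is the default on purpose, so you can eyeball the command before letting it loose on a production host.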

[Screenshot: Power Options greyed out, now showing “High performance”]

As stated above, you can also use the “Change settings that are currently unavailable” link to enable the radio buttons for “High performance”, but do note again that if you didn’t let the operating system control the power management in the BIOS, it’s purely cosmetic.

[Screenshot: Power Options after using “Change settings that are currently unavailable”]

So now, when you think you have this figured out and you’re gazing at CPU-Z to watch the results, you might still see some differences. Aha, well, there is still Turbo Boost (no, not that turbo button on your 1990s PC), seen in the DELL R710 BIOS as Turbo Mode (AMD offers similar functionality in Turbo Core), which we left enabled under “Processor Information” in the BIOS. This means that sometimes, when the CPU can use an extra power boost, it will get it, on top of the full power it now has by default since we configured it for Maximum Performance.

[Screenshot: Turbo Mode setting under “Processor Information” in the DELL R710 BIOS]

So Turbo Mode will sometimes cause CPU-Z to show a higher frequency than your CPU’s specification says it has, as in the left picture below. Without Turbo Boost it looks more like the specs (right picture below).

[Screenshots: CPU-Z with Turbo Mode enabled (left) and without Turbo Boost (right)]
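To make the reasoning explicit: a reading above the rated clock points at Turbo Mode, a reading well below it points at a lower P-State, and anything close to the rated clock is the CPU running at spec. A rough sketch of that comparison follows; the function name, the 25 MHz tolerance and the example readings are all assumptions for illustration, not values taken from the servers above.

```python
def classify_clock(observed_mhz, rated_mhz, tolerance_mhz=25.0):
    """Compare an observed core clock against the CPU's rated clock."""
    if observed_mhz > rated_mhz + tolerance_mhz:
        return "turbo"    # above spec: Turbo Mode is kicking in
    if observed_mhz < rated_mhz - tolerance_mhz:
        return "reduced"  # below spec: a lower P-State is active
    return "rated"        # at spec, give or take measurement jitter


# Hypothetical readings for a CPU rated at 2660 MHz.
for reading in (2793.0, 2660.2, 1596.0):
    print(reading, classify_clock(reading, 2660.0))
```

The tolerance is only there because tools like CPU-Z rarely report the exact rated number; pick whatever margin suits your hardware.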

 

And voila, that was a quick overview of where to see & do what. I don’t have access to more modern HP kit right now, so the BIOS screenshots are from two different generations of DELL servers, but you’ll figure it out for your hardware, I’m sure. I hope this clarified certain things for you all. I know there is a lot more to all this, such as how it works and how many P-States there are, but I’m not a CPU engineer or a hard-core overclocker. I’m just a systems engineer trying to get the most out of his hardware in a realistic way.