Fixing Two Small DELL Compellent Hardware Hiccups


Here are two little tips to solve some small hardware issues you might run into with a Compellent SAN. But first: you’re never on your own with CoPilot support. They are just one phone call away, so if you see either of these minor issues I suggest you give them a call. I speak from experience when I say CoPilot rocks. They are really good and go the extra mile. It’s the best storage support I have ever experienced.

Notes

  • Always notify CoPilot, as they will see the alerts come in and will contact you for sure :-). Afterwards they’ll almost certainly do a quick health check for you. Even better, during the entire process they keep an eye on things to make sure your SAN is doing just fine. And if you’d rather have them tackle this, I’m sure they will send out an engineer.
  • Note that we’re talking about the SC40 controllers & disk bays here. The newer, genuine DELL hardware is better than the Supermicro-based ones.

The audible alert without any issues whatsoever

We kept getting an audible alert on one of the SANs long after we had solved any issues. The system had been checked a couple of times and everything was in perfect working order, except for that audible alarm that just didn’t want to quit. A low-priority issue, I know, but every time we walked into the data center we were going “oh oh” over a false alert. That’s not the kind of conditioning you want. Alerts should only be raised when needed, and then they do need to be acted upon!

Working on this with CoPilot support, we got rid of it by reseating the upper I/O module. You can do this on the fly, without pulling out SAS cables, because the paths are redundant, as long as you do the modules one by one and the cabling is done right (CoPilot can verify that remotely for you if needed).

But we got lucky after the first one. After the “Swap Clear” was requested, every warning condition was cleared and we got rid of the audible alert beep! CoPilot was on the line with us and made sure all paths were up and running so nothing bad could happen. That’s what you have a copilot for.

Front panel display dimming out on a Compellent Disk Bay

We have multiple Compellent SANs, and on one of those we had a disk bay with an info panel that didn’t light up anymore. A silly issue, but an annoying one, as that panel also shows you the disk bay ID.

Do we really need to replace the disk bay to solve this one? As that light had gone on and off a couple of times, it could just be a bad contact, so my colleague decided to take a look. First he removed the protective cover and then, using some short & curved screwdrivers, he took off the body part. The red arrow indicates the little latch that holds the small ribbon cable in place.

That latch was standing wide open. After locking it down, the info appeared on the panel again. The covers were screwed back on and voilà. Solved.

TechNet Top Support Solutions From Microsoft Support Blog


As this year comes to an end, I’d like to draw your attention to Microsoft’s new Top Support Solutions blog on TechNet. It was created as part of their continuous efforts to keep the various technical communities informed about the most relevant answers to the top questions or issues experienced with their products. They identify these top issues by analyzing the questions in their forums and their other support channels.

So if you need to find answers for yourself or your customers, go take a look at the "Top Solutions Content" blog. Chances are you’ll find valuable information about the Microsoft top support solutions for several of their popular products in Server and Tools. It might save you and your clients or manager a lot of time, effort and money. It’s also a great resource to make your colleagues, community, user group or clients aware of.

April 24th–Windows 2003 Is 10 Years Old


I’d like to chime in on a recent blog post by Aidan Finn, “Hey Look–Your Business Is Running On A 10-Year Old Server Operating System (W2003)”. The sad thing is that this is so true, and the “good” thing is that some are even still on Windows 2000 Server, so they’re in even worse shape. Now I realize that not all industries are the same, but keeping your operating systems up to date does have its benefits for all types of companies.

  • Security Improvements
  • Improved, richer, enhanced features
  • New functionality
  • Support for state of the art hardware & software
  • Still being supported on the day the SHTF
  • Future Proofing of your current investments

For one, all of the above will save you time and money. On top of that, it mitigates the risk of lost revenue due to security incidents & unsupported environments no one can fix for you.

Think about it: if you’re running Windows 2000 or Windows Server 2003, chances are you are paying for software to provide functionality that’s available right out of the box in newer versions. You’re also putting in extra effort & jumping through hoops to run those on modern server-grade hardware.

You’re also building up debt. Instead of yearly improvements keeping your infrastructure & services top notch, you’re actively digging an ever bigger, very expensive, complex and high-risk hole that you’ll have to dig yourself out of. If you can, that is. Not a good place to be in. Still think leveraging Software Assurance is a bad thing?

So while way too many companies now have to assign resources to mitigating that looming problem, we’re focusing on other ventures (such as Hyper-V, Azure, Hybrid Cloud, …) and just keep our OS up to date at a steady pace, like before. Well people, that doesn’t happen by accident. We’ve maintained a very healthy pace of upgrading to the most recent version of Windows in our environments, and at times I have had to fight for that, and I will again. But look at our baseline: even if the economy tanks completely, we’re in darn good shape to weather that storm and come out ahead. That’s not going to happen by sitting there avoiding change out of fear or laziness, though. So start today.

A point where I agree with Aidan completely: if your “Zombie ISV” and other vendors are telling you Windows 2003 is great and you shouldn’t use those new, unproven versions of the OS, they are really touting BS. They have fallen so far behind on the technology stack that they need you to stay in their black hole of despair with them or they’ll go broke. Just move on. Trust me, they need you more than the other way around.

Using Host Names in IIS in Combination with a KEMP LoadMaster


At a client, the changeover of a web site from old servers to new ones led to the investigation of an issue with the hardware load balancer. Since that web site is related to an existing surveyors’ solutions suite that already had a KEMP LoadMaster 2200 in use, we figured we’d also use it for the web site and no longer use WNLB (Windows Network Load Balancing).

Now the original web site had multiple DNS entries and host header names defined in IIS (see Configure a Host Header for a Web Site (IIS 7)). Host header names in IIS allow you to host multiple web sites on an IIS server using the same IP address and port. A small added security benefit is that browsing by IP address fails, which means we marginally disrupt some script kiddies & get an extra security checkbox marked during an audit ;-).
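To illustrate how host header names let two sites share one IP address and port, here is a deliberately simplified Python model of the binding lookup. It is a sketch under assumptions, not how IIS or HTTP.sys actually implement it; the `Binding`/`select_site` names, the site labels and the 10.0.0.11 address are all hypothetical.

```python
from typing import List, NamedTuple, Optional


class Binding(NamedTuple):
    ip: str    # "*" stands for "all unassigned" addresses
    port: int
    host: str  # "" means the binding accepts any (or no) Host header
    site: str


def select_site(bindings: List[Binding], local_ip: str, port: int,
                host_header: Optional[str]) -> Optional[str]:
    """Simplified, hypothetical model of host-header based site selection.

    Not IIS's real algorithm: it only shows why a request whose Host header
    matches no host name finds no site when every binding on that IP:port
    carries a host name. IPv6 literals are ignored for brevity.
    """
    host = (host_header or "").split(":")[0].lower()
    candidates = [b for b in bindings if b.port == port and b.ip in (local_ip, "*")]
    # 1. An exact host-name match wins.
    for b in candidates:
        if b.host and b.host.lower() == host:
            return b.site
    # 2. Otherwise fall back to a binding without a host name, if there is one.
    for b in candidates:
        if not b.host:
            return b.site
    return None  # a real server would answer with an error instead


# Hypothetical bindings for the two sites described in this post.
BINDINGS = [
    Binding("*", 80, "ntrip.surveyor.lab", "ntrip"),
    Binding("*", 80, "www.surveyor.lab", "www-redirect"),
]
```

With only host-named bindings, a request by IP address (`select_site(BINDINGS, "10.0.0.11", 80, "10.0.0.11")`) returns `None`, which is the "browsing by IP address fails" effect; adding a binding without a host name would catch such requests again.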

In our example we needed:

Note: The real names have been changed as well as the reasons why as this has some business & historical justifications that don’t matter here.

ntrip.surveyor.lab needs to be handled by the load-balanced web servers in the solution. www.surveyor.lab needs to be redirected to another web server to keep the business happy. However, for political reasons we have to keep the DNS record for www.surveyor.lab pointing to the load-balanced servers, i.e. the LoadMaster VIP.

Without host names, IIS all worked fine until we wanted to use HTTP redirect. As the web site uses the same IP address for both names, we either redirected both of them or neither. To fix this we needed two sites in IIS: the real one hosting ntrip.surveyor.lab and a “fake” one hosting www.surveyor.lab, which we want to redirect. As both are hosted on the same IP address and port on the IIS server, we need to use host names. But then the sites became unavailable.

When checking the LoadMaster configuration, the virtual service for the web servers seemed fine.

Is this a limitation of hardware load balancing or of this specific LoadMaster? Some searching on the internet made it look like I was about the only one on the planet dealing with this issue, so no help there.

Kemp Support Rocks

I already knew this, but this experience reaffirms it: KEMP Technologies really does care about their customers and is very fast & responsive. I threw a quick question to @KempTech on Twitter and they responded very fast with some pointers. After I replied with some more details, they offered to take it on via other means, as Twitter has its limits. OK, no problem. The next morning I got an e-mail from one of their engineers (Ekkehard) with more information and a request for more input from our side. I quickly made a Visio diagram of the current and the desired situation. Based on this, he let me know this should work.

He asked for a copy of the configuration and already pointed to the solution:

And what exactly happens – does the RS turn “red” in the “View/Modify Services” view? That might be caused by the health check settings…
(Remember that a 302 is considered NOT ok, so you had to enter the proper check URL and or / HTTP1.1 hostname)

But at that moment I did not realize this yet. I saw no error, nor the real server turning red to indicate it was down. So we went through the configuration and decided to test without forcing layer 7 to see what happened. This didn’t make a difference, and it wouldn’t really have been a solution if it had, as we needed layer 7 and layer 7 transparency.

Ekkehard also noticed my firmware was getting rather old (don’t fix what isn’t broken :-)) and suggested an upgrade (5.1-24 to 5.1-74). So I upgraded, rebooted and tested some more settings. To make sure I didn’t miss anything, I threw a network sniffer (Wireshark) at the issue. And guess what? As soon as I added a host name to the IIS web site bindings, requests from my client no longer even reached that server. So they were definitely being stopped at the LoadMaster. Without the host name, requests from a client came through perfectly. This was not IIS’s doing, as with a host name nothing came into the server at all. So why would the LoadMaster stop traffic to a real server? Because it considered that server down, that’s why, just as Ekkehard had indicated in one of his mails, but we didn’t see it then.

Better check again, and sure enough, the health check told me the real servers were down. Hey … that’s new. Did the previous firmware not show this, or was it just slower? I can’t say for sure. It’s either me being too impatient, a hiccup, the firmware or premature dementia :-/

Root Cause

So what happens? The default health check uses HTTP 1.0. You can customize it with a path like /owa, but in essence it addresses the real server by its IP address, and guess what: with a host header name in IIS that doesn’t work. It can’t, as IIS would otherwise have no way to figure out which web site you want to reach when you use this feature to run multiple sites on the same IP address and port. So we need to do the health check based on the host name. Can the LoadMaster do that for us? Yes it can!

The fix

You need to enable HTTP 1.1 and fill in the host name you want to use for health checking. In our case that’s ntrip.surveyor.lab. That’s all there is to it. Easy as can be if you know. And Ekkehard knew; he pointed to this in his mail quoted above.
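To make the difference between the two kinds of probe concrete, here is a minimal Python sketch, assuming (as in the LoadMaster manual excerpt quoted in this post) a plain HEAD request for "/". The helper names `build_head_request` and `probe` and the example IP address are hypothetical; this is not KEMP's implementation, only an illustration of what goes over the wire.

```python
import http.client
from typing import Optional


def build_head_request(path: str = "/", host: Optional[str] = None) -> bytes:
    """Return the raw bytes of a HEAD health-check request.

    Without a host name this is an HTTP/1.0 request with no Host header, so a
    server that relies on host-header bindings cannot tell which site is meant.
    With a host name it becomes an HTTP/1.1 request carrying that Host header.
    """
    if host is None:
        lines = [f"HEAD {path} HTTP/1.0"]
    else:
        lines = [f"HEAD {path} HTTP/1.1", f"Host: {host}", "Connection: close"]
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def probe(real_server_ip: str, host: str, path: str = "/",
          port: int = 80, timeout: float = 5.0) -> int:
    """Send an HTTP/1.1 HEAD to the real server's IP, but with the given Host header."""
    conn = http.client.HTTPConnection(real_server_ip, port, timeout=timeout)
    try:
        # Supplying "Host" explicitly stops http.client from adding its own
        # (IP-based) Host header, so the request targets the named site.
        conn.request("HEAD", path, headers={"Host": host})
        return conn.getresponse().status
    finally:
        conn.close()
```

Against a real server, something like `probe("10.0.0.11", "ntrip.surveyor.lab")` (hypothetical IP) should reach the ntrip site's bindings, whereas the HTTP/1.0 form carries nothing IIS can match to a host-named site.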

Lessons Learned

So how did I not know this? Isn’t this documented? Sure enough, on page 56 of the LoadMaster manual it says the following:

7 HTTP: The LoadMaster opens a TCP connection to the Real Server on the Service port (port80). The LoadMaster sends a HTTP/1.0 HEAD request the server, requesting the page “/”. If the server sends a HTTP response with a status code of 2 (200-299, 301, 302, 401) the LoadMaster closes the connection and marks the server as active. If the server fails to respond within the configured response time for the configured number of times or if it responds with a different status code, it is assumed dead. HTTP 1.0 and 1.1 support available, using HTTP 1.1 allows you to check host header enabled web servers.

Typical: you read the exact line of information you need AND only understand it after having figured things out. Linking that information (yes, we always read all manuals completely :-$) to the situation at hand isn’t always a fast process, but I got there in the end with some help from KEMP Technologies.

One suggestion would be to mention this in the handy tips that pop up when you hover over a setting in the LoadMaster console. I rely on these a lot, and a mention of “HTTP 1.1 allows you to check host header enabled web servers” might have helped me out. But it’s not there. A very poor excuse, I know … :-$

Host Header Names & HTTP redirection

After having fixed this issue, I proceeded to configure HTTP redirect in IIS 7.5. For this I used two sites. One was just a fake site tied to the www.surveyors.lab host name in IIS on port 80.

For this site I created an HTTP redirect to www.bussines.lab/surveyors/services. This works just fine as long as you don’t forget the http:// in the redirect URL.

So it has to be http://www.bussines.lab/surveyors/services or you’ll get a funky loop effect looking like this:

http://www.surveyors.lab/www.bussines.lab/surveyors/services/www.bussines.lab/surveyors/services/www.bussines.lab/surveyors/services

Firefox will tell you that you have a redirect loop that will never end, but Internet Explorer doesn’t; it just fails. You do get that URL as a pointer to the cause of the issue, that is, if you can relate the two.

The other was the real site, configured with the following bindings and without redirection.

Don’t forget to do this on all real servers in the farm! The next thing I need to find out is how to health check two host names in the LoadMaster, as I have two web sites with the same IP address and port but different host names.
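Back to the missing http:// in the redirect target: a browser treats a redirect target without a scheme as a relative reference and resolves it against the URL it is currently on. A small Python sketch with the standard `urllib.parse` module shows the first hop of that effect. It assumes IIS passes the configured value through unchanged; it does not model how IIS builds the later hops, which is presumably what makes the URL above keep growing. The helper names are hypothetical.

```python
from urllib.parse import urljoin, urlsplit


def resolve_redirect(current_url: str, target: str) -> str:
    """Resolve a redirect target against the current URL, as a browser does (RFC 3986)."""
    return urljoin(current_url, target)


def is_absolute_http_url(target: str) -> bool:
    """True if the redirect target carries its own scheme and host name."""
    parts = urlsplit(target)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


# Without http:// the target is treated as a path on the *current* site.
print(resolve_redirect("http://www.surveyors.lab/", "www.bussines.lab/surveyors/services"))
# With http:// the browser leaves the current site as intended.
print(resolve_redirect("http://www.surveyors.lab/", "http://www.bussines.lab/surveyors/services"))
```

A check like `is_absolute_http_url()` is a cheap sanity test to run over redirect targets before putting them into IIS.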

WDeployConfigWriter Account Issues – Troubleshooting Web Deploy 2.0 With Lessons Learned


Here’s a small recap of a troubleshooting incident we dealt with recently and that served as a coaching exercise in troubleshooting. It seems we have Web Deploy 2.0 in use for in-house deployments of web apps. It seems to be a valued asset as well, at least valuable enough to land a help request on the desk of one of our young, eager, smart and upwardly mobile IT Professionals when it stops working and they need some assistance.

Hello ICT,

To deploy our websites remotely we use the Web Deployment Service (see http://technet.microsoft.com/en-us/library/dd569087(WS.10).aspx for more info).

This service runs under the Network Service account by default. Deploying fails now. In the security log on the server I find "The specified account’s password has expired".

Does anyone know the password of this account?

Best regards,

Hardworking Web Guy In Trouble

Basically we have enough information to know something went wrong and that they need it to work again. But that’s about it. The password for the Network Service account expired? They also included an error log, and reading it teaches us something. The lesson to be learned here: investigate yourself, read the logs and interpret them. Don’t let patients give you a diagnosis. Their input is critical, but you need to draw your own conclusions.

An account failed to log on.

Subject:
    Security ID:        LOCAL SERVICE
    Account Name:       LOCAL SERVICE
    Account Domain:     NT AUTHORITY
    Logon ID:           0x3e5

Logon Type:             8

Account For Which Logon Failed:
    Security ID:        NULL SID
    Account Name:       WDeployConfigWriter
    Account Domain:     lab.test

Failure Information:
    Failure Reason:     The specified account’s password has expired.
    Status:             0xc000006e
    Sub Status:         0xc0000071

Process Information:
    Caller Process ID:    0x1f44
    Caller Process Name:  C:\Windows\System32\inetsrv\WMSvc.exe

What did we just read and learn? No, it’s not the Network Service account whose password has expired. That doesn’t happen; it doesn’t work that way … so that was our first indication that something in the support ticket isn’t quite right. As you can see, the real problem account is mentioned in the error log: WDeployConfigWriter. That account is indeed a local account.
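Reading the log is the real lesson here. As an illustration, here is a small Python sketch that pulls the relevant fields out of a 4625 event as pasted above (the plain-text "Details" layout) and translates the Status / Sub Status values. The NTSTATUS values are standard Windows codes; the helper names and the assumption that the event text looks exactly like this are hypothetical, and this is not a Windows API.

```python
import re

# A few well-known NTSTATUS values that appear in the Status / Sub Status
# fields of Security event 4625 ("An account failed to log on").
NTSTATUS = {
    0xC000006D: "STATUS_LOGON_FAILURE: generic logon failure, see Sub Status",
    0xC000006A: "STATUS_WRONG_PASSWORD: the account exists, the password is wrong",
    0xC0000064: "STATUS_NO_SUCH_USER: the account does not exist",
    0xC000006E: "STATUS_ACCOUNT_RESTRICTION: account restriction, see Sub Status",
    0xC0000071: "STATUS_PASSWORD_EXPIRED: the password has expired",
    0xC0000072: "STATUS_ACCOUNT_DISABLED: the account is disabled",
}


def _field(text: str, label: str) -> list:
    # Labels are anchored at the start of a line, so "Status" does not also match "Sub Status".
    pattern = rf"^[ \t]*{re.escape(label)}:[ \t]*(.*)$"
    return [value.strip() for value in re.findall(pattern, text, re.MULTILINE)]


def summarize_4625(text: str) -> dict:
    """Summarize a pasted 4625 event: who called, which account failed, and why."""
    names = _field(text, "Account Name")
    process = _field(text, "Caller Process Name")
    summary = {
        # The first Account Name belongs to the Subject (the caller),
        # the second to "Account For Which Logon Failed".
        "caller_account": names[0] if names else None,
        "failed_account": names[1] if len(names) > 1 else None,
        "caller_process": process[0] if process else None,
    }
    for label in ("Status", "Sub Status"):
        values = _field(text, label)
        if values:
            code = int(values[0], 16)
            summary[label] = NTSTATUS.get(code, f"unknown NTSTATUS 0x{code:08X}")
    return summary


SAMPLE_4625 = r"""
Subject:
    Security ID:        LOCAL SERVICE
    Account Name:       LOCAL SERVICE
    Account Domain:     NT AUTHORITY
Account For Which Logon Failed:
    Security ID:        NULL SID
    Account Name:       WDeployConfigWriter
    Account Domain:     lab.test
Failure Information:
    Failure Reason:     The specified account's password has expired.
    Status:             0xc000006e
    Sub Status:         0xc0000071
Process Information:
    Caller Process Name: C:\Windows\System32\inetsrv\WMSvc.exe
"""
```

Run against the sample, `summarize_4625(SAMPLE_4625)` names WDeployConfigWriter as the failing account and decodes the Sub Status as an expired password, not anything to do with Network Service.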

Cool, now we check which service runs under that account by looking in the Services panel … none! The easy way to check is to sort on the "Log On As" column. You won’t find WDeployConfigWriter. Right …, what else do we learn from the Services panel? Well, we do have a service called Web Deployment Agent Service running under the local Network Service account. We can stop and start it just fine, so there is nothing wrong with the Network Service account, which is as expected, and this service is not our culprit. We also learn that this is Web Deploy 2.0.

So the Web Deployment Agent Service has nothing to do with the problem at hand. Where is that WDeployConfigWriter account being used, and what is its status? Let’s take a look.

Hey, how could this account have expired? That should be impossible, unless they changed it while trying to fix the error. We check this with a quick phone call and yes, they did exactly that. The good thing is that this web guy is professional and tells us what they did. Some people think this might get them into trouble and won’t do that. It doesn’t change anything, things are what they are, but it does make communication harder when you discover people act that way … So the lesson here is to double-check & verify what happened, if at all possible. Originally the settings were:

They changed them after they ran into issues, hoping that checking those options might fix it. Well, no: expired is expired, and you can’t fix it like that. You do indeed need to correct the settings if you don’t want the password to expire (and perhaps even prevent the user from changing it), but you also need to set a new password when it has already expired. After doing so, we contact the hardworking web guy in trouble to let him test, and we predict a new error: whatever runs under that account will now fail due to an incorrect password. And guess what? “Unknown user name or bad password” in the security log.

Log Name:      Security
Source:        Microsoft-Windows-Security-Auditing
Date:          24/06/2011 10:30:39
Event ID:      4625
Task Category: Logon
Level:         Information
Keywords:      Audit Failure
User:          N/A
Computer:     server1.lab.test
Description:
An account failed to log on.

Subject:
    Security ID:        LOCAL SERVICE
    Account Name:        LOCAL SERVICE
    Account Domain:        NT AUTHORITY
    Logon ID:        0x3e5

Logon Type:            8

Account For Which Logon Failed:
    Security ID:        NULL SID
    Account Name:        WDeployConfigWriter
    Account Domain:        lab.test

Failure Information:
    Failure Reason:        Unknown user name or bad password.
    Status:            0xc000006d
    Sub Status:        0xc000006a

Process Information:
    Caller Process ID:    0x1f44
    Caller Process Name:    C:\Windows\System32\inetsrv\WMSvc.exe

 

The user wants to do a repair install, or uninstall and reinstall the application, to “get a quick fix”, but we don’t give in and keep troubleshooting. It’s better to learn what the cause really is and how to fix it instead of relying on wishful reinstalling.

So where is the thing that runs under that account? We start a quick search in the registry and on the file system for the account name, just in case it’s configured in the registry or a configuration file, and let it run while we keep investigating. We also send a tweet into the universe, as perhaps someone out there knows this and can help out. We search the internet for Web Deploy 2.0 and WDeployConfigWriter. This results in very few hits, hmmm, interesting … One of them is http://blogs.iis.net/msdeploy/archive/2011/04/05/announcing-web-deploy-2-0-refresh.aspx

From it we learn a few things, the most important being the one line from that blog post that I highlighted in the snippet right below. I also enlarged the picture from the blog post to make it readable; it shows where in IIS you can find what we learned here:

Notice that Web Deploy setup created two new local user accounts:

– WDeployConfigWriter, which has Write permissions to the IIS server’s applicationHost.config. This is used by delegation rules for createApp, appPoolNetFx and appPoolPipelineMode.

I’ve included the entire block of text from where this was taken below.

1. Easier setup for non-administrator deployments on IIS7

One of the common requests from our users was to make it easier to setup Web Deploy so non-administrators can publish to their sites. Typically, you will need to do this if you are running a shared hosting environment or if you are administering a build machine and you do not want users to have admin access.

If you launch the Web Deploy installer and choose “Custom”, you will notice a new option, “Configure for Non-administrator Deployments”:

If you choose this option, Web Deploy will automatically create Management Service Delegation rules for the following providers, as well as user the accounts needed for providers like createApp and recycleApp that need elevated privileges.

These are the rules you will have in the Management Service Delegation UI in IIS Manager after you install this component:

Notice that Web Deploy setup created two new local user accounts:

– WDeployConfigWriter, which has Write permissions to the IIS server’s applicationHost.config. This is used by delegation rules for createApp, appPoolNetFx and appPoolPipelineMode.

– WDeployAdmin, which is an administrator. This is used by delegation rules for recycleApp.

If you prefer to create these rules by hand, uncheck the component in the installer. We also provide a PowerShell script for creating delegation rules (more on this later in the post) if you prefer that route.
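Knowing where an account is used is exactly what was missing here. A minimal Python sketch of such an inventory, filled in by hand from the rules named in the announcement above; it is a hypothetical mapping that does not read the IIS configuration, and your server may show other or additional rules.

```python
from typing import Dict, List

# Run-as identities per Management Service Delegation rule, as described in
# the Web Deploy 2.0 refresh announcement quoted above (hand-maintained).
RUN_AS: Dict[str, str] = {
    "createApp": "WDeployConfigWriter",
    "appPoolNetFx": "WDeployConfigWriter",
    "appPoolPipelineMode": "WDeployConfigWriter",
    "recycleApp": "WDeployAdmin",
}


def rules_to_update(run_as: Dict[str, str], account: str) -> List[str]:
    """Rules whose stored credentials go stale when `account` gets a new password."""
    return sorted(rule for rule, user in run_as.items() if user.lower() == account.lower())


print(rules_to_update(RUN_AS, "WDeployConfigWriter"))
```

Each rule returned needs its security credentials re-entered in IIS Manager after the password reset; an inventory like this is one way to avoid forgetting one.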

Armed with this information, we go and have a look at Management Service Delegation in IIS Manager:

Where we indeed find createApp, appPoolNetFx and appPoolPipelineMode:

So now we take a look at what we can configure here and, sure enough, double-clicking on a rule opens the Edit Rule form:

So we click on Edit security credentials and are welcomed by this form:

So we enter the account name and the new password we set before (remember to do this for both providers):

Guess what: end user happy, things are working again. Yay! From a service-down report at the helpdesk to fully operational again in less than an hour, with a technology new to the service desk. Well done, young, eager, smart and upwardly mobile IT Pro ;-), with lessons learned.

How did this happen, and how did they end up with this funky configuration (an expiring password on an account that no one knew the purpose of, or where it was configured)? Aha, operational control => know the configuration of what you use, know why it is configured that way and know where it’s configured. Is it a mistake/assumption in the installer that the accounts WDeployConfigWriter and WDeployAdmin have passwords that are set to expire and can be changed by the user, or did somebody mess with them after the install? Well, I tested this by setting it up on a test server and found that they are indeed installed with their passwords set to expire and that the password can be changed by the user. The installer assumes that the person doing the install knows and realizes the implications. I’m not saying either setting is wrong, but you should know why, when and where. There is no documentation on this as far as we could find right now, and perhaps the installer should mention the benefits/risks of both types of configuration and ask which one to choose. This, together with better documentation, could help prevent this issue. As always, no guarantees given ;-)

Overall lesson: don’t assume things, trust but verify …

Free Support Rant


<rant>

I blog and help out in newsgroups because I like to share ideas and solutions and to help out when and where I can. I’m active on Twitter because I enjoy the discussions, the thinking out loud and the reflection we all get from just throwing ideas, conclusions, opinions, experiences and knowledge into a pool of diverse but very skilled, passionate IT Professionals and Developers.

It is not always easy to share information. The potential complexity of environments that may well have other issues and restrictions, in combination with the vast number of possible configurations and designs, both valid and ill-advised, makes it nearly impossible to cover all eventualities. If one of my blog posts does not contain the answer to your specific problem or does not apply to your particular situation, do not complain & moan about it, let alone demand that I come up with a solution. What is written here are bits and pieces of information which I choose to share because I think they have some value and can help other people out. I do this in my own time. Really, I am not paid to blog, research technologies or build labs. I do this out of my own interest, because I enjoy it and because it has value to me in my own work. I work a lot of hours “for a boss” and those are not always the most esoteric. When you read my “About” page you’ll read the following:

I’m still in the trenches with my boys and gals. Empty suits or hollow bunnies are neither wanted nor needed. In IT you live by the sword and you die by the sword. There is no hiding when you mess up, all our mistakes are in plain sight of everyone using what we build.

That is my reality and I live by it. Perhaps others should try this. I’ve seen too many ICT “gods” come down from heaven for a short while, pushing their latest religion or product, loudly proclaiming it is the truth and the only way forward. Failure to achieve success is always due to a lack of faith among us subjects, our (at best) mediocre skills, or because we have to wait to see the benefits much later in time, but we need to keep the faith. When the shit hits the fan, those gods are back on Olympus, pushing daggers into the backs of us infidels who couldn’t make it work. No thank you. I think the people I work with know the strengths and weaknesses of both myself and my solutions. I have, however, never ever left them out in the cold when something didn’t work out as planned or when things failed. Yes, eventually things, big and small, do fail. How you try to prevent that as much as possible and how you deal with it when it happens is what makes a huge difference. That’s where my professional responsibilities lie, not with some Microsoft-bashing, impolite wannabe who thinks insulting me is a good approach to getting me to solve their issues with a Microsoft product. You know the type: they open a pack of “M$ Sucks Quick Mix” to try and get some “Instant credibility” and fail miserably; they even fail at asking for help.

I am not your free support desk, your dedicated Microsoft technology research engineer or troubleshooter. I’m an IT Pro with a busy job. I think certain people out there need to learn that you can catch more flies with honey than with vinegar. Don’t be a “jerk”.

</rant>