Virtualization with Hyper-V & The NUMA Tax Is Not Just About Dynamic Memory


First of all, to be able to join in this little discussion you need to know what NUMA is and what it does. You can read up on that on the Intel (or AMD) web site, for example http://software.intel.com/en-us/blogs/2009/03/11/learning-experience-of-numa-and-intels-next-generation-xeon-processor-i/ and http://software.intel.com/en-us/articles/optimizing-software-applications-for-numa/. Do have a look at the SQL Skills blog post at http://www.sqlskills.com/blogs/jonathan/post/Understanding-Non-Uniform-Memory-AccessArchitectures-(NUMA).aspx as well, which has some great pictures to help visualize the concepts.

What Is It And Why Do We Care?

We all know that a CPU contains multiple cores today: 2, 4, 6, 8, 12, 16 and so on. So in terms of a physical CPU we tend to talk about a processor that fits in a socket, and about cores for the logical CPUs. When hyper-threading is enabled you double the number of logical processors seen and used. It is said that Hyper-V can handle hyper-threading so you can leave it on, the logic being that it will never hurt performance and can help to improve it. I suggest you test it :-) as there was a performance bug with it once. A processor today contains its own memory controller, and access to memory from that processor is very fast. The NUMA node concept is older than multi-core processor technology, but today you can state that a NUMA node translates to one processor/socket and that all cores contained in that processor belong to the same NUMA node. Sometimes a processor contains two NUMA nodes, like the AMD 12 core processors. In the future, with the ever increasing number of cores, we’ll perhaps see even more NUMA nodes per processor. You can state that all Intel processors since Nehalem with QuickPath Interconnect and AMD processors with HyperTransport are NUMA processors. But to be sure, check with your vendors before buying. Assumptions, right?

Beyond NUMA nodes there is also a thing called processor groups, which help Windows use more than 64 logical processors (its former limit) by grouping logical processors into groups of up to 64. Windows handles 4 of these groups, meaning that in total Windows today can support 4*64=256 logical processors. Because memory access within a NUMA node is a lot faster than between NUMA nodes, you can see where a potential performance hit is waiting to happen. I tried to create a picture of this concept below. Now you know why I don’t make my living as a graphical artist :-)
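To make the processor group arithmetic concrete, here is a minimal Python sketch. The function names and the example server (4 sockets, 12 cores each, hyper-threading on) are made up for illustration. The group count is only a lower bound, since Windows also takes NUMA node boundaries into account when it builds groups.

```python
import math

MAX_LP_PER_GROUP = 64   # logical processors per processor group
MAX_GROUPS = 4          # groups Windows handles, as described above (4 * 64 = 256)

def logical_processors(sockets, cores_per_socket, hyper_threading):
    """Logical processors the OS sees; hyper-threading doubles the count."""
    return sockets * cores_per_socket * (2 if hyper_threading else 1)

def min_processor_groups(lp_count):
    """Lower bound on the groups needed (assumption: groups are filled to 64)."""
    return math.ceil(lp_count / MAX_LP_PER_GROUP)

# Hypothetical host: 4 sockets, 12 cores per socket, hyper-threading enabled.
lps = logical_processors(sockets=4, cores_per_socket=12, hyper_threading=True)
print(lps, min_processor_groups(lps), lps <= MAX_LP_PER_GROUP * MAX_GROUPS)
```

So a box like this already crosses the old 64 logical processor limit and needs at least two processor groups.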

image

image

To make it very clear: NUMA is great and helps us in a lot of ways. But under certain conditions and with certain applications it can cause us to take a (serious) performance hit. And if there is anything certain to ruin a system administrator’s day then it is a brand new server with a bunch of CPUs and loads of RAM that isn’t running any better (or is even worse?) than the one you’re replacing. Current hypervisors like Hyper-V are NUMA aware, and the better server applications like SQL Server are as well. That means that under the hood they are doing their best to optimize the CPU & memory usage for performance. They actually do a very good job, and depending on your environment you might never, ever know of any issue or even of the existence of NUMA.

But even with a NUMA knowledgeable hypervisor and NUMA aware applications you run the risk of having to go to remote memory. The introduction of Dynamic Memory in Windows Server 2008 R2 SP1 even increases this likelihood, as there is a lot of memory reassigning going on. Dynamic Memory actually educated a lot of Hyper-V people on what NUMA is and what to look out for. Until Dynamic Memory came on the scene, and with it the evangelizing by Microsoft, it was "only" the people virtualizing SQL Server or Exchange & other big hungry applications that were very aware of NUMA with its benefits and potential drawbacks. If you’re lucky the application is NUMA aware, but not all of them are, even among the big names.

A Peek Into The Future

As it bears on this discussion, what is interesting is that leaked screenshots from Hyper-V 3.0 or vNext … show NUMA configuration options for both memory and CPU at the virtual machine level! See Numa Settings in Hyper-V 3.0 for a picture. So the times that you had to script WMI calls (see http://blogs.msdn.com/b/tvoellm/archive/2008/09/28/looking-for-that-last-once-of-performance_3f00_-then-try-affinitizing-your-vm-to-a-numa-node-.aspx) to assign a VM to a NUMA node might be over soon (speculation alert). It seems like a natural progression from the ability to disable NUMA spanning in W2K8R2SP1 Hyper-V, in case you need that to avoid NUMA issues at the Hyper-V host level. Hyper-V today is already pretty NUMA aware, and as such it will try to get all memory for a virtual machine from a single NUMA node; only when that can’t be done will it span across NUMA nodes. So as stated, Hyper-V with Windows Server 2008 R2 SP1 can prevent this from happening, as we can now disable NUMA spanning for a Hyper-V host. The downside is that a virtual machine can’t get more memory than its NUMA node can provide, even if more is available on the host.

NumaSpanning

A working approach to reduce possible NUMA overhead is to limit the number of CPUs to 2, as this gives each CPU the largest share of the memory, in this case 50%. With 4 CPUs each one only controls 25%, etc. So with more CPUs (and NUMA nodes) the risk of NUMA spanning gets bigger very fast. For memory intensive applications scaling out is the way to go. Actually you could state that we scale up the NUMA node per socket (lots of cores with the largest possible amount of directly accessible memory) and as such do not scale up the server. If you can, keep your virtual machines tied to a single CPU on a dual socket server to try and prevent any indirect memory access and thus a performance hit. But that won’t always work. If you ever wondered when an 8/12/16 core CPU comes in handy, well voila … here is a perfect case: packing as many cores as possible on a CPU becomes very handy when you want to limit sockets to prevent NUMA issues but still need plenty of CPU cycles. This should work as long as you can address large amounts of RAM per socket at fast speeds and the CPU internally isn’t cut up into too many NUMA nodes, which would be scaling out NUMA nodes within the same CPU. We don’t want that, or we’re back to a performance penalty.
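The 50% versus 25% reasoning can be put into a few lines of Python. This is only a rough sketch under loud assumptions: RAM is spread evenly over the NUMA nodes, and host overhead and other virtual machines are ignored. The 96 GB host and the 40 GB virtual machine are made-up numbers.

```python
def local_memory_share(numa_nodes):
    """Fraction of total RAM local to one NUMA node (assumes an even spread)."""
    return 1.0 / numa_nodes

def fits_in_one_node(vm_memory_gb, host_memory_gb, numa_nodes):
    """True if the VM's memory could come from a single node.

    Simplification: ignores host overhead and memory used by other VMs.
    """
    return vm_memory_gb <= host_memory_gb * local_memory_share(numa_nodes)

# Hypothetical 96 GB host and a 40 GB virtual machine.
for nodes in (2, 4, 8):
    share = local_memory_share(nodes)
    print(nodes, share, fits_in_one_node(40, 96, nodes))
```

With the same amount of RAM, the 40 GB virtual machine fits in one node on the two node host but would have to span nodes on the four and eight node hosts.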

Stacking The Deck

One way of stacking the deck in your favor is to keep the heavy apps on their own Hyper-V cluster. Then you can tweak it all you want to optimize for SQL Server, Exchange, etc. When you throw these virtual machines in your regular clusters or, for crying out loud, on a VDI cluster you’re going to wreak havoc on the performance. Just like mixing server virtualization & VDI is a bad idea (don’t do it), throwing vCPU hungry, memory hogging servers on those clusters is just killing off the performance and capacity of a perfectly good cluster. I have gotten into arguments over this, as some think one giant cluster for whatever need is better. Well no, you’ll end up micro managing the placement of VMs with very different needs on that cluster, effectively “cutting” it up into smaller “cluster parts”. Now, are separate clusters for different needs always the better approach? No, it depends. If you only have some small SQL Server needs you can get away with one nice cluster. It depends, I know, the eternal consultant’s answer, but I have to say it. I don’t want to get angry mails from managers because someone set up a 6 node cluster for a couple of SQL Server Express databases ;-) There are also concepts called testing, proof of concept, etc. It’s called evidence based planning. Try it, it has some benefits that become very apparent when you’re going to virtualize beefy SQL Server, SharePoint and Exchange servers.

How do you even know it is happening, apart from empirical testing? Aha, excellent question! Take a look at the "Hyper-V VM Vid Numa Node" counter set and read this blog entry on the subject: http://blogs.msdn.com/b/tvoellm/archive/2008/09/29/hyper-v-performance-counters-part-five-of-many-hyper-vm-vm-vid-numa-node.aspx. And keep an eye on the event log for http://technet.microsoft.com/hi-in/library/dd582929(en-us,WS.10).aspx (for some reason there is no comparable entry for W2K8R2 on TechNet).

Conclusions

To conclude: all of the above, people, is why I’m interested in some of the latest generation of servers. The architecture of the hardware allows a processor to address twice the "normal" amount of memory when you only put two CPUs on a quad socket motherboard. The Dell PowerEdge R810 and the M910 have this; it’s called a FlexMem Bridge and it allows more memory to be available without a performance hit. They also allow for more memory per socket at higher speeds. If you make a lot of memory directly addressable by one CPU you see a speed drop. A Dell R710 with 48 GB of RAM runs at 1033 MHz, but put 96 GB in there and you fall back to 800 MHz. So yes, bring on those new quad socket motherboards with just 2 sockets used and a bunch of fast, directly accessible memory in a neat 2 unit server package with lots of space for NIC cards & FC HBAs if needed. Virtualization heaven :-)

That’s what I want, so I can give my VMs running SQL Server 2008 R2 & "Denali" (when can I call it SQL Server 2012?) a bigger amount of directly accessible memory from their NUMA node. This can be especially helpful if you need to run NUMA unaware applications like SAP or such. Testing is the way to go for knowing how well a NUMA aware hypervisor and a NUMA aware application figure out the best approach to optimize the NUMA experience together. I’m sure we’ll learn more about this as more and more information becomes available and as technology evolves. For now we optimize for performance with NUMA where we can, when we can, with what we have :-)

For Exchange 2010 (we even have virtualization support for DAG mailbox servers now as well) scaling out is easier, as we have all the neatly separated roles and control just about everything down to the mail client. With SQL Server applications this is often less clear. There is a varied selection of commercial and home grown applications out there and a lot of them can’t even scale out, only up. So your mileage may vary. But for resource & memory heavy applications under your control, for now, scaling out is the way to go.

A Brighter Future For Public Folders?


The Exchange Team posted a blog entry asking for feedback on how we use public folders. Nice to see they are taking an interest again. For the past 4 years the mantra was “move away from them”, “do it now while you still have the time”, etc. SharePoint was always put forward as the number one replacement option. For some scenarios this is indeed a good choice, but let’s face it: for some public folder uses there is no decent replacement, and that hurts us as public folders haven’t seen any decent improvements in the last 2 Exchange releases. I know public folders have always been a bit problematic and finicky for us administrators. They tend to need a bit of voodoo and patience to troubleshoot and get running smoothly (see my blog post for an example of this). But instead of using that as an excuse to get rid of them, Microsoft could also choose to invest in making them as reliable and robust as mail databases. Giving them the same high availability features might also be a welcome improvement, especially now with DAGs in Exchange 2010.

Especially in the Exchange 2007 era Microsoft was actively promoting getting rid of them. But they are still around, because so many people use them and have no decent alternative for all scenarios. In that respect Microsoft does listen to its customers. But we want improvements. Some of the functionality we need is there, but we really need more robust, reliable and highly available public folders. As a shared mail instrument for both sending and receiving mail in a team, public folders beat shared mailboxes and SharePoint any time. They also shine for maintaining a shared repository of contacts. I’m not a proponent of using public folders as a document repository, but I understand that their relatively simple usage and data protection via replicas still sound attractive to some versus the complexity of SharePoint. Sure, SharePoint has more to offer, but perhaps they don’t need those capabilities, and to make matters even less attractive, it’s quite an effort to migrate from public folders to SharePoint.

So that left us public folder users feeling a bit abandoned, with a message of “get out” but no easy path to anywhere else that serves all our needs. So until today all my customers are still using public folders and want to keep using them. They are a bit worried, however, that one day they will be left out in the cold. But perhaps there is a better future on the horizon for public folders. The Exchange Team is asking us to “Help us learn more about how you use public folders today!” in that blog post. The emphasis is on “usage scenarios, folder management habits or thought process around public folder data organization”. So if you need and use public folders in any way and you’d like them to get more attention and evolve into more robust and functional instruments, give Microsoft your feedback. Exchange 2010 has brought us great features & very affordable high availability together with support for virtualization. Now we either need a better alternative to public folders than the ones we have now or (my preference) we need better public folders. Since consumption of public folders occurs mostly in Outlook I would suggest the latter. And while we’re asking, bring back access to folder shares in OWA ;-)

Exchange 2010 SP1 Rollup 3 Pulled – BlackBerrys sending duplicate messages


Just a quick notification. Due to the duplicate message issue with RIM BlackBerry devices and Exchange 2010 SP1 Rollup 3, Microsoft is temporarily pulling RU3. If you don’t use BES and have no other issues, don’t sweat it. If you wanted RU3 for UDP support with Outlook 2003 or to fix the DAG copies GUI bug you’ll have to wait, especially if you have BlackBerry devices. More on the Exchange Team Blog.

Exchange 2010 SP1 Rollup 3 Released: Fixes Bug since SP1 in EMC & Brings Back UDP Support


UPDATE March 9th 2011: I have installed Exchange 2010 SP1 Rollup 3 at one site and this did indeed finally fix the issue.

The Microsoft Exchange Team Blog just announced the release in Released: Update Rollup 3 for Exchange 2010 SP1 and Exchange 2007 SP3. This is good news for all the folks out there who got bitten by the Exchange 2010 SP1 bug that causes the Exchange Management Console (EMC) not to show all database copies after upgrading to Exchange 2010 SP1. I’ve blogged about this in EMC Does Not Show All Database Copies After Upgrade To Exchange 2010 SP1 and chimed in on the discussion at Database copies are not all showing up in EMC after SP1 upgrade on the Exchange forums. So apart from cheers for the UDP notifications returning in support of Outlook 2003, let’s hear it for the EMC case sensitivity bug getting fixed :-)

After a while Microsoft also blogged about this in Database copies fail to display after upgrading to Exchange 2010 Service Pack 1.

We got notified around October 13th that they would include the fix in Exchange 2010 SP1 Roll Up 3 but that they were working on an interim update. They dropped the ball there, because communication about the latter died and we were left to conclude we would have to wait for Rollup 3. Well, that took its time. It’s now March 2011. One of the reasons I think it took so long for Rollup 3 to arrive is the decision to re-add UDP support to Exchange 2010 for use with Outlook 2003, as blogged about in Microsoft Listens To Customers & Adds UDP Notification Support Back to Exchange 2010.

In the end we will have a silly and long unaddressed bug fixed and a welcome aid in migrating customers that are running Outlook 2003 to Exchange 2010. I do wonder, however, whether Microsoft would have fixed this sooner if the bug had been in PowerShell in the EMS and not in the EMC. Sure, it wasn’t a real issue, as you could manage everything perfectly using PowerShell and it was only a GUI bug, but for some users/customers this is not as obvious and it made them feel a bit like 2nd class citizens, so we had to do some extra “damage” control on that front as well.

Microsoft Listens To Customers & Adds UDP Notification Support Back to Exchange 2010


Well, after almost 14 months of deploying Exchange 2010 and tweaking the Outlook 2003 settings via GPOs to give users an acceptable experience, Microsoft adds support for User Datagram Protocol (UDP) notification functionality back into Microsoft Exchange Server 2010. By doing so they recognize that a lot of businesses & organizations will be using Outlook 2003 for a while and that not all of them were happy to deal with the way Outlook 2003 functions with Exchange 2010. More information on the UDP issue can be found at http://support.microsoft.com/kb/2009942 (In Outlook 2003, e-mail messages take a long time to send and receive when you use an Exchange 2010 mailbox). Now, most of my customers use cached mode where possible, and a GPO setting to reduce the Maximum Polling Frequency registry entry to 5 seconds helped. But there are places where cached mode is not an option (Terminal Services), or people don’t accept this change in behavior and go with Outlook 2007 instead of 2010, or even choose to deploy Exchange 2007 over 2010. All because UDP notification support was dropped.

Now this functionality will be back in Exchange Server 2010 Service Pack 1 Roll-Up 3 (SP1 RU3). Good news for people dealing with Outlook 2003 and Exchange 2010. Less good news for the people dealing with the GUI bug that Exchange 2010 SP1 introduced, where the Exchange Management Console does not show all database copies after upgrading to Exchange 2010 SP1. This is set to be fixed in Roll-Up 3, but to get the UDP support back they adjusted the release schedule for the E2K10 SP1 Roll-Up 3, which is now expected to be released in March 2011. So we’ll have to wait a bit longer for that fix. As you may have noted, you need to be running Exchange 2010 SP1 to get this backward compatibility support for Outlook 2003.

Read this announcement on the Exchange Team Blog: UDP Notification Support Re-added to Exchange 2010

Exchange 2010 SP1 Public Folder High Availability Returns with Roll Up 2


A lot of people were cheering in the interactive session on Exchange 2010 SP1 High Availability with Scott Schnoll and Ross Smith of the Exchange Team. They announced (between goofing around) that the alternate server feature for public folders, which provides failover to the clients (so they can select another public folder database to connect to) and which is sadly missing from Exchange 2010, would return with Exchange 2010 SP1 Roll Up 2. This feature is needed by Outlook to automatically connect to an alternate public folder database, and its return means that high availability will finally be achievable for public folders in Exchange 2010 SP1. That’s great news, and frankly this is an “oversight” that shouldn’t have happened even in Exchange 2010 RTM. The issue is described in the knowledge base article “You cannot open a public folder item when the default public folder database for the mailbox database is unavailable in an Exchange Server 2010 environment”, which you can find here: http://support.microsoft.com/kb/2409597.

In previous versions of Exchange you made public folders highly available to Outlook clients by having replicas. The Outlook clients could access a replica on another server if the default public folder database, as defined in the client settings of the mailbox database, was not available. Clustering in Exchange 2010 does nothing for public folders. In Exchange 2010 the Outlook clients connect directly to the mailbox server in order to get to a public folder, so they do not leverage the CAS or CAS array. Also, the DAG does not support public folders, and as clustering happens at the database level on DAG members and no longer at the server level, we no longer get any high availability for the clients with clustering in Exchange 2010. Sure, if you have multiple replicas the data is highly available, but the access to another replica/database/server for public folders doesn’t happen automatically in Outlook when you’re running Exchange 2010. To make that happen you need an alternate server to be offered to the client for selection. But as this feature is missing in Exchange 2010 up to and including SP1 Roll Up 1, in reality until now you need to keep using Exchange 2003/2007 to have public folder high availability. Exchange 2010 SP1 Roll Up 2 will change that. I call that good news.

Exchange 2010 Public Folder Worries At Customer: No existing ‘PublicFolderProxyInformation’ matches the following Identity


A customer was recently using the EMC GUI in their Exchange 2010 environment, having a look at the public folder properties, when they got this error:

—————————
Microsoft Exchange
—————————
Can’t log on to the Exchange Mailbox server ‘DAGMBX.demolab.com’. No existing ‘PublicFolderProxyInformation’ matches the following Identity: ‘\demolab\HeadQuarters\FincanceDepartment\FiscalUnit’. Make sure that you specified the correct ‘PublicFolderProxyInformation’ Identity and that you have the necessary permissions to view ‘PublicFolderProxyInformation’.. It was running the command ‘Get-MailPublicFolder -Identity ”\demolab\HeadQuarters\FincanceDepartment\FiscalUnit” -Server ‘DAGMBX.demolab.com”.
—————————
OK  
—————————

image

Hey … when did this start? They never complained about this before, but did they ever use it? This probably was actually the first time they tried to look at or edit the public folder permissions after doing the following over the past month, in this particular order:

  1. Moving to Exchange 2010 SP1
  2. Removing the last Exchange 2007 servers from the organization.

Now I know about a bug that exists and that was recently blogged about by Dan Rowley in Exchange 2010 get-mailpublicfolder \name returns No existing ‘PublicFolderProxyInformation’. The point is that there should be a mailbox database mounted on the server that has the System Attendant mailbox associated with it. However, a missing mailbox database is not the issue here. The mailbox servers are members of a DAG and all of them host a copy of the PF. The replication runs fine, users can work with the public folders, and the remaining Outlook 2003 users report no issues. But there is more in that blog: “Basically the work around is to mount a mailbox store on the server that is generating the error, or if there is a database already mounted – verify the system attendant is properly configured to point to a valid homemdb.” Now that last point is interesting, and indeed that was the issue here. On two members of the DAG the homeMDB attribute was not set. Now what could be the root cause of this? I don’t know, certainly not in this case. All things have been done by the book … Ah well, luckily the fix is not very difficult. We need to put a valid entry in the homeMDB attribute. In this case we’ll take the value from the DAG member that had it filled in. This seems to be the most recently created database in the DAG. In Exchange 2010 this is done as described below. Note that we have a DAG here, so we can work with any database that has a valid copy on the server(s) in question.

How to check the homeMDB attribute value:

  • Start ADSI Edit and navigate to CN=Configuration,DC=,DC=,DC=/Services/Microsoft Exchange//Administrative Groups/Exchange Administrative Group (FYDIBOHF23SPDLT)//Servers/MBXServerWithIssue
  • Right-click Microsoft System Attendant, and then click Properties to display the  Attributes list and find the homeMDB attribute.
  • If the homeMDB attribute has a value make sure  it points to a valid mailbox database. If the value of the homeMDB attribute is empty (not set) or incorrect you need to fix this.
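As an aside, the ADSI Edit path above maps onto a distinguished name that you read from the leaf up. The Python sketch below builds that DN and applies the "empty or not pointing to a valid database" check from the last bullet. It only handles strings and does not talk to Active Directory. The organization name DemoOrg and the database name MBXDB01 are hypothetical, and names containing commas or other special DN characters are not escaped.

```python
ADMIN_GROUP = "Exchange Administrative Group (FYDIBOHF23SPDLT)"

def configuration_dn(domain):
    """Turn a DNS domain like 'demolab.com' into 'CN=Configuration,DC=demolab,DC=com'."""
    return "CN=Configuration," + ",".join(f"DC={part}" for part in domain.split("."))

def system_attendant_dn(domain, organization, server):
    """Build the DN of a server's System Attendant object, leaf first."""
    path = [
        "Microsoft System Attendant", server, "Servers", ADMIN_GROUP,
        "Administrative Groups", organization, "Microsoft Exchange", "Services",
    ]
    return ",".join(f"CN={name}" for name in path) + "," + configuration_dn(domain)

def home_mdb_needs_fix(home_mdb, valid_database_dns):
    """True when homeMDB is empty or does not match a known mailbox database DN."""
    if not home_mdb:
        return True
    known = {dn.lower() for dn in valid_database_dns}  # DNs compare case-insensitively
    return home_mdb.lower() not in known

# Hypothetical organization and database names, for illustration only.
db_dn = ("CN=MBXDB01,CN=Databases,CN=" + ADMIN_GROUP + ",CN=Administrative Groups,"
         "CN=DemoOrg,CN=Microsoft Exchange,CN=Services," + configuration_dn("demolab.com"))
print(system_attendant_dn("demolab.com", "DemoOrg", "DAGMBX"))
print(home_mdb_needs_fix(None, [db_dn]), home_mdb_needs_fix(db_dn, [db_dn]))
```

The first print shows where the System Attendant object lives; the second shows that an unset homeMDB is flagged while a value matching a known database DN is not.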

image

How to fix the homeMDB attribute value:

  • In ADSI Edit navigate to CN=Configuration,DC=,DC=,DC=/Services/Microsoft Exchange//Administrative Groups/Exchange Administrative Group (FYDIBOHF23SPDLT)/Databases
  • Right-click a mailbox database that is local (non DAG) or has a valid copy on the server (DAG), select Properties and, in the Attributes list, select distinguishedName, and then click View.
  • Copy the value of the distinguishedName attribute and close the dialogs

image

NOTE: in this particular case we can copy the value that was filled in for the homeMDB attribute on one of the DAG members. You might not have it set on any of them.

  • Right-click Microsoft System Attendant, and then click Properties to get to the Attributes list, click homeMDB, and then choose Edit
  • In the Value box, paste the value that you copied from the distinguishedName attribute
  • Close the dialog boxes and exit ADSI Edit

When you’ve done this you’ll find the following entry in the application event log:

Log Name:      Application
Source:        MSExchangeSA
Date:          11/2/2010 3:25:59 PM
Event ID:      9159
Task Category: General
Level:         Warning
Keywords:      Classic
User:          N/A
Computer:      DAGMBX.demolab.com
Description:
Microsoft Exchange System Attendant has detected that the system attendant object in the DS has been modified. System Attendant needs to restart the Microsoft Exchange Free Busy Publishing Service.

image

After that, I waited 10 minutes for AD to replicate, made sure to close the EMC and start it again and voila, it’s fixed.

No ADSI Edit required to fix “Object is read only because it was created by a future version of Exchange: 0.10 (14.0.100.0). Current supported version is 0.1 (8.0.535.0).”


During the removal of the last Exchange 2007 SP3 Mailbox server after completing the transition of Exchange 2007 to Exchange 2010 SP1 we ran into the following well known error: Object is read only because it was created by a future version of Exchange: 0.10 (14.0.100.0). Current supported version is 0.1 (8.0.535.0).

 

image

The issue is that due to the coexistence of Exchange 2007 & Exchange 2010 we can no longer remove the public folder database with the Exchange 2007 GUI (EMC). But the public folder database is not visible in the Exchange 2010 GUI (EMC) either, as it lives on an Exchange 2007 server. Trying to remove the public folder database manually using the Exchange 2007 GUI confirms this: you’ll get the same error.

This error has been described in some blogs as early as October 2009 on http://www.proexchange.be/blogs/exchange2010/archive/2009/10/28/remove-exchange-2007-mailbox-role-fails-with-error-object-is-read-only-because-it-was-created-by-a-future-version-of-exchange-0-10-14-0-100-0-current-supported-version-is-0-1-8-0-535-0.aspx and later on as recently as October 2010 on http://www.howexchangeworks.com/2010/10/object-is-read-only-because-it-was.html

The solution/workaround described in these blogs gets the job done perfectly, using ADSI Edit to delete the offending Exchange 2007 public folder database. It wouldn’t be the first time ADSI Edit saves an Exchange consultant’s proverbial bacon. But if it can be done without it, I often recommend not using it. I’ve seen too many overeager deletions in ADSI Edit get people into trouble (like deleting a public folder database before it could be dumped safely without data loss).

For this problem, it’s not required to use ADSI Edit to get rid of the public folder database on the Exchange 2007 Mailbox server. You can just fire up the Exchange Management Shell (EMS) in Exchange 2010 and execute the following PowerShell command:


Remove-PublicFolderDatabase "E2K7MBX\SGPublicFolders\StoreSGPublicFolders"

Confirm
Are you sure you want to perform this action?
Removing public folder database "E2K7MBX\SGPublicFolders\StoreSGPublicFolders".
[Y] Yes  [A] Yes to All  [N] No  [L] No to All  [?] Help (default is "Y"): y
WARNING: The specified database has been removed. You must remove the database file located in K:\E2K7Data\SGPublicFolders\PublicFolderDatabase.edb from your computer manually if it exists. Specified database: PublicFolderDatabase

This works just fine. I have no objections to using ADSI Edit when needed, but I don’t advise others to use it unless it is really necessary. In this case it just isn’t needed to fix the problem. For good measure I also deleted the storage group in which the public folder database lived. After that the uninstall went well and ended without any issues.

Workaround for Exchange 2010 & Outlook 2003 Shared Calendars Connectivity Issues "The connection to the Microsoft Exchange Server is unavailable. Outlook must be online or connected to complete this action."


This is a rather long blog post about an issue we had during an Exchange 2007 to Exchange 2010 transition project at a company with 1,500 mailboxes. This was last summer, but I needed to find the time to blog about it so that it might help people who also need to find a solution. We ran into this very annoying situation for users who still have Outlook 2003 (SP3, fully patched) and were moved over to Exchange 2010 Roll Up 4. These users have rather a lot of shared calendars open, and after accessing some of them (the sweet spot seemed to be around 6 to 7) they got this error:

"The connection to the Microsoft Exchange Server is unavailable. Outlook must be online or connected to complete this action."

Edit November 15th 2010: Microsoft released a KB article on this issue, http://support.microsoft.com/kb/2299468, titled “Error message when Outlook 2003 clients try to open multiple shared calendars in Exchange Server 2010: "The connection to the Microsoft Exchange server in unavailable. Outlook must be online or connected to complete this action"”, that is based on the work done on this case. The support engineer who worked on this case with me notified me by mail that the article had been released. Thanks Kovai! They only mention the throttling policy, as the /resetnavpane is not always required; this depends on the environment history.

Edit February 3rd 2011: The support engineer published a very content rich blog post about all this and other possible causes of the error notification on his TechNet blog: Things you need to know about “The connection to the Microsoft Exchange server in unavailable. Outlook must be online or connected to complete this action” prompts in an Outlook 2003–Exchange 2010 world. It’s a must read for all people dealing with this error message.

When the user closes and reopens Outlook 2003, access sometimes works for the calendar that threw the error before, but then other calendars throw the error. So it becomes a game of closing and opening Outlook 2003 in the hope that you can open the desired calendar. This does not make users very happy, as you can imagine. We stopped the migration to Exchange 2010 for users still on Outlook 2003. On the Exchange 2010 forums you’ll find http://social.technet.microsoft.com/Forums/en/exchange2010/thread/93db8cd7-3380-443f-8dbe-fbfb79cd9978 and http://social.technet.microsoft.com/Forums/en/exchange2010/thread/85c75bae-93dc-453e-be28-0425de3d5227 which both discuss this problem. It’s also pretty random. Sometimes it’s a shared calendar of a user on Exchange 2010, sometimes of a user on Exchange 2007. There is also mention of this issue on other sites, and one of them mentioned a private discussion with Microsoft about a fix coming in E2K10 Roll Up 5. I asked MS support later on if they had any knowledge of this and they said that they had none whatsoever. By the way, SP1 for Exchange 2010 arrived before we even got to Roll Up 5 for Exchange 2010 RTM.

We investigated and searched for a solution. One very promising workaround was related to the throttling mechanism in Exchange 2010.

See also http://blogs.msdn.com/b/pepeedu/archive/2010/01/13/exchange-2010-client-access-throttling.aspx for more information on this. This is a good write-up as well: http://eightwone.wordpress.com/2010/06/22/exchange-2010-throttling-policies-rtm-sp1/

The default throttling policy limits the number of RPC connections to 20. When Outlook 2003 opens a shared calendar it consumes an RPC connection, and it doesn’t release that connection until Outlook is closed. We thought we had a winner here, as this could lead to the error while sending and receiving mail keeps working.

So in an attempt to fix the problem we tried this:

New-ThrottlingPolicy -Name Outlook2003Calendar

Set-ThrottlingPolicy -Identity Outlook2003Calendar -RCAMaxConcurrency 100

Set-Mailbox -Identity "annoyed user" -ThrottlingPolicy Outlook2003Calendar

We were supposed to have some patience before this would work, due to Active Directory replication, so we did. We even left it overnight. No joy. We also tried disabling throttling by using $NULL. Unfortunately this didn’t work either. We then checked whether throttling was indeed the issue by counting the connections for a troublesome Outlook 2003 user, and found that the error occurred even below the throttle limit. You can figure out the number of RPC connections by using:

Get-LogonStatistics -Identity "annoyed user" | fl applicationid
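To make that check repeatable, the count can be wrapped in a small helper. This is only a minimal sketch and not what we ran at the time: the function name Get-RpcSessionSummary and its Limit parameter are my own invention, and it assumes the objects returned by Get-LogonStatistics expose an ApplicationId property, as the command above relies on. It simply counts logon sessions the same way the one-liner lists them.

```powershell
# Hypothetical helper (not an Exchange cmdlet): counts the logon sessions piped
# into it, groups them per ApplicationId and flags whether the total reaches a
# limit. 20 is the default RPC limit mentioned earlier in this post.
function Get-RpcSessionSummary {
    param(
        [Parameter(ValueFromPipeline = $true)]
        $LogonStatistic,
        [int]$Limit = 20
    )
    begin   { $sessions = @() }
    process { if ($null -ne $LogonStatistic) { $sessions += $LogonStatistic } }
    end {
        # New-Object PSObject keeps this usable on PowerShell 2.0.
        New-Object PSObject -Property @{
            Total         = $sessions.Count
            Limit         = $Limit
            AtOrOverLimit = ($sessions.Count -ge $Limit)
            ByApplication = @($sessions | Group-Object -Property ApplicationId |
                              Select-Object Name, Count)
        }
    }
}

# Usage in the Exchange Management Shell (the mailbox name is a placeholder):
# Get-LogonStatistics -Identity "annoyed user" | Get-RpcSessionSummary
```

If Total stays well below the limit while the user still gets the error, as it did for us, throttling alone is not the whole story.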

Well, we ran out of ideas (can you believe that?!) and so we called Microsoft Support to log a case. One of the goodies with our TechNet subscriptions via Software Assurance is that we have free support calls!

The feedback was not coming fast. We also did not get any requests for more information. Not a good sign. When we pinged ’m for info they acknowledged the bug and told us that it would probably be fixed in Exchange 2010 SP1. They were not 100% certain that our issue was due to this bug, but we still did not get any request for further information or diagnostics. They asked us to go to Exchange 2007 SP3. At the time we were on Exchange 2007 SP2 Roll Up 4. There was no real indication that this would fix anything, as the users who had the issue were the ones who had been moved to Exchange 2010. But we went along, put in the overtime after office hours and went to Exchange 2007 SP3. As expected, this did not help at all.

Then finally advanced support came into play, and that was a different ballgame. Lots of questions, requests for logs and network traces, and tests to execute on both the client and the server side. The engineer was really engaged and worked hard on this. He has my thanks. It took some time to do all the testing and work through the results.

We found out some interesting things. When a user with a mailbox on Exchange 2007 or 2010 opens his shared calendars using Outlook 2003, the list is different from what you see in OWA or in Outlook 2007 and Outlook 2010. This is due to changes in the way those shared calendars are accessed. I was not able to find out any more details on the subject, but this could be because Outlook 2003 by default used the referenced MDB model when additional calendars/mailboxes were opened. This feature isn’t supported in Exchange 2010 and, due to the underlying server-side design changes, Outlook 2003 now establishes more connections than it did with previous Exchange versions. I assume it was still supported in Exchange 2007.

With Exchange 2010, the combination of the throttling policy, the changes in how shared calendars are accessed and the fact that they contain “persistence information” caused this issue. We created a new throttling policy, and it was also necessary to delete all shared calendars from Outlook, close Outlook, start Outlook and add them again. Done manually, this is one calendar at a time and pretty tedious. At least you can speed things up by using outlook.exe /resetnavpane. This deletes all the shared calendars for you and saves a lot of time. It also resets the entire navigation pane, so any other “issues” lingering around are thrown out as well.

In summary (this is an adaptation of the support case conclusions; the escalation engineer working on the case was a great guy and he really put in an effort):

Problem:

  • When Outlook 2003 clients whose mailbox is on Exchange 2010 try to open additional calendars (on Exchange 2010 or on Exchange 2007), they get the error popup "The connection to the Microsoft Exchange server is unavailable. Outlook must be online or connected to complete this action"

Observations:

  • The popups might happen while opening any additional calendar. There is no specific pattern tied to the number of additional calendars opened or to a specific additional calendar being opened.
  • The issue didn’t happen when the mailboxes were on Exchange 2007, and it doesn’t happen with versions of Outlook later than Outlook 2003.
  • All the concerned mailboxes are on Exchange 2010.
  • There is an indication that mailboxes that existed in the organization since Exchange 2000 days face the problem. Mailboxes that were created since the introduction of Exchange 2007 and never had a mailbox prior to that version didn’t have the problem.

Causes:

  1. While the mailbox is on Exchange 2010, Outlook 2003 uses Exchange 2010’s Address Book service to query the legacyDN of the target shared mailboxes/calendars that need to be opened. Exchange 2010 returns these as mungedDNs (not typical legacy Exchange DNs), which makes Outlook 2003 think it is talking to a new server every time and establish more connections, so eventually the default allowed maximum of 20 connections is exhausted.
  2. Outlook 2003 maintains hidden persistence messages containing various information about the shared mailboxes/calendars, such as the user name and the server where the mailbox resides. This information can be out of date, so Outlook could very well try to reach a server that no longer exists.

Solutions:

  1. For cause 1, we set up a custom throttling policy with RCAMaxConcurrency set to 100 and applied this policy to the mailboxes still using Outlook 2003. For the change to take effect immediately, we need to force AD replication and restart the throttling service and the RPCClientAccess service on the CAS and MBX servers.
  2. For cause 2, we either need to remove the stale entries and re-add the shared calendar entries (sync the deletion with the server, then close and reopen Outlook before re-adding the entries) or run Outlook with the ‘outlook.exe /resetnavpane’ switch.
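
The server-side part of solution 1 can be sketched as a small wrapper around the cmdlets used earlier in this post. Treat it as an illustration under assumptions rather than an official procedure: the function name Set-Outlook2003Throttling and its parameters are hypothetical, and the service names in the comments (MSExchangeThrottling, MSExchangeRPC) are what I would expect on Exchange 2010 but should be verified with Get-Service before restarting anything.

```powershell
# Hypothetical wrapper around the steps of solution 1. Run it in the Exchange
# Management Shell; defining the function does not change anything by itself.
function Set-Outlook2003Throttling {
    param(
        [string[]]$Mailbox,                            # mailboxes still on Outlook 2003
        [string]$PolicyName = 'Outlook2003Calendar',   # policy name used earlier in this post
        [int]$MaxConcurrency = 100                     # RCAMaxConcurrency value we settled on
    )

    # Create the policy only if it does not exist yet, then raise the RPC limit.
    if (-not (Get-ThrottlingPolicy | Where-Object { $_.Name -eq $PolicyName })) {
        New-ThrottlingPolicy -Name $PolicyName | Out-Null
    }
    Set-ThrottlingPolicy -Identity $PolicyName -RCAMaxConcurrency $MaxConcurrency

    # Assign the policy to every mailbox that still uses Outlook 2003.
    foreach ($m in $Mailbox) {
        Set-Mailbox -Identity $m -ThrottlingPolicy $PolicyName
    }
}

# Usage (the mailbox name is a placeholder):
# Set-Outlook2003Throttling -Mailbox 'annoyed user'
#
# To avoid waiting for the change to kick in, force AD replication and restart
# the services on the CAS and MBX servers (assumed names, check them first):
# repadmin /syncall /AdeP
# Restart-Service MSExchangeThrottling
# Restart-Service MSExchangeRPC
```

Solution 2 remains a client-side step: run outlook.exe /resetnavpane once on the affected workstation and then re-add the shared calendars.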

Most issues with Outlook 2003 are well documented in KB articles and on the Microsoft Exchange Team Blog (http://msexchangeteam.com/), but not this one, and it hurt us when the user complaints came in. We did not catch this one in the labs prior to the transition. We can’t win ’m all, I know that. But the most obvious solution, upgrading to Outlook 2007/2010, is often not an acceptable option for financial, timing, practical and compatibility reasons. A couple of weeks ago I saw a tweet by Jetze Mellema about something he had read somewhere: “Friends don’t let friends upgrade to Exchange 2010 when using Outlook 2003.” I wouldn’t go that far. But I think this shared calendar issue should have been documented and made public by now.

New Version of ExFolders that is Exchange 2010 SP1 Compatible


I’ve mentioned the tool ExFolders before in http://workinghardinit.wordpress.com/2010/05/25/exchange-2007-2010-public-folders-issues-the-active-directory-user-wasnt-found/. It’s a great tool and a worthy successor to PFDAVadmin. Now please note that when you upgrade to Exchange 2010 SP1 you’ll need to update the ExFolders tool as well. You can find the E2K10SP1 compatible version at http://msexchangeteam.com/files/12/attachments/entry456255.aspx and I’m happy to see that the new version of the tool was released in sync with the service pack. It’s a very handy and valuable tool to have. PFDAVadmin users already know this from experience. You can also use it to connect to an Exchange 2007 server, but you need to run it on an Exchange 2010 server.

For more information about the tool, take a look at this blog post by the Exchange Team: http://msexchangeteam.com/archive/2009/12/04/453399.aspx (don’t forget to read the instructions and follow them, especially regarding the import of the TurnOffSNVerificationForExFolders.reg file, or the tool will crash).