A little while back, the Exchange Team released a new recommendation for the database cache size on Hub Transport servers running SP1. The recommendation is to raise the maximum database cache size from the default of 128 MB to 512 MB, which should reduce disk I/O substantially. The change is recommended for Hub Transport servers with 4 GB of RAM or more, and they have not yet tested, and do not support, setting the maximum database cache size to anything over 512 MB. Although my testing was done with 4 GB of RAM, I suspect there would be similar results on a dedicated HT server running 2 GB of RAM or more.
The test they ran was on a Hub Transport server that was processing some 21 messages/sec. The reduction they saw was noticeable, lowering the total IOPS/message from 26.71 to 11.67 (a reduction of roughly 56%), which is pretty impressive.
This led to a discussion between me and several colleagues, not about the legitimacy of Microsoft's claim, but about the necessity of the change in smaller environments. We all worked in much smaller shops at the time, the biggest of us with a headcount of no more than 400 people total. Although the users are unusually heavy in all of our locations, they can't come close to the message rates described in the Exchange Team's article. Our question was: would it make any difference in smaller environments such as ours?
The discussion centered on the new transport engine used by the Hub Transport role in Exchange 2007. Hub Transport servers use the same Extensible Storage Engine (ESE) database technology that Mailbox servers use to store mailboxes, and no longer rely on the Windows IIS SMTP service to handle SMTP traffic. Unlike a Mailbox server, which can dynamically grow its database cache to use any extra memory the server has available, the Hub Transport server has a hard cap on the amount of memory its database cache can use. By default in Exchange 2007 SP1, this value is 128 MB. The limit is controlled by the <add key="DatabaseMaxCacheSize" value="134217728" /> key, which is located in the EdgeTransport.exe.config file. The value is expressed in bytes (134217728 bytes = 128 MB). This file is stored by default at C:\Program Files\Microsoft\Exchange Server\Bin. Any change to this value requires a restart of the Microsoft Exchange Transport service to take effect.
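As a rough illustration of the change itself, here is a minimal Python sketch that converts a size in MB to bytes and rewrites the DatabaseMaxCacheSize value in a config document. The SAMPLE_CONFIG string is a cut-down stand-in I made up for the example, not the real EdgeTransport.exe.config, and the function names are my own. It also assumes the key sits in an <add> element as shown above.

```python
import xml.etree.ElementTree as ET


def mb_to_bytes(megabytes: int) -> int:
    """Convert megabytes to bytes (1 MB = 1024 * 1024 bytes)."""
    return megabytes * 1024 * 1024


def set_max_cache_size(config_xml: str, megabytes: int) -> str:
    """Return config_xml with DatabaseMaxCacheSize set to the given size."""
    root = ET.fromstring(config_xml)
    # Look for the <add key="DatabaseMaxCacheSize" ... /> element anywhere in the tree.
    for node in root.iter("add"):
        if node.get("key") == "DatabaseMaxCacheSize":
            node.set("value", str(mb_to_bytes(megabytes)))
            return ET.tostring(root, encoding="unicode")
    raise KeyError("DatabaseMaxCacheSize key not found")


# Hypothetical, cut-down config used only for this example.
SAMPLE_CONFIG = """<configuration>
  <appSettings>
    <add key="DatabaseMaxCacheSize" value="134217728" />
  </appSettings>
</configuration>"""

updated = set_max_cache_size(SAMPLE_CONFIG, 512)
print(updated)
```

Note that ElementTree drops XML comments when it parses, so on a real server it is probably safer to back up the file and edit the single value by hand. The sketch is mainly meant to show the byte arithmetic (512 MB = 536870912 bytes).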
For the test, I installed a single Hub Transport server in one of my AD sites. This server hosted both the Client Access and Hub Transport roles. It was installed as a VM on an ESX host, an HP ProLiant DL380 G5 with 16 GB of RAM. The VM had 4 GB of RAM and one virtual core, and its hard drives (C and D) were both hosted as VMDK files on an iSCSI-attached LUN. Throughout the testing, this VM was the only VM running on the host, and was the only machine with activity on the iSCSI LUN. The transport database and logs were hosted on the same VMDK.
For the test, the HT was set to send TLS-encrypted email only to a third-party retention site, connecting directly through DNS. Initially the server was run with the maximum database cache size at 128 MB. I monitored the server for a period of 1 hour using perfmon, and collected the following counters:
Average Disk sec/Read
Average Disk sec/Write
Average Disk sec/Transfer
MSExchange Database\Database Cache % Hit
MSExchange Database\Database Cache Size (MB)
MSExchange Database\I/O Database Reads/Sec
MSExchange Database\I/O Database Writes/Sec
MSExchange Database\I/O Logs Reads/Sec
MSExchange Database\I/O Logs Writes/Sec
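If the counters are logged to a CSV file, the hourly averages can be pulled out with a few lines of script. The following is only a sketch: it assumes the usual perfmon CSV layout of one timestamp column followed by one column per counter, and the server name HT01, the timestamps, and the sample values are placeholders I invented for the example, not my test data.

```python
import csv
import io
from statistics import mean


def average_counters(csv_text: str) -> dict:
    """Average every counter column in a perfmon-style CSV export."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    # The first column is the timestamp; the rest are counter paths.
    columns = {name: [] for name in header[1:]}
    for row in reader:
        for name, cell in zip(header[1:], row[1:]):
            cell = cell.strip()
            if not cell:
                continue  # blank cell: no sample for this interval
            try:
                columns[name].append(float(cell))
            except ValueError:
                continue  # skip anything that is not a number
    return {name: mean(values) for name, values in columns.items() if values}


# Placeholder data only: hypothetical server name and made-up samples.
SAMPLE_CSV = r"""
"(PDH-CSV 4.0)","\\HT01\MSExchange Database\I/O Log Writes/sec","\\HT01\MSExchange Database\I/O Database Writes/sec"
"01/01/2008 10:00:00.000","30.0","10.0"
"01/01/2008 10:00:15.000","34.0","12.0"
"01/01/2008 10:00:30.000"," ","11.0"
""".strip()

averages = average_counters(SAMPLE_CSV)
for counter, value in averages.items():
    print(f"{counter}: {value:.2f}")
```

Blank cells are skipped rather than counted as zero, so a missed sample does not drag the average down.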
The results of the test at 128 MB Database Cache Size were:
Log Writes/Sec were 32.33
Database Writes/Sec were 10.8
Database Hit % was 95.27
Below is a picture of the perfmon, with the highlighted value being Average Disk Sec/Write:
You will notice a decent amount of disk activity, as shown by the various high spikes in the graph.
For the next test, I changed the maximum database cache size to 512 MB, restarted the Transport service, and began the test again. Again, I monitored the server for an hour and collected the counters listed above using perfmon. The results were as follows:
Log Writes/Sec were 10.13
Database Writes/Sec were 2.4
Database Hit % was 98.6.
Log Writes/Sec dropped to about 31% of the 128 MB value (a reduction of roughly 69%), and Database Writes/Sec dropped to about 22% of the 128 MB value (a reduction of roughly 78%). Here is the perfmon shot with the same highlighted value:
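It is easy to mix up "what fraction of the old value is left" with "how much the value went down", so here is the arithmetic spelled out as a small Python sketch using the averages from the two runs above. The function name is my own. For the log writes, about 31% of the original rate remains, which works out to a reduction of about 69%.

```python
def percent_reduction(before: float, after: float) -> float:
    """Percentage by which a value dropped going from before to after."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100.0


# Averages from the 128 MB run and the 512 MB run.
results = {
    "Log Writes/Sec": (32.33, 10.13),
    "Database Writes/Sec": (10.8, 2.4),
}

for name, (before, after) in results.items():
    remaining = after / before * 100.0  # share of the old rate still present
    reduction = percent_reduction(before, after)
    print(f"{name}: {remaining:.1f}% of baseline, {reduction:.1f}% reduction")
```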
You can see from the picture that the spikes are noticeably lower. The change appears to give a very noticeable reduction in the I/O used by the Hub Transport server, even in a smaller environment. The higher database cache hit percentage also indicates that the server did not have to go to disk nearly as often, because more of the data it needed was already in the cache.
Overall, it seems that in pretty much any environment, raising the maximum database cache size has a distinct advantage in terms of disk I/O reduction.