In Exchange 2003, we were given a single high availability model to follow: clustering. You created what was called an “Exchange Virtual Server,” or EVS, which consisted of a number of “nodes” (servers) that presented a single “server” for clients to connect to.
Essentially, you would have two separate servers attached to a single copy of the Exchange files. This is made possible through the use of a SAN, which allows both physical servers to “see” the same database files. This gives you a typical Active/Passive arrangement: one node is active, meaning it has control of the EVS and presents itself as the Exchange server the clients connect to, and one is passive, sitting idle until it is needed.
If the active node were to fail (say a blue screen, or someone accidentally pulling the power plug), the passive node would detect that failure, assume the role of the EVS, mount the databases, and continue servicing clients. There would be a time gap where the databases were unavailable; depending on the number of storage groups and databases, it would typically take about 30 seconds to 2 minutes before services were restored. A client running Outlook in cached mode would never even know there was a failure.
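To make that failover sequence concrete, here is a minimal Python sketch of the idea. This is a conceptual model only, not real cluster code: the heartbeat timeout, node names, and class names are all illustrative.

```python
import time

HEARTBEAT_TIMEOUT = 5.0  # seconds of silence before the passive node takes over


class Node:
    """One physical cluster node."""

    def __init__(self, name):
        self.name = name
        self.last_heartbeat = time.monotonic()

    def heartbeat(self):
        self.last_heartbeat = time.monotonic()


class ExchangeVirtualServer:
    """The 'ghost' server name clients connect to (e.g. MB01)."""

    def __init__(self, name, active, passive):
        self.name = name
        self.active = active
        self.passive = passive

    def check_failover(self):
        # The passive node watches the active node's heartbeat.
        silence = time.monotonic() - self.active.last_heartbeat
        if silence > HEARTBEAT_TIMEOUT:
            # Active node presumed dead: the passive node assumes the EVS
            # role, mounts the databases, and resumes servicing clients.
            self.active, self.passive = self.passive, self.active
            return True
        return False


evs = ExchangeVirtualServer("MB01", Node("Node1"), Node("Node2"))
evs.active.last_heartbeat -= 10  # simulate Node1 going silent (power pulled)
failed_over = evs.check_failover()
print(failed_over, evs.active.name)
```

Clients keep connecting to “MB01” the whole time; only the physical machine answering to that name changes.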
In Exchange 2007, this type of cluster is still available; it’s known as a Single Copy Cluster (SCC). Exchange 2007 also introduced a new type of cluster, called Cluster Continuous Replication (CCR). Note also that Microsoft has changed the name of the EVS to CMS (Clustered Mailbox Server).
What is the difference? Well, let’s look at the diagrams below:
On the left we have an SCC; note how there is one copy of the Exchange files. The two separate physical machines, Node1.domain.test and Node2.domain.test, are both part of the cluster and have access to the shared Exchange files through a SAN or some other type of approved shared storage. In our scenario, one of the nodes, let’s say Node1.domain.test, has control of the CMS known as MB01.domain.test. Notice how the icon is a “ghost” server? That’s because the server doesn’t physically exist: Node1.domain.test presents itself as MB01.domain.test. If Node1.domain.test fails, Node2.domain.test will take control of MB01.domain.test and present itself as the CMS.
Now, let’s look at the right side of the diagram. We still have the two nodes and the CMS, but notice we have TWO copies of the Exchange files, one for each respective node. We no longer have a single shared copy of the data; each node keeps its own copy. In a CCR cluster, you can have a maximum of two nodes, and it has to be set up Active/Passive.
So, again, Node1.domain.test has control of the CMS MB01.domain.test. If it fails, the role fails over to Node2.domain.test, which takes control of the CMS MB01.domain.test. It is important to note that both nodes always keep their own copy of the databases, but only the active node has its databases mounted and has control of the CMS; the passive node keeps its copy current by replaying the logs shipped to it.
So now, the question is: how are both copies kept up to date? The worst thing would be database divergence, a situation where the passive node’s copy is weeks behind the active node’s. Both copies are kept in sync through log shipping and seeding.
Seeding is the copying of the initial database. If you have a 10 GB database on Node1, you “seed” (copy) that database to the passive node so both copies start from an identical baseline.
You may notice that transaction log files are no longer 5 MB as they were in Exchange 2003, but are now 1 MB in size. After the active node has written a transaction log’s data to the actual database, it will “ship” (copy) that log over to the passive node, which replays it so that its copy of the database stays up to date with, or very close to, the active node’s.
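The seed-then-ship flow can be sketched in a few lines of Python. Again, this is purely a conceptual model under the assumptions above; a Python list stands in for an ESE database, and each “log” is just a batch of records.

```python
# Conceptual model of CCR seeding and log shipping.
# A "database" is a list of committed records; a "log" is a batch of
# records that the passive node has not yet replayed.

active_db = ["row1", "row2", "row3"]   # active node's mounted database
passive_db = []

# 1. Seeding: copy the whole database once so both copies start identical.
passive_db = list(active_db)

# 2. Log shipping: as the active node closes each transaction log, it
#    copies ("ships") the log to the passive node, which replays it.
closed_logs = [["row4", "row5"], ["row6"]]
for log in closed_logs:
    active_db.extend(log)        # active node commits the log's data
    shipped = list(log)          # copy the closed log across the network
    passive_db.extend(shipped)   # passive node replays the log into its copy

# The passive copy lags only by logs not yet closed and shipped.
print(passive_db == active_db)
```

This is also why the smaller 1 MB log size matters: logs close and ship more frequently, so the passive copy’s lag stays small.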
This type of clustering is essentially a “shared-nothing” cluster. This addresses several things that were either vulnerabilities of Single Copy Clusters or barriers to implementing them:
- You now have multiple copies of the database. In an SCC, if the database is corrupted, you HAVE to restore from backup; there is no alternate copy to mount as there is with CCR.
- A SAN, or any other type of shared storage, is no longer required. This means you can implement CCR using direct-attached storage.
- You can take VSS or snapshot backups of the passive node. This allows you to offload backup and verification jobs to the passive node, meaning you can take more frequent backups with no impact on end users. A successful backup on the passive node will trigger truncation of the transaction logs on both the passive and active nodes!
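That truncation behavior in the last bullet can be modeled simply: once a full backup of the passive copy succeeds, log files up to the point the passive copy has replayed are safely captured and can be purged on both sides. A sketch, with made-up log generation numbers:

```python
# Sketch of log truncation after a successful backup on the passive node.
# Log generation numbers are illustrative.

active_logs = {1, 2, 3, 4, 5}     # log files present on the active node
passive_logs = {1, 2, 3, 4, 5}
replayed_up_to = 4                # highest log the passive copy has replayed


def backup_passive_and_truncate():
    # A successful VSS backup of the passive copy means logs up to the
    # replay point are safely captured, so they can be purged on BOTH nodes.
    for logs in (active_logs, passive_logs):
        for gen in list(logs):
            if gen <= replayed_up_to:
                logs.discard(gen)


backup_passive_and_truncate()
print(sorted(active_logs), sorted(passive_logs))
```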
There are still some requirements for implementing a CCR cluster, some of which are general rules of Microsoft Cluster Services, and some of which are specific to Exchange:
- You can only have a single database per storage group in a CCR cluster. If you have a public folder database, it has to be in its own storage group and must be the only public folder database in your organization.
- You still need a separate “heartbeat” network that the servers use to communicate with each other to detect if one of the nodes has failed.
- If using Windows Server 2003, the nodes have to be on the same subnet (Windows Server 2008 added clustering support for nodes on different subnets).
- A CCR node can only host the Mailbox server role. This means you need a minimum of three physical servers to install a CCR cluster: two nodes in the CCR, and then one more server hosting both the Client Access and Hub Transport roles.
- Paths for the Exchange files, as well as drive letters and/or mount points, should be identical on both nodes.
- The operating system version and patch level should match on both nodes.
So that is a 10,000-foot view of a CCR cluster. In the next article, we will actually install a CCR cluster. Stay tuned!