Failover Cluster | Microsoft Windows Server Blog — Your Guide to the Latest Windows Server Product Information
http://approjects.co.za/?big=en-us/windows-server/blog/tag/failover-cluster/

What’s new in failover clustering: #08 Cloud Witness
http://approjects.co.za/?big=en-us/windows-server/blog/2016/08/10/whats-new-in-failover-clustering-08-cloud-witness/
Wed, 10 Aug 2016

The post What’s new in failover clustering: #08 Cloud Witness appeared first on Microsoft Windows Server Blog.

This post was authored by Amitabh Tamhane, Program Manager, Windows Server

Introduction

The primary goal of failover clustering in Windows Server is to provide a reliable infrastructure for making workloads highly available. Configuring quorum correctly is an important step in ensuring high availability for the cluster itself, which in turn keeps the applications hosted on the cluster highly available. With features like Dynamic Quorum, Dynamic Witness, and Node Vote Tiebreaker, the cluster automatically handles quorum vote calculations to provide the most optimal quorum configuration. When a cluster quorum witness is specified, it gives the cluster an additional quorum vote to toggle as needed, providing the highest availability.

The recommendation is simply to always configure a quorum witness, which effectively lets the cluster decide when to use the witness vote. This greatly simplifies the cluster quorum configuration. The question then becomes: what type of quorum witness should be configured?

Cloud Witness is a new type of failover cluster quorum witness introduced in Windows Server 2016. Cloud Witness leverages Microsoft Azure Blob Storage to read and write a blob file, which is then used as an arbitration point for split-brain resolution.

Benefits of using Cloud Witness

There are significant benefits from this approach:

  • Leverages Microsoft Azure
    • No need for a third separate datacenter when stretching a cluster across datacenters
  • Uses standard publicly available Microsoft Azure Blob Storage
    • No extra overhead of maintaining VMs hosted in the public cloud
  • Same Microsoft Azure Storage Account can be used for multiple clusters
    • Single blob file per cluster with cluster unique id as blob file name
  • Very low ongoing cost to the Storage Account
    • Very small data written per blob file
    • Blob file updated only once when cluster nodes’ state changes
  • Built-in Cloud Witness resource type
    • No extra download/installation steps necessary

Single witness type for most scenarios

If you have a failover cluster deployment where all nodes can reach the internet (and, by extension, Microsoft Azure), it is recommended to configure Cloud Witness as your quorum witness resource. Here are some sample scenarios where the Cloud Witness functionality can be utilized:

  • Disaster recovery stretched multi-site clusters
  • Failover clusters without shared storage (SQL Always On, Exchange DAGs, etc.)
  • Failover clusters running inside Guest OS hosted in Microsoft Azure or Amazon Web Services IaaS VMs (or any other public cloud)
  • Failover clusters running inside Guest OS hosted on Enterprise, Hoster, Azure Stack Private Cloud VMs (or any other private clouds)
  • Storage clusters with or without shared storage (Storage Spaces Direct clusters, Scale-out File Server clusters, etc.)
  • Small branch-office clusters (even 2-node clusters)

Easy to deploy

Our goal when making this feature available was to ensure that it would be a no-brainer to anyone familiar with failover clustering in Windows Server to start using the Cloud Witness option. With that in mind, we made an easy way to deploy Cloud Witness using Failover Cluster Manager GUI or cluster PowerShell:

PowerShell syntax

Set-ClusterQuorum -CloudWitness -AccountName <StorageAccountName> -AccessKey <StorageAccountAccessKey>

Microsoft Azure Storage Account considerations

There are a few things you’ll need to consider when using the Cloud Witness option:

  • The failover cluster does not store the Azure Storage access key. Instead, it generates a Shared Access Signature (SAS) token from the access key you provide and stores that token securely.
  • The generated SAS token is valid as long as the corresponding access key remains valid. When rotating the primary access key, first update the Cloud Witness (on all your clusters that use that Storage Account) with the secondary access key before regenerating the primary access key.
  • Cloud Witness uses the HTTPS REST interface of the Microsoft Azure Storage service, which requires outbound HTTPS (TCP port 443) connectivity from all cluster nodes.
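The key-rotation sequence above can be sketched as follows (the storage account name and key value here are placeholders, not real values):

```powershell
# Point the Cloud Witness at the secondary access key first. Repeat this on
# every cluster that uses the same storage account.
Set-ClusterQuorum -CloudWitness -AccountName "contosowitness" -AccessKey "<SecondaryAccessKey>"

# Only after all clusters have been updated is it safe to regenerate the
# primary access key in the Azure portal.
```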

To try this new feature in Windows Server 2016, download the Technical Preview.

Check out the series:

What’s new in failover clustering: #07 SMB Multichannel & Multi-NIC cluster networks
http://approjects.co.za/?big=en-us/windows-server/blog/2016/08/03/whats-new-in-failover-clustering-07-smb-multichannel-multi-nic-cluster-networks/
Wed, 03 Aug 2016

This post was authored by Rob Hindman, Senior Program Manager, Windows Server

Getting the best performance

Building enterprise-grade solutions with Windows Server 2016 is now easier with the new Simplified SMB Multichannel feature in Failover Clustering. A Windows Server 2016 failover cluster now automatically recognizes and configures multiple NICs on the same subnet, greatly simplifying network design and implementation. SMB Multichannel helps customers leverage high-bandwidth 10 GbE, 40 GbE, and faster networks. Since both RSS-capable and RDMA-capable NICs can be used, throughput for SMB traffic is greatly improved. The net result is faster solutions that take advantage of modern hardware and are easier to configure.

Both Hyper-converged and Converged cluster configurations are supported. In the Hyper-converged diagram below, there are two physical networks (subnets); the network with multiple NICs in each cluster node can be used for high bandwidth traffic, such as Virtual Machine Live Migration.

In the Converged network diagram below, multiple NICs are used in the network that spans the two clusters (the North-South network) to achieve high performance.

Note that multiple physical networks (subnets) are required to ensure that the failover cluster can continue to function in the event of a switch failure.

Automatic and on by default

No configuration is necessary to use this feature – the cluster automatically detects and uses all the NICs that are present. All of the NICs are used for cluster heartbeating, CSV, and cluster traffic. The cluster also automatically uses the IPv6 link-local (fe80) IP address resources on private cluster-only networks. Cluster validation has also been updated to check for multiple NICs on the same subnet.
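If you want to confirm what the cluster detected, a quick check along these lines should work (output shape will vary by deployment):

```powershell
# List the cluster networks the failover cluster auto-configured, including
# multiple NICs on the same subnet.
Get-ClusterNetwork | Format-Table Name, Role, Address

# Verify that SMB Multichannel is spreading traffic across the available NICs.
Get-SmbMultichannelConnection | Format-Table ServerName, ClientInterfaceIndex, ServerInterfaceIndex
```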

Further details on the Simplified SMB Multichannel feature in Windows Server 2016 can be found here. A great article about SMB Multichannel can also be found here.

Summary

In summary, the new Simplified SMB Multichannel feature means that Windows Server 2016 failover clusters can automatically take maximum advantage of modern network switches and NICs. Great network throughput and security can be realized with SMB 3.1.1 – your users will love it!

To try this new feature in Windows Server 2016, download the Technical Preview.

Check out the series:

What’s new in failover clustering: #06 Virtual machine start ordering
http://approjects.co.za/?big=en-us/windows-server/blog/2016/07/06/whats-new-in-failover-clustering-06-virtual-machine-start-ordering/
Wed, 06 Jul 2016

This post was authored by Subhasish Bhattacharya, Program Manager, Windows Server

Introduction: “Special” virtual machines

Not all virtual machines (VMs) in your production deployment are created equal… some are just special! Therefore, it is important for these “Utility” VMs to start up before other “Dependent” VMs in your private cloud. Consider a VM hosting the Domain Controller for your private cloud. It is imperative for that VM to start before any VM in your private cloud that depends on Active Directory.

Virtual machine priority in Windows Server

Today in Windows Server, VM start ordering is addressed by configuring the priority of VMs. VMs can be designated Low, Medium, or High priority, which ensures that the most important VMs are started first and that, under resource constraints, the most important VMs keep running. However, there is no cross-node orchestration (by VM priority) across the nodes in a cluster: each cluster node has an isolated view of the priority of the VMs it is hosting. Additionally, with priority-based start ordering, a VM is considered running once it reaches the online state, which often does not give its dependent VMs a sufficient head start.

The need for virtual machine start ordering in your private cloud

Let us consider some scenarios to motivate the need for VM start ordering in our production deployments:

  1. A multi-tiered application where the database VMs have to start first, followed by the middle-tier VMs and lastly, the front-end VMs.
  2. In an integrated system, such as the Cloud Platform System, where infrastructure VMs (hosting services like Active Directory) need to start first. Next, application VMs (such as those hosting SQL) can start, followed by front-end VMs hosting management infrastructure.
  3. A hyper-converged cluster where storage utility VMs need to start before management and tenant VMs. A similar scenario exists for storage appliances.
  4. Converged clusters where at least one Domain Controller VM needs to start up before VMs hosting applications with Active Directory dependencies can be brought up.

Virtual machine start ordering

Virtual machine start ordering enhances your private cloud VM orchestration by providing the following:

Special VMs

  • VMs can be anointed as “Utility” VMs, which are slated to start before all other VMs.

Orchestration

  • Groups of VMs can be defined to represent tiers.
  • Startup indicators and triggers are available for each VM group to determine when it can be considered started.

Start ordering

  • Multi-layer dependencies can be created between different VM groups to define a start order.

Extending beyond VMs

Thus far in this blog post I have discussed the start ordering of VMs. However, this feature enables you to orchestrate the start ordering for any application represented as a cluster group (for example: a cluster group that is used to make your in-house application highly available)!
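The tiers described above map onto the cluster group set cmdlets in Windows Server 2016. A minimal sketch (the VM group names are hypothetical):

```powershell
# Tier 1: infrastructure ("Utility") VMs, e.g. a domain controller.
New-ClusterGroupSet -Name "Infra"
Add-ClusterGroupToSet -Name "Infra" -Group "DC-VM"

# Tier 2: application VMs that depend on the infrastructure tier.
New-ClusterGroupSet -Name "App"
Add-ClusterGroupToSet -Name "App" -Group "SQL-VM"

# "App" will not start until "Infra" is considered started.
Add-ClusterGroupSetDependency -Name "App" -Provider "Infra"
```

Because group sets operate on cluster groups, the same pattern applies to any clustered role, not just VMs.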

To try this new feature in Windows Server 2016, download the Technical Preview.

Check out the series:

What’s new in failover clustering: #05 Resilient private cloud
http://approjects.co.za/?big=en-us/windows-server/blog/2016/07/05/whats-new-in-failover-clustering-05-resilient-private-cloud/
Tue, 05 Jul 2016

This post was authored by Subhasish Bhattacharya, Program Manager, Windows Server.

Introduction

In the past, in a world of reliable but expensive SANs, an aggressive high-availability strategy designed to fail fast was most optimal. The health of the system would be closely monitored to detect issues and react quickly and swiftly. This minimized downtime when catastrophic failures occurred.

In today’s cloud-scale environments, commonly composed of commodity hardware, transient failures have become more common than hard failures. These transient compute and storage failures are triggered by common events such as switch resets, packet loss, latency, and spanning tree convergence. In this new world, reacting aggressively to transient failures can cause more downtime than it prevents.

The storage and compute stack in Windows Server 2016 has been designed to optimize both high availability and resiliency. In a Software Defined Datacenter, we must assume infrastructure will break and it is imperative that software is resilient. At the same time, it is not acceptable to have degraded Virtual Machine (VM) availability.

Resilient private clouds: Compute and storage virtual machine resiliency

Windows Server 2016 introduces increased VM resiliency features to address both:

  • Compute failures: Due to east-west transient network failures.
  • Storage failures: Due to north-south transient storage failures.

Compute resiliency

Transient network failures impede intra-cluster communication for your private cloud. This results in cluster nodes being removed from active membership in a cluster. In Windows Server 2016, your cluster is resilient to intra-cluster communication failures. This resiliency is achieved by the following:

  • A VM continues to run on a node even when it falls out of cluster membership. In this state, the node is considered to be in an “isolated” state and the VM is “unmonitored” – i.e., its health is not being actively monitored by the cluster service.
  • If the network connectivity of the “isolated” node fails to recover within a certain duration, the VM is live-migrated to another node in the cluster. Note that this results in no downtime for the VM.
  • Additionally, “flapping” nodes, which constantly come in and out of cluster membership, are temporarily banished and placed in a “quarantined” state.
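The isolation and quarantine behavior above is driven by cluster common properties, which can be inspected or tuned; a sketch (the values noted in the comments are the defaults as I understand them, so verify against your build):

```powershell
$cluster = Get-Cluster

# How the cluster decides when to place a node in the "isolated" state.
$cluster.ResiliencyLevel

# How long (in seconds) a VM runs "unmonitored" on an isolated node before
# the cluster takes action (default: 240).
$cluster.ResiliencyDefaultPeriod

# Flapping nodes: failures per hour before quarantine, and how long
# (in seconds) a quarantined node stays out (defaults: 3 and 7200).
$cluster.QuarantineThreshold
$cluster.QuarantineDuration
```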

Storage resiliency

A transient storage failure results in a VM being unable to access its underlying VHDX file since read or write requests to disk fail. In Windows Server 2016, a VM is able to seamlessly detect and be resilient to such transient failures as follows:

  1. On detecting a transient storage failure, the tenant VM session state is preserved.
  2. Any failure in block- or file-based storage infrastructure is handled by the VM stack, triggering an intelligent and quick response.
  3. The VM is moved to a “PausedCritical” state as it waits for the storage to recover.
  4. On recovery from the transient failure, the session state is restored.
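On the Hyper-V side, the pause-and-wait behavior in the steps above can be tuned per VM; a sketch (the VM name is a placeholder):

```powershell
# Pause (rather than power off) the VM on a critical storage error, and wait
# up to 30 minutes for the storage to recover before giving up.
Set-VM -Name "Tenant-VM" `
       -AutomaticCriticalErrorAction Pause `
       -AutomaticCriticalErrorActionTimeout 30
```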

To try this new feature in Windows Server 2016, download the Technical Preview. For additional details, see the feature blog posts for compute and storage VM resiliency.

Check out the series:

  • #01 Cluster OS Rolling Upgrade
  • #02 VM Load Balancing
  • #03 Stretched Clusters
  • #04 Workgroup and multi-domain clusters

What’s new in failover clustering: #04 Workgroup and multi-domain clusters
http://approjects.co.za/?big=en-us/windows-server/blog/2016/06/20/whats-new-in-failover-clustering-04-workgroup-and-multi-domain-clusters/
Mon, 20 Jun 2016

This post was authored by Subhasish Bhattacharya, Program Manager, Windows Server

Introduction: Active Directory integration with your private cloud

Active Directory integration provides significant value for most private cloud deployments. However, for a subset of scenarios, it is desirable to decouple your deployment from Active Directory. In prior Windows Server releases, we introduced a number of features to minimize your private cloud’s dependence on Active Directory. Some of these features include:

Bootstrapping without Active Directory: Introduced in Windows Server 2012, this allows you to boot your private cloud without Active Directory dependencies. This is especially useful when you have lost power to your entire datacenter and have to bootstrap, and it enables you to virtualize your entire datacenter, including domain controllers.

Cluster Shared Volumes independent of Active Directory: Cluster Shared Volumes, in Windows Server 2012 and beyond, have no dependence on Active Directory. This is especially advantageous in deployments in branch offices and off-site deployments.

Active Directory-detached clusters: In Windows Server 2012 R2, Failover Clusters can be created without computer objects in Active Directory, thereby decreasing your deployment and maintenance complexity. However, this deployment model still requires all the nodes in your private cloud to be joined to a single domain.

Flexible private cloud – the need for domain independence

In our discussions with you over the last few years, we learned about why you wanted domain independence (freedom from domain requirements for your clusters)!

SQL Server workload

1. You wish to have AlwaysOn Availability Groups (AGs) span multiple domains and workgroup nodes. In some cases this is motivated by your desire to move away from database mirroring. You have described how:

  • Your enterprise needs to operate with multiple domains due to events such as mergers and acquisitions.
  • You would like to consolidate multiple replicas from multiple sources to a single destination.
  • You have AG replicas not in a domain.
  • You wish to address deployment complexity and the dependence of your DBA administrators on the owners of your Active Directory infrastructure.

2. Today there are thousands of SQL Server production deployments on Azure IaaS Virtual Machine (VM) environments. You love the flexibility of Azure, but these deployments require you to deploy two additional VMs for redundant DCs. You would like to avoid this requirement and reduce the deployment cost of your solution.

3. You love Hybrid deployments, where some replicas are running in Azure VMs and other replicas are running on-premises for cross-site disaster recovery. However, all replicas are required to be in the same domain. This is a deployment burden for you. More details about this deployment model can be found here.

Hyper-V and File Server workloads

You would like to be able to deploy Hyper-V and File Server clusters without the cost and complexity of configuring a domain infrastructure for the following scenarios:

  • Small deployments
  • Branch office
  • DMZ deployment outside firewall
  • Highly secure deployments (domain-joined is considered a security weakness in highly secure environments)
  • Test and development environments

In Windows Server 2016, we have addressed your SQL Server workload scenarios end-to-end! We continue to strive to light up your Hyper-V and File Server workload scenarios in subsequent Windows Server releases. Hyper-V live migration and File Server have a dependency on Kerberos, which remains unaddressed in Windows Server 2016.

Domain-independent clusters

Windows Server 2016 breaks down domain barriers and introduces the ability to create a Failover Cluster without domain dependencies. Failover Clusters can now therefore be created in the following configurations:

  • Single-domain clusters: Clusters with all nodes joined to the same domain.
  • Workgroup clusters: Clusters with nodes that are member servers/workgroup (not domain-joined).
  • Multi-domain clusters: Clusters with nodes that are members of different domains.
  • Workgroup and domain clusters: Clusters with nodes that are members of domains and nodes that are member servers/workgroup.

Creating domain-independent clusters

All the options to create “traditional” clusters are still applicable for domain-independent clusters in Windows Server 2016. To try this new feature in Windows Server 2016, download the Technical Preview. For additional details, see the feature Cluster blog here. Some of the options to create domain-independent clusters are:

1. Using Failover Cluster Manager

2. Using Windows PowerShell

New-Cluster -Name <Cluster Name> -Node <Nodes to Cluster> -AdministrativeAccessPoint DNS
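For a workgroup cluster specifically, a couple of prerequisites apply on every node before cluster creation will succeed; a sketch under those assumptions (cluster and node names are placeholders):

```powershell
# Each node needs a local administrator account with the same name and
# password, and remote UAC token filtering must be relaxed for that account:
New-ItemProperty -Path "HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System" `
                 -Name LocalAccountTokenFilterPolicy -Value 1 -PropertyType DWord -Force

# Then create the cluster with a DNS administrative access point:
New-Cluster -Name WGCluster -Node Node1,Node2 -AdministrativeAccessPoint DNS
```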

Check out the series:

What’s new in failover clustering: #03 Stretched Clusters
http://approjects.co.za/?big=en-us/windows-server/blog/2016/06/17/whats-new-in-failover-clustering-3-stretched-clusters/
Fri, 17 Jun 2016

This post was authored by Ned Pyle, Principal Program Manager, Windows Server

Why should you care about clustered storage? Everyone’s talking about apps, mobile, DevOps, containers, platforms. That’s cutting edge stuff in the IT world. Storage is boring, right?

Well, they’re all wrong. Storage is the key. You care about storage because it contains the only irreplaceable part of your IT environment: your data. That data is what makes your company run, what makes the money, what keeps the lights on. And that data usage is ever increasing.

Your datacenter could burn to the ground, all your servers could flood, your network could be shut down by a malicious attack, but if your data is safely protected, you can always get back to business.

Windows Server 2016 stretch clustering is here to protect that data and run those workloads so that your business stays in business.

Stretching clusters with Storage Replica in Windows Server 2016

Storage Replica offers new disaster recovery and preparedness capabilities to the already robust Failover Cluster in Windows Server 2016 Technical Preview. For the first time, Windows Server offers the peace of mind of a zero data loss recovery point objective, with the ability to synchronously protect data on different racks, floors, buildings, campuses, counties and cities. After a disaster strikes, all data will exist elsewhere, without any possibility of loss. The same applies before a disaster strikes; Storage Replica offers you the ability to switch workloads to safe locations prior to catastrophes when granted a few moments warning – again, with no data loss.

Storage Replica allows more efficient use of multiple datacenters. By stretching clusters or replicating clusters, workloads can be run in multiple datacenters for quicker data access by local proximity users and applications, as well as for better load distribution and use of compute resources. If a disaster takes one datacenter offline, you can move its typical workloads to the other site temporarily. It is also workload agnostic – you can replicate Hyper-V VMs, MS SQL Server databases, unstructured data or third party application workloads.

Stretch Cluster allows configuration of computers and storage in a single cluster, where some nodes share one set of asymmetric storage and some nodes share another, then synchronously or asynchronously replicate with site awareness. This scenario can utilize shared Storage Spaces on JBOD, SAN and iSCSI-attached LUNs. It is managed with PowerShell and the Failover Cluster Manager graphical tool, and allows for automated workload failover.

Synchronous replication: order of operations

  1. The application writes data.
  2. Log data is written and the data is replicated to the remote site.
  3. Log data is written at the remote site.
  4. The remote site sends an acknowledgement.
  5. The application write is acknowledged.

(At t and t1, data is flushed to the volume; logs always write through.)

Besides synchronous replication, Storage Replica can utilize asynchronous replication for higher-latency or lower-bandwidth networks.

Ease of Deployment and Management

You deploy and manage stretch clusters using familiar and mature tools like Failover Cluster Manager, which means reduced training time for staff. Wizard-based setup allows administrators to quickly deploy new replication groups and protect their data and workloads.

To ensure successful deployment and operational guidance, Storage Replica and the Failover Cluster both provide validation mechanisms with detailed reports. For instance, prior to deploying a stretch cluster, you can test the topology for requirements, estimate sync times, log size recommendations and write IO performance.
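That pre-deployment validation is exposed through the Test-SRTopology cmdlet; a sketch (server names, volumes, and the result path are placeholders):

```powershell
# Validate requirements and estimate initial sync time, log sizing, and
# write IO performance over a 30-minute sample, emitting an HTML report.
Test-SRTopology -SourceComputerName "SR-SRV01" -SourceVolumeName "D:" -SourceLogVolumeName "E:" `
                -DestinationComputerName "SR-SRV02" -DestinationVolumeName "D:" -DestinationLogVolumeName "E:" `
                -DurationInMinutes 30 -ResultPath "C:\Temp"
```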

Windows Server 2016 also implements site fault domains, allowing you to specify the location of nodes in your cluster and set preferences. For instance, you could specify New York and New Jersey sites, then ensure that all nodes in New York must be offline for the workloads and storage replication to automatically switch over to the New Jersey site. All of this is implemented through a simple PowerShell cmdlet, but can also be automated with XML files for larger private cloud environments.
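The New York/New Jersey example might be sketched with the site fault domain cmdlets like this (site and node names are illustrative):

```powershell
# Define the two sites and place each node in one.
New-ClusterFaultDomain -Name "NewYork" -Type Site -Description "Primary site"
New-ClusterFaultDomain -Name "NewJersey" -Type Site -Description "DR site"
Set-ClusterFaultDomain -Name "Node1" -Parent "NewYork"
Set-ClusterFaultDomain -Name "Node2" -Parent "NewJersey"

# Keep workloads in New York unless the entire site goes down.
(Get-Cluster).PreferredSite = "NewYork"
```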

Summary

Windows Server 2016 stretch clustering was designed with your data’s safety in mind. Mature, robust failover clustering combined with synchronous replication offers peace of mind at commodity pricing. To try this new feature in Windows Server 2016, download the Technical Preview. For additional details, see the feature Cluster blog here.

Check out the series:

What’s new in failover clustering: #02 VM Load Balancing
http://approjects.co.za/?big=en-us/windows-server/blog/2016/06/09/whats-new-in-failover-clustering-02-cluster-node-fairness/
Thu, 09 Jun 2016

This post was authored by Subhasish Bhattacharya, Program Manager, Windows Server.

Introduction: Optimizing your private cloud

In our discussions with customers, we learned that a key consideration for private cloud deployments is the capital expenditure (CapEx) required to go into production. We also learned that customers added redundancy to their private clouds, thereby increasing CapEx, to avoid under-capacity during peak traffic in production. The need for redundancy is driven by unbalanced private clouds where some nodes are hosting more Virtual Machines (VMs) and others are underutilized (such as a freshly rebooted server).

During the lifecycle of your private cloud, certain operations (such as rebooting a node for patching) result in the VMs in your clusters being moved, which ultimately leaves the cluster unbalanced. System Center Virtual Machine Manager (SCVMM) has a feature called Dynamic Optimization which automatically balances the utilization of your cluster. A consistent and vocal message we heard from you is the need for a similar solution for environments without SCVMM. VM Load Balancing (also called Node Fairness) thus provides an in-box feature in Windows Server to optimize your private cloud utilization.

What’s VM Load Balancing?

Load Balancing is a new in-box feature in Windows Server 2016 that allows you to optimize the utilization of nodes in a Failover Cluster. It identifies over-committed nodes and re-distributes VMs from those nodes to under-committed nodes. Some of the salient aspects of this feature are as follows:

  • It is a zero-downtime solution: VMs are live-migrated to idle nodes.
  • Seamless integration with your existing cluster environment: Failure policies such as anti-affinity, fault domains and possible owners are honored.
  • Heuristics for balancing: VM memory pressure and CPU utilization of the node.
  • Granular control: Enabled by default. Can be activated on-demand or at a periodic interval.
  • Aggressiveness thresholds: Three thresholds available based on the characteristics of your deployment.

The Feature in Action

A new node is added to your private cloud
When you add new capacity to your private cloud, the Load Balancing feature automatically balances capacity from the existing nodes in your private cloud, to the newly added capacity. Here is the flow of the steps:

  1. The pressure is evaluated on the existing nodes in the private cloud.
  2. All nodes exceeding threshold are identified.
  3. The nodes with the highest pressure are identified to determine priority of balancing.
  4. VMs are Live Migrated (with no down time) from a node exceeding threshold to a newly added node in the private cloud.

Recurring load balancing
When configured for periodic balancing, the pressure on the cluster nodes is evaluated for balancing every 30 minutes. Alternately, the pressure can be evaluated on-demand. Here is the flow of the steps:

  1. The pressure is evaluated on all nodes in the private cloud.
  2. All nodes exceeding the threshold, as well as those below it, are identified.
  3. The nodes with the highest pressure are ranked to determine the priority of balancing.
  4. VMs are Live Migrated (with no downtime) from a node exceeding the threshold to a node below the minimum threshold.
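A single periodic pass can be sketched the same way, this time pairing over-threshold "donor" nodes with under-threshold "receiver" nodes. The upper and lower thresholds and the simple pressure metric below are assumptions for the sketch, not the product's exact values.

```python
# Sketch of one recurring balancing pass: nodes above the upper threshold
# shed VMs to nodes below the lower threshold. Thresholds are assumptions.

def periodic_balance(nodes, high=0.80, low=0.60):
    def pressure(n):
        return n["used"] / n["total"]
    # Steps 2-3: split nodes into donors (worst first) and receivers.
    donors = sorted((n for n in nodes if pressure(n) > high),
                    key=pressure, reverse=True)
    receivers = sorted((n for n in nodes if pressure(n) < low), key=pressure)
    rebalanced = []
    for donor in donors:
        for receiver in receivers:
            # Step 4: move VMs while the donor is hot and the receiver is cool.
            while (donor["vms"] and pressure(donor) > high
                   and pressure(receiver) < low):
                vm = donor["vms"].pop()
                donor["used"] -= vm["mem"]
                receiver["used"] += vm["mem"]
                receiver["vms"].append(vm)
                rebalanced.append((vm["name"], donor["name"], receiver["name"]))
    return rebalanced

nodes = [
    {"name": "HVHOST-1", "total": 128, "used": 118,
     "vms": [{"name": "VM-C", "mem": 24}, {"name": "VM-D", "mem": 24}]},
    {"name": "HVHOST-2", "total": 128, "used": 40, "vms": []},
]
rebalanced = periodic_balance(nodes)
```

In a real deployment this pass would run on the 30-minute timer (or on demand), and each move would be a zero-downtime live migration rather than a list operation.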

To try this new feature in Windows Server 2016, download the Technical Preview. For additional details, see the feature deep-dive on the Failover Clustering team blog.

Check out the series:

#01 Cluster OS Rolling Upgrade

The post What’s new in failover clustering: #02 VM Load Balancing appeared first on Microsoft Windows Server Blog.

What’s new in failover clustering: #01 Cluster OS Rolling Upgrade http://approjects.co.za/?big=en-us/windows-server/blog/2016/06/02/whats-new-in-failover-clustering-01-cluster-os-rolling-upgrade/ http://approjects.co.za/?big=en-us/windows-server/blog/2016/06/02/whats-new-in-failover-clustering-01-cluster-os-rolling-upgrade/#comments Thu, 02 Jun 2016 16:00:08 +0000 This post was authored by Rob Hindman, Senior Program Manager, Windows Server. Better agility for your private cloud We asked what you needed in the next release of Windows Server and we heard you clearly: you have made significant investments in the IT infrastructure that is used to deliver results for your business.

The post What’s new in failover clustering: #01 Cluster OS Rolling Upgrade appeared first on Microsoft Windows Server Blog.

This post was authored by Rob Hindman, Senior Program Manager, Windows Server.

Better agility for your private cloud

We asked what you needed in the next release of Windows Server and we heard you clearly: you have made significant investments in the IT infrastructure that is used to deliver results for your business. When you upgrade the Hyper-V and Scale-out File Server clusters in your private cloud or datacenter to a new version of Windows Server, you don’t want users to notice. You don’t want your SLAs to be impacted in the slightest bit, and you don’t want to buy new hardware. You want to use new features in Windows Server 2016 as soon as possible.

The Cluster OS Rolling Upgrade feature in Windows Server 2016 is our response to your requests. Cluster OS Rolling Upgrade introduces a new concept called Mixed-OS mode, which allows customers to start with a Windows Server 2012 R2 failover cluster, and add Windows Server 2016 server nodes to the cluster. Using a sequential process of draining, evicting, upgrading, then adding and resuming each node, any failover cluster can be upgraded. Details include:

  • Upgrade any Windows Server 2012 R2 cluster, physical (host) or virtual (guest).
  • No downtime for physical Hyper-V and Scale-out File Server (SoFS) clusters.
  • No new or additional hardware required if there is enough capacity to remove one cluster node at a time.
  • No need for a second cluster. The original cluster is upgraded one node at a time.
  • No need to copy storage data, although VM Live Migration does copy VM memory.
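The drain-evict-upgrade-re-add-resume sequence can be sketched as a loop over the nodes. The function and field names below are illustrative assumptions; the only real cmdlets referenced are the ones named in the comments.

```python
# Sketch of the rolling-upgrade sequence: each node is drained, evicted,
# reinstalled with the new OS, re-added, and resumed, one at a time, so the
# cluster stays online throughout. Names and structures are assumptions.

OLD, NEW = "Windows Server 2012 R2", "Windows Server 2016"

def rolling_upgrade(cluster):
    for node in list(cluster["nodes"]):
        node["state"] = "drained"      # roles live-migrate / move off the node
        cluster["nodes"].remove(node)  # evict (Remove-ClusterNode)
        node["os"] = NEW               # reformat and clean-install the new OS
        cluster["nodes"].append(node)  # rejoin in Mixed-OS mode (Add-ClusterNode)
        node["state"] = "up"           # resume; roles can move back
    # Only once every node runs the new OS is the functional level committed
    # (Update-ClusterFunctionalLevel) -- an irreversible, explicit step.
    if all(n["os"] == NEW for n in cluster["nodes"]):
        cluster["functional_level"] = NEW
    return cluster

cluster = {"functional_level": OLD,
           "nodes": [{"name": f"HVHOST-{i}", "os": OLD, "state": "up"}
                     for i in range(1, 5)]}
rolling_upgrade(cluster)
```

Note that in the sketch, as in the real feature, the cluster continues to run with a mix of operating systems until the final commit; until then, the administrator can still roll back by re-adding Windows Server 2012 R2 nodes.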

After all nodes in the cluster have been upgraded to Windows Server 2016, the administrator uses the Update-ClusterFunctionalLevel cmdlet to commit the cluster to permanently running Windows Server 2016 nodes only – at which point the new Windows Server 2016 features become available.

Only Hyper-V and the Scale-out File Server (SoFS) clusters can be upgraded without any downtime. The Hyper-V cluster makes use of VM Live Migration so the VMs will keep running while they are moved from Windows Server 2012 R2 to Windows Server 2016. The Scale-out File Server (SoFS) makes use of Continuously Available (CA) file handles which can move between cluster nodes without any data loss. For other workloads like SQL Server AlwaysOn Availability Groups and SQL Server AlwaysOn Failover Cluster Instance, downtime is equivalent to failover time.

As an example, let’s start with two Windows Server 2012 R2 clusters. The first cluster is a Hyper-V cluster of four nodes hosting eight VMs. The second cluster is a Scale-out File Server (SoFS) storage cluster for the VHDX files used by the Hyper-V cluster. Many customers like this converged (aka, disaggregated) strategy so that they can add nodes to either the Hyper-V or SoFS clusters, adding capacity to either cluster as needed. Assuming that there is capacity on each cluster to drain one node, the VMs can remain running.

Let’s see how we upgrade both of these clusters at the same time using Cluster OS Rolling Upgrade:

Same time upgrade

The VMs are Live Migrated off of HVHOST-1, and the SMB3 CA File shares are moved off of SOFS-1. Both of these nodes are evicted from their clusters:

Both nodes evicted

HVHost-1 and SOFS-1 are reformatted, and have Windows Server 2016 installed on them. They are both added back into their clusters, and VMs are Live Migrated onto HVHOST-1, and SMB3 CA File shares are moved to SOFS-1. HVHOST-2 and SOFS-2 are evicted from their clusters:

Added back to clusters

HVHost-2 and SOFS-2 are reformatted, and have Windows Server 2016 installed on them. They are both added back into their clusters, and VMs are Live Migrated onto HVHOST-2, and SMB3 CA File shares are moved to SOFS-2. HVHOST-3 and SOFS-3 are evicted from their clusters:

File shares moved

HVHost-3 and SOFS-3 are reformatted, and have Windows Server 2016 installed on them. They are both added back into their clusters, and VMs are Live Migrated onto HVHOST-3, and SMB3 CA File shares are moved to SOFS-3. HVHOST-4 and SOFS-4 are evicted from their clusters:

Reformatted

HVHost-4 and SOFS-4 are reformatted, and have Windows Server 2016 installed on them. They are both added back into their clusters, and VMs are Live Migrated onto HVHOST-4, and SMB3 CA File shares are moved to SOFS-4:

Windows Server 2016 installed

At this stage, the cluster is ready to be committed to Windows Server 2016 – the administrator is ready to run the Update-ClusterFunctionalLevel cmdlet. When this happens, new failover clustering features such as VM Compute Resiliency, VM Storage Resiliency, and the Cloud Witness can be used on the cluster. The upgrade process has completed without any downtime for the users of the VMs.

Cluster OS Rolling Upgrade can be used to upgrade any failover cluster from Windows Server 2012 R2 to Windows Server 2016. Used with Hyper-V VM Live Migration, there is no downtime for Hyper-V host clusters. Scale-out File Server (SoFS) clusters can also be upgraded without any downtime because of the Continuously Available (CA) file shares used by SMB3. Other cluster workloads, like SQL Server AlwaysOn Failover Cluster Instances (FCIs) and SQL Server AlwaysOn Availability Groups (AGs), can be upgraded with minimal downtime, equivalent to failover time. Virtual (guest) clusters using Shared VHDX need to be logically detached from shared storage before the upgrade, and reattached after the upgrade process has completed.

We’ve worked closely with many customers using Windows Server 2016 Technical Preview to ensure that Cluster OS Rolling Upgrade delivers the deployment agility that they need, reducing the time and cost needed to upgrade to Windows Server 2016.

To try this new feature in Windows Server 2016, download the Technical Preview. For additional details, see Cluster OS Rolling Upgrade on TechNet.

The post What’s new in failover clustering: #01 Cluster OS Rolling Upgrade appeared first on Microsoft Windows Server Blog.
