Sharing experiences with Azure Key Vault Managed HSM

Introduction

Everything you need to know about Azure Key Vault Managed HSM is right here! This blog describes what I have learned while working with Azure Key Vault Managed HSM and its lifecycle management.

What is Azure Key Vault Managed HSM
Azure Key Vault Managed HSM (Hardware Security Module), or Managed HSM for short, is a fully managed, highly available, single-tenant, standards-compliant cloud service that enables you to safeguard cryptographic keys for your cloud applications, using FIPS 140-2 Level 3 validated HSMs.

Instead of maintaining your own HSM (maintenance, support, and lifecycle management), the hardware is managed for you by Microsoft. Microsoft does not have access to the data within the Managed HSM. This means that content encrypted by the Managed HSM is not readable by Microsoft, which is a plus, but it can also be a downside: if a customer loses their encryption keys, there is no way to decrypt the data, not even for Microsoft. For more details about the applied security controls (physical, technical and administrative), see the Microsoft documentation.

Use case

Managed HSM is a security mechanism to provide data access in compliance with local laws and regulations:

  • Protect data from security breaches and malicious activity;
  • Maintain data confidentiality to protect privacy;
  • Protect data against unauthorized access;
  • Secure data to prevent negative business and financial impact.

The most common use case is to protect data classified as “highly sensitive” (such as intellectual property, or personally identifiable information in, for example, medical records), which should be encrypted. Encryption is one of the security methods that keeps a cloud provider or bad actor out of the “chain of trust”, meaning that the customer stays in control of (has total ownership over) their own data.

Usage

Consider the topics below first.

How to use it
When the organisation has a clear use case that it wants to protect, the next step is to decide which kind of HSM suits the organisation's needs. There are different kinds of solutions (such as on-premises, a cloud-independent solution, or that of a cloud provider). When the organisation has decided which one it prefers, the next step is to think through its lifecycle management. For this blog, I have chosen the Managed HSM of a cloud provider. A Managed HSM can take away a lot of the burden of hardware maintenance. Besides the cost aspect, it also gives the organisation the possibility to deploy very quickly, use its functionality, and learn from the best practices of the cloud provider (in this case Microsoft).

Shared or Dedicated
When the choice is made for a Managed HSM, the next step is to gather the organisation's requirements and determine the position of the Managed HSM in the current Azure landscape. Questions like "how many Managed HSMs will be deployed", "will it get its own subscription", "can it be reached only by private endpoints" and more have to be answered before deployment can take place.

From my experience, the hard limits that would require more than one Managed HSM are not reached that easily. It is mostly the cost of this Azure resource that drives organisations to “share” one Managed HSM, so that it is usable by more than one workload team.

Other design related topics
It is wise to also think about the topics shown below. It helps in getting a better view and understanding of the Managed HSM.


Lifecycle management of the Managed HSM

The picture below shows the lifecycle management of the Azure Key Vault Managed HSM. Each phase will be explained below.


Phase 0 - Processes
Before you start, it is recommended to create and describe the necessary steps and procedures around the Managed HSM within the organization. From my experience, the following processes and procedures should be described:

  • Key management process;
  • Demarcation of key management responsibility within the organization;
  • Managed HSM Root key creation;
  • Managed HSM workload key lifecycle management;
  • The Disaster Recovery procedure of the Managed HSM;
  • Disaster Recovery scenarios overall and how to act;
  • Risk assessment.

Lessons learned
First write down the processes and arrange for the necessary people within the organisation to confirm the required documents. This was the most time-consuming phase of them all.


Phase 1 - Deploy
The deployment phase is described clearly on the Microsoft documentation site. It explains which steps to take to provision the Managed HSM.

# Log into the Azure environment with Azure PowerShell
Connect-AzAccount

# Connect to the correct subscription
Set-AzContext -Subscription "your subscription ID Managed HSM"

# Create a resource group (with Azure PowerShell, so no separate Azure CLI login is needed)
New-AzResourceGroup -Name "your-resourcegroupnameMHSM" -Location "westeurope"

# Create the Managed HSM
New-AzKeyVaultManagedHsm -Name "your-unique-managed-hsm-name" -ResourceGroupName "your-resourcegroupnameMHSM" -Location "westeurope" -Administrator "your-principal-ID" -SoftDeleteRetentionInDays "# of days to retain the managed hsm after softdelete"

Lessons learned

  • Always use infrastructure as code to deploy a Managed HSM, following the best practices. Don't forget to enable “logging”, and turn on “purge protection” and “soft delete”;
  • It is important to have a good understanding of what a “Security Domain” is;
  • Logging is enabled with an Azure CLI command. Logs can be stored in a storage account or a Log Analytics workspace;
  • Bear in mind that the principal passed with the “-Administrator” flag in the command above (whether it is an Entra ID group, service principal or otherwise) is the ONLY one that can perform the “Disaster Recovery” of the Managed HSM in case of an emergency.
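
As an illustration of the logging lesson, the sketch below wraps the two Azure CLI calls involved in a small shell function. The function name enable_mhsm_logging, the diagnostic setting name and both arguments are placeholders chosen for this example; the second argument is assumed to be the full resource ID of an existing Log Analytics workspace.

```shell
#!/bin/sh
# Sketch: send the audit logs of a Managed HSM to a Log Analytics workspace.
# enable_mhsm_logging and its arguments are illustrative placeholders.
enable_mhsm_logging() {
    hsm_name=$1
    workspace_id=$2   # assumed: full resource ID of the workspace

    # Look up the resource ID of the Managed HSM.
    hsm_id=$(az keyvault show --hsm-name "$hsm_name" --query id -o tsv) || return 1

    # Create a diagnostic setting for the AuditEvent log category.
    az monitor diagnostic-settings create \
        --name "${hsm_name}-diagnostics" \
        --resource "$hsm_id" \
        --logs '[{"category": "AuditEvent", "enabled": true}]' \
        --workspace "$workspace_id"
}
```

A call would look like: enable_mhsm_logging "your-unique-managed-hsm-name" "workspace-resource-ID". To log to a storage account instead, replace --workspace with --storage-account and pass the resource ID of the storage account.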

Phase 2 - Activate
After the deployment, the Managed HSM must be activated. The Microsoft documentation shows that the customer should first create and then supply its own keys (Root keys), which can be created with, for example, the tool “openssl”. As stated, you are not required to use this mechanism; other methods can be used as well.

If you do not want to use another tool, the advice is to use "openssl" with the method below to create your own keys:

# Create RSA keys via openssl
openssl req -newkey rsa:4096 -nodes -keyout cert_0.key -x509 -days 365 -out cert_0.cer
openssl req -newkey rsa:4096 -nodes -keyout cert_1.key -x509 -days 365 -out cert_1.cer
openssl req -newkey rsa:4096 -nodes -keyout cert_2.key -x509 -days 365 -out cert_2.cer

Store the RSA key pairs created in this step securely, together with the "Security Domain" file that is downloaded in the next step.

Note:
When using SSL certificates with a validity of 365 days, as above, those certificates expire after one year and should be renewed yearly. Expiry does not mean that you can no longer decrypt your data: even if the certificates are "expired", they can still be used to restore the Security Domain.
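
Even though expired certificates can still be used for a restore, an organisation may want to track the expiry dates for compliance reasons. The following is a minimal sketch of such a check with openssl. The helper name cert_expiry_report and the WARN_DAYS variable (default 30 days) are made up for this example; the certificates are assumed to be in PEM format, as produced by the commands above.

```shell
#!/bin/sh
# Sketch: report when each Security Domain wrapping certificate expires.
# cert_expiry_report and WARN_DAYS are hypothetical names for this example.
cert_expiry_report() {
    warn_days=${WARN_DAYS:-30}
    for cert in "$@"; do
        # -enddate prints a line of the form "notAfter=<date>".
        end=$(openssl x509 -in "$cert" -noout -enddate) || return 1
        # -checkend exits non-zero if the certificate expires within the given seconds.
        if openssl x509 -in "$cert" -noout -checkend $((warn_days * 86400)) >/dev/null; then
            echo "$cert: ok (${end#notAfter=})"
        else
            echo "$cert: expires within $warn_days days (${end#notAfter=})"
        fi
    done
}
```

For the three certificates above the call would be: cert_expiry_report cert_0.cer cert_1.cer cert_2.cer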

The next step is to download the "Security Domain" and activate your Managed HSM; the example below uses Azure PowerShell. It uses three RSA key pairs (only the public keys are needed for this command) and sets the quorum to three, which means that all three private keys are needed later to restore the Security Domain.

# Log into the Azure environment with Azure Powershell
Connect-AzAccount

# Connect to the correct subscription
Set-AzContext -Subscription "your subscription ID Managed HSM"

# Download the Security Domain, which activates the Managed HSM
Export-AzKeyVaultSecurityDomain -Name "your-unique-managed-hsm-name" -Certificates "cert_0.cer", "cert_1.cer", "cert_2.cer" -OutputPath "MHSMsd.ps.json" -Quorum 3

The status of the Managed HSM will change to active.

Please store the "Security Domain" file and the RSA key pairs securely. You will need them for disaster recovery or for creating another Managed HSM that shares the same "Security Domain" so the two can share keys. After successfully downloading the "Security Domain", your HSM will be in an active state and ready for you to use.

Lessons learned

  • Creating the organization's keys (Root keys) is a responsibility of the organization itself, which is sometimes easily forgotten. The key creation is mostly carried out in the form of a key ceremony;
  • Storage of the Root keys and the "Security Domain" file can be an issue. Keep them safe in different locations and create a process around it by testing (dry-running) the “Disaster Recovery” scenario.
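
Part of such a dry run can be a simple pre-check that the recovery material is actually retrievable. The sketch below only checks that the Security Domain file is present and that at least the quorum number of private key files can be found; it does not validate their contents. The helper name check_dr_material is hypothetical.

```shell
#!/bin/sh
# Sketch of a dry-run pre-check: is the Security Domain file present, and are
# at least <quorum> private key files available? Contents are not validated.
# check_dr_material is a hypothetical helper for this example.
check_dr_material() {
    sd_file=$1
    quorum=$2
    shift 2   # the remaining arguments are the private key files

    if [ ! -s "$sd_file" ]; then
        echo "missing or empty Security Domain file: $sd_file"
        return 1
    fi

    found=0
    for key in "$@"; do
        # Count only key files that exist and are not empty.
        if [ -s "$key" ]; then
            found=$((found + 1))
        fi
    done

    if [ "$found" -lt "$quorum" ]; then
        echo "only $found of the required $quorum private keys are available"
        return 1
    fi
    echo "ok: Security Domain file and $found private keys available (quorum $quorum)"
}
```

With the file names used in this blog, the call would be: check_dr_material MHSMsd.ps.json 3 cert_0.key cert_1.key cert_2.key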

Phase 3 - HSM keys lifecycle for workload teams
Within this phase, the Managed HSM is used by workload teams. It consists of the separate phases described below:

  • Creation/Distribution;
    • Three key types are supported: “RSA-HSM, EC-HSM and oct-HSM (AES)”;
    • No distribution of keys takes place. Only references to the HSM-key (keyName), the Managed HSM (keyVaultUri) and the key version (keyVersion) must be shared with workload teams, so they can configure them in the Azure resource of their workload;
  • Control;
    • Monitor the expiry dates of HSM-keys (when set on the HSM-key);
  • Update/Rotate;
    • Apply corrections to, for example, “key operations” (sign, verify, wrap key, unwrap key, encrypt, decrypt);
    • Act on compromised HSM-keys (generating a new key version or a completely new key);
  • Delete.
    • When and how to remove inactive HSM-keys.
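
The three references that are shared with a workload team can be derived from a key identifier of the form https://<hsm-name>.managedhsm.azure.net/keys/<key-name>/<key-version>. The sketch below splits such an identifier with plain shell parameter expansion; the helper name split_key_id is made up for this example.

```shell
#!/bin/sh
# Sketch: split a key identifier such as
#   https://<hsm-name>.managedhsm.azure.net/keys/<key-name>/<key-version>
# into the three values that are shared with a workload team.
# split_key_id is a hypothetical helper for this example.
split_key_id() {
    kid=$1
    case "$kid" in
        https://*/keys/*/*) ;;
        *) echo "unexpected key identifier: $kid" >&2; return 1 ;;
    esac
    key_vault_uri=${kid%%/keys/*}   # everything before /keys/
    rest=${kid#*/keys/}             # <key-name>/<key-version>
    key_name=${rest%%/*}
    key_version=${rest#*/}
    echo "keyVaultUri=$key_vault_uri"
    echo "keyName=$key_name"
    echo "keyVersion=$key_version"
}
```

The identifier itself can be read with the Azure CLI, for example: split_key_id "$(az keyvault key show --hsm-name "your-unique-managed-hsm-name" --name "your-key-name" --query key.kid -o tsv)"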

Lessons learned

  • Creating a simple and understandable picture helps with visibility and clarity, abstracting away complexity;
  • It is helpful to describe each phase in terms of tasks, stakeholders and responsibilities. Besides the clarity, it sets expectations and helps with adoption within the organisation;
  • Best practices and advice can be added to each phase, so it is practical for workload teams and keeps them compliant with the latest standards.

Phase 4 - Recover the Managed HSM
This section describes how to recover the Managed HSM when a disaster strikes and the Managed HSM is lost or unavailable (because it was deleted and then purged, or because of a catastrophic failure in the region). The following steps must be taken.

Step 1 - First create a new Managed HSM instance

# Log into the Azure environment with the Azure CLI
az login

# Connect to the correct subscription
az account set --subscription "your subscription ID ManagedHSM"

# Create a resource group
az group create --name "your-resourcegroupnameMHSMversion2" --location westeurope

# Store the object ID of the signed-in user in a variable for the administrator
# (recent Azure CLI versions return it as "id"; older versions used "objectId")
oid=$(az ad signed-in-user show --query id -o tsv)

# Create the Managed HSM
az keyvault create --hsm-name "your-unique-managed-hsm-name-version-2" --resource-group "your-resourcegroupnameMHSMversion2" --location "westeurope" --administrators "$oid"

Your Azure account is now authorized to perform any operations on this Managed HSM. As of yet, nobody else is authorized. Using an Entra ID group as administrator instead of a personal account is a better practice. The output of this command shows properties of the Managed HSM that you've created. The two most important properties are:

  • name: In the example, the name is hsm2. You'll use this name for other Key Vault commands.
  • hsmUri: In the example, the URI is https://hsm2.managedhsm.azure.net. Applications that use your HSM through its REST API must use this URI.

Step 2 - Activate the "Security Domain recovery" mode
At this point in the normal creation process, you initialize and download the new HSM's Security Domain. However, since you're executing a disaster recovery procedure, you request the HSM to enter Security Domain Recovery Mode and download a Security Domain Exchange Key instead. The Security Domain Exchange Key is an RSA public key that will be used to encrypt the security domain before uploading it to the HSM. The corresponding private key is protected inside the HSM, to keep your Security Domain contents safe during the transfer.

# Put the new HSM into Security Domain recovery mode and download the Security Domain Exchange Key
az keyvault security-domain init-recovery --hsm-name "your-unique-managed-hsm-name-version-2" --sd-exchange-key HSM2-SDE.cer

Step 3 - Create a Security Domain Upload blob of the source HSM
For this step you'll need:

  • The Security Domain Exchange Key you downloaded in the previous step (for easy reference HSM2);
  • The Security Domain of the source HSM (for easy reference HSM1, the original HSM);
  • At least the quorum number of private keys that were used to encrypt the Security Domain.

# Create a Security Domain Upload blob of the source HSM
az keyvault security-domain restore-blob --sd-exchange-key HSM2-SDE.cer --sd-file HSM1-SD.json --sd-wrapping-keys cert_0.key cert_1.key cert_2.key --sd-file-restore-blob restore_blob.json

Step 4 - Upload Security Domain Upload blob to destination HSM
You now use the Security Domain Upload blob created in the previous step and upload it to the destination HSM to complete the security domain recovery. The --restore-blob flag is used to prevent exposing keys in an online environment.

# Upload Security Domain Upload blob to destination HSM
az keyvault security-domain upload --hsm-name "Name of HSM2" --sd-file restore_blob.json --restore-blob

Now both the source HSM (HSM1) and the destination HSM (HSM2) have the same security domain. The next step is to restore a full backup from the source HSM into the destination HSM.


Step 5 - Create a backup (as a restore point) of your new HSM
To create an HSM backup, you will need to have:

  • A storage account where the backup will be stored;
  • A blob storage container in this storage account where the backup process will create a new folder to store encrypted backup;
  • A user assigned managed identity that has the Storage Blob Data Contributor role on the storage account OR storage container SAS token with permissions 'crdw'.

Use the az keyvault backup command to back up the HSM to the storage container mhsmbackupcontainer, which is in the storage account mhsmdemobackup in the following example.

# Assign the user-assigned managed identity to the HSM, if required
az keyvault update-hsm --hsm-name "Name of HSM2" --mi-user-assigned "/subscriptions/subid/resourcegroups/mhsmrgname/providers/Microsoft.ManagedIdentity/userAssignedIdentities/userassignedidentityname"

# Make a backup and store it in the storage account
az keyvault backup start --use-managed-identity true --hsm-name "Name of HSM2" --storage-account-name mhsmdemobackup --blob-container-name mhsmbackupcontainer

Step 6 - Restore a full backup from the source HSM
To restore an HSM backup, you will need to have:

  • The storage account and the blob container in which the source HSM's backups are stored;
  • The folder name from where you want to restore the backup. If you create regular backups, there will be many folders inside this container.

Use the az keyvault restore command to restore the backup of the source Managed HSM (HSM1) into the new Managed HSM (HSM2). In the following example, the backup is in the folder mhsm-HSM1-2020083120161860, found in the storage container mhsmdemobackupcontainer of the storage account mhsmbackup.

# Restore the backup of the source HSM into the new HSM
az keyvault restore start --hsm-name "Name of HSM2" --storage-account-name mhsmbackup --blob-container-name mhsmdemobackupcontainer --backup-folder mhsm-HSM1-2020083120161860 --use-managed-identity true

Now you've completed a full disaster recovery process. The contents of the source HSM when the backup was taken are copied to the destination HSM, including all the keys, versions, attributes, tags, and role assignments.


Lessons learned

  • Restoring the Managed HSM is doable, but it is a time-consuming operation;
  • Dry-running the Disaster Recovery process is required. That way the organisation can prepare itself in case a real disaster strikes and can act quickly if required;
  • Step 3 as mentioned above only works with accounts that have “Administrator” permissions, given in phase 1 with the “-Administrator” flag. Without this permission, there is no way that step 3 is going to work, not even with “Owner” permissions on the subscription. It is wise to create and use a dedicated Entra ID identity instead of your own account.

Phase 5 - Removing the Managed HSM
In this phase, the Managed HSM is no longer used and will be deleted.

Lessons learned

  • Keep in mind that if "purge protection" is enabled, the Managed HSM will not be removed immediately. It will be removed once the retention period has passed.
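
As a minimal sketch of this phase with the Azure CLI, the function below deletes a Managed HSM and then lists the soft-deleted Managed HSMs in the subscription, so the soft-deleted state can be inspected. The function name remove_mhsm and its arguments are placeholders for this example.

```shell
#!/bin/sh
# Sketch: delete a Managed HSM and inspect its soft-deleted state.
# remove_mhsm and its arguments are illustrative placeholders.
remove_mhsm() {
    hsm_name=$1
    resource_group=$2

    # Delete the Managed HSM; with soft delete it is kept for the retention period.
    az keyvault delete --hsm-name "$hsm_name" --resource-group "$resource_group" || return 1

    # List the soft-deleted Managed HSMs in the current subscription.
    az keyvault list-deleted --resource-type hsm -o table
}
```

Without purge protection, a soft-deleted Managed HSM can be removed permanently before the retention period ends with az keyvault purge --hsm-name "your-unique-managed-hsm-name" --location westeurope. With purge protection enabled, that command is expected to be refused until the retention period has passed.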

Mitigation of common risk scenarios

A breakdown has been made into two main categories, namely:

  • Workload-related risks (relating to the services that workload teams guarantee to the organisation);
  • Infrastructure-related risks (relating to guaranteeing the services and availability of the shared Managed HSM).

Workload-related risks
This section discusses which risks have been defined, which could possibly lead to a Disaster Recovery situation. The following points have been identified:

  • Expired HSM-key;
    • Use case: Concerns a workload team that uses Azure resources with Customer Managed Keys (CMK) in their solution, where an issued HSM-key with an expiration date has expired.
    • Risk: Some Azure resources can stop functioning and there is a chance that encrypted data will no longer be accessible.
    • Mitigation: There is no risk, as verified with Microsoft (with reference to the documentation). It is always possible to decrypt and/or extract data with expired keys. Repacking and/or encrypting is no longer possible once the expiration date has passed.
  • Compromised HSM-key;
    • Use case: The reason for the compromise may vary (unknown, sent by email, public repo, or hacked), but the result is that the HSM-key information was exposed. The keys themselves cannot leak out of the Managed HSM. The references around the HSM-key, however, can leak (Managed HSM URL, key version and key name).
    • Risk: Unauthorized people can retrieve the key name, its version and the Managed HSM URL from the leaked data. They can then start a targeted attack on the Managed HSM itself.
    • Mitigation: In the short term, provide the HSM-key with a new version (rotate) as quickly as possible via a manual or automated process. Additional active monitoring (on the activity log of the Managed HSM) and continuous checking of RBAC are required. In the long term, depending on the size of the data breach, you can also choose to set up a new Managed HSM (recycling the Managed HSM).
  • Corruption of HSM-key.
    • Use case: A workload team requests a new HSM-key through the automated process and receives an “incomplete key version or other details” response, resulting in an error message. The affected Azure resource cannot apply the HSM-key information that was sent.
    • Risk: Due to incorrect HSM-key information, the relevant Azure resource or data is not encrypted, and the application does not comply with the organisation's security measures for “highly sensitive” data.
    • Mitigation: Verification is required so that the correct information is sent to the workload team. Checks can also be built into the automation process.
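
The short-term mitigation for a compromised HSM-key (a new key version plus a review of RBAC) could be scripted roughly as follows. This is a sketch under assumptions: the function name respond_to_compromised_key is made up, and it assumes the installed Azure CLI version offers az keyvault key rotate for Managed HSM keys, which is worth verifying in a test environment first.

```shell
#!/bin/sh
# Sketch: short-term response to a compromised HSM-key reference.
# respond_to_compromised_key is a hypothetical helper; it assumes that
# "az keyvault key rotate" is available for Managed HSM keys in your CLI version.
respond_to_compromised_key() {
    hsm_name=$1
    key_name=$2

    # Generate a new version of the key.
    az keyvault key rotate --hsm-name "$hsm_name" --name "$key_name" || return 1

    # List the role assignments on the Managed HSM so they can be reviewed.
    az keyvault role assignment list --hsm-name "$hsm_name" -o table
}
```

After the rotation, the new key version still has to be communicated to the workload teams that reference a specific keyVersion.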

Infrastructure-related risks
This section discusses which risks have been defined, which could possibly lead to a Disaster Recovery situation. The following points have been identified:

  • Expired Root keys of the Managed HSM;
    • Use case: Concerns an infrastructure team that has rolled out and activated the Managed HSM with Root keys (with an expiration date of 1 year), where the Root (RSA) keys have expired after 1 year. The reason for the expiration may vary (informed too late or not renewed on time), but the result is that the Root keys have not been renewed on time.
    • Risk: There is a chance that encrypted data will no longer be accessible after the Root keys expire. As a result, the organization can no longer access its data and certain Azure resources of workload teams may stop functioning.
    • Mitigation: There is no risk, as verified with Microsoft (with reference to the documentation). It is always possible to decrypt and/or extract data with expired keys. Repacking and/or encrypting is no longer possible once the expiration date has passed. If there is concern (or a compliance requirement), the organisation should create new keys and download a new copy of the "Security Domain" (SD) before the old keys expire. Once the organisation has validated that the newly downloaded copy of the SD and the new keys are working, it can delete the old SD copy protected by the old keys. This is how SD key rotation should work. An SD key rotation has no effect on the data encrypted with the keys within the Managed HSM. I do not think logging will show if Root keys are almost expired.
  • Renewal/Rotation of the Root keys of the Managed HSM;
    • Use case: It involves an infrastructure team that needs to renew the Managed HSM Root keys within the year, before the expiration date.
    • Risk: Some Azure resources can stop functioning and there is a chance that encrypted data will no longer be accessible.
    • Mitigation: There is no risk, as verified with Microsoft and mentioned above. Renewing or rotating the Root keys has no impact on already encrypted data. Renewing results in new keys and a new Security Domain file, which should be stored safely.
  • Disaster Recovery of the Managed HSM.
    • Use case: Concerns the recovery of the Managed HSM after a hardware disaster, described in detail in the Microsoft documentation. This procedure describes that a new Managed HSM must be deployed and activated by restoring the “Security Domain” with the Root keys, after which a backup is restored.
    • Risk: Due to the failure of the Managed HSM, it is not possible to continue the services of the Managed HSM. This concerns the rollout of new HSM-keys, setting permissions, and communication between the Azure resources of workload teams and the Managed HSM.
    • Mitigation: By following the Disaster Recovery procedure, the Managed HSM can be restored without the need to first decrypt and then re-encrypt the data. However, workload teams must change the old “Managed HSM URL” in their code to the new one.

Final Words

What a journey it was to discover its functionality, help organisations learn how to use it, and write down all the required organisational processes and procedures around it. This amazing component is not just another piece of technology. You have to think about "People, Process and Technology" (PPT) to make it a success. It comes down to a good understanding of how it works, the organisation's responsibility, enabling the necessary processes and procedures within the organisation, and good stewardship: protecting the Root encryption keys and the Security Domain by storing them safely.

Managed HSM definitely delivers what it promises: protecting "highly sensitive" data and giving ownership of the data back to the organisation alone. If you want to protect your "intellectual property", be sure to investigate whether Managed HSM can help you with this.

I hope you have enjoyed this learning experience! If you want to know more, just let me know! Till next time! Happy learning! And spread the word by sharing this with friends and colleagues!