Azure SQL Elastic Pool Automatic Storage Scaling
Azure SQL Elastic Pools offer some great benefits for SQL Databases such as DTU based Performance and on-demand scaling up and down of compute/memory/storage via the Azure Portal. However, one thing that is not native out of the box, is scaling of storage on an elastic pool, in response to storage pressure. Recently I was exposed to this where an elastic pool reached its storage limit due to large amounts of data load happening, which caused the pool to run out of space and which would then quickly cause an outage as databases couldn’t allocate anymore space.
So, lets fix that and create our own….
To do this, we are going to leverage a couple of Azure features
- Azure Monitor Alerts
- Azure Automation Accounts & Runbooks
How do we do this?
What we want to happen is that when an Elastic Pool hits a certain limit of Data Space Allocated Percentage that an Azure Monitor Alert is triggered, which executes an Azure Automation Runbook, which runs some powershell to scale the elastic pool. How I have structured this is the following.
- At 80% Data Space Allocated, send me an email but don’t scale (so I get chance to validate it)
- At 90% Data Space Allocated, send me an email but this time scale the pool.
- At 95% Data Space Allocated, repeat the scale as before. This is a safety net in case the pool grows very quickly.
And part of this is also to have a hard limit of elastic pool size, so that I don’t get a situation where runaway storage growth needs me to take out a loan to pay the bill. All of this is easily doable with Azure Runbooks.
Lets Test
You’ll need some pre-reqs for this:
- Azure SQL Server
- Azure SQL Elastic Pool
- Azure Automation Account
If you don’t have these you can create some quickly with the below powershell, replacing the vars with what you want.
# Install Pre-Req Modules
Install-Module Az.Resources, Az.Sql, Az.ManagedServiceIdentity, Az.Automation
# Define our variables
$location = "australiaeast"
$rsgName = "myResourceGroup"
$sqlServerName = "mySQLServer"
$sqlServeElasticPoolName = "myElasticPool"
$sqlServerUser = "mysSQLAdminName"
$sqlServerPass = "SuperSecretPassword123!"
$automationAccountName = "myAutomationAccount"
# Create Resource Group
New-AzResourceGroup -Name $rsgName -Location $location
# Create our Credential Object
[securestring]$secStringPassword = ConvertTo-SecureString $sqlServerPass -AsPlainText -Force
[pscredential]$sqlCreds = New-Object System.Management.Automation.PSCredential ($sqlServerUser, $secStringPassword)
# Create SQL Server and Elastic Pool
$sqlServer = New-AzSQLServer -ServerName $sqlServerName -ResourceGroupName $rsgName -Location $location -ServerVersion "12.0" -SqlAdministratorCredentials $sqlCreds
$sqlElasticPool = New-AzSqlElasticPool -ServerName $sqlServerName -ResourceGroupName $rsgName -ElasticPoolName $sqlServeElasticPoolName -Edition "Standard" -Dtu 50 -DatabaseDtuMin 10 -DatabaseDtuMax 50 -StorageMB 51200
# Create Automation Account
New-AzAutomationAccount -Name $automationAccountName -ResourceGroupName $rsgName -Location $location -AssignSystemIdentity
Create your Azure Automation Runbook
Go to your Azure Automation Account, either an existing one or the one you created above, and create a Runbook.
Name it and ensure its running Powershell and 5.1
Once its created, open it and click Edit at the top, and then paste in the following Powershell Code. You can also get this from my GitHub Repo here. Once pasted in click Save and then Publish. If you don’t click Publish then the Runbook won’t execute.
<#
.SYNOPSIS
Automatically grow the allocated storage of an Azure SQL Elastic Pool from an Azure Runbook
.DESCRIPTION
When executed from an Azure Monitor Runbook Trigger, this script will expand the elastic pool by
its specified amount, up to a hard limit.
It also won't scale if it doesn't meet a scaling threshold, initially defined as 85%
Created by Andy Roberts (andyr8939@gmail.com)
.PARAMETER WebhookData
Raw data passed to the runbook from Azure Monitor
.PARAMETER poolIncreaseMB
How much space to add to the pool on each scale event. Initially set at 50 GB.
.PARAMETER poolIncreaseLimitMB
A hard limit of how big the pool can increase too. Initially set at 1 TB.
#>
param
(
[Parameter (Mandatory=$false)]
[object] $WebhookData,
[Parameter (Mandatory= $false)]
[String] $poolIncreaseMB = "51200", # 50GB
[Parameter (Mandatory= $false)]
[String] $poolIncreaseLimitMB = "1048576" # 1TB
)
$connectionName = "AzureRunAsConnection"
try
{
# Get the connection "AzureRunAsConnection "
$servicePrincipalConnection=Get-AutomationConnection -Name $connectionName
"Logging in to Azure..."
Connect-AzAccount `
-ServicePrincipal `
-TenantId $servicePrincipalConnection.TenantId `
-ApplicationId $servicePrincipalConnection.ApplicationId `
-CertificateThumbprint $servicePrincipalConnection.CertificateThumbprint
}
catch {
if (!$servicePrincipalConnection)
{
$ErrorMessage = "Connection $connectionName not found."
throw $ErrorMessage
} else{
Write-Error -Message $_.Exception
throw $_.Exception
}
}
Write-Output "This runbook was started from webhook $WebhookName."
# Collect Webhook Data and convert to use with JSON
$WebhookBody = (ConvertFrom-Json -InputObject $WebhookData.RequestBody)
# Obtain the WebhookBody containing the AlertContext and any details we need
$AlertContext = [object] ($WebhookBody.data).context
$SubId = $AlertContext.subscriptionId
$ResourceGroupName = $AlertContext.resourceGroupName
$ResourceType = $AlertContext.resourceType
$ResourceName = $AlertContext.resourceName
$status = ($WebhookBody.data).status
$threshold = $WebhookBody.data.context.condition.allOf.threshold
# Write results to output for logging
Write-output "Subscription - $SubId"
Write-output "RSG Name - $ResourceGroupName"
Write-output "Resource Type - $ResourceType"
Write-output "Resrouce Name - $ResourceName"
Write-output "Status - $status"
Write-output "Threshold - $threshold"
# Only scale if over 85% allocated, to give chance to manually clean up on earlier 80% trigger with just email
if ($threshold -le 85) {
Write-Output "Threshold is $threshold which does not require scaling. Email only sent."
Write-Output "Scaling Occurs once threshold is over 90%"
Exit
}
# As its over 85% threshold, prepare to scale
# Get Resources for SQL and Elastic Pool
$sqlServer = Get-AzSqlServer -ResourceGroupName $ResourceGroupName
$elasticPool = Get-AzSqlElasticPool -ElasticPoolName $ResourceName -ResourceGroupName $ResourceGroupName -ServerName $sqlserver.ServerName
# Add 50GB to the current size
$newElasticPoolStorageGB = $elasticpool.StorageMB + $poolIncreaseMB # Add 50GB
# Check new size and if it will exceed 1TB as a hard limit to prevent runaway scale
if ($newElasticPoolStorageGB -ge $poolIncreaseLimitMB) {
$failError = "Elastic Pool Size for $ResourceName will exceed its allowed upper limit of $($poolIncreaseLimitMB /1024) GB so cannot scale"
Write-Output $failError
throw $failError
}
Write-Output "Current Elastic Pool Size for $ResourceName is $($elasticpool.StorageMB /1024) GB"
Write-Output "Scaling Elastic Pool $ResourceName to $($newElasticPoolStorageGB /1024) GB"
# Scale Elastic Pool to new size
$elasticpool | Set-AzSqlElasticPool -StorageMB $newElasticPoolStorageGB
Write-Output "New Elastic Pool Size for $ResourceName is $($newElasticPoolStorageGB /1024) GB"
Now you had added your Runbook, you need to ensure that the Automation Account knows about the AZ.Sql Module so it can actually run the powershell cmdlets in the runbook.
In your Automation Account, select Modules and Add a Module.
Select Browse from Gallery and then click to Browse the Gallery. A nice feature here is that you don’t need to know the module name, just type the Powershell cmd you want and it will tell you the module. So for example type in Get-AzSQLServer and it will show you the Az.Sql Module. Select it and then when it comes back to the Add a Module screen, select 5.1 as the Runtime Version and Import. This can take a few minutes but as we are finished here so you can move on.
Create your SQL Elastic Pool Alert
Now you have your Automation Runbook, its time to configure the Alert Rule. Go to your SQL Elastic Pool, Alerts and then Create Alert Rule.
The metric you want to monitor is Data Space Allocated Percent, and the values you use are up to you, but I like to create an initial one at 80%. Also the period depends on your individual needs, but for a demo, I set it to 1 minute so it quickly notices the change.
Next you need to crete an Action Group. This is what you are going to action, surprisingly, so it’s executing a runbook, triggering an email etc. So give it a sample name.
Under Action Type, select Automation Runbook.
To find the Runbook you created earlier, change the Runbook Source to User and then find the Automation Account and Runbook you Created earlier. You don’t need to Configure Parameters at this stage, but if you wanted to add in custom policies like how many GB to scale and the upper limit, you can do this here.
I also like to add in additional actions here, such as sending myself an email when it happens, but this is totally up to you.
Repeat all of the above Alert Rules for as many levels as you want. I typically have rules trigger at 80%, 90% and 95%, but you will end up with a rule looking like this.
Time to expand that pool
Its now time to put this all together and see how it functions. You’ll need to add some data to your SQL Elastic Pool to push it over the threshold you set. I like to use these sample StackOverflow Databases for this purpose but you can use whatever suits.
80% Allocated - Don’t Grow
On my first run, I pushed Data Space Allocated to 80%. The Alert Rule on the Elastic Pool triggered the Runbook (yey) but it didn’t grown the pool. This was because I had the threshold in the parameters set to only scale at 90%. This is perfect, as my action group sent me an email so I can investigate if I need to.
90% Allocated - Lets Grow
But for the demo perspective, lets assume it was valid and now it suddenly hits 90%. So the 90% Alert Rule triggers the runbook again.
It works! We can see above that the runbook saw the threshold from the alert was 90%, which was over my 85% threshold. So because of that it knew it needed to scale and added 50 GB to the pool, meaning it was now 100 GB!
Protect with a limit
Like we put in earlier, we want a hard limit to stop run away growth of the pool. To illustrate this here is an example where I changed my thresholds so it would hit the limit in one go. This causes the runbook to throw an exception (as by default all runbooks will complete) and fail, highlighting the limit.
That’s all there is to it. You can adjust and change that with thresholds to suit your needs really easily. It was a fun one for me to do as it solved a fairly common problem and got me more familar with Azure Automation Runbooks at the same time.
Andy