Reducing Cloud Spend by 40% Through Azure Reserved Instances and Rightsizing

FinOps

Executive Summary

A mid-size SaaS company running 60+ Azure VMs across dev, staging, and production was spending $37,240/month on compute alone. After a structured four-phase cost optimization engagement spanning eight weeks, monthly spend dropped to $22,140 — a 40.5% reduction — without removing a single workload or degrading application performance. This post details the methodology, tooling, decision frameworks, and governance automation that delivered $181,200 in annualized savings.

The Challenge

The client's Azure environment had grown organically over three years. Individual engineering teams provisioned VMs based on vendor recommendations or worst-case capacity estimates. Nobody tracked utilization. Cost Management was configured but not reviewed. Azure Advisor recommendations sat unacknowledged for months.

This pattern is not unusual. According to the Flexera 2025 State of the Cloud Report, organizations estimate that 30-35% of their cloud spend is wasted. Gartner research suggests the figure may be higher — up to 40% for enterprises without a dedicated FinOps practice. The client's situation was representative of what we see across mid-market SaaS companies that grew from a small cloud footprint into a multi-environment estate without implementing cost governance along the way.

The initial assessment revealed five structural issues:

  • 75% of production VMs were running below 20% average CPU utilization. Many had been sized for peak loads that occurred once per quarter.
  • Dev and staging environments ran 24/7 despite being used only during business hours in a single timezone. This alone accounted for an estimated $6,800/month in unnecessary spend.
  • No reserved instances were in place. Every VM — including workloads running continuously for two years — ran on pay-as-you-go pricing at the full retail rate.
  • No tagging discipline. Cost allocation required manual spreadsheet reconciliation. Finance could not attribute costs to product lines, and engineering managers had no visibility into their team's consumption.
  • Orphaned resources accumulated silently. Managed disks from deleted VMs, unused public IPs, empty App Service Plans, and idle load balancers were generating charges with no associated workload.

The Approach

We structured the engagement into four sequential phases. The sequencing matters — each phase depends on the outputs of the previous one. Attempting to purchase reservations before rightsizing, for example, locks in waste at a discount. Attempting to rightsize before establishing visibility means operating on assumptions rather than data.
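Here is a small worked example of why the order matters. The prices are hypothetical. The 40% rate is an assumption borrowed from the "up to 40%" one-year reservation figure in the comparison table later in this post:

```python
# Hypothetical monthly pay-as-you-go prices for a single VM (not real Azure rates).
OVERSIZED_PAYG = 1000.0   # current, oversized SKU
RIGHTSIZED_PAYG = 500.0   # smaller SKU the workload actually needs
RI_DISCOUNT = 0.40        # assumed 1-year reservation discount


def reserved_cost(payg_price: float, discount: float = RI_DISCOUNT) -> float:
    """Monthly cost of one VM fully covered by a reservation."""
    return payg_price * (1 - discount)


reserve_first = reserved_cost(OVERSIZED_PAYG)     # discount applied to the wrong size
rightsize_first = reserved_cost(RIGHTSIZED_PAYG)  # discount applied to the right size

print(f"Reserve the oversized SKU: ${reserve_first:,.0f}/month for the whole term")
print(f"Rightsize, then reserve:   ${rightsize_first:,.0f}/month")
print(f"Waste locked in:           ${reserve_first - rightsize_first:,.0f}/month")
```

Instance size flexibility can let a reservation cover other sizes in the same family. That softens the effect, but only if other VMs are available to absorb the unused capacity.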

Phase 1: Visibility and Governance Foundation

The first phase established the instrumentation and governance framework required for data-driven optimization. Without tagging, you cannot allocate costs. Without cost allocation, you cannot hold teams accountable. Without accountability, optimization gains erode within months.

Azure Advisor Review: We began with a comprehensive review of Azure Advisor recommendations across all subscriptions. Advisor is often underutilized — it provides actionable findings but lacks business context. Our review process categorizes each recommendation by estimated monthly savings, implementation risk, and business impact.

The Advisor findings broke down as follows:

Advisor Category | Finding Count | Estimated Monthly Savings | Risk Level
Orphaned Managed Disks | 34 | $1,280 | Low
Unused Public IP Addresses | 18 | $540 | Low
Empty App Service Plans | 7 | $890 | Low
Idle Load Balancers | 4 | $320 | Low
Oversized VMs (Advisor-flagged) | 22 | $4,100 | Medium
Unattached Network Interfaces | 12 | $70 | Low

Key Finding: $3,100/month was going to orphaned resources (managed disks from deleted VMs, unused public IPs, empty App Service Plans, idle load balancers, and unattached network interfaces). These are low-risk savings that require no application changes.
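The triage step can be sketched as a simple sort: lowest risk first, then largest savings. The figures come from the Advisor findings table above. The third dimension, business impact, needs human judgment and is left out. This is an illustrative sketch over hand-entered data, not an Advisor API client:

```python
from dataclasses import dataclass


@dataclass
class Finding:
    category: str
    count: int
    monthly_savings: float
    risk: str  # "Low" or "Medium", as in the table


# Figures copied from the Advisor findings table in this post.
FINDINGS = [
    Finding("Orphaned Managed Disks", 34, 1280, "Low"),
    Finding("Unused Public IP Addresses", 18, 540, "Low"),
    Finding("Empty App Service Plans", 7, 890, "Low"),
    Finding("Idle Load Balancers", 4, 320, "Low"),
    Finding("Oversized VMs (Advisor-flagged)", 22, 4100, "Medium"),
    Finding("Unattached Network Interfaces", 12, 70, "Low"),
]
RISK_ORDER = {"Low": 0, "Medium": 1, "High": 2}


def prioritize(findings: list[Finding]) -> list[Finding]:
    """Lowest risk first; within a risk level, largest monthly savings first."""
    return sorted(findings, key=lambda item: (RISK_ORDER[item.risk], -item.monthly_savings))


low_risk_total = sum(item.monthly_savings for item in FINDINGS if item.risk == "Low")
for item in prioritize(FINDINGS):
    print(f"{item.risk:<6} ${item.monthly_savings:>6,.0f}  {item.category}")
print(f"Low-risk cleanup total: ${low_risk_total:,.0f}/month")
```

The low-risk rows sum to the $3,100/month orphaned-resource figure. The one medium-risk row, oversized VMs, is deferred to the metric-driven rightsizing in Phase 2.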

Tagging Policy Enforcement: We deployed Azure Policy definitions to enforce mandatory tagging on all new resources. The tag schema included:

  • cost-center: Maps to the finance team's cost allocation codes
  • environment: dev, staging, production
  • owner: Engineering team or individual responsible
  • application: The product or service the resource supports
  • created-date: Auto-populated via policy for lifecycle tracking

The policy operated in audit mode for the first week to identify non-compliant resources without blocking deployments, then switched to deny mode. Existing resources were remediated via a targeted remediation task that tagged 340+ resources over a single weekend.
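In Azure, enforcement is done by Azure Policy itself. The Python below is not policy syntax. It is a hypothetical local check, for example for a CI pipeline, that mirrors the audit-then-deny rollout against the tag schema above. `Mode`, `violations`, and `evaluate` are illustrative names:

```python
from enum import Enum


class Mode(Enum):
    AUDIT = "audit"   # record non-compliance but let the deployment proceed
    DENY = "deny"     # block non-compliant deployments


# Tags a deployer must supply; created-date is omitted because the policy populates it.
REQUIRED_TAGS = {"cost-center", "environment", "owner", "application"}
ALLOWED_ENVIRONMENTS = {"dev", "staging", "production"}


def violations(tags: dict[str, str]) -> list[str]:
    """Reasons a resource's tags break the schema (empty list = compliant)."""
    problems = [f"missing tag '{t}'" for t in sorted(REQUIRED_TAGS - set(tags))]
    env = tags.get("environment")
    if env is not None and env not in ALLOWED_ENVIRONMENTS:
        problems.append(f"invalid environment '{env}'")
    return problems


def evaluate(tags: dict[str, str], mode: Mode) -> tuple[bool, list[str]]:
    """Return (deployment allowed?, problems) under the given enforcement mode."""
    problems = violations(tags)
    return (not problems or mode is Mode.AUDIT), problems


print(evaluate({"environment": "dev", "owner": "payments"}, Mode.AUDIT))
print(evaluate({"environment": "dev", "owner": "payments"}, Mode.DENY))
```

The same resource passes in audit mode and is blocked in deny mode. The problem list is identical in both cases, which is what makes a one-week audit period useful for finding non-compliant teams before the switch.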

Cost Management Dashboards: We configured Azure Cost Management views segmented by cost-center and environment tags. Each engineering manager received a weekly automated cost report for their team's resources. This visibility alone changed behavior — within the first month, two teams independently shut down forgotten dev environments.
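Cost Management performs this grouping natively. The sketch below only shows the allocation logic on exported cost rows, assuming a hypothetical export where each row carries a `cost` value and a `tags` dictionary. Untagged spend lands in its own bucket so it stays visible instead of disappearing:

```python
from collections import defaultdict


def cost_by_team(rows: list[dict]) -> dict[tuple[str, str], float]:
    """Sum cost per (cost-center, environment); untagged spend gets its own bucket."""
    totals: defaultdict[tuple[str, str], float] = defaultdict(float)
    for row in rows:
        tags = row.get("tags") or {}
        key = (tags.get("cost-center", "untagged"), tags.get("environment", "untagged"))
        totals[key] += row["cost"]
    return dict(totals)


# Hypothetical exported cost rows; field names are illustrative.
rows = [
    {"cost": 120.0, "tags": {"cost-center": "CC-100", "environment": "dev"}},
    {"cost": 80.0, "tags": {"cost-center": "CC-100", "environment": "dev"}},
    {"cost": 300.0, "tags": {"cost-center": "CC-200", "environment": "production"}},
    {"cost": 45.0, "tags": None},
]
for (center, env), total in sorted(cost_by_team(rows).items()):
    print(f"{center:<9} {env:<11} ${total:,.2f}")
```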

Total Phase 1 savings: $7,200/month from orphaned resource cleanup and immediate Advisor actions.

Phase 2: VM Rightsizing

Rightsizing is the highest-impact optimization lever in most Azure environments. It is also the most misunderstood. Azure Advisor flags VMs with low average CPU, but this single metric is insufficient for a rightsizing decision. A VM running at 8% average CPU may spike to 85% during batch processing. Downsizing based on average alone risks application performance degradation.

Our Rightsizing Methodology:

We analyzed 30 days of Azure Monitor metrics across four dimensions for every VM in the environment. The methodology requires examining these metrics in combination, not isolation:

  • CPU Utilization: We examined the P95 (95th percentile) value, not the average. A VM with 12% average CPU but 78% P95 CPU is appropriately sized. A VM with 12% average and 18% P95 is a rightsizing candidate. The P95 threshold captures burst behavior without being skewed by momentary spikes.
  • Memory Utilization: Azure Monitor does not report memory utilization by default; the Azure Monitor Agent (AMA) must be deployed with performance counters enabled. We deployed AMA across the fleet and collected 14 days of memory data, a shorter window than the 30-day CPU history because memory data only exists from the point the agent is installed. Several VMs that appeared underutilized by CPU were memory-constrained, which would have made CPU-only rightsizing dangerous.
  • Disk IOPS and Throughput: Premium SSD disks were attached to VMs that never exceeded Standard SSD IOPS thresholds. We identified 11 VMs where downgrading from Premium to Standard SSD saved $45-120/month per VM with no performance impact.
  • Network Throughput: Network-intensive workloads (API gateways, data ingestion) were evaluated against the network bandwidth limits of candidate smaller VM SKUs. Two VMs were excluded from downsizing because their target SKU had insufficient network bandwidth.
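The four checks above can be combined into a single decision function. This is a minimal sketch. The thresholds, `TargetSku`, and the example numbers are hypothetical assumptions rather than the values used in the engagement, and P95 is computed with the nearest-rank method:

```python
from dataclasses import dataclass


def p95(samples: list[float]) -> float:
    """95th percentile using the nearest-rank method."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer arithmetic
    return ordered[rank - 1]


@dataclass
class VmMetrics:
    cpu_pct: list[float]        # CPU samples over the review window, percent
    mem_pct: list[float]        # memory samples, percent (requires the Azure Monitor Agent)
    peak_network_mbps: float    # highest observed network throughput


@dataclass
class TargetSku:
    name: str                   # hypothetical smaller SKU being considered
    network_mbps: float         # its network bandwidth limit


# Hypothetical thresholds; tune them per workload.
CPU_P95_MAX = 40.0
MEM_P95_MAX = 50.0
HEADROOM = 0.8  # keep observed peaks at or below 80% of the target's limit


def rightsizing_verdict(vm: VmMetrics, target: TargetSku) -> str:
    """Downsize only if CPU, memory, and network all fit the smaller SKU."""
    if p95(vm.cpu_pct) > CPU_P95_MAX:
        return "keep: CPU P95 too high"
    if p95(vm.mem_pct) > MEM_P95_MAX:
        return "keep: memory-constrained"
    if vm.peak_network_mbps > HEADROOM * target.network_mbps:
        return f"keep: {target.name} lacks network bandwidth"
    return f"downsize to {target.name}"


def disk_tier(peak_iops: float, standard_ssd_iops_limit: float) -> str:
    """Premium SSD is unnecessary when peak IOPS fits the Standard SSD limit with headroom."""
    return "Standard SSD" if peak_iops <= HEADROOM * standard_ssd_iops_limit else "Premium SSD"
```

The 40% CPU and 50% memory ceilings assume the target SKU has roughly half the resources of the current one, so the projected P95 after downsizing stays near 80%. With a sample set where average CPU is low but a handful of samples hit 78%, `p95` returns 78.0 and the VM is kept. This is the burst pattern that an average-only check misses.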

We executed rightsizing changes during maintenance windows over two weeks. Each change followed a runbook: snapshot the current disk, resize the VM, validate application health checks, monitor for 24 hours, then confirm or rollback. Zero rollbacks were required.
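The runbook's control flow can be sketched with the platform operations passed in as functions. In practice these would wrap Azure SDK or CLI calls. Every callable name here is a hypothetical placeholder, and `stable_for_24h` stands in for the 24-hour monitoring period:

```python
from typing import Callable


def resize_with_rollback(
    vm: str,
    current_size: str,
    target_size: str,
    snapshot_disk: Callable[[str], str],
    set_size: Callable[[str, str], None],
    passes_health_checks: Callable[[str], bool],
    stable_for_24h: Callable[[str], bool],
) -> str:
    """Apply one rightsizing change following the runbook; return the outcome."""
    snapshot_id = snapshot_disk(vm)                       # 1. safety net before any change
    set_size(vm, target_size)                             # 2. resize the VM
    if passes_health_checks(vm) and stable_for_24h(vm):   # 3-4. validate, then monitor
        return f"confirmed {target_size}"
    set_size(vm, current_size)                            # 5. roll back to the original size
    return f"rolled back to {current_size}, snapshot {snapshot_id} retained"
```

Injecting the operations keeps the sequence testable with fakes and makes the rollback path explicit, even if, as in this engagement, it is never exercised.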

Total Phase 2 savings: $13,400/month.

Phase 3: Reserved Instances vs. Savings Plans

With rightsized VMs in place, we identified compute workloads suitable for commitment-based pricing. Azure offers two commitment models, and the choice between them requires understanding your workload stability and flexibility requirements.

Reserved Instances vs. Savings Plans — Decision Criteria:

Criteria | Reserved Instances (RIs) | Savings Plans
Commitment type | Specific VM family, size, and region | Dollar amount per hour of compute
Discount depth (1-year) | Up to 40% | Up to 20%
Discount depth (3-year) | Up to 62% | Up to 35%
Flexibility | Instance size flexibility within same family | Applies across VM families, regions, and services
Best for | Stable, predictable workloads that will not change VM family | Dynamic environments with frequent SKU changes
Risk if workload changes | Stranded reservation (you pay for unused capacity) | Commitment reallocates automatically
Cancellation | Early termination fee (12% of remaining balance) | Non-cancellable

Key Decision: We recommended a hybrid approach. For the 22 production VMs running 24/7 on stable, well-understood workloads (databases, core API servers), we purchased 1-year Reserved Instances for the deeper discount. For the 8 VMs supporting workloads with a planned container migration in 18 months, we recommended an Azure savings plan for compute for its flexibility. We specifically avoided 3-year reservations because the client's product roadmap included a container migration that would change the compute profile significantly.
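The decision criteria can be written down as a small function. This is a sketch under stated assumptions: `Workload` and its fields are hypothetical, and the 40% rate is taken from the "up to 40%" 1-year figure in the table above. It also adds a break-even check that the table implies but does not state. A reservation is billed for every hour of the term, so it only beats pay-as-you-go when the VM runs more than (1 - discount) of the time:

```python
from dataclasses import dataclass
from typing import Optional

RI_DISCOUNT = 0.40  # assumed; the table above quotes "up to 40%" for 1-year RIs


@dataclass
class Workload:
    name: str
    runtime_fraction: float                # share of hours the VM runs (1.0 = 24/7)
    planned_change_months: Optional[int]   # months until a planned SKU/platform change; None = none


def commitment_for(w: Workload) -> str:
    """Choose a pricing model using the criteria in this section."""
    # Break-even: the reservation is paid for every hour, used or not.
    if w.runtime_fraction <= 1 - RI_DISCOUNT:
        return "pay-as-you-go"
    if w.planned_change_months is not None:
        return "savings plan"              # commitment follows the workload across SKUs
    return "1-year reservation"            # stable workload: take the deeper discount


def max_reservation_term(roadmap_change_months: Optional[int]) -> int:
    """Only consider 3-year terms when no estate-wide change is expected within 36 months."""
    if roadmap_change_months is None or roadmap_change_months > 36:
        return 3
    return 1
```

Under this rule, a scheduled dev VM running 57.5 of 168 hours a week falls well below the 60% break-even and stays on pay-as-you-go.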

Azure Hybrid Benefit: This is an additional optimization lever that many organizations overlook. The client held Windows Server and SQL Server licenses through an Enterprise Agreement with Software Assurance. Azure Hybrid Benefit (AHB) allows these on-premises licenses to offset Azure VM licensing costs.

  • Windows Server AHB: Applied to 28 Windows VMs, saving approximately $85/month per VM ($2,380/month total)
  • SQL Server AHB: Applied to 4 SQL Server VMs running SQL Standard, saving approximately $340/month per VM ($1,360/month total)

AHB savings are often excluded from cloud cost optimization analyses because they depend on existing license entitlements. In this engagement, AHB contributed $3,740/month — more than the orphaned resource cleanup.
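The AHB total is simple arithmetic on the approximate per-VM figures quoted above:

```python
# (VM count, approximate $/month saved per VM), as quoted in this section.
AHB_SAVINGS = {
    "Windows Server": (28, 85),
    "SQL Server Standard": (4, 340),
}


def ahb_total(savings: dict[str, tuple[int, int]]) -> int:
    """Total monthly AHB savings across all license types."""
    return sum(count * per_vm for count, per_vm in savings.values())


for product, (count, per_vm) in AHB_SAVINGS.items():
    print(f"{product}: {count} x ${per_vm} = ${count * per_vm:,}/month")
print(f"Total: ${ahb_total(AHB_SAVINGS):,}/month (${ahb_total(AHB_SAVINGS) * 12:,}/year)")
```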

Total Phase 3 savings (RIs + Savings Plans + AHB): $7,356/month.

Phase 4: Scheduling and Automation

Dev and staging environments do not need to run 24/7. This is widely known and rarely implemented because manual shutdown processes are unreliable. Engineers forget. Scripts break. Exceptions accumulate.

We deployed an Azure Logic App-based auto-shutdown and auto-start solution with the following design:

  • Auto-shutdown: All VMs tagged environment:dev or environment:staging are stopped (deallocated) at 7:00 PM local time on weekdays and remain off on weekends
  • Auto-start: The same VMs are started at 7:30 AM local time on weekdays
  • Exception handling: VMs tagged always-on:true are excluded from the schedule. This tag requires approval via a lightweight request process — it cannot be self-applied by engineers
  • Notification: A Teams webhook notifies the relevant channel 15 minutes before shutdown, giving engineers time to request an always-on exception if a VM is needed for an overnight job

This scheduling reduced dev/staging runtime from 168 hours/week to approximately 57.5 hours/week — a 66% reduction in non-production compute hours.
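The client's schedule ran in an Azure Logic App. The Python below is not that workflow. It only sketches the decision a scheduler would evaluate on each run, assuming naive local times, and checks the weekly-hours arithmetic. `should_run` is a hypothetical name:

```python
from datetime import datetime, time

START = time(7, 30)          # auto-start, local time
STOP = time(19, 0)           # auto-stop (deallocate), local time
WORKDAYS = {0, 1, 2, 3, 4}   # Monday=0 ... Friday=4
SCHEDULED_ENVIRONMENTS = {"dev", "staging"}


def should_run(tags: dict[str, str], now: datetime) -> bool:
    """Desired power state for one VM at a given local time."""
    if tags.get("environment") not in SCHEDULED_ENVIRONMENTS:
        return True                  # only dev and staging are on the schedule
    if tags.get("always-on") == "true":
        return True                  # approved exception
    return now.weekday() in WORKDAYS and START <= now.time() < STOP


def minutes(t: time) -> int:
    return t.hour * 60 + t.minute


scheduled_hours = len(WORKDAYS) * (minutes(STOP) - minutes(START)) / 60
reduction = 1 - scheduled_hours / (7 * 24)
print(f"{scheduled_hours} h/week, {reduction:.0%} fewer non-production hours")
```

Five 11.5-hour weekdays give the 57.5 hours/week quoted above, a roughly 66% cut from 168.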

Total Phase 4 savings: $4,484/month.

Detailed Cost Breakdown by Phase

Optimization Lever | Before (Monthly) | After (Monthly) | Monthly Savings | Annualized Savings
Orphaned resource cleanup | $3,100 | $0 | $3,100 | $37,200
Immediate Advisor actions | $4,100 | $0 | $4,100 | $49,200
VM rightsizing (compute) | $24,600 | $14,240 | $10,360 | $124,320
Disk tier optimization | $2,840 | $1,900 | $940 | $11,280
B-series conversion | $3,400 | $1,300 | $2,100 | $25,200
Reserved Instances (1-year) | $8,360 | $5,004 | $3,356 | $40,272
Compute Savings Plans | $3,240 | $2,980 | $260 | $3,120
Azure Hybrid Benefit | $3,740 | $0 | $3,740 | $44,880
Dev/staging scheduling | $6,800 | $2,316 | $4,484 | $53,808
Totals | $37,240 | $22,140 | $15,100 | $181,200

Cost Optimization Maturity Model

Through dozens of cloud cost engagements, we have observed that organizations progress through three distinct maturity levels. Understanding where your organization sits determines which optimizations will stick and which will erode.

Level 1: Reactive

Cost optimization happens in response to budget overruns or executive alarm. Savings are achieved through one-time cleanup efforts, such as deleting orphaned resources and shutting down forgotten environments. Without governance, costs drift back toward pre-optimization levels within 6-9 months.

Indicators: No tagging policy. Cost Management dashboards exist but are not reviewed. Azure Advisor recommendations are unacknowledged. No FinOps role or practice.

Level 2: Proactive

Tagging is enforced. Cost reports are reviewed monthly by engineering managers. Reserved Instances are purchased for stable workloads. Rightsizing reviews happen quarterly. Anomaly alerts are configured in Cost Management. Savings are sustained because governance prevents the most common drift patterns.

Indicators: Mandatory tagging via Azure Policy in deny mode. Monthly cost review cadence. RI coverage above 60% for stable compute. Budget alerts configured per subscription.

Level 3: Automated

Cost governance is embedded in the deployment pipeline. Infrastructure-as-code templates include cost tags by default. Policy-driven auto-shutdown is standard for non-production. Commitment-based pricing is reviewed monthly with automated utilization tracking. Cost anomaly detection triggers automated investigation workflows.

Indicators: FinOps team or embedded practice. IaC templates enforce cost governance. Auto-scaling replaces static oversizing. Commitment utilization exceeds 90%. Cost per transaction or cost per customer is tracked as a product metric.

The client moved from Level 1 to Level 2 over the course of this engagement, with a roadmap to reach Level 3 within six months through IaC adoption and FinOps process integration.

Ongoing Monitoring and Alerting

Optimization without monitoring is a one-time event, not a capability. We configured the following ongoing monitoring framework to prevent cost drift:

  • Budget Alerts: Configured at the subscription level with thresholds at 75%, 90%, and 100% of monthly budget. Alerts notify both the engineering owner and finance contact.
  • Anomaly Detection: Azure Cost Management anomaly detection was enabled to flag unexpected spending spikes. The detection model uses historical patterns and alerts when spend deviates by more than 20% from the trailing 30-day average.
  • Reservation Utilization Monitoring: A monthly automated report tracks RI utilization percentage. If utilization drops below 85%, the report flags the reservation for review — the workload may have been decommissioned or migrated.
  • Quarterly Rightsizing Reviews: A calendar-recurring review uses Azure Advisor and Azure Monitor data to identify VMs that have drifted from their optimal size — either through workload growth (undersized) or workload reduction (oversized).
  • Tag Compliance Dashboard: A Power BI dashboard tracks tag compliance percentage across all subscriptions. The target is 100% compliance on the five mandatory tags. Any resource without mandatory tags appears on a weekly remediation report sent to the resource owner.
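The rules in this list can be sketched as plain functions over data the monitoring jobs would already collect. Budgets and anomaly detection actually run inside Cost Management, and Azure's anomaly model is its own, not a fixed threshold. This sketch only implements the 20%-from-trailing-average rule as described. Input shapes and function names are hypothetical:

```python
from statistics import mean

BUDGET_THRESHOLDS = (0.75, 0.90, 1.00)
ANOMALY_DEVIATION = 0.20      # more than 20% away from the trailing average
TRAILING_DAYS = 30
RI_UTILIZATION_FLOOR = 0.85
MANDATORY_TAGS = {"cost-center", "environment", "owner", "application", "created-date"}


def budget_alerts(spend: float, budget: float) -> list[float]:
    """Budget fractions that month-to-date spend has reached or passed."""
    return [t for t in BUDGET_THRESHOLDS if spend >= t * budget]


def is_anomaly(daily_spend: list[float]) -> bool:
    """Compare the latest day with the mean of up to TRAILING_DAYS days before it."""
    if len(daily_spend) < 2:
        raise ValueError("need at least one day of history")
    baseline = mean(daily_spend[-TRAILING_DAYS - 1:-1])
    return abs(daily_spend[-1] - baseline) > ANOMALY_DEVIATION * baseline


def reservations_to_review(utilization: dict[str, float]) -> list[str]:
    """Reservations whose utilization fell below the review floor."""
    return sorted(name for name, used in utilization.items() if used < RI_UTILIZATION_FLOOR)


def tag_compliance(resources: list[dict]) -> tuple[float, dict[str, list[str]]]:
    """Share of compliant resources, plus non-compliant resource names grouped by owner."""
    report: dict[str, list[str]] = {}
    for res in resources:
        tags = res.get("tags") or {}
        if not MANDATORY_TAGS.issubset(tags):
            report.setdefault(tags.get("owner", "unassigned"), []).append(res["name"])
    non_compliant = sum(len(names) for names in report.values())
    return 1 - non_compliant / len(resources), report
```

Grouping the remediation list by the `owner` tag, with an "unassigned" bucket when even that tag is missing, mirrors the weekly per-owner report described above.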

Industry Benchmarks

Context matters when evaluating optimization results. The following industry benchmarks help frame the 40.5% reduction achieved in this engagement:

Benchmark | Industry Average | This Engagement
Cloud waste as % of total spend (Flexera 2025) | 30-35% | 40.5% recovered
RI/Savings Plan coverage (FinOps Foundation) | 45-55% | 72% post-optimization
Tag compliance rate (industry average) | 40-60% | 100% (policy-enforced)
Time to achieve first savings | 4-8 weeks | Week 1 (orphaned resources)
Cost optimization sustainability at 12 months | 60-70% of initial savings retained | Targeting 90%+ via governance

Results and Outcomes

Metric | Before | After | Improvement
Monthly compute spend | $37,240 | $22,140 | 40.5% reduction
Annualized savings | -- | $181,200 | --
VMs with proper tagging | 23% | 100% | Policy-enforced
Reserved Instance coverage | 0% | 72% | 22 VMs on 1-year RIs
Non-production runtime | 168 hrs/week | 57.5 hrs/week | 66% reduction
Orphaned resources | 75+ | 0 | Policy prevents recurrence
Cost visibility (team-level attribution) | Manual/monthly | Automated/weekly | Real-time dashboards
Engagement ROI | -- | -- | 6.2x in Year 1

Key Takeaways

  1. Rightsizing before reservations is non-negotiable. Purchasing reserved instances for oversized VMs locks in waste at a discount. Always rightsize first, then commit. The sequencing of this engagement — visibility, then rightsizing, then commitments — is deliberate and essential.
  2. Use P95 metrics, not averages, for rightsizing decisions. Average CPU utilization hides burst patterns that matter for application performance. A VM at 12% average and 78% P95 is correctly sized. A VM at 12% average and 18% P95 is a clear candidate for downsizing. Always validate across CPU, memory, disk IOPS, and network — not CPU alone.
  3. Azure Hybrid Benefit is material and frequently overlooked. If your organization holds Windows Server or SQL Server licenses with Software Assurance, AHB can reduce VM costs by 40-55% on top of other optimizations. In this engagement, AHB contributed $3,740/month — more than the orphaned resource cleanup.
  4. Governance prevents drift — and drift is the real enemy. One-time cleanup without governance is a temporary fix. Tagging policies in deny mode, automated scheduling with exception management, and monthly cost reviews sustain savings. Without governance, expect 30-40% of savings to erode within 6-9 months.
  5. Dev/test scheduling is the highest-ROI single optimization. Reducing non-production runtime from 168 hours/week to 57.5 hours/week saved $4,484/month with near-zero risk. If you do nothing else, implement auto-shutdown for non-production environments.
  6. Understand the RI vs. Savings Plans tradeoff for your specific situation. RIs offer deeper discounts but less flexibility. Savings Plans offer less discount but automatic reallocation. For stable workloads, RIs win. For environments planning SKU changes, migrations, or containerization, Savings Plans avoid stranded commitments.
  7. Establish a FinOps feedback loop. Cost optimization is not a project; it is a practice. Quarterly rightsizing reviews, monthly reservation utilization reports, and ongoing anomaly monitoring transform a one-time savings event into a sustained capability that compounds over time.

Next Steps

If your organization's Azure spend is growing faster than your workload, a structured assessment typically identifies 25-45% in recoverable spend within the first two weeks. The methodology outlined in this post — visibility, rightsizing, commitment optimization, and scheduling — applies to environments of any size.

The most common pattern we see is organizations that have already attempted some optimization but have not achieved sustained results. The missing element is almost always governance — the policies, automation, and processes that prevent drift after the initial optimization.

Contact Techrupt to schedule a cloud cost assessment for your Azure environment.
