Healthcare

UK Healthcare SaaS Platform — Cost Optimization & FinOps Transformation

A FinOps-driven cost optimization engagement for a healthcare SaaS platform serving NHS trusts and private providers. We reduced annual infrastructure costs by 69% — from $204K to $64K — through environment consolidation, elastic CI runners, and a tagging-based cost attribution system, all without any platform rewrite or team disruption.

69% ($140K/yr)
Total cost reduction
$1,600 → $300/mo
CI runner cost
$140,000
Annual savings
Zero
Team disruption
6 weeks, 1 engineer
Azure · FinOps · CI/CD · Environment Consolidation

A UK-based healthcare SaaS platform

The client operates a healthcare SaaS platform used by NHS trusts and private healthcare providers across the United Kingdom. Their product manages patient scheduling, clinical documentation, and compliance reporting — workloads that demand high availability and strict data handling practices under NHS Digital standards. The platform had grown steadily over three years, adding customers and features at a pace that outstripped any formal infrastructure planning. By the time we engaged, annual infrastructure spend had reached $204K and was growing at 35% year-over-year, significantly outpacing the 20% revenue growth rate.

The root cause was not a single expensive resource but a pattern of infrastructure sprawl that had accumulated without oversight. Four staging environments ran 24/7 when only one was actively used for pre-production testing. Always-on GitHub Actions self-hosted runners consumed $1,600 per month despite being utilized only 22% of the time — the remaining 78% was idle compute. Over 50 Azure resources across the subscription had no tags, making it impossible to attribute costs to specific teams, features, or environments. There were no budgets, no cost alerts, and no regular review of spending — the finance team discovered cost increases only when monthly invoices arrived.

The engineering team knew costs were a problem but had no framework for diagnosing where the waste was or how to prioritize fixes. FinOps was a concept they aspired to adopt but had never operationalized. They needed someone to build the foundation: visibility into spend, attribution to owners, and a repeatable process for continuous optimization. CloudForge was engaged to deliver exactly that — a cost optimization engagement that would produce immediate savings and establish a sustainable FinOps operating model.

Costs Growing Faster Than Revenue with Zero Visibility

The fundamental challenge was not that the client was spending too much on infrastructure — it was that they had no mechanism to understand what they were spending on or why. Cost growth had become decoupled from business growth, and without visibility, every decision about infrastructure investment was made in the dark.

The always-on CI runners were a textbook example of invisible waste. The engineering team had deployed GitHub Actions self-hosted runners on dedicated Azure VMs to improve build performance. The runners were provisioned as always-on instances because the initial setup guide they followed defaulted to persistent runners, and nobody had revisited the decision. At $1,600 per month, the runners were the sixth-largest line item in the Azure subscription — but because they were tagged only as "CI" with no further attribution, their cost was buried in a general "Development Tools" category that nobody monitored.

The four staging environments told a similar story. Each had been created for a legitimate purpose — release testing, integration testing, demo environments for sales, and a sandbox for experimental features. Over time, only the release testing environment was used regularly. The other three ran 24/7 at full capacity, consuming compute, database, and storage resources identical to production. When we measured actual usage, the integration testing environment had not been accessed in over six weeks, the demo environment was used twice per month for 2-hour sessions, and the sandbox had been abandoned entirely after the feature experiment it was created for was cancelled.
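A simple staleness rule is enough to surface findings like these. The Python sketch below is a minimal illustration, not the tooling used in the engagement. The 30-day threshold, the environment names, and the dates are hypothetical. It also assumes that last-access dates have already been gathered, for example from activity logs or stakeholder interviews.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional

STALE_AFTER = timedelta(days=30)  # hypothetical threshold

@dataclass
class Environment:
    name: str
    last_accessed: Optional[date]  # None means no recorded access at all

def stale_environments(envs: List[Environment], today: date,
                       stale_after: timedelta = STALE_AFTER) -> List[str]:
    """Names of environments with no recorded access within `stale_after`."""
    return [
        env.name
        for env in envs
        if env.last_accessed is None or today - env.last_accessed >= stale_after
    ]

if __name__ == "__main__":
    today = date(2024, 6, 3)  # illustrative date
    envs = [
        Environment("release-staging", today - timedelta(days=1)),
        Environment("integration-testing", today - timedelta(weeks=7)),
        Environment("demo", today - timedelta(days=12)),
        Environment("sandbox", None),
    ]
    print(stale_environments(envs, today))  # ['integration-testing', 'sandbox']
```

A staleness flag only prompts the "is anyone using this?" conversation. A demo environment that is used a couple of times a month is not flagged by this rule, and scheduling suits it better than decommissioning.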

The absence of a tagging strategy made these problems invisible. Azure Cost Management can produce detailed breakdowns by resource group, service type, and tag — but without tags, all 50+ resources appeared as an undifferentiated mass. There was no way to answer basic questions: how much does the staging environment cost? What percentage of spend is attributable to CI/CD? How much does each customer environment cost to operate? Without answers to these questions, cost optimization was guesswork.
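At its core, tag-based attribution is a group-by over cost records. The minimal sketch below assumes that cost line items have been exported as records with a cost and a tag dictionary. The field names and figures are hypothetical and do not follow the Azure Cost Management export schema. Any record missing the requested tag goes into an explicit "(untagged)" bucket, so the size of the attribution gap stays visible instead of disappearing into the totals.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping

UNTAGGED = "(untagged)"

def cost_by_tag(records: Iterable[Mapping], tag: str) -> Dict[str, float]:
    """Sum cost per value of `tag`; records without that tag go to UNTAGGED."""
    totals: Dict[str, float] = defaultdict(float)
    for record in records:
        value = record.get("tags", {}).get(tag, UNTAGGED)
        totals[value] += record["cost"]
    return dict(totals)

if __name__ == "__main__":
    # Hypothetical monthly line items; names and amounts are illustrative only.
    records = [
        {"resource": "vm-build-agent", "cost": 400.0, "tags": {"environment": "ci"}},
        {"resource": "app-release-staging", "cost": 900.0, "tags": {"environment": "staging"}},
        {"resource": "sql-orphaned", "cost": 450.0, "tags": {}},
    ]
    print(cost_by_tag(records, "environment"))
    # {'ci': 400.0, 'staging': 900.0, '(untagged)': 450.0}
```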

Compounding the visibility problem was the absence of any budget or alerting system. Azure supports budget alerts at multiple thresholds, anomaly detection for unexpected spend spikes, and cost forecasting based on historical trends. None of these capabilities had been configured. The finance team received a single monthly invoice and had no way to identify trends, outliers, or optimization opportunities until they compared month-over-month totals in a spreadsheet.
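Once spend is attributed, budget alerts and forecasting come down to simple comparisons. The Python sketch below is illustrative only. Azure evaluates budgets and produces forecasts itself, and its forecast model is more sophisticated than the naive run-rate extrapolation assumed here. The 80% and 100% thresholds match the ones this client later configured. The budget and spend figures are hypothetical.

```python
from typing import List, Sequence

def crossed_thresholds(spend: float, budget: float,
                       thresholds: Sequence[float] = (0.8, 1.0)) -> List[float]:
    """Return every threshold (fraction of budget) that spend has reached."""
    return [t for t in thresholds if spend >= t * budget]

def run_rate_forecast(spend_to_date: float, day_of_month: int, days_in_month: int) -> float:
    """Naive forecast: extrapolate the average daily spend so far to month end."""
    return spend_to_date / day_of_month * days_in_month

if __name__ == "__main__":
    budget = 10_000.0  # hypothetical monthly budget for one cost center
    spend = 4_500.0    # hypothetical spend after 15 of 30 days
    forecast = run_rate_forecast(spend, 15, 30)
    print(crossed_thresholds(spend, budget))     # [] (actual spend is at 45%)
    print(forecast)                              # 9000.0
    print(crossed_thresholds(forecast, budget))  # [0.8] (forecast breaches 80%)
```

Alerting on the forecast as well as on actual spend is what moves the conversation earlier than the monthly invoice.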

Data-Driven Cost Analysis Correlated with Utilization

Our approach was deliberately non-disruptive. Healthcare SaaS platforms serving NHS trusts operate under strict availability expectations, and the client was understandably cautious about any changes that might affect production stability. We committed upfront that our recommendations would not require a platform rewrite, would not change the deployment architecture, and could be implemented by the existing engineering team without additional tooling.

We began with a comprehensive cost analysis using Azure Cost Management, breaking down spend by resource group, service type, region, and time period. This gave us the macro picture: which categories of spend were growing, which were stable, and where the largest absolute costs were concentrated. We then correlated cost data with Azure Monitor utilization metrics — CPU, memory, disk I/O, and network throughput for every compute resource, DTU utilization for every database, and request counts for every App Service.

The correlation between cost and utilization was the key analytical step. High cost with high utilization means the resource is appropriately sized. High cost with low utilization means waste. We found the latter pattern across 60% of the non-production infrastructure. The always-on CI runners showed 22% average CPU utilization, the three unused staging environments showed near-zero utilization, and several production support services were over-provisioned by 2–3x relative to their peak load.
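The cost-versus-utilization test described above can be expressed as a two-way classification. In this minimal sketch, the cost and utilization cut-offs are hypothetical. "Utilization" stands for whichever Azure Monitor metric fits the resource, such as CPU for VMs or DTU for databases. A "waste candidate" still needs stakeholder confirmation before anything is removed.

```python
from dataclasses import dataclass

# Hypothetical cut-offs; tune them to the size of the estate.
HIGH_COST_PER_MONTH = 500.0
LOW_UTILIZATION = 0.30

@dataclass
class Resource:
    name: str
    monthly_cost: float
    avg_utilization: float  # 0.0 to 1.0, e.g. average CPU or DTU percentage / 100

def classify(r: Resource) -> str:
    """Bucket a resource by cost and utilization."""
    if r.monthly_cost < HIGH_COST_PER_MONTH:
        return "low priority"
    if r.avg_utilization < LOW_UTILIZATION:
        return "waste candidate"      # high cost, low utilization
    return "appropriately sized"      # high cost, high utilization

def idle_spend(r: Resource) -> float:
    """Rough share of monthly cost paid for unused capacity."""
    return r.monthly_cost * (1.0 - r.avg_utilization)

if __name__ == "__main__":
    runners = Resource("ci-runners", 1_600.0, 0.22)  # figures from the assessment
    print(classify(runners))              # waste candidate
    print(round(idle_spend(runners), 2))  # 1248.0
```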

We supplemented the quantitative analysis with stakeholder interviews to understand usage patterns. The engineering lead confirmed that only one staging environment was used regularly. The QA manager explained that integration testing had moved to a different approach six months earlier, making that environment redundant. The sales team acknowledged they could use the release staging environment for demos instead of maintaining a dedicated one. These conversations were essential — the data showed low utilization, but only the stakeholders could confirm whether that utilization represented actual waste or temporarily idle resources that would be needed again.

Environment Consolidation, Elastic CI, and FinOps Operating Model

The solution was structured in three layers: immediate cost reductions (quick wins), medium-term infrastructure changes, and a long-term FinOps operating model that would ensure costs stayed optimized as the platform grew.

Quick wins delivered in week three included replacing the always-on CI runners with elastic, on-demand runners. We configured GitHub Actions to use Azure Container Instances (ACI) as ephemeral runners that spin up when a workflow triggers and terminate when the job completes. This changed CI runner costs from a fixed $1,600/month to a variable cost based on actual usage — which, at 22% utilization, translated to approximately $300/month. The engineering team noticed no difference in build performance because the underlying compute was identical; only the provisioning model changed.
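The shift from a fixed cost to a usage-based cost can be sketched as a small model. Every input here is an assumption chosen for illustration: the hourly rate, the job counts, and the per-job spin-up allowance are hypothetical. The model also treats Azure Container Instances as billed per second of allocated vCPU and memory, rolled up here into a single hourly rate. Real savings also depend on how ACI pricing compares with the VM pricing it replaces.

```python
HOURS_PER_MONTH = 730  # common approximation of average hours in a month

def always_on_monthly_cost(rate_per_hour: float, runners: int) -> float:
    """Persistent runners are billed for every hour, busy or idle."""
    return rate_per_hour * runners * HOURS_PER_MONTH

def elastic_monthly_cost(rate_per_hour: float, jobs: int, minutes_per_job: float,
                         spinup_minutes: float = 2.0) -> float:
    """Ephemeral runners are billed only while a job (plus container start-up) runs."""
    billed_minutes = jobs * (minutes_per_job + spinup_minutes)
    return rate_per_hour * billed_minutes / 60

if __name__ == "__main__":
    rate = 0.50  # hypothetical $/hour for one runner's compute
    print(always_on_monthly_cost(rate, runners=2))                   # 730.0
    print(elastic_monthly_cost(rate, jobs=600, minutes_per_job=10))  # 60.0
```

The spin-up allowance matters most for short, frequent jobs, where container start-up time becomes a noticeable share of billed minutes.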

Environment consolidation was the highest-impact medium-term change. We decommissioned the integration testing and sandbox environments entirely, as stakeholder interviews confirmed they were no longer needed. The demo environment was converted to a scheduled resource that runs only during business hours Monday through Friday, with an Azure Automation runbook handling start/stop operations. The release staging environment was retained at its current specification as the primary pre-production environment. Total environment-related savings: roughly $56K annually.
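The decision that a scheduled start/stop runbook makes fits in a few lines. This hypothetical illustration covers only that logic. The 08:00 to 18:00 window is an assumption, since the case study says only "business hours Monday through Friday". Times are treated as UK local time, and the actual Azure start and stop calls are out of scope.

```python
from datetime import datetime, time

BUSINESS_START = time(8, 0)  # hypothetical opening time, UK local time
BUSINESS_END = time(18, 0)   # hypothetical closing time, UK local time

def should_be_running(local_now: datetime) -> bool:
    """True Monday to Friday (weekday 0-4) inside the business-hours window."""
    return local_now.weekday() < 5 and BUSINESS_START <= local_now.time() < BUSINESS_END

def weekly_running_fraction() -> float:
    """Share of the 168-hour week the environment is up (whole-hour window assumed)."""
    hours_per_day = BUSINESS_END.hour - BUSINESS_START.hour
    return 5 * hours_per_day / (7 * 24)

if __name__ == "__main__":
    print(should_be_running(datetime(2024, 6, 3, 9, 30)))  # True  (Monday morning)
    print(should_be_running(datetime(2024, 6, 8, 10, 0)))  # False (Saturday)
    print(round(weekly_running_fraction(), 3))             # 0.298
```

With these assumed hours, the environment runs about 30% of the week, so its compute cost drops by roughly 70% compared with running 24/7. Storage, and any services that cannot be stopped, keep billing regardless.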

The FinOps operating model was the most important long-term deliverable. We implemented an eight-tag mandatory tagging strategy covering environment, team, service, cost-center, project, owner, created-date, and criticality. Every existing resource was retroactively tagged, and we configured Azure Policy to enforce tagging on all new resource deployments — resources created without the mandatory tags would be automatically flagged and denied. We built team-level cost attribution dashboards in Azure Cost Management that gave each engineering team lead visibility into their own spend. We configured budget alerts at 80% and 100% thresholds for each cost center, and set up anomaly detection to flag unexpected spend spikes.
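An enforcement rule of this kind can be generated from the tag list instead of being written by hand. The Python sketch below builds only the `policyRule` portion of a custom Azure Policy definition. It uses the standard `if`/`then` structure, an `anyOf` of `exists` conditions, and the `deny` effect. A complete definition also needs a mode (custom tag policies commonly use `Indexed`), a display name, and an assignment scope. The tag keys come from the eight-tag list, but their exact spelling and casing in the client's environment is an assumption. The sketch also includes a local check that mirrors the rule, which a pipeline could run before deployment.

```python
import json
from typing import Dict, List, Sequence

# Tag keys from the eight-tag strategy; exact casing/spelling is an assumption.
MANDATORY_TAGS = ["environment", "team", "service", "cost-center",
                  "project", "owner", "created-date", "criticality"]

def require_tags_policy_rule(tags: Sequence[str]) -> Dict:
    """Azure Policy rule: deny any resource missing at least one listed tag."""
    return {
        "if": {
            "anyOf": [{"field": f"tags['{tag}']", "exists": "false"} for tag in tags]
        },
        "then": {"effect": "deny"},
    }

def missing_tags(resource_tags: Dict[str, str],
                 tags: Sequence[str] = MANDATORY_TAGS) -> List[str]:
    """Local pre-deployment check mirroring the rule above.

    Azure treats tag names case-insensitively, so this comparison does too.
    """
    present = {name.lower() for name in resource_tags}
    return [tag for tag in tags if tag.lower() not in present]

if __name__ == "__main__":
    print(json.dumps(require_tags_policy_rule(MANDATORY_TAGS), indent=2))
    print(missing_tags({"Environment": "staging", "team": "platform"}))
```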

Finally, we established a monthly FinOps review process. On the first Monday of each month, the engineering leads meet for 30 minutes to review the cost dashboards, discuss any anomalies, and identify optimization opportunities for the coming month. We facilitated the first three reviews to establish the cadence and trained the team on how to interpret the dashboards and act on the findings. This process is the mechanism that ensures cost optimization is ongoing rather than a one-time event.
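A per-team, month-over-month view is a natural input to that 30-minute review. The sketch below is a hypothetical illustration. The team names, the amounts, and the 10% flag threshold are all assumptions. Spend that appears for a team with no prior-month spend is always flagged for discussion.

```python
from typing import Dict, List, Tuple

Row = Tuple[str, float, float, bool]  # team, change ($), change (%), flagged

def month_over_month(previous: Dict[str, float], current: Dict[str, float],
                     flag_pct: float = 10.0) -> List[Row]:
    """Per-team spend change between two months, flagging large swings."""
    rows: List[Row] = []
    for team in sorted(set(previous) | set(current)):
        before = previous.get(team, 0.0)
        after = current.get(team, 0.0)
        change = after - before
        if before:
            pct = change / before * 100
        else:
            pct = 0.0 if after == 0 else float("inf")  # new spend has no baseline
        rows.append((team, change, pct, abs(pct) >= flag_pct))
    return rows

if __name__ == "__main__":
    may = {"clinical": 2_000.0, "platform": 3_000.0}
    june = {"clinical": 2_500.0, "platform": 2_950.0, "reporting": 400.0}
    for row in month_over_month(may, june):
        print(row)
```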

How We Delivered

1. Cost Assessment & Utilization Analysis

Weeks 1–2

Comprehensive cost analysis using Azure Cost Management. Correlated spend with Azure Monitor utilization metrics across all compute, database, and CI resources. Stakeholder interviews to validate usage patterns.

2. Quick Wins

Week 3

Deployed elastic CI runners on Azure Container Instances replacing always-on VMs. Immediate $1,300/month savings with zero impact on build performance.

3. Environment Restructuring

Weeks 4–5

Decommissioned unused staging environments. Implemented scheduled start/stop for the demo environment. Right-sized remaining resources based on utilization analysis.

4. FinOps Operating Model

Week 6

Deployed 8-tag mandatory tagging strategy with Azure Policy enforcement. Built cost attribution dashboards. Configured budget alerts and anomaly detection. Facilitated first monthly FinOps review.

Sustained Cost Reduction with Zero Disruption

The engagement delivered a 69% reduction in annual infrastructure costs, from $204K to $64K — a saving of $140K per year. This was achieved entirely through optimization of existing resources, with no platform rewrite, no architecture migration, and no change to the application code. The production environment was untouched throughout the engagement.

CI runner costs dropped from $1,600 to $300 per month (an 81% reduction), and the engineering team reported unchanged build times because the elastic runners were given the same compute capacity as the previous always-on VMs. Environment consolidation removed $56K in annual spend from decommissioned and scheduled resources. Resource right-sizing across the remaining infrastructure captured an additional $28K in savings through reserved instance commitments and SKU optimization.
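The headline percentages follow directly from the before and after figures reported in this case study. The short Python check below reproduces them.

```python
def pct_reduction(before: float, after: float) -> float:
    """Percentage reduction from `before` to `after`."""
    return (before - after) / before * 100

if __name__ == "__main__":
    total = pct_reduction(204_000, 64_000)  # annual infrastructure cost, $
    ci = pct_reduction(1_600, 300)          # monthly CI runner cost, $
    print(round(total))  # 69
    print(ci)            # 81.25
    print(1_600 - 300)   # 1300 ($/month CI saving)
```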

The qualitative results were equally significant. For the first time, engineering leads could see exactly what their team's infrastructure cost and how it was trending. The monthly FinOps review created a feedback loop where cost awareness became part of the engineering culture rather than an afterthought. Within three months of the engagement, the team independently identified and implemented two additional optimization opportunities — a database tier adjustment and a storage account migration — that were not in our original recommendations. This is the outcome we optimize for: a team that can sustain and extend the improvements without ongoing external support.

The FinOps maturity model, as defined by the FinOps Foundation, categorizes organizations across three levels: Crawl (ad-hoc), Walk (repeatable), and Run (optimized). At the start of the engagement, the client was firmly in the Crawl phase — no tagging, no budgets, no cost attribution, no regular reviews. By the end, they had progressed to Walk: repeatable processes, team-level visibility, budget enforcement, and a monthly review cadence. The path to Run — automated optimization and predictive cost management — was documented as a future roadmap item.

Tools & Platforms

Azure Cost Management

Spend analysis, trend identification, and budget configuration

Azure Monitor

Resource utilization metrics correlated with cost data

Azure Tags

8-tag mandatory strategy for team-level cost attribution

Azure Policy

Automated enforcement of tagging standards on new resources

Budget Alerts

80%/100% threshold alerts per cost center with anomaly detection

GitHub Actions

Elastic CI runners on Azure Container Instances

Azure Automation

Scheduled start/stop runbooks for non-production environments

Azure Cost Dashboards

Team-level attribution views for monthly FinOps reviews

Lessons Learned

1. CI runners are one of the most common hidden costs in cloud infrastructure. Always-on self-hosted runners are the default in most setup guides, and teams rarely revisit the decision. In our experience, fewer than 25% of organizations with self-hosted runners have measured their actual utilization, and in every case we have audited, utilization has been below 30%. At those utilization levels, elastic runners typically eliminate 70–80% of CI compute cost with no measurable performance impact.

2. Environment consolidation pays immediately and permanently. Unused or underused staging, demo, and sandbox environments accumulate silently. They are created for valid reasons, but most organizations never decommission them when the original purpose ends. A quarterly environment audit, simply asking "is anyone using this?", typically recovers 20–40% of non-production spend.

3. Tagging is not glamorous, but it is the number one prerequisite for FinOps. Without tags, Azure Cost Management is a firehose of undifferentiated data. With a well-designed tagging strategy and policy enforcement, the same data becomes actionable intelligence. Every FinOps initiative should start with tagging, regardless of what the immediate cost concern is.

4. Training the team is more valuable than the one-time savings. The $140K annual reduction was the immediate outcome, but the team's ability to independently identify and execute additional optimizations within three months was the real success. A FinOps operating model that depends on external consultants is not sustainable; the process must be owned by the engineering team.

What impressed us most was the zero-disruption approach. We were nervous about changing anything in our infrastructure given our NHS compliance requirements, but CloudForge optimized our costs by 69% without touching production and without any downtime. The FinOps operating model they established changed how our engineering team thinks about infrastructure — cost awareness is now part of every architecture decision, not an afterthought when the invoice arrives.
Sarah Mitchell
VP of Engineering, UK Healthcare SaaS Platform

Ready to Achieve Similar Results?

Every engagement starts with a conversation about your infrastructure challenges. Let's discuss how CloudForge can help.
