Open Policy Agent (OPA) has become the de facto standard for policy as code, providing a unified framework to enforce rules across the cloud-native stack. While often associated with security and compliance, OPA is an incredibly powerful tool for FinOps, enabling teams to implement automated, policy-driven cost optimization. By writing policies in its declarative language, Rego, you can create guardrails that prevent budget overruns and enforce cost-saving best practices.
How OPA Works for Cost Control
OPA works by decoupling policy logic from your application or CI/CD pipeline. It evaluates a JSON input against policies written in Rego to make a decision (e.g., allow or deny). For cost control, this process typically involves:
Generating Input Data: For Kubernetes, the API server provides the manifest of a resource being created as JSON. For Terraform, you generate a plan and convert it to JSON.
Adding Cost Data: A cost estimation tool like Infracost or Scalr's native integration analyzes the IaC plan and injects cost data (e.g., proposed_monthly_cost) into the JSON input for OPA.
Evaluating Policies: OPA evaluates this combined JSON data against your Rego policies to check for violations.
Enforcing Decisions: If a policy is violated, OPA returns a deny message, and the admission controller (for Kubernetes) or CI/CD pipeline (for Terraform) blocks the deployment.
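Concretely, the steps above can be sketched in a few lines of Python. The input shape, the proposed_monthly_cost field, and the $100 threshold are illustrative assumptions, not a fixed OPA schema:

```python
# Sketch of the pipeline: merge IaC plan data with a cost estimate,
# then evaluate a simple deny rule against the combined input.
# Field names here (e.g. proposed_monthly_cost) are assumptions.

def build_input(plan, cost_estimate):
    # The combined document is what OPA would receive as `input`.
    return {"plan": plan, "cost_estimate": cost_estimate}

def evaluate(policy_input):
    # Stand-in for OPA: return a list of deny messages (empty = allow).
    deny = []
    if policy_input["cost_estimate"]["proposed_monthly_cost"] > 100:
        deny.append("Proposed monthly cost exceeds the $100 budget.")
    return deny

combined = build_input({"resource_changes": []}, {"proposed_monthly_cost": 150})
print(evaluate(combined))  # a non-empty list means the deployment is blocked
```

In the real pipeline, OPA performs the evaluate step; the surrounding CI/CD job or admission controller only has to assemble the input and act on the resulting deny set.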
OPA Policy Examples for Kubernetes
In Kubernetes, OPA can enforce policies every time a resource is created or updated, either through Gatekeeper or by running OPA directly as a validating admission webhook. Note that Gatekeeper wraps policies in ConstraintTemplates and exposes the object under input.review.object; the examples below use the raw AdmissionReview format (input.request.object), as evaluated by a plain OPA admission webhook.
1. Enforce Mandatory Cost Allocation Labels
This policy ensures every Deployment has a cost-center label.
Code snippet
package kubernetes.validating.cost_control
deny[msg] {
    input.request.object.kind == "Deployment"
    not input.request.object.metadata.labels["cost-center"]
    msg := "Deployments must have a 'cost-center' label for cost allocation."
}
How it works: This policy checks if the incoming object is a Deployment and if the cost-center label is missing from its metadata. If the label is not present, it generates a deny message, and Gatekeeper will reject the resource creation.
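For readers less familiar with Rego, the same check reads like this as a plain Python function over an AdmissionReview-shaped dictionary; this is a sketch of the logic only, not the admission controller's actual evaluation path:

```python
def deny_missing_cost_center(review):
    # Mirror the Rego rule: fire only for Deployments missing the label.
    obj = review["request"]["object"]
    if obj.get("kind") != "Deployment":
        return None  # rule does not apply to other kinds
    labels = obj.get("metadata", {}).get("labels") or {}
    if "cost-center" not in labels:
        return "Deployments must have a 'cost-center' label for cost allocation."
    return None

review = {"request": {"object": {"kind": "Deployment", "metadata": {"labels": {}}}}}
print(deny_missing_cost_center(review))
```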
2. Restrict Expensive Node Selectors in Non-Production
This policy restricts which node selectors are allowed in certain namespaces.
Code snippet
package kubernetes.validating.cost_control
deny[msg] {
    input.request.object.kind == "Pod"
    input.request.namespace == "development"
    input.request.object.spec.nodeSelector["gpu"] == "true"
    msg := "GPU nodes are not allowed in the 'development' namespace."
}
How it works: This policy triggers if a Pod is being created in the development namespace and its nodeSelector is set to request a GPU-enabled node.
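The equivalent logic, sketched in Python against a hypothetical AdmissionReview document (the "gpu" nodeSelector key is an example convention, not a Kubernetes standard):

```python
def deny_gpu_in_dev(review):
    # Fire when a Pod in the "development" namespace requests a GPU node.
    request = review["request"]
    pod = request["object"]
    if pod.get("kind") != "Pod" or request.get("namespace") != "development":
        return None
    selector = pod.get("spec", {}).get("nodeSelector") or {}
    if selector.get("gpu") == "true":
        return "GPU nodes are not allowed in the 'development' namespace."
    return None
```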
3. Enforce Resource Limits to Prevent Runaway Costs
This policy denies pods that do not have memory limits defined.
Code snippet
package kubernetes.validating.cost_control
deny[msg] {
    container := input.request.object.spec.containers[_]
    not container.resources.limits.memory
    msg := sprintf("Container '%v' must have memory limits defined.", [container.name])
}
How it works: The policy iterates through each container in a pod's specification (containers[_]). If any container is found without resources.limits.memory defined, it generates a deny message.
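The iteration that containers[_] performs can be sketched in Python as a loop that collects one message per offending container:

```python
def deny_missing_memory_limits(review):
    # Collect one message per container lacking resources.limits.memory,
    # just as containers[_] enumerates every container in Rego.
    msgs = []
    for container in review["request"]["object"]["spec"]["containers"]:
        limits = container.get("resources", {}).get("limits") or {}
        if "memory" not in limits:
            msgs.append(f"Container '{container['name']}' must have memory limits defined.")
    return msgs
```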
OPA Policy Examples for Terraform
In a CI/CD workflow for Terraform, OPA evaluates a JSON file generated from the plan and cost estimation output.
1. Block Deployments Exceeding a Cost Increase Threshold
This policy blocks any pull request that would increase the monthly infrastructure cost by more than $100.
Code snippet
package terraform.cost_control
deny[msg] {
    input.tfrun.cost_estimate.delta_monthly_cost > 100
    msg := sprintf("Monthly cost increase of $%v exceeds the $100 limit.", [input.tfrun.cost_estimate.delta_monthly_cost])
}
How it works: This policy directly accesses the delta_monthly_cost field provided by a tool like Scalr. If the value is greater than 100, the policy fails, and the CI/CD pipeline can be configured to block the merge.
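The threshold check, restated as a Python sketch over the same assumed tfrun input shape (the nesting under input.tfrun.cost_estimate is tool-specific, not part of OPA itself):

```python
def deny_cost_increase(tfrun_input, limit=100):
    # Fail when the estimated monthly cost delta exceeds the limit.
    delta = tfrun_input["tfrun"]["cost_estimate"]["delta_monthly_cost"]
    if delta > limit:
        return f"Monthly cost increase of ${delta} exceeds the ${limit} limit."
    return None
```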
2. Restrict Expensive Instance Types
This policy prevents the use of large, expensive AWS EC2 instance types.
Code snippet
package terraform.cost_control
# List of disallowed expensive instance types
disallowed_types := {"m5.8xlarge", "c5.12xlarge", "p3.2xlarge"}
deny[msg] {
    # Find any aws_instance resource being created
    resource_change := input.plan.resource_changes[_]
    resource_change.type == "aws_instance"
    resource_change.change.actions[_] == "create"

    # Check if the instance type is in the disallowed list
    instance_type := resource_change.change.after.instance_type
    disallowed_types[instance_type]

    msg := sprintf("Resource '%v' uses disallowed expensive instance type '%v'.", [resource_change.address, instance_type])
}
How it works: The policy iterates through all resource_changes in the Terraform plan JSON. It identifies any aws_instance being created and checks if its instance_type is present in the disallowed_types set.
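A Python sketch of the same scan over a plan's resource_changes (the sample plan structure in the test below is a minimal assumption, not a complete Terraform plan document):

```python
# Disallowed instance types, mirroring the Rego set.
DISALLOWED_TYPES = {"m5.8xlarge", "c5.12xlarge", "p3.2xlarge"}

def deny_expensive_instances(plan_input):
    # Scan resource_changes for newly created aws_instance resources
    # whose instance_type falls in the disallowed set.
    msgs = []
    for rc in plan_input["plan"]["resource_changes"]:
        if rc["type"] != "aws_instance" or "create" not in rc["change"]["actions"]:
            continue
        itype = rc["change"]["after"]["instance_type"]
        if itype in DISALLOWED_TYPES:
            msgs.append(
                f"Resource '{rc['address']}' uses disallowed expensive instance type '{itype}'."
            )
    return msgs
```

A set gives O(1) membership checks and, as in the Rego version, makes the disallowed list easy to maintain in one place.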
3. Enforce Data Archiving with S3 Lifecycle Policies
This policy ensures that all S3 buckets have a lifecycle rule to transition old data to a cheaper storage class.
Code snippet
package terraform.cost_control
deny[msg] {
    resource_change := input.plan.resource_changes[_]
    resource_change.type == "aws_s3_bucket"
    resource_change.change.actions[_] == "create"

    # In plan JSON an absent inline lifecycle_rule usually appears as an
    # empty list, so treat both missing and empty as a violation
    rules := object.get(resource_change.change.after, "lifecycle_rule", [])
    count(rules) == 0

    msg := sprintf("S3 bucket '%v' must have a lifecycle_rule defined for cost optimization.", [resource_change.address])
}
How it works: This policy checks every new aws_s3_bucket resource to ensure that a lifecycle rule is present in its configuration. Note that in AWS provider v4 and later, lifecycle rules are typically defined in a separate aws_s3_bucket_lifecycle_configuration resource, so a production policy would also need to check for that resource type.
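The same presence check as a Python sketch, treating a missing or empty lifecycle_rule list as a violation:

```python
def deny_buckets_without_lifecycle(plan_input):
    # Flag newly created S3 buckets whose inline lifecycle_rule
    # is missing or empty (plan JSON often renders absence as []).
    msgs = []
    for rc in plan_input["plan"]["resource_changes"]:
        if rc["type"] != "aws_s3_bucket" or "create" not in rc["change"]["actions"]:
            continue
        if not rc["change"]["after"].get("lifecycle_rule"):
            msgs.append(
                f"S3 bucket '{rc['address']}' must have a lifecycle_rule defined for cost optimization."
            )
    return msgs
```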
Conclusion
Open Policy Agent, when combined with cost estimation data, provides a powerful, flexible, and open-source framework for implementing FinOps as Code. The Rego language allows you to write nuanced policies that go far beyond simple budget alerting. By codifying cost controls for both Kubernetes and Terraform, organizations can build an automated governance system that enforces financial best practices and prevents costly mistakes.
All in One Place
Atler Pilot decodes your cloud spend story by bringing monitoring, automation, and intelligent insights together for faster and better cloud operations.

