Cloud Access Management as a Delivery Concern: Considerations and Best Practices

Cloud access is often treated as only a security concern, but in reality it is a key factor in safe delivery, reducing the risk of incidents on the path to live. In this blog, we dive into cloud access controls as a delivery concern, with considerations and best practices to take into account.
Author
Jon is a multi-discipline engineer currently focused on Cloud Engineering within the Data space. He provides expertise around platform design and implementation, with a drive for SRE practices especially focused on observability and reducing toil.
Access control across workload environments is not just a security problem, but a delivery efficiency challenge. Stated plainly, this is obvious: if you don't have the access needed to carry out your role, delivery faces friction. Inverting that, it's reassuring to know that you cannot perform actions you shouldn't be able to, for example accidental destructive actions. So how do we get to that position, and how do we maintain it?
Cloud access management considerations
Workload lifecycle
First, when do we need to apply a workload's access controls? In short, access controls should be considered whenever change occurs, whether additive or subtractive. This aligns nicely with a high-level product lifecycle.
- Inception - a greenfield start, unconstrained by existing usage, within the standards set by the organisation's cloud implementation.
- Feature development - additive changes to the workload, for example a new queue when adding a new async integration.
- End-of-life - either at a workload or feature level, resulting in reduced access needs.

Multi-layered Implementations
Access controls are multi-tiered - in an AWS organisation they can span from the organisation level down to individual accounts.
- Service Control Policies (SCPs) - policies applied to an account or a collection of accounts. These can be used to enforce an allow list of services.
- Role Policy - defines access for a given principal, for example an SSO user or an automated service.
- Boundary Policy - works in collaboration with role policies to set the maximum set of actions a principal can take, allowing the enforcement of guardrails.

Image credit: AWS
Working in tandem, these layers establish clear ownership and simplify policy design. A role policy does not need to manage blast radius or propagation if a boundary policy already enforces those limits. The role policy can focus purely on what the principal needs to do.
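To make that split concrete, here is a minimal sketch modelling a role's effective permissions as the intersection of its role policy and its boundary policy. This is illustrative only: real IAM evaluation also handles explicit denies, resources, conditions and wildcards, and the action sets are invented for the example.

```python
# Simplified model: effective access is what the role policy grants,
# limited to what the permissions boundary allows.
# (Real IAM evaluation also considers explicit denies, resource ARNs,
# conditions and wildcards - this sketch ignores all of those.)

def effective_actions(role_actions: set[str], boundary_actions: set[str]) -> set[str]:
    """Actions the principal can actually perform: the intersection."""
    return role_actions & boundary_actions

role_policy = {"s3:GetObject", "s3:PutObject", "sqs:SendMessage"}
boundary = {"s3:GetObject", "s3:PutObject", "s3:ListBucket"}

print(sorted(effective_actions(role_policy, boundary)))
# -> ['s3:GetObject', 's3:PutObject']  (sqs:SendMessage is outside the boundary)
```

Because the boundary caps the blast radius, `sqs:SendMessage` in the role policy has no effect until the boundary is widened, which is exactly the guardrail behaviour described above.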
Personas and principals
When considering _who_ needs access, there are two primary categories: human and service (non-human) principals.
For humans, access should align to function rather than identity. A single person may perform multiple roles, for example:
- auditing platform health (read-only)
- supporting or executing releases (write access)
Separating these functions into distinct roles limits risk. Auditing can be safely read-only, while release roles carry higher privilege and therefore higher scrutiny.
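As a sketch of that separation, the two functions might map to distinct policy statements, with a crude check that the auditing role really is read-only. The action names are real AWS IAM actions, but the exact sets here are assumptions for illustration; the `is_read_only` heuristic is ours, not an AWS API.

```python
# Illustrative split of one person's duties into two roles.
AUDITOR_ROLE = {
    "Effect": "Allow",
    "Action": ["s3:ListBucket", "s3:GetObject", "cloudwatch:GetMetricData"],
    "Resource": "*",
}

RELEASE_ROLE = {
    "Effect": "Allow",
    "Action": ["s3:PutObject", "lambda:UpdateFunctionCode", "cloudformation:UpdateStack"],
    "Resource": "*",
}

def is_read_only(statement: dict) -> bool:
    """Crude heuristic: treat Get/List/Describe actions as read-only."""
    read_prefixes = ("Get", "List", "Describe")
    return all(a.split(":")[1].startswith(read_prefixes) for a in statement["Action"])

print(is_read_only(AUDITOR_ROLE))   # True
print(is_read_only(RELEASE_ROLE))   # False
```

A check like this could run in CI so that a write action slipped into the auditing role fails fast rather than widening its privilege unnoticed.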
Understanding personas and responsibilities before building policies allows for a clearer, more maintainable access model throughout the workload lifecycle.
Environment differences
Not all environments serve the same purpose. Production environments exist to deliver business value to end users. Test environments, while often production-like, primarily exist to enable delivery, not the business use case itself.
Differences may include:
- data that mirrors production profiles but is fictitious or anonymised
- simulated or stubbed endpoints
- additional services such as automated test frameworks
Testing introduces another access pattern. In production, actions are driven by users or upstream services. In testing, automation often impersonates those interactions, sometimes requiring direct database access or API calls normally triggered via a UI. As a result, each environment needs its own access design.
Impact
With these considerations in focus, access controls can be streamlined. Ignored, as with all debt, the impact compounds.
Organic workload growth blurs purpose, letting additional exceptions to the rule creep in. Access controls become marred by unclear personas, and principals no longer fit a precise purpose, opening the possibility of error through overly permissive access.
Complex or unnecessarily elevated access controls in lower environments lead to divergence on the path to live, resulting in access issues in higher environments, which typically have tighter controls. For example, an SCP blocking a service in production that was allowed by default in non-production, or permission for all queue actions in non-production but not even the permission to view them in production.
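One way to surface this divergence early is a simple diff of the actions granted to a role per environment. This is a sketch over plain action sets with invented example data; a real check would need to expand wildcards and factor in SCPs and boundary policies.

```python
def divergence(non_prod: set[str], prod: set[str]) -> dict[str, set[str]]:
    """Actions a role holds in one environment but not the other."""
    return {
        "only_in_non_prod": non_prod - prod,
        "only_in_prod": prod - non_prod,
    }

non_prod_actions = {"sqs:*"}            # all queue actions allowed by default
prod_actions = {"sqs:SendMessage"}      # far tighter in production

diff = divergence(non_prod_actions, prod_actions)
print(diff["only_in_non_prod"])  # actions that will fail on the path to live
```

Anything in `only_in_non_prod` is a candidate for either tightening the lower environment or consciously widening production, so the decision is made before the release, not during it.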
Policy growth and complexity can also raise testing requirements, especially when a role diverges across environments; multiple feature-switch paths often require hands-on manual testing.
All of these points ultimately affect a team's agility, directly or indirectly, to make the changes the business needs.
Tooling
Tooling provides visibility and a mechanism to eliminate the technical debt built up within a platform. The first thought is often to introduce apps or scripts, but a tool can also be a framework leveraged as part of ways of working to align practices, such as a runbook for raising a Pull Request.
Bespoke tools have a tipping point for their return on investment, which depends on the size of the problem, whether within a given product team or reusable across the organisation and beyond (open source). In this problem space, a tool can run into the very access issues it is trying to solve, hindering its own deployment and effectiveness. Ultimately, bespoke tools can give the greatest flexibility, providing an owned, repeatable aid over the lifecycle of the product.
Before building a bespoke tool, ask: is this problem only affecting me? Talking to teams within the organisation and beyond can reveal where an approach is working well, alongside the discovery of existing tools that can be adopted with minimal investment.
Static code analysis tools, such as CodeScene and SonarQube, provide an overarching view of a codebase. Their feedback emphasises tech debt and complexity, informing remediation activities. These tools align with a configurable industry definition of "good," ranging from code smells to highlighted vulnerabilities, including memory management issues.
Beyond this generalisation, tools such as Checkov provide security-specific feedback on access controls, highlighting standards like least-privilege access. All of these support a shift-left approach, giving feedback early in the development process with a level of protection for the main code base.
These approaches provide mechanisms to reduce delivery friction caused by access controls deviating from least-privilege access. Shifting the feedback left in the delivery lifecycle enables preventative measures. However, in tandem with changes, the state of existing environments needs to be reviewed. Environment scans complete the overall picture of current access controls; cloud providers typically offer native tools such as AWS IAM Access Advisor.
Scenario - S3 bucket policies

When working on an existing AWS estate, S3 bucket access controls had been implemented through bucket policies as opposed to centralised roles. This was a workaround for a human process in which multiple teams needed to be involved to make role updates.
As part of a shift within the organisation to improve observability, new products and processes were being introduced, including security adherence and cost controls. This required new principals/services to have access to the S3 buckets to report on policy adherence, for example encryption.
The update process involved applying changes to every S3 bucket individually: first to allow the new actions, and then to update the exception statement so the new principal was not blocked. The change took over a day, due to the number of buckets and bespoke policies, before testing and rolling out to production, highlighting the cost of the complexity that had built up within the system.
This effort spurred analysis to create an access control map, which can now inform a new access control strategy aligned to AWS best practices and feed into initiatives to drive down technical debt.
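An access control map like the one described can be sketched by folding bucket policies into a principal-to-bucket view. The bucket names, account ID and policy data below are invented for illustration; a real version would fetch policies via the AWS API and handle conditions, denies and wildcard principals.

```python
import json
from collections import defaultdict

# Example bucket policies keyed by bucket name (illustrative data only).
bucket_policies = {
    "reports-bucket": {
        "Statement": [
            {"Effect": "Allow",
             "Principal": {"AWS": "arn:aws:iam::111111111111:role/audit"},
             "Action": ["s3:GetBucketEncryption"]}
        ]
    },
    "data-bucket": {
        "Statement": [
            {"Effect": "Allow",
             "Principal": {"AWS": "arn:aws:iam::111111111111:role/audit"},
             "Action": ["s3:GetObject"]}
        ]
    },
}

def access_map(policies: dict) -> dict[str, dict[str, list[str]]]:
    """Map each principal to the buckets and actions it is allowed."""
    result: dict = defaultdict(dict)
    for bucket, policy in policies.items():
        for stmt in policy["Statement"]:
            if stmt["Effect"] != "Allow":
                continue
            principal = stmt["Principal"]["AWS"]
            result[principal][bucket] = stmt["Action"]
    return dict(result)

print(json.dumps(access_map(bucket_policies), indent=2))
```

Even a map this simple answers the question that made the day-long change so costly: which buckets must be touched when a principal's access changes.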
IAM Documentation Resources
- https://aws.permissions.cloud/
- https://docs.aws.amazon.com/service-authorization/latest/reference/reference_policies_actions-resources-contextkeys.html
Summary
Regardless of the phase your workload is in, it is never too late to consider your access controls. Depending on your position, there will be a varying degree of effort needed to align them, with differing value.
Realising where you are will feed into a strategy, ranging from chipping away during each change to a drastic wholesale rewrite. Fundamentally, tech debt is not bad if it is an active decision made with an understanding of its impact, so take this thought into a wider context.
If you'd like to discuss your cloud access management, get in touch!
