Things to consider when planning to use cloud services
While the Cloud has been around for a while now, it still causes major headaches when companies adopt cloud services. It doesn't seem to matter whether the company is a micro, small, medium, or large organisation; the problems faced tend to be similar in nature, but they can be amplified, especially at the smallest and largest ends of the scale.
At Factory, we've helped companies make the best use of services from AWS, Microsoft and Google, migrating web platforms, IT workloads, security workloads and analytics services into the cloud for a variety of customers. The architectures we've worked with have varied drastically: some are very "traditional" EC2 setups; others are fully automated CI/CD deployments running in a DevSecOps manner with all the bells and whistles; others are "hybrid", with tentacles into various on-premises and hosted setups in different areas of the business. We've even done security evaluations for customers who've managed to end up with several hundred different tenants. And that's just the AWS/Azure side; we've not even got into Office 365/Azure AD setups or the myriad of SaaS services that can be used within an organisation.
Given the data and experience we've got in this space, we thought it might be useful to distil some of it into a blog post: a guide to the common pitfalls and problems we see, and the things we think are worth considering when making a move to the cloud. We're an agnostic organisation (yes, we still deploy stuff in normal datacentres; yes, we still build normal servers too), so while we'll always recommend cloud when it's right, we'll also recognise when it's not, and we won't just sell cloud to improve our relationship with a cloud provider at the expense of our relationship with a customer (we're looking at far too many companies when we say this!).
For any migration, before even thinking about technology, there's a bunch of other considerations worth evaluating first. We normally see the biggest problems when technology choices have been made without first aligning them to a business purpose or thinking about wider technology requirements. The impacts of not deciding things at this stage aren't immediately obvious but start to cause complications later, sometimes years down the line. The areas below are worth exploring upfront for any cloud-related project; actively exploring them will give you a good indication of whether what you're looking at is a good fit, and whether it's likely to succeed.
Throughout this article, we'll divide things into a few key areas, go through some common questions that arise, and give an indication of the thought process we go through when looking at each.
Does it need to be in the Cloud?
It's surprisingly common to hear the term "cloud first" as a core pillar of any enterprise strategy now. In certain circumstances this makes the utmost sense. If you're committed to moving your workloads into a cloud, you can standardise on the tools and technologies of that platform, perform all your typical due diligence checks at scale and then allow individual projects to benefit from that economy of scale. While this approach makes sense – i.e., cloud first – it shouldn't mean cloud only. What we commonly see is project teams who hear "cloud first" and so actively ignore other viable options, even if their workload makes very little sense to port to the cloud.
Equally, from an organisational perspective, there is merit to having everything in a particular cloud. You’ve got visibility of it and can see what you have at this point, it makes tasks like auditing and compliance dramatically easier, and the outputs of those tasks might be dramatically more important than a slightly more expensive cloud bill.
We typically see the biggest problems when systems go to the cloud with the expectation that moving a selection of physical/virtual machines will be cheaper. Often, it isn't, and it'd simply be cheaper to run those systems on a virtualised platform hosted in a traditional datacentre.
The other use case that's tricky is when there is a lot of action on-premises – say manufacturing or large networks – and traffic between the cloud and the wider network is very "chatty". This can be exacerbated by outbound transfer charges, which can get very expensive when moving huge amounts of data. Sometimes there isn't a nice workaround for this where the cloud is fully utilised. Our approach is normally a hybrid architecture where certain elements remain within a customer network/datacentre/hosting facility and other elements end up in the cloud. In some cases, if the scale is there, using services like AWS Outposts can also realistically work and help. This again brings scale into question, though: certain options only work at certain scale points, otherwise the cost of implementing an option will never pay off because it'll never be used at its designed scale.
People and support
This is critically important. The first thing to understand here is what skills you already have, what skills you could feasibly recruit, and what skills you could get via an MSP relationship.
Ultimately, align your choices primarily with your core teams: the platform should be understandable and relatable to them. In some respects, we've recommended Azure for customers where, from a purely technical standpoint, AWS would be superior for the given use case. Once the complexity of access management, logging and knowledge of the platform is considered, however, the advantage dwindles: although it's the "right" technology choice, it's not the right choice for that given customer. If the scale of the project were bigger, it might be a different story, because the scale might offset the integration effort, training effort and team size, but it's still a consideration.
Next, on the support side: when does your platform need to be available to users? Do you need out-of-hours support in case something breaks? Is monitoring configured in a way that's tied to release management?
How much is it going to cost?
The elephant in the room with cloud services is "how much is it going to cost?", and the answer typically is "it depends". The downside of this, of course, is that budgeting is a financial exercise any company should do, and companies of a reasonable size almost certainly do. As such, understanding how much something will cost becomes an important consideration. On the technical side, techies often give a higher number and then think people will be pleasantly surprised when the bill comes in at 50% of that initial figure. Finance people typically find this hugely annoying: if everyone underspends by 50%, they'll have said no to a bunch of projects which could otherwise have gone ahead had they known that pot of cash would be available when they were planning.
As such, it’s important to be as accurate as possible with cloud spending. Billing in the cloud is an incredibly complex exercise, some organisations have setup companies that do nothing other than work out the best way to optimise, predict and report your cloud spending. At an organisation level, if you’ve lots of projects and accounts, this becomes more complex. If you’ve a few accounts, it’s easier but there are still complexities.
By way of example, if someone accidentally picks an instance which bills at $50 an hour rather than the tiny instance that bills at $0.05 an hour, when do you find out about the error? An hour later? A month later? How long would that instance be left running before you know?
Having appropriate controls and checks around this sort of thing won't stop every issue of this kind occurring, but it'll dramatically reduce how often it happens and how long it takes someone to notice.
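As a minimal sketch of what such a check could look like: a script that flags any running instance whose hourly rate exceeds a threshold, so a $50/hour mistake surfaces within hours rather than at month end. The inventory, instance types and rates below are illustrative; in practice you'd pull them from your provider's inventory and pricing APIs and run this on a schedule.

```python
# Flag instances billing above a per-hour threshold, most expensive first.
# Threshold and inventory are illustrative assumptions, not quoted prices.

HOURLY_RATE_ALERT_THRESHOLD = 5.00  # dollars per hour; tune to your estate

def flag_expensive_instances(instances, threshold=HOURLY_RATE_ALERT_THRESHOLD):
    """Return instances billing above the threshold, most expensive first."""
    flagged = [i for i in instances if i["hourly_rate"] > threshold]
    return sorted(flagged, key=lambda i: i["hourly_rate"], reverse=True)

inventory = [
    {"id": "i-0a1", "type": "t3.micro",     "hourly_rate": 0.0104},
    {"id": "i-0b2", "type": "p4d.24xlarge", "hourly_rate": 32.77},  # the accident
    {"id": "i-0c3", "type": "m5.large",     "hourly_rate": 0.096},
]

for inst in flag_expensive_instances(inventory):
    print(f"ALERT: {inst['id']} ({inst['type']}) at ${inst['hourly_rate']}/hour")
```

Wired into a chat channel or ticket queue, even something this simple shrinks the window between the mistake and someone noticing it.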
Ultimately, using calculators, managing the bill/reporting on spend and knowing what assets you have/why you have them/who deployed them will help work towards an idea of what a normal bill looks like in the Cloud.
VMs, Containers or Serverless?
So now you've picked the Cloud, what kind of architecture do you want? Should you just stick with VMs so it's easier to move to another cloud? (Don't do this!) What about containers? How about going all in on serverless? Again, this depends on what you're trying to achieve. Typically, you'll be deploying code that's either been written by a vendor or by your own teams. If a vendor has written something, you might be constrained to running it in VMs. Equally, you might have a legacy component that's easier to just run in a VM. That's fine, and VMs can still be a component of a valid cloud architecture.
If you can make use of containers, it’ll be useful in the long run. You can specify how your application should run, what sort of dependencies it needs and how it’ll communicate with a wider system.
Likewise, moving fully serverless and just letting the cloud platform run your code can be even better: you'll pay less, have less to manage and, in theory, everything will just work.
Both architectures have pros and cons though. What we've typically found is that most customers' cloud platforms are built by small teams, and typically there'll be 1-3 key people making things happen in that platform. If a key person moves on, customers often come to an MSP like us at that point for help and support, either on a long-term basis or to stabilise things while they're recruiting.
We tend to find customers struggle more to find the right skills in the serverless space. As such, on smaller-scale platforms, we find the skills required – and the commercial cost of those skills – sometimes wipe out the savings made on infrastructure by moving to serverless. This isn't to say serverless isn't the right thing to do; in many instances, it's becoming the only sensible thing to do. Equally, it's worth understanding the resourcing and project cost of getting a platform into a serverless architecture, and working out whether that makes sense, prior to embarking on that journey.
What does the platform talk to?
Does the platform talk to a lot of external or on-premises sources? Which way does the traffic go? If the system is largely receiving data from the outside world, that's great; if it's sending large quantities of data back, that's a problem and connectivity costs might get expensive. Most public clouds still bill by the GB transferred. Compared to traditional connectivity costs, this is mightily expensive. Worse still, the providers don't typically review this pricing all that often, and high use of outbound bandwidth (transfer from the cloud to an external platform – say, you downloading from a cloud service) can suddenly get expensive.
Moving to private connectivity options doesn’t generally help this as you’ll now have a carrier cost and you’ll still have a “by the GB” outbound cost, as such, don’t use the notion of using a private connection to save money. Private connections should be used when security concerns, segregation or performance guarantees are required.
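To make the per-GB point concrete, here's the back-of-envelope maths, assuming an illustrative $0.09/GB outbound rate (real rates vary by provider, region and volume tier, so always check current pricing; the figure here is an assumption for the sketch, not a quote).

```python
# How quickly "chatty" outbound traffic adds up at a flat per-GB rate.
CLOUD_EGRESS_PER_GB = 0.09  # assumed rate in dollars, not a quoted price

def monthly_egress_cost(gb_per_month, rate_per_gb=CLOUD_EGRESS_PER_GB):
    """Cost of a month's outbound transfer at a flat per-GB rate."""
    return gb_per_month * rate_per_gb

# A platform pushing 50 TB a month back to an on-premises network:
tb_per_month = 50
cost = monthly_egress_cost(tb_per_month * 1024)
print(f"{tb_per_month} TB/month outbound ≈ ${cost:,.0f}/month")
```

At that assumed rate, 50 TB a month of outbound transfer lands in the thousands of dollars per month – often more than the dedicated link it's replacing.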
What’s your software vendors plans
Ultimately, you’ll be running software. If you’re running software, where else can it run? Does the software provider have a SaaS Service you could use instead? If you’re a smaller business, this is often a much wiser investment. The complexity of keeping servers/IaaS/PaaS services talking and working is often not commercially competitive with “It’s X per user per month”. While that might seem expensive, if it’s “all in” – i.e., you don’t need servers, they’re managing the platform/platform security etc then it might actually work well at lower user counts.
Equally, you might need to run some software or a web application that you'd like to host in the cloud; again, this might be a good option. Absolutely consider the software vendor's plans when planning this sort of migration though: if they're going to launch a cloud service, you might find the time and effort you're putting in doesn't pay off if you're only going to be in your IaaS environment for a little while before moving to the vendor's SaaS offering.
How will you handle tenant level management?
This goes for IaaS and SaaS services. If you take a SaaS service, how are you going to onboard users? Who's going to handle subscriptions and licensing? Does the system segment data and access by groups, and should it be split between different groups? Does the system talk to Azure AD/Active Directory/OpenID? How can you secure admin accounts in the tenant? Can you get logs to see when the tenant is being accessed? Can you lock the tenant to access from your own network or devices?
All these things come into consideration, and that's just the wrapping of the cloud tenant, before you get to the workload – the reason the service exists in the first place. For users to do their jobs, you need a level of assurance about access to your services, and you need to make sure data doesn't accidentally leak.
How will you handle the workloads inside the tenant?
If the system is an IaaS/PaaS tenant, how will you handle what's created inside it? Services like AWS and Azure have a concept of tagging, which essentially lets you label the things that have been created. Having a common tagging standard and enforcing it can be critical in policing the workloads inside the tenant: knowing who created a resource, what department they're from, what system the resource belongs to, and whether the resource is test/dev or production is imperative in designing a strategy to operate your cloud services successfully.
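A sketch of what enforcing a tagging standard can look like: a validator that checks each resource carries the mandatory tags before (or after) deployment. The tag keys and allowed values below are illustrative assumptions; AWS tag policies and Azure Policy can enforce similar rules natively.

```python
# Validate resources against a hypothetical tagging standard.
REQUIRED_TAGS = {"Owner", "Department", "System", "Environment"}
VALID_ENVIRONMENTS = {"dev", "test", "prod"}

def tag_violations(resource):
    """Return a list of human-readable tag problems for one resource."""
    tags = resource.get("tags", {})
    problems = [f"missing tag: {key}" for key in sorted(REQUIRED_TAGS - tags.keys())]
    env = tags.get("Environment")
    if env is not None and env not in VALID_ENVIRONMENTS:
        problems.append(f"invalid Environment value: {env!r}")
    return problems

resource = {"id": "vm-web-01", "tags": {"Owner": "jdoe", "Environment": "prod"}}
for problem in tag_violations(resource):
    print(f"{resource['id']}: {problem}")
```

Running something like this in a deployment pipeline, or as a scheduled sweep of the tenant, turns the tagging standard from a document into something that's actually policed.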
Out of the workloads created (virtual machines, containers, container orchestration resources, databases, object stores), who'll manage them and fix them when they break? Who'll perform backups, keep an eye on the costs these services are incurring, and check they're running as expected? If I had £1 for every time I'd been in a meeting following a cloud assessment where we'd saved the customer £30k+ on the spot, well, I'd have about £9 now. The point is, it's happened often enough that it's worth checking.
Automation – how much of it will you do?
Automation can be fantastic, and we advocate it here. Many of us in Factory worked with service providers who had thousands of servers and hundreds of thousands of customers in the early 2000s. The only way to get anything done was to automate so we see the value of automation. We’ve been using tools like Puppet, Ansible, CloudFormation and Terraform since they launched and use them to good effect when they make sense.
Our typical process for automation follows a simple logic: it's very hard to automate something you either 1) haven't defined very well or 2) don't yet understand. Often, we see engineers trying to immediately automate something they've never deployed. As such, our first approach is to get engineers to experiment with a new service, understand it and start to gauge how it works. Once that knowledge is gained, you can start to understand and define a requirement; once you have the requirement along with an understanding of how the service works, you can begin to automate.
We often get the "what's the point of automating X, it's just two VMs and it'll be quicker to just create it manually" argument. That's fine, and can work, but if a wider team is aware of automation code and can use it, you can start to use off-the-shelf scripts, mildly modify them for your requirement and then build your system in an automated, repeatable manner. This starts to pay dividends if you're in a regulated industry or have auditors. Questions like "what firewall rules are enforced for device X?" are great to answer with "the firewall rules are in code and changes are committed and logged; today they look like this, but here's the last year's changes to that particular ruleset".
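To illustrate that audit answer: when rules live as data under version control, "what changed?" becomes a tiny diff between two points in time. The (protocol, port, source) rule shape here is hypothetical; in practice the rules might live in Terraform, CloudFormation or a security-group definition, with the history coming from the repository.

```python
# Diff two firewall rule sets kept as data in version control.
def diff_rules(old, new):
    """Return (added, removed) rules between two rule sets."""
    old_set, new_set = set(old), set(new)
    return sorted(new_set - old_set), sorted(old_set - new_set)

last_year = [("tcp", 22, "10.0.0.0/8"), ("tcp", 443, "0.0.0.0/0")]
today     = [("tcp", 443, "0.0.0.0/0"), ("tcp", 8443, "10.1.0.0/16")]

added, removed = diff_rules(last_year, today)
print("added:  ", added)    # the new 8443 rule
print("removed:", removed)  # the old SSH rule
```

The same diff, run between any two commits, is exactly the "here's the last year's changes" answer an auditor wants.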
Data extraction formats (PaaS/SaaS)
If you’re using especially SaaS services and in some instances PaaS services, look at what your data looks like if you were to choose to migrate away from that provider. Often, SaaS providers will offer a very flat (i.e. not relational) export of your data which is very hard to make anything useful from. In this instance, the SaaS provider might be being deliberately awkward to maintain subscription numbers by making it harder to move away.
This is why we see export capability as a hugely important feature of SaaS services. In the modern cloud world, if a SaaS provider goes out of business, you completely lose the ability to use that software, and you could also lose the data you have within that tenant.
Another issue that's coming up more as time moves on and more data moves towards SaaS is that in mergers and acquisitions, companies typically tend to align common IT services. Performing a SaaS-to-SaaS migration can be quite complex depending on how the import/export functions of the various services work. The most obvious strategy here is either using specialist software or using an IaaS cloud as a midpoint/translation engine between the two services to shuffle data back and forth.
Given this issue, however, if you can at least review what a data export looks like when evaluating a SaaS service, you'll have some idea of the level of risk and complexity involved if you need to migrate away from that service at a later date. You might mark that risk higher or lower depending on the business impact that losing the service would present to your business.
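As a sketch of what a flat export costs you: the same customer fields repeat on every order row, and migrating away means re-deriving the relational structure yourself. The field names here are illustrative, not from any particular SaaS product.

```python
# Rebuild a relational shape from a flat (denormalised) SaaS export.
def normalise(flat_rows):
    """Split a flat order export back into customers and orders tables."""
    customers, orders = {}, []
    for row in flat_rows:
        # Customer fields repeat on every row; keep one record per customer.
        customers[row["customer_id"]] = {
            "id": row["customer_id"],
            "name": row["customer_name"],
        }
        orders.append({
            "id": row["order_id"],
            "customer_id": row["customer_id"],
            "total": row["total"],
        })
    return list(customers.values()), orders

export = [
    {"order_id": 1, "customer_id": "C1", "customer_name": "Acme", "total": 120.0},
    {"order_id": 2, "customer_id": "C1", "customer_name": "Acme", "total": 80.0},
]
customers, orders = normalise(export)
print(f"{len(customers)} customer, {len(orders)} orders")
```

This toy case is trivial; a real export with dozens of entity types, no stable keys and embedded free text is where the migration cost really lives, which is exactly why it's worth inspecting the export before you sign.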
Security and shared responsibility
Most cloud services have what's commonly known as a shared responsibility model, which sets out who's responsible for what when it comes to security. As an example, AWS is responsible for the security of their login pages, making sure you can only gain access to your tenant with valid credentials, and you are responsible for keeping your credentials safe. This sounds simple but becomes more complex as you put more into the cloud or give more people access to your tenants. It's important to understand the shared responsibility model, and critically, the bits you're responsible for. Moving to the cloud still leaves you with a lot to do with regard to tenant-level and workload-level security.
What security standards/controls do you need to adopt?
Luckily, there are great standards out there. The Cloud Security Alliance has a great standard with sections covering the ins and outs of managing tenants and the workloads within them. Other standards come from organisations like the Centre for Internet Security (commonly referred to as CIS).
There is an array of different things that can be done in this space. Obviously, the closer you are to recognised standards, the easier audits will be. You’ll also start with a pretty good level of security. The complexity of course is making sure you can keep to that standard as time moves forward. Doing that can be achieved through firstly having a good architecture upfront, secondly through good controls and finally through monitoring/observation and auditing to make sure your systems are still passing the standards you’ve intended the system to achieve.
Legal considerations
Of course, while a lot of a cloud service is ultimately about technology, there are clear legal obligations to consider when using one. Unless you've been under a rock for the last 6-8 years, you'll have spent at least some time fending off GDPR consultants.
The terms and conditions set by the cloud provider will largely govern what they will and won't do with the data you entrust to them. They'll also set out the provider's responsibilities and your responsibilities, and give a feel for the kind of privacy you can expect when using the service.
This becomes critical when you're handling personal or sensitive data. You'll potentially be in the position of data controller, with the service provider acting as a data processor. Ultimately, you control the data (i.e., you say whether or not to delete a customer record from your database) but the cloud provider processes that change (you control the press of the delete button; they initiate the delete process). As such, there's a linkage of responsibility there. Ultimately, however, the data is entrusted to you, and you're making a claim to your customer that the cloud provider meets the legal obligations on your business.
Depending on size and scale, providers – even the big ones – will accept amendments to their terms and conditions, this can include price negotiations or fixed percentage discounts below market rates. Typically, you’ve got to be buying at a certain scale before this occurs, but it is absolutely something that can be achieved.
Finally, the extraction of data, or the ability to have customised extractions should you need to leave a cloud provider for any reason, can be something to aim for when negotiating terms. This is important because some SaaS services have a relatively limited export function. Behind that, however, they'll have data stored in normal databases or big-data systems. As such, if a provider will agree to give you some analyst/developer time on their side to customise an export should it be required, this would be favourable. Not all SaaS providers will agree to this, but it's certainly worth an ask.
Common use cases
To finish this piece up, we thought it'd be useful to cover some very common use cases and say when we think these are and aren't right for cloud, and what kind of considerations around scale and involvement might tip the balance one way or the other.
Small Web Platforms
This is a typical business website. It's moved beyond the realms of a shared hosting plan, it might be sat on a single server or virtual server, and your organisation wants the site to work better for international users and be more resilient.
This is a particularly obvious use case for cloud services. We'd typically look at a small setup initially, using something like AWS, Azure or GCP, and perform a single- or multi-region setup depending on the scale and use case requirements. The web tier/code would go into static content if possible; if there was much dynamic content, we'd go serverless, container or VM depending again on scale and complexity. If the code worked simply in serverless, we'd go that way; if it was a real struggle to get it working, we'd look at options to push it into a container or a VM. For data storage, assuming the site used an SQL database supported under something like AWS RDS, we'd use that. Naturally, we'd wrap everything in a VPC layer and add an ALB (application load balancer). Depending on how "global" the site needed to be and how static the content was, we'd set up the dynamic elements in different regions around the world (fingers crossed we'll start just running code on CDN edges soon!). Once that's done, we'd finally route traffic to the application using an agnostic service like Cloudflare, or one of the cloud providers' own services (Route 53/CloudFront).
Global Supplier Connected into a UK business
A UK business has a few people working in different parts of the world. The business doesn't want data to leave its platforms, so doesn't really want to ship laptops abroad. Equally, the latency for the users is intolerable, so it's quite tricky to get the best work out of them. For meetings, the company has shipped iOS devices, and the users and company find this acceptable. They do, however, want a separate system for accessing company information.
In this instance, we’d look to use a service like Windows Virtual Desktops or AWS Workspaces. These services would let us deploy a desktop nearby to the user. That desktop could be connected into the corporate active directory structure and the user could get a normal login. We could build a virtual private network back to the company’s location to allow use of services and systems. Obviously, latency will always be an issue with high distance applications, but with the desktop nearer the user, that will solve a lot of the problems. For larger environments, we might use wan optimisation technology on the VPN to speed up the flow of traffic by effectively packaging less network efficient traffic into more network efficient transport protocols.
Depending on the level of security required, we could ship a pre-configured firewall/switch/thin client to the remote user. The firewall would auto-VPN and would only allow traffic from the thin-client to access the network. In this mode, the thin client would give the user a “local” desktop however all the content/the desktop itself would never leave the confines of the cloud.
This would reduce the risk of data loss/exfiltration, improve latency for the user and help alleviate compliance issues with storing sensitive data abroad.
Virtual Machine Hosting
A company still has a range of physical/virtual machines for a variety of tasks, the usage is typically in country/in company and the machines are running a mixture of Windows and Linux. The systems run day to day EPOS systems/ordering systems/finance systems and also contain a multitude of test/dev work for applications across the business.
In this instance, it’s tricky, cloud at first appears obvious because you can just start loading VMs into it and migrating applications in. The downside on this is cost, particularly as scale increases. Typically the options here are either 1) move to the cloud and negotiate/use compute savings plans/rightsized VMs – which doesn’t always work – special nod to Java here! Or 2) build a new VM platform and host these machines on that.
Assuming the hardware is procured in year 1, the prices of both options are similar with a marginal lead to the cloud. In years 2, 3, 4 and 5 however, the VM platform wins hugely because the cost has already been absorbed in year 1.
Taking this a different route, if the hardware is financed and the costs spread equally, the VM platform will be significantly more cost effective than hosting in the cloud. Our estimates of this tend to cover service provider colocation space, hardware maintenance, a network, virtualisation software, the appropriate software to maintain a normal environment, and the management costs associated with running such a platform.
This environment typically starts to make sense when your workloads combined are using more than 500GB of RAM and 10-20TB of storage. It can work at smaller scales too, but it's use-case dependent, and the cloud can win in those cases.
One obvious issue is manufacturing. Some plants run 24x7x365, some have PLC controllers/SCADA systems which run a variety of code. Typically, in this space, we see a hybrid option occurring. A certain amount of compute must be onsite if connectivity fails. Most sites of this nature have access to lots of resilient power due to the nature of what they do, as such, running compute workloads isn’t a huge issue.
For companies that are trying to do everything in a consistent manner, systems like AWS Outposts can help bring everything into one platform while keeping a certain element of compute local, which increases the ability to keep operating even when connectivity fails outside of the site.
Equally, there are huge cost savings by using a platform from a company like Nutanix/Dell/HP which will give you a lot of resilience/performance in a package that’ll reliably run VMs and services when designed correctly.
Our final thoughts on all the above are that it’s always worth considering the use case, the why. If you’ve not correctly communicated or understood why you need to achieve a certain objective, there’s not much point getting into what and how. It makes sense to ensure you understand why your organisation is doing something and what benefits they expect from that change. Only then can you accurately design a system to cope with that need and demand. It’s always less painful to have “Oh I thought you meant that?” conversations before you’ve burned through a huge amount of time and money.
It’s hard to cover this in a blog post without feeling like I’ve left things out. It’s also one of those reads that feels somewhat flighty, I switch from business related info through to tech related info, as such, this isn’t perhaps the easiest read and I apologise for that. On engagements, we typically tailor our content to our audience, in a blog, this is tailored to the world with a view to giving some insight away in a form that’s hopefully actionable and which provokes thought at the stage of adopting and building cloud services. It’s always important to consider the wider ramifications of technology choices and how that decision might play out in an organisation, perhaps not immediately, but at some point, in the future. This is becoming ever-more critical as now production isn’t just keeping our organisation running, but is often keeping society itself running.