Why monitoring is critical in today’s world
In the world of IT, when people are asked about monitoring, the response is generally about up/down monitoring for servers: Network Operations Centres looking at banks of screens with red, green and amber blobs that show a rough status of what’s going on.
Fifteen years ago this was quite normal. Companies were used to physical devices doing just one job, perhaps with some kind of physical redundancy set up for each device. In that world, this wasn’t such a terrible way to perform monitoring.
Fast forward to the present day, however, and life has got a lot more complicated. A typical IT environment now looks like a chunk of roaming users on a mixture of devices, using a mixture of connectivity to access a mixture of platforms, applications and software services. In that world, understanding what’s being used, from where and by whom is often very important, not just for availability and service monitoring, but for making decisions on what to invest in, what to change, what to sunset and more.
Imagine if a bank handed out loans without basing them on any data. Imagine if Formula 1 teams designed new cars and didn’t bother to check the data from their previous runs before making changes. Imagine if home boilers had no thermostat to measure the temperature and just “guessed” it instead.
That all seems quite crazy, but in the world of IT, lots of decisions get made in this way. Sometimes system changes are made on a hunch or, worst of all, because it worked that way last time.
With good monitoring, you can start to make data-driven decisions on where to take your business. You can use the technology to understand where your users come from and how they access your service, and you can spot shifting trends.
Monitoring used to be very technology focused: is stuff working or not? Now it can deliver greater business value. At Factory Internet, we still monitor traditional devices, and it’s still very much needed: knowing what processes are running on a server, how much resource they’re using, how much space is free on various disk partitions, whether a key service is running and whether it is listening on a port. All of those are still valid things to monitor.
The difference now is that the world and its services have become more connected and interdependent. Your web application used to depend on a web server and a database running in tandem. Now it depends on Amazon CloudFront, S3, GitHub, a third-party postcode lookup service, 6 social media plugins, Google Analytics, 19 JavaScript components (6 of which are externally hosted) and two regions of AWS, along with DNS services, SSL certificates, NTP time services and a bunch of APIs and message queues that shuffle things around. To add to the complexity, that might be just one of twenty applications. Others might be SaaS applications, and how do you even begin to monitor someone else’s service?
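To make the traditional checks mentioned above a little more concrete, here is a minimal Python sketch of two of them: how much disk space is free, and whether a service is listening on a port. It is an illustration only, not a description of any particular monitoring tool. The function names, the 10% free-space threshold and the two-second timeout are all assumptions chosen for the example.

```python
import shutil
import socket


def disk_free_percent(path: str = "/") -> float:
    """Return the percentage of free space on the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.free / usage.total


def is_port_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def run_checks(path: str, host: str, port: int, min_free_percent: float = 10.0) -> dict:
    """Run both checks and return a small status report (threshold is an assumption)."""
    free = disk_free_percent(path)
    return {
        "disk_ok": free >= min_free_percent,
        "disk_free_percent": round(free, 1),
        "port_ok": is_port_listening(host, port),
    }
```

A call such as run_checks("/", "127.0.0.1", 443) would report on the root filesystem and on a local HTTPS listener. A successful TCP connection only shows that something is accepting connections on the port, not that the service behind it is healthy.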
Then the position gets worse: the application starts slowing, or becomes more popular, and you need to change the stack. How? In the past you could add CPU and RAM and all would be good. Now, how do you know what’s slowing the service down? Do you have data to compare across the last few releases to see when the app started to slow down? Is it a third-party service or one of your own services that’s suffering? What if you’re running serverless or microservices? How do you check “worker” servers in and out so you know how they’re performing while delivering a service? What’s the impact on users? Are they starting to use rival services instead?
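As a rough illustration of comparing data across releases, the Python sketch below takes response-time samples grouped by release and reports the first release whose median is noticeably slower than the one before it. The function name, the 25% threshold, the release labels and the timings are all invented for the example, and a real setup would pull the samples from whatever monitoring data you already collect.

```python
from statistics import median


def first_slow_release(samples_by_release: dict, threshold: float = 1.25):
    """Return the first release whose median response time is at least
    `threshold` times the previous release's median, or None if there is none.

    `samples_by_release` maps release labels to lists of response times,
    listed in release order (dicts keep insertion order).
    """
    previous = None
    for release, samples in samples_by_release.items():
        current = median(samples)
        if previous is not None and current >= previous * threshold:
            return release
        previous = current
    return None


# Hypothetical response times in milliseconds, invented for illustration.
history = {
    "1.0": [100, 110, 105],
    "1.1": [102, 108, 111],
    "1.2": [150, 160, 155],
}
print(first_slow_release(history))  # 1.2
```

The median is used here rather than the mean so that one unusually slow request does not flag a release on its own. That is a design choice for the sketch, not the only sensible option.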
Clearly, today, monitoring is about more than just servers. It’s about platform intelligence: understanding all of the small components that make your service accessible, usable and functional to your community of users, be they internal or external.
We think businesses need to be armed with this kind of insight and intelligence. Without it, when something major happens and you’re not aware of it or don’t understand why, your teams end up posting generic messages to users on social media and fielding calls without being able to give customers any real insight. With data, you’d at least be able to point at the problem and show you’re working on it, in a matter of seconds. Having data at this critical point can massively enhance customer trust.
This problem is only getting bigger and more complicated as more and more services move to the web. At Factory Internet, we’ve spent a chunk of time integrating a bunch of monitoring, alerting, service and reporting tools to help address this problem. We can provide this as a service, combining leading-edge tools with a consultative approach to deliver outcomes that aren’t available elsewhere. We’re always open to conversations, so feel free to get in touch!