Alerting and Monitoring in Azure

Alerting and monitoring are an afterthought in most solutions. This thinking can lead to production issues and to organisations missing the O/SLAs they have agreed with their business or other parties.

When it comes to Azure, we have a range of services to cover all aspects of monitoring-

  1. Azure Monitor-
    A consolidated alerting and monitoring service for Azure first-party/core services.
  2. Azure Alerts-
    By using alerts, you can configure conditions over data and get notified when the conditions match the latest monitoring data.
  3. Service Health-
    It is a suite of experiences that provide personalized guidance and support when issues in Azure services affect you.
  4. Azure Security Center-
    It provides unified security management and advanced threat protection across hybrid cloud workloads.
  5. Azure Advisor-
    It is a personalized cloud consultant that helps you follow best practices to optimize your Azure deployments.
  6. Application Insights-
    It is an extensible Application Performance Management (APM) service for web developers building and managing apps on multiple platforms.
  7. Log Analytics-
    It plays a central role by consolidating monitoring data from different sources and providing a powerful query language (Kusto) for consolidation and analysis.
  8. Azure Mobile App-
    An app to manage your Azure resources; in essence, a portal on a phone.
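
To make the alerts item above more concrete, here is a minimal Python sketch of the general idea: a condition is evaluated over the latest monitoring data and a notification goes out when it matches. This is not the Azure Alerts API. `AlertRule`, `check_and_notify`, the window size and the threshold are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, List, Sequence


@dataclass
class AlertRule:
    """Hypothetical alert rule: aggregate the latest samples, compare to a threshold."""
    name: str
    aggregate: Callable[[Sequence[float]], float]  # e.g. mean, max
    threshold: float
    window: int  # number of most recent samples to look at

    def matches(self, samples: Sequence[float]) -> bool:
        recent = samples[-self.window:]
        if not recent:
            return False  # no data, so the condition cannot match
        return self.aggregate(recent) > self.threshold


def check_and_notify(rule: AlertRule, samples: Sequence[float], outbox: List[str]) -> None:
    """Append a message to `outbox` (a stand-in for email/SMS/webhook) when the rule matches."""
    if rule.matches(samples):
        outbox.append(f"ALERT {rule.name}: threshold {rule.threshold} exceeded")


# Illustrative data only: CPU percentage samples, oldest first.
cpu_rule = AlertRule(name="high-cpu", aggregate=mean, threshold=80.0, window=3)
notifications: List[str] = []
check_and_notify(cpu_rule, [10.0, 20.0, 85.0, 90.0, 95.0], notifications)
```

In this sketch only the last three samples are aggregated, so older readings do not affect whether the alert fires.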

Although the above services can cover alerting and monitoring fully, it can be overwhelming to make them work together and to understand their individual contributions to the overall solution.

To help you understand how these services are connected and how they can benefit you as a consumer of Azure services, I’ve put together the following diagram-

…Next up, surface information for different stakeholders; essentially, create a dashboard as a product for each role.

Service Fabric Log

Notes from the field on Azure Service Fabric (ASF) and some lesser-known facts-

FAQs-

  1. Why do we need a durability level of Silver or Gold? The Silver and Gold tiers allow ASF to integrate with the underlying VMSS, which results in the following features-
    • You can scale the underlying VMSS back in after scaling it out; ASF recognises this change and does not mark the cluster as unhealthy. Note also that you cannot scale the nodes back below 5, even though a Silver/Gold node type has only 3 nodes when you create a cluster.
    • ASF can intercept and delay VM-level actions requested by the platform or the cluster admin, so that stateful services maintain the minimum replica set/quorum at any point in time.
  2. Can I change the durability tier of an existing cluster/node type?
    • Yes, you can upgrade from a lower level to a higher one, and downgrade from Gold -> Silver.
  3. Why do we need a minimum of 5 nodes in the primary node type? Because-
    • To maintain quorum, a majority of the nodes in the primary node type must be running at any point in time. Suppose you have 3 VMs: if one is down while you upgrade the ASF binaries to a new version and Microsoft decides to update the host machine that hosts another, you are 2 VMs down out of 3, which impacts the stateful system services. With 5 VMs, taking out 2 still leaves 3 (a majority) available.
  4. Does Microsoft support an ASF cluster spanning multiple Azure DCs?
    • Generally speaking, it is not supported.
  5. Can I add/remove Node Types after the cluster is created?
    • Yes.
  6. Can I scale ASF?
    • Yes, at present only via the VMSS auto/manual scale mechanism.
  7. Can I make an unsecured cluster a secure cluster?
    • No.
  8. Can I scale stateful services?
    • Yes, by partitioning the data, which allows multiple parallel instances of the service type to receive the requests for their respective partitions.
  9. Each application instance runs in isolation, with its own work directory and process.
  10. The Service Fabric process runs in kernel mode, hence applications running under it cannot crash it easily.
  11. By default, the cluster certificates are added to the allowed Admin certificates list, hence the template here secures both node-to-node and client-to-node comms. You can, though, add separate client certs for the read-only and admin cluster roles.
  12. Scaling out the VMSS causes other nodes in the same VMSS to show a stopping state. This is a superficial UI bug and will be fixed soon; the VMs in the VMSS do not actually stop.
  13. Can I create an SF cluster with small-size VMs?
    • Yes, you can, but bear in mind that when you do, the cluster may start to throw warnings; see this post to remove those warnings.
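
The quorum arithmetic behind FAQ 3 can be sketched in a few lines of Python. This is a simplified majority model written for illustration, not how ASF computes replica sets internally; the function names `quorum_size` and `has_quorum` are hypothetical.

```python
def quorum_size(total_nodes: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return total_nodes // 2 + 1


def has_quorum(total_nodes: int, nodes_down: int) -> bool:
    """True if the nodes still running form a majority of the node type."""
    nodes_up = total_nodes - nodes_down
    return nodes_up >= quorum_size(total_nodes)


# The scenario from FAQ 3: one node down for the ASF upgrade,
# one more down for a host update, so 2 nodes down in total.
three_node_ok = has_quorum(total_nodes=3, nodes_down=2)  # 1 up, needs 2
five_node_ok = has_quorum(total_nodes=5, nodes_down=2)   # 3 up, needs 3
```

With 3 nodes the cluster tolerates only one node being down at a time, while 5 nodes tolerate two, which is why an upgrade that overlaps with a host update is survivable only in the 5-node case.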