Unhealthy event: SourceId='FabricDCA', Property='DataCollectionAgent.DiskSpaceAvailable', HealthState='Warning', ConsiderWarningAsError=false. The Data Collection Agent (DCA) does not have enough disk space to operate. Diagnostics information will be left uncollected if this continues to happen.

You will often see this warning almost as soon as your Service Fabric cluster comes up if you are using a VM scale set (VMSS) whose VMs have smaller temporary disk sizes (D:\).

So what’s going on here?

By default, the reliable collections logs for the reliable system services are stored in D:\SvcFab; to verify this you can Remote Desktop into one of the VMs in the VMSS where the warning is coming from. Most people will only see this warning on the primary node type, because the services you create as a customer are generally stateless and hence no stateful data logs are present on the non-primary node types.

The default replicator log (reliable collections) size is 8192 MB, so if your temporary disk is 7 GB (a Standard_D2_v2, for example) you will see the warning message below in Service Fabric Explorer-

Unhealthy event: SourceId='FabricDCA', Property='DataCollectionAgent.DiskSpaceAvailable', HealthState='Warning', ConsiderWarningAsError=false. The Data Collection Agent (DCA) does not have enough disk space to operate. Diagnostics information will be left uncollected if this continues to happen.

How to fix this?

You can change the default replicator log size by adding a fabric setting named "KtlLogger" to the ARM template, as highlighted below. Note that the log file size does not change (grow or shrink) once it has been configured-

"fabricSettings": [
  {
    "name": "Security",
    "parameters": [
      {
        "name": "ClusterProtectionLevel",
        "value": "EncryptAndSign"
      }
    ]
  },
  {
    "name": "KtlLogger",
    "parameters": [
      {
        "name": "SharedLogSizeInMB",
        "value": "4096"
      }
    ]
  }
]
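
To see the effect on a node, you can Remote Desktop in and check the remaining space on the temporary drive as well as the size of the shared log under D:\SvcFab. A minimal PowerShell sketch (the exact sub-folder and file names can vary between clusters)-

# Check free space on the temporary drive the DCA is warning about
Get-PSDrive -Name D | Select-Object Used, Free

# Find the largest log files under D:\SvcFab (the shared/replicator log is usually the biggest)
Get-ChildItem -Path D:\SvcFab -Recurse -Filter *.log |
    Sort-Object Length -Descending |
    Select-Object FullName, @{ Name = 'SizeMB'; Expression = { [math]::Round($_.Length / 1MB) } } -First 5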

 

For VM temporary disk sizes and specs, see here- https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-general

More info on configuring reliable services\the cluster manifest is here-

 

Cloud Services to Service Fabric, Why?

Here are some of the interesting benefits of using Service Fabric (SF), both over Cloud Services and in general-

  1. High Density- Unlike Cloud Services, you can run multiple services on a single VM, saving both cost and management overhead. SF will re-balance or scale out the cluster if resource contention is predicted or occurs.
  2. Any Cloud or Data Center- A Service Fabric cluster can be deployed in Azure, on-premises or even in a 3rd party cloud if you need to, due to an unforeseen change in your company's direction or regulatory requirements. It just runs better in Azure. Why? Because certain services are provided in Azure as a value addition, e.g. the upgrade service.
  3. Any OS- A Service Fabric cluster can run on both Windows and Linux. In the near future, you will be able to have a single cluster running both Windows and Linux workloads in parallel.
  4. Faster Deployments- As you do not create a new VM per service like in Cloud Services, new services can be deployed to the existing VMs as per the configured placement constraints, making deployments much faster.
  5. Application Concept- In the microservices world, multiple services working together form a business function or an application. SF understands the concept of an application, rather than just the individual services which constitute a business function. Unlike Cloud Services, SF treats and manages an application and its services as one entity to maintain the health and consistency of the platform.
  6. Generic Orchestration Engine- Service Fabric can orchestrate at both process and container level should you need to. One technology to learn to rule them all.
  7. Stateful Services- A managed programming model to develop stateful services following the OOP principle of encapsulation, i.e. keeping state and operations together as a unit. Other container orchestration engines cannot do this. And of course you can develop reliable stateless services as well.
  8. Multi-tenancy- Deploy multiple versions of the same application for multiple clients side by side, or do canary testing.
  9. Rolling Upgrades- Upgrade both applications and the platform without any downtime with a sophisticated rolling upgrade feature set.
  10. Scalable- Scale to hundreds or thousands of VMs if you need to, with auto scaling or manual scaling.
  11. Secure- Inter-VM encryption, cluster management authentication/authorization (RBAC) and network level isolation are just a few of the ways to secure your cluster in an enterprise grade manner.
  12. Monitoring- Unlike Cloud Services, SF comes with a built-in OMS solution which understands the events raised by an SF cluster and takes appropriate action. Both in-proc and out-of-proc logging are supported.
  13. Resource Manager Deployments- Unlike Cloud Services, which still run on the classic deployment model, SF clusters and applications use Resource Manager deployments, which are much more flexible and deploy only the artefacts you need.
  14. Pace of Innovation- Cloud Services is an old platform, still used by many large organisations for critical workloads, but it is not the platform which will get new innovative features in future.

More technical differences are here.

vNet Peering with ExpressRoute

Recently, I was working with a large healthcare provider to design their public-facing, mission-critical web platform. The customer already had an ExpressRoute connection on a 50 Mbps line; not super fast, but it doesn't always need to be.

Given that the design separated the environments into their respective vNets, we ended up creating multiple vNets in Azure, around nine. Connecting each vNet to the on-premises network via ExpressRoute would mean consuming 9 of the 10 available vNet links on the circuit (20 for the premium SKU). This was sub-optimal, as it would leave little room for future projects to utilise the same circuit for on-premises connectivity, and expansion with additional environments on the platform would be limited as well.

So how could you avoid this problem?

VNet peering comes to the rescue here; the following diagram depicts the topology that can be used to achieve the above design-

[Diagram: vnetpeering - environment vNets peered (in both directions) with a hub/transitive vNet that holds the gateway for the ExpressRoute connection to on-premises]

Other points-

  1. You can also use the transitive vNet ('transitive' is just the name given to it) to host NVAs, implementing a hub and spoke model.
  2. vNet peering does not allow transitive routing, i.e. if three vNets are peered in a chain (vnet1, vnet2 and vnet3), then vnet1 cannot talk to vnet3.
  3. A vNet peering must always be created from both vNets to work, hence the above diagram has two arrows for each peering (see the sketch after this list).
  4. As all the vNets reach on-premises through the single virtual network gateway used for the ExpressRoute connection, bandwidth between on-premises and the Azure vNets will be limited by the gateway SKU selected for that connection.
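
As a rough sketch of how the peerings in the diagram could be created with the AzureRM PowerShell module (the vNet and resource group names are placeholders, and this assumes gateway transit is available on the gateway used for the ExpressRoute connection)-

# Hub/transitive and spoke vNet objects (names and resource group are placeholders)
$hub   = Get-AzureRmVirtualNetwork -Name "vnet-hub" -ResourceGroupName "rg-network"
$spoke = Get-AzureRmVirtualNetwork -Name "vnet-env1" -ResourceGroupName "rg-network"

# The peering is created from both sides; the hub side allows gateway transit...
Add-AzureRmVirtualNetworkPeering -Name "hub-to-env1" -VirtualNetwork $hub `
    -RemoteVirtualNetworkId $spoke.Id -AllowGatewayTransit

# ...and the spoke side uses the hub's gateway to reach on-premises
Add-AzureRmVirtualNetworkPeering -Name "env1-to-hub" -VirtualNetwork $spoke `
    -RemoteVirtualNetworkId $hub.Id -UseRemoteGateways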

 

Can’t Create VM/Resources in Resource Group

My customer recently ran into this problem, which comes up when you try to configure your environment properly, i.e. create a resource group and give only the required access to the resources in your organisation, following the principle of least privilege. The structure looks like below-

[Diagram: RGSubcriptionIssue - a subscription (Anthony as admin) containing resource group A, with Ben granted Contributor on the resource group only]

What’s going on here?

Objective: Anthony is a subscription admin and he wants to ensure role based access control is applied at the resource group level. He takes the following steps to achieve this-

  1. He creates a resource group called A and gives 'Contributor' access on it to a user called 'Ben'.
  2. He then informs Ben to go ahead and start using the resource group for the project.
  3. Ben logs into the portal with his credentials and tries to create a resource.
  4. Resource creation fails with an error which looks like this- Registering the resource providers has failed. Additional details from the underlying API that might be helpful: 'AuthorizationFailed' - The client 'suneet.xxx@xxx.com' with object id 'b8fe1401-2d54-4fa2-b2dd-26c0b8eb69f9' does not have authorization to perform action 'Microsoft.Compute/register/action' over scope '/subscriptions/dw78b73d-ca8e-34b9-89f4-6f716ecf833e'. (Code: AuthorizationFailed)

This will stump most people, as expected. Why? Because if you have Contributor access to a resource group then surely you can create a resource in it, e.g. a virtual machine. So what went wrong here? Look carefully at the error message and focus on 'Microsoft.Compute/register/action' over scope '/subscriptions/dw78b73d-ca8e-34b9-89f4-6f716ecf833e'. What does this tell us? It is not an authorisation error for creating the resource; it is an authorisation error for registering a resource provider. This is expected - we don't want a resource group level identity to be able to register/unregister resource providers at the subscription level. So how do we solve it?

Option 1

  1. Log into Azure with an identity which has subscription level access to register a resource provider, e.g. an admin/owner.
  2. Using PowerShell (PoSh), register the resource providers you need at the subscription level. You can also see which providers are available and already registered. A sample script is given below-
Login-AzureRmAccount

$subscriptionId = "<Subscription Id>"
Select-AzureRmSubscription -SubscriptionId $subscriptionId

#List all available providers and register them
Get-AzureRmResourceProvider -ListAvailable | Register-AzureRmResourceProvider
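
If registering every available provider feels too broad, you can instead register just the provider named in the error message, Microsoft.Compute in this case. A minimal variant of the script above-

# Register only the provider the failing action needs
Register-AzureRmResourceProvider -ProviderNamespace Microsoft.Compute

# Confirm its registration state
Get-AzureRmResourceProvider -ProviderNamespace Microsoft.Compute | Select-Object ProviderNamespace, RegistrationState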

Option 2

  1. Let the subscription admin/owner create the resource e.g. a VM.
  2. This will implicitly register the resource provider for the resources created.

Hope this was helpful.

I'll be talking to the engineering team to see if we can improve this user experience.

Azure, a Safe Innovation Environment

Microsoft Azure allows segmentation of applications, their tiers and the respective staging environments using controls already built into the platform. Segmentation can be achieved at both the network and the user access level.

Segmentation is introduced for the following primary reasons-

  1. Security
  2. Performance
  3. Deployment/Release Management
  4. Isolated/Safe Innovation Environment

Security
To secure the platforms/applications it is important to-

  1. Separate the application tiers into different Azure virtual network subnets protected by NSGs (Network Security Groups/firewalls). This helps mitigate the impact of a breach, as only limited access is provided to the layer below in the stack. It also provides a safe container for the components within a tier/subnet to interact with each other, for example a SQL/MongoDB cluster.
  2. Use User Defined Routes (UDRs) in Azure to force traffic via a virtual intrusion detection appliance for enhanced security.
  3. Employ the principle of least privilege (POLP) using Azure ARM RBAC. This helps ensure that only the minimum required access is provided to the users supporting the application/platform. For example, only the infosec team will have access to manage credentials; applications will only have read access and will not store credentials on the file system at any time. This also limits the impact of a breach in any tier (a rough PowerShell sketch follows this list).
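
As an illustrative sketch only (the resource names, address ranges, port and user below are placeholders, not from a real deployment), locking a data tier subnet down to SQL traffic from the web tier and applying least privilege access with AzureRM PowerShell could look like this-

# Allow only SQL traffic from the web tier address range into the data tier
$sqlRule = New-AzureRmNetworkSecurityRuleConfig -Name "allow-sql-from-web" `
    -Access Allow -Protocol Tcp -Direction Inbound -Priority 100 `
    -SourceAddressPrefix "10.0.1.0/24" -SourcePortRange * `
    -DestinationAddressPrefix "10.0.2.0/24" -DestinationPortRange 1433

# Attach the rule to an NSG which can then be associated with the data tier subnet
New-AzureRmNetworkSecurityGroup -Name "nsg-data-tier" -ResourceGroupName "rg-platform" `
    -Location "westeurope" -SecurityRules $sqlRule

# Principle of least privilege: the app team only gets Reader on the platform resource group
New-AzureRmRoleAssignment -SignInName "dev.user@contoso.com" `
    -RoleDefinitionName "Reader" -ResourceGroupName "rg-platform"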

Performance
To ensure that each application tier individually, and the applications themselves, provide guaranteed performance and quality of service (QoS), it is important to-

  1. Implement the SoC (Separation of Concerns) principle and avoid mixing different workloads on the same tier/VM.
  2. Understand disk IOPS thresholds and segment the storage accounts accordingly.
  3. Understand networking/bandwidth thresholds and separate production traffic from dev\test traffic to maintain network QoS.

Deployment/Release Management
Azure natively supports agile methodology, including the concepts of continuous integration/deployment, blue-green deployments, A/B testing and canary releases. By clearly segmenting and demarcating the boundaries of the services\APIs and their environments, following microservices principles, we can deploy and upgrade the services with minimal impact on the platform. Azure natively supports microservices architectures and provides a fully managed platform (Service Fabric) for running highly sophisticated microservices in the cloud.

Isolated/Safe Innovation Environment
To ensure that developers, testers and release management teams get a secure environment to deploy applications and the platform into, it is important to implement the above-mentioned security concepts, i.e. NSGs, UDRs and ARM RBAC/policies. A well designed environment gives developers a safe place to try out new technologies without hindrance, so they can continue to innovate and deliver business value.

[Diagram: Segmentation - network and access level segmentation of environments in Azure]

Migrate from AWS RDS/SQL using ADF

Recently, I came across a requirement from my customer to migrate data from the AWS RDS/SQL service to Azure for some big data analysis. The obvious choice for this sort of activity in Azure is Azure Data Factory (ADF). There are many examples of ADF on MSDN covering various data sources and destinations, but a few are missing, one of which is AWS RDS.

So how do you achieve it? Simple: treat AWS RDS/SQL as an on-premises SQL Server and follow the guidance for that specific scenario using the Data Management Gateway.

Essentially you need to do the following, from a very high level perspective-

  1. Create an EC2 instance in AWS and configure the relevant firewall rules (as specified in the guidance).
  2. Deploy the Data Management Gateway on the above instance.
  3. Test the RDS/SQL access via the Data Management Gateway tool from the above instance.
  4. Create the data factory to read from a SQL Server linked service via the gateway (a sketch of such a linked service follows this list).
  5. Map the data.
  6. Store it in the destination of your choice (e.g. Blob storage).
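
For illustration, a SQL Server linked service pointing at the RDS endpoint through the gateway could look roughly like the (ADF v1 style) definition below; the gateway name, endpoint, database and credentials are all placeholders-

{
  "name": "RdsSqlServerLinkedService",
  "properties": {
    "type": "OnPremisesSqlServer",
    "typeProperties": {
      "connectionString": "Data Source=<rds-endpoint>,1433;Initial Catalog=<database>;User ID=<user>;Password=<password>;",
      "gatewayName": "<data management gateway name>"
    }
  }
}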

Adding Authentication via ARM for API Apps/Gateway

[API Apps Preview 2 has changed the auth model defined below, please refer here for details about what's changed]

This one was left out for a long time, I must admit. Since I joined Microsoft, I have been kept very busy learning about my new role, the organisation and the on-boarding process. Today is the first weekend I have had some breathing space to revisit this, but in the meanwhile I had some excellent pointers from Gozalo Ruiz (Lead CSA in my team) which led me to resolve this faster than I would have otherwise.

Here's the problem: I had a fully automated ALM pipeline configured to build, test and deploy an API App to Azure from VS Team Services (previously known as VS Online), except that there was no easy way to configure the authentication identity for the gateway. For those who don't know how API App authentication works (this is set to change; the gateway will not be a requirement in future), each API App is fronted by a gateway which manages authentication for every API App within the same resource group. I needed to secure my API via Azure AD, so I used Azure Active Directory as a provider in the gateway (see this post if you want to learn a bit about the authentication mechanism in API Apps; it's a topic in itself).

Here's a screenshot of the configuration the gateway should have been populated with via the ARM deployment.

[Screenshot: GatewayWithIdentityAuth - the gateway's Azure Active Directory identity/authentication settings]

The solution is simple: populate the relevant appSettings for this configuration when you create the API App gateway. These settings weren't easy to find (I wish they were), but here they are for your use. Refer to the complete template here-

"appSettings": [
 {
 "name": "ApiAppsGateway_EXTENSION_VERSION",
 "value": "latest"
 },
 {
 "name": "EmaStorage",
 "value": "D:\\home\\data\\apiapps"
 },
 {
 "name": "WEBSITE_START_SCM_ON_SITE_CREATION",
 "value": "1"
 },
 {
 "name": "MS_AadClientID",
 "value": "21EC2020-3AEA-4069-A2DD-08002B30309D"
 },
 {
 "name": "MS_AadTenants",
 "value": "mycompany.com"
 }
]

If you are using an identity provider other than AAD, you could use one of these instead (I've not tested them, but they should work in theory; an example follows the list below)-

MS_MicrosoftClientID
MS_MicrosoftClientSecret

MS_FacebookAppID
MS_FacebookAppSecret

MS_GoogleClientID
MS_GoogleClientSecret

MS_TwitterConsumerKey
MS_TwitterConsumerSecret
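
For example, if you were using a Microsoft account as the provider instead, the last two entries in the appSettings array above would in theory be swapped for something like this (untested, and the values are placeholders)-

 {
 "name": "MS_MicrosoftClientID",
 "value": "<microsoft account client id>"
 },
 {
 "name": "MS_MicrosoftClientSecret",
 "value": "<microsoft account client secret>"
 }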

Api App ZuMo Authentication\Testing in CI

[API Apps Preview 2 has changed the auth model defined below, please refer here for details about what’s changed]

Recently, I ran into a situation where one of my in-house development teams wanted to run a load test in the CI pipeline against the Api App they had developed and deployed in Azure. The Api App was using Azure AD as an identity provider via the App Service gateway.

In order to solve this problem you first need to understand how OAuth works in the Api App case. A good explanation of this is provided here by Tom Dykstra.

We are using the client flow authentication mechanism in this instance, as our scenario was based on service-to-service interaction with no user/password prompts etc. I've used this service-to-service flow in the past for Web API apps (the pre-App Service incarnation). The flow is defined here and uses the OAuth 2.0 Client Credentials Grant. So I was keen to use the same flow for Api Apps as well, as it allows me to use a different client Azure AD app to authenticate without impacting my Api App service.

Please follow the article mentioned above to set up the client and service (Api App) applications in Azure AD. Once done, we should logically have something like the setup below.

[Diagram: ZumoClientAuthSetup - client AD app granted access to the Api App AD app used by the App Service gateway]

Let's cut to the chase; here is how the client flow for Api App authentication works at the HTTP level-

[Diagram: ZumoClientAuthFlow - token exchange between the client, Azure AD, the App Service gateway and the Api App]

Here is what the HTTP requests and responses look like (captured via Fiddler)-

  1. POST https://login.microsoftonline.com/abc.com/oauth2/token HTTP/1.1
    HEADER

    Accept: application/json
    Content-Type: application/x-www-form-urlencoded

    BODY

     resource=https://abcappservicegateway.azurewebsites.net/login/aad
    &client_id=876817c5-f812-6640-b7fa-eb7662b43a8d
    &client_secret=MgE47656Zy8qnKjjZcXP%2BgPVOxgcMc9kbJBayT5y7qI%3D
    &grant_type=client_credentials

    Explanation- The client_id and client_secret here belong to the client AD app, not the API App AD app. Because the client AD app has been given access to the API App AD app in Azure AD, the token returned will be valid for the API App AD app. This way you can revoke a client's access to your API App without changing any config in the API App AD app.

  2. HTTP/1.1 200 OK
    HEADER

    Content-Type: application/json; charset=utf-8

    BODY

    {
    "token_type": "Bearer",
    "expires_in": "3600",
    "expires_on": "1445096696",
    "not_before": "1445092796",
    "resource": "https://abcappservicegateway.azurewebsites.net/login/aad",
    "access_token": "eyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiIsIng1dCI6Ik1uQ19WWmNBVGZNNXBPWWlKSE1iYTlnb0VLWSIsImtpZCI6Ik1uQ19...."
    }
  3. POST https://abcappservicegateway.azurewebsites.net/login/aad HTTP/1.1
    HEADER

    Content-Type: application/json; charset=utf-8

    BODY

    {
    "access_token": "eyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiIsIng1dCI6Ik1uQ19WWmNBVGZNNXBPWWlKSE1iYTlnb0VLWSIsImtpZCI6Ik1uQ19...."
    }
  4. This step happens behind the scenes, hence no Fiddler trace is available.
  5. HTTP/1.1 200 OK
    HEADER

    Content-Type: application/json; charset=utf-8

    BODY

    {
    "user": { "userId": "sid:171BC49224A24531BDF480132959DD54" },
    "authenticationToken": "eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJmdWxscm93IjoiYWxsIiwiRGJnMiI6ImxvZ2luIiwidmVyIjoiMyIsIn...."
    }
  6. GET https://abcservice.azurewebsites.net/api/addresses?postcode=KT19%208QJ HTTP/1.1
    HEADER

    X-ZUMO-AUTH: eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJmdWxscm93IjoiYWxsIiwiRGJnMiI6ImxvZ2luIiwidmVyIjoiMyIsIn....
  7. HTTP/1.1 200 OK
    HEADER

    HTTP/1.1 200 OK
    Content-Type: application/json; charset=utf-8

    BODY

    [
    {
    "PostcodeFull": "KT19 8QJ",
    "PostTown": "EPSOM",
    "DependentLocality": null,
    "DoubleDependentLocality": null,
    "ThoroughfareAndDescriptor": "Parkview Way",
    "DependentThoroughfareAndDescriptor": null,
    "BuildingNumber": "34",
    "BuildingName": null,
    "SubBuildingName": null,
    "POBox": null,
    "DepartmentName": null,
    "OrganisationName": null,
    "UDPRN": "51946386",
    "PostcodeType": "S"
    }
    ]

As you can see, you can easily replicate this communication using HttpClient in .NET (or any other language for that matter) to get the ZuMo token for calling the authenticated operations on the Api App. This is exactly what we did: we placed that logic in a WebTest plugin for the Visual Studio web performance test to automate the process in the CI pipeline. The load test was then built on top of the web performance test, and it placed the X-ZUMO-AUTH header, with the retrieved ZuMo token, in the HTTP requests to the Api App in real time.
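
If you want to try the flow by hand before wiring it into a test plugin, the same three calls can be sketched in PowerShell with Invoke-RestMethod (the tenant, gateway and API URLs, client id and secret below are all placeholders)-

# 1. Get an AAD access token for the gateway resource (client credentials flow)
$tokenResponse = Invoke-RestMethod -Method Post `
    -Uri "https://login.microsoftonline.com/<tenant-domain>/oauth2/token" `
    -Body @{
        resource      = "https://<gateway>.azurewebsites.net/login/aad"
        client_id     = "<client AD app id>"
        client_secret = "<client AD app secret>"
        grant_type    = "client_credentials"
    }

# 2. Exchange the AAD access token for a ZuMo authentication token at the gateway
$zumoResponse = Invoke-RestMethod -Method Post `
    -Uri "https://<gateway>.azurewebsites.net/login/aad" `
    -ContentType "application/json" `
    -Body (@{ access_token = $tokenResponse.access_token } | ConvertTo-Json)

# 3. Call the Api App with the ZuMo token in the X-ZUMO-AUTH header
Invoke-RestMethod -Uri "https://<apiapp>.azurewebsites.net/api/addresses?postcode=KT19%208QJ" `
    -Headers @{ "X-ZUMO-AUTH" = $zumoResponse.authenticationToken }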

Advantages of this approach-

  1. You don't have to share your service (Api App) master key with the clients. Clients use the app secret\key of their own Azure AD app, which has been given access to the service they want to use. You can use this approach in a production environment.
  2. You are testing the application authentication flow as it would be in production for the users.

If you want the plugin code for this, please give me a shout (I'll put it on GitHub anyway once I get the opportunity); until then, here's the code if you need it (caveat: you will need to take care of refreshing the token after 1 hour in this instance; more on that here).

If you are using a server flow for authentication instead of a client flow, you can take the approach mentioned here (by Yossi Dahan) as you may not have client flow authentication code in your app. But there is no reason why you could not use this approach there as well.

Azure Data Factory Table Storage Partition Key

To use a source column as the partition key for the destination Table storage table (an AzureTableSink), you will have to use the property called azureTablePartitionKeyName in the pipeline definition, as below-

"sink": {
 "type": "AzureTableSink",
 "azureTablePartitionKeyName": "PostcodeFull",
 "writeBatchSize": 100,
 "writeBatchTimeout": "01:00:00"
 }

Simple, right? Well, it is, but you do have to remember that if you are mapping specific columns to the destination table and you don't also map the column you want to use as the partition key in the translator section, as below, you won't get the output you want.

"translator": {
 "type": "TabularTranslator",
 "columnMappings": "PostTown: PostTown, PostcodeFull: PostcodeFull"
 }

The good thing is that this additional mapping does not create the column as a regular property in the destination table; it is only used as the PartitionKey.
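
For context, both of the fragments above sit inside the copy activity's typeProperties. A trimmed (ADF v1 style) sketch of how they fit together - the activity name, inputs/outputs and scheduling are omitted, and the source type shown is just an example-

"activities": [
 {
 "type": "Copy",
 "typeProperties": {
 "source": { "type": "SqlSource" },
 "sink": {
 "type": "AzureTableSink",
 "azureTablePartitionKeyName": "PostcodeFull",
 "writeBatchSize": 100,
 "writeBatchTimeout": "01:00:00"
 },
 "translator": {
 "type": "TabularTranslator",
 "columnMappings": "PostTown: PostTown, PostcodeFull: PostcodeFull"
 }
 }
 }
]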