Scaling PHP-FPM with custom metrics on GKE/Kubernetes

May 29, 2020

If you're running PHP-FPM in Kubernetes you have likely found that adding a Horizontal Pod Autoscaler (HPA from here on) based on the CPU utilisation metric isn't reliable. This is because web traffic is by nature somewhat spiky, and by the time CPU usage has been sustained at a high percentage it is already too late to scale.

There is a solution, however: the Kubernetes Custom Metrics API. Now, this blog post is entirely focused on Google Kubernetes Engine, and we're using Stackdriver as our custom metrics server. I'll explain the setup details in this post. If you're not running GKE there will be a few steps you'll need to swap out.

First up, how should php-fpm be configured?

The general advice is to set the process manager to static, and to keep your pm.max_children value relatively low, as each pod/container shouldn't run too many processes. You'll need to balance this against your Kubernetes container request/limit values for CPU and memory.

Additionally we enable the php-fpm status page. This is important, as later on we will scrape the metrics from the status page!

Here's a sample of the configuration I use:

pm = static
pm.status_path = /status
pm.max_children = 10
; Respawn a child process after it has handled 200 requests.
; Useful for mitigating memory leaks in 3rd party code.
pm.max_requests = 200
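
To give a rough idea of the sizing arithmetic (the per-child memory figure below is an assumption; measure your own application's processes):

; Example sizing, assuming each child uses roughly 90 MB of memory:
;   1 GB memory limit / ~90 MB per child ≈ 11 children
;   leave a little headroom for the master process, so pm.max_children = 10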

Then the kubernetes deployment configuration:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: phpfpm-demo
  labels:
    app: phpfpm-demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: phpfpm-demo
  template:
    metadata:
      labels:
        app: phpfpm-demo
    spec:
      containers:
      - name: fpm
        image: playsportsgroup/php:7.3-fpm-alpine-root
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 9000
        resources:
          requests:
            cpu: 1
            memory: 1G
          limits:
            cpu: 1
            memory: 1G

Now, as it stands we have a deployment with a php-fpm container. Next up we'll need to begin scraping our metrics.

Scraping php-fpm status page metrics

The easiest way to do this is to use a Prometheus-compatible exporter. I have personally used this one: https://github.com/hipages/php-fpm_exporter

We will need to add this container to our Deployment, so let's do that. Our new deployment will look like this:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: phpfpm-demo
  labels:
    app: phpfpm-demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: phpfpm-demo
  template:
    metadata:
      labels:
        app: phpfpm-demo
    spec:
      containers:
      - name: fpm
        ... (omitted to keep it readable!)
      - name: phpfpm-exporter
        image: hipages/php-fpm_exporter
        ports:
          - name: exporter
            containerPort: 9253
        env:
          - name: PHP_FPM_SCRAPE_URI
            value: "tcp://localhost:9000/status"
          - name: PHP_FPM_FIX_PROCESS_COUNT
            value: "true"
        resources:
          requests:
            cpu: 10m
          limits:
            cpu: 10m

This works by scraping the status page over a TCP connection to the php-fpm container. The metrics are then exposed on port 9253.
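
You can sanity-check the exporter by port-forwarding to the deployment and curling the metrics endpoint (the sample output and values below are illustrative):

kubectl port-forward deployment/phpfpm-demo 9253:9253
# in another terminal:
curl -s http://localhost:9253/metrics | grep phpfpm_

# Illustrative output:
# phpfpm_active_processes{pool="www",scrape_uri="tcp://localhost:9000/status"} 2
# phpfpm_total_processes{pool="www",scrape_uri="tcp://localhost:9000/status"} 10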

Now, that isn't everything. These metrics aren't being sent anywhere yet; the exporter has simply made them available! Next up we'll need to convert the Prometheus metrics into Stackdriver metrics. This can be achieved with another container provided by Google.

... (continuing from the previous)
      - name: prometheus-to-sd
        image: gcr.io/google-containers/prometheus-to-sd:v0.9.2
        ports:
          - name: profiler
            containerPort: 6060
        command:
          - /monitor
          - --stackdriver-prefix=custom.googleapis.com
          - --monitored-resource-type-prefix=k8s_
          - --source=:http://localhost:9253
          - --pod-id=$(POD_NAME)
          - --namespace-id=$(POD_NAMESPACE)
          - --cluster-location=[ADD_YOUR_CLUSTER_LOCATION_HERE]
        resources:
          requests:
            cpu: 10m
          limits:
            cpu: 10m
        env:
          - name: POD_NAME
            valueFrom:
              fieldRef:
                fieldPath: metadata.name
          - name: POD_NAMESPACE
            valueFrom:
              fieldRef:
                fieldPath: metadata.namespace

In the above example you'll need to define the cluster location yourself. I have found that within a Deployment this value cannot be set automatically. It is just the region your cluster is in, so europe-west1 if you're using the Belgium region, for example.
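
If you're not sure of the value, gcloud will tell you; the LOCATION column is what you're after:

gcloud container clusters list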

Let's take a look at the command arguments we are using, as they deviate slightly from GCP's documentation, again because we are using a Deployment. GCP's guides only work when creating Pods directly.

--stackdriver-prefix=custom.googleapis.com is required to ensure the metric is tracked in Stackdriver as a custom metric. Otherwise we won't be able to scale based on it!

--monitored-resource-type-prefix=k8s_ without this, the default appears to be the legacy prefix, gke_. We don't want that as it doesn't work with Deployments!

--source=:http://localhost:9253 specifies the source of our metrics. The colon defines the namespace, which in this case is empty. This is important because GKE doesn't support scaling on custom metrics that contain slashes, and adding a namespace adds a slash. i.e. --source=mynamespace:http://localhost:9253 would create custom.googleapis.com/mynamespace/some_metric, which is not what we want!

--pod-id=$(POD_NAME) This applies the pod-id label in Stackdriver, allowing us to see the origin of metrics in relation to the pod.

--namespace-id=$(POD_NAMESPACE) Same as above, but for the kubernetes namespace your pod belongs to (note in my example deployment I have omitted the namespace which means it will default to default).

The last two arguments use the environment variables defined on the container, which come from the Kubernetes Downward API. The Pod ID is generated uniquely depending on the revision and number of replicas in your deployment, so we do not know it until runtime.

Ok. With that done we should now see metrics appearing in Stackdriver! Using the Metrics Explorer you will be able to search for the metrics; they will all be prefixed with phpfpm_ - a full list can be found here: https://github.com/hipages/php-fpm_exporter
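
If you prefer the command line to the Metrics Explorer, you can also list the new metric descriptors via the Monitoring API (YOUR_PROJECT_ID is a placeholder):

curl -s -G \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  "https://monitoring.googleapis.com/v3/projects/YOUR_PROJECT_ID/metricDescriptors" \
  --data-urlencode 'filter=metric.type = starts_with("custom.googleapis.com/phpfpm")'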

Configure the cluster to access Stackdriver metrics

Now, we have our metrics going to Stackdriver. But we won't be able to access these metrics inside Kubernetes to trigger our autoscaler just yet. We'll need to create the required authorisation roles to do this.

kubectl create clusterrolebinding cluster-admin-binding \
    --clusterrole cluster-admin --user "$(gcloud config get-value account)"

Note: the above command will use your user account. I'd recommend instead using a service account that has only the specific permissions it needs. Tying resources to users is never a good approach.
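
For example, if your deployments run under a dedicated service account, the binding might look like this (the service account email is a placeholder):

kubectl create clusterrolebinding cluster-admin-binding \
    --clusterrole cluster-admin \
    --user "deployer@my-project.iam.gserviceaccount.com"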

With the authorisation role in place we can now deploy the Custom Metrics Stackdriver Adapter to our cluster. This will make the Stackdriver metrics available within your cluster.

kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-stackdriver/master/custom-metrics-stackdriver-adapter/deploy/production/adapter_new_resource_model.yaml

Note: this deviates slightly from some of GCP's documentation again. They may tell you to apply a different file from the same GitHub repo. That will be the legacy version which, again, does not work with Deployments, because Deployments use the k8s_ prefix for resources instead of the legacy gke_ (I lost a lot of hours to this...).
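
Once the adapter is running you can confirm the cluster is serving the custom metrics API, and (the exact metric path may vary slightly by adapter version) query our metric directly:

kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | head -c 500
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/*/phpfpm_active_processes"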

Create your autoscaler

Finally, we're now ready to create our autoscaler. For this I'm going to use the phpfpm_active_processes metric. As we're using a statically defined number of processes, we can trigger scaling when the average number of active processes exceeds a given target value.

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: phpfpm-demo-autoscaler
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: phpfpm-demo
  minReplicas: 1
  maxReplicas: 8
  metrics:
    - type: Pods
      pods:
        metricName: phpfpm_active_processes
        targetAverageValue: 6

We have now defined an autoscaler with a minimum of 1 replica, which will increase to a maximum of 8. Scaling is triggered when the average number of active php-fpm processes across the pods exceeds 6. Magical.
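
You can watch it pick up the metric and make scaling decisions with:

kubectl get hpa phpfpm-demo-autoscaler
kubectl describe hpa phpfpm-demo-autoscaler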

Useful links: