As jobs are often short-running and finish before you can check anything within the pod, it’s helpful to make the pod run for longer so that you can inspect the environment and re-run the job manually. One way to do this is to extract the YAML definition from a previous run of the job and create a pod definition, replacing the command with an endless sleep.

kubectl get pod {previous-job-pod} -o yaml > pod.yaml

Then edit the pod.yaml, removing the status block and the additional metadata referring to the previous job, so that the metadata only contains the name and namespace.
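For example, the top of the trimmed file might look something like this (a minimal sketch; the pod name and namespace are placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: my-job-debug        # placeholder name for the debug pod
  namespace: my-namespace   # the job's original namespace
# keep the original spec section (containers, volumes, etc.) as-is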

The command then needs to be updated or specified with something like this:

  command:
    - /bin/sh
    - '-c'
    - while true; do echo hello; sleep 10000; done

Then use kubectl apply -f pod.yaml to deploy the new pod definition to your cluster, and you will have a long-running pod with the environment and code for the job.
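Once the pod is running, you can open a shell inside it to inspect the environment and re-run the job's command by hand (a sketch; the pod name, namespace and command are placeholders):

# open an interactive shell in the debug pod
kubectl exec -it my-job-debug -n my-namespace -- /bin/sh

# then, inside the pod, run the job's original command manually, e.g.
/app/run-job.sh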

First Potatoes

Today I harvested our first row of potatoes from our raised beds:

First potatoes

Quite pleased with them, and looking forward to tasting them!

API Connect Reserved Instance provides the ability to add remote API gateways so that you can co-locate the gateway service with your back-end systems for improved performance. With the new announcement of IBM Cloud Satellite, you can make use of this to securely expand your API Connect footprint across cloud providers and into the on-premises datacenter, close to where your applications are running.

Create your Satellite location

To create a Satellite location you will need three hosts for the control plane, and at least one host to deploy DataPower on. For each of the hosts you will need to do the following:

  • Prepare the host in advance according to the host requirements for Satellite.

  • Register the host for Red Hat updates using subscription-manager register, then apply a subscription to it in the Red Hat Customer Portal.

  • Refresh the packages and enable the repositories:

subscription-manager refresh
subscription-manager repos --enable rhel-server-rhscl-7-rpms
subscription-manager repos --enable rhel-7-server-optional-rpms
subscription-manager repos --enable rhel-7-server-rh-common-rpms
subscription-manager repos --enable rhel-7-server-supplementary-rpms
subscription-manager repos --enable rhel-7-server-extras-rpms
  • Run the attach script obtained from the IBM Cloud Satellite UI to attach the host.

For the three hosts to form the control plane, you will need to assign them to the Satellite control plane through the UI or CLI.
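This can be done with the Satellite CLI plugin along these lines (a sketch; the location name, host names and zones are placeholders, and the exact flags may vary between plugin versions, so check ibmcloud sat host assign --help):

# assign each of the three hosts to a zone of the location's control plane
ibmcloud sat host assign --location my-location --host host-1 --zone zone-1
ibmcloud sat host assign --location my-location --host host-2 --zone zone-2
ibmcloud sat host assign --location my-location --host host-3 --zone zone-3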

Install DataPower in the Satellite location

  • Download the DataPower rpms from your reserved instance.
  • Install DataPower on your RHEL VM. To install it along with the prerequisites, I used the following commands:
yum install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
yum install schroot ipvsadm telnet
yum install idg_cloud1.10011.common.x86_64.rpm idg_cloud1.10011.image.x86_64.rpm
  • Create a link endpoint pointing to the API Connect Gateway management endpoint (usually port 3000), as sketched below.
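The link endpoint can be created in the IBM Cloud Satellite UI, or with the CLI along these lines (an illustrative sketch; the location ID, names and flags are placeholders and may vary between plugin versions, so check ibmcloud sat endpoint create --help):

# expose the DataPower gateway management port through Satellite Link
ibmcloud sat endpoint create --location-id my-location-id --name apic-gw-mgmt \
  --dest-type location --dest-hostname datapower-host.example.com --dest-port 3000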

Configure DataPower for API Connect

  • Follow the steps to configure your DataPower for use with API Connect.
  • For the certificates, to avoid a hostname mismatch, create a self-signed keypair for the management interface using the hostname generated for your link endpoint, for example as sketched below.
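A minimal openssl sketch for generating such a keypair (replace the placeholder with the hostname generated for your link endpoint):

# create a self-signed certificate and key for the gateway management interface
openssl req -x509 -newkey rsa:2048 -days 365 -nodes \
  -subj "/CN={link-endpoint-hostname}" \
  -keyout gateway-mgmt.key -out gateway-mgmt.crt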

Configure Certificate Manager service

Add Remote gateway to your reserved instance

  • Register your gateway in your API Connect Reserved Instance, filling in the details as follows:
    • Management endpoint - enter the full URL and port generated by IBM Cloud Satellite for the Link Endpoint prefixed with https://
    • Certificate - the certificate to present for the mutual TLS communication with the Gateway management
    • CA bundle - the certificate in certificate manager to use to verify the gateway management endpoint (either your CA or the certificate itself)
    • Base URL - the endpoint you want APIs to be called through - should map to the API gateway address configured in the API Connect Gateway service either directly or through a load balancer.

The v10 Reserved Instance of API Connect doesn’t yet have a simple approach for headless use of the CLI toolkit. The following describes how to use an IBM Cloud IAM bearer token with the API Connect CLI and REST API in a headless environment such as a CI/CD pipeline.

For interactive use of the API Connect CLI, you can log in using the --sso option, retrieve an API key with your browser, and provide that to the CLI, for example:

apic-slim login --server {apic-api-endpoint} --sso

However, if you want to use the CLI in a non-interactive context such as a CI/CD pipeline, you need to retrieve an IBM Cloud IAM bearer token for the toolkit to use. This can be obtained using ibmcloud iam oauth-tokens and then placed in ~/.apiconnect/token for the apic CLI to use.

The token file needs to contain:

{apic-api-endpoint}/api: |
  refresh_token: ""
  access_token: {access_token}   

This can be done programmatically using something like this:

ibmcloud login --apikey {api-key}
ibmcloud iam oauth-tokens | sed 's/IAM token:  Bearer /api.9a6e-bd639816.us-south.apiconnect.cloud.ibm.com\/api: |\n  refresh_token: ""\n  access_token: /' > ~/.apiconnect/token

apic orgs --my --server {apic-api-endpoint}
production    [state: enabled]   https://api...apiconnect.cloud.ibm.com/api/orgs/9123ae60-427c-4997-8a6b-ddd75b169bfb
test-porg     [state: enabled]   https://api...apiconnect.cloud.ibm.com/api/orgs/b73708ea-a7b5-4d27-b562-80767e0b238e

Invoking the API Connect REST APIs

To invoke the API Connect REST APIs in Reserved Instance v10, you can use an IBM Cloud IAM token which can be obtained using the ibmcloud iam oauth-tokens CLI command or with an API call as detailed in the IAM API documentation. This token can then be used as a bearer token to invoke the API Connect REST APIs.

The full process looks like this:

First, use your IBM Cloud API key to retrieve an IAM token:

curl -X POST \
  "https://iam.cloud.ibm.com/identity/token" \
  --header 'Content-Type: application/x-www-form-urlencoded' \
  --header 'Accept: application/json' \
  --data-urlencode "grant_type=urn:ibm:params:oauth:grant-type:apikey" \
  --data-urlencode "apikey=[your api key]"

{"access_token":"[access token would be here]","refresh_token":"not_supported","token_type":"Bearer","expires_in":3600,"expiration":1615370557,"scope":"ibm openid"}

Then take the value of the access_token and use it to call the API Connect API, e.g.:

curl https://{{ api_host }}/api/orgs \
  -H "Accept: application/json" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer [access_token]

{
  "total_results": 2,
  "results": [
    {
      "type": "org",
      "org_type": "provider",
    ...
    }
  ]
}
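The two steps can be combined in a pipeline script along these lines (a sketch assuming jq is available; IBMCLOUD_API_KEY and API_HOST are placeholder environment variables):

# exchange the IBM Cloud API key for an IAM access token
ACCESS_TOKEN=$(curl -s -X POST "https://iam.cloud.ibm.com/identity/token" \
  --header 'Content-Type: application/x-www-form-urlencoded' \
  --header 'Accept: application/json' \
  --data-urlencode "grant_type=urn:ibm:params:oauth:grant-type:apikey" \
  --data-urlencode "apikey=${IBMCLOUD_API_KEY}" | jq -r .access_token)

# call the API Connect REST API with the token
curl -s "https://${API_HOST}/api/orgs" \
  -H "Accept: application/json" \
  -H "Authorization: Bearer ${ACCESS_TOKEN}"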

Even though a lot has changed within the API Connect product, and in the types and numbers of stacks we’re running, since I first posted an overview of monitoring API Connect, the main areas we monitor haven’t.

We are still using Grafana as a central location for dashboarding and analysing data across different data sources, but some of the tools we’re using to collect the data have changed. Having access to all the data in a single UI is really powerful, especially when troubleshooting or investigating events across the systems: being able to identify correlations between data from external load balancing, response times parsed from logs, and pod utilisation metrics can really help narrow in on specific components and how they impact the wider solution.

Metrics

Metrics flow

For metrics we’re making use of IBM Cloud Monitoring with Sysdig to gather metrics from across the Kubernetes deployment, including metrics from Kubernetes itself and recognisable container applications such as nginx. We also supplement this with our own custom metrics exporter, Trawler, which we built for API Connect to extract key application-specific data and expose it to a Prometheus-compatible monitoring tool or send it to Graphite. Examples of data gathered are counts of objects within API Manager and DataPower, and analytics call counts. For endpoint and availability monitoring we are continuing to use Hem, a simple Python application that calls HTTP(S) endpoints and sends the metrics to our Graphite stack. All of these then come together in our Grafana dashboards, and can be used in new exploratory dashboards whilst problem-solving as needed.

Logging

Logging flow

For our logging infrastructure we continue to use Elastic, making use of the filebeat agent within our clusters to gather and tag the container logs, with some custom parsing in logstash to pull out the significant elements from the different logs so that we can easily correlate these with events going on in the system. A lot of the time this data is then viewed in time-series graphs within Grafana, but it is also linked to Kibana views to dig deeper into the logs themselves.
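As an illustration, a minimal filebeat configuration for gathering container logs and enriching them with Kubernetes metadata might look like this (a sketch rather than our exact configuration; the logstash hostname is a placeholder):

filebeat.inputs:
  # read the container log files from the node
  - type: container
    paths:
      - /var/log/containers/*.log
processors:
  # tag each event with pod, namespace and label metadata
  - add_kubernetes_metadata:
      host: ${NODE_NAME}
      matchers:
        - logs_path:
            logs_path: "/var/log/containers/"
output.logstash:
  hosts: ["logstash.example.com:5044"]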

As part of our work running and monitoring our API Connect cloud deployments, we’ve built some of our own tooling to help track what is going on within them. Trawler is one of these: it gathers metrics from a Kubernetes-based deployment of API Connect.

Trawler runs within Kubernetes alongside API Connect, identifies the API Connect components, and exposes metrics to Prometheus (or other compatible monitoring tooling).

This data can then be used to feed dashboards such as this one in Grafana:

Grafana dashboard

Trawler is open-source and available on GitHub and Docker Hub; see the installation guide for more information on using Trawler yourself.

The metrics that Trawler currently collects are as follows:

Management subsystem:

  • API Connect version information (apiconnect_build_info)
  • Total users (apiconnect_users_total)
  • Number of provider orgs (apiconnect_provider_orgs_total)
  • Number of consumer orgs (apiconnect_consumer_orgs_total)
  • Number of catalogs (apiconnect_catalogs_total)
  • Number of draft products / apis (apiconnect_draft_products_total / apiconnect_draft_apis_total)
  • Number of products / apis (apiconnect_products_total / apiconnect_apis_total)
  • Number of subscriptions (apiconnect_subscriptions_total)

DataPower subsystem:

  • TCP connection stats (datapower_tcp…)
  • Log target stats: events processed, dropped, pending (datapower_logtarget…)
  • Object counts e.g. SSLClientProfile, APICollection, APIOperation etc. (datapower_{object}_total)
  • HTTP Stats (datapower_http_tenSeconds/oneMinute/tenMinutes/oneDay)

Analytics subsystem:

  • Cluster health status (analytics_cluster_status)
  • Number of nodes in the cluster (analytics_data_nodes_total/analytics_nodes_total)
  • Number of shards in states - active, relocating, initialising, unassigned (analytics_{state}_shards_total)
  • Number of pending tasks (analytics_pending_tasks_total)
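Once these metrics are scraped into Prometheus, they can be queried and alerted on like any others; for example (illustrative PromQL queries, assuming the metric names listed above):

# total subscriptions across the management subsystem
sum(apiconnect_subscriptions_total)

# alert if the analytics cluster has any unassigned shards
analytics_unassigned_shards_total > 0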