ConfigState

Discovering a shipwreck

If you think about it long enough, you can blur the line between configuration and program state; kind of like pointing your index fingers together just a few inches in front of your eyes. Go ahead, give it a try ... can you see the "sausage"? Lol, anyway, let's take the example of discoverables.

Suppose we have a PhotoBook service and one of its many features is to resize photos. This sort of thing can be compute-intensive, so we have broken resize out into its own separately scalable service. In the beginning, PhotoBook can simply be configured with the available resizers. Later, as things really scale up :), automating this now-significant chore starts to look worthwhile. If the resize workers register somewhere, PhotoBook can discover them; configstate!

In this post we'll take a look at a Golang project that tracks available services as found in a Consul key-value store. It also shows off a nice approach to running additional workers in an http api service.

Consul #

Hashicorp's Consul is a solid product that offers service discovery, mesh, and quite a lot more. I imagine it's usually found deployed in support of their container orchestration system, Nomad. In the name of simplicity, and perhaps portability, we'll make do with Consul's excellent key-value store, which backs its galaxy of other features.

Starting up a stand-alone Consul and following its logs:

~/proj/configstate$ docker run -d --name consul01 -p 8500:8500 hashicorp/consul:1.17
~/proj/configstate$ docker logs -f consul01

That was easy; thanks Hashi!

KV Store #

Trying out the KV Store:

~/proj/configstate$ curl -s localhost:8500/v1/kv/bargle -XPUT -d'{"ima":"pc"}' | jq
true
~/proj/configstate$ curl -s localhost:8500/v1/kv/bargle | jq
[
  {
    "Key": "bargle",
    "Value": "eyJpbWEiOiJwYyJ9",
    "ModifyIndex": 42526
    ...
  }
]
~/proj/configstate$ curl -s localhost:8500/v1/kv/bargle?raw | jq
{
  "ima": "pc"
}

Simples! We can PUT a value, GET it back in its full glory with a base64-encoded value, or GET just the raw value.

Long Polling #

A-and we can long-poll!:

~/proj/configstate$ curl -v "localhost:8500/v1/kv/bargle?index=42526&wait=9s" | jq
...
< X-Consul-Index: 42532
< X-Consul-Knownleader: true
< X-Consul-Query-Backend: blocking-query
...
100   185  100   185    0     0     40      0  0:00:04  0:00:04 --:--:--    40
[
  {
    "Key": "bargle",
    "Value": "eyJpbWEiOiJwYyB0b28hIn0=",
    "ModifyIndex": 42532
    ...
  }
]

What's going on here:

- the request passes index=42526, the ModifyIndex we got back earlier, along with wait=9s,
- Consul holds the connection open until the key changes or the wait elapses, whichever comes first,
- here the key was updated a few seconds in, so the response arrives with the new value and a new X-Consul-Index (42532), which becomes the index for the next request.

Long-polling can really hit the spot and I think it does here:

- changes show up nearly as fast as a push, without the ceremony of callbacks or persistent subscriptions,
- when nothing changes, the cost is one mostly-idle request per wait period instead of a tight polling loop.

So yeah, a nice way to keep up with dynamic configuration. I'm having trouble thinking of an example of this that I cannot frame as discovery, but nothing here is bound to it.
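
As an aside, you don't have to speak raw HTTP to get blocking queries; Hashicorp's official Go client exposes the same mechanics. A minimal sketch, assuming a local agent and the bargle key from above (this is just a point of reference, not what the configstate project does):

// blocking read with github.com/hashicorp/consul/api ; a sketch, not configstate code
package main

import (
  "fmt"
  "log"
  "time"

  "github.com/hashicorp/consul/api"
)

func main() {
  client, err := api.NewClient(api.DefaultConfig()) // talks to localhost:8500 by default
  if err != nil {
    log.Fatal(err)
  }

  var idx uint64 // zero returns immediately with the current value
  for {
    // blocks until the key's ModifyIndex passes idx, or WaitTime elapses
    pair, meta, err := client.KV().Get("bargle", &api.QueryOptions{
      WaitIndex: idx,
      WaitTime:  9 * time.Second,
    })
    if err != nil {
      log.Println("kv get:", err)
      time.Sleep(time.Second) // crude backoff
      continue
    }
    if pair != nil && meta.LastIndex != idx {
      fmt.Printf("changed: %s\n", pair.Value)
    }
    idx = meta.LastIndex
  }
}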

More Features #

Not surprisingly, Consul takes service discovery quite a bit further than the KV Store, and you can read up on Catalog and Services in their excellent api docs.

Another interesting option could be to store services individually, treating the keys like a services folder. This might look like:

~/proj/configstate$ curl -s localhost:8500/v1/kv/svc/24 -XPUT -d@service24.json
~/proj/configstate$ curl -s localhost:8500/v1/kv/svc?recurse | jq
[
  {
    "Key": "svc/05",
    "ModifyIndex": 11969
    ...

Simple Blob #

In our case just a simple blob of structured data under a single key will do. So we'll go with that!

Putting services into the store:

~/proj/configstate$ curl -s localhost:8500/v1/kv/services-test -XPUT -d@test/data/services.json | jq
true

Golang #

Cool, so we've got a long-pollable key-value store from which we can get a list of services. How might consuming this look from within a Golang api service? The configstate project demonstrates with a minimal api service and an additional worker polling for updates.

Jumping in with an overview of main.go:

func main() {
  // load config and setup logger
  // ...

  // init graceful and create router

  ctx = graceful.Initialize(ctx, &wg, lgr)

  rtr := chi.NewRouter()
  rtr.Method("GET", "/config", delish.ObjHandler("config", cfg, lgr))

  // start discovery and register handler

+ client := cfg.ConsulClient.NewWithTrippers(lgr)
+ csl := cfg.Consul.New(client)
+ dsc := &discover.Discover{Poller: csl, Logger: lgr}

+ dsc.Start(ctx, &wg)
+ dsc.Register(rtr)

  // start server and wait for shutdown

  server := cfg.Server.NewWithLog(ctx, rtr, lgr)
  server.Start(ctx, &wg)
  graceful.Wait(ctx)
}

After the usual api setup:

- graceful wraps the context, stashing the shared WaitGroup and logger for shutdown later,
- a chi router is created and a /config handler is registered.

Then discovery is wired up: a Consul client and poller are built from config, the Discover worker is started, and its handler is registered with the router (a sketch of what Register might look like follows).
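
Register isn't shown in the overview. A plausible sketch, serving the /services response we'll curl in a moment (the handler body here is a guess, not the project's actual code; assumed imports: encoding/json, net/http, github.com/go-chi/chi/v5):

// a hypothetical Register: expose the current services list on the router
func (dsc *Discover) Register(rtr chi.Router) {
  rtr.Method("GET", "/services", http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
    w.Header().Set("Content-Type", "application/json")
    // Services() hands back a copy, so encoding can't race with the worker
    err := json.NewEncoder(w).Encode(map[string]any{"services": dsc.Services()})
    if err != nil {
      dsc.Logger.Error(r.Context(), "failed to encode services", err)
    }
  }))
}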

Looking at just the discover worker's logs for another high-level perspective, omitting all but the msg field for clarity:

~/proj/configstate$ bin/discover | jq '. | select(has("worker_id")) | .msg'
"worker starting"
"sending request"
"received response"
"updating services"
"sending request"
< some time passes ... >
< process killed >
"worker shutting down"
"worker stopped"

The discover worker starts, gets data, updates its services list, sends another request and .. hangs?

Aha! This is the long-poll. It's got the current data and is waiting for Consul to respond or time out.

Then the process is killed externally and it shuts down. Left to run, we'd see this cycle repeat: each time the key changes (or the wait times out) a response comes back, the services list is updated if anything actually changed, and another long-poll goes out.

To round things out, let's ask the api for services known to it:

~/proj/configstate$ curl -s localhost:8081/services | jq
{
  "services": [
    {
      "uri": "http://pool04.boxworld.org/api/v2",
      "capabilities": [
        {
          "name": "resize",
          "capacity": 23
        }
      ]
    },
    ...
  ]
}

And soon after updating the key-value store, those changes will be reflected here.

Discover #

Discover implements a worker that polls for services and makes them available to other (at this point hypothetical) subsystems that need the information.

Let's see how it works:

// discover.go , most logging omitted for reader sanity
func (dsc *Discover) work(ctx context.Context, wg *sync.WaitGroup) {
  dsc.hash = fnv.New64a()
  wg.Add(1)
  defer wg.Done()

  for {
    data, err := dsc.Poller.Poll(ctx)
    if errors.Is(err, context.Canceled) {
      break
    }
    if err != nil {
      continue
    }
    if dsc.unchanged(data) {
      continue
    }

    services, err := entity.DecodeServices(data)
    if err != nil {
      dsc.Logger.Error(ctx, "failed to watch", err)
      continue
    }

    dsc.mu.Lock()
    dsc.services = services
    dsc.mu.Unlock()
  }
}

It loops for-ever:

- Poll blocks until Consul responds, errors, or the context is canceled; cancellation breaks the loop so the worker can shut down,
- any other error, or data we've already seen (the unchanged check), just skips ahead to the next poll,
- otherwise the payload is decoded and the new services slice is swapped in under the lock.

The beating heart of discover!

Of note is the absence here of any concern over timing. The loop will run as fast as it can, relying completely on the poller for good behavior. This is a design decision, which may stand the test of time :)
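
The unchanged check isn't shown above; a guess at its shape, assuming the fnv hash created at the top of work plus a lastSum field on Discover (that field name is made up):

// a guess at unchanged: hash the raw payload and compare it to the previous poll's sum
func (dsc *Discover) unchanged(data []byte) bool {
  dsc.hash.Reset()
  dsc.hash.Write(data) // fnv's Write never returns an error
  sum := dsc.hash.Sum64()

  if sum == dsc.lastSum {
    return true
  }
  dsc.lastSum = sum
  return false
}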

Finally, discover provides access:

// discover.go
func (dsc *Discover) Services() entity.Services {

  dsc.mu.RLock()
  defer dsc.mu.RUnlock()

  return dsc.services.Copy()
}

With the locking and copying, we see simplicity and reliability prioritized over performance, which one can reasonably hope is appropriate for the use case.
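
The entity package isn't shown in this post either. Working backwards from the JSON we curl'd above, and assuming the blob in the KV store is just the bare array, it might look roughly like:

// entity/services.go , a sketch inferred from the JSON shape; field names are guesses
package entity

import "encoding/json"

type Capability struct {
  Name     string `json:"name"`
  Capacity int    `json:"capacity"`
}

type Service struct {
  URI          string       `json:"uri"`
  Capabilities []Capability `json:"capabilities"`
}

type Services []Service

// DecodeServices unmarshals a raw KV payload into a Services slice.
func DecodeServices(data []byte) (Services, error) {
  var svcs Services
  err := json.Unmarshal(data, &svcs)
  return svcs, err
}

// Copy returns a deep copy so callers can't mutate the worker's slice.
func (s Services) Copy() Services {
  out := make(Services, len(s))
  for i, svc := range s {
    svc.Capabilities = append([]Capability(nil), svc.Capabilities...)
    out[i] = svc
  }
  return out
}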

Poller #

A Consul object satisfies the poller interface with:

// consul.go
func (csl *Consul) Poll(ctx context.Context) (data []byte, err error) {

  delay := csl.Limiter.Reserve().Delay()
  csl.LimitDelay += delay
  time.Sleep(delay)

  var newIdx uint64
  data, newIdx, err = csl.GetKv(ctx, csl.Key, csl.Idx)
  if err != nil {
    return
  }

  if newIdx < csl.Idx {
    newIdx = 0
  }
  csl.Idx = newIdx

  return
}

Ahh, now we're getting into matters of timing. The first line of defense is hidden away in GetKv, which sends its requests with wait set, as seen in the curl example above.

Should that fail, say due to a fritzy service re-registering every few nanoseconds, the rate limiter will slow things down. This makes me happy :) Any delay that the limiter does introduce is accumulated in LimitDelay for some visibility, although I'm not making use of it currently.

Hashi has a nice page discussing the use of Consul's blocking endpoints.
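
For completeness, here's a guess at GetKv: a raw blocking read of a single key, returning the body along with the index pulled from the X-Consul-Index header. The Addr, Wait, and Client fields are assumptions, as is the error handling (assumed imports: fmt, io, net/http, strconv):

// a guess at GetKv: ?raw blocking read of one key, plus the new modify index
func (csl *Consul) GetKv(ctx context.Context, key string, idx uint64) ([]byte, uint64, error) {
  url := fmt.Sprintf("%s/v1/kv/%s?raw&index=%d&wait=%s", csl.Addr, key, idx, csl.Wait)

  req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
  if err != nil {
    return nil, 0, err
  }

  resp, err := csl.Client.Do(req)
  if err != nil {
    return nil, 0, err
  }
  defer resp.Body.Close()

  if resp.StatusCode != http.StatusOK {
    return nil, 0, fmt.Errorf("unexpected status: %s", resp.Status)
  }

  // Consul reports the key's ModifyIndex here, even for ?raw responses
  newIdx, err := strconv.ParseUint(resp.Header.Get("X-Consul-Index"), 10, 64)
  if err != nil {
    return nil, 0, err
  }

  data, err := io.ReadAll(resp.Body)
  return data, newIdx, err
}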

Graceful #

At this point I've sketched, hopefully with some verisimilitude, how discover, along with its consular poller, maintains a dynamic configstate representing discovered services. Now I'd like to call your attention to the graceful shutdown of a service with multiple workers.

Let's start by having another look at the logs of a trial run, but this time without filtering any messages:

~/proj/configstate$ bin/discover | jq .msg
"starting up"
"worker starting"
"starting http service"
"listening"
"sending request"
"received response"
"updating services"
"sending request"
< process killed >
"shutting down"
"shutting down http service"
"http service stopped"
"worker shutting down"
"worker stopped"
"stopped"

After the process is killed, we see:

- the main goroutine log "shutting down" and cancel the shared context,
- the http service stop listening and report "http service stopped",
- the discover worker notice the cancellation, logging "worker shutting down" and then "worker stopped",
- and finally "stopped", once the WaitGroup drains.

In all honesty, we could very likely get away with just pulling the rug out from under these two. This is good practice though, and oftentimes crucial to the prevention of insidious errors in the larger system.

These conveniences are brought to us by graceful from the delish module:

// graceful.go
func Initialize(ctx context.Context, wg *sync.WaitGroup, lgr Logger) context.Context {
  ctx, cancel := context.WithCancel(ctx)

  graceful = &Graceful{
    Cancel:    cancel,
    WaitGroup: wg,
    Logger:    lgr,
  }

  return ctx
}

When the graceful singleton is initialized it stashes:

- the cancel func for the derived context it hands back to main,
- the shared WaitGroup that each worker adds itself to,
- and the logger.
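
The struct behind the singleton is presumably little more than those three fields:

// graceful.go , the singleton's shape as implied by Initialize
type Graceful struct {
  Cancel    context.CancelFunc
  WaitGroup *sync.WaitGroup
  Logger    Logger
}

var graceful *Graceful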

Once all the workers are running, graceful can wait for an interrupt:

// graceful.go
func Wait(ctx context.Context) {

  sigChan := make(chan os.Signal, 1)
  signal.Notify(sigChan, stop...)
  <-sigChan

  graceful.Logger.Info(ctx, "shutting down ..")

  graceful.Cancel()
  graceful.WaitGroup.Wait()

  graceful.Logger.Info(ctx, "stopped")
}

First, it does the usual signal channel dance, blocking until an interrupt (or other configured stop signal) arrives.

Once the signal lands, it:

- logs "shutting down ..",
- cancels the context shared by all the workers,
- waits on the WaitGroup for each of them to finish,
- and logs "stopped".

Sweet!

If you look back up at discover's work loop, you'll see:

  ...
  if errors.Is(err, context.Canceled) {
    break
  }
  ...

Triggering it to shut down.

Somewhat similarly, in the delish http service, you'll see:

  ...
  <-ctx.Done()
  svr.Logger.Info(ctx, "shutting down http service ..")
  err := httpServer.Shutdown(ctx)
  ...

The difference is that the discover worker spends its time waiting for Poll to return, while the http service can simply wait for the Done channel to close.

The End #

To sum up, the configstate project demonstrates:

- long-polling Consul's key-value store to keep up with a dynamic list of services,
- running an additional worker alongside an http api service,
- and shutting the whole thing down gracefully.

I hope it's been informative and/or thought provoking. Get in touch with me via email if you'd like :)

Thanks for reading!