The Issue

We're moving fast. Our engineering team has a lot to do each sprint and we pride ourselves on meeting every deadline set by our CEOs. This leaves little time to respond to questions from our ops and culinary teams or to help fix the inevitable bugs that pop up during operating hours.

This week we found ourselves with a down day and decided to build a Slackbot to automate responses to some questions we are commonly asked by other teams. In particular, our kitchen and delivery teams use a printer system and web-based CRM (customer relationship management) tool that we built internally. Any issues they experience with these systems could be due to a number of factors: a loose cable, bad wifi, a problematic OS update, a bug we introduced in an update, or a back-end system going down. Our ops team is tremendous and they can troubleshoot most problems, but if the underlying issue is that a service that supports the CRM or printers goes down, how will they know? We have a Slack channel that pipes in notifications from our uptime service, but there is a lot of activity in there as we work on systems throughout the day. Even if a non-developer peeks in and sees that some esoterically named service is down, what does that mean for their operations? We built a Slackbot to help them separate the signal from the noise.

The Bot's Functionality

We wanted v1 of the bot to respond to three questions: 1) "Is anything down?" 2) "Is ________ down?" (here, we wanted the team to be able to ask for the status of multiple systems with one query, such as "Is microservice.x, workers.y or microservice.z down?"). 3) Finally, if something is down, we wanted anyone to be able to get details about what the service does by asking "What is workers.y?"

With each uptime-related query, the bot would poll our uptime service to check whether a specific service or any service is down, and it would report back to Slack. For questions on what a service does, it would check a dictionary of descriptions of each service (including the implications of it going down).
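A minimal sketch of those three answers in Python. It is an illustration under assumptions, not the bot's actual code: the statuses dict stands in for whatever the uptime service returns, and the descriptions in SERVICE_DESCRIPTIONS are invented for the example.

```python
# Hypothetical descriptions; the real dictionary would describe real services
# and what it means for operations when each one goes down.
SERVICE_DESCRIPTIONS = {
    "workers.y": "Example: sends tickets to the printers. If it is down, nothing prints.",
    "microservice.x": "Example: backs the CRM. If it is down, the CRM will not load.",
}


def report_down(statuses, names=None):
    """Answer "Is anything down?" (names is None) or "Is X, Y down?" (names given).

    statuses maps a service name to True (up) or False (down). In the real bot
    this would come from polling the uptime service; here it is passed in.
    """
    if names is None:
        down = sorted(name for name, up in statuses.items() if not up)
        return ("Down: " + ", ".join(down)) if down else "Everything is up."
    lines = []
    for name in names:
        if name not in statuses:
            lines.append(f"{name}: unknown service")
        else:
            lines.append(f"{name}: {'up' if statuses[name] else 'DOWN'}")
    return "\n".join(lines)


def describe(name):
    """Answer "What is X?" from the description dictionary."""
    return SERVICE_DESCRIPTIONS.get(name, f"I don't have a description for {name}.")
```

Keeping the status lookup separate from the description lookup means the descriptions can be edited by non-developers without touching the polling logic.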

Skeleton of a Bot

The first step was to set up an outgoing webhook to our own microservice. For details on how to do this, check out Slack's API docs. There are several configuration options, including the trigger words that will initiate a post to an endpoint. We set ours up to respond to lines that begin with "@bot".

At the other end of the flow, we needed an incoming webhook integration to receive messages from the service that backs our bot. Details on incoming webhooks can also be found in Slack's API docs.
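To make the two ends of the flow concrete, here is a rough Python sketch. It assumes the outgoing webhook posts a form-encoded body containing token, text and trigger_word fields, and that the incoming webhook accepts a JSON body with a text field; check Slack's docs for the exact format of your integration. The WEBHOOK_URL value and both function names are placeholders made up for this example.

```python
import json
from urllib.parse import parse_qs
from urllib.request import Request

# Placeholder; use the URL Slack generates for your incoming webhook.
WEBHOOK_URL = "https://hooks.slack.com/services/T000/B000/XXXX"


def read_outgoing_hook(body, expected_token):
    """Extract the question from an outgoing-webhook POST body.

    Returns None when the token does not match, so requests that did not
    come from our Slack integration are ignored.
    """
    fields = {key: values[0] for key, values in parse_qs(body).items()}
    if fields.get("token") != expected_token:
        return None
    text = fields.get("text", "")
    trigger = fields.get("trigger_word", "")
    # Drop the leading trigger word ("@bot") so only the question remains.
    if trigger and text.startswith(trigger):
        text = text[len(trigger):]
    return text.strip()


def build_incoming_hook_request(message, url=WEBHOOK_URL):
    """Build (but do not send) the POST that delivers a reply to Slack."""
    payload = json.dumps({"text": message}).encode("utf-8")
    return Request(url, data=payload, headers={"Content-Type": "application/json"})
```

Sending the reply is then a matter of passing the request to urllib.request.urlopen. Building the request separately from sending it keeps the function easy to test without touching the network.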

Once you have these endpoints set up, you can build out the core functionality of the bot.

Meat of a Bot: Text Parsing

You can do a lot with bots. The possible functionality is virtually endless. Our first bot makes calls to our uptime service and queries an object with service descriptions. Very useful, but admittedly not earth-shattering.

Still, beyond getting our feet wet with Slackbots, this project allowed us to dive into something we will need to become competent in as we continue to add functionality to the bot: natural language text parsing. There are plenty of off-the-shelf solutions, and we will likely incorporate one or more of them in the future, but no off-the-shelf solution is perfect and we anticipate always needing some amount of custom parsing.

Here is an overview of how our v1 parser handles text coming back from Slack's outgoing webhook:

  1. Sanitize the text. Normalize the text and strip out unnecessary or unwanted elements. For our needs, this meant converting it all to lowercase and stripping out extra whitespace and all punctuation other than periods. You should be able to accomplish this with built-in language functions and regular expressions.

  2. Tokenize the text. Split the remaining input into an array of the individual elements you care about. For us, this was simply splitting on the spaces between each word and adding each word to a tokens array. There are other, more complicated ways of doing this. I can imagine needing to split on whole sentences or individual letters, or storing each token as an object with additional data (e.g., {element: 'hello', type: 'word', index: 1}).

  3. Additional cleanup. Once you have your tokens you might need to filter them further. For us, this meant removing unnecessary words (i.e., "or" and "and") and cleaning up certain tokens that Slack had treated as hyperlinks and added markup to.

  4. Parse the text for meaning. To do this, you'll have to move through the tokens chunk by chunk. I use the term "chunk" deliberately. When trying to ascertain meaning from the tokens, it probably won't be enough to look at each one individually. Rather, you'll have to look at each token in relation to those before and after it. For-loops and recursion do the trick here. For instance, for our "is anything down" check, the parsing loop looks at three consecutive tokens for that pattern. For the query "Is microservice.x, workers.y or microservice.z down?", it recursively scans the tokens for anything resembling a service name until it hits the word "down".
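The four steps above can be sketched in Python as follows. This is an illustrative reconstruction, not our parser verbatim. It assumes service names look like dotted words (workers.y) and that Slack wraps auto-linked names as <url|label>. It also unwraps that link markup during sanitizing rather than at the token stage, because the markup is itself punctuation that the sanitizer would otherwise strip. All function names are made up for the example.

```python
import re

STOP_WORDS = {"or", "and"}
# Assumption: a service name is two or more dot-separated words, e.g. workers.y
SERVICE_NAME = re.compile(r"^[a-z0-9_]+(\.[a-z0-9_]+)+$")


def sanitize(text):
    """Step 1: lowercase, unwrap link markup, drop punctuation except periods."""
    text = text.lower()
    text = re.sub(r"<[^|>]*\|([^>]*)>", r"\1", text)  # <http://a.b|a.b> -> a.b
    text = re.sub(r"[^\w\s.]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text):
    """Step 2: split on whitespace."""
    return text.split()


def cleanup(tokens):
    """Step 3: drop filler words and sentence-ending periods."""
    stripped = (token.strip(".") for token in tokens if token not in STOP_WORDS)
    return [token for token in stripped if token]


def collect_services(tokens, i):
    """Recursively gather service-like tokens from index i until "down".

    Returns None if "down" never appears, so "what is workers.y" is not
    mistaken for a status query.
    """
    if i >= len(tokens):
        return None
    if tokens[i] == "down":
        return []
    rest = collect_services(tokens, i + 1)
    if rest is None:
        return None
    return ([tokens[i]] if SERVICE_NAME.match(tokens[i]) else []) + rest


def parse(tokens):
    """Step 4: look at tokens in chunks rather than one at a time."""
    for i in range(len(tokens) - 2):  # three-token window
        if tokens[i:i + 3] == ["is", "anything", "down"]:
            return {"intent": "any_down"}
    if "is" in tokens:
        services = collect_services(tokens, tokens.index("is") + 1)
        if services:
            return {"intent": "services_down", "services": services}
    for i in range(len(tokens) - 2):
        if tokens[i:i + 2] == ["what", "is"] and SERVICE_NAME.match(tokens[i + 2]):
            return {"intent": "describe", "service": tokens[i + 2]}
    return {"intent": "unknown"}


def understand(text):
    return parse(cleanup(tokenize(sanitize(text))))
```

For example, understand("Is microservice.x, workers.y or microservice.z down?") yields the services_down intent with all three names, while understand("What is workers.y?") falls through to the describe intent because the recursive scan never reaches "down".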

What's to Come

The rest of the team loved and appreciated v1 of the bot, and we really enjoyed building it. The next steps will likely be to add functionality and modularize the architecture. We envision adding more help desk-like functionality, such as returning a step-by-step guide when someone enters "@bot help the printers are down", or responding to "@bot what did John Smith order for dinner yesterday?".

Additionally, we plan to break the bot into individual services. The main entry point will likely remain the text parsing engine. It will make sense of what is being asked of the bot and unload the query onto a specific sub-service that processes the task and sends a response to Slack.
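One possible shape for that hand-off, sketched in Python. This is speculative, since v2 does not exist yet: the intent names, the handles decorator and the handler functions are all hypothetical, and plain functions stand in for what would be separate sub-services.

```python
HANDLERS = {}


def handles(intent):
    """Register a function as the sub-service for one intent."""
    def register(fn):
        HANDLERS[intent] = fn
        return fn
    return register


@handles("any_down")
def any_down(query):
    # Placeholder: a real sub-service would poll the uptime service here.
    return "Checking all services..."


@handles("describe")
def describe_service(query):
    # Placeholder: a real sub-service would look up the description.
    return f"Looking up {query['service']}..."


def dispatch(query):
    """Route a parsed query to its sub-service and return the reply text."""
    handler = HANDLERS.get(query.get("intent"))
    if handler is None:
        return "Sorry, I didn't understand that."
    return handler(query)
```

With a registry like this, adding a new capability means writing one handler and teaching the parser one new intent, without touching the routing code.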

I hope to update you on version 2 of the bot soon.