Slack Notifications for your Spark Workflow

Why is this taking so long?

One metric I like tracking at work is the time it takes to run a dataset through the latest iteration of a model pipeline that is currently under development. While most machine learning frameworks let you print timestamps to the screen, these are clumsy to track and share with the rest of the team.

The birth of a bot

I was after a few things in any potential solution:

I eventually came across this Medium post by Matt Harvey, which sets out simple steps for making calls to the Slack API using Slack’s Python Development Kit.

tick, tick, tick

Over the following days, I created a Slack channel that people could subscribe to for our growing collection of notification bots (I also added another one for tracking activity in our Git repository). To record how long a particular stage took to run, I simply added an API call at the start and end of the stage and noted the timestamps of the corresponding Slack messages; for my purposes, I only really needed a rough indication of how long something took. You could of course add a bit of Python code here if you wanted something more specific or rigorous than a friendly note saying which stage had just finished, e.g. include the train and test errors in the message so that these are logged as well.
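If you want the elapsed time in the message itself rather than reading it off the Slack timestamps, something like the following rough sketch could work. The names `notify_stage` and `format_duration` are made up for illustration, and `notify` is just any function that takes a string. It defaults to `print` here, and I am assuming you would swap in whatever call your Slack client uses to post to a channel.

```python
import time
from contextlib import contextmanager


def format_duration(seconds):
    """Render a duration in seconds as H:MM:SS."""
    total = int(round(seconds))
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


@contextmanager
def notify_stage(name, notify=print, clock=time.monotonic):
    """Send one message when a stage starts and another when it ends.

    `notify` is a placeholder for a function that posts a string to Slack;
    it is an assumption of this sketch, not part of any Slack library.
    """
    notify(f"{name}: started")
    start = clock()
    extras = {}  # the stage can record metrics here, e.g. train/test error
    try:
        yield extras
    except Exception as exc:
        # Report the failure, then let the original error propagate.
        elapsed = format_duration(clock() - start)
        notify(f"{name}: failed after {elapsed} ({exc!r})")
        raise
    else:
        details = ", ".join(f"{key}={value}" for key, value in extras.items())
        suffix = f" ({details})" if details else ""
        elapsed = format_duration(clock() - start)
        notify(f"{name}: finished in {elapsed}{suffix}")
```

You would wrap a stage in `with notify_stage("training") as metrics:` and set something like `metrics["test_error"]` inside the block to have it appended to the closing message. Using a monotonic clock means the measured duration is not thrown off if the system clock is adjusted mid-run.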

Everyone, meet Daryl

Bad News: there’s no (official) Scala version (yet)

The good news is that a lot of work is going into building one; gilbertw1’s is the most active project I could find. I essentially wanted to replicate my Python-Slack setup in my Spark Notebook to continue my testing, but unsurprisingly, it was not quite as straightforward as I had hoped. Here is what I managed to (hastily) pull together:

  1. If you google “slack scala api”, you’ll find there are a few GitHub projects and blog posts guiding you through the installation process. If you are using SBT, those will probably be more helpful than what I’m about to share, so have a look there first. If you are, like me, trying to set up the API in Spark Notebook, keep on swimming
  2. Open your notebook and go to Edit > Edit Notebook Metadata. You should see a window pop up with a JSON editor
  3. You need to change the customDeps parameter in order to add the Slack-Scala-API dependency. We will be using ponkotuy’s project – more on this later
  4. Delete the null and replace it with the dependency like this
    "customDeps": ["com.ponkotuy % scala-slack_2.11 % 0.4.0"]
    
    Note the dependencies at the bottom of this page. For example, your Scala version should be 2.11.8 – otherwise, update to this version
  5. Click OK before shutting down your notebook and restarting your Spark Notebook instance. It is important that you completely close Spark Notebook and relaunch it
  6. Reopen your notebook and import the dependency
    import com.ponkotuy.slack.SlackClient
    
  7. Record your Slack token
    val s = new SlackClient("your_token")
    
  8. To post a message, run
    s.chat.postMessage("#your_channel", "Hello World!")
    

Both this library and the Python one actually have a lot more functionality than what I’ve shown here. Have a look to see what else you can use in your workflow!

Why didn’t you use the other one?

I originally wanted to use gilbertw1’s project since it was the only one which was actively being developed – note that ponkotuy’s and all the other forks of that project haven’t been updated for over a year. I went through similar steps to try to import that dependency, but ran into issues with the real time messaging client – specifically trying to use the Akka ActorSystem class. A brief recount of what I tried:

Let me know if you manage to crack this and get it to work in your Spark Notebook!

Final thoughts

Slack is awesome for sharing snippets of information in general, and a dedicated channel for bot notifications is a really good idea, especially if you are planning to go all out on recruiting a bot army. A lot of great work has already been done, so it’s mostly about finding what works for you and your team. For me, it means I can kick off my scripts and get updates on my phone as they complete (compare that to having to sit at my desk and check every so often).
