Okay, I went on a small vacation in between, so this blog post took a little longer to write: the kids and I were busy mastering our best splash in the hotel swimming pool.
Where was I? Right, preprocessing: check, features: check, our models: check, so now it’s time to put these bad boys into production.
Some extra requirements I’ve defined for myself:
- Any predictions should be done in real-time and event driven.
- All preprocessing and prediction services are implemented as separate microservices.
- The architecture should be scalable in case it needs to serve multiple arcade cabinets.
- The architecture should cater for non-intrusive deployments, for instance to test new models in parallel with the production models.
- The architecture should be cloud vendor agnostic, avoiding any vendor lock-in.
Without further ado, let’s look at the architecture.
Diving into it in a little more detail: the arcade cabinet publishes events locally on an MQTT bus. These events are then forwarded to a Kafka Topic (InputEventTopic) in the cloud. Now bear in mind that I want to be able to scale and to process a heck of a lot of events in real-time and I definitely want to support multiple arcade cabinets in the (near?) future.
A preprocessing service then takes responsibility for the “one hot encoding” of the input events and builds the combination matrix (as discussed in my previous feature engineering blog post). As a refresher: remember that I used an incremental window approach to build my features? These increments are stored in a Kafka state store. Whenever a prediction is requested (how often depends entirely on the prediction speed), the service publishes an event containing the combination matrix on the topic that serves as input for our Machine Learning Model (wrapped as yet another service and also deployed separately).
The predictions themselves are in turn published on the Output Topic, then forwarded onto an MQTT message bus and eventually visualized on our Machine Learning IoT panel.
Also, remember that the prediction speed has 7 distinct presets. Prediction speed 1 requires a game window of 1 second, which makes it the fastest prediction speed available. On the other side of the spectrum, speed 7 requires a game window of 30 seconds. This one surely makes a more accurate prediction since it is based on more data, but it also takes longer to receive the prediction. This trade-off is yours to make.
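To make the window idea a bit more concrete, here is a minimal sketch of how increments could be combined into features for a given window size. It assumes one increment per elapsed second holding a count per button; the class name, the number of buttons and the one-second granularity are all hypothetical and not taken from the actual project, which keeps its increments in a Kafka state store rather than in memory.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Sketch of an incremental window: one array of per-button counts is kept
 * for every elapsed second, and a window of N seconds is the element-wise
 * sum of the newest N increments. All names here are hypothetical.
 */
final class IncrementalWindow {
    private final int buttonCount;
    private final int maxSeconds;
    // Newest increment sits at the head of the deque.
    private final Deque<int[]> increments = new ArrayDeque<>();

    IncrementalWindow(int buttonCount, int maxSeconds) {
        this.buttonCount = buttonCount;
        this.maxSeconds = maxSeconds;
    }

    /** Stores the per-button counts of the second that just ended. */
    void addIncrement(int[] countsForOneSecond) {
        if (countsForOneSecond.length != buttonCount) {
            throw new IllegalArgumentException("expected " + buttonCount + " counts");
        }
        increments.addFirst(countsForOneSecond.clone());
        if (increments.size() > maxSeconds) {
            increments.removeLast(); // drop data older than the widest window
        }
    }

    /** True once enough seconds have elapsed to fill the requested window. */
    boolean isReady(int windowSeconds) {
        return increments.size() >= windowSeconds;
    }

    /** Sums the newest windowSeconds increments into one feature vector. */
    int[] features(int windowSeconds) {
        int[] sum = new int[buttonCount];
        int used = 0;
        for (int[] increment : increments) {
            if (used == windowSeconds) {
                break;
            }
            for (int i = 0; i < buttonCount; i++) {
                sum[i] += increment[i];
            }
            used++;
        }
        return sum;
    }
}
```

With a maximum of 30 seconds, the same stored increments can serve both the 1-second window of speed 1 and the 30-second window of speed 7; the wider window simply is not ready until 30 increments have arrived.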
Now, to intervene in the prediction process, meta events can be published by turning knobs or pushing buttons on the IoT panel. These are handled the same way as any other IoT event: they are published on a local MQTT bus and then forwarded onto a meta Kafka Topic. The meta processing service handles them and stores the result in a global Kafka Store.
This “meta Kafka Store” is consulted, for instance, by the feature preprocessor when it needs to decide whether or not to publish its combination matrix for a specified window size.
Making data accessible everywhere
Apache Kafka seems a really good match for these requirements. Using Apache Kafka Streams I get numerous advantages out of the box. It’s a lightweight library and makes it easy to build stream processing applications.
I’m not going into a lot of detail about the advantages of using Apache Kafka, but nevertheless let me highlight a few here.
Apache Kafka consumer groups are a very powerful concept. Consumers belonging to the same consumer group are competing consumers: each message is handled by only one of them. Consumers belonging to different consumer groups implement the publish/subscribe pattern: every group receives every message.
The latter is especially interesting when you want to test a new model on real unseen data. Your test set is there and ready to use. Just define a service with your new model, deploy it using a “bypass” consumer group and see how it performs side by side with your production model. Because the two services belong to different consumer groups, they get the exact same messages and can still handle them differently.
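As a small illustration, here is a sketch of how two Kafka Streams services could be configured so that they end up in different consumer groups. Kafka Streams uses the `application.id` setting as the consumer group id, so giving the test service its own id is enough. The class name, the ids and the broker address are hypothetical.

```java
import java.util.Properties;

/**
 * Sketch: builds the minimal configuration for a Kafka Streams service.
 * Kafka Streams uses "application.id" as the consumer group id, so two
 * services with different ids each receive every message on the topic.
 */
final class ShadowDeploymentConfig {

    static Properties streamsConfig(String applicationId, String bootstrapServers) {
        Properties props = new Properties();
        props.put("application.id", applicationId);       // doubles as the consumer group id
        props.put("bootstrap.servers", bootstrapServers); // where to find the Kafka brokers
        return props;
    }

    /** Production model: hypothetical id and broker address. */
    static Properties production() {
        return streamsConfig("prediction-service-production", "kafka:9092");
    }

    /** New model under test, in its own "bypass" group: hypothetical id. */
    static Properties bypass() {
        return streamsConfig("prediction-service-bypass", "kafka:9092");
    }
}
```

Either `Properties` object can then be handed to the `KafkaStreams` constructor together with the topology. In a setup like this, the bypass service would presumably also write to its own output topic, so that its predictions do not get mixed up with the production ones.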
The retention period of your topic is also a powerful concept and effectively enables implementing a feedback loop for your model. Apache Kafka works with offsets and holds the original data on the topic for a certain period of time (the default period is 7 days if I recall correctly, but don’t take my word for it: Apache Kafka is extensively documented, which is another big plus of this framework).
Let’s say you notice a misclassified prediction in the event stream. You can easily replay the stream of events or add the data to your training set or look at how the features evolved over time to make an analysis of what went wrong.
In an ideal world, you would never have to expire your data and could use all the available data to build your model (in fact, available disk space is the only limiting factor here). This even makes online learning possible: you would start from nothing, just play games until a certain threshold is met, and then automate the deployment of that trained model so that the model is tweaked over time in an automated fashion.
Now let’s dive into some code. I used the Processor API to build a topology using Apache Kafka.
The preprocessor topology is defined like this:
- The source is the topic where game events are forwarded to by the MQTT arcade cabinet.
- My processor – implemented by the ArcadeEventProcessor – does the feature preprocessing and stores the frequency windows in the gameFrequencyStateStore.
- The gameSteeringStore is used to decide if a game has started or ended.
- The global state store is where the meta events are stored such as the prediction speed settings.
- The topology sink is the frequency sink, where the prediction request events (containing the actual features) are published.
.addProcessor("EventProcessor", () -> new ArcadeEventProcessor(kafkaConfiguration), "EventSource")
.addGlobalStore(metaDataStoreSupplier, "MetaSource", new StringDeserializer(), new StringDeserializer(), kafkaConfiguration.getMetaTopic(), "MetaProcessorStore",
        () -> new ArcadeMetaProcessor(kafkaConfiguration))
.addSink("FrequencySink", kafkaConfiguration.getProducerTopic(), "EventProcessor");
Note: I use JSON messages for data interchanges.
A state store is easy to set up using the StoreBuilder. Notice that in this example I’m not using JSON but String deserializers instead. There are a couple of options here and I am still looking into them.
I am not yet entirely happy with the modularity level here, but hope to improve on it in the near future.
Zooming in on the prediction service now. It’s an easy-peasy topology: a prediction request arrives at the frequency source, the processor (which wraps our served model) makes a prediction, and the result is written to the prediction sink for further processing by any interested parties.
eventBuilder.addSource("FrequencySource", kafkaConfiguration.getConsumerTopic())
        .addProcessor("PredictionProcessor", () -> new PredictionProcessor(), "FrequencySource")
        .addSink("PredictionSink", kafkaConfiguration.getProducerTopic(), "PredictionProcessor");
Making services deployable anywhere
Now that I have my data accessible when I want it, where I want it, I’d like to do the same thing on an orchestration level. Since we have quite a bit of Kubernetes knowledge and experience in our company, we quickly decided that Kubernetes would easily meet our requirements.
Again, not going into lengthy appraisals about the advantages of Kubernetes here, but let’s elaborate on one or two.
Canary deployments are fairly easy to do with Kubernetes. That is exactly what I need whenever I want to test a new model alongside a production model, for instance on just one arcade cabinet. When that succeeds, I can then quickly roll it out to all arcade cabinets.
Again, let’s look at the online learning idea a little further: I could easily deploy our Machine Learning Pipeline as a service, add a new service belonging to a different consumer group to do the online learning, and then automate the deployment of that new model so that it replaces the old (slightly dumber) model. All of this would happen in an automated fashion, maximizing the advantages of both Apache Kafka and Kubernetes.
One problem I have with this architecture is that containers on Kubernetes are effectively stateless: out of the box, they lose their data whenever they restart. That’s something I would definitely like to avoid for any persistent state stores. Fortunately, Kubernetes has the concept of persistent volumes, which you can mount to keep data across restarts. StatefulSets give each pod a stable identity and its own volume claim, so I can use them to manage our stateful application. Some limitations apply, so here too trade-offs need to be made, depending on whether an application needs to be stateful or stateless.
Well, in the end I managed to deploy all of the above, and I’ve already had quite a bit of fun demo-ing all of this live on stage before an equally live audience. I’ve got my models trained, deployed and used in a real-time production setting. I managed to make the predictions accessible anytime, anywhere, anyhow I wanted.
Of course, there are still a bunch of truly great ideas in our team about how to extend it even further, and I’m guessing this project will never really end, which I presume is exactly what is required of any fun project 😉
Wanna know what’s still in the pipeline? For starters, the online learning idea I mentioned earlier in this blog post. It would be ideal if the actual model could be tweaked whilst being “in flight”. The current architecture is already up for it.
Another one is that our model makes its predictions in an episodic fashion.
Whenever a game starts, it is perfectly possible for the model to predict the correct game, e.g. Puckman, with 90% certainty 10 times in a row, and nevertheless throw out an 11th prediction that is completely different (and incorrect), e.g. Bomberman, with for example only 55% certainty.
The problem here is that the current implementation doesn’t take into account any previous predictions made. The model uses a snapshot to make a totally independent prediction. One way to avoid this is to use a type of recurrent neural network called an LSTM network. LSTM stands for Long Short-Term Memory; such a network is built up from LSTM cells, which are capable of “remembering” a value over a certain time interval.
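To illustrate what “taking previous predictions into account” could look like, here is a deliberately simple sketch. Note that this is not an LSTM and not what the project implements: it is a much cruder stand-in that sums the confidence per game over the last few predictions and reports the game with the highest total. The class name and the history size are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch: smooths a stream of independent predictions by letting the
 * most recent ones vote, weighted by their confidence. A simple
 * stand-in for illustration only, not an LSTM.
 */
final class PredictionSmoother {

    /** One raw prediction: a game label and the model's confidence in it. */
    private static final class Scored {
        final String label;
        final double confidence;

        Scored(String label, double confidence) {
            this.label = label;
            this.confidence = confidence;
        }
    }

    private final int historySize;
    private final Deque<Scored> history = new ArrayDeque<>();

    PredictionSmoother(int historySize) {
        this.historySize = historySize;
    }

    /** Records a raw prediction and returns the smoothed label. */
    String update(String label, double confidence) {
        history.addLast(new Scored(label, confidence));
        if (history.size() > historySize) {
            history.removeFirst(); // forget the oldest prediction
        }
        // Sum the confidence per label over the remembered predictions.
        Map<String, Double> totals = new HashMap<>();
        for (Scored s : history) {
            totals.merge(s.label, s.confidence, Double::sum);
        }
        // Pick the label with the highest summed confidence.
        String best = null;
        double bestTotal = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> entry : totals.entrySet()) {
            if (entry.getValue() > bestTotal) {
                best = entry.getKey();
                bestTotal = entry.getValue();
            }
        }
        return best;
    }
}
```

With a history of 10, feeding it ten Puckman predictions at 0.90 followed by a single Bomberman prediction at 0.55 still yields Puckman, because the lone outlier is outvoted by the nine Puckman predictions that remain in the history.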
Remember also that we set the requirement of using only the buttons or joysticks to build up our features? Well, the idea is to extend this feature set with profiles for players (for example, is this a good Mortal Kombat player or not?). This can be done, for instance, by using the score of the game, or by adding authentication to the arcade cabinet that is linked to an anonymized profile, and so on.
The goal of this project was to build a classification agent whose task environment is limited to its IoT inputs. I set out to prove that, while IoT data in itself might be considered meaningless, applying machine learning to it would render it meaningful by putting it into context and consequently doing something useful with it. I’d say, mission accomplished!
Credits: blogpost by Kevin Smeyers, Machine Learning Master at ToThePoint