We want to introduce Robotika.ai, the tool for Perpetual Machine Learning. The goal of Robotika is to make deploying ML solutions quick and comfortable for data scientists. There are four main aspects of Robotika that we would like to discuss here.
1. The Cycle.
Robotika covers all the significant stages of the ML pipeline. Currently, the system can read data in several formats (e.g., CSV, Parquet). File size is almost unlimited: Robotika can read files that are too large to load on a local computer. Imagine the logs of a large website where tens of gigabytes of data are added daily. To analyze the data, we only need to give Robotika the links to the files, and the report is generated automatically. Based on the code provided by the data scientists, Robotika transforms the data and trains the model in the cloud. We know how hard it is to deploy a model to production, and we try to make the process as easy as possible. Finally, Robotika deploys the model on AWS and provides the endpoint to the user.
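To give a feeling for how a report can be built from a file that does not fit in memory, here is a minimal sketch in plain Python. It is not Robotika's implementation; the function name `stream_column_stats`, the column `latency_ms`, and the sample log rows are all made up for illustration. The idea is simply to read the file row by row and keep only running totals.

```python
import csv
import io
import math


def stream_column_stats(lines, column):
    """Count, mean and (population) standard deviation of one numeric CSV
    column, computed in a single pass with Welford's running update."""
    reader = csv.DictReader(lines)
    count, mean, m2 = 0, 0.0, 0.0
    for row in reader:
        value = float(row[column])
        count += 1
        delta = value - mean
        mean += delta / count
        m2 += delta * (value - mean)
    std = math.sqrt(m2 / count) if count else 0.0
    return {"count": count, "mean": mean, "std": std}


# Hypothetical log excerpt; a real file would be passed as open(path, newline="").
sample = io.StringIO("url,latency_ms\n/home,120\n/cart,80\n/home,100\n")
print(stream_column_stats(sample, "latency_ms"))
```

Because only three numbers are kept between rows, memory use stays constant no matter how large the input grows.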
2. The Code.
Robotika is built on top of TFX, a TensorFlow-based framework that aims to simplify and integrate the standard components of the ML pipeline. Together with Kubeflow, it is currently among the most popular frameworks for building Machine Learning pipelines. Because of its relative simplicity, we decided that TFX is the right framework for us and for you. Thus, to deploy a model, data scientists must provide a pipeline written in TFX. We will not discuss the framework further here; interested readers can visit the official TFX page (https://www.tensorflow.org/tfx).
3. Deployment.
With Robotika, you can deploy Machine Learning models on AWS and automatically gain access to the deployed model. Even data scientists with little deployment experience can quickly create an endpoint that can be used in production immediately. In addition, you can set performance conditions that a model must meet. For example, a model can go to production only if its F1 score is at least 0.9. This option frees data scientists from the routine work of filtering models and makes ML even more automatic.
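The deployment condition described above can be sketched as a small gate function. This is an illustration only, not Robotika's API: the names `f1_score` and `should_deploy` and the toy labels are assumptions, and the 0.9 threshold is taken from the example in the text.

```python
def f1_score(true_labels, predicted_labels, positive=1):
    """F1 score of the positive class, computed from raw label lists."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def should_deploy(true_labels, predicted_labels, min_f1=0.9):
    """Hypothetical gate: allow deployment only if F1 reaches the threshold."""
    return f1_score(true_labels, predicted_labels) >= min_f1


# Toy validation labels, made up for illustration.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
good_model = [1, 1, 1, 1, 1, 0, 0, 0, 0, 1]  # one false positive
weak_model = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]  # two misses, one false positive

print(should_deploy(y_true, good_model))
print(should_deploy(y_true, weak_model))
```

The first model has an F1 score of 10/11 (about 0.91) and passes the gate; the second has an F1 score of 2/3 and is held back.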
4. The Adaptation.
In the real world, where data changes dynamically, data drift is possible. Data drift is a change in the statistical characteristics of the data over time that can affect the performance of a model in production. Robotika can not only identify data drift but also handle it by automatically retraining and redeploying the model at a chosen interval. Imagine that your application analyzes the stock price over the last several hours in real time to predict the price's future movement. It would be logical to update the model every hour, and Robotika can do this automatically.
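To make the idea of detecting data drift concrete, here is a minimal sketch that compares a reference sample with a recent one using the Population Stability Index (PSI), a common drift measure. We do not claim that Robotika uses PSI; the function names, the bin count, and the 0.2 threshold are assumptions chosen for illustration.

```python
import math


def population_stability_index(reference, current, bins=10):
    """PSI between two non-empty numeric samples, using equal-width bins
    derived from the reference sample. A value of 0 means identical shares."""
    if not reference or not current:
        raise ValueError("both samples must be non-empty")
    low, high = min(reference), max(reference)
    width = (high - low) / bins or 1.0  # avoid zero width for constant data

    def bin_shares(values):
        counts = [0] * bins
        for v in values:
            index = int((v - low) / width)
            counts[min(max(index, 0), bins - 1)] += 1  # clamp outliers
        # A small floor keeps the logarithm defined for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    ref_shares = bin_shares(reference)
    cur_shares = bin_shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_shares, cur_shares))


def needs_retraining(reference, current, threshold=0.2):
    """Hypothetical trigger: flag drift when PSI exceeds the assumed threshold."""
    return population_stability_index(reference, current) > threshold


# Made-up feature values: the recent data is shifted upwards by 0.5.
training_data = [i / 100 for i in range(100)]
recent_data = [v + 0.5 for v in training_data]

print(needs_retraining(training_data, training_data))
print(needs_retraining(training_data, recent_data))
```

A scheduler could run such a check at the chosen interval and start retraining and redeployment whenever it returns True.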
In the next post, we will explain in depth how to create a pipeline in Robotika, using a fraud detection dataset as an example.
Stay tuned :)