Building a Machine Learning team in-house

Let’s look at how to build a machine learning team and whom to hire to create your project’s MVP. We will cover the areas of responsibility at the MVP stage and across the full development cycle, work out step by step how to introduce data science into your project or product, and consider Agile practices for managing DS/ML projects and how they differ from traditional development.

Data science is a fairly young field, both globally and in Ukraine. The first data science competence centers appeared in Ukrainian outsourcing companies about four years ago, and to date very few companies have managed to formalize this competence as a business and turn it into an established service line.

When a typical outsourcing company launches a data science practice, the first impulse is to fit it into the existing structure of competences and project work, alongside, for example, developers with a particular specialization. In other words, data science becomes one more engineering category, and the approach to forming project teams stays the same. A standard project team includes a project manager (planning, coordination), a business analyst (customer communication, requirements), engineers (implementation) and testers (quality control); data science is simply added to the list of engineering specializations.

However, more often than not, this model does not survive contact with reality. If the business analyst has no data science background, DS models may look like black magic to him, and a problem with a simple, obvious solution looks exactly the same as a problem that nobody can solve today. This leads to communication problems within the team and to false expectations on the customer’s side.

Next, if the data scientist only develops the algorithm while an engineer writes the product itself, communication and cooperation suffer. In every case from my own practice and from ELEKS’s, whenever an engineer had to rewrite the model (for example, to run it on a mobile platform), the project ran into trouble and ended in arguments about who was to blame for the final system being less accurate than the data scientist’s prototype, slower, not covering all cases, and so on.

In addition, analytical models become obsolete over time: the customer’s business changes, the data changes, the patterns change. Who will maintain the model? The data scientist? He can’t write production code. The engineer? He moved to another project long ago. A new engineer? He would have to learn both the product and the model from scratch…

The data science team format we finally settled on, however paradoxical it may sound at first glance, is no team at all. The data science stream on a project is usually led by one person: the data scientist. He communicates with the customer personally, builds the model, and productizes it into a component that other engineers can use without any knowledge of data science.

It is worth stepping away from data science for a moment to recall that narrow specialization and division of labor in software development did not appear overnight. Early in the industry’s history, software engineers were responsible for gathering requirements, modeling systems, implementation, and testing. The team format we are used to emerged only as projects grew larger and more complex.

We arrived at this way of working by trial and error, and were surprised when we came across Pivotal Labs and McKinsey: it turned out that they use the same format and a similar competence model for data science projects. It seems that these days, building data science services by trial and error leads to the same answer no matter where you start 🙂