The Beauty of Data (Science)

Andres Hernandez, Principal Data Scientist, 03.08.2023

About the author: Andres Hernandez is a Principal Data Scientist at KONUX. Before joining the company almost six years ago, he earned a PhD in theoretical physics, researching baryogenesis, and gained extensive work experience in quantitative finance. At KONUX, Andres is valued not only for his subject-matter expertise; he also organizes the KONUX Festival of Ideas and is known for his fun and playful approach to work. In the following interview he shares the story of how he joined KONUX, his journey from theoretical physics to improving railways in the real world, and what it means to him to think wildly, without the pressure of a big milestone.

Hello! Please tell us about yourself… Who are you?

My name is Andres Hernandez. I originally came to Germany from Mexico to study physics, and I got my PhD in theoretical physics at the University of Heidelberg. The topic was baryogenesis, i.e. why there is more matter than antimatter in the universe.

Shortly before finishing my PhD, my wife and I decided we really wanted to stay in Germany, so I looked for a job outside of academia that would give me more stability, but also that would still require me to use mathematics. Quantitative finance presented an interesting opportunity and I worked in that area for almost 9 years.

And then one day I got in contact with an acquaintance who had been doing interesting things with applications of Reinforcement Learning and Bayesian methods to investment strategies, and he told me he was now working at this Munich startup and that I should come by and interview. I was a bit reluctant because it would mean yet another change of industry, but I agreed, and I couldn't be happier. The interview was such a blast: we spent so long talking about global optimizers and swarm intelligence that 15 minutes in I was completely sold on the company and on coming to work here.

Creative names for projects, memes,… At KONUX, you are quite famous for your creativity and humorous approach to data science. Where’s that coming from?

I am not entirely sure. There is certainly the appreciation for working in an environment that allows it. But also, how awesome is it to go to sleep on a Sunday dreaming about all the improvements you want to make to Voltron or Gandalf? I am always careful that the names are fanciful and reflect the type of media I enjoy, but also that they have a meaning and are not just chosen at random. Voltron, for example, is an autoencoder composed of several parts, each with a specific purpose, to try to disentangle the latent space.
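To make the "autoencoder composed of several parts" idea concrete, here is a minimal structural sketch of a multi-part autoencoder: a shared encoder produces a latent code that is split into separate factors, each reconstructed by its own decoder head. All dimensions, weights and variable names are hypothetical illustrations, not the actual Voltron model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: a 64-bin spectrum, with an 8-dim latent code
# split into a "train" part and a "track" part (4 dims each)
d_in, d_train, d_track = 64, 4, 4

# Shared encoder plus one decoder per latent part. Weights are random
# here; in a real model they would be learned end to end.
W_enc = rng.standard_normal((d_train + d_track, d_in)) * 0.1
W_dec_train = rng.standard_normal((d_in, d_train)) * 0.1
W_dec_track = rng.standard_normal((d_in, d_track)) * 0.1

def forward(x):
    z = np.tanh(W_enc @ x)                       # full latent code
    z_train, z_track = z[:d_train], z[d_train:]  # disentangled parts
    # Each head decodes only its own latent slice; their sum is the
    # reconstruction of the original measurement
    return W_dec_train @ z_train + W_dec_track @ z_track

x = rng.standard_normal(d_in)
x_hat = forward(x)
```

The point of the split is that, once trained with suitable losses, each latent slice should capture only one source of variation (e.g. the train versus the track), which is what "disentangling the latent space" refers to.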

It is also a bit of a fun challenge sometimes. Picking a theme, like Middle Earth, and then working out how each of the parts I am currently building maps onto that setting is a fun activity to keep thinking about when I am not working.

One of your signature projects is a "classifier" that determines properties of the trains in the rail network. Can you tell us about it?

Our main use case is to monitor rail infrastructure by analyzing acceleration measurements made when a train passes over the IIoT device. I like to describe it as similar to an ultrasound in a medical setting, but instead of looking in space, we look in the frequency domain.
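The move from the time domain to the frequency domain can be illustrated with a tiny sketch: a synthetic "acceleration" trace containing a single vibration mode, whose dominant frequency is recovered with an FFT. The sampling rate, signal and frequency are invented for illustration and have nothing to do with real KONUX measurements.

```python
import numpy as np

fs = 1000.0                      # sampling rate in Hz (hypothetical)
t = np.arange(0, 1.0, 1 / fs)    # one second of samples
# Synthetic "acceleration" trace: a 120 Hz vibration plus noise
rng = np.random.default_rng(0)
signal = np.sin(2 * np.pi * 120 * t) + 0.1 * rng.standard_normal(t.size)

# Look in the frequency domain instead of in space/time
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(t.size, d=1 / fs)

dominant = freqs[np.argmax(spectrum)]
print(f"dominant frequency: {dominant:.0f} Hz")  # 120 Hz
```

This is the sense in which the analysis resembles an ultrasound: the shape of the spectrum, rather than the raw waveform, carries the diagnostic information.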

Now, that primary use case is complicated because we measure not only the infrastructure, but rather a complex interaction between the train, the temperature, the rails themselves and the rocks under the rails. In order to understand what we are measuring, the "response" of the system, it is crucial to be as specific as we can about the "input": what type of train it was, its speed, the number of axles, and even the weight of each axle. Determining these factors using machine learning and other techniques is something I have invested a lot of time in.
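As a toy illustration of how axle count and speed could in principle be read off such a signal, the sketch below builds a synthetic trace where each passing axle produces a short burst, then counts the bursts and estimates speed from an assumed axle spacing. Every number here (burst shape, 2 m spacing, thresholds) is a made-up assumption, not KONUX's actual method.

```python
import numpy as np

fs = 1000.0
t = np.arange(0, 2.0, 1 / fs)
# Hypothetical trace: each axle passing the sensor leaves a short burst
axle_times = [0.3, 0.5, 1.1, 1.3]            # four axles (two bogies)
trace = np.zeros_like(t)
for at in axle_times:
    trace += np.exp(-((t - at) ** 2) / (2 * 0.005 ** 2))

# Count axles as well-separated excursions above a threshold:
# rising edges mark the start of each axle's burst
above = trace > 0.5
rising = np.flatnonzero(above[1:] & ~above[:-1])
n_axles = rising.size

# Estimate speed from an assumed 2 m axle spacing within a bogie
spacing_m = 2.0
dt = axle_times[1] - axle_times[0]  # in practice: peak times from `rising`
speed_kmh = spacing_m / dt * 3.6
print(n_axles, "axles,", round(speed_kmh), "km/h")
```

Real traces are far messier, which is exactly why machine learning models are used instead of fixed thresholds, but the sketch shows why knowing the "input" (axles, speed, weight) is recoverable from the measured "response" at all.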

The general idea of disentangling the effects of the train from the asset itself has taken several shapes over time. It started as a personal project in line with our moonshot budget, i.e. where we can dedicate 15% of our time to personally directed projects. And I keep coming back to it with new ideas and approaches on how to increase accuracy.

In doing so I have significantly reduced the amount of manual work needed to assemble the corresponding "ground truth" data sets. Thankfully, in the most recent moonshot round we were able to connect and combine projects from a number of data scientists, so that we now have the tools to automate the entire data pipeline from start to finish. This means we are close to a solution that automatically triggers model training as needed, with the labels for that training produced by new models trained on synthetic traces. This is important because, as we roll out to larger and larger networks, we need to keep manual effort to a minimum while guaranteeing the reliability of our insights.
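The retraining logic described above can be caricatured in a few lines: monitor a quality metric, and when it drifts below a threshold, regenerate synthetic labeled traces and retrain instead of hand-labeling. All function names, the 0.9 threshold and the label set are hypothetical placeholders, not the actual pipeline.

```python
import random

def generate_synthetic_traces(n):
    """Stand-in for simulated, automatically labeled passes
    (here the label is just an axle count)."""
    random.seed(0)
    return [(random.choice([4, 8, 16]), None) for _ in range(n)]

def needs_retraining(accuracy, threshold=0.9):
    # Trigger training automatically when monitored accuracy drifts too low
    return accuracy < threshold

def pipeline(monitored_accuracy):
    if needs_retraining(monitored_accuracy):
        data = generate_synthetic_traces(1000)
        return f"retrained on {len(data)} synthetic traces"
    return "model unchanged"

print(pipeline(0.85))  # drift detected -> retrain
print(pipeline(0.95))  # accuracy fine -> no-op
```

The design point is simply that no human sits in the loop: labels come from models trained on synthetic traces, so the manual effort stays flat as the monitored network grows.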

Overall, we will be able to determine train speed, weight, length and so on much more accurately, and at a fraction of the effort – which in turn gives us a reliable basis for our insights on the infrastructure and enables significantly higher accuracy in our insights about network traffic.

How does that fit your day-to-day work at KONUX?

I love the chance to choose my own projects and work on the things I believe will have a big impact on our users, and to do so in a setting where I am allowed to take bigger risks and try things that I acknowledge might not work, and that I might not have felt comfortable attempting under the pressure of a big milestone.

It is a great opportunity to be even more creative than usual, to think wildly, or to try out the methodology from a paper I saw at a conference. Usually it comes down to taking disparate ideas from many different sources and bringing them together into a single application. Sometimes it works, sometimes it doesn't, but I have a lot of fun along the way.