The last decade has seen an AI revolution, a welcome change from the AI drought between 1990 and 2000. This movement, likely fueled by the growing availability of data, is making Data Science a hot career path in today’s job market and opening the door to education and research, feeding a global AI race.
From Japan positioning itself as a leader in robotics, to the UK and Canada looking at ethics and education respectively, countries around the world are positioning themselves at the forefront of AI talent and development. But it isn’t just AI & Machine Learning that’s benefiting from this resurgence. More generally, being a data-savvy individual is a key skill that is becoming mandatory, regardless of the particular IT domain where one is employed.
Cybersecurity and the shortage of AI skillsets
This mindset is opening up new career paths, from building algorithms for self-driving cars to cutting-edge data visualization for informing management decisions. Think of any business value chain in the digital world and data will no doubt be a value creator within it. However, the line between Data Engineers, Software Engineers and Data Scientists is blurring when it comes to big data, with more Data Engineers and Software Engineers seeking to become Data Scientists.
With this shift, numerous companies are offering individuals training to become Data Scientists, which any company should encourage. However, very few companies offer training that covers both data and cybersecurity. While some do, many training companies claim that no prior knowledge of data curation or statistics is required. That may be true in some areas, but in cybersecurity we need more people with a solid understanding of the underlying principles as well as data science concepts. There is still a big shortfall in the skillsets required for securing data and avoiding data breaches.
The importance of a Data Scientist skillset in cybersecurity
Cybersecurity urgently needs Data Scientists because of their ability to understand the meaning, statistical properties and relationships between data attributes across a variety of data sources. They also understand how those data attributes and data sources might affect a given algorithm, especially when dealing with issues such as imbalanced classification.
If we take credit fraud detection as an example, this means having an intuitive grasp of how, when and where a given transaction type occurs, which is a prerequisite for formulating and testing experimental hypotheses. It also means understanding exactly how a given classification algorithm might be affected when few or no examples of a given transaction type are available, and tuning or adapting the algorithm as necessary.
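To make the imbalance problem concrete, here is a minimal sketch in plain Python. The data is a hypothetical toy set (990 legitimate transactions, 10 fraudulent), and the helper names are ours, not part of any library. It shows why raw accuracy is misleading on imbalanced data, and one common heuristic for adapting an algorithm: weighting each class inversely to its frequency.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of true positives (e.g. fraud cases) that were caught."""
    preds_for_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    if not preds_for_positives:
        return 0.0
    return sum(p == positive for p in preds_for_positives) / len(preds_for_positives)


def balanced_class_weights(labels):
    """Inverse-frequency heuristic: weight = n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical toy data: 0 = legitimate, 1 = fraudulent.
y_true = [0] * 990 + [1] * 10

# A useless "model" that labels every transaction as legitimate.
always_legit = [0] * len(y_true)

acc = accuracy(y_true, always_legit)      # high, despite catching no fraud
rec = recall(y_true, always_legit)        # zero: every fraud case is missed
weights = balanced_class_weights(y_true)  # the rare class gets a much larger weight

print(f"accuracy={acc:.2f} recall={rec:.2f} weights={weights}")
```

In this sketch the do-nothing model scores 99% accuracy while missing every fraudulent transaction, which is why metrics such as recall matter here. The weights could then be passed to a training procedure that supports per-class weighting, so that mistakes on the rare class cost more. Whether weighting, resampling or another approach is appropriate depends on the data and the algorithm.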
In essence, we need people who can understand the precise nature of the data and overcome its limitations when designing and implementing solutions. We can illustrate this by looking at the self-driving car industry. There is a lot of real-world data on how to drive on a normal road, when to stop at traffic lights, how to deal with cyclists etc., but few or no real-world examples of driving on the other side of the road or hitting a parked car.
Looking beyond the limitations of data – finding the right mix of skills
Whilst Data and Software Engineers are highly sought after and a valuable asset to any team, they bring a different skillset to that of a trained Data Scientist. To truly fight the bad guys we need to look beyond the limitations of data and understand the differences between available statistical and machine learning techniques (for example principal component analysis and factor analysis), and their behavior where the data is sparse or highly variable. We also need to understand and overcome the limitations of these techniques, including where machine learning might introduce bias due to human factors (e.g. there are real-world examples of racial or gender bias). This is a skillset that sits squarely within Data Science.
Of course, no single person is an expert across every data science area, but it is important that individuals have a breadth of technical ability in addition to strong domain expertise. In a data science team, it is often useful to have people with complementary technical skills and backgrounds, such as STEM subjects. The focus (theoretical vs. applied) depends on the precise nature of the business, but a strong grasp of statistics is essential.
At Callsign we believe it’s important to get the right mix of talent in order to grow teams that can not only excel in all subjects but also learn and grow together. To do this we look beyond conventional hiring routes such as engaging with universities, and offer internship opportunities that represent real-life scenarios in a challenging environment (i.e. involving real-world noisy data samples). But it isn’t just hiring that needs focus. It is also important to ensure that individuals get opportunities to explore new ideas, take on training and attend conferences. By opening the door to talented individuals from varying backgrounds and allowing them to use their skillsets together, we can ensure that we are driving the industry forward in the right direction.