A Data Scientist, a Data Engineer and a Statistician walk into a bar, what next?

This was the question raised at our Big Data event, held at the Major Players Covent Garden office at the beginning of October. A panel of four data science professionals, led by Leila Seith Hassan, Head of Analytics at Ogilvy, spoke about the attributes that distinguish Data Scientists, Data Engineers and Statisticians.

Joining Leila on the panel were Callum Staff, Lead Data Scientist at M&S, Sam Cooledge, Lead Data Scientist at NetX Labs, Raj Sharma, Data Science Tech Lead at DWP, and Francesco D'Orazio, VP Product and Research at PULSAR.

The topics discussed

Can you define the role of a Data Scientist and a Statistician?

Sam – A Data Scientist enables the translation; it’s that ability to translate statistics or insights into commercially valuable information, whether that’s marketing, sales, product design or management. That translation is the definitive factor for a Data Scientist in my opinion. It encompasses a whole host of other skills, from production rate to coding, but it’s that key ability to translate that defines the role of a Data Scientist. The difference between a Data Scientist and a Statistician is commercial interest, not academic interest, and it’s the translation of a problem and the solution into commercially valuable information.

Looking back ten years, there weren’t that many Data Scientists; people were considered analysts or operations researchers. What were they doing that was different to now?

Francesco – The fundamental difference between a Data Scientist and a Statistician is that a Data Scientist, because of their mix of research, coding and product skills, is someone that can bring new structure into data science. A Data Scientist for me is someone that can code, but who also understands the fundamentals of statistics, analysis, product, and customer needs. It’s someone that can balance those three while bringing in the new structure.

The reason why ten years ago these people didn’t exist is that the range of data available back then just wasn’t as wide as it is now. We’re going through a phase where everything generates data.

Callum - I think one of the defining features of a Data Scientist is their creativity and vision. That’s not to say you don’t get Statisticians who are creative and who have a vision, but I agree with Fran: you need the maths, and you need the coding. A large proportion of a Statistician’s role is to provide the reporting, which in turn allows the manual decision making.

Where does a Data Engineer fall into this?

Raj – Data Engineering is more about the sequence and making sense of the data and the wrangling of data. These skills are very much required for a Data Engineer. Data analytics is more about being able to visualise it and tell the story of what is behind the data. It’s about the data insight. You can’t have a Data Scientist if there’s no data to do the science on.

What are the main differences between a Data Scientist and a Statistician?

Leila - A Data Scientist can do what a Statistician can do, but they can also create a product out of it. They can look into a much greater variety and complexity of data and use languages that a Statistician couldn’t necessarily apply. They then automate and build products using those languages.

Francesco – If the Statistician can help make certain decisions with data, a Data Scientist can help you understand what decisions can be made with that data because they can change the shape of the data.

Are there languages and skills that a data scientist has to have these days or is it still too broad to put that much definition around it?

Sam – For me, the answer is probably no, because what is required is a tighter definition of the problem at hand. It is difficult to say that all business problems require a single solution; what is needed is a better rule book for defining the issues.

Trying to fit three things around what a Data Scientist should have is impossible; I think the way to get around it is to consider what a Data Science team should have.

Callum – As much as I think I am a Data Scientist, if you take the two words literally, I don’t believe “scientist” really means anything. It conjures up images of the sort of stuff we do, just like an engineer is not a civil engineer. A Data Architect is not an architect, but you get the idea of the sort of stuff we’re doing.

Data Scientists are paid around 36% more than other creative analysts. Do you think Data Science is actually harder than the other data specialities and therefore warrants it?

Francesco - Yes, I think that there is value in Data Science, but only if you know how to use it. You can create the next big product with the right scientist in the team, but you need to know what questions to ask them, what resources to give them and how much time to give them. If the systems surrounding that person are not right, you are just wasting money.

What do you need to be studying or areas that you need to be looking at as a Data Scientist?

Sam – It’s important to have a portfolio and show how you can take that working knowledge and apply it to something. You might get a straightforward data search, but in the real world, it is never going to be that clean. Nor is the research question going to be that clean. Going out there, finding the problem and trying to solve it: that’s going to make the difference. That’s what gives you an edge. Being able to translate insights into commercial value shows a company you know how to solve a problem for them.

There are a lot of accreditations for different types of analysts. Google has a lot of certifications around the web, and there’s the incredible official statistics accreditation. For Data Science, there doesn’t seem to be anything yet. Are there any that you would recommend?

Francesco – I feel like these accreditation schemes end up being scams from people that are trying to get a job. To get a job, you need to learn statistics and know how to code. You can learn everything else, but if you don’t have stats and coding, you can’t do the job.

What does the future hold for Data Science over the next five to ten years?

Sam – I think that industry will become far more specialised. The other thing is compliance. It will be a defining feature in five to ten years. I mean, if you look at how the last 12 months have changed with GDPR and privacy, everything has had to change.

Callum – Data Scientists, AR and animation are going to do a lot of people out of jobs, and there’s a whole other debate about whether jobs support those industries or not. But I also think, ironically, Data Scientists are going to do themselves out of jobs as well, because a lot of what they do will be automating models in the first place. If you’re a Data Scientist or in this space and you always want to work on models that could do your job for you, then this is going to be the challenge.

Francesco – For me, two things are going to happen: standardisation and automation. Standardisation is going to be building the foundations of the industry. Somebody is going to come up with the best way to complete a task in a given industry, and that algorithm is going to become the standard for doing that job. This is going to happen more and more. As soon as all the domains are mapped out, tried out and integrated into models, you’re going to have a winner-takes-all scenario where some of the models are going to be owned by some corporations who will deploy them across different verticals.

And then with automation, this is going to become more of a trial and test type model that decides what type of analysis you can do on the data. This is just because Data Scientists are paid to do this, and very few people have the resources to do it right.