Mapping Out Data Science Roles, Part 1

November 24, 2014 at 10:55 AM by Kathleen de Lara

Big data rules.

These days, the smartest companies are the ones making effective use of this wave of newfound intelligence – in other words, building a more “data-driven organization.” Data science can be a fascinating, fuzzy field. It’s tricky to tell who does what in the grand scheme of the data-driven org, and defining what “data-driven” means can be just as tricky.

Here’s how DJ Patil, former head of LinkedIn's data and analytics team, breaks it down:

A data-driven organization acquires, processes, and leverages data in a timely fashion to create efficiencies, iterate on and develop new products, and navigate the competitive landscape.

For many companies, one of the first steps to elevating their strengths is hiring a data team composed of a mix of data scientists, data analysts, and data engineers.

Given data science and its corresponding roles are fairly new, some recruiters and hiring managers may have a hard time differentiating between the types of data specialists – people whose jobs require them to know what type of information to collect, to gather, process, and sift through the data, and to report on the findings.

Having trouble distinguishing the three? In the first part of our two-part series, we compare the roles of a data analyst and a data engineer.

To help visualize the three different roles, let’s consider how a team of race car drivers operates, and how each role plays a part from the build to deployment and upkeep.

What is a data analyst?

A data analyst creates the blueprint, or framework, for deciding what structure to build. Before building the race car, a data analyst uses their studies and insights to determine what model makes sense to create.

This is the most junior of the three roles and is often referred to as someone who “does the dirty work” for a data scientist – they run experiments, normalize the data they collect, enumerate data sources, and create models whose goals are defined by the scientist. Data analysts observe correlations and determine which results to optimize for, so they can frame predictions and understand how those results influence the business. Data analysts look at data in a holistic sense, as they are at the foundation of the structure.

The primary tasks of a data analyst are to compile and analyze numerical information into reports that present insights in a way that allows non-technical people to understand how to plan their next business strategies. Data analysts usually have a degree in computer science and/or business, as the role is not entirely computer or programming-related. To organize information, data analysts must be able to understand business processes, accounting, and correlation, and be able to answer the question, “What is this data telling me?”
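
To make “normalizing data” and “observing correlations” a little more concrete, here is a minimal Python sketch of the kind of check an analyst might run. The function names and the weekly figures are invented for illustration and do not come from any real dataset; the sketch also assumes neither series is constant, since a constant series has zero spread and cannot be normalized this way.

```python
from statistics import mean, pstdev

def normalize(values):
    """Rescale values to zero mean and unit standard deviation (z-scores)."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    return [(v - mu) / sigma for v in values]

def pearson(xs, ys):
    """Pearson correlation: the mean product of the paired z-scores."""
    zx, zy = normalize(xs), normalize(ys)
    return mean(a * b for a, b in zip(zx, zy))

# Hypothetical weekly figures, made up for this example only.
ad_spend = [10, 20, 30, 40, 50]
signups = [12, 24, 33, 46, 55]

r = pearson(ad_spend, signups)
print(f"correlation between ad spend and signups: {r:.2f}")
```

A value near 1 or -1 suggests the two series move together; a value near 0 suggests they do not. Turning that number into an answer to “What is this data telling me?” is still the analyst’s job.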

What is a data engineer?

A data engineer creates the playground on which to observe signals and correlations. They are the race car mechanics.

To put the role into perspective, the data engineer is the closest thing to a hybrid of the DevOps and full-stack engineering roles. Data engineers focus on building and deploying data storage systems that can handle gathering and collecting small and large batches of information, even in real-time streams. They create structures based on the specific, hyper-targeted observations the team wants to make, allowing data scientists to be more efficient at their jobs. Data scientists can easily query the data presented to them by data engineers through an API.

The primary tasks of a data engineer are to ensure data is properly stored and can be quickly accessed by the data scientist. Data engineers are usually stronger in machine learning, software engineering and programming skills, including Mahout, Java, Scala, Ruby, C++, and SAS. What other skills do data engineers need? We can tell you. Data engineers understand how to extract and mine data, getting data in and out of its storage, or warehouse.
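
As a rough illustration of “properly stored and quickly accessed,” the sketch below uses Python’s built-in sqlite3 module to load a few rows into a table and expose one query function. It is a toy under stated assumptions, not a production design: the table name, column names, function names, and lap times are all hypothetical, and a real data engineer would typically work with much larger storage systems.

```python
import sqlite3

def build_store(rows):
    """Create an in-memory table and load (driver, lap, seconds) rows into it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE lap_times (driver TEXT, lap INTEGER, seconds REAL)")
    conn.executemany("INSERT INTO lap_times VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn

def average_lap(conn, driver):
    """Small query interface for a data scientist; returns None if no rows match."""
    row = conn.execute(
        "SELECT AVG(seconds) FROM lap_times WHERE driver = ?", (driver,)
    ).fetchone()
    return row[0]

# Hypothetical lap times, made up for this example only.
rows = [("ana", 1, 92.0), ("ana", 2, 90.0), ("ben", 1, 95.0)]
conn = build_store(rows)
print(average_lap(conn, "ana"))
```

The point of the split is that the person asking the question calls `average_lap` without needing to know how the rows were gathered or where they are stored.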

Successfully hiring for these roles requires sourcing talent highly specialized in gathering, processing, and understanding datasets of all sizes to create effective business strategies. These data science roles allow organizations to determine what to improve on, what customers are doing, and where competitors are likely headed. Stay tuned for the second part of our data science series to learn about the data scientist role, and how recruiters and hiring managers can hire top talent for these positions!

Don’t forget to tune into our tech recruiting webinar series for a full guide on hiring qualified candidates for your company’s engineering and design teams!
