Are You an ETL Developer or a Data Engineer?
The evolving data engineering landscape
This question feels redundant.
ETL, or Extract, Transform, and Load, is the process a developer performs when moving data from a source to a target. ETL development is therefore one component of data engineering.
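To make the three steps concrete, here is a minimal sketch in Python. It assumes a tiny in-memory CSV as a hypothetical source and an in-memory SQLite table as the target. The sample data, the `orders` table, and the function names are illustrative assumptions, not part of any particular ETL tool.

```python
import csv
import io
import sqlite3

# Hypothetical source data standing in for an export from an upstream system.
SOURCE_CSV = """order_id,amount_usd,country
1,19.99,us
2,5.00,ca
3,,us
"""


def extract(raw):
    """Extract: read raw rows from the source (here, an in-memory CSV string)."""
    return list(csv.DictReader(io.StringIO(raw)))


def transform(rows):
    """Transform: drop rows missing an amount, cast types, upper-case country codes."""
    cleaned = []
    for row in rows:
        if not row["amount_usd"]:
            continue  # skip incomplete records
        cleaned.append(
            (int(row["order_id"]), float(row["amount_usd"]), row["country"].upper())
        )
    return cleaned


def load(records, conn):
    """Load: write the cleaned records into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, amount_usd REAL, country TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    conn.commit()


def run_pipeline(raw, conn):
    """Chain the three steps: source -> extract -> transform -> load -> target."""
    load(transform(extract(raw)), conn)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    run_pipeline(SOURCE_CSV, conn)
    print(conn.execute("SELECT * FROM orders ORDER BY order_id").fetchall())
```

In a real pipeline each step would usually point at external systems (an API, an operational database, a warehouse), but the extract, transform, load shape stays the same.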
The job title "ETL developer" always seemed limiting to me, since ETL is just one of the many functions a typical ETL developer performs. Nonetheless, for years the industry has used that term to label anyone responsible for building and automating data pipelines and data infrastructure, typically at large enterprises that have data warehouses.
Recently, another term has come into fashion: the Data Engineer. Is this simply a rebrand of the traditional ETL developer?
Unsurprisingly, the industry hasn’t been able to provide a definitive answer.
On the surface, the two are the same: Data Engineers are also responsible for building and automating data pipelines and data infrastructure. However, once we get past that high-level description, different organizations tend to think about these roles in materially different ways.
GUI Developer vs. Software Engineer
The industry tends to think of ETL developers as practitioners who have specialized in proprietary ETL tools. Large firms adopted these tools en masse in the 90s and early 2000s, and they promised a low-code solution through a visual UI. These are tools like Informatica, Microsoft’s SSIS, and IBM’s DataStage. ETL developers would build programs in a drag-and-drop manner, and although the promise was ease of use, the reality was that the tools were complex and idiosyncratic, which required specialists to wield them.
Core software engineering concepts like data structures and algorithms, as well as standard practices like version control, were abstracted away by these tools, which provided that functionality out of the box. So if you had experience with these tools and were decent at SQL, you had a good chance of getting hired as an ETL developer.
Data Engineers, on the other hand, are seen as a subset of software engineers. They are expected to know general-purpose programming languages (Python, Java, Scala), CS fundamentals such as data structures, algorithms, and time complexity, as well as databases, web APIs, and more. What makes them specialists is the expectation that they have deeper knowledge of database internals, distributed systems, data modeling, open-source tooling, and data architecture.
One way to think about the difference between ETL developers and data engineers is that data engineers can transition into other software engineering roles, such as back-end engineer, and vice versa, whereas the same isn’t typically true of ETL developers. It isn’t uncommon for ETL developers to have never used a version-control tool like Git, for instance.
Tech-First Companies vs. Traditional Firms
Companies that hire people to build data pipelines fall broadly into two categories: companies where software is the product, and companies that use software to service their product. Think of the difference between Stripe and your traditional bank. This line is getting blurrier over time, but it’s an effective heuristic nonetheless.
Tech-first companies tend to post listings for Data Engineers, while traditional firms tend to post listings for ETL developers. This appears to be changing slowly, though: I’ve seen the title Data Engineer show up in the job listings of insurance companies and banks.
Beyond ETL
Data Engineers are generally expected to own more of the data workflow than a typical ETL developer. If we roughly split the data workflow into database modeling, ETL pipelines, automation, and visualization/reporting, ETL developers tend to be responsible only for the ETL pipelines and automation. Data engineers, however, are expected to own, or at least be able to work on, everything up to the visualization piece. With the rise of cloud services, it is also common for data engineers to take on responsibility for managing some of the cloud infrastructure, often called DataOps.
Closing Thoughts
The data industry has been going through a period of rapid change over the last 5 years, let alone the last decade. It is likely futile to speculate on what the next 10 years hold. Nonetheless, here are my prognostications.
Data engineering jobs will continue to increase as more companies of all sizes try to leverage the swaths of data they find themselves sitting on.
Due to the growth of web services, unstructured data, and the need for real-time analytics, traditional firms will transition from hiring ETL developers to hiring the more software-engineering-focused data engineer.
There will always be some demand for the skill sets of traditional ETL developers, as tools like Informatica are ingrained in the data infrastructure of large firms. A new set of low-code ETL solutions is also emerging that might find a middle ground.