How does AI impact data engineers?
Will we lose our jobs?
I invite you to join my paid membership list for only 7$/month (pay annually) to get access to:
This article + other 200+ deep-dive data engineering articles
CLI tools to help you learn data engineering skills → Demo for Spark learning tool
If you’re a student with an education email, use this 50% ANNUAL DISCOUNT
Intro
The main theme of online discussion about jobs right now is: “AI will replace us, it will take our jobs…”
The second main theme is: “AI is not smart enough to handle our jobs, so we are safe (for now).”
You will read about a big tech company that laid off X% of its employees because it can now rely on $20 subscriptions (or maybe the company is simply going downhill and using AI as a fancy excuse).
Or you might read a claim that sounds like it came from a sci-fi movie: “X% of our code is written by AI.”
—
The rise of LLMs and their wrapper applications like ChatGPT, Gemini, and Claude clearly improves life in some ways, but it also instills fear in people.
Let’s set aside, for now, the far-future fear in which AI develops a mind of its own, builds an army of T-800s, and wipes us out.
The more obvious and evidence-based fear is that AI will take our jobs. Many time-travelers predict that jobs A, B, and C will disappear in X, Y, and Z years. And some industries are already seeing AI take people's jobs.
—
At least 10 times this year, I have had folks reach out to me to express their concern that AI will disrupt data engineering: some were afraid that AI would replace them when they saw it could write PySpark jobs, and some were struggling to find a junior position.
My answer is always something generic like this: “Focus on fundamentals and know the right way to do something, as AI will need our feedback to do well.”
Thinking back, that’s a fine answer, but it is not detailed enough to answer the question “How does AI impact data engineers?”
So I decided to sit down and write this article.
—
In this article, we will discuss my personal observations and experiences regarding how AI will impact us (data engineers). We will dive into two big sections: the first examines how AI can boost our productivity (or replace us), and the second explores how our mindset has to change now that AI is also one of our customers.
Note: This is purely my train of thought. Also, I’m somewhat out of date on recent AI innovations, and I’m more on the ‘not-so-hyped-about AI’ side. So, take it with a grain of salt.
tl;dr
Using AI is not optional anymore.
If you stop understanding problems, making decisions, evaluating trade-offs based on the current context and constraints, and communicating with others, you will be replaced by AI. → Learn aggressively to reach a senior level.
If AI somehow becomes able to do these things in the future, we will be doomed; not just data engineers, but all humans.
The demand for leveraging AI in organizations (fine-tuning models or making them an analytics serving layer) forces us, as data engineers, to update our mindset and skill set: from implementing a semantic layer and understanding vector databases to advanced techniques for making AI consistent and reliable.
From that, I personally believe the data engineer role won’t be replaced soon. Those who pursue a data engineering career will mostly end up in one of two situations:
They won’t get a job: their experience and skill set can be replaced by AI.
They have a job with more tasks to do than ever. The CEO might think: “AI can help one person do many things now, so we can save on labor costs.” The workload of the remaining data engineers increases because of the following factors:
The company employs fewer data engineers.
The company board applies heavy pressure because it believes AI can significantly boost an individual’s productivity → more tasks for each individual.
Sloppy AI work causes bugs or disasters, so we now spend more time than ever reviewing (e.g., a commit with 50+ changed files and 1,000+ changed lines).
AI boosts our productivity.
In each section below, we will discuss one main data engineering task. For some tasks, we look at two sub-processes, decision making and implementation, to see AI's impact on the entire process clearly. The order of these sections does not reflect the actual order of the steps in a real-life data engineering process.
Keep in mind: the impact evaluation is my own.
Ingest and move data.
This is one of the most obvious data engineering tasks. Your business user wants some insight; you “link” the insight to the source data, reach the source, and “move” the data.





