9 software engineering skills a DE should have and how to learn them effectively.
Make your work more reliable
With only $7/month (billed annually), you can access all the materials you need to grow from junior → senior DE.
200+ deep-dive data engineering articles
practice-spark: 65 LeetCode-style problems to practice Spark SQL/DataFrame
learn-spark/dbt/airflow: CLI tools to master Spark/dbt/Airflow
If you’re a student with an education email, use this 50% ANNUAL DISCOUNT
If you’re a Vietnamese user, please DM me for an upgrade due to payment issues. As compensation for the inconvenience, you’ll get 50% OFF the annual plan.
Intro
In recent years, people have called for applying software engineering best practices to data engineering, from CI/CD and testing to environment separation and observability.
The motivation is understandable: if software engineering best practices could ensure software quality, we hope they could do the same for data quality.
I think that’s a great idea.
But it also means we have to expand our technical breadth even more, and that journey can be confusing.
To clear the mist, I wrote this article to list the software engineering skills and areas that a data engineer must equip themselves with, along with an approach to learning them effectively. I hope this makes your roadmap a bit clearer.
Before jumping into those skills, let's first understand what software engineering is.
This article is based purely on my experience and observations. If you find I missed anything, feel free to let me know.
What is software engineering?
I have a confession.
For the first few years of my career as a data engineer, I thought software engineering just meant creating working code.
There's nothing more to say about it: I was wrong. That belief made me focus only on writing code and ignore everything else.
Software engineering is the discipline of building systems that keep working, even when requirements change, when bugs arise, and when the person who originally created them has left the company. Software engineering makes us think about maintainability, reliability, testability, productivity, and scalability.
For data engineers, seeing software engineering as more than coding is crucial. It is no exaggeration to say that software engineering practice is 90% about making things work reliably, especially in our context, where even a small mistake could make people lose trust in the data we provide.
The rest of this article covers the specific software engineering skills I believe we should all have.
Writing code that other people (and future you) can understand
I said coding is not software engineering, but I'm listing it as the first thing to learn here. ¯\_(ツ)_/¯
(Saying this might get me into a lot of trouble →) Writing code is easy.
But writing understandable code is hard. It takes time.
Writing understandable code means expressing your logic clearly enough that another engineer (or even you, six months from now) can understand, modify, and extend it without spending a whole day figuring out what the current code is doing.
I don’t think I need to say much about why we must write understandable code. Chances are high that you’ve already inherited someone’s code and, a week later, still didn’t fully understand it or have the courage to change anything with 100% certainty that it wouldn’t break.
Writing understandable code makes you a less selfish person.
It also makes you better at writing code. To write understandable code, you must first clarify your own thought process, which makes you better at reasoning and expressing ideas.
—
How to learn?
To get there, start simple. Give variables meaningful names, write functions that do one thing and modules that have a clear purpose, and add comments where the code alone is not enough. Then pay attention to design patterns (general Python ones or data-pipeline-specific ones) and follow your programming language’s best practices or your company’s conventions.
To me, the best way to learn is to take part in the code review process on both sides: having your code reviewed and reviewing someone else’s code. You get feedback from others while learning from how they write code.
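To make the "start simple" advice concrete, here is a minimal sketch in plain Python. Everything in it is made up for illustration: the `Order` record, the `"completed"` status, the 10% `VAT_RATE`, and the rounding rule are assumptions, not something from a real pipeline. The first function is the kind of code we often inherit; the rest is one possible way to express the same logic with meaningful names, single-purpose functions, and a comment only where the code can't explain itself.

```python
from dataclasses import dataclass


# Hard to read: cryptic names, positional indexes, and a magic number.
def proc(d):
    r = 0
    for x in d:
        if x[2] == "completed":
            r += x[1] * 1.1
    return r


# Easier to read: the same logic, with names that carry the meaning.
VAT_RATE = 0.10  # hypothetical rate, named so nobody has to guess what 1.1 means


@dataclass(frozen=True)
class Order:
    order_id: str
    amount: float
    status: str


def is_billable(order: Order) -> bool:
    """An order counts toward revenue only once it is completed."""
    return order.status == "completed"


def amount_with_vat(order: Order) -> float:
    """Return the VAT-inclusive amount of a single order."""
    # Assumed business rule: round per order (not on the total) so each
    # value matches what would be printed on that order's invoice.
    return round(order.amount * (1 + VAT_RATE), 2)


def total_billable_revenue(orders: list[Order]) -> float:
    """Sum the VAT-inclusive amounts of all billable orders."""
    return round(sum(amount_with_vat(o) for o in orders if is_billable(o)), 2)


orders = [
    Order("a", 100.0, "completed"),
    Order("b", 50.0, "cancelled"),
    Order("c", 20.0, "completed"),
]
print(total_billable_revenue(orders))
```

Neither version is "the" right way to write it. The point is only that in the second one, a reviewer can tell what each piece is for without asking the author.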
Another effective approach is to read open-source code or, even better, contribute to an open-source project. To enable collaboration, those projects have to be understandable: you will learn a lot from the way they organize code, name variables, express if-else logic, and handle edge cases.
Read the code and “steal” some of their methods. If you can, contribute to the project, and you will get a code review from the project’s maintainers, who have a lot of experience writing understandable code.
Version Control
It’s more than Git.






