From Software to Data Engineering

I started working as a data engineer when my team and I were tasked with automating some datasets and reports for the businesses and brands we supported. The challenge was significant: these sites had no established data pipelines and relied solely on basic event capture. What began as a three-month project ignited my passion for data and analytics. In this post, I’ll share my experiences and key lessons from the past year.

What is Data Engineering?

Data engineering is a specialized field within software engineering focused on designing systems that transfer and transform data. The goal is to make data accessible and usable for end-users, facilitating decision-making and insights.
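
To make "transfer and transform" concrete, here is a minimal sketch of an extract-transform-load flow in plain Python. It is only an illustration, not the pipeline my team built: the event data, the column names, and the extract, transform, and load function names are all made up for this example, and a real pipeline would read from an event store and write to a warehouse instead of working in memory.

```python
import csv
import io
from collections import defaultdict

# Hypothetical raw event capture: one row per event (illustrative data only).
RAW_EVENTS = """brand,event,user_id
brand_a,page_view,u1
brand_a,page_view,u2
brand_b,page_view,u1
brand_a,purchase,u1
"""


def extract(raw_text):
    """Read raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Count events per (brand, event) pair."""
    counts = defaultdict(int)
    for row in rows:
        counts[(row["brand"], row["event"])] += 1
    return dict(counts)


def load(counts):
    """Build sorted report rows; a real pipeline would write these to a table."""
    return [
        {"brand": brand, "event": event, "count": n}
        for (brand, event), n in sorted(counts.items())
    ]


report = load(transform(extract(RAW_EVENTS)))
for line in report:
    print(line)
```

The three functions mirror the usual split: extract moves the data, transform reshapes it into something a business user can read, and load hands it to wherever the report lives.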

What Did I Have to Do?

To provide some context, my company manages over 40 distinct brands and businesses. While each team operates independently, we all benefit from a shared technology and data backbone. This structure makes it easy to move between teams and keeps our approach to data management consistent.

Learn the Stack

Our data stack is based on Spark/Scala, Airflow, and Redshift. To get up to speed, I took a few Udemy courses covering the basics of each component. Completing end-to-end projects to get some practice really helped me wrap my head around them. I also had plenty of references and support from other teams along the way.

Set Up the Stack

In my case, we were forming a new data team for a set of businesses managed from our office. The first step was to spin up our own data infrastructure and connect it to the central data infrastructure, where the data lake and Databricks were located. For this step, I had to polish my AWS skills and learn Terraform, since that is how we deploy our infrastructure. Cloud Guru and Udemy were great resources here too, but nothing beats doing the actual thing.

Learn – Try, Rinse, and Repeat – until we were ready

Our team's first few months were mainly spent learning by trial and error, backed by a great support system across the company. We also kept a direct line of communication with the business to make sure we were on track to deliver what they needed.

What Would I Do Differently?

Having worked at an enterprise-level company, I’ve navigated numerous pre-established processes and guidelines. If given the chance to design a data stack from scratch, I would lean toward a fully managed, modern approach to minimize the operational burden. Even so, the choice of technology must align with specific business needs and the existing data infrastructure. The landscape of data tools has evolved rapidly, and many problems that used to be hard are now much simpler to solve.