
Azure Data Engineering Roadmap

Stage 1: Python and SQL

  • WHY PYTHON?

  • Python is crucial for data engineers because it is a versatile, readable language with extensive libraries, which makes data manipulation and analysis efficient across a wide range of data engineering tasks.

  • Steps:

  1. Watch this awesome 1-hour video from Programming with Mosh for a basic introduction to Python's syntax and concepts.

  2. Practice as much as possible using W3Schools.
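  Here is a small taste of the data manipulation mentioned above: a minimal sketch that uses only Python's standard library. The CSV content, column names, and the function `total_sales_by_region` are made up for illustration. Real pipelines would usually read from files or databases and often use libraries such as pandas.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data standing in for a CSV file extracted from a source system.
RAW_CSV = """order_id,region,amount
1,North,120.50
2,South,80.00
3,North,42.25
4,East,
5,South,19.75
"""


def total_sales_by_region(raw_csv: str) -> dict[str, float]:
    """Parse CSV text, skip rows with a missing amount, and sum amounts per region."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(raw_csv)):
        if not row["amount"]:  # basic data-quality check: drop incomplete rows
            continue
        totals[row["region"]] += float(row["amount"])
    return dict(totals)


if __name__ == "__main__":
    print(total_sales_by_region(RAW_CSV))
```

  Running it prints the per-region totals. The East row is skipped because its amount is missing.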

  • WHY SQL?

  • SQL is important for data engineers because it lets them easily organize, retrieve, and work with data stored in databases.

  • Steps:

  1. Watch this 3-hour video from Programming with Mosh for a fundamental introduction to SQL's syntax and concepts.

  2. Practice as much as possible using W3Schools.
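  To show what organizing, retrieving, and working with data looks like in SQL, here is a minimal sketch that runs SQL through Python's built-in `sqlite3` module against an in-memory database. The `employees` table and its rows are hypothetical. The same CREATE, INSERT, and SELECT ... GROUP BY patterns carry over to other SQL databases, although dialects differ in the details.

```python
import sqlite3

# In-memory SQLite database, standing in for any relational database.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Organize: define a table with typed columns.
cur.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary REAL)"
)

# Store: insert rows with a parameterized query (placeholders avoid SQL injection).
cur.executemany(
    "INSERT INTO employees (name, dept, salary) VALUES (?, ?, ?)",
    [("Asha", "Data", 95000), ("Ben", "Data", 85000), ("Chen", "Sales", 60000)],
)
conn.commit()

# Retrieve and aggregate: headcount and average salary per department, highest first.
rows = cur.execute(
    """
    SELECT dept, COUNT(*) AS headcount, AVG(salary) AS avg_salary
    FROM employees
    GROUP BY dept
    ORDER BY avg_salary DESC
    """
).fetchall()

for dept, headcount, avg_salary in rows:
    print(f"{dept}: {headcount} employees, avg salary {avg_salary:.0f}")
```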

Stage 2: Data Warehouse Concepts

  • WHY DATA WAREHOUSE?

  • Understanding data warehouse concepts helps data engineers build organized repositories of information. Just as a librarian arranges books in a well-structured library so they are easy to find, a well-designed data warehouse makes data easy to find and use for analysis.
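  To make the idea of an organized repository concrete, here is a minimal sketch of a tiny star schema: one fact table holding numeric measures and two dimension tables holding descriptive attributes. It uses SQLite through Python's `sqlite3` module purely for convenience. The table names, column names, and data are hypothetical, and a real warehouse would run on a dedicated platform.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables: descriptive attributes used to filter and group.
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
    CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, full_date TEXT, month TEXT);

    -- Fact table: one row per sale, with measures plus foreign keys to the dimensions.
    CREATE TABLE fact_sales (
        date_key INTEGER REFERENCES dim_date(date_key),
        product_key INTEGER REFERENCES dim_product(product_key),
        quantity INTEGER,
        revenue REAL
    );

    INSERT INTO dim_product VALUES (1, 'Laptop', 'Electronics'), (2, 'Desk', 'Furniture');
    INSERT INTO dim_date VALUES (20240101, '2024-01-01', '2024-01'), (20240201, '2024-02-01', '2024-02');
    INSERT INTO fact_sales VALUES
        (20240101, 1, 2, 2000.0),
        (20240101, 2, 1, 300.0),
        (20240201, 1, 1, 1000.0);
    """
)

# Typical analytical query: join the fact table to its dimensions, then aggregate.
query = """
    SELECT d.month, p.category, SUM(f.revenue) AS total_revenue
    FROM fact_sales AS f
    JOIN dim_date AS d ON f.date_key = d.date_key
    JOIN dim_product AS p ON f.product_key = p.product_key
    GROUP BY d.month, p.category
    ORDER BY d.month, p.category
"""
results = conn.execute(query).fetchall()
for row in results:
    print(row)
```

  Every analytical question follows the same pattern (filter and group by dimension attributes, sum the fact measures), which is what keeps this structure easy to query as the data grows.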

  • Download the third edition of the Kimball Group's The Data Warehouse Toolkit, the best book for learning data warehouse concepts.

  • Okay, I hear you 😊 If you are an absolute beginner, this might feel a little overwhelming. To make it easier, I have listed the most important data warehousing topics, which are more than enough to get started as a data engineer. The topics are as follows:

  • TOPICS

  1. What is a Data Warehouse? (YouTube: What Is a Data Warehouse?)
  2. OLAP vs OLTP (YouTube: Explain By Example: OLTP vs OLAP)
  3. What is Normalization? (Normalization Techniques)
  4. What is a Fact Table?
  5. What is a Dimension Table?
  6. Data Modelling: Star Schema vs Snowflake Schema
  7. Slowly Changing Dimensions, Type 1 and Type 2 (YouTube: What is SCD - Slowly Changing Dimension)
  8. What is a Data Mart? Data Mart vs Data Warehouse (YouTube: How Data Mart actually works? We are here to show you!)
  9. What is Extract, Transform, Load (ETL)? (focus on understanding the ETL process)
  10. What is a Data Lake? Data Lake vs Data Warehouse vs Database (YouTube: KNOW the difference between Data Base // Data Warehouse // Data Lake (Easy Explanation👌))
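  Slowly Changing Dimension Type 2 is easier to grasp in code. The minimal sketch below keeps history for a hypothetical customer dimension: when the tracked attribute (here, `city`) changes, the current row is expired by setting `valid_to` and `is_current`, and a new row with a new surrogate key is added. A Type 1 change would simply overwrite `city` and keep no history. `CustomerDimRow` and `apply_scd2` are illustrative names, not from any library; production warehouses usually implement this with a SQL MERGE statement.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class CustomerDimRow:
    customer_key: int          # surrogate key: unique per version of a customer
    customer_id: str           # natural (business) key from the source system
    city: str                  # attribute whose history we want to keep
    valid_from: date
    valid_to: Optional[date]   # None means the row is still current
    is_current: bool


def apply_scd2(dim: list[CustomerDimRow], customer_id: str, new_city: str,
               change_date: date) -> list[CustomerDimRow]:
    """Return a new dimension list with an SCD Type 2 change applied for one customer."""
    current = next(
        (r for r in dim if r.customer_id == customer_id and r.is_current), None
    )
    if current is not None and current.city == new_city:
        return list(dim)  # tracked attribute unchanged: history stays as-is

    updated = [r for r in dim if r is not current]
    if current is not None:
        # Expire the old version instead of overwriting it (Type 1 would overwrite).
        updated.append(replace(current, valid_to=change_date, is_current=False))

    # New version gets the next surrogate key and becomes the current row.
    next_key = max((r.customer_key for r in dim), default=0) + 1
    updated.append(
        CustomerDimRow(next_key, customer_id, new_city, change_date, None, True)
    )
    return updated


if __name__ == "__main__":
    dim = [CustomerDimRow(1, "C001", "Pune", date(2023, 1, 1), None, True)]
    dim = apply_scd2(dim, "C001", "Mumbai", date(2024, 6, 1))
    for row in dim:
        print(row)
```

  After the change, the dimension holds two rows for customer C001: the expired Pune row and the current Mumbai row, so facts loaded before June 2024 can still be reported against the old city.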

Stage 3: AZ-900 - Microsoft Azure Fundamentals Certification

  • Why AZ-900?

  • Completing AZ-900 is important because it provides a foundational understanding of Microsoft Azure, which is essential for anyone looking to build a career in cloud computing.

  • Certification Info:

  • Exam AZ-900: Microsoft Azure Fundamentals - Certifications | Microsoft Learn

  • How to Prepare?

  • There are lots of free resources available on the Internet for AZ-900. If you are a video person like me, who likes to learn things by watching videos, you can watch any ONE of the videos below (whichever you prefer) to prepare for the exam.

  1. FreeCodeCamp.org
  2. Adam Marczak
  3. Edureka
  • Test your Learnings!

  • Once you have learned the AZ-900 concepts, it’s time to test your knowledge. A website called ExamTopics hosts DUMPS (questions reported from the real exams) for the certifications, which you can use to answer questions and check your understanding. Make sure you learn all the questions before you book the exam. Each question also has a discussion tab: read the comments there and validate the correct answer (usually the most highly voted one), because the answer shown for a question is sometimes wrong. Please go through the discussion tab for every question.

  • Book the Exam

  • Once you have learned all the topics and practiced all the DUMPS questions, you can book the exam (it’s an online exam) on Microsoft Learn: Exam AZ-900: Microsoft Azure Fundamentals - Certifications | Microsoft Learn

  • Watch this video to learn how to book the exam.

Stage 4: Azure Data Tools

  • Create a Free Azure Account
  • After creating a free account, try creating the different Azure data tools yourself, following along with the video series below to get a better understanding of how each tool works.

  • Azure Data Factory

  • Azure Data Factory (ADF) is a cloud-based Extract, Transform, Load (ETL) tool provided by Microsoft Azure that helps organizations move and transform data from various sources to destinations. Think of it as a data orchestration tool that allows you to create, schedule, and manage ETL data pipelines.

    1. YT ADF Channel - Mr. K Talks Tech - 10 Videos
    2. YT ADF Channel - WafaStudies - 107 Videos
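  ADF pipelines are usually authored visually in ADF Studio, but the extract, transform, and load steps they orchestrate can be sketched locally. The minimal example below is not ADF's API. It is a plain Python stand-in, assuming a CSV source and a SQLite destination, where each function plays the role of a pipeline activity (for example, a Copy activity for moving data) and `run_pipeline` plays the role of the pipeline that chains them. All names and data are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical source extract; in ADF the source would be a dataset on a linked service.
SOURCE_CSV = """id,name,signup_date,country
1, alice ,2024-01-05,in
2,BOB,2024-02-10,US
3,,2024-03-01,uk
"""


def extract(raw: str) -> list[dict]:
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(raw)))


def transform(rows: list[dict]) -> list[tuple]:
    """Transform: drop rows without a name, tidy text, and standardize country codes."""
    cleaned = []
    for r in rows:
        name = r["name"].strip()
        if not name:
            continue
        cleaned.append((int(r["id"]), name.title(), r["signup_date"], r["country"].upper()))
    return cleaned


def load(rows: list[tuple], conn: sqlite3.Connection) -> None:
    """Load: write the cleaned rows into the destination table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers "
        "(id INTEGER PRIMARY KEY, name TEXT, signup_date TEXT, country TEXT)"
    )
    # INSERT OR REPLACE keyed on id makes reruns overwrite rather than duplicate.
    conn.executemany("INSERT OR REPLACE INTO customers VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def run_pipeline(raw: str, conn: sqlite3.Connection) -> None:
    """Orchestrate the three steps in order, like activities chained in a pipeline."""
    load(transform(extract(raw)), conn)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    run_pipeline(SOURCE_CSV, conn)
    print(conn.execute("SELECT * FROM customers ORDER BY id").fetchall())
```

  Making the load step idempotent matters in practice, because a scheduled pipeline that fails halfway is often simply rerun.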
  • Azure Synapse Analytics

  • Azure Synapse Analytics is a cloud-based analytics service by Microsoft Azure which offers big data and data warehousing functionalities. The platform offers a unified experience for data professionals, facilitating collaboration and efficient analysis through integrated workspaces and notebooks.

    YT Azure Synapse Channel - WafaStudies - 59 Videos

  • Azure Databricks

  • Azure Databricks is a cloud-based big data analytics platform provided by Microsoft Azure in collaboration with Databricks. It combines Apache Spark, a powerful open-source analytics engine, with Azure’s cloud services to provide a fast, easy, and collaborative environment for big data and machine learning.

  • Resources to learn Azure Databricks

    1. YT Databricks Channel - Raja’s Data Engineering - 109 Videos
    2. YT Databricks Channel - Spark Fundamentals (Theory) - 26 Videos
    3. YT Databricks Channel - Spark Practical (DataFrame API) - 25 Videos
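  A core Spark idea is that data is split into partitions and processed in parallel. The plain-Python sketch below mimics how a grouped sum is computed in that model: each partition pre-aggregates its own rows, a shuffle routes every key to exactly one reducer, and the reducers merge the partial results. It is a conceptual illustration, not Spark code; in PySpark the same result would come from `df.groupBy("region").agg(F.sum("amount"))` with `from pyspark.sql import functions as F`. The records and function names are hypothetical.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical (region, amount) records; Spark would spread these across partitions.
records = [
    ("North", 10), ("South", 5), ("North", 7),
    ("East", 3), ("South", 8), ("North", 1),
]


def partition(rows, num_partitions):
    """Split rows round-robin into partitions, as if distributed across executors."""
    parts = [[] for _ in range(num_partitions)]
    for i, row in enumerate(rows):
        parts[i % num_partitions].append(row)
    return parts


def partial_sum(rows):
    """Map side: each partition pre-aggregates its own rows."""
    acc = defaultdict(int)
    for key, value in rows:
        acc[key] += value
    return dict(acc)


def shuffle(partials, num_reducers):
    """Shuffle: send each key's partial sums to one reducer chosen by hashing the key."""
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for partial in partials:
        for key, value in partial.items():
            buckets[hash(key) % num_reducers][key].append(value)
    return buckets


def final_sum(bucket):
    """Reduce side: merge the partial sums for every key in this bucket."""
    return {key: sum(values) for key, values in bucket.items()}


def distributed_group_sum(rows, num_partitions=3, num_reducers=2):
    partials = [partial_sum(p) for p in partition(rows, num_partitions)]
    results = [final_sum(b) for b in shuffle(partials, num_reducers)]
    # Each key lives in exactly one bucket, so merging the dicts cannot collide.
    return reduce(lambda a, b: {**a, **b}, results, {})


if __name__ == "__main__":
    print(sorted(distributed_group_sum(records).items()))
```

  Pre-aggregating before the shuffle means much less data has to move between machines, which is why the partial-then-final pattern is used for grouped aggregations.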
  • Azure Data Lake

  • Azure Data Lake Storage is a cloud-based storage service provided by Microsoft Azure that is specifically designed for big data analytics. It allows organizations to capture, store, process, and analyze large amounts of data in a scalable and cost-effective way. Azure Data Lake Storage is often used in conjunction with other Azure services, such as Azure Databricks and Azure Data Factory, to build comprehensive big data and analytics solutions.

  • Watch the two videos below to understand more about Azure Data Lake:

  1. Azure Data Lake
  2. Azure Data Lake
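  One common way to keep a data lake organized is to write files into Hive-style partition folders such as `year=2024/month=05/`, so that query engines such as Spark can skip folders a query does not need. The minimal sketch below writes such a layout with Python's standard library, using a local temporary folder as a stand-in for an ADLS Gen2 container (whose paths look like `abfss://<container>@<account>.dfs.core.windows.net/...`). The `raw` zone name, the event records, and `write_partitioned` are hypothetical, and real lakes usually store Parquet or Delta files rather than CSV.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical events; a local temp folder stands in for a data lake container.
events = [
    {"event_id": "e1", "event_date": "2024-05-01", "value": "10"},
    {"event_id": "e2", "event_date": "2024-05-01", "value": "4"},
    {"event_id": "e3", "event_date": "2024-06-15", "value": "7"},
]


def write_partitioned(rows, root: Path, zone: str = "raw") -> list[Path]:
    """Write rows into Hive-style folders: <zone>/year=YYYY/month=MM/data.csv."""
    by_partition = {}
    for row in rows:
        year, month, _ = row["event_date"].split("-")
        by_partition.setdefault((year, month), []).append(row)

    written = []
    for (year, month), part_rows in sorted(by_partition.items()):
        folder = root / zone / f"year={year}" / f"month={month}"
        folder.mkdir(parents=True, exist_ok=True)
        path = folder / "data.csv"
        with path.open("w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(part_rows[0].keys()))
            writer.writeheader()
            writer.writerows(part_rows)
        written.append(path)
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for p in write_partitioned(events, Path(tmp)):
            print(p.relative_to(tmp))
```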
  • Microsoft Fabric

  • Microsoft Fabric is an all-in-one analytics solution for enterprises that covers everything from data movement to data science, Real-Time Analytics, and business intelligence. It offers a comprehensive suite of services, including data lake, data engineering, and data integration, all in one place.

  • Spend the entire fourth month learning more about these 5 important Azure Data Engineering tools.

  • The video playlists provided above are a great way for anyone to get familiar with these tools. By the end of the fourth month of this 6-month challenge, you will have a good knowledge of Python and SQL, a foundational understanding of how Azure works in general, and, most importantly, an idea of the most widely used data engineering tools in Azure.

Stage 5: DP-203 Azure Data Engineer Associate

  • DP-203 is the Microsoft Azure Data Engineer Associate certification exam. This certification is designed for individuals who want to demonstrate their skills as Azure Data Engineers, specializing in implementing data solutions using Azure services.

  • Why should you get DP-203 Certification?

  • Career Advancement: A recognized certification like DP-203 can enhance your career opportunities. Many employers look at certifications to assess a candidate’s expertise and commitment to professional development.

  • Specialized Knowledge: The certification focuses specifically on data engineering tasks in the Azure environment. Earning it shows that you can design and implement data storage, data processing, and data security solutions using Azure services.

  • Azure Data Engineer Role: If you aspire to work in a data engineering role in the Azure ecosystem, this certification is tailored to the skills and competencies that role requires.

  • Resources

  • Free Ones:
  1. Microsoft Azure Data Engineer Associate Certification Training
  2. YT playlist - DP-203
  • Paid One (Optional):

  • The course below is an online course from Udemy. I purchased it myself and found it pretty useful. Given the lack of free resources on the Internet, if you can spend some money, buy this course to learn the DP-203 concepts; it will help you clear the exam easily.

  • Test your Learnings

  • Once you have learned the DP-203 concepts, it’s time to test your knowledge using the ExamTopics DUMPS.

  • Book your exam:

  • Book the exam once you have gone through all the questions on ExamTopics.

This post is licensed under CC BY 4.0 by the author.