Roadmap for Data Engineers
A data engineering roadmap that goes beyond just cracking interviews!
Here is a 32-week, step-by-step plan:
1. Introduction to Big Data / Data Lake Storage (3 weeks)
Big Data: The Big Picture, Linux Commands, Introduction to the Multi-Node Practice Environment, Distributed Storage
2. Distributed Processing with PySpark (10 weeks)
Distributed Processing Fundamentals, Optimization Techniques, Partitioning, Bucketing, Join Optimizations, File Formats, Compression Techniques, Structured API, Spark SQL, Hive, Spark-Hive Integration, Project
3. Azure Databricks (6 weeks)
Databricks File System & Architecture, Delta Lake, Delta Tables, Lakehouse Architecture, Delta Engine Optimizations, Medallion Architecture, Cluster Creation, Auto Loader
4. Azure Data Factory (3 weeks)
Data Ingestion, Data Transformation (Data Flows), Workflow Orchestration, Data Integration Service, Pipeline Triggers on Custom Events, Data Orchestration, Mapping Data Flows, Project
5. Interview Readiness (3 weeks)
Data Modeling | System Design | CI/CD | Interview Preparation
6. Structured Streaming + Kafka + Auto Loader, Project (4 weeks)
7. Azure & AWS Cloud Services (2-3 weeks)
AWS - EMR, Redshift, Athena, Glue, S3, Lambda
Additional Modules:
Python for Data Engineers
DSA for Data Engineers (Targeting Product-Based Companies)
I follow the same roadmap in my "Ultimate Big Data Masters Program".
It's an extensive 32-week program that goes beyond just cracking interviews!
The program covers Big Data end to end, with a focus on the cloud and surrounding technologies.
If you are looking to join this life-changing Big Data program, DM me to know more. A new batch starts tomorrow.