Data Engineering for Beginner using Google Cloud & Python

Basic data engineering: Python, pandas, Google Cloud Platform (GCP) BigQuery, Spark on Dataproc, GCS, data warehouse

“Data is the new oil”.

What you’ll learn

  • Basic data engineering: what it is, why it is needed, and how to get started from zero.
  • Relational database model: database modelling with normalized design, plus hands-on practice using PostgreSQL and Python/pandas.
  • NoSQL database model: denormalized design, plus hands-on practice using Elasticsearch and Python/pandas.
  • Introduction to Spark and Spark clusters using Google Cloud Platform.

Course Content

  • Introduction –> 3 lectures • 11min.
  • Introduction to Data Engineering –> 3 lectures • 15min.
  • Database –> 8 lectures • 1hr 5min.
  • Relational Database Model –> 23 lectures • 1hr 39min.
  • NoSQL Database Model –> 8 lectures • 32min.
  • Data Warehouse –> 10 lectures • 1hr 25min.
  • Numbers Every Engineer Should Know –> 3 lectures • 16min.
  • Hadoop & Spark –> 10 lectures • 1hr 6min.
  • Spark Cluster on Google Cloud (Dataproc) –> 3 lectures • 39min.
  • Data Lake –> 4 lectures • 51min.
  • Resources & References –> 2 lectures • 4min.


Requirements

  • Basic knowledge of SQL and Python

Description

“Data is the new oil”.

You might have heard the quote before. Data in the digital era is as valuable as oil was in the industrial era. However, just like crude oil, raw data is not usable on its own. Its value is created when it is gathered completely and accurately, connected to other relevant data, and delivered in a timely manner.

Data engineers design and build pipelines that transform and transport data into a usable format. Other roles, such as data scientists or machine learning engineers, can then turn that data into valuable business insights, just as crude oil is refined into petrol through a complex process.
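To make the pipeline idea more concrete, here is a minimal extract-transform-load sketch. It uses only Python's standard library (`csv` and `sqlite3`) so it runs anywhere; the course itself covers this step with pandas, and the sample records and the `orders` table below are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical raw export: messy whitespace and casing, one missing amount,
# and every value stored as text.
RAW_CSV = """order_id,customer,amount
1, alice ,10.50
2,BOB,
3,Carol,7.25
"""


def extract(raw_text):
    """Extract: read raw CSV rows into dictionaries."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Transform: trim and title-case names, cast types, drop rows without an amount."""
    clean = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip incomplete records instead of loading bad data
        clean.append((int(row["order_id"]), row["customer"].strip().title(), float(amount)))
    return clean


def load(rows, conn):
    """Load: write the cleaned rows into a (hypothetical) target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for a real warehouse
    load(transform(extract(RAW_CSV)), conn)
    print(conn.execute("SELECT * FROM orders ORDER BY order_id").fetchall())
```

Keeping each stage as a separate function mirrors how real pipelines are organized: the extract source or the load target can be swapped (for example, a file in GCS or a BigQuery table) without touching the transformation logic.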

Becoming a data engineer requires a lot of data literacy and practice. This course is a first step for those who want to learn about data engineering. It combines theory and hands-on exercises to introduce you to the field. Because the data field is very broad, this course covers basic, entry-level knowledge of data engineering processes and tools.

This course is well suited to building a foundation for entering the data field. In this course, we will learn about:

  • Introduction to data engineering
  • Relational & non-relational databases
  • Relational & non-relational data models
  • Table normalization
  • Fact & dimension tables
  • Table denormalization for data warehouse
  • ETL (Extract, Transform, Load) & data staging using Python pandas
  • Elasticsearch basics
  • Data warehouse
  • Numbers every engineer should know & how they relate to big data
  • Hadoop
  • Spark cluster on Google Cloud Dataproc
  • Data lake
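As a small illustration of fact & dimension tables and denormalization, the sketch below stores sales in a normalized layout (a fact table that only holds a customer key, plus a customer dimension table) and then builds a denormalized wide table for analysis. It uses Python's built-in `sqlite3` as a stand-in for PostgreSQL or BigQuery; the table names and sample rows are hypothetical.

```python
import sqlite3


def create_star_schema(conn):
    """Normalized design: customer attributes are stored once, facts hold only a key."""
    conn.executescript(
        """
        -- Hypothetical dimension table: one row per customer.
        CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT);
        -- Hypothetical fact table: one row per sale, referencing the dimension by key.
        CREATE TABLE fact_sales (
            sale_id INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES dim_customer(customer_id),
            amount REAL
        );
        INSERT INTO dim_customer VALUES (1, 'Alice', 'Jakarta'), (2, 'Bob', 'Bandung');
        INSERT INTO fact_sales VALUES (10, 1, 10.5), (11, 2, 4.0), (12, 1, 7.25);
        """
    )


def build_denormalized(conn):
    """Denormalize: join each fact with its dimension attributes into one wide table."""
    conn.execute("DROP TABLE IF EXISTS sales_wide")
    conn.execute(
        """
        CREATE TABLE sales_wide AS
        SELECT f.sale_id, c.name AS customer_name, c.city, f.amount
        FROM fact_sales AS f
        JOIN dim_customer AS c ON c.customer_id = f.customer_id
        """
    )
    return conn.execute("SELECT * FROM sales_wide ORDER BY sale_id").fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_star_schema(conn)
    for row in build_denormalized(conn):
        print(row)
    # Analytic queries on the wide table need no join, e.g. total sales per city.
    print(conn.execute(
        "SELECT city, SUM(amount) FROM sales_wide GROUP BY city ORDER BY city"
    ).fetchall())
```

The trade-off is the one the course contrasts: the normalized tables avoid repeating customer details (updating a city touches one row), while the denormalized table repeats them on every sale so that warehouse-style queries can skip the join.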

Important Notes

The data field is HUGE! This course will be continuously updated, but for the time being it contains an introduction to the concepts and sample hands-on exercises for data engineering.

For now, this course is intended for beginners in data engineering.

If you have some programming experience and are curious about data engineering, this course is for you.

If you already have experience in the data engineering field, this course might be too basic for you (although I would be very happy if you still purchased it).

If you have never written Python or SQL before, this course is not for you. To follow the course, you need basic knowledge of SQL and Python.