Best ETL Courses

Find the best online ETL courses for you. The courses are sorted based on popularity and user ratings. We do not allow paid placements in any of our rankings. We also have a separate page listing only the free ETL courses.

Learn ETL using SSIS

Microsoft SQL Server Integration Services (SSIS) Training

Created by Rakesh Gopalakrishnan - Over 260,000 Students

Students: 52667, Price: Free

Go from absolute beginner to writing and deploying production-quality packages.

In this course we will learn the basic and advanced concepts of SQL Server Integration Services (SSIS). We will walk through the different tools provided by SSIS to extract, transform, and load data into various databases. This course can be followed along with me, provided you have Windows OS or a Windows VM. There are no prerequisites for the course. By the end of this course, you will be comfortable building an ETL package, moving data between systems, transforming data using SSIS controls like Fuzzy Lookup, Web Service tasks, and Email tasks, and configuring and deploying production-quality packages with features like SSIS logging and checkpoints. Hope you enjoy the course.

OrientDB – Getting Started with Graph and Document Databases

A full introductory course to OrientDB, the Document-Graph database. Learn about NoSQL's latest trend: GraphDBs!

Created by OrientDB Team - Free OrientDB Getting Started Training

Students: 39726, Price: Free

Graph databases are among the fastest-growing trends in technology. Helping you effectively manage modern, highly connected data is the key benefit of OrientDB. This course will provide you with a comprehensive overview of the multiple models supported by OrientDB, with a bigger focus on graph and document principles, and will walk you through hands-on examples of working with the database and its API.

Pentaho for ETL & Data Integration Masterclass 2021- PDI 9.0

Use the Pentaho Data Integration tool for ETL & data warehousing. Do ETL development using PDI 9.0 without a coding background

Created by Start-Tech Academy - 3,000,000+ Enrollments | 4+ Rated | 160+ Countries

Students: 39437, Price: $29.99

What is ETL?

The ETL (extract, transform, load) process is the most popular method of collecting data from multiple sources and loading it into a centralized data warehouse. ETL is an essential component of data warehousing and analytics.
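To ground the definition, here is a minimal, hedged Python sketch of the three stages (the file, column names, and SQLite target are invented stand-ins, not from any particular course):

```python
import csv
import sqlite3

# Extract: read raw rows from a source file (hypothetical sales.csv)
with open("sales.csv", newline="") as f:
    rows = list(csv.DictReader(f))

# Transform: apply simple business rules (normalize names, cast amounts, drop blanks)
cleaned = [
    {"customer": r["customer"].strip().title(), "amount": float(r["amount"])}
    for r in rows
    if r["amount"]
]

# Load: write the transformed rows into a central table (SQLite stands in for a warehouse)
conn = sqlite3.connect("warehouse.db")
conn.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (customer, amount) VALUES (:customer, :amount)", cleaned
)
conn.commit()
conn.close()
```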

Why Pentaho for ETL?

Pentaho has phenomenal ETL, data analysis, metadata management, and reporting capabilities. Pentaho is faster than other ETL tools (including Talend), and its user-friendly GUI is easier and quicker to learn, which makes it great for beginners. Pentaho Data Integration (PDI) is also an important skill in the data analytics field.

How much can I earn?

In the US, the median salary of an ETL developer is $74,835, and in India the average salary is Rs. 7,06,902 per year. Accenture, Tata Consultancy Services, Cognizant Technology Solutions, Capgemini, IBM, Infosys, and other major firms recruit people skilled in ETL tools, and Pentaho ETL is one of the most sought-after skills recruiters look for. Demand for Pentaho Data Integration (PDI) techniques is increasing day by day.

What makes us qualified to teach you?

The course is taught by Abhishek and Pukhraj, who have been teaching data science and machine learning for over a decade. We have experience teaching and implementing Pentaho ETL and Pentaho Data Integration (PDI) for data mining and data analysis purposes.

We are also the creators of some of the most popular online courses, with over 150,000 enrollments and thousands of 5-star reviews like these:

I had an awesome moment taking this course. It broaden my knowledge more on the power use of Excel as an analytical tools. Kudos to the instructor! - Sikiru

Very insightful, learning very nifty tricks and enough detail to make it stick in your mind. - Armand

Our Promise

Teaching our students is our job and we are committed to it. If you have any questions about the course content on Pentaho, ETL, the practice sheets, or anything related to any topic, you can always post a question in the course or send us a direct message.

Download Practice files, take Quizzes, and complete Assignments

With each lecture, there is a practice sheet attached for you to follow along. You can also take quizzes to check your understanding of Pentaho, ETL, and Pentaho Data Integration concepts. Each section contains a practice assignment for you to practically implement your learning, and solutions to the assignments are shared so that you can review your performance.

By the end of this course, your confidence in using Pentaho ETL and Pentaho Data Integration (PDI) will soar. You'll have a thorough understanding of how to use Pentaho for ETL and Pentaho Data Integration (PDI) techniques for study or as a career opportunity.

Go ahead and click the enroll button, and I'll see you in lesson 1 of this Pentaho ETL course!

Cheers

Start-Tech Academy

Data Engineer/Data Scientist – Power BI/ Python/ ETL/SSIS

Hands-on Data Interaction and Manipulation.

Created by Bluelime Learning Solutions - Learning made simple

Students: 33129, Price: $89.99

A common problem that organizations face is how to gather data from multiple sources, in multiple formats, and move it to one or more data stores. The destination may not be the same type of data store as the source, and often the format is different, or the data needs to be shaped or cleaned before loading it into its final destination.

Extract, transform, and load (ETL) is a data pipeline used to collect data from various sources, transform the data according to business rules, and load it into a destination data store.

SQL Server Integration Services (SSIS) is a useful and powerful Business Intelligence tool. It is best suited to working with SQL Server databases. It is added to SQL Server when you install SQL Server Data Tools (SSDT), which adds the Business Intelligence templates to Visual Studio that are used to create Integration Services projects.

SSIS can be used for:

  •  Data Integration

  •  Data Transformation

  •  Providing solutions to complex Business problems

  •  Updating data warehouses

  •  Cleaning data

  •  Mining data

  •  Managing SQL Server objects and data

  •  Extracting data from a variety of sources

  •  Loading data into one or several destinations

Power BI is a business analytics solution that lets you visualize your data and share insights across your organization, or embed them in your app or website. Connect to hundreds of data sources and bring your data to life with live dashboards and reports.

Discover how to quickly glean insights from your data using Power BI. This formidable set of business analytics tools—which includes the Power BI service, Power BI Desktop, and Power BI Mobile—can help you more effectively create and share impactful visualizations with others in your organization.

In this beginner's course you will learn how to get started with this powerful toolset. We will cover topics like connecting to and transforming web-based data sources. You will learn how to publish and share your reports and visuals on the Power BI service.

Data science is the study of data. It involves developing methods of recording, storing, and analyzing data to effectively extract useful information.

Data is a fundamental part of our everyday work, whether in the form of valuable insights about our customers or information to guide product, policy, or systems development. Big business, social media, finance, and the public sector all rely on data scientists to analyse their data and draw out business-boosting insights.

Python is a dynamic, modern, object-oriented programming language that is easy to learn and can be used to do a lot of things, both big and small. Python is what is referred to as a high-level language, meaning it is closer to humans than to computers. It is also known as a general-purpose programming language due to its flexibility. Python is used a lot in data science.

This is a beginner's course that will introduce you to some basics of data science using Python.

What You Will Learn

  • How to set up an environment to explore data using Jupyter Notebook

  • How to import Python libraries into your environment

  • How to work with tabular data

  • How to explore a Pandas DataFrame (see the short sketch after this list)

  • How to explore a Pandas Series

  • How to manipulate a Pandas DataFrame

  • How to clean data

  • How to visualize data
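As a hedged taste of the Pandas topics above (a minimal sketch; the column names and values are invented for illustration, not taken from the course):

```python
import pandas as pd

# Build a small tabular dataset (a stand-in for data imported from a file)
df = pd.DataFrame(
    {"product": ["pen", "book", "pen"], "price": [1.5, 12.0, 1.2]}
)

# Explore the DataFrame and one of its Series
print(df.head())           # first rows of the DataFrame
print(df["price"].mean())  # aggregate a single Series

# Manipulate and clean: filter rows, then drop duplicate products
cheap = df[df["price"] < 10].drop_duplicates(subset="product")
print(cheap)
```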

Data Analyst – ETL/SSIS/SQL/PowerBI

Learn to extract, transform, and analyse data.

Created by Bluelime Learning Solutions - Learning made simple

Students: 19255, Price: $44.99

Data analysts are in high demand across all sectors, such as finance, consulting, manufacturing, pharmaceuticals, government and education.

The ability to pay attention to detail, communicate well and be highly organised are essential skills for data analysts. They not only need to understand the data, but be able to provide insight and analysis through clear visual, written and verbal communication.

A common problem that organizations face is how to gather data from multiple sources, in multiple formats, and move it to one or more data stores. The destination may not be the same type of data store as the source, and often the format is different, or the data needs to be shaped or cleaned before loading it into its final destination.

Extract, transform, and load (ETL) is a data pipeline used to collect data from various sources, transform the data according to business rules, and load it into a destination data store.

SQL Server Integration Services (SSIS) is a useful and powerful Business Intelligence tool. It is best suited to working with SQL Server databases. It is added to SQL Server when you install SQL Server Data Tools (SSDT), which adds the Business Intelligence templates to Visual Studio that are used to create Integration Services projects.

SSIS can be used for:

  •  Data Integration

  •  Data Transformation

  •  Providing solutions to complex Business problems

  •  Updating data warehouses

  •  Cleaning data

  •  Mining data

  •  Managing SQL Server objects and data

  •  Extracting data from a variety of sources

  •  Loading data into one or several destinations

SQL is a standard language for accessing and manipulating databases.

SQL stands for Structured Query Language

What Can SQL do? (a short sketch follows this list)

  • SQL can execute queries against a database

  • SQL can retrieve data from a database

  • SQL can insert records in a database

  • SQL can update records in a database

  • SQL can delete records from a database

  • SQL can create new databases

  • SQL can create new tables in a database

  • SQL can create stored procedures in a database

  • SQL can create views in a database

  • SQL can set permissions on tables, procedures, and views
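To make a few of these capabilities concrete, here is a hedged sketch using Python's built-in sqlite3 module (the table and data are invented; T-SQL syntax on SQL Server differs in places):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # a throwaway in-memory database

# Create a new table in the database
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")

# Insert records
conn.execute("INSERT INTO customers (name, city) VALUES ('Ada', 'London')")
conn.execute("INSERT INTO customers (name, city) VALUES ('Grace', 'New York')")

# Execute a query to retrieve data
for row in conn.execute("SELECT name, city FROM customers ORDER BY name"):
    print(row)

# Update and delete records
conn.execute("UPDATE customers SET city = 'Boston' WHERE name = 'Grace'")
conn.execute("DELETE FROM customers WHERE name = 'Ada'")
conn.commit()
```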

Power BI is a business analytics solution that lets you visualize your data and share insights across your organization, or embed them in your app or website. Connect to hundreds of data sources and bring your data to life with live dashboards and reports.

Discover how to quickly glean insights from your data using Power BI. This formidable set of business analytics tools—which includes the Power BI service, Power BI Desktop, and Power BI Mobile—can help you more effectively create and share impactful visualizations with others in your organization.

In this course you will learn how to get started with this powerful toolset. We will cover topics like connecting to and transforming web-based data sources. You will learn how to publish and share your reports and visuals on the Power BI service.

Bootcamp for KNIME Analytics Platform

For users new to KNIME and data science, or experienced users of other data science tools.

Created by KNIME Inc - Data Science and Evangelism

Students: 19142, Price: Free

If you've never used KNIME Analytics Platform before, this is the course for you. You can use KNIME Analytics Platform to create visual workflows with an intuitive, drag-and-drop graphical interface, without the need for coding.

We'll start with installation and setup of the software, and present detailed materials on its features. We'll move on to some practical applications of data blending from different sources, and use real datasets to show you all the different ways you can transform, clean, and aggregate information. Finally, we'll introduce some machine learning algorithms for classification, and show you how to build your own models.

More than 50 videos are provided, along with some exercises for you to work on independently. By the end of the course, we want you to feel comfortable with the interface of KNIME Analytics Platform, be able to perform common processing tasks with your own data, and start putting predictive analytics into practice.

Data Warehouse Developer-SQL Server/ETL/SSIS/SSAS/SSRS/T-SQL

Develop and Implement a Data Warehouse Solution Step by step

Created by Bluelime Learning Solutions - Learning made simple

Students: 17582, Price: $39.99

This course describes how to design and implement a data warehouse solution. Students will learn how to create a data warehouse with Microsoft SQL Server, implement ETL with SQL Server Integration Services, and validate and cleanse data with SQL Server Data Quality Services and SQL Server Master Data Services.

The primary responsibilities of a data warehouse developer include:

  • Implementing a data warehouse.

  • Developing SSIS packages for data extraction, transformation, and loading.

  • Enforcing data integrity by using Master Data Services.

  • Cleansing data by using Data Quality Services.

Prerequisites:

  • Experience of working with relational databases, including designing a normalized database, creating tables and relationships, and querying with Transact-SQL.

  • Some exposure to basic programming constructs (such as looping and branching).

  • An awareness of key business priorities such as revenue, profitability, and financial accounting is desirable.

Students will learn how to:

  • Deploy and configure SSIS packages.

  • Download and install SQL Server.

  • Download and attach the AdventureWorksDW database.

  • Download and install SSDT.

  • Download and install Visual Studio.

  • Describe data warehouse concepts and architecture considerations.

  • Select an appropriate hardware platform for a data warehouse.

  • Design and implement a data warehouse.

  • Implement data flow in an SSIS package.

  • Implement control flow in an SSIS package.

  • Debug and troubleshoot SSIS packages.

  • Implement an ETL solution that supports incremental data extraction.

  • Implement an ETL solution that supports incremental data loading.

  • Implement data cleansing by using Microsoft Data Quality Services.

  • Implement Master Data Services to enforce data integrity.

  • Extend SSIS with custom scripts and components.

  • Compare databases vs. data warehouses.

  • Choose between star and snowflake design schemas.

  • Explore source data.

  • Implement data flow.

  • Debug an SSIS package.

  • Extract and load modified data.

  • Enforce data quality.

  • Consume data in a data warehouse.

The volume of data available is huge and increasing daily. Structured Query Language, or SQL (pronounced "sequel"), is the standard language used to communicate and interact with data stored in relational database management systems such as Microsoft SQL Server, Oracle, PostgreSQL, MySQL, etc.

Different database management systems have their own proprietary versions of the SQL language, but they all conform to using some commands in SQL the same way. Microsoft SQL Server's version of SQL is known as Transact-SQL (T-SQL).

You will learn the basics of the SQL language and Transact-SQL, since both use certain commands in the same way.

What you will learn includes:

  • Installing SQL Server

  • Installing SSMS

  • Basic database concepts

  • Creating databases

  • Creating tables

  • Creating views

  • Creating stored procedures

  • Reading data from a database

  • Updating database records

  • Backing up databases

  • Deleting records

  • Truncating tables

  • Dropping tables

  • Dropping databases

  • Restoring databases

Data Warehouse basics for absolute beginners in 30 mins

Data Warehouse basic concepts like architecture, dimensional modeling, fact vs dimension table, star vs snowflake schema

Created by Eshant Garg - Instructor | LearnCloud.Info | 80,000+ Enrollments | AWS | Azure

Students: 9678, Price: Free

Important Note:

Please note that this is NOT a full course but a single module of the full-length course, intended to cover very basic fundamental concepts for absolute beginners so that they can get up to speed with the Azure Synapse SQL Data Warehouse course.

This module is NOT GOOD for you if:

  • You are already experienced in this technology

  • You are looking for intermediate or advanced concepts

  • You are looking for practical examples or demos

This module is GOOD for you if:

  • You want to understand the basic fundamental concepts of this technology.

This is a free module to help others. If you are not in the intended audience, please feel free to unenroll.

Where can I find the full-length course?

Please look at the bonus lecture at the end.

What will students learn in this course?

  • Microsoft SQL Data Warehouse (a crash course to get up to speed with cloud warehousing)

Level

  • Beginners

Intended Audience

  • Anyone who wants to start learning Data warehousing

Language

  • English

  • If you are not comfortable with English, please do not take the course; the captions are not good enough to follow the course on their own.

Target Students

  • Database and BI developers

  • Database Administrators

  • Data Analyst or similar profiles

Prerequisites

  • Basic T-SQL and Database concepts

Course In Detail

Data Warehouse Crash Course

  • In this module, you will learn what a data warehouse is, why we need it, and how it differs from a traditional transactional database.

  • We will learn the concepts of dimensional modeling, a database design method optimized for data warehouse solutions.

  • Then I will explain what we mean when we say facts and their corresponding fact tables, and what dimensions and their corresponding dimension tables are.

  • How are these special kinds of tables joined together to form a star schema or snowflake schema? (See the sketch after this list.)

  • This section will establish the foundation before you start my course on Azure Synapse Analytics, formerly known as Azure SQL Data Warehouse.
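As a hedged illustration of these ideas (table and column names are invented, and SQLite stands in for SQL Server), the sketch below builds one fact table joined to two dimension tables, which is exactly the shape of a star schema; a snowflake schema would further normalize the dimension tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Dimension tables hold descriptive attributes
conn.execute("CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, day TEXT)")

# The fact table holds measures plus foreign keys to each dimension (the star's center)
conn.execute("""
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        date_id    INTEGER REFERENCES dim_date(date_id),
        amount     REAL
    )
""")

conn.execute("INSERT INTO dim_product VALUES (1, 'Widget')")
conn.execute("INSERT INTO dim_date VALUES (10, '2021-01-01')")
conn.execute("INSERT INTO fact_sales VALUES (1, 10, 99.5)")

# A typical analytical query joins the fact table back to its dimensions
query = """
    SELECT p.name, d.day, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON f.product_id = p.product_id
    JOIN dim_date d ON f.date_id = d.date_id
    GROUP BY p.name, d.day
"""
print(conn.execute(query).fetchall())
```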


ETL Testing: From Beginner to Expert

ETL Testing: Essential course for all software testing professionals.

Created by Sid Inf - Data/ETL Architect

Students: 8641, Price: $99.99

The DW/BI/ETL Testing training course is designed for both entry-level and advanced programmers. The course covers the foundations of data warehousing, dimensional modeling, and important aspects of dimensions, facts, and slowly changing dimensions, along with DW/BI/ETL setup, database testing vs. data warehouse testing, data warehouse workflows and a case study, data checks using SQL, and the scope of BI testing. As a bonus, you will also get the steps to set up an environment with the popular ETL tool Informatica, so you can perform all the activities on your personal computer and gain first-hand practical knowledge.

Get to know Pentaho Kettle PDI – Introduction

A great start for understanding what PDI is

Created by Itamar Steinberg (inflow systems) - MBA in the field of IT, Master of ETL

Students: 5054, Price: Free

This course is about the foundations of PDI - Pentaho Data Integration.

What am I going to get from this introduction?

  • Understand the PDI environment

  • Learn what jobs and transformations are

  • Learn basic table input/output

  • Understand the scope of the full course: "Learn ETL with Pentaho Kettle PDI"

This is a good overview of the subject. It is not by any means a full course; to become an expert you will need to take another course and do a lot of hands-on work yourself.

SQL/ETL Developer – T-SQL/Stored Procedures/ETL/SSIS

Develop an ETL solution using SSIS

Created by Bluelime Learning Solutions - Learning made simple

Students: 3810, Price: $74.99

A common problem that organizations face is how to gather data from multiple sources, in multiple formats, and move it to one or more data stores. The destination may not be the same type of data store as the source, and often the format is different, or the data needs to be shaped or cleaned before loading it into its final destination.

Extract, transform, and load (ETL) is a data pipeline used to collect data from various sources, transform the data according to business rules, and load it into a destination data store.

SQL Server Integration Services (SSIS) is a useful and powerful Business Intelligence tool. It is best suited to working with SQL Server databases. It is added to SQL Server when you install SQL Server Data Tools (SSDT), which adds the Business Intelligence templates to Visual Studio that are used to create Integration Services projects.

SSIS can be used for:

  •  Data Integration

  •  Data Transformation

  •  Providing solutions to complex Business problems

  •  Updating data warehouses

  •  Cleaning data

  •  Mining data

  •  Managing SQL Server objects and data

  •  Extracting data from a variety of sources

  •  Loading data into one or several destinations

What You Will Learn

  •  How to install SQL Server

  •  How to download and attach a database to SQL Server

  •  How to download and install SQL Server Data Tools

  •  How to create a new Integration Services project

  •  How to add and configure a Flat File Connection Manager

  •  How to add and configure an OLE DB Connection Manager

  •  How to add a Data Flow Task to a package

  •  How to add and configure the Flat File Source

  •  How to add and configure Lookup Transformations

  •  How to create Integration Services tasks

  •  How to create a new Connection Manager

  •  How to write data to a SQL Server database

  •  How to execute a package from SQL Server Data Tools

  •  How to control data flow for flat files

  •  How to test packages

  • SQL functions

  • T-SQL stored procedures

  • Extracting data from multiple tables

ETL Framework for Data Warehouse Environments

The non-functional ETL requirements

Created by Sid Inf - Data/ETL Architect

Students: 3188, Price: $124.99

This course provides a high-level approach to implementing an ETL framework in any typical data warehouse environment. The practical approaches can be used for a new application that needs to design and implement a highly reusable ETL solution with different data loading strategies, error/exception handling, audit, balance and control handling, a bit of job scheduling, and restartability features; they can also be applied to any existing ETL implementation. For existing implementations, this framework needs to be embedded into the existing environment, jobs, and business requirements, and it might even mean redesigning the mappings/mapplets and workflows (ETL jobs) from scratch, which is often a good decision considering the benefits of high reusability and improved design standards.

This course combines standard and practical approaches to designing and implementing a complete ETL solution, detailing the guidelines, standards, developer/architect checklists, and the benefits of reusable code. It also teaches you the best practices and standards to follow in implementing an ETL solution.

Though this course covers ETL design principles and solutions based on Informatica 10.x and Oracle 11g, they can be applied to any of the ETL tools in the market, such as IBM DataStage, Pentaho, Talend, Ab Initio, etc.

Multiple reusable code bundles from the marketplace, checklists, and the material required to get started on UNIX basic commands and shell scripting will be provided.

An Introduction to Snowflake

Data warehousing as a Service!

Created by Bhavuk Chawla - Authorized Instructor for Google, Cloudera, Confluent

Students: 2832, Price: Free

Are you also perturbed about the scalability, performance, and cost of your existing data warehouses?

We have a solution for you!

Just keep your enthusiasm high and join the live webinar on Snowflake. Snowflake provides data warehousing as a service. It opens the door to numerous benefits, including almost zero maintenance, on-demand scaling in just a few seconds, simplified data sharing, zero-copy cloning, etc.

Agenda:

  1. Evolution of Data Warehousing Technologies

  2. What is Snowflake?

  3. Snowflake vs Redshift

  4. Key Concepts

  5. Snowflake Architecture

  6. Setting up Snowflake Trial Account

  7. Demo

  8. Use case: Implementing Change Data Capture in Snowflake

Please note that this is not an official course from Snowflake.

You may join our YouTube channel named "DataCouch" to get access to interesting videos free of cost.

We have many Google-certified instructors who can assist your team in moving forward with a Google Cloud implementation in the right way.

We are also an official training delivery partner of Confluent. We conduct corporate trainings on various topics, including Confluent Kafka Developer, Confluent Kafka Administration, Confluent Kafka Real-Time Streaming using KSQL & KStreams, and Confluent Kafka Advanced Optimization. Our instructors are well qualified and vetted by Confluent to deliver such courses.

Please feel free to reach out if you have any requirements for Confluent Kafka Training for your team. Happy to assist.

Data Warehouse Projects: A Short Course for IT Executives

Learn how to be successful with data warehouse projects.

Created by Bob Wakefield - Data Management Expert

Students: 2059, Price: Free

2nd Edition now available!

Data warehouse projects can be expensive and complex. If an organization does not currently have a data warehouse, the value of building one may not be clear. This course will teach you how to manage a data warehouse project in a timely, cost-effective manner that is on budget and will demonstrate value to the business from day one.

This course is designed for people who manage IT projects. If you are not an IT manager but are interested in learning how to build a data warehouse, this course can serve as a solid introduction to data warehousing.

You will learn how to manage the people and processes necessary to bring an enterprise data warehouse to initial operating capability. You will be taught common data warehouse terms so you can effectively communicate with technical resources. You will be introduced to the documents necessary to design and build a data warehouse. You will gain knowledge about common pitfalls to avoid. You will learn who you need to hire to work on the project and how to select those people.

Updates in the 2nd edition include:

  • Links to external resources have been added to the relevant lectures.

  • New lectures added after the initial publish date have been fully integrated.

  • The entire class has been re-recorded using professional voice talent.

Data Warehouse ETL Testing & Data Quality Management A-Z

ETL Testing and Data Quality Management for beginners with practical exercises and certificate of completion.

Created by Lorenz DS - BI Consultant | Data Engineer

Students: 759, Price: $19.99

Learn the essentials of ETL data warehouse testing and data quality management through this step-by-step tutorial. This course takes you through the basics of ETL testing, frequently used data quality queries, reporting, and monitoring. In this tutorial we will learn how to build database views for data quality monitoring and build data quality visualizations and reports!
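As a hedged illustration of the kind of data quality view this course works toward (the schema is invented, and SQLite stands in for PostgreSQL or Oracle), the sketch below reconciles row counts between a source and a target table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source_orders (id INTEGER)")
conn.execute("CREATE TABLE target_orders (id INTEGER)")
conn.executemany("INSERT INTO source_orders VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO target_orders VALUES (?)", [(1,), (2,)])

# A data quality view: compare source and target row counts after each load
conn.execute("""
    CREATE VIEW dq_row_counts AS
    SELECT
        (SELECT COUNT(*) FROM source_orders) AS source_rows,
        (SELECT COUNT(*) FROM target_orders) AS target_rows,
        (SELECT COUNT(*) FROM source_orders)
          - (SELECT COUNT(*) FROM target_orders) AS missing_rows
""")

# A dashboard or automated test would poll this view and alert on mismatches
print(conn.execute("SELECT * FROM dq_row_counts").fetchone())  # -> (3, 2, 1)
```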

Learn to build data quality dashboards from scratch!

Learn about some of the most common mistakes made when performing ETL/ELT tests.

Forget about manual ad-hoc ETL testing; learn more about automated ETL and data quality reports.

The course contains training materials where you can practice, apply your knowledge, and build an app from scratch. The training materials are provided in an Excel file that you can download to your computer. Each module also ends with a short quiz, and there is a final quiz at the end of the course.

After completion of this course, you will receive a certificate of completion.

Good luck and hope you enjoy the course.

Prerequisites:

  • Basic knowledge of SQL

  • Some experience with Visualization tools would be helpful, but not required

  • A basic setup of a database (PostgreSQL, Oracle) and a visualization tool (Qlik Sense) is recommended

Course content:

The course consists of the following modules:

  • Introduction

  • What is ETL/ELT Testing and Data Quality Management?

  • Build database views for Data Quality Monitoring

  • Build dashboards for Reporting

  • Exercises

  • Final Quiz

Who should follow this course?

  • Students that want to learn the basics of ETL/ELT testing and Data Quality Management

  • Business Analysts and Data Analysts that would like to learn more about ETL/ELT testing, frequently used queries and practical examples

  • Software Engineers that would like to build an automated solution for ETL/ELT testing using database views/dashboards

  • Data Stewards and Managers considering applying data quality standards within their organization

Azure Databricks Masterclass: Beginners Guide to perform ETL

Learn how to perform ETL operations in Azure Databricks. Ideal for DP-200 and DP-201 exams

Created by Amit Navgire - Data Architect

Students: 646, Price: $19.99

Databricks was developed by the original founders of Apache Spark with the motive of solving complex data engineering and data science problems in the most efficient way, using distributed cluster-based programming with the power of the Spark framework under the hood.

Databricks provides fully managed clusters in the cloud and integrates well with both AWS and Azure. In this course, however, we are going to focus on how to create, manage, and perform ETL operations using the Azure platform.

In this course, you will start with the basics of Azure Databricks and slowly progress towards the advanced topics of performing ETL operations in Azure Databricks using practical hands-on lab sessions.
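For a flavor of what a single ETL step looks like in a Databricks notebook, here is a hedged PySpark sketch (the file paths and column names are invented; inside Databricks the SparkSession is already provided as `spark`):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Outside Databricks you build the session yourself; notebooks provide `spark` for you
spark = SparkSession.builder.appName("etl-sketch").getOrCreate()

# Extract: read a raw CSV (hypothetical path; on Azure this could live in ADLS or Blob storage)
raw = spark.read.option("header", True).csv("/tmp/raw_sales.csv")

# Transform: cast types and aggregate with a simple business rule
report = (
    raw.withColumn("amount", F.col("amount").cast("double"))
       .groupBy("region")
       .agg(F.sum("amount").alias("total_amount"))
)

# Load: write the result out as Parquet
report.write.mode("overwrite").parquet("/tmp/sales_report")
```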

This course is also helpful for those preparing for the Azure Data Engineer certification exams (DP-200, DP-201).

Happy Learning!

Writing production-ready ETL pipelines in Python / Pandas

Learn how to write professional ETL pipelines using best practices in Python and Data Engineering.

Created by Jan Schwarzlose - Data Engineer by passion

Students: 25, Price: $89.99

This course will show each step of writing an ETL pipeline in Python, from scratch to production, using the necessary tools such as Python 3.9, Jupyter Notebook, Git and GitHub, Visual Studio Code, Docker and Docker Hub, and the Python packages Pandas, boto3, pyyaml, awscli, jupyter, pylint, moto, coverage, and memory-profiler.

Two different approaches to writing code in the data engineering field will be introduced and applied: functional and object-oriented programming.

Best practices in developing Python code will be introduced and applied:

  • design principles

  • clean coding

  • virtual environments

  • project/folder setup

  • configuration

  • logging

  • exception handling

  • linting

  • dependency management

  • performance tuning with profiling

  • unit testing

  • integration testing

  • dockerization

What is the goal of this course?

In the course we are going to use the Xetra dataset. Xetra stands for Exchange Electronic Trading, and it is the trading platform of the Deutsche Börse Group. This dataset is derived near-time on a minute-by-minute basis from Deutsche Börse's trading system and saved in an AWS S3 bucket that is available to the public for free.

The ETL pipeline we are going to create will extract the Xetra dataset from the AWS S3 source bucket on a scheduled basis, create a report using transformations, and load the transformed data to another AWS S3 target bucket.
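In outline, such a pipeline might look like the hedged boto3/Pandas sketch below (the bucket names, object key, and report logic are placeholders, not the course's actual code):

```python
import io

import boto3
import pandas as pd

s3 = boto3.resource("s3")

# Extract: download one CSV object from the source bucket (placeholder names)
src_obj = s3.Object("source-bucket", "2021-04-01/trades.csv").get()
df = pd.read_csv(io.BytesIO(src_obj["Body"].read()))

# Transform: a stand-in report, e.g. the last traded price per instrument
report = df.groupby("ISIN", as_index=False).agg(closing_price=("EndPrice", "last"))

# Load: write the report to the target bucket in Parquet format (needs pyarrow)
buffer = io.BytesIO()
report.to_parquet(buffer, index=False)
s3.Object("target-bucket", "reports/2021-04-01.parquet").put(Body=buffer.getvalue())
```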

The pipeline will be written in such a way that it can be deployed easily to almost any production environment that can handle containerized applications. The production environment we are going to write the ETL pipeline for consists of a GitHub code repository, a Docker Hub image repository, an execution platform such as Kubernetes, and an orchestration tool such as the container-native Kubernetes workflow engine Argo Workflows or Apache Airflow.

So what can you expect in the course?

You will receive primarily practical, interactive lessons where you have to code and implement the pipeline, with theory lessons when needed. Furthermore, you will get the Python code for each lesson in the course material, the whole project on GitHub, and the ready-to-use Docker image with the application code on Docker Hub.

There will be PowerPoint slides for download for each theoretical lesson, as well as useful links for each topic and step where you can find more information and dive deeper.