Best Big Data Courses

Find the best online Big Data Courses for you. The courses are sorted based on popularity and user ratings. We do not allow paid placements in any of our rankings. We also have a separate page listing only the Free Big Data Courses.

Hadoop Starter Kit

Hadoop learning made easy and fun. Learn HDFS, MapReduce and introduction to Pig and Hive with FREE cluster access.

Created by Hadoop In Real World - Expert Big Data Consultants

"]

Students: 167147, Price: Free

The objective of this course is to walk you step by step through all the core components of Hadoop and, more importantly, to make the Hadoop learning experience easy and fun.

By enrolling in this course you can also get free access to our multi-node Hadoop training cluster so you can try out what you learn right away in a real multi-node distributed environment.

ABOUT INSTRUCTOR(S)

We are a group of Hadoop consultants who are passionate about Hadoop and Big Data technologies. Four years ago, when we were looking for Big Data consultants to work on our own projects, we could not find qualified candidates because the big data industry was so new, so we set out to train qualified candidates in Big Data ourselves, giving them deep, real-world insight into Hadoop.

WHAT YOU WILL LEARN IN THIS COURSE

In the first section you will learn what big data is, with examples. We will discuss the factors to consider when deciding whether a problem is a big data problem. We will talk about the challenges existing technologies face when it comes to big data computation. We will break the Big Data problem down in terms of storage and computation and understand how Hadoop approaches the problem and provides a solution.

In the HDFS section, you will learn why another file system like HDFS is needed. We will compare HDFS with traditional file systems and discuss its benefits. We will also work with HDFS and discuss its architecture.

In the MapReduce section you will learn the basics of MapReduce and the phases involved in it, going over each phase in detail. Then we will write a MapReduce program in Java to calculate the maximum closing price for each stock symbol from a stock dataset (a Python sketch of the same idea is shown below).
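
To illustrate the map/reduce shape of that exercise, here is a minimal Python sketch using the mrjob library (which another course on this list also uses). This is not the course's Java code, and the column layout of the stock dataset is assumed for the example:

    # A minimal sketch, not the course's code. Assumes CSV rows like:
    # symbol,date,open,high,low,close
    from mrjob.job import MRJob

    class MaxClosingPrice(MRJob):
        def mapper(self, _, line):
            # Map phase: emit a (symbol, close) pair for each record
            fields = line.split(",")
            yield fields[0], float(fields[5])

        def reducer(self, symbol, closes):
            # Reduce phase: all closing prices for one symbol arrive here
            yield symbol, max(closes)

    if __name__ == "__main__":
        MaxClosingPrice.run()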

In the next two sections, we will introduce you to Apache Pig & Hive. We will try to calculate the maximum closing price for stock symbols from a stock dataset using Pig and Hive.

Big Data and Hadoop Essentials

Essential Knowledge for everyone associated with Big Data & Hadoop

Created by Nitesh Jain - Hadoop and Data Analytics Instructor

"]

Students: 158982, Price: Free

Are you interested in the world of Big Data technologies, but find it a little cryptic and see the whole thing as a big puzzle?

Are you looking to understand how Big Data impacts businesses large and small, and people like you and me?

Do you feel many people talk about Big Data and Hadoop without knowing basics like the history of Hadoop or the major players and vendors of Hadoop? Then this is the course for you!

This course builds an essential, fundamental understanding of Big Data problems and of Hadoop as a solution. This course takes you through:

  1. Understanding Big Data problems, with easy-to-understand examples.
  2. The history and advent of Hadoop, from before Hadoop was even named Hadoop.
  3. The Hadoop magic that makes it so unique and powerful.
  4. Understanding the difference between data science and data engineering, one of the big points of confusion when selecting a career or understanding a job role.
  5. And most importantly, demystifying the Hadoop vendors - Cloudera, MapR and Hortonworks - by understanding what each of them is about.

Unlock the world of Big Data!!!

The best part is that this course is free of cost!!! (The best things in life are free :))

Big Data on Amazon web services (AWS)

Learn About Building out Scalable, Resilient Big Data Solutions Using Various Services on AWS Cloud Platform

Created by Learnsector LLP - Learn to Win

"]

Students: 149547, Price: $109.99

This Big Data on AWS course exists primarily to simplify the use of big data tools on AWS. With the unstoppable growth of organizations moving toward data science and big data analytics, there is an acute shortage of trained professionals who are well versed in both Big Data and AWS technologies. This course helps learners get the best of both worlds (Big Data analytics and the AWS Cloud) and prepare for the future.

We cover the following topics in this course:

  • Overview of Big Data on AWS

  • Big Data Storage  & databases on AWS

  • Big Data Analytics Frameworks using AWS EMR, Athena and Elasticsearch

  • Data Warehousing on AWS Redshift

  • Real-Time Big Data Analytics on AWS

  • Artificial Intelligence/Machine Learning

  • Business Intelligence on AWS

  • Big Data Computation on AWS

  • How it all works together

The Ultimate Hands-On Hadoop: Tame your Big Data!

Hadoop tutorial with MapReduce, HDFS, Spark, Flink, Hive, HBase, MongoDB, Cassandra, Kafka + more! Over 25 technologies.

Created by Sundog Education by Frank Kane - Founder, Sundog Education. Machine Learning Pro

"]

Students: 135248, Price: $109.99

The world of Hadoop and "Big Data" can be intimidating - hundreds of different technologies with cryptic names form the Hadoop ecosystem. With this Hadoop tutorial, you'll not only understand what those systems are and how they fit together - but you'll go hands-on and learn how to use them to solve real business problems!

Learn and master the most popular big data technologies in this comprehensive course, taught by a former engineer and senior manager from Amazon and IMDb. We'll go way beyond Hadoop itself, and dive into all sorts of distributed systems you may need to integrate with.

  • Install and work with a real Hadoop installation right on your desktop with Hortonworks (now part of Cloudera) and the Ambari UI

  • Manage big data on a cluster with HDFS and MapReduce

  • Write programs to analyze data on Hadoop with Pig and Spark

  • Store and query your data with Sqoop, Hive, MySQL, HBase, Cassandra, MongoDB, Drill, Phoenix, and Presto

  • Design real-world systems using the Hadoop ecosystem

  • Learn how your cluster is managed with YARN, Mesos, Zookeeper, Oozie, Zeppelin, and Hue

  • Handle streaming data in real time with Kafka, Flume, Spark Streaming, Flink, and Storm

Understanding Hadoop is a highly valuable skill for anyone working at companies with large amounts of data.

Almost every large company you might want to work at uses Hadoop in some way, including Amazon, eBay, Facebook, Google, LinkedIn, IBM, Spotify, Twitter, and Yahoo! And it's not just technology companies that need Hadoop; even the New York Times uses Hadoop for processing images.

This course is comprehensive, covering over 25 different technologies in over 14 hours of video lectures. It's filled with hands-on activities and exercises, so you get some real experience in using Hadoop - it's not just theory.

You'll find a range of activities in this course for people at every level. If you're a project manager who just wants to learn the buzzwords, there are web UI's for many of the activities in the course that require no programming knowledge. If you're comfortable with command lines, we'll show you how to work with them too. And if you're a programmer, I'll challenge you with writing real scripts on a Hadoop system using Scala, Pig Latin, and Python.

You'll walk away from this course with a real, deep understanding of Hadoop and its associated distributed systems, and you can apply Hadoop to real-world problems. Plus a valuable completion certificate is waiting for you at the end! 

Please note the focus of this course is on application development, not Hadoop administration, although you will pick up some administration skills along the way.

Knowing how to wrangle "big data" is an incredibly valuable skill for today's top tech employers. Don't be left behind - enroll now!

  • "The Ultimate Hands-On Hadoop... was a crucial discovery for me. I supplemented your course with a bunch of literature and conferences until I managed to land an interview. I can proudly say that I landed a job as a Big Data Engineer around a year after I started your course. Thanks so much for all the great content you have generated and the crystal clear explanations. " - Aldo Serrano

  • "I honestly wouldn’t be where I am now without this course. Frank makes the complex simple by helping you through the process every step of the way. Highly recommended and worth your time especially the Spark environment.   This course helped me achieve a far greater understanding of the environment and its capabilities.  Frank makes the complex simple by helping you through the process every step of the way. Highly recommended and worth your time especially the Spark environment." - Tyler Buck

Spark and Python for Big Data with PySpark

Learn how to use Spark with Python, including Spark Streaming, Machine Learning, Spark 2.0 DataFrames and more!

Created by Jose Portilla - Head of Data Science, Pierian Data Inc.

"]

Students: 81781, Price: $129.99

Learn the latest Big Data Technology - Spark! And learn to use it with one of the most popular programming languages, Python!

One of the most valuable technology skills is the ability to analyze huge data sets, and this course is specifically designed to bring you up to speed on one of the best technologies for this task, Apache Spark! The top technology companies like Google, Facebook, Netflix, Airbnb, Amazon, NASA, and more are all using Spark to solve their big data problems!

Spark can perform up to 100x faster than Hadoop MapReduce, which has caused an explosion in demand for this skill! Because the Spark 2.0 DataFrame framework is so new, you now have the ability to quickly become one of the most knowledgeable people in the job market!

This course will teach the basics with a crash course in Python, continuing on to learning how to use Spark DataFrames with the latest Spark 2.0 syntax! Once we've done that, we'll go through how to use the MLlib Machine Learning Library with the DataFrame syntax and Spark. All along the way you'll have exercises and Mock Consulting Projects that put you right into a real-world situation where you need to use your new skills to solve a real problem! A minimal taste of the DataFrame syntax is sketched below.
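
As a rough illustration of that DataFrame workflow (not the course's own code; the file name and its columns are assumed for the example), a PySpark session might look like this:

    # A minimal sketch: read a CSV into a DataFrame and aggregate it.
    # File name and columns ("category", "price") are assumptions.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("dataframe-basics").getOrCreate()

    # Let Spark infer column types from the header and data
    df = spark.read.csv("sales.csv", header=True, inferSchema=True)

    # Filter, group, and aggregate with the DataFrame syntax
    df.filter(F.col("price") > 100) \
      .groupBy("category") \
      .agg(F.avg("price").alias("avg_price")) \
      .show()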

We also cover the latest Spark Technologies, like Spark SQL, Spark Streaming, and advanced models like Gradient Boosted Trees! After you complete this course you will feel comfortable putting Spark and PySpark on your resume! This course also has a full 30 day money back guarantee and comes with a LinkedIn Certificate of Completion!

If you're ready to jump into the world of Python, Spark, and Big Data, this is the course for you!

Apache Spark with Scala – Hands On with Big Data!

Apache Spark tutorial with 20+ hands-on examples of analyzing large data sets, on your desktop or on Hadoop with Scala!

Created by Sundog Education by Frank Kane - Founder, Sundog Education. Machine Learning Pro

"]

Students: 72367, Price: $94.99

New! Completely updated and re-recorded for Spark 3, IntelliJ, Structured Streaming, and a stronger focus on the DataSet API.

“Big data" analysis is a hot and highly valuable skill – and this course will teach you the hottest technology in big data: Apache Spark. Employers including Amazon, eBay, NASA JPL, and Yahoo all use Spark to quickly extract meaning from massive data sets across a fault-tolerant Hadoop cluster. You'll learn those same techniques, using your own Windows system right at home. It's easier than you might think, and you'll be learning from an ex-engineer and senior manager from Amazon and IMDb.

Spark works best when using the Scala programming language, and this course includes a crash-course in Scala to get you up to speed quickly. For those more familiar with Python however, a Python version of this class is also available: "Taming Big Data with Apache Spark and Python - Hands On".

Learn and master the art of framing data analysis problems as Spark problems through over 20 hands-on examples, and then scale them up to run on cloud computing services in this course.

  • Learn the concepts of Spark's Resilient Distributed Datasets, DataFrames, and Datasets.

  • Get a crash course in the Scala programming language

  • Develop and run Spark jobs quickly using Scala, IntelliJ, and SBT

  • Translate complex analysis problems into iterative or multi-stage Spark scripts

  • Scale up to larger data sets using Amazon's Elastic MapReduce service

  • Understand how Hadoop YARN distributes Spark across computing clusters

  • Practice using other Spark technologies, like Spark SQL, DataFrames, DataSets, Spark Streaming, Machine Learning, and GraphX

By the end of this course, you'll be running code that analyzes gigabytes worth of information – in the cloud – in a matter of minutes. 

We'll have some fun along the way. You'll get warmed up with some simple examples of using Spark to analyze movie ratings data and text in a book. Once you've got the basics under your belt, we'll move to some more complex and interesting tasks. We'll use a million movie ratings to find movies that are similar to each other, and you might even discover some new movies you might like in the process! We'll analyze a social graph of superheroes, and learn who the most “popular" superhero is – and develop a system to find “degrees of separation" between superheroes. Are all Marvel superheroes within a few degrees of being connected to SpiderMan? You'll find the answer.

This course is very hands-on; you'll spend most of your time following along with the instructor as we write, analyze, and run real code together – both on your own system, and in the cloud using Amazon's Elastic MapReduce service. Over 8 hours of video content is included, with over 20 real examples of increasing complexity you can build, run and study yourself. Move through them at your own pace, on your own schedule. The course wraps up with an overview of other Spark-based technologies, including Spark SQL, Spark Streaming, and GraphX.

Enroll now, and enjoy the course!

"I studied Spark for the first time using Frank's course "Apache Spark 2 with Scala - Hands On with Big Data!". It was a great starting point for me,  gaining knowledge in Scala and most importantly practical examples of Spark applications. It gave me an understanding of all the relevant Spark core concepts,  RDDs, Dataframes & Datasets, Spark Streaming, AWS EMR. Within a few months of completion, I used the knowledge gained from the course to propose in my current company to  work primarily on Spark applications. Since then I have continued to work with Spark. I would highly recommend any of Franks courses as he simplifies concepts well and his teaching manner is easy to follow and continue with!  " - Joey Faherty

Taming Big Data with Apache Spark and Python – Hands On!

Apache Spark tutorial with 20+ hands-on examples of analyzing large data sets on your desktop or on Hadoop with Python!

Created by Sundog Education by Frank Kane - Founder, Sundog Education. Machine Learning Pro

"]

Students: 62008, Price: $89.99

New! Updated for Spark 3, more hands-on exercises, and a stronger focus on DataFrames and Structured Streaming.

“Big data" analysis is a hot and highly valuable skill – and this course will teach you the hottest technology in big data: Apache Spark. Employers including Amazon, eBay, NASA JPL, and Yahoo all use Spark to quickly extract meaning from massive data sets across a fault-tolerant Hadoop cluster. You'll learn those same techniques, using your own Windows system right at home. It's easier than you might think.

Learn and master the art of framing data analysis problems as Spark problems through over 20 hands-on examples, and then scale them up to run on cloud computing services in this course. You'll be learning from an ex-engineer and senior manager from Amazon and IMDb.

  • Learn the concepts of Spark's DataFrames and Resilient Distributed Datastores

  • Develop and run Spark jobs quickly using Python

  • Translate complex analysis problems into iterative or multi-stage Spark scripts

  • Scale up to larger data sets using Amazon's Elastic MapReduce service

  • Understand how Hadoop YARN distributes Spark across computing clusters

  • Learn about other Spark technologies, like Spark SQL, Spark Streaming, and GraphX

By the end of this course, you'll be running code that analyzes gigabytes worth of information – in the cloud – in a matter of minutes. 

This course uses the familiar Python programming language; if you'd rather use Scala to get the best performance out of Spark, see my "Apache Spark with Scala - Hands On with Big Data" course instead.

We'll have some fun along the way. You'll get warmed up with some simple examples of using Spark to analyze movie ratings data and text in a book. Once you've got the basics under your belt, we'll move to some more complex and interesting tasks. We'll use a million movie ratings to find movies that are similar to each other, and you might even discover some new movies you might like in the process! We'll analyze a social graph of superheroes, and learn who the most “popular" superhero is – and develop a system to find “degrees of separation" between superheroes. Are all Marvel superheroes within a few degrees of being connected to The Incredible Hulk? You'll find the answer.

This course is very hands-on; you'll spend most of your time following along with the instructor as we write, analyze, and run real code together – both on your own system, and in the cloud using Amazon's Elastic MapReduce service. 7 hours of video content is included, with over 20 real examples of increasing complexity you can build, run and study yourself. Move through them at your own pace, on your own schedule. The course wraps up with an overview of other Spark-based technologies, including Spark SQL, Spark Streaming, and GraphX.

Wrangling big data with Apache Spark is an important skill in today's technical world. Enroll now!

  • " I studied "Taming Big Data with Apache Spark and Python" with Frank Kane, and helped me build a great platform for Big Data as a Service for my company. I recommend the course!  " - Cleuton Sampaio De Melo Jr.

Big Data Hadoop and Spark with Scala

Complete course (No Prerequisites) - Big Data Hadoop with Spark and Eco system

Created by Harish Masand - Technical Lead

"]

Students: 31263, Price: $124.99

This course will make you ready to switch your career to Big Data Hadoop and Spark.

After watching this, you will understand Hadoop, HDFS, YARN, MapReduce, Python, Pig, Hive, Oozie, Sqoop, Flume, HBase, NoSQL, Spark, Spark SQL, and Spark Streaming.

This is a one-stop course, so don't worry and just get started.

You will get all possible support from my side.

For any queries, feel free to message me here.

Note: All programs and materials are provided.

Scala and Spark for Big Data and Machine Learning

Learn the latest Big Data technology - Spark and Scala, including Spark 2.0 DataFrames!

Created by Jose Portilla - Head of Data Science, Pierian Data Inc.

"]

Students: 28013, Price: $89.99

Learn how to utilize some of the most valuable tech skills on the market today, Scala and Spark! In this course we will show you how to use Scala and Spark to analyze Big Data.

Scala and Spark are two of the most in-demand skills right now, and with this course you can learn them quickly and easily! This course comes packed with content:

  • Crash Course in Scala Programming
  • Spark and Big Data Ecosystem Overview
  • Using Spark's MLlib for Machine Learning 
  • Scale up Spark jobs using Amazon Web Services
  • Learn how to use Databricks' Big Data Platform
  • and much more!

This course comes with full projects for you, including topics such as analyzing financial data or using machine learning to classify e-commerce customer behavior! We teach the latest methodologies of Spark 2.0 so you can learn how to use SparkSQL, Spark DataFrames, and Spark's MLlib!

After completing this course you will feel comfortable putting Scala and Spark on your resume!

Thanks and I will see you inside the course!

Fundamentals Data Analysis & Decision Making Models – Theory

Master handling Big Data, Analysis and presenting interactive DashBoards. Forecasting and

Created by Manish Gupta - Hospitality Finance Expert and Business Strategist

"]

Students: 25482, Price: Free

Do you want to understand how big data is analysed and how decisions are made based on big data?

In this course we will briefly cover the various steps involved in data analysis. The objective of this course is to make you familiar with these steps and to collect your feedback and questions.

I will then use that feedback and those questions to make the detailed course better and more relevant for you.

Data Engineering – ETL, Web Scraping, Big Data, SQL, Power BI

Hands-on Data Interaction using ETL, Web Scraping, Big Data, SQL, and Power BI

Created by Bluelime Learning Solutions - Learning made simple

"]

Students: 24609, Price: $49.99

A common problem that organizations face is how to gather data from multiple sources, in multiple formats, and move it to one or more data stores. The destination may not be the same type of data store as the source, and often the format is different, or the data needs to be shaped or cleaned before loading it into its final destination.

Extract, transform, and load (ETL) is a data pipeline used to collect data from various sources, transform the data according to business rules, and load it into a destination data store.
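
As a minimal sketch of that extract-transform-load flow in plain Python (the course itself teaches SSIS; file, table, and column names here are invented for illustration):

    # Extract from a CSV, transform per simple business rules, load to SQLite.
    import sqlite3
    import pandas as pd

    # Extract: pull raw data from a source system (here, a CSV export)
    raw = pd.read_csv("customers_raw.csv")

    # Transform: clean and shape the data before loading
    raw["email"] = raw["email"].str.strip().str.lower()
    clean = raw.dropna(subset=["customer_id"]).drop_duplicates("customer_id")

    # Load: write the shaped data into the destination data store
    with sqlite3.connect("warehouse.db") as conn:
        clean.to_sql("customers", conn, if_exists="replace", index=False)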

SQL Server Integration Services (SSIS) is a useful and powerful Business Intelligence tool. It is best suited to working with SQL Server databases. It is added to SQL Server when you install SQL Server Data Tools (SSDT), which adds the Business Intelligence templates to Visual Studio that are used to create Integration Services projects.

SSIS can be used for:

  •  Data Integration

  •  Data Transformation

  •  Providing solutions to complex Business problems

  •  Updating data warehouses

  •  Cleaning data

  •  Mining data

  •  Managing SQL Server objects and data

  •  Extracting data from a variety of sources

  •  Loading data into one or several destinations

Web scraping is the process of automatically downloading a web page's data and extracting specific information from it. The extracted information can be stored in a database or as various file types.

Web scraping software tools may access the World Wide Web directly using the Hypertext Transfer Protocol, or through a web browser. While web scraping can be done manually by a software user, the term typically refers to automated processes implemented using a bot or web crawler. It is a form of copying, in which specific data is gathered and copied from the web, typically into a central local database or spreadsheet, for later retrieval or analysis.

Scraping a web page involves fetching it and extracting data from it. Fetching is the downloading of a page (which a browser does when you view the page); a crawler fetches pages for later processing. Once a page is fetched, extraction can take place. The content of a page may be parsed, searched, reformatted, its data copied into a spreadsheet, and so on. Web scrapers typically take something out of a page, to make use of it for another purpose somewhere else. An example would be to find and copy names and phone numbers, or companies and their URLs, to a list (contact scraping).
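
A minimal fetch-and-extract sketch in Python, using the requests and BeautifulSoup libraries (the URL and the choice of tags are placeholders for illustration):

    import requests
    from bs4 import BeautifulSoup

    # Fetch: download the page, as a browser would
    response = requests.get("https://example.com")
    response.raise_for_status()

    # Extract: parse the HTML and pull out specific information
    soup = BeautifulSoup(response.text, "html.parser")
    for link in soup.find_all("a"):
        print(link.get_text(strip=True), link.get("href"))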

Big data can be characterised as data that has high volume, high variety and high velocity. Data includes numbers, text, images, audio, video, or any other kind of information you might store on your computer. Volume, velocity, and variety are sometimes called "the 3 V's of big data." 

What kind of datasets are considered big data? 

Examples include social media networks analysing their members' data to learn more about them and connect them with content and advertising relevant to their interests, or search engines looking at the relationship between queries and results to give better answers to users' questions.

SQL is a standard language for accessing and manipulating databases.

SQL stands for Structured Query Language

What Can SQL do?

  • SQL can execute queries against a database

  • SQL can retrieve data from a database

  • SQL can insert records in a database

  • SQL can update records in a database

  • SQL can delete records from a database

  • SQL can create new databases

  • SQL can create new tables in a database

  • SQL can create stored procedures in a database

  • SQL can create views in a database

  • SQL can set permissions on tables, procedures, and views
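
To make those capabilities concrete, here is a small self-contained Python sketch that exercises several of them against an in-memory SQLite database (table and column names are invented for the example):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    # Create a new table in the database
    cur.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

    # Insert, update, and delete records
    cur.execute("INSERT INTO users (name) VALUES (?)", ("Ada",))
    cur.execute("INSERT INTO users (name) VALUES (?)", ("Grace",))
    cur.execute("UPDATE users SET name = ? WHERE id = ?", ("Ada Lovelace", 1))
    cur.execute("DELETE FROM users WHERE name = ?", ("Grace",))

    # Retrieve data with a query
    for row in cur.execute("SELECT id, name FROM users"):
        print(row)  # (1, 'Ada Lovelace')

    conn.close()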

Power BI is a business analytics solution that lets you visualize your data and share insights across your organization, or embed them in your app or website. Connect to hundreds of data sources and bring your data to life with live dashboards and reports.

Discover how to quickly glean insights from your data using Power BI. This formidable set of business analytics tools—which includes the Power BI service, Power BI Desktop, and Power BI Mobile—can help you more effectively create and share impactful visualizations with others in your organization.

In this beginner's course you will learn how to get started with this powerful toolset. We will cover topics like connecting to and transforming web-based data sources. You will learn how to publish and share your reports and visuals on the Power BI service.

Streaming Big Data with Spark Streaming and Scala – Hands On

Spark Streaming tutorial covering Spark Structured Streaming, Kafka integration, and streaming big data in real-time.

Created by Sundog Education by Frank Kane - Founder, Sundog Education. Machine Learning Pro

"]

Students: 22568, Price: $94.99

New! Updated for Spark 3.0.0!

"Big Data" analysis is a hot and highly valuable skill. Thing is, "big data" never stops flowing! Spark Streaming is a new and quickly developing technology for processing massive data sets as they are created - why wait for some nightly analysis to run when you can constantly update your analysis in real time, all the time? Whether it's clickstream data from a big website, sensor data from a massive "Internet of Things" deployment, financial data, or something else - Spark Streaming is a powerful technology for transforming and analyzing that data right when it is created, all the time.

You'll be learning from an ex-engineer and senior manager from Amazon and IMDb.

This course gets you hands-on with some real live Twitter data, simulated streams of Apache access logs, and even data used to train machine learning models! You'll write and run real Spark Streaming jobs right at home on your own PC, and toward the end of the course, we'll show you how to take those jobs to a real Hadoop cluster and run them in a production environment too.

Across over 30 lectures and almost 6 hours of video content, you'll:

  • Get a crash course in the Scala programming language

  • Learn how Apache Spark operates on a cluster

  • Set up discretized streams with Spark Streaming and transform them as data is received

  • Use structured streaming to stream into dataframes in real-time

  • Analyze streaming data over sliding windows of time

  • Maintain stateful information across streams of data

  • Connect Spark Streaming with highly scalable sources of data, including Kafka, Flume, and Kinesis

  • Dump streams of data in real-time to NoSQL databases such as Cassandra

  • Run SQL queries on streamed data in real time

  • Train machine learning models in real time with streaming data, and use them to make predictions that keep getting better over time

  • Package, deploy, and run self-contained Spark Streaming code on a real Hadoop cluster using Amazon Elastic MapReduce.

This course is very hands-on, filled with achievable activities and exercises to reinforce your learning. By the end of this course, you'll be confidently creating Spark Streaming scripts in Scala, and be prepared to tackle massive streams of data in a whole new way. You'll be surprised at how easy Spark Streaming makes it!
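
The course itself works in Scala, but purely to illustrate the structured-streaming idea listed above (streaming into dataframes in real time), here is the classic word-count sketch in PySpark, reading lines from a local socket:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import explode, split

    spark = SparkSession.builder.appName("streaming-wordcount").getOrCreate()

    # Treat lines arriving on a socket as an unbounded DataFrame
    lines = spark.readStream.format("socket") \
        .option("host", "localhost").option("port", 9999).load()

    # Split lines into words and keep a running count per word
    words = lines.select(explode(split(lines.value, " ")).alias("word"))
    counts = words.groupBy("word").count()

    # Continuously print updated counts to the console
    query = counts.writeStream.outputMode("complete").format("console").start()
    query.awaitTermination()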

Learn Big Data: The Hadoop Ecosystem Masterclass

Master the Hadoop ecosystem using HDFS, MapReduce, Yarn, Pig, Hive, Kafka, HBase, Spark, Knox, Ranger, Ambari, Zookeeper

Created by Edward Viaene - DevOps, Cloud, Big Data Specialist

"]

Students: 22124, Price: $39.99

Important update: Effective January 31, 2021, all Cloudera software will require a valid subscription and only be accessible via the paywall. The sandbox can still be downloaded, but the full install requires a Cloudera subscription to get access to the yum repository.

In this course you will learn Big Data using the Hadoop Ecosystem. Why Hadoop? It is one of the most sought after skills in the IT industry. The average salary in the US is $112,000 per year, up to an average of $160,000 in San Francisco (source: Indeed).

The course is aimed at Software Engineers, Database Administrators, and System Administrators that want to learn about Big Data. Other IT professionals can also take this course, but might have to do some extra research to understand some of the concepts.

You will learn how to use the most popular software in the Big Data industry at the moment, using batch processing as well as realtime processing. This course will give you enough background to be able to talk about real problems and solutions with experts in the industry. Updating your LinkedIn profile with these technologies will make recruiters want to get you interviews at the most prestigious companies in the world.

The course is very practical, with more than 6 hours of lectures. You will want to try out everything yourself, adding multiple hours of learning. If you get stuck with the technology while trying, there is support available. I will answer your messages on the message boards and we have a Facebook group where you can post questions.

Taming Big Data with MapReduce and Hadoop – Hands On!

Learn MapReduce fast by building over 10 real examples, using Python, MRJob, and Amazon's Elastic MapReduce Service.

Created by Sundog Education by Frank Kane - Founder, Sundog Education. Machine Learning Pro

"]

Students: 21768, Price: $89.99

“Big data" analysis is a hot and highly valuable skill – and this course will teach you two technologies fundamental to big data quickly: MapReduce and Hadoop. Ever wonder how Google manages to analyze the entire Internet on a continual basis? You'll learn those same techniques, using your own Windows system right at home.

Learn and master the art of framing data analysis problems as MapReduce problems through over 10 hands-on examples, and then scale them up to run on cloud computing services in this course. You'll be learning from an ex-engineer and senior manager from Amazon and IMDb.

  • Learn the concepts of MapReduce
  • Run MapReduce jobs quickly using Python and MRJob
  • Translate complex analysis problems into multi-stage MapReduce jobs (see the sketch after this list)
  • Scale up to larger data sets using Amazon's Elastic MapReduce service
  • Understand how Hadoop distributes MapReduce across computing clusters
  • Learn about other Hadoop technologies, like Hive, Pig, and Spark
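
A minimal sketch of that multi-stage idea using Python and MRJob (not the course's own code): stage one counts words, and stage two funnels the counts to a single reducer to pick the most frequent one.

    from mrjob.job import MRJob
    from mrjob.step import MRStep

    class MostCommonWord(MRJob):
        def steps(self):
            # Chain two MapReduce stages together
            return [
                MRStep(mapper=self.mapper_count, reducer=self.reducer_sum),
                MRStep(reducer=self.reducer_max),
            ]

        def mapper_count(self, _, line):
            for word in line.split():
                yield word.lower(), 1

        def reducer_sum(self, word, counts):
            # Send every (count, word) pair to one key for the next stage
            yield None, (sum(counts), word)

        def reducer_max(self, _, count_word_pairs):
            # Yielding the max pair emits it as a (count, word) result
            yield max(count_word_pairs)

    if __name__ == "__main__":
        MostCommonWord.run()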

By the end of this course, you'll be running code that analyzes gigabytes worth of information – in the cloud – in a matter of minutes.

We'll have some fun along the way. You'll get warmed up with some simple examples of using MapReduce to analyze movie ratings data and text in a book. Once you've got the basics under your belt, we'll move to some more complex and interesting tasks. We'll use a million movie ratings to find movies that are similar to each other, and you might even discover some new movies you might like in the process! We'll analyze a social graph of superheroes, and learn who the most “popular" superhero is – and develop a system to find “degrees of separation" between superheroes. Are all Marvel superheroes within a few degrees of being connected to The Incredible Hulk? You'll find the answer.

This course is very hands-on; you'll spend most of your time following along with the instructor as we write, analyze, and run real code together – both on your own system, and in the cloud using Amazon's Elastic MapReduce service. Over 5 hours of video content is included, with over 10 real examples of increasing complexity you can build, run and study yourself. Move through them at your own pace, on your own schedule. The course wraps up with an overview of other Hadoop-based technologies, including Hive, Pig, and the very hot Spark framework – complete with a working example in Spark.

Don't take my word for it - check out some of our unsolicited reviews from real students:

"I have gone through many courses on map reduce; this is undoubtedly the best, way at the top."

"This is one of the best courses I have ever seen since 4 years passed I am using Udemy for courses."

"The best hands on course on MapReduce and Python. I really like the run it yourself approach in this course. Everything is well organized, and the lecturer is top notch."

Apache Spark 2.0 with Java - Learn Spark from a Big Data Guru

Learn analyzing large data sets with Apache Spark by 10+ hands-on examples. Take your big data skills to the next level.

Created by Tao W. - Software engineer

"]

Students: 19288, Price: $99.99

What is this course about:

This course covers all the fundamentals of Apache Spark with Java and teaches you everything you need to know about developing Spark applications with Java. At the end of this course, you will gain in-depth knowledge about Apache Spark, along with general big data analysis and manipulation skills, to help your company adopt Apache Spark for building a big data processing pipeline and data analytics applications.

This course covers 10+ hands-on big data examples. You will learn valuable knowledge about how to frame data analysis problems as Spark problems. Together we will learn examples such as aggregating NASA Apache web logs from different sources; we will explore the price trend by looking at the real estate data in California; we will write Spark applications to find out the median salary of developers in different countries through the Stack Overflow survey data; we will develop a system to analyze how maker spaces are distributed across different regions in the United Kingdom.  And much much more.

What you will learn from this course:

In particular, you will learn:

  • An overview of the architecture of Apache Spark.

  • Develop Apache Spark 2.0 applications with Java using RDD transformations and actions and Spark SQL.

  • Work with Apache Spark's primary abstraction, resilient distributed datasets (RDDs), to process and analyze large data sets.

  • Deep dive into advanced techniques to optimize and tune Apache Spark jobs by partitioning, caching and persisting RDDs.

  • Scale up Spark applications on a Hadoop YARN cluster through Amazon's Elastic MapReduce service.

  • Analyze structured and semi-structured data using Datasets and DataFrames, and develop a thorough understanding of Spark SQL.

  • Share information across different nodes on an Apache Spark cluster by broadcast variables and accumulators.

  • Best practices of working with Apache Spark in the field.

  • Big data ecosystem overview.

Why learn Apache Spark:

Apache Spark gives us unlimited ability to build cutting-edge applications. It is also one of the most compelling technologies of the last decade in terms of its disruption to the big data world.

Spark provides in-memory cluster computing which greatly boosts the speed of iterative algorithms and interactive data mining tasks.

Apache Spark is the next-generation processing engine for big data.

Tons of companies are adopting Apache Spark to extract meaning from massive data sets; today you have access to that same big data technology right on your desktop.

Apache Spark is becoming a must-have tool for big data engineers and data scientists.

About the author:

Since 2015, James has been helping his company adopt Apache Spark for building their big data processing pipeline and data analytics applications.

James' company has gained massive benefits by adopting Apache Spark in production. In this course, he is going to share with you his years of knowledge and best practices of working with Spark in the field.

Why choose this course?

This course is very hands-on; James has put a lot of effort into providing you with not only the theory but also real-life examples of developing Spark applications that you can try out on your own laptop.

James has uploaded all the source code to GitHub, and you will be able to follow along on Windows, macOS, or Linux.

At the end of this course, James is confident that you will gain in-depth knowledge about Spark and general big data analysis and data manipulation skills. You'll be able to develop Spark applications that analyze gigabytes of data, both on your laptop and in the cloud using Amazon's Elastic MapReduce service!

30-day Money-back Guarantee!

You will get 30-day money-back guarantee from Udemy for this course.

If you are not satisfied, simply ask for a refund within 30 days. You will get a full refund, no questions asked.

If you are ready to take your big data analysis skills and career to the next level, take this course now!

You will go from zero to Spark hero in 4 hours.

Introduction to Big Data – an overview of the 10 V’s

An overview of the Dimensions and Forms of Big Data.

Created by Taimur Zahid - Machine Learning Engineer

"]

Students: 14970, Price: Free

This course is designed to be an in-depth overview of the field of Data Science. It teaches students the various characteristics of Big Data and discusses a few types of data that exist. After completing this course, you will have knowledge that can be applied later in your journey into this field when you're selecting an algorithm, a tool, or a framework, or even while making a blueprint for how to deal with the problem at hand.

Big Data in Advertising – Explained in Plain English

A 30 min overview of what kind of data advertisers use and how that data is collected on your devices.

Created by Ben Silverstein - Digital Advertising Professional & Entrepreneur in NYC

"]

Students: 14303, Price: Free

Big Data is a popular buzzword but it is very vague and can be somewhat scary. In this course I explain what Big Data in advertising actually is, and try to remove some of the confusion around the subject. 

This course gives everyone from beginners to professionals a quick overview of what data means to digital advertising. I review the companies involved, define the types of data advertisers look for, when and why advertisers buy data, how users share it (knowingly or unknowingly), and much more.

In this 35 minute course I cover the following topics:

  1. What Personal Data (PII) is

  2. What laws exist to regulate data collection

  3. Different types of Data Categories that Advertisers look for

    1. Demographic

    2. Behavioral 

    3. Contextual 

    4. Retargeting 

    5. Location

  4. How data is actually collected 

  5. Ad-Tech companies involved in the data & ad delivery process

At the end of the course you will have a better understanding of what kind of data is being collected about you, and how advertisers use that data to target their ads. You will also have a better understanding of why you see certain ads on each of your devices. 

---------------------

Real Student Testimonials:

★★★★★ “Very clear and to the point. I like the graphics.” - David Peterson

★★★★★ “Very clear presentation. I understand very easy.” - Catalin Badea

Reviews from Other Courses

Digital Advertising & Marketing 101

★★★★★ “The real-world examples almost makes it self-explanatory. Professionally done and author speaks with authority - i.e. he knows what he's talking about and it shows.” - AJ Du Toit

★★★★★ “Thought this was an excellent introduction course. Working in the industry without a huge amount of experience in this area, it was a great way to familiarize myself with topics in ongoing conversations internally and externally. Will be taking 201 to further my understanding.” - Jocleyn Armour

★★★★★ “It is advertised as a 101 course and it did exactly that and very well, touching on the building blocks of Digital Advertising and Marketing. Good job Ben.” - Jean C

Digital Advertising & Marketing 201

★★★★★ “When combined with Ben's 101 course, the two classes make for a thorough and well-organized primer on digital media today. Perfect for marketing people and agency folks (creative, account) who are not immersed in a media agency. It will give you a foundation for how digital media is structured, a clear explanation of the jargon and acronyms you'll hear bantered about, and a better understanding of the opportunities available. The 201 course goes into important detail about some of the key changes that have taken place in digital advertising recently. Ben explains the concepts clearly and succinctly. Definitely worth the time investment.” - Shawn E Fraser

★★★★★ “This course is amazing. I do affiliate marketing and always wanted to learn about programmatic advertising and this course me taught that. I completed this for an interview and the employer was really impressed by the knowledge I had. Hope there is another in-depth version of this course. Where he goes into ad platforms or ad servers and teaches the real world applications.” - Suryameet Singh

★★★★★ “Comprehensive overview...detailed!” - Kaithlean Crotty-Clark

Introduction to Programmatic Advertising

★★★★★ “I'm in advertising sales and have been looking for a clean easy way to explain and also test my root knowledge of the programmatic ad space. It was very helpful and simple to understand which is hard to do with this topic.” - Raul Bonilla

★★★★★ "Being an advertising agency media planner and buyer, having this hands on information helps when we face a decision to go into the digital advertising space. Your 101 and 201 was extremely informative and truly like your overviews in a very simplistic explanation. Thank you and look forward to your future courses." - Diane Tody

---------------------

 

According to recent trends from Statista, the digital marketing & advertising industry is on pace to be worth over $330B a year by 2021. If you’re not already learning about this industry, you will be soon. Get a jump start on your career, your co-workers, and your peers by taking this intermediate-to-advanced level course.

New in Big Data: Apache HiveMall – Machine Learning with SQL

HiveMall SQL on Spark, MapReduce and Tez. Leverage your knowledge of SQL to enter Machine Learning and Big Data space.

Created by Elena Akhmatova - Data Scientist

"]

Students: 14129, Price: Free

It is widely accepted that applying Machine Learning techniques to data is a complex task that requires knowledge of a variety of programming languages and means hours of coding, compiling and debugging.  

Not any longer!

Apache HiveMall is a Machine Learning library that allows anyone with basic knowledge of SQL to run Machine Learning algorithms. 

  • No coding
  • No compiling
  • No debugging

Apache HiveMall algorithms are hidden behind Hive UDFs. This allows end users to use SQL, and only SQL, to apply Machine Learning algorithms to a very large volume of training data.

Apache HiveMall Machine Learning Library makes training, testing, and model evaluation easy and accessible to a much wider community of business experts than ever before.

Want to be a Big Data Scientist?

Should you pursue a career in Data Science? Data Science basics, process, team, roles, skills, transition, opportunities

Created by V2 Maestros, LLC - Big Data / Data Science Experts | 50K+ students

"]

Students: 12503, Price: $19.99

"Data Science is the sexiest job of the 21st century - It has exciting work and incredible pay". You have been hearing about this a lot. You try to get more information on this and start querying and browsing. You get so many definitions, requirements, predictions, opinions and recommendations. At the end, you are perplexed. And you ask - "What exactly is this field all about? Is it a good career option for me?"

**** Please note: This is a career advice course, not a technology course.

Data Science has been growing exponentially in the last 5 years. It is also a hybrid field that requires multiple skills and training. We have been training students in Data Science. A number of them committed to training without realizing what it really is. Some were happy, some adjusted their expectations and some regretted committing too soon. We felt that professionals thinking of getting into Data Science needed a primer in what this field is all about. Hence, we came up with this course.

Through this course, you will learn about

  • Data Science goals and concepts
  • Process Flow
  • Roles and Responsibilities
  • Where you will fit into a Data Science team.
  • Building a transition plan

Getting into the Data Science field involves significant investment of time. Learn about the field in order to make an informed decision.

Apache Spark Hands on Specialization for Big Data Analytics

In-depth course to master Apache Spark Development using Scala for Big Data (with 30+ real-world & hands-on examples)

Created by Irfan Elahi - Data Scientist in the world's largest consultancy firm

"]

Students: 12288, Price: $99.99

What if you could catapult your career in one of the most lucrative domains, Big Data, by learning the state-of-the-art Hadoop technology (Apache Spark), which is considered mandatory in all of the current jobs in this industry?

What if you could develop your skill-set in one of the hottest Big Data technologies, Apache Spark, by learning from one of the most comprehensive courses out there (with 10+ hours of content), packed with dozens of hands-on real-world examples, use-cases, challenges and best practices?

What if you could learn from an instructor working in the world's largest consultancy firm, who has worked end-to-end on Australia's biggest Big Data projects to date, and who has a proven track record on Udemy with highly positive reviews and thousands of students already enrolled in his previous course(s)?

If you have such aspirations and goals, then you and this course are a perfect match made in heaven!

Why Apache Spark?

Apache Spark has revolutionised and disrupted the way big data processing and machine learning are done, by virtue of its unprecedented in-memory and optimised computational model. It has been unanimously hailed as the future of Big Data. It's the tool of choice all around the world, allowing data scientists, engineers and developers to acquire and process data for a number of use-cases like scalable machine learning, stream processing and graph analytics, to name a few. Leading organisations like Amazon, eBay and Yahoo, among many others, have embraced this technology to address their Big Data processing requirements.

Additionally, Gartner has repeatedly highlighted Apache Spark as a leader in Data Science platforms. The certification programs of Hadoop vendors like Cloudera and Hortonworks, which are held in high esteem in the industry, have oriented their curricula to focus heavily on Apache Spark. Almost all of the jobs in the Big Data and Machine Learning space demand proficiency in Apache Spark.

This is what John Tripier, Alliances and Ecosystem Lead at Databricks has to say, “The adoption of Apache Spark by businesses large and small is growing at an incredible rate across a wide range of industries, and the demand for developers with certified expertise is quickly following suit”.

All of these facts support the notion that learning this amazing technology will give you a strong competitive edge in your career.

Why this course?

Firstly, this is the most comprehensive and in-depth course ever produced on Apache Spark. I've carefully and critically surveyed all of the resources out there, and almost all of them fail to cover this technology in the depth it truly deserves. Some of them lack coverage of Apache Spark's theoretical concepts, like its architecture and how it works in conjunction with Hadoop; some fall short in thoroughly describing how to use the Apache Spark APIs optimally for complex big data problems; some ignore the hands-on aspects of Apache Spark programming on real-world use-cases; and almost all of them fail to cover the best practices in industry and the mistakes that many professionals make in the field.

This course addresses all of the limitations that are prevalent in the currently available courses. Apart from that, as I have attended trainings from leading Big Data vendors like Cloudera (for which they charge thousands of dollars), I've ensured that the course is aligned with the educational patterns and best practices followed in those trainings, so that you get the best and most effective learning experience.

Each section of the course covers concepts in extensive detail and from scratch, so you won't find any challenges in learning even if you are new to this domain. Also, each section will have an accompanying assignment section where we will work together on a number of real-world challenges and use-cases employing real-world data-sets. The data-sets themselves will belong to different niches, ranging from retail and web server logs to telecommunications, and some of them will also be from Kaggle (the world's leading Data Science competition platform).

The course leverages Scala instead of Python. Wherever possible, reference to Python development is also given, but the course is mainly based on Scala. The decision was made based on a number of rational factors. Scala is the de-facto language for development in Apache Spark. Apache Spark itself is developed in Scala, and as a result all of the new features are initially made available in Scala and then in other languages like Python. Additionally, there is a significant performance difference when it comes to using Apache Spark with Scala compared to Python. Scala is itself one of the highest-paying programming languages, and you will be developing strong skills in that language along the way as well.

The course also has a number of quizzes to further test your skills. For further support, you can always ask questions, to which you will get prompt responses. I will also be sharing best practices and tips on a regular basis with my students.

What you are going to learn in this course?

The course consists mainly of two sections:

  • Section 1:

We'll start off with an introduction to Apache Spark and will understand its potential and business use-cases in the context of the overall Hadoop ecosystem. We'll then focus on how Apache Spark actually works and will take a deep dive into the architectural components of Spark, as that is crucial for thorough understanding.

  • Section 2:

After developing an understanding of the Spark architecture, we will move to the next section of this course, where we will employ the Scala language to use the Apache Spark APIs to develop distributed computation programs. Please note that you don't need prior knowledge of Scala for this course, as I will start with the very basics of Scala; as a result you will also be developing your skills in one of the highest-paying programming languages.

In this section, we will comprehensively understand how Spark performs distributed computation using abstractions like RDDs, what the caveats are in loading data into Apache Spark, what the different ways to create RDDs are, how to leverage parallelism, and much more.

Furthermore, as transformations and actions constitute the gist of the Apache Spark APIs, it is imperative to have a sound understanding of these. We will therefore focus on a number of Spark transformations and actions that are heavily used in industry, and will go into detail on each. Each API's usage will be complemented with a series of real-world examples and datasets, e.g. retail, web server logs, customer churn, and some from Kaggle. Each section of the course will have a number of assignments where you will be able to practically apply the learned concepts to further consolidate your skills.

A significant section of the course will also be dedicated to key-value RDDs, which form the basis of working optimally on a number of big data problems. A minimal taste of that pattern is sketched below.

In addition to covering the crux of the Spark APIs, I will also highlight a number of valuable best practices based on my experience and exposure, and will also point out mistakes that many people make in the field. You will rarely find such information anywhere else.

Each topic will be covered in a lot of detail, with a strong emphasis on being hands-on, thus ensuring that you learn Apache Spark in the best possible way.

The course is applicable to and valid for all versions of Spark, i.e. 1.6 and 2.0.

After completing this course, you will develop a strong foundation and an extended skill-set for using Spark on complex big data processing tasks. Big data is one of the most lucrative career domains, where data engineers command high salaries. This course will also substantially help in your job interviews. Also, if you are looking to excel further in your big data career by passing Hadoop certifications like those of Cloudera and Hortonworks, this course will prove to be extremely helpful in that context as well.
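
Purely as an illustration of that transformations-and-actions pattern (in PySpark rather than the Scala this course uses; names and numbers are invented):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("kv-rdd-demo").getOrCreate()
    sc = spark.sparkContext

    # Hypothetical (customer, purchase amount) key-value pairs
    purchases = sc.parallelize([("alice", 20.0), ("bob", 5.5), ("alice", 9.5)])

    # Transformation: lazily aggregate amounts per key; nothing runs yet
    totals = purchases.reduceByKey(lambda a, b: a + b)

    # Action: collect() triggers the distributed computation
    print(totals.collect())  # e.g. [('alice', 29.5), ('bob', 5.5)]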

Lastly, once enrolled, you will have lifetime access to the lectures and resources. It's a self-paced course, and you can watch the lecture videos on any device, like a smartphone or laptop. Also, you are backed by Udemy's rock-solid 30-day money-back guarantee. So if you are serious about learning Apache Spark, enrol in this course now and let's start this amazing journey together!

Hadoop Developer In Real World

Free Cluster Access * HDFS * MapReduce * YARN * Pig * Hive * Flume * Sqoop * AWS * EMR * Optimization * Troubleshooting

Created by Hadoop In Real World - Expert Big Data Consultants

"]

Students: 7984, Price: $199.99

From the creators of the successful Hadoop Starter Kit course hosted on Udemy comes the Hadoop In Real World course. This course is designed for anyone who aspires to a career as a Hadoop developer. In this course we have covered all the concepts that every aspiring Hadoop developer must know to SURVIVE in REAL WORLD Hadoop environments.

The course covers all the must-know topics like HDFS, MapReduce, YARN, Apache Pig, Hive, etc., and we go deep in exploring the concepts. We don't stop with the easy concepts; we take it a step further and cover important and complex topics like file formats, custom Writables, input/output formats, troubleshooting, optimizations, etc.

All concepts are backed by interesting hands-on projects like analyzing the million song dataset to find less familiar artists with hot songs, ranking pages with page dumps from Wikipedia, and simulating the mutual friends functionality of Facebook, just to name a few.

Learn By Example: Hadoop, MapReduce for Big Data problems

A hands-on workout in Hadoop, MapReduce and the art of thinking "parallel"

Created by Loony Corn - An ex-Google, Stanford and Flipkart team

"]

Students: 7459, Price: $99.99

Taught by a 4-person team including 2 Stanford-educated ex-Googlers and 2 ex-Flipkart Lead Analysts. This team has decades of practical experience in working with Java and with billions of rows of data.

This course is a zoom-in, zoom-out, hands-on workout involving Hadoop, MapReduce and the art of thinking parallel. 

Let’s parse that.

Zoom-in, Zoom-Out:  This course is both broad and deep. It covers the individual components of Hadoop in great detail, and also gives you a higher level picture of how they interact with each other. 

Hands-on workout involving Hadoop, MapReduce : This course will get you hands-on with Hadoop very early on.  You'll learn how to set up your own cluster using both VMs and the Cloud. All the major features of MapReduce are covered - including advanced topics like Total Sort and Secondary Sort. 

The art of thinking parallel: MapReduce completely changed the way people thought about processing Big Data. Breaking down any problem into parallelizable units is an art. The examples in this course will train you to "think parallel". 
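To make the idea concrete, here is a minimal sketch of the bigram-counting pattern (one of the examples listed below), written in Python against the Hadoop Streaming API that the course also covers. The script and data are purely illustrative - this is not the course's own code:

```python
#!/usr/bin/env python3
"""Bigram counting for Hadoop Streaming: run with 'map' or 'reduce' as argument.

Hadoop runs many copies of the mapper in parallel, sorts the emitted
key/value pairs by key, and streams each key's group into the reducer.
"""
import sys

def mapper():
    for line in sys.stdin:
        words = line.split()
        # Emit each adjacent word pair with a count of 1.
        for a, b in zip(words, words[1:]):
            print(f"{a} {b}\t1")

def reducer():
    current, total = None, 0
    for line in sys.stdin:
        key, count = line.rsplit("\t", 1)
        if key != current:
            if current is not None:
                print(f"{current}\t{total}")
            current, total = key, 0
        total += int(count)
    if current is not None:
        print(f"{current}\t{total}")

if __name__ == "__main__":
    mapper() if sys.argv[1] == "map" else reducer()
```

Because each mapper only ever sees its own slice of the input, and each reducer only ever sees one key group at a time, the same two small functions scale from a laptop to a thousand-node cluster - that is the essence of thinking parallel.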

What's Covered:

Lots of cool stuff...

  • Using MapReduce to 

    • Recommend friends in a Social Networking site: Generate Top 10 friend recommendations using a Collaborative filtering algorithm. 
    • Build an Inverted Index for Search Engines: Use MapReduce to parallelize the humongous task of building an inverted index for a search engine. 
    • Generate Bigrams from text: Generate bigrams and compute their frequency distribution in a corpus of text. 

  • Build your Hadoop cluster: 

    • Install Hadoop in Standalone, Pseudo-Distributed and Fully Distributed modes 
    • Set up a Hadoop cluster using Linux VMs.
    • Set up a cloud Hadoop cluster on AWS with Cloudera Manager.
    • Understand HDFS, MapReduce and YARN and their interaction 

  • Customize your MapReduce Jobs: 

    • Chain multiple MR jobs together
    • Write your own Customized Partitioner
    • Total Sort: globally sort a large amount of data by sampling input files
    • Secondary sorting 
    • Unit tests with MRUnit
    • Integrate with Python using the Hadoop Streaming API


... and of course all the basics: 

  • MapReduce : Mapper, Reducer, Sort/Merge, Partitioning, Shuffle and Sort
  • HDFS & YARN: Namenode, Datanode, Resource manager, Node manager, the anatomy of a MapReduce application, YARN Scheduling, Configuring HDFS and YARN to performance tune your cluster. 

A Big Data Hadoop and Spark project for absolute beginners

Data Engineering, Spark, Hive, Python, PySpark, Scala, Coding framework, Testing, IntelliJ, Maven, Glue, Streaming,

Created by FutureX Skill - Big Data, Cloud and AI Solution Architects

"]

Students: 6798, Price: $29.99

Students: 6798, Price:  Paid

This course will prepare you for a real-world Data Engineer role!

Get started with Big Data quickly by leveraging a free cloud cluster and solving a real-world use case! Learn Hadoop, Hive and Spark (both Python and Scala) from scratch!

Learn to code Spark Scala and PySpark like a real-world developer. Understand real-world coding best practices, logging, error handling and configuration management using both Scala and Python.

Project

A bank is launching a new credit card and wants to identify prospects it can target in its marketing campaign.

It has received prospect data from various internal and 3rd party sources. The data has various issues such as missing or unknown values in certain fields. The data needs to be cleansed before any kind of analysis can be done.

Since the data is huge, with billions of records, the bank has asked you to use Big Data Hadoop and Spark technology to cleanse, transform and analyze this data.
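To give a feel for what that cleansing step looks like in practice, here is a minimal PySpark sketch; the file paths, column names and fill rules below are invented for illustration and are not the course's actual project code:

```python
# Hypothetical cleansing of the prospect data (all names and paths are placeholders).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("ProspectCleansing").getOrCreate()

raw = spark.read.csv("/data/prospects.csv", header=True, inferSchema=True)

cleansed = (raw
    # Treat the literal string "unknown" as a missing value.
    .na.replace("unknown", None)
    # Drop rows missing fields the campaign cannot work without.
    .dropna(subset=["prospect_id", "income"])
    # Fill the remaining gaps with safe defaults.
    .fillna({"age": 0, "city": "N/A"}))

# Persist the cleansed data in a columnar format for downstream analysis.
cleansed.write.mode("overwrite").parquet("/data/prospects_clean")
```

At billions of records the code stays exactly the same; only the cluster behind the SparkSession grows.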

What you will learn:

  • Big Data, Hadoop concepts

  • How to create a free Hadoop and Spark cluster using Google Dataproc

  • Hadoop hands-on - HDFS, Hive

  • Python basics

  • PySpark RDD - hands-on

  • PySpark SQL, DataFrame - hands-on

  • Project work using PySpark and Hive

  • Scala basics

  • Spark Scala DataFrame

  • Project work using Spark Scala

  • Spark Scala real-world coding framework and development using Winutils, Maven and IntelliJ.

  • Python Spark Hadoop Hive coding framework and development using PyCharm

  • Building a data pipeline using Hive, PostgreSQL and Spark (see the sketch after this list)

  • Logging, error handling and unit testing of PySpark and Spark Scala applications

  • Spark Scala Structured Streaming

  • Applying Spark transformations on data stored in AWS S3 using Glue, and viewing the data using Athena
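The data-pipeline item above boils down to a pattern like the following hedged sketch: the table names, JDBC URL and credentials are placeholders, and the PostgreSQL JDBC driver jar is assumed to be available on Spark's classpath.

```python
# Hypothetical Hive -> Spark -> PostgreSQL pipeline (all names are placeholders).
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("HiveToPostgres")
         .enableHiveSupport()   # lets Spark query tables in the Hive metastore
         .getOrCreate())

# Aggregate a Hive table with Spark SQL.
daily = spark.sql("""
    SELECT txn_date, COUNT(*) AS txn_count
    FROM   bank.transactions
    GROUP  BY txn_date
""")

# Write the result to PostgreSQL over JDBC.
(daily.write
      .format("jdbc")
      .option("url", "jdbc:postgresql://localhost:5432/reports")
      .option("dbtable", "daily_txn_counts")
      .option("user", "report_user")
      .option("password", "secret")
      .mode("overwrite")
      .save())
```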

Prerequisites:

  • Some basic programming skills

  • Some knowledge of SQL queries

ClickHouse crash course. Conquer big data with ease

Learn how to use one of the most powerful open source OLAP databases on the market. Put new life into your big data.

Created by Viktor Dashkov - Software Developer and Fitness Geek

"]

Students: 5819, Price: Free

Students: 5819, Price:  Free

Has your data grown too much?

Do you have to wait forever to get even simple answers from your system?

Do you just want to explore your data in real time while it’s actually relevant and not 30 minutes later when nobody cares anymore?

Do you want your dev team to work on features and not on the infrastructure?

Then you’ve come to the right place. ClickHouse is a new technology that addresses all of the pain points above.

ClickHouse was designed to be very, very fast. And it is.

What is more, it’s extremely robust, and it fails only in extreme circumstances.

Put ease of installation and maintenance on top, and you get a nearly ideal solution for most OLAP use cases.

How can I help?

Together we’ll explore the main functionality of ClickHouse, and we’ll develop the tools and skills to incorporate and manage this database in existing and future systems.

We are going to have lots of fun along the way, because technology should be fun, and with tools like ClickHouse, it is.

Some of the topics we’ll cover:

  • ClickHouse Installation

  • External dictionaries

  • Arrays

  • Sampling

  • Aggregation

  • Cluster Configuration

You'll find lots of code snippets and supplementary material inside the course to help you master even the hardest topics.
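For instance, the end-to-end flow can be as small as this hypothetical snippet using the third-party clickhouse-driver Python client; the table, columns and data are invented, and a local ClickHouse server (for example one started with Docker) is assumed:

```python
# Hypothetical example with the clickhouse-driver package (pip install clickhouse-driver).
from datetime import datetime
from clickhouse_driver import Client

client = Client("localhost")  # assumes a ClickHouse server on the default port

# A MergeTree table: the engine family behind ClickHouse's speed.
client.execute("""
    CREATE TABLE IF NOT EXISTS page_views (
        ts      DateTime,
        url     String,
        user_id UInt64
    ) ENGINE = MergeTree ORDER BY ts
""")

client.execute(
    "INSERT INTO page_views (ts, url, user_id) VALUES",
    [(datetime(2024, 1, 1, 12, 0), "/home", 1),
     (datetime(2024, 1, 1, 12, 1), "/docs", 2)],
)

# Real-time aggregation: views per URL.
for url, views in client.execute("SELECT url, count() FROM page_views GROUP BY url"):
    print(url, views)
```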

At the end of the course, you’ll be able to confidently use ClickHouse in production.

You’ll get familiar with the main features and quirks of the database, as well as some edge cases you might encounter.

Is it for me?

If you are an IT pro with specific OLAP needs, or a DevOps engineer looking for a great new technology, then my answer is yes.

All you need is basic knowledge of SQL and Docker.

ClickHouse will make the rest a breeze.

Can't wait to see you inside!

Spark 3.0 & Big Data Essentials with Scala | Rock the JVM

Now with Spark 3.0: Learn practical Big Data with Spark DataFrames, Datasets, RDDs and Spark SQL, hands-on

Created by Daniel Ciocîrlan - Software Engineer & Best-Selling Instructor

"]

Students: 5707, Price: $49.99

Students: 5707, Price:  Paid

UPDATED FOR SPARK 3.0

In this course, we will learn how to write big data applications with Apache Spark 3 and Scala. You'll write 2000+ lines of Spark code yourself, with guidance, and you will become a rockstar.

This course is for Scala programmers who are getting started with Apache Spark and big data. The course is not for advanced Spark engineers.

Why Spark in Scala:

  • it's blazing fast for big data

  • its demand has exploded

  • it's a highly marketable skill

  • it's well maintained, with dozens of high-quality extensions

  • it's a foundation for a data scientist

I like to get to the point and get things done. This course 

  1. deconstructs all concepts into the critical pieces you need

  2. selects the most important ideas and separates them into what's simple but critical and what's powerful

  3. sequences ideas in a way that "clicks" and makes sense throughout the process of learning

  4. applies everything in live code

The end benefits are still much greater:

  • a completely new mental model around data processing

  • a significantly more marketable resume

  • more enjoyable work - Spark is fun!

This course is for established programmers with experience in Scala and functional programming, at the level of the Rock the JVM Scala beginners course. I already assume a solid understanding of general programming fundamentals.

This course is NOT for you if

  • you've never written Scala code before

  • you don't have some essential parallel programming background (e.g. what's a process/a thread)

The course is comprehensive, but you'll always see me get straight to the point. So make sure you have a good level of focus and commitment to become a badass programmer.

I believe both theory and practice are important. That's why you'll get lectures with code examples, real life code demos and assignments, plus additional resources, instructions, exercises and solutions. At the end of the course, you'll have written thousands of lines of Spark.

I've seen that my students are most successful - and my best students work at Google-class companies - when they're guided, but not being told what to do. I have exercises waiting for you, where I offer my (opinionated) guidance but otherwise freedom to experiment and improve upon your code.

Definitely not least, my students are most successful when they have fun along the way!

So join me in this course and let's rock the JVM!

Master Big Data – Apache Spark/Hadoop/Sqoop/Hive/Flume

In-depth course on Big Data - Apache Spark , Hadoop , Sqoop , Flume & Apache Hive, Big Data Cluster setup

Created by Navdeep Kaur - TechnoAvengers.com (Founder)

"]

Students: 5404, Price: $29.99

Students: 5404, Price:  Paid

In this course, you will start by learning what the Hadoop Distributed File System is, along with the most common Hadoop commands required to work with the Hadoop file system.

Then you will be introduced to Sqoop Import

  • Understand the lifecycle of a sqoop command.

  • Use sqoop import command to migrate data from Mysql to HDFS.

  • Use sqoop import command to migrate data from Mysql to Hive.

  • Use various file formats, compressions, file delimiters, where clauses and queries while importing the data.

  • Understand split-by and boundary queries.

  • Use incremental mode to migrate the data from Mysql to HDFS (an illustrative command follows this list).
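As a flavor of what such an import looks like, here is an illustrative invocation wrapped in Python. The Sqoop flags shown are real options, but the connection details, table and paths are placeholders, and a working sqoop CLI is assumed:

```python
# Illustrative incremental Sqoop import from MySQL to HDFS (details are placeholders).
import subprocess

subprocess.run([
    "sqoop", "import",
    "--connect", "jdbc:mysql://dbhost/sales",    # hypothetical MySQL database
    "--username", "etl_user",
    "--password-file", "/user/etl/.sqoop_pwd",   # safer than --password on the CLI
    "--table", "orders",
    "--target-dir", "/data/orders",
    "--split-by", "order_id",                    # column used to split work across mappers
    "--incremental", "append",                   # only import rows newer than --last-value
    "--check-column", "order_id",
    "--last-value", "0",
], check=True)
```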

Further, you will learn Sqoop Export to migrate data.

  • What is sqoop export

  • Using sqoop export, migrate data from HDFS to Mysql.

  • Using sqoop export, migrate data from Hive to Mysql.

Further, you will learn about Apache Flume

  • Understand Flume Architecture.

  • Using Flume, ingest data from Twitter and save it to HDFS.

  • Using Flume, ingest data from netcat and save it to HDFS.

  • Using Flume, ingest data from an exec source and show it on the console.

  • Describe Flume interceptors and see examples of using them.

  • Flume multiple agents

  • Flume Consolidation.

In the next section, we will learn about Apache Hive

  • Hive Intro

  • External & Managed Tables

  • Working with Different File Formats - Parquet, Avro

  • Compressions

  • Hive Analysis

  • Hive String Functions

  • Hive Date Functions

  • Partitioning

  • Bucketing (a partitioned-table sketch follows this list)
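To preview how partitioning shows up in practice, here is a hedged sketch that runs Hive DDL through Spark's Hive support; the table, columns and location are invented, and a configured Hive metastore is assumed:

```python
# Hypothetical partitioned Hive table created via Spark (all names invented).
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# One directory per txn_date value; queries filtering on it skip the rest.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS orders_ext (
        order_id BIGINT,
        amount   DOUBLE
    )
    PARTITIONED BY (txn_date STRING)
    STORED AS PARQUET
    LOCATION '/data/warehouse/orders'
""")

# Only the 2024-01-01 partition directory is scanned here.
spark.sql("SELECT count(*) FROM orders_ext WHERE txn_date = '2024-01-01'").show()
```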

Finally, you will learn about Apache Spark

  • Spark Intro

  • Cluster Overview

  • RDD

  • DAG/Stages/Tasks

  • Actions & Transformations

  • Transformation & Action Examples

  • Spark DataFrames

  • Spark DataFrames - working with different File Formats & Compression

  • DataFrame APIs

  • Spark SQL

  • DataFrame Examples

  • Spark with Cassandra Integration
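As a quick taste of the Actions & Transformations item above, here is a minimal hedged PySpark sketch; the numbers are arbitrary:

```python
# Lazy transformations vs. eager actions in PySpark (illustrative values).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("RDDBasics").getOrCreate()
sc = spark.sparkContext

rdd = sc.parallelize([1, 2, 3, 4, 5])

# Transformations only record lineage in the DAG; nothing runs yet.
evens_squared = rdd.filter(lambda x: x % 2 == 0).map(lambda x: x * x)

# Actions trigger a job, which Spark breaks into stages and tasks.
print(evens_squared.collect())  # [4, 16]
print(evens_squared.sum())      # 20
```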

Big Data Analyst -using Sqoop and Advance Hive (CCA159)

Become a Big Data Analyst using Hive and Sqoop. A great course for business analysts, testers and SQL developers. CCA159

Created by Navdeep Kaur - TechnoAvengers.com (Founder)

"]

Students: 4956, Price: $19.99

Students: 4956, Price:  Paid

You will start by learning what Hadoop and the Hadoop Distributed File System are, along with the most common Hadoop commands required to work with the Hadoop file system.

Then you will be introduced to Sqoop Import

  • Understand the lifecycle of a sqoop command.

  • Use sqoop import command to migrate data from Mysql to HDFS.

  • Use sqoop import command to migrate data from Mysql to Hive.

  • Use various file formats, compressions, file delimiters, where clauses and queries while importing the data.

  • Understand split-by and boundary queries.

  • Use incremental mode to migrate the data from Mysql to HDFS.

Further, you will learn Sqoop Export to migrate data.

  • What is sqoop export

  • Using sqoop export, migrate data from HDFS to Mysql.

  • Using sqoop export, migrate data from Hive to Mysql.

Finally, we will start with Apache Hive [Advance]

  • Hive Intro

  • External & Managed Tables

  • Insert & Multi Insert

  • Data Types & Complex Data Types

  • Collection Functions

  • Conditional Functions

  • Hive String Functions

  • Hive Date Functions

  • Mathematical Functions

  • Hive Analysis

  • Alter Command

  • Joins, Multi Joins & Map Joins

  • Working with Different File Formats - Parquet, Avro

  • Compressions

  • Partitioning

  • Bucketing

  • Views

  • Lateral Views/Explode

  • Windowing Functions - Rank/Dense Rank/Lead/Lag/Min/Max (illustrated in the sketch after this list)

  • Window Specification
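As a taste of the windowing material flagged above, here is a hedged sketch of a HiveQL window query, run here through Spark's Hive support; the hr.employees table and its columns are invented for illustration:

```python
# Hypothetical windowing query (table and columns are placeholders).
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Rank salaries within each department and look back one row per window.
spark.sql("""
    SELECT name,
           dept,
           salary,
           RANK()       OVER (PARTITION BY dept ORDER BY salary DESC) AS salary_rank,
           DENSE_RANK() OVER (PARTITION BY dept ORDER BY salary DESC) AS salary_dense_rank,
           LAG(salary)  OVER (PARTITION BY dept ORDER BY salary DESC) AS prev_salary
    FROM   hr.employees
""").show()
```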

Architecting Big Data Solutions

How to architect big data solutions by assembling various big data technologies - modules and best practices

Created by V2 Maestros, LLC - Big Data / Data Science Experts | 50K+ students

"]

Students: 4837, Price: $139.99

Students: 4837, Price:  Paid

The Big Data phenomenon is sweeping across the IT landscape. New technologies are born, new ways of analyzing data are created and new business revenue streams are discovered every day. If you are in the IT field, Big data should already be impacting you in some way. 

Building Big Data solutions is radically different from how traditional software solutions were built. You cannot take what you learnt in the traditional data solutions world and apply it verbatim to Big Data solutions. You need to understand the unique problem characteristics that drive Big Data and also become familiar with the unending technology options available to solve them.

This course will show you how Big Data solutions are built by stitching together big data technologies. It explains the modules in a Big Data pipeline, the options available for each module, and the advantages, shortcomings and use cases of each option.

This course is a great interview preparation resource for Big Data! Anyone - fresher or experienced - should take this course.

Note: This is a theory course. There is no source code/programming included.

Apache Beam | A Hands-On course to build Big data Pipelines

Build Big data pipelines with Apache Beam in any language and run it via Spark, Flink, GCP (Google Cloud Dataflow).

Created by J Garg - Real Time Learning - Hadoop Trainer

"]

Students: 4384, Price: $49.99

Students: 4384, Price:  Paid

Apache Beam is a unified and portable programming model for both Batch and Streaming use cases.

Earlier, we could run Spark, Flink & Cloud Dataflow jobs only on their respective clusters. But now Apache Beam has come up with a portable programming model where we can build language-agnostic Big Data pipelines and run them using any Big Data engine (Apache Spark, Flink, Google Cloud Platform's Cloud Dataflow and many more).

Apache Beam is the future of building Big Data processing pipelines and is going to be adopted by many companies because of its portability. Many big companies have even started deploying Beam pipelines on their production servers.
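To show how little code that portability requires, here is a minimal word-count sketch with the apache-beam Python SDK; it runs locally on the DirectRunner by default, and the same pipeline can be submitted to Spark, Flink or Cloud Dataflow by choosing a different runner (the sample data is invented):

```python
# Minimal Beam pipeline: the same code runs on any supported runner.
import apache_beam as beam

with beam.Pipeline() as p:  # DirectRunner unless a runner option says otherwise
    (p
     | "Create"  >> beam.Create(["big data", "big pipelines", "data"])
     | "Split"   >> beam.FlatMap(str.split)
     | "PairOne" >> beam.Map(lambda word: (word, 1))
     | "Count"   >> beam.CombinePerKey(sum)
     | "Print"   >> beam.Map(print))
```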

What's included in the course?

  • Complete Apache Beam concepts explained from Scratch to Real-Time implementation.

  • Each and every Apache Beam concept is explained with a HANDS-ON example of it.

  • Covers even those concepts whose explanation is not very clear in Apache Beam's official documentation.

  • Build 2 Real-time Big data case studies using Beam.

  • Load data to Google BigQuery Tables from Beam pipeline.

  • The code and datasets used in the lectures are attached to the course for your convenience.

Hadoop MAPREDUCE in Depth | A Real-Time course on Mapreduce

A to Z of Hadoop Mapreduce - From Scratch to Real Time Implementation of Mapreduce by HANDS-ON Coding of every component

Created by J Garg - Real Time Learning - Hadoop Trainer

"]

Students: 3716, Price: $24.99

Students: 3716, Price:  Paid

The MapReduce framework is the closest thing to Hadoop's core when it comes to processing Big Data. It is considered the atomic processing unit in Hadoop, and that is why it is never going to become obsolete.

Knowing only the basics of MapReduce (Mapper, Reducer etc.) is not at all sufficient to work on any real-time Hadoop MapReduce project in a company. These basics are just the tip of the iceberg in MapReduce programming. Real-time MapReduce is way more than that. In live Big Data projects we have to override many of the default implementations of the MapReduce framework to make them work according to our requirements.

This course is an answer to the question "What concepts of Hadoop MapReduce are used in live Big Data projects, and how do you implement them in a program?" To answer this, every MapReduce concept in the course is explained practically via a MapReduce program.
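As a small appetizer - and emphatically a pure-Python simulation rather than Hadoop code - here is the kind of default behavior such projects override: the partitioner that decides which reducer receives each key (the keys and reducer count below are invented):

```python
# Pure-Python simulation of MapReduce partitioning (NOT Hadoop code).

def default_partition(key: str, num_reducers: int) -> int:
    # Mimics the default HashPartitioner: hash of the whole key, mod reducers.
    return hash(key) % num_reducers

def custom_partition(key: str, num_reducers: int) -> int:
    # Hypothetical override for a composite "user,date" key: route by user
    # only, so all of one user's records reach the same reducer.
    user = key.split(",")[0]
    return hash(user) % num_reducers

for key in ["u1,2024-01-01", "u1,2024-01-02", "u2,2024-01-01"]:
    print(key, "-> reducer", custom_partition(key, 3))
```

With the default partitioner, "u1,2024-01-01" and "u1,2024-01-02" may land on different reducers; the override guarantees they meet on one reducer, which is exactly the kind of requirement live projects impose.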

Every lecture in this course is explained in 2 steps.

Step 1: Explanation of a Hadoop component | Step 2: Practicals - how to implement that component in a MapReduce program.

The overall inclusions and benefits of this course:

  • Complete Hadoop MapReduce explained from scratch to real-time implementation.

  • Each and every Hadoop concept is backed by HANDS-ON MapReduce code.

  • Advanced-level MapReduce concepts that are not even available on the Internet.

  • To help those without a Java background, all MapReduce Java code is explained line by line, in such a way that even a non-technical person can understand it.

  • The MapReduce code and datasets used in the lectures are attached for your convenience.

  • Includes a 'Case Studies' section covering questions that are commonly asked in Hadoop interviews.