Best HBase Courses

Find the best online HBase courses for you. The courses are sorted based on popularity and user ratings. We do not allow paid placements in any of our rankings. We also have a separate page listing only the free HBase courses.

The Ultimate Hands-On Hadoop: Tame your Big Data!

Hadoop tutorial with MapReduce, HDFS, Spark, Flink, Hive, HBase, MongoDB, Cassandra, Kafka + more! Over 25 technologies.

Created by Sundog Education by Frank Kane - Founder, Sundog Education. Machine Learning Pro


Students: 135248, Price: $109.99


The world of Hadoop and "Big Data" can be intimidating - hundreds of different technologies with cryptic names form the Hadoop ecosystem. With this Hadoop tutorial, you'll not only understand what those systems are and how they fit together - but you'll go hands-on and learn how to use them to solve real business problems!

Learn and master the most popular big data technologies in this comprehensive course, taught by a former engineer and senior manager from Amazon and IMDb. We'll go way beyond Hadoop itself, and dive into all sorts of distributed systems you may need to integrate with.

  • Install and work with a real Hadoop installation right on your desktop with Hortonworks (now part of Cloudera) and the Ambari UI

  • Manage big data on a cluster with HDFS and MapReduce

  • Write programs to analyze data on Hadoop with Pig and Spark

  • Store and query your data with Sqoop, Hive, MySQL, HBase, Cassandra, MongoDB, Drill, Phoenix, and Presto

  • Design real-world systems using the Hadoop ecosystem

  • Learn how your cluster is managed with YARN, Mesos, Zookeeper, Oozie, Zeppelin, and Hue

  • Handle streaming data in real time with Kafka, Flume, Spark Streaming, Flink, and Storm

Understanding Hadoop is a highly valuable skill for anyone working at companies with large amounts of data.

Almost every large company you might want to work at uses Hadoop in some way, including Amazon, eBay, Facebook, Google, LinkedIn, IBM, Spotify, Twitter, and Yahoo! And it's not just technology companies that need Hadoop; even the New York Times uses Hadoop for processing images.

This course is comprehensive, covering over 25 different technologies in over 14 hours of video lectures. It's filled with hands-on activities and exercises, so you get some real experience in using Hadoop - it's not just theory.

You'll find a range of activities in this course for people at every level. If you're a project manager who just wants to learn the buzzwords, there are web UIs for many of the activities in the course that require no programming knowledge. If you're comfortable with command lines, we'll show you how to work with them too. And if you're a programmer, I'll challenge you with writing real scripts on a Hadoop system using Scala, Pig Latin, and Python.

You'll walk away from this course with a real, deep understanding of Hadoop and its associated distributed systems, and you'll be able to apply Hadoop to real-world problems. Plus, a valuable completion certificate is waiting for you at the end!

Please note the focus of this course is on application development, not Hadoop administration, although you will pick up some administration skills along the way.

Knowing how to wrangle "big data" is an incredibly valuable skill for today's top tech employers. Don't be left behind - enroll now!

  • "The Ultimate Hands-On Hadoop... was a crucial discovery for me. I supplemented your course with a bunch of literature and conferences until I managed to land an interview. I can proudly say that I landed a job as a Big Data Engineer around a year after I started your course. Thanks so much for all the great content you have generated and the crystal clear explanations. " - Aldo Serrano

  • "I honestly wouldn’t be where I am now without this course. Frank makes the complex simple by helping you through the process every step of the way. Highly recommended and worth your time, especially the Spark environment. This course helped me achieve a far greater understanding of the environment and its capabilities." - Tyler Buck

AWS Certified Data Analytics Specialty 2021 – Hands On!

Practice exam included! AWS DAS-C01 certification prep course with exercises. Kinesis, EMR, DynamoDB, Redshift and more!

Created by Sundog Education by Frank Kane - Founder, Sundog Education. Machine Learning Pro


Students: 49634, Price: $109.99


[v2021: The course has been fully updated for the new AWS Certified Data Analytics - Specialty DAS-C01 exam (including new coverage of Glue DataBrew, Elastic Views, Glue Studio, and AWS Lake Formation), and will be kept up to date throughout 2021. Optional content for the previous AWS Certified Big Data - Specialty BDS-C01 exam remains as an appendix. Happy learning!]

The AWS Certified Data Analytics Specialty Exam is one of the most challenging certification exams you can take from Amazon. Passing it tells employers in no uncertain terms that your knowledge of big data systems is wide and deep. But, even experienced technologists need to prepare heavily for this exam. This course sets you up for success, by covering all of the big data technologies on the exam and how they fit together.

Best-selling Udemy instructors Frank Kane and Stéphane Maarek have teamed up to deliver the most comprehensive and hands-on prep course we've seen. Together, they've taught over 500,000 people around the world. This course combines Stéphane's depth on AWS with Frank's experience in Big Data, gleaned during his 9-year career at Amazon itself. Both Frank and Stéphane have taken and passed the exam themselves on the first try.

The world of data analytics on AWS includes a dizzying array of technologies and services. Just a sampling of the topics we cover in depth:

  • Streaming massive data with AWS Kinesis

  • Queuing messages with Simple Queue Service (SQS)

  • Wrangling the explosion of data from the Internet of Things (IoT)

  • Transitioning from small to big data with the AWS Database Migration Service (DMS)

  • Storing massive data lakes with the Simple Storage Service (S3)

  • Optimizing transactional queries with DynamoDB

  • Tying your big data systems together with AWS Lambda

  • Making unstructured data query-able with AWS Glue, Glue ETL, Glue DataBrew, Glue Studio, and Lake Formation

  • Processing data at unlimited scale with Elastic MapReduce, including Apache Spark, Hive, HBase, Presto, Zeppelin, Splunk, and Flume

  • Applying neural networks at massive scale with Deep Learning, MXNet, and TensorFlow

  • Applying advanced machine learning algorithms at scale with Amazon SageMaker

  • Analyzing streaming data in real-time with Kinesis Analytics

  • Searching and analyzing petabyte-scale data with Amazon Elasticsearch Service

  • Querying S3 data lakes with Amazon Athena

  • Hosting massive-scale data warehouses with Redshift and Redshift Spectrum

  • Integrating smaller data with your big data, using the Relational Database Service (RDS) and Aurora

  • Visualizing your data interactively with Quicksight

  • Keeping your data secure with encryption, KMS, HSM, IAM, Cognito, STS, and more

Throughout the course, you'll have lots of opportunities to reinforce your learning with hands-on exercises and quizzes. And when you're done, this course includes a practice exam that's very similar to the real exam in difficulty, length, and style - so you'll know if you're ready before you invest in taking it. We'll also arm you with some valuable test-taking tips and strategies along the way.

The Data Analytics Specialty is an advanced certification, and it's best tackled by students who have already obtained associate-level certification in AWS and have some real-world industry experience. This exam is not intended for AWS beginners.

You want to go into the AWS Certified Data Analytics Specialty Exam with confidence, and that's what this course delivers. Hit the enroll button, and we're excited to see you in the course... and ultimately to see you get your certification!

GCP: Complete Google Data Engineer and Cloud Architect Guide

The Google Cloud for ML with TensorFlow, Big Data with Managed Hadoop

Created by Loony Corn - An ex-Google, Stanford and Flipkart team


Students: 44128, Price: $99.99


This course is a really comprehensive guide to the Google Cloud Platform - it has ~25 hours of content and ~60 demos.

The Google Cloud Platform is not currently the most popular cloud offering out there - that's AWS, of course - but it is possibly the best cloud offering for high-end machine learning applications. That's because TensorFlow, the super-popular deep learning technology, is also from Google.

What's Included:

  • Compute and Storage - AppEngine, Container Engine (aka Kubernetes) and Compute Engine
  • Big Data and Managed Hadoop - Dataproc, Dataflow, BigTable, BigQuery, Pub/Sub 
  • TensorFlow on the Cloud - what neural networks and deep learning really are, how neurons work and how neural networks are trained.
  • DevOps stuff - StackDriver logging, monitoring, cloud deployment manager
  • Security - Identity and Access Management, Identity-Aware proxying, OAuth, API Keys, service accounts
  • Networking - Virtual Private Clouds, shared VPCs, Load balancing at the network, transport and HTTP layer; VPN, Cloud Interconnect and CDN Interconnect
  • Hadoop Foundations: A quick look at the open-source cousins (Hadoop, Spark, Pig, Hive and HBase)

Big Data Hadoop and Spark with Scala

Complete course (No Prerequisites) - Big Data Hadoop with Spark and ecosystem

Created by Harish Masand - Technical Lead


Students: 31263, Price: $124.99


This course will make you ready to switch your career to Big Data Hadoop and Spark.

After watching this, you will understand Hadoop, HDFS, YARN, MapReduce, Python, Pig, Hive, Oozie, Sqoop, Flume, HBase, NoSQL, Spark, Spark SQL, and Spark Streaming.

This is a one-stop course, so don't worry and just get started.

You will get all possible support from my side.

For any queries, feel free to message me here.

Note: All programs and materials are provided.

Database Engineer/DBA – (PostgreSQL, IBM-DB2, MariaDB, NoSQL)

Practical Database Engineering | Administration

Created by Bluelime Learning Solutions - Learning made simple


Students: 29100, Price: $74.99


Database engineers design new databases based on company needs, data storage requirements, and the number of users accessing the database. They also continuously monitor databases and related systems to ensure high functionality.

A database administrator ensures that data is available, protected from loss and corruption, and easily accessible as needed. They also oversee the creation, maintenance and security of databases, and manage, back up and ensure the availability of the data.

PostgreSQL, commonly known as Postgres, is a powerful, open-source object-relational database system.

IBM Db2 is a family of related data management products, including relational database servers, developed and marketed by IBM.

MariaDB is valued for its open-source innovation and enterprise-grade reliability, as well as its modern relational database features.

SQL (Structured Query Language) is an internationally recognized language used to communicate with and manipulate various database systems.

Data is everywhere and growing at a rapid rate. Most software applications we interact with daily deal with stored data, from our interactions with our banks to social media applications like Facebook and Instagram.

Because of this relevance and dependency on data, professionals skilled in SQL are always in high demand to work with databases in business intelligence and other sectors that rely on data.

NoSQL (non-SQL or not-only-SQL) databases are increasing in popularity with the growth of data, as they can store non-relational data at a very large scale and can solve problems regular databases can't handle. They are widely used in Big Data operations. Their main advantages are the ability to handle large data sets effectively, as well as the scalability and flexibility required by modern applications.

Big Data Complete Course

Learn HDFS, Spark, Kafka, Machine Learning, Hadoop, Hadoop MapReduce, Cassandra, CAP, Predictive Analytics and much more

Created by Edcorner Learning - Edcredibly - Be Incredible


Students: 26187, Price: $89.99


Big data is a combination of structured, semi-structured and unstructured data collected by organizations that can be mined for information and used in machine learning projects, predictive modelling and other advanced analytics applications.

Systems that process and store big data have become a common component of data management architectures in organizations, combined with tools that support big data analytics uses. Big data is often characterized by the three V's:

  • the large volume of data in many environments;

  • the wide variety of data types frequently stored in big data systems; and

  • the velocity at which much of the data is generated, collected and processed.

Big data is a great quantity of diverse information that arrives in increasing volumes and with ever-higher velocity.

Big data can be structured (often numeric, easily formatted and stored) or unstructured (more free-form, less quantifiable).

Nearly every department in a company can utilize findings from big data analysis but handling its clutter and noise can pose problems.

Big data can be collected from publicly shared comments on social networks and websites, voluntarily gathered from personal electronics and apps, through questionnaires, product purchases, and electronic check-ins.

Big data is most often stored in computer databases and is analysed using software specifically designed to handle large, complex data sets.

Topics Covered in these course are:

  • Big Data Enabling Technologies

  • Hadoop Stack for Big Data

  • Hadoop Distributed File System (HDFS)

  • Hadoop MapReduce

  • MapReduce Examples

  • Spark

  • Parallel Programming with Spark

  • Spark Built-in Libraries

  • Data Placement Strategies

  • Design of Zookeeper

  • CQL (Cassandra Query Language)

  • Design of HBase

  • Spark Streaming and Sliding Window Analytics

  • Kafka

  • Big Data Machine Learning

  • Machine Learning Algorithm K-means using Map Reduce for Big Data Analytics

  • Parallel K-means using Map Reduce on Big Data Cluster Analysis

  • Decision Trees for Big Data Analytics

  • Big Data Predictive Analytics

  • PageRank Algorithm in Big Data

  • Spark GraphX & Graph Analytics

  • Case Studies of big companies and how they operate.
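Several of the topics above (Hadoop MapReduce, MapReduce examples) revolve around the same programming model. As a rough, hedged illustration, here is the classic word count expressed as map, shuffle, and reduce phases in plain Python; this is a conceptual sketch of the model only, not actual Hadoop code, and the function names are invented for this example.

```python
from collections import defaultdict

def map_phase(lines):
    """Emit (word, 1) pairs for every word, like a Hadoop mapper."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1

def shuffle(pairs):
    """Group values by key, like Hadoop's shuffle/sort step."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Sum the counts for each word, like a Hadoop reducer."""
    return {word: sum(counts) for word, counts in grouped.items()}

lines = ["big data big cluster", "data pipeline"]
counts = reduce_phase(shuffle(map_phase(lines)))
```

On a real cluster, the map and reduce phases run in parallel across many machines and the shuffle moves data over the network; the logic, however, is the same.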

Learn Big Data: The Hadoop Ecosystem Masterclass

Master the Hadoop ecosystem using HDFS, MapReduce, Yarn, Pig, Hive, Kafka, HBase, Spark, Knox, Ranger, Ambari, Zookeeper

Created by Edward Viaene - DevOps, Cloud, Big Data Specialist


Students: 22124, Price: $39.99


Important update: Effective January 31, 2021, all Cloudera software will require a valid subscription and only be accessible via the paywall. The sandbox can still be downloaded, but the full install requires a Cloudera subscription to get access to the yum repository.

In this course you will learn Big Data using the Hadoop ecosystem. Why Hadoop? It is one of the most sought-after skills in the IT industry. The average salary in the US is $112,000 per year, up to an average of $160,000 in San Francisco (source: Indeed).

The course is aimed at Software Engineers, Database Administrators, and System Administrators that want to learn about Big Data. Other IT professionals can also take this course, but might have to do some extra research to understand some of the concepts.

You will learn how to use the most popular software in the Big Data industry at the moment, using batch processing as well as realtime processing. This course will give you enough background to be able to talk about real problems and solutions with experts in the industry. Updating your LinkedIn profile with these technologies will make recruiters want to get you interviews at the most prestigious companies in the world.

The course is very practical, with more than 6 hours of lectures. You will want to try out everything yourself, adding multiple hours of learning. If you get stuck with the technology while trying, support is available: I will answer your messages on the message boards, and we have a Facebook group where you can post questions.

Apache NiFi (Cloudera DataFlow) – Be an expert in 8 Hours

8+ Hours Beginners to Advance: Learn Real World Production Scenario and Become a Pro in Apache Data Flow Management

Created by Vikas Kumar Jha - Architect


Students: 16854, Price: $99.99


Finally, a complete course on Apache NiFi is here. This course covers the latest version of NiFi and its processors step by step.

More than 3,500 students have enrolled since the course went live last month, and it has already become a bestseller. The course offers 8+ hours of comprehensive hands-on lessons that take you through the journey of a Data Flow Manager from beginner to advanced level, covering the latest version of Apache NiFi processors.

Please feel free to ask any question you have during your journey of learning Apache NiFi. I would love to answer your queries.

A Quick Intro About NiFi:

Apache NiFi is an excellent open-source software for automating and managing the data flows between various types of systems. It is a powerful and reliable system to process and distribute data. It provides a web-based user interface for creating, monitoring, and controlling data flows. It has a highly configurable and modifiable data flow process that can modify data during execution (i.e., at run time). It is easily extensible through the development of custom components.

The GitHub link to download the templates is available in the resources of lecture 5.

Note: All my courses have a 30-day money-back guarantee, so don't hesitate to enroll and start your journey in Apache NiFi Data Flow Management.

Bash Shell Programming for Data Sciences: Animated

Innovative Project-based Animated Linux Command Line Masterclass: Bash Shell Programming Data Mining Science- 7.5 Hours

Created by Scientific Programmer™ Team - | Instructor Team


Students: 11528, Price: $19.99



This awesome course is specifically designed to show you how to use Linux commands and Bash shell programming to handle textual data, such as CSV-format data or a system log file. In this course you will learn Bash by doing projects.

However, you need to understand that Bash may not be the best way to handle all kinds of data! But there often comes a time when you are provided with a pure Bash environment, such as on common Linux-based supercomputers, and you just want an early result or view of the data before you dive into the real programming using Python, R, SQL, SPSS, and so on. Expertise in these data-intensive languages also comes at the price of spending a lot of time on them.

In contrast, bash scripting is simple, easy to learn, and perfect for mining textual data, particularly if you deal with genomics, microarrays, social networks, life sciences, and so on. It can help you quickly sort, search, match, replace, clean and optimise various aspects of your data, without any tough learning curves. We strongly believe learning and using Bash shell scripting should be the first step if you want to say, Hello Big Data!

Also featured on popular data analytics portals: Towards Data Science, Codeburst, Dev.to, and so on.

This course starts with some practical bash-based flat file data mining projects involving:

  • University ranking data

  • Facebook data

  • AU Crime Data

  • Text Mining with Shakespeare-era Play and Poems

(Data sets and PDF text documentations are provided at the end of each section) + Free interactive playgrounds included!

If you haven’t used Bash before, feel free to skip the projects and go to the tutorials part (supporting material: eBook). Read the tutorials and then come back to the projects. The tutorial section will introduce bash scripting, regular expressions, AWK, sed, grep and so on. Students purchasing this course will receive free access to the interactive version (with scientific code playgrounds) of this course from the Scientific Programming School (SCIENTIFIC PROGRAMMING IO). Based on your earlier feedback, we are introducing a Zoom live class lecture series for this course, through which we will explain different aspects of the Linux command line for data analytics. Live classes will be delivered through the Scientific Programming School, which is an interactive and advanced e-learning platform for learning scientific coding.


When you enroll you will get lifetime access to all of the course contents and any updates, and when you complete the course 100% you will also get a Certificate of Completion that you can add to your resumé/CV to show off to the world your new-found Linux and scientific computing mastery! So what are you waiting for? Click that shiny enroll button and we'll see you inside. We distilled a total of one university semester's worth of knowledge (valued at USD $2,500-6,000) into one single video course, and hence it's a high-level overview. Don't forget to join our Q&A live community where you can get free help anytime from other students and the instructor. This awesome course is a component of the Learn Scientific Computing master course.


"This is one of the best course I have reviewed in Udemy. All the chapters are very useful. The instructor explained exactly what you need  to use Bash as your data analysis tool in your pocket. I look forward more  coursed from this Instructor. The instructor is very experienced, explanations are  on point. Than you for creating a great course." -  Tarique Syed

"The instructor was very engaging. Changed a boring, hard-to-understand tool into something usable and easy-to-use, all the while making it fun to learn." - Prat Ram

"Well done. Well-structured and explained course. Will definitely recommend the course. From my point of view, everything was OK in the course." - Sem Milaserdov

"Overall, the course delivered what was promised, with a good resource for those who want to learn and do more. The course is filled with resources and the educator attached his own book on the subject for the learners." - Afshin Kalantari

"It's a very well organized course, from the background, basic Linux cli which everyone should be to build  data processing scenarios. wonderful class." - Charley Guan

From 0 to 1: The Cassandra Distributed Database

A complete guide to getting started with cluster management and queries on Cassandra

Created by Loony Corn - An ex-Google, Stanford and Flipkart team


Students: 11030, Price: $99.99


Taught by a team which includes 2 Stanford-educated ex-Googlers and 2 ex-Flipkart Lead Analysts. This team has decades of practical experience working with large-scale data processing.

Has your data gotten huge, unwieldy and hard to manage with a traditional database? Is your data unstructured with an expanding list of attributes? Do you want to ensure your data is always available even with server crashes? Look beyond Hadoop - the Cassandra distributed database is the solution to your problems.

Let's parse that.

  • Huge, unwieldy data: This course helps you set up a cluster with multiple nodes to distribute data across machines
  • Unstructured: Cassandra is a columnar store. There are no empty cells or space wasted when you store data with variable and expanding attributes
  • Always available: Cassandra uses partitioning and replication to ensure that your data is available even when nodes in a cluster go down

What's included in this course:

  •  The Cassandra Cluster Manager (CCM) to set up and manage your cluster
  •  The Cassandra Query Language (CQL) to create keyspaces, column families, perform CRUD operations on column families and other administrative tasks
  • Designing primary keys and secondary indexes, partitioning and clustering keys
  • Restrictions on queries based on primary and secondary key design
  • Tunable consistency using quorum and local quorum. Read and write consistency in a node
  • Architecture and Storage components: Commit Log, MemTable, SSTables, Bloom Filters, Index File, Summary File and Data File
  • A real world project: A Miniature Catalog Management System using the Cassandra Java driver
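One of the storage components listed above, the Bloom filter, is what lets Cassandra skip SSTables that definitely do not contain a requested key. As a hedged conceptual sketch (the bit-array size, hash count, and hashing scheme below are arbitrary choices for illustration, not Cassandra's actual tuning):

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter: a bit array plus k hash positions per key."""

    def __init__(self, size=1024, hashes=3):
        self.size = size
        self.hashes = hashes
        self.bits = [False] * size

    def _positions(self, key):
        # Derive k positions by salting the key and hashing it.
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))

bf = BloomFilter()
bf.add("user:42")
```

The key property is the asymmetry: a negative answer is guaranteed correct, while a positive answer may be a false positive, which is why a database can safely use it to avoid disk reads.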

Developer To Architect: Mastering Software Architecture

Learn Software & Solution Architecture for Architecting and Deploying Large-Scale, Highly-Available and Secure Systems

Created by NewTechWays - Anurag Yadav - Making knowledge available for everyone


Students: 7131, Price: $99.99


Architecting software systems is a skill that is in huge demand, but it is not a readily available skill. To understand why this skill is rare to find, let's go through a few lines from Martin Fowler's blog on architecture.

He says: "Architecture is about the important stuff. Whatever that is." It means that the heart of thinking architecturally about software is to decide what is important (i.e. what is architectural), and then expend energy on keeping those architectural elements in good condition. For a developer to become an architect, they need to be able to recognize which elements are important, and which elements are likely to result in serious problems should they not be controlled.

It takes a number of years for a developer to learn enough to become an architect. This learning largely depends on the kind of opportunities that you get in your career. Often these opportunities are limited to specific areas of work only.  However, to be an architect, you must possess extensive technical knowledge of as many areas as possible. You must understand all the complexities and challenges in different parts of a system. You need the ability to make upfront decisions by understanding various trade-offs. You should be able to foresee or anticipate critical problems that a system can face during its evolution.

This is where the 'Developer To Architect' course can be very useful for you. It assumes that you already have great development skills, and it builds from there. It extensively covers architecting non-functional properties of a system, handling of large-scale deployments, and internal working of popular open-source products for building software solutions.

To give you some details of what is specifically covered:

  • Architecting non-functional properties like Performance, Scalability, Reliability, Security. 

  • Large-scale deployment and operations using Docker containers and Kubernetes.

  • Internal working of popular open-source products like Node.js, Redis, Kafka, Cassandra, ELK stack, Hadoop, etc for confidently architecting software solutions.

In short, this course will help you learn everything you need to become a 'true' architect in a very short period of time.

CCA 131 – Cloudera Certified Hadoop and Spark Administrator

Prepare for CCA 131 by setting up cluster from scratch and performing tasks based on scenarios derived from curriculum.

Created by Durga Viswanatha Raju Gadiraju - Technology Adviser and Evangelist


Students: 7091, Price: $24.99


CCA 131 is a certification exam conducted by the leading Big Data vendor, Cloudera. This online proctored exam is scenario-based, which means it is very hands-on. You will be provided with a multi-node cluster and need to complete the given tasks.

To prepare for the certification, one needs hands-on exposure to building and managing clusters. However, with limited infrastructure, it is difficult to practice on a laptop. We understand that problem and built the course using Google Cloud Platform, where you can get up to $300 in credit while the offer lasts and use it to get hands-on exposure to building and managing Big Data clusters using CDH.

Required Skills

Install - Demonstrate an understanding of the installation process for Cloudera Manager, CDH, and the ecosystem projects.

  • Set up a local CDH repository

  • Perform OS-level configuration for Hadoop installation

  • Install Cloudera Manager server and agents

  • Install CDH using Cloudera Manager

  • Add a new node to an existing cluster

  • Add a service using Cloudera Manager

Configure - Perform basic and advanced configuration needed to effectively administer a Hadoop cluster

  • Configure a service using Cloudera Manager

  • Create an HDFS user's home directory

  • Configure NameNode HA

  • Configure ResourceManager HA

  • Configure proxy for Hiveserver2/Impala

Manage - Maintain and modify the cluster to support day-to-day operations in the enterprise

  • Rebalance the cluster

  • Set up alerting for excessive disk fill

  • Define and install a rack topology script

  • Install new type of I/O compression library in cluster

  • Revise YARN resource assignment based on user feedback

  • Commission/decommission a node

Secure - Enable relevant services and configure the cluster to meet goals defined by security policy; demonstrate knowledge of basic security practices

  • Configure HDFS ACLs

  • Install and configure Sentry

  • Configure Hue user authorization and authentication

  • Enable/configure log and query redaction

  • Create encrypted zones in HDFS

Test - Benchmark the cluster operational metrics, test system configuration for operation and efficiency

  • Execute file system commands via HTTPFS

  • Efficiently copy data within a cluster/between clusters

  • Create/restore a snapshot of an HDFS directory

  • Get/set ACLs for a file or directory structure

  • Benchmark the cluster (I/O, CPU, network)

Troubleshoot - Demonstrate ability to find the root cause of a problem, optimize inefficient execution, and resolve resource contention scenarios

  • Resolve errors/warnings in Cloudera Manager

  • Resolve performance problems/errors in cluster operation

  • Determine reason for application failure

  • Configure the Fair Scheduler to resolve application delays

Our Approach

  • You will start by creating a Cloudera QuickStart VM (in case you have a laptop with 16 GB RAM and a quad-core CPU). This will help you get comfortable with Cloudera Manager.

  • You will be able to sign up for GCP and get up to $300 in credit while the offer lasts. Credits are valid for up to a year.

  • You will then get a brief overview of GCP and provision 7 to 8 virtual machines using templates. You will also attach external hard drives to configure for HDFS later.

  • Once servers are provisioned, you will go ahead and set up Ansible for Server Automation.

  • You will set up a local repository for Cloudera Manager and the Cloudera Distribution of Hadoop (CDH) using packages.

  • You will then set up Cloudera Manager with a custom database, and then the Cloudera Distribution of Hadoop using the wizard that comes as part of Cloudera Manager.

  • As part of setting up the Cloudera Distribution of Hadoop, you will set up HDFS, learn HDFS commands, set up YARN, configure HDFS and YARN high availability, understand schedulers, set up Spark, transition to parcels, and set up Hive, Impala, HBase, Kafka, etc.

  • Once all the services are configured, we will revise for the exam by mapping what was covered to the exam's required skills.

Learn by Example : HBase – The Hadoop Database

25 solved examples to get you up to speed with HBase

Created by Loony Corn - An ex-Google, Stanford and Flipkart team


Students: 5566, Price: $89.99

Students: 5566, Price:  Paid

Prerequisites: Working with HBase requires knowledge of Java

Taught by a team that includes 2 Stanford-educated ex-Googlers and 2 ex-Flipkart Lead Analysts. This team has decades of practical experience working with large-scale data processing jobs.

Relational Databases are so stuffy and old! Welcome to HBase - a database solution for a new age. 

HBase: Do you feel like your relational database is not giving you the flexibility you need anymore? Column-oriented storage, no fixed schema, and low latency make HBase a great choice for the dynamically changing needs of your applications.

What's Covered: 

25 solved examples covering all aspects of working with data in HBase
CRUD operations in the shell and with the Java API, Filters, Counters, MapReduce 

Implement your own notification service for a social network using HBase

HBase and its role in the Hadoop ecosystem, HBase architecture, and what makes HBase different from RDBMS and other Hadoop technologies like Hive.
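For a flavor of the CRUD operations covered above, here is a minimal HBase shell session; the table and column family names are made up for illustration:

```shell
# Create a table with one column family
create 'notifications', 'info'

# Put (create/update) a cell, then read it back
put 'notifications', 'user42', 'info:msg', 'alice liked your post'
get 'notifications', 'user42'

# Scan the first few rows, then delete a cell
scan 'notifications', {LIMIT => 5}
delete 'notifications', 'user42', 'info:msg'

# Atomic counter, useful for things like unread-notification counts
incr 'notifications', 'user42', 'info:unread', 1
```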

CCA131 Cloudera CDH 5 & 6 Hadoop Administrator Master Course

Master Cloudera CDH Admin. Spin up cluster in AWS, Mess it, Fix it, Play it and Learn. Real time demo on CCA131 Topics.

Created by MUTHUKUMAR Subramanian - Best Selling Instructor, Big Data, Spark, Cloud, Java, AWS


Students: 3800, Price: $109.99

Students: 3800, Price:  Paid

This course is designed for everyone from complete beginners to already skilled professionals who want to enhance their learning. The hands-on sessions cover the end-to-end setup of a Cloudera cluster. We will use AWS EC2 instances to deploy the cluster.


What students are saying:

  • 5 stars, "Very clear and adept in delivering the content. Learnt a lot. He covers the material 360 degrees and keeps the students in minds." - Sasidhar Thiriveedhi

  • 5 stars, "This course is an absolute paradigm shift for me. This is really an amazing course, and you shouldn't miss if you are a novice/intermediate level in Cloudera Administration." - Santhosh G

  • 5 stars, "Great work by the instructor... highly recommended..." - Phani Raj

  • 5 stars, "It is really excellent course. A lot of learning materials." - Shaiukh Noor

  • 5 stars, "This course is help me a lot for my certification preparation. thank you!" - Muhammad Faridh Ronianto

The course is targeted at Software Engineers, System Analysts, Database Administrators, DevOps Engineers, and System Administrators who want to learn about the Big Data ecosystem with Cloudera. Other IT professionals can also take this course, but they might have to do some extra work to understand some of the advanced concepts.

With Cloudera being the market leader in the big data space, Cloudera Hadoop administration brings huge job opportunities in the Cloudera and big data domain. The course covers all the skills required for the CCA131 certification:

  • Install - Demonstrate installation of Cloudera Manager, the Cloudera Distribution of Hadoop (CDH), and Hadoop ecosystem components

  • Configure - Basic to advanced configuration to set up Cloudera Manager, NameNode High Availability (HA), and ResourceManager High Availability (HA)

  • Manage - Perform day-to-day activities and operations in a Cloudera cluster, such as cluster balancing, alert setup, rack topology management, commissioning and decommissioning hosts, YARN resource management with FIFO, Fair, and Capacity schedulers, and dynamic resource pool configuration

  • Secure - Enable the relevant services and configuration to meet organisational security goals with best practices: configure extended Access Control Lists (ACLs), Sentry, Hue authorization and authentication with LDAP, and HDFS encrypted zones

  • Test - Execute file system commands via HttpFS, create and restore snapshots of HDFS directories, get and set extended ACLs for a file or directory, and benchmark the cluster

  • Troubleshoot - Find the root cause of a problem, resolve it, and optimize inefficient execution; identify and filter warnings, predict problems, and apply the right solutions; configure dynamic resource pools for better-optimized use of the cluster; find scalability bottlenecks and size the cluster

  • Planning - Size the cluster and identify dependencies and hardware and software requirements

Getting a real distributed environment with many machines at enterprise quality would be very costly. Thanks to the cloud, any user can create a distributed environment with minimal expenditure and pay only for what they use. AWS is technology neutral, and other cloud providers such as Microsoft Azure, IBM Bluemix, and Google Compute Engine work in a similar way.

+++++ Content Added on Request +++++

Dec ++ Cloudera 6 Overview and Quick Install

Nov ++ HDFS Redaction

Nov ++ Memory management - Heap Calculation for Roles and Namenode

Nov ++ IO Compression

Nov ++ Charts and Dashboard

Oct ++ File copy, distcp

Oct ++ Command files added for all the sections.

Sep ++ Kafka Service Administration

Sep ++ Spark Service Administration

Aug ++ Cluster Benchmarking

Cloudera Hadoop Administration

Installation, Configuration, Security, Evaluation and Upgrade of a Cloudera Hadoop Cluster. Cloudera Certified Hadoop Admin.

Created by Sijeesh Kunnotharamal - Hadoop administrator / DevOPS / Cloud Engineer


Students: 3071, Price: $19.99

Students: 3071, Price:  Paid

This training provides you with proficiency in all the steps required to operate and sustain a Cloudera Hadoop cluster, including planning, installation, configuration, Active Directory integration, securing the cluster using Kerberos, HDFS Access Control Lists, high availability, Hadoop ecosystem components in detail, and upgrading Cloudera Manager and CDH. It provides hands-on preparation for the real-world challenges faced by Hadoop administrators. The course curriculum follows the Cloudera Hadoop distribution.

Big Data for Managers

A foundation course for big data that covers the big data tools for various stages of a big data project

Created by Ganapathi Devappa - Big Data Specialist


Students: 1951, Price: $24.99

Students: 1951, Price:  Paid

This course covers the fundamentals of big data technology that will help you confidently lead a big data project in your organization. It covers big data terminology like the 3 Vs of big data, and the key characteristics of big data technology that will help you answer the question 'How is big data technology different from traditional technology?'. You will be able to identify the various stages of a big data solution, from data ingestion to visualization and security, and choose the right tool for each stage. You will see example uses of popular big data tools like HDFS, MapReduce, Spark, and Zeppelin, plus a demo of setting up an EMR cluster on Amazon Web Services. You will practice using the 5 P's methodology of data science projects to manage a big data project, with theory as well as practice applied to many case studies. You will practice sizing your cluster with a template, and you will explore more than 20 big data tools so you can choose the right tool for each big data problem.
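The cluster-sizing exercise mentioned above comes down to simple arithmetic. Here is a minimal sketch under assumed, illustrative numbers (50 TB of raw data, HDFS replication factor 3, 25% headroom for temporary data, 12 TB of usable disk per worker); real sizing templates account for more factors:

```shell
#!/bin/sh
# All figures below are illustrative assumptions, not recommendations.
RAW_TB=50            # raw data to store
REPLICATION=3        # HDFS replication factor
DISK_PER_NODE_TB=12  # usable disk per worker node

REPLICATED_TB=$((RAW_TB * REPLICATION))
# Add ~25% headroom for intermediate/temporary data (integer math)
TOTAL_TB=$((REPLICATED_TB * 125 / 100))
# Round up to a whole number of nodes
NODES=$(( (TOTAL_TB + DISK_PER_NODE_TB - 1) / DISK_PER_NODE_TB ))

echo "Total storage needed: ${TOTAL_TB} TB"
echo "Estimated worker nodes: ${NODES}"
```

With these numbers the sketch estimates 187 TB of total storage and 16 worker nodes; changing any assumption changes the result accordingly.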

I recently (14-May-2020) updated the content on open source, cloud computing, big data offerings by cloud vendors, multi-cloud, hybrid cloud and edge computing, and the integrated big data service providers Cloudera and MapR. As most organizations are moving towards the public cloud, these lectures will provide the latest information on these technologies. I am sure you will like this content.

This course has benefited students in more than 50 countries and as an instructor, I am glad to share some of the  five star comments about the course:

This course really exceeded my expectations! Not only it covers the concepts and the overall view of a Big Data project landscape but it also provides good examples of real case studies, that help reinforce the contents presented. Great course!

This course is great! I have learnt many useful things. The case studies are very enlightening. I strongly recommend. Thank you very much.

Technically great, excellent

Very good teaching style, and the content was as expected

SQL, NoSQL, Big Data and Hadoop

A comprehensive journey through the world of database and data engineering concepts - from SQL, NoSQL to Hadoop

Created by Michael Enudi - Okmich


Students: 1534, Price: $19.99

Students: 1534, Price:  Paid

A comprehensive look at the wide landscape of database systems and how to make a good choice in your next project

The first time we ask or answer any question regarding databases is when building an application. The next is either when our choice of database becomes a bottleneck or when we need to do large-scale data analytics.

This course covers almost every class of database and data storage platform there is, and when to consider using each. It is a great journey through databases that will suit software developers, big data engineers, and data analysts, as well as decision makers. It is not an in-depth look into each of the databases, but it promises to get you up and running with your first project in each class.

In this course, we are going to cover 

  • Relational Database Systems, their features, use cases and limitations

  • Why NoSQL?

  • CAP Theorem

  • Key-Value store and their use cases

  • Document-oriented databases and their use cases

  • Wide-column stores and their use cases

  • Time-series databases and their use cases

  • Search Engines and their use cases

  • Graph databases and their use cases

  • Distributed Logs and real time streaming systems

  • Hadoop and its use cases

  • SQL-on-Hadoop tools and their use cases

  • How to make informed decisions in building a good data storage platform

What is the target audience?

  • Chief data officers

  • Application developers

  • Data analysts

  • Data architects

  • Data engineers

  • Students

  • Anyone who wants to understand Hadoop from a database perspective.

What does this course not cover?

This course does not approach any of the databases from an administrative perspective, so we don't cover administrative tasks like security, backup, recovery, migration, and the like. Nor does it cover very in-depth features of the specific databases under discussion; for example, we will not go into the different database engines for MySQL or how to write stored procedures.

What are the requirements?
The labs for this course can be carried out on any machine (Microsoft Windows, Linux, macOS). However, the training on HBase or Hadoop will require you to have a Hadoop environment. The suggestion is to use a pre-installed sandbox, a cloud offering, or your own custom sandbox.

What do I need to know to get the best out of this course?
This course does not assume any knowledge of NoSQL or data engineering.
However, a little knowledge of RDBMS (even Microsoft Access) is enough to put you in the best position for this course.

Talend For Big Data Integration Course : Beginner to Expert

Master guide for using Talend Big Data

Created by Kapil Chaitanya Kasarapu - Sr Big Data Engineer


Students: 1100, Price: $99.99

Students: 1100, Price:  Paid

Course Description

Talend Open Studio for Data Integration is an open source ETL tool, which means small companies or businesses can use it to extract, transform, and load their data into databases or any file format (Talend supports many file formats and database vendors).

Talend Open Studio for Big Data is an open source tool used to interact with big data systems from Talend.

If you want to learn how to use Talend Open Studio for Big Data from SCRATCH, or if you want to IMPROVE your skills in big data concepts and designing Talend jobs, then this course is right for you.

It's got EVERYTHING, covering almost all the topics in Talend Open Studio for Big Data.

It talks about real-world USE CASES.

Prepares you for the Certification Exam.

By the end of the Course you will Master Working with Big Data by designing Talend Jobs.

And what more could you ask? All the videos are in HD quality.

What Are the System Requirements?

  • PC or Mac.

  • Virtual Box Which is FREE.

  • Talend Software Which is FREE.

  • HDP VM Which is FREE.

  • CDH VM Which is FREE.

Big Data Crash Course | Learn Hadoop, Spark, NiFi and Kafka

Ramp up on Key Big Data Technologies in Shortest Possible Time

Created by Bhavuk Chawla - Authorized Instructor for Google, Cloudera, Confluent


Students: 441, Price: $109.99

Students: 441, Price:  Paid

  • 9 hours+ Video Content

  • Gain Holistic Picture of Big Data Ecosystem

  • Learn HDFS, HBase, YARN, MapReduce Concepts, Spark, Impala, NiFi and Kafka

  • Experience a classroom-like environment via whiteboarding sessions

  • Understand "What", "Why" and "Architecture" of Key Big Data Technologies with hands-on labs

  • Perform hands-on on Google Cloud DataProc Pseudo Distributed (Single Node) Environment

  • Delivered by Bhavuk Chawla who has trained 5000+ participants in in-person training

  • Acquire Certificate on Successful Completion of the Course

Below are our more courses -

  1. Big Data Crash Course | Learn Hadoop, Spark, NiFi and Kafka

  2. Big Data For Architects | Build Big Data Pipelines and Compare Key Big Data Technologies

  3. Google Data Engineer Certification Practice Exams

  4. Setup Single Node Cloudera Cluster on Google Cloud

  5. Confluent Certified Operator for Apache Kafka Practice Test

  6. Confluent Certified Developer Apache Kafka Practice Tests

You may join our YouTube channel, "DataCouch", to get access to interesting videos free of cost.

We are also an official training delivery partner of Confluent. We conduct corporate trainings on various topics, including Confluent Kafka Developer, Confluent Kafka Administration, Confluent Kafka Real Time Streaming using KSQL & KStreams, and Confluent Kafka Advanced Optimization. Our instructors are well qualified and vetted by Confluent for delivering such courses.

Please feel free to reach out if you have any requirements for Confluent Kafka Training for your team. Happy to assist.

Big Data For Architects

Build Big Data Pipelines using Hadoop, Spark, NiFi, Kafka etc.

Created by Bhavuk Chawla - Authorized Instructor for Google, Cloudera, Confluent


Students: 403, Price: $99.99

Students: 403, Price:  Paid

  • Full of whiteboarding sessions to give you a classroom-like experience

  • Understand the thought process in choosing big data ingestion, storage, processing, and analysis technologies

  • Focus on Key Architectures and Pipelines in Big Data Ecosystem

  • Which Big Data Technology to choose when?

  • Covering Breadth of Big Data Technologies

  • Hands-on on Google Cloud DataProc Pseudo Distributed Cluster

  • There is more theory, but it is relevant, giving you the right information to work on real-world big data projects

  • No need to pay for running labs on Google Cloud

Below are our more courses -

  1. Big Data Crash Course | Learn Hadoop, Spark, NiFi and Kafka

  2. Big Data For Architects | Build Big Data Pipelines and Compare Key Big Data Technologies

  3. Google Data Engineer Certification Practice Exams

  4. Setup Single Node Cloudera Cluster on Google Cloud

  5. Confluent Certified Operator for Apache Kafka Practice Test

  6. Confluent Certified Developer Apache Kafka Practice Tests

You may join our YouTube channel, "DataCouch", to get access to interesting videos free of cost.

We are also an official training delivery partner of Confluent. We conduct corporate trainings on various topics, including Confluent Kafka Developer, Confluent Kafka Administration, Confluent Kafka Real Time Streaming using KSQL & KStreams, and Confluent Kafka Advanced Optimization. Our instructors are well qualified and vetted by Confluent for delivering such courses.

Please feel free to reach out if you have any requirements for Confluent Kafka Training for your team. Happy to assist.

Hands-On with Hadoop 2: 3-in-1

Run your own Hadoop clusters on your own machine or in the cloud

Created by Packt Publishing - Tech Knowledge in Motion


Students: 142, Price: $89.99

Students: 142, Price:  Paid

Hadoop is the most popular, reliable, and scalable distributed computing and storage platform for big data solutions. It comprises components designed to enable tasks at a distributed scale, across multiple servers and thousands of machines.

This comprehensive 3-in-1 training course gives you a strong foundation by exploring the Hadoop ecosystem with real-world examples. You'll discover the process of setting up an HDFS cluster, along with formatting and transferring data between your local storage and the Hadoop filesystem. You'll also get hands-on solutions to 10 real-world use cases using Hadoop.

Contents and Overview

This training program includes 3 complete courses, carefully chosen to give you the most comprehensive training possible.

The first course, Getting Started with Hadoop 2.x, opens with an introduction to the world of Hadoop, where you will learn about nodes, data sets, and operations such as map and reduce. The second section deals with HDFS, Hadoop's filesystem used to store data. Further on, you'll discover the differences between jobs and tasks, and get to know the Hadoop UI. After this, we turn our attention to storing data in HDFS and data transformations. Lastly, we will learn how to implement an algorithm the Hadoop map-reduce way and analyze the overall performance.

The second course, Hadoop Administration and Cluster Management, starts by installing Apache Hadoop for a cluster and configuring the required services. You'll learn various cluster operations such as validation, and expanding and shrinking Hadoop services. You will then move on to gain a better understanding of administrative tasks like planning your cluster, monitoring, logging, security, troubleshooting, and best practices. Techniques to keep your Hadoop clusters highly available and reliable are also covered in this course.

The third course, Solving 10 Hadoop'able Problems, covers the core parts of the Hadoop ecosystem, helping to give you a broad understanding and get you up and running fast. Next, it describes a number of common problems that Hadoop is able to solve, as case-study projects. The course is broken down into sections by project, each serving as a specific use case for solving big data problems.

By the end of this Learning Path, you'll be able to plan, deploy, manage, monitor, and performance-tune your Hadoop cluster with Apache Hadoop.

About the Author

A K M Zahiduzzaman is a software engineer with NewsCred Dhaka. He is a software developer and technology enthusiast. He was a Ruby on Rails developer but now works with Node.js, AngularJS, and Python, with the much wider vision of a technology company. His next goal is introducing SOA into the current applications to scale development via microservices. Zahiduzzaman has a lot of experience with Spark and is passionate about it. He is also a guitarist and has a band. He was also a speaker at an international event in Dhaka, and he loves to share his knowledge.

Gurmukh Singh is a technology professional with 14+ years of industry experience in infrastructure design, distributed systems, performance optimization, and networks. He has worked in the big data domain for the last 5 years and provides consultancy and training on various technologies. He has worked with companies such as HP, JP Morgan, and Yahoo, and has authored the book Monitoring Hadoop.

Tomasz Lelek is a Software Engineer and Co-Founder of InitLearn. He mostly programs in Java and Scala, and dedicates his time and effort to getting better at everything. He is currently delving into big data technologies. Tomasz is very passionate about everything associated with software development. He has been a speaker at a few conferences in Poland (Confitura and JDD) and at the Krakow Scala User Group, and has conducted a live coding session at Geecon Conference.

Bigdata with AWS – From Beginner to Expert

Basics to Advanced Step by Step Practical Hands-On on every ecosystem which will make you an expert.

Created by Saif Shaikh - Module Lead


Students: 141, Price: $24.99

Students: 141, Price:  Paid

Hi All,

This course is designed to take you from beginner to advanced level, with cloud knowledge, and all the sessions are hands-on.

Topics Covered:

1) Hadoop

2) Sqoop

3) Hive

4) Scala Programming

5) Spark

6) HBase

7) Cassandra

8) Kafka

9) AWS


All the sessions start from the basics, and minute care has been taken to implement all concepts hands-on.

After completing this course you should be ready to work independently in an industry environment, and it will help you gain confidence in coding, as the code is written from scratch. Errors and packages are explained, and POM file creation, JAR creation, and the Spark Web UI are also shown to give you real-world experience.

People from various backgrounds can easily pick it up and work on it, as the sessions are hands-on with questions and answers. If you face any issues related to understanding or implementation, post a question on Udemy and I will try to answer it within 24-48 hours around my work schedule.

This course is designed with the current market and industry standards in mind. All the topics are covered in depth to give you the best knowledge, with hands-on cloud experience. The cloud is the next era, so let's start learning now and become proficient.

I wish you good luck with your learning of big data with AWS, and I hope you will be able to transfer your knowledge after this course with confidence.

Please Note:

No Documents, No Scripts, No VM, No Assignments & No Project would be provided in this.

The Complete Apache Phoenix Developer Course

Learn the fundamentals of Apache Phoenix and development with the help of real-time scenarios.

Created by HubeTech Academy, Inc. - High Quality Courses from Expert Instructors


Students: 124, Price: $89.99

Students: 124, Price:  Paid

Welcome to this course: The Complete Apache Phoenix Developer Course. Apache Phoenix is an open source, massively parallel, relational database engine supporting OLTP for Hadoop, using Apache HBase as its backing store. Phoenix provides a JDBC driver that hides the intricacies of the NoSQL store, enabling users to create, delete, and alter SQL tables, views, indexes, and sequences; insert and delete rows singly and in bulk; and query data through SQL. Phoenix compiles queries and other statements into native NoSQL store APIs rather than using MapReduce, enabling the building of low-latency applications on top of NoSQL stores.

In this course, you'll learn:

  • Understand the fundamentals of Apache Phoenix
  • Learn and understand the use of Apache Phoenix
  • Learn how to obtain and configure Apache Phoenix
  • Learn how to structure data to get maximum performance from NoSQL solutions
  • Learn how to create Phoenix tables, load data, and execute queries against that data
  • Learn how to retrieve data from Phoenix by using a JDBC connection

At the end of this course, you will be an expert in using Apache Phoenix. What are you waiting for? Let's get started!
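To give a flavor of what Phoenix development looks like, here is a minimal sketch of a session; the ZooKeeper quorum and table name are hypothetical, and the exact client path varies by installation:

```shell
# Connect to Phoenix through its sqlline client
sqlline.py zookeeper-host:2181

# Then, at the SQL prompt:
# CREATE TABLE IF NOT EXISTS users (id BIGINT NOT NULL PRIMARY KEY, name VARCHAR);
# UPSERT INTO users VALUES (1, 'alice');   -- Phoenix uses UPSERT rather than INSERT
# SELECT * FROM users;
```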

Solving 10 Hadoop’able Problems

Need solutions to your big data problems? Here are 10 real-world projects demonstrating problems solved using Hadoop.

Created by Packt Publishing - Tech Knowledge in Motion


Students: 82, Price: $89.99

Students: 82, Price:  Paid

The Apache Hadoop ecosystem is a popular and powerful tool to solve big data problems. With so many competing tools to process data, many users want to know which particular problems are well suited to Hadoop, and how to implement those solutions.

To know what types of problems are Hadoop-able, it is good to start with a basic understanding of the core components of Hadoop. You will learn about the ecosystem designed to run on top of Hadoop, as well as software that is deployed alongside it. These tools give us the building blocks to build data processing applications. This course covers the core parts of the Hadoop ecosystem, helping to give you a broad understanding and get you up and running fast. Next, it describes a number of common problems that Hadoop is able to solve, as case-study projects. The course is broken down into sections by project, each serving as a specific use case for solving big data problems.

By the end of this course, you will have been exposed to a wide variety of Hadoop software and examples of how it is used to solve common big data problems.

About the Author

Tomasz Lelek is a Software Engineer who programs mostly in Java and Scala. He is a fan of microservice architectures and functional programming. He dedicates considerable time and effort to be better every day. Recently, he's been delving into big data technologies such as Apache Spark and Hadoop. He is passionate about nearly everything associated with software development.

Tomasz thinks that we should always try to consider different solutions and approaches to solving a problem. Recently, he was a speaker at several conferences in Poland - Confitura and JDD (Java Developer's Day) and also at Krakow Scala User Group.

He also conducted a live coding session at Geecon Conference.

Intermediate Hadoop: Process & Analyze Large Data Sets

Hadoop: Intermediate

Created by Integrity Training - Certification Trainers for Over 20 Years


Students: 16, Price: $89.99

Students: 16, Price:  Paid

The Hadoop: Intermediate training course is designed to give you in-depth knowledge of the Hadoop framework discussed in our Hadoop and MapReduce Fundamentals course. The course covers the concepts needed to process and analyze large sets of data stored in HDFS. It teaches Sqoop and Flume for data ingestion.

The Hadoop: Intermediate course is part of a two-course series which covers the essential concepts of Hadoop and big data analytics. With the world's increasing digital trend, the importance of big data and data analytics will continue to grow in the coming years. This course will enable candidates to explore opportunities in this growing field of digital science.