Best Cloudera Courses

Find the best online Cloudera Courses for you. The courses are sorted based on popularity and user ratings. We do not allow paid placements in any of our rankings. We also have a separate page listing only the Free Cloudera Courses.

Hands-on HADOOP Masterclass – Tame the Big Data!

Big Data, Hadoop, MapReduce, HDFS, HIVE, PIG, Mahout, NoSQL, Oozie, Flume, Storm, Avro, Spark, Sqoop, Cloudera and more

Created by EDU CBA - Learn real world skills online


Students: 21006, Price: $89.99


Learn from well-crafted study materials on Big Data, Hadoop, MapReduce, HDFS, Hive, Pig, Mahout, NoSQL, Oozie, Flume, Storm, Avro, Spark, Sqoop, Cloudera, data analysis, survey analysis, data management, sales analysis, salary analysis, traffic analysis, loan analysis, log data analysis, YouTube data analysis and sensor data analysis. Learn by doing, through hands-on examples of analyzing big data. The course suits a mixed audience, ranging from developers to data scientists using procedural languages in the Hadoop space. Discover the fundamentals of Hadoop and become comfortable managing the development and deployment of Hadoop applications.

What is Big Data

Big data refers to collections of datasets so large that they cannot be processed using traditional techniques. Big data uses various tools and techniques to collect and process data, and it deals with all types of data, including structured, semi-structured and unstructured data. Big data is used in various fields, such as:

  • Black box data

  • Social media data

  • Stock exchange data

  • Power Grid Data

  • Transport Data

  • Search Engine Data

Benefits of Big Data

Big data has become very important and is emerging as one of the crucial technologies in today’s world. The benefits of big data are listed below:

  • Companies can use big data to gauge the effectiveness of their marketing campaigns, promotions and other advertising media

  • Big data helps companies plan their production

  • Using the information big data provides, companies can deliver better and quicker service to their customers

  • Big data supports better decision making, which increases operational efficiency and reduces business risk

  • Big data platforms handle huge volumes of data in real time while still supporting data privacy and security

Challenges faced by Big Data

The major challenges of big data are as follows

  • Curation

  • Storage

  • Searching

  • Transfer

  • Analysis

  • Presentation

What is Hadoop

Hadoop is an open source software framework used for storing data of any type and running applications on clusters of commodity hardware. Hadoop has huge processing power and can handle a large number of concurrent tasks. Open source here means it is free to download and use, although commercial distributions of Hadoop are also available in the market. There are four basic components of Hadoop: Hadoop Common, the Hadoop Distributed File System (HDFS), MapReduce and Yet Another Resource Negotiator (YARN).
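The MapReduce component mentioned above follows a simple map-shuffle-reduce pattern. A minimal plain-Python sketch of the model (illustrative only, not the Hadoop API):

```python
# Conceptual sketch of the MapReduce model in plain Python (not the Hadoop API):
# map emits (key, value) pairs, a shuffle groups them by key, reduce aggregates.
from collections import defaultdict

def map_phase(lines):
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    return {key: sum(values) for key, values in groups.items()}

lines = ["the quick brown fox", "the lazy dog", "the fox"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts["the"])  # 3
print(counts["fox"])  # 2
```

In Hadoop proper, the map and reduce functions run in parallel across the cluster and the shuffle happens over the network, but the data flow is the same.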

Benefits of Hadoop Course

Most organizations use Hadoop because of its ability to store and process huge amounts of data of any type. The other benefits of Hadoop include:

  • Computing Power

  • Flexibility

  • Fault Tolerance

  • Low Cost

  • Scalability

Uses of Hadoop

Many organizations use Hadoop today for the following purposes:

  • Low-cost storage and active data archiving

  • Staging area for a data warehouse and analytics store

  • Data lake

  • Sandbox for discovery and analysis

  • Recommendation systems

CCA 175 – Spark and Hadoop Developer Certification – Scala

Cloudera Certified Associate Spark and Hadoop Developer using Scala as Programming Language

Created by Durga Viswanatha Raju Gadiraju - Technology Adviser and Evangelist


Students: 18143, Price: $24.99


CCA 175 Spark and Hadoop Developer is one of the well-recognized Big Data certifications. This scenario-based certification exam demands basic programming using Python or Scala along with Spark and other Big Data technologies.

This comprehensive course covers all aspects of the certification using Scala as programming language.

  • Scala Fundamentals

  • Core Spark - Transformations and Actions

  • Spark SQL and Data Frames

  • File formats

  • Flume, Kafka and Spark Streaming

  • Apache Sqoop
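The transformations-and-actions topic hinges on lazy evaluation. As a rough plain-Python analogy (not Spark or Scala code), generators describe work without running it, just as transformations do, and only an action-like step materializes a result:

```python
# Plain-Python analogy (not Spark itself) for "Core Spark - Transformations
# and Actions": transformations are lazy descriptions of work, and nothing
# runs until an action asks for a result.
data = range(1, 6)

# "Transformations": building generators performs no computation yet.
doubled = (x * 2 for x in data)                  # like rdd.map(lambda x: x * 2)
evens_over_four = (x for x in doubled if x > 4)  # like .filter(...)

# "Action": iterating the pipeline finally triggers the computation.
result = list(evens_over_four)                   # like rdd.collect()
print(result)  # [6, 8, 10]
```

Spark extends this idea with a full DAG of transformations that is only executed when an action such as `collect` or `count` is called.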

Exercises will be provided to prepare before attending the certification. The intention of the course is to boost your confidence for the certification.

All the demos are given on our state-of-the-art Big Data cluster. You can avail one week of complimentary lab access by filling in the form provided as part of the welcome message.

Apache NiFi (Cloudera DataFlow) – Be an expert in 8 Hours

8+ Hours Beginners to Advance: Learn Real World Production Scenario and Become a Pro in Apache Data Flow Management

Created by Vikas Kumar Jha - Architect


Students: 16854, Price: $99.99


Finally, a complete course on Apache NiFi is here. This course discusses the latest version of NiFi and its processors step by step.

With more than 3,500 students enrolled since the course went live last month, it has already become a bestseller. This course offers 8+ hours of comprehensive hands-on lessons that can take you through the journey of a Data Flow Manager from beginner to advanced level, and it covers the latest version of the Apache NiFi processors.

Please feel free to ask any question you have during your journey of learning Apache NiFi. I would love to answer your queries.

A Quick Intro About NiFi:

Apache NiFi is an excellent open source software for automating and managing the data flows between various types of systems. It is a powerful and reliable system to process and distribute data. It provides a web-based user interface for creating, monitoring and controlling data flows. It has a highly configurable and modifiable data flow process that can modify data during execution, also known as run time. It is easily extensible through the development of custom components.

The GitHub link to download the templates is provided in the resources of lecture 5.

Note: All my courses have a 30-day money-back guarantee, so don't hesitate to enroll and start your journey into Apache NiFi Data Flow Management.

Introduction to Apache NiFi | Cloudera DataFlow – HDF 2.0

Apache NiFi - An Introductory Course to Learn Installation, Basic Concepts and Efficient Streaming of Big Data Flows

Created by Stephane Maarek | AWS Certified Cloud Practitioner,Solutions Architect,Developer - Best Selling Instructor, Kafka Guru, 9x AWS Certified


Students: 13150, Price: $99.99


Apache NiFi (Cloudera DataFlow, ex Hortonworks DataFlow) is an innovative technology to build data flows and solve your streaming challenges.

In today's big data world, fast data is becoming increasingly important. Streaming data at scale and rapidly between all your systems should be centralised, automated and resilient to failure to ensure good delivery to your downstream systems.

With NiFi, you can build all your flows directly from a UI, no coding required, and at scale!

Apache NiFi was initially used by the NSA so they could move data at scale, and it was then open sourced. Being such a hot technology, Onyara (the company behind it) was acquired by Hortonworks, one of the main backers of the big data project Hadoop and the Hortonworks Data Platform.

Apache NiFi is now used in many top organisations that want to harness the power of their fast data by sourcing and transferring information from and to their database and big data lakes. It is a key tool to learn for the analyst and data scientists alike. Its simplicity and drag and drop interface make it a breeze to use!

You can build streaming pipelines between Kafka and ElasticSearch, an FTP and MongoDB, and so much more! Your imagination is the limit

==============================

Quick Overview Of Course Content

This course will take you through an introduction of the Apache NiFi technology.

With a mix of theory lessons and hands-on labs, you'll get started and build your first data flows.

You will learn how to set up your connectors and processors, and how to read your FlowFiles, to make the most of what NiFi offers.

The most important configuration options will be demonstrated so you will be able to get started in no time.

We will also analyse a template picked from the web and understand how to debug your flows as well as route your data to different processors based on outcomes through relationships.

We will finally learn about the integrations between NiFi and Apache Kafka or MongoDB. Lots of learning ahead!

==============================

Why should I take this course?

  • With over 1.5 hours of videos and over 15 classes, you will get a great understanding of Apache NiFi in no time!

  • You will learn how to install and configure Apache NiFi to get started

  • You will learn Apache NiFI Architecture and Core Concepts

  • The core concepts like FlowFile, FlowFile Processor, Connection, Flow Controller, Process Groups etc.

  • You will learn how to use Apache NiFi Efficiently to Stream Data using NiFi between different systems at scale

  • You will also understand how to monitor Apache NiFi

  • Integrations between Apache Kafka and Apache NiFi!

  • Questions can also be asked on the forum, and the instructor is keen to answer them in a timely manner

==============================

Students Loved this course

Ashish Ranjan says “Great Course to get started with Nifi. Also, the instructor is very helpful and answers all your questions. I would highly recommend it. Great Job.” (rated 5 stars)

Luca Costa says “It was very interesting and now I have an Idea how to start my project :) Thank you” (rated 5 stars)

Aaron Gong says “Very clear and well instructed, first section is the most important, why use Nifi and for what purpose it is better suited for…” (rated 5 stars)

I am sure that you will walk away with a great enterprise skill and start solving your streaming challenges!

===============================

Instructor

Stephane Maarek is the instructor of this course. He loves NiFi and data engineering. He's the author of the highly-rated Apache Kafka Series on Udemy, having already taught 40,000+ students and received 12,000+ reviews.

=============================

You also have lifetime access to the course and a 30-day money-back guarantee, so click the “Enroll Now” button and see you inside the course!

CCA 175 – Spark and Hadoop Developer – Python (pyspark)

Cloudera Certified Associate Spark and Hadoop Developer using Python as Programming Language

Created by Durga Viswanatha Raju Gadiraju - Technology Adviser and Evangelist


Students: 9717, Price: $24.99


CCA 175 Spark and Hadoop Developer is one of the well recognized Big Data certifications. This scenario-based certification exam demands basic programming using Python or Scala along with Spark and other Big Data technologies.

This comprehensive course covers all aspects of the certification using Python as a programming language.

  • Python Fundamentals

  • Spark SQL and Data Frames

  • File formats

Please note that the syllabus was recently changed, and the exam is now primarily focused on Spark Data Frames and/or Spark SQL.

Exercises will be provided to prepare before attending the certification. The intention of the course is to boost your confidence for the certification.

All the demos are given on our state of the art Big Data cluster. You can avail one-week complimentary lab access by filling this form which is provided as part of the welcome message.

CCA 131 – Cloudera Certified Hadoop and Spark Administrator

Prepare for CCA 131 by setting up cluster from scratch and performing tasks based on scenarios derived from curriculum.

Created by Durga Viswanatha Raju Gadiraju - Technology Adviser and Evangelist


Students: 7091, Price: $24.99


CCA 131 is a certification exam conducted by the leading Big Data vendor, Cloudera. This online proctored exam is scenario-based, which means it is very hands-on. You will be provided with a multi-node cluster and will need to complete the given tasks.

To prepare for the certification, one needs hands-on exposure to building and managing clusters. However, with limited infrastructure it is difficult to practice on a laptop. We understand that problem and built the course around Google Cloud Platform, where you can get up to $300 of credit (while the offer lasts) and use it to get hands-on exposure to building and managing Big Data clusters using CDH.

Required Skills

Install - Demonstrate an understanding of the installation process for Cloudera Manager, CDH, and the ecosystem projects.

  • Set up a local CDH repository

  • Perform OS-level configuration for Hadoop installation

  • Install Cloudera Manager server and agents

  • Install CDH using Cloudera Manager

  • Add a new node to an existing cluster

  • Add a service using Cloudera Manager

Configure - Perform basic and advanced configuration needed to effectively administer a Hadoop cluster

  • Configure a service using Cloudera Manager

  • Create an HDFS user's home directory

  • Configure NameNode HA

  • Configure ResourceManager HA

  • Configure proxy for Hiveserver2/Impala

Manage - Maintain and modify the cluster to support day-to-day operations in the enterprise

  • Rebalance the cluster

  • Set up alerting for excessive disk fill

  • Define and install a rack topology script

  • Install new type of I/O compression library in cluster

  • Revise YARN resource assignment based on user feedback

  • Commission/decommission a node

Secure - Enable relevant services and configure the cluster to meet goals defined by security policy; demonstrate knowledge of basic security practices

  • Configure HDFS ACLs

  • Install and configure Sentry

  • Configure Hue user authorization and authentication

  • Enable/configure log and query redaction

  • Create encrypted zones in HDFS

Test - Benchmark the cluster operational metrics, test system configuration for operation and efficiency

  • Execute file system commands via HTTPFS

  • Efficiently copy data within a cluster/between clusters

  • Create/restore a snapshot of an HDFS directory

  • Get/set ACLs for a file or directory structure

  • Benchmark the cluster (I/O, CPU, network)
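Several of the test tasks above correspond to short HDFS commands. A hedged sketch with placeholder paths and hostnames (the exam's actual tasks will differ):

```shell
# Copy data between clusters with DistCp (placeholder cluster addresses).
hadoop distcp hdfs://nn1:8020/data hdfs://nn2:8020/data

# Snapshot an HDFS directory (snapshots must first be allowed by an admin).
hdfs dfsadmin -allowSnapshot /user/alice
hdfs dfs -createSnapshot /user/alice backup1

# Get and set ACLs on a directory.
hdfs dfs -getfacl /user/alice
hdfs dfs -setfacl -m user:bob:r-x /user/alice
```

These commands require a running cluster, so treat them as a reference for what the tasks look like rather than something to run locally.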

Troubleshoot - Demonstrate ability to find the root cause of a problem, optimize inefficient execution, and resolve resource contention scenarios

  • Resolve errors/warnings in Cloudera Manager

  • Resolve performance problems/errors in cluster operation

  • Determine reason for application failure

  • Configure the Fair Scheduler to resolve application delays

Our Approach

  • You will start by creating a Cloudera QuickStart VM (if you have a laptop with 16 GB RAM and a quad-core CPU). This will help you get comfortable with Cloudera Manager.

  • You will be able to sign up for GCP and avail up to $300 of credit while the offer lasts. Credits are valid for up to a year.

  • You will then get a brief overview of GCP and provision 7 to 8 virtual machines using templates. You will also attach external hard drives to configure for HDFS later.

  • Once the servers are provisioned, you will go ahead and set up Ansible for server automation.

  • You will set up a local repository for Cloudera Manager and the Cloudera Distribution of Hadoop using packages.

  • You will then set up Cloudera Manager with a custom database, and then the Cloudera Distribution of Hadoop using the wizard that comes as part of Cloudera Manager.

  • As part of setting up the Cloudera Distribution of Hadoop, you will set up HDFS, learn HDFS commands, set up YARN, configure HDFS and YARN high availability, understand schedulers, set up Spark, transition to parcels, set up Hive and Impala, and set up HBase and Kafka, etc.

  • Once all the services are configured, we will revise for the exam by mapping what we built to the exam's required skills.

CCA 159 – Data Analyst using Sqoop, Hive and Impala

Cloudera Certified Associate - Data Analyst using Technologies like Sqoop, Hive and Impala

Created by Durga Viswanatha Raju Gadiraju - Technology Adviser and Evangelist


Students: 3816, Price: $24.99


CCA 159 Data Analyst is one of the well-recognized Big Data certifications. This scenario-based certification exam demands in-depth knowledge of Hive and Sqoop as well as basic knowledge of Impala.

This comprehensive course covers all aspects of the certification with real world examples and data sets.

  • Overview of the Big Data ecosystem

  • HDFS Commands

  • Creating Tables in Hive

  • Loading/Inserting data into Hive tables

  • Overview of functions in Hive

  • Writing Basic Queries in Hive

  • Joining Data Sets and Set Operations in Hive

  • Windowing or Analytics Functions in Hive

  • Importing data from MySQL to HDFS

  • Performing Hive Import

  • Exporting Data from HDFS/Hive to MySQL

  • Submitting Sqoop Jobs and Incremental Imports

  • and more
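The Sqoop topics above center on a few command-line invocations. A hedged sketch of a basic import and an incremental import (connection details, table and column names are placeholders, not from the course):

```shell
# Import a MySQL table into HDFS (placeholder connection details).
sqoop import \
  --connect jdbc:mysql://dbhost/retail_db \
  --username dbuser -P \
  --table orders \
  --target-dir /user/hive/warehouse/orders

# Incremental import: only rows whose order_id exceeds the last saved value.
sqoop import \
  --connect jdbc:mysql://dbhost/retail_db \
  --username dbuser -P \
  --table orders \
  --target-dir /user/hive/warehouse/orders \
  --incremental append \
  --check-column order_id \
  --last-value 1000
```

Exports back to MySQL use `sqoop export` with a `--export-dir` pointing at the HDFS data, and saved Sqoop jobs can track `--last-value` automatically.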

Here are the objectives for the certification.

Provide Structure to the Data

Use Data Definition Language (DDL) statements to create or alter structures in the metastore for use by Hive and Impala.

  • Create tables using a variety of data types, delimiters, and file formats

  • Create new tables using existing tables to define the schema

  • Improve query performance by creating partitioned tables in the metastore

  • Alter tables to modify the existing schema

  • Create views in order to simplify queries
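The DDL objectives above map onto a handful of HiveQL statements. A hedged sketch (table and column names are illustrative, not taken from the exam):

```sql
-- Partitioned table with an explicit delimiter and file format.
CREATE TABLE orders (
  order_id INT,
  customer_id INT,
  amount DECIMAL(10, 2)
)
PARTITIONED BY (order_date STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- New table that copies an existing table's schema.
CREATE TABLE orders_archive LIKE orders;

-- Alter the existing schema.
ALTER TABLE orders ADD COLUMNS (status STRING);

-- View to simplify a common query.
CREATE VIEW big_orders AS
SELECT order_id, amount FROM orders WHERE amount > 100;
```

Partitioning on a column such as `order_date` is what gives the query-performance benefit the objective mentions, since queries that filter on the partition column only scan the matching partitions.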

Data Analysis

Use Query Language (QL) statements in Hive and Impala to analyze data on the cluster.

  • Prepare reports using SELECT commands including unions and subqueries

  • Calculate aggregate statistics, such as sums and averages, during a query

  • Create queries against multiple data sources by using join commands

  • Transform the output format of queries by using built-in functions

  • Perform queries across a group of rows using windowing functions
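The QL objectives above are standard SQL patterns, so they can be rehearsed without a cluster. A minimal sketch using Python's built-in sqlite3 (illustrative data; HiveQL and Impala use the same SELECT, GROUP BY and OVER constructs for these objectives):

```python
# Rehearsing the aggregate and windowing patterns with sqlite3; Hive and
# Impala use equivalent SELECT / GROUP BY / OVER syntax for these objectives.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (customer TEXT, amount REAL);
INSERT INTO orders VALUES
  ('alice', 30.0), ('alice', 70.0), ('bob', 50.0);
""")

# Aggregate statistics per group.
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
print(totals)  # [('alice', 100.0), ('bob', 50.0)]

# Windowing function: rank orders by amount within each customer.
ranked = conn.execute("""
    SELECT customer, amount,
           RANK() OVER (PARTITION BY customer ORDER BY amount DESC) AS rnk
    FROM orders
    ORDER BY customer, rnk
""").fetchall()
print(ranked[0])  # ('alice', 70.0, 1)
```

The main differences on the exam are Hive-specific details such as partitioned tables and file formats, not the query syntax itself.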

Exercises will be provided to prepare before attending the certification. The intention of the course is to boost your confidence for the certification.

All the demos are given on our state-of-the-art Big Data cluster. If you do not have a multi-node cluster, you can sign up for our labs and practice on our multi-node cluster.

CCA131 Cloudera CDH 5 & 6 Hadoop Administrator Master Course

Master Cloudera CDH Admin. Spin up cluster in AWS, Mess it, Fix it, Play it and Learn. Real time demo on CCA131 Topics.

Created by MUTHUKUMAR Subramanian - Best Selling Instructor, Big Data, Spark, Cloud, Java, AWS


Students: 3800, Price: $109.99


This course is designed for everyone from professionals with zero experience to already-skilled professionals looking to enhance their learning. Hands-on sessions cover the end-to-end setup of a Cloudera cluster. We will be using AWS EC2 instances to deploy the cluster.

COURSE UPDATED PERIODICALLY SINCE LAUNCH  (Cloudera 6)

What students are saying:

  • 5 stars, "Very clear and adept in delivering the content. Learnt a lot. He covers the material 360 degrees and keeps the students in minds." - Sasidhar Thiriveedhi

  • 5 stars, "This course is an absolute paradigm shift for me. This is really an amazing course, and you shouldn't miss if you are a novice/intermediate level in Cloudera Administration." - Santhosh G

  • 5 stars, "Great work by the instructor... highly recommended..." - Phani Raj

  • 5 stars, "It is really excellent course. A lot of learning materials." - Shaiukh Noor

  • 5 stars, "This course is help me a lot for my certification preparation. thank you!" - Muhammad Faridh Ronianto

The course is targeted at Software Engineers, System Analysts, Database Administrators, DevOps Engineers and System Administrators who want to learn about the Big Data ecosystem with Cloudera. Other IT professionals can also take this course, but might have to do some extra work to understand some of the advanced concepts.

With Cloudera being the market leader in the Big Data space, Cloudera Hadoop administration brings huge job opportunities in the Cloudera and Big Data domain. The course covers all the skills required for the CCA131 certification:

  • Install - Demonstrating the installation of Cloudera Manager, the Cloudera Distribution of Hadoop (CDH) and Hadoop ecosystem components

  • Configure - Basic to advanced configurations to setup Cloudera manager, Namenode High Availability (HA), Resource manager High Availability(HA)

  • Manage - Create and maintain day-to-day activities and operations in Cloudera Cluster like Cluster balancing, Alert setup, Rack topology management, Commissioning, Decommissioning hosts, YARN resource management with FIFO, Fair, Capacity Schedulers, Dynamic Resource Manager Configurations

  • Secure - Enabling relevant service and configuration to add security to meet the organisation goals with best practice. Configure extended Access Control List (ACL), Configure Sentry, Hue authorization and authentication with LDAP, HDFS encrypted zones

  • Test - Access file system commands via HTTPFS, Create, restore snapshot for HDFS directory, Get/Set extended ACL for a file or directory, Benchmark the cluster

  • Troubleshoot - Ability to find the cause of a problem, resolve it and optimize inefficient execution. Identify and filter out warnings, predict problems and apply the right solution. Configure dynamic resource pools for better optimized use of the cluster. Find the scalability bottleneck and size the cluster.

  • Planning - Sizing and identify the dependencies, hardware and software requirements.

Getting a real-time distributed environment with any number of machines at enterprise quality would be very costly. Thanks to the cloud, any user can create a distributed environment with minimal expenditure and pay only for what they use. AWS is largely technology-neutral, and other cloud providers such as Microsoft Azure, IBM Bluemix and Google Compute Engine work in a similar way.

+++++Content Added on Request++++

Dec ++ Cloudera 6 Overview and Quick Install

Nov ++ HDFS Redaction

Nov ++ Memory management - Heap Calculation for Roles and Namenode

Nov ++ IO Compression

Nov ++ Charts and Dashboard

Oct ++ File copy, distcp

Oct ++ Command files added for all the section.

Sep ++ Kafka Service Administration

Sep ++ Spark Service Administration

Aug ++ Cluster Benchmarking

CCA175 Exam Prep Qs pt B (With Spark 2.4 Hadoop Cluster VM)

Practice for CCA175 Test | Data Analysis Qs | Spark 2.4 Hadoop Cluster VM | Cloudera Spark & Hadoop Developer | Inc Data

Created by Matthew Barr - Data Scientist & Founder of Verulam Blue


Students: 3124, Price: $34.99


Prepare for the data analysis section of the CCA Spark & Hadoop Developer certification and pass the CCA175 exam on your first attempt.

Students enrolling on this course can be 100% confident that after working on the problems contained here they will be in a great position to pass the data analysis section of the CCA175 exam on their first attempt.

As the number of vacancies for big data, machine learning & data science roles continue to grow, so too will the demand for qualified individuals to fill those roles.

It’s often the case that to stand out from the crowd, it’s necessary to get certified.

This exam preparation series has been designed to help YOU pass the Cloudera certification CCA175. This is a hands-on, practical exam where the primary focus is on using Apache Spark to solve Big Data problems.

On solving the problems contained here you’ll have all the necessary skills & the confidence to handle any data analysis related questions that come your way in the exam.

(a) There are 30 problems in this part of the exam preparation series. All of which are directly related to the data analysis component of the CCA175 exam syllabus.

(b) Fully worked out solutions to all the problems.

(c) Also included is the Verulam Blue virtual machine, an environment with a Spark Hadoop cluster already installed so that you can practice working on the problems.

• The VM contains a Spark stack which allows you to read and write data to & from the Hadoop file system as well as to store metastore tables on the Hive metastore.

• All the datasets you need for the problems are already loaded onto HDFS, so you don’t have to do any extra work.

• The VM also has Apache Zeppelin installed with fully executed Zeppelin notebooks that contain solutions to the problems.

Cloudera Hadoop Administration

Installation, Configuration, Security, Evaluation and Upgrade of a Cloudera Hadoop Cluster. Cloudera Certified Hadoop Admin.

Created by Sijeesh Kunnotharamal - Hadoop administrator / DevOPS / Cloud Engineer


Students: 3071, Price: $19.99


This training provides you with proficiency in all the steps required to operate and sustain a Cloudera Hadoop Cluster, including Planning, Installation, Configuration, Active Directory Integration, Securing the Cluster using Kerberos, HDFS Access Control Lists, High Availability, Hadoop ecosystem components in detail, and Upgrading Cloudera Manager and CDH. This training will provide hands-on preparation for the real-world challenges faced by Hadoop Administrators. The course curriculum follows the Cloudera Hadoop distribution.

Cloudera CCA 175 Spark Developer Certification: Hadoop Based

Become a Master of Spark using Scala to Stage, Transform, and Store with Spark RDDs, DataFrames, and Apache Sqoop

Created by Dhruv Bais - Master Programmer and Machine Learning Robot


Students: 2886, Price: $89.99


Apache Spark is the single most revolutionizing phenomenon in Big Data Technologies. Spark turns infrastructure into a service, making provisioning hardware fast, simple, and reliable. Knowing this, many companies are transporting their big data analysis, staging, and storing needs to the Spark Framework. In this course, I will be preparing you for the CCA 175 Spark Developer Certification. This is the most popular and a very potent certificate in the Big Data realm.

In order for you to be able to get into this new realm of intense tech competition, you will need a course to guide your way in Spark. The problem is that most courses are not designed to help you learn by example (immersion is the most potent way of learning for humans). Rather, they bathe you with inapplicable information that you have to learn over and over again anyway.

This course is designed to cover the end-to-end implementation of the major components of Spark. I will be giving you hands-on experience and insight into how big data processing works and how it is applied in the real world. We will explore Spark RDDs, which are the most dynamic way of working with your data. They allow you to write powerful code in a matter of minutes and accomplish whatever tasks might be required of you. They, like DataFrames, leverage Spark's lazy evaluation and Directed Acyclic Graphs (DAGs) to give you 100x better functionality than MapReduce while writing less than a tenth of the code. You can execute all the Joins, Aggregations, Transformations and even Machine Learning you want on top of Spark RDDs. We will explore these in depth in the course, and I will equip you with all the tools necessary to do anything you want with your data.
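Two of the RDD patterns mentioned here, key-based aggregation and joins, can be sketched in plain Python (an analogy only, not the Spark API):

```python
# Plain-Python sketch (not the Spark API) of two pair-RDD patterns:
# reduceByKey-style aggregation and an inner join on keys.
from collections import defaultdict

def reduce_by_key(pairs, fn):
    acc = {}
    for k, v in pairs:
        acc[k] = fn(acc[k], v) if k in acc else v
    return acc

def join(left, right):
    # Inner join: emit (key, (left_value, right_value)) for matching keys.
    right_groups = defaultdict(list)
    for k, v in right:
        right_groups[k].append(v)
    return [(k, (lv, rv)) for k, lv in left for rv in right_groups.get(k, [])]

sales = [("fr", 10), ("us", 20), ("fr", 5)]
names = [("fr", "France"), ("us", "United States")]

print(reduce_by_key(sales, lambda a, b: a + b))  # {'fr': 15, 'us': 20}
print(join(sales, names)[0])  # ('fr', (10, 'France'))
```

In Spark the same operations are distributed across partitions, which is why key-based operations like these trigger a shuffle.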

I have made sure that this journey becomes a fun and learning experience for you as the student. I have structured this course so that you can learn step by step how Spark works and you can do the activities that I do in the course yourself. As you do these activities, you will become a master of Spark and complete any exercise asked of you on the CCA 175 certification exam.

There is no risk for you as a student in this course. I have put together a course that is not only worth your money, but also worth your time. I urge you to join me on this journey to learn how to dominate the IT world with the one of the most popular Big Data Processing Frameworks: Apache Spark.

CCA175 Practice Tests (With Spark 2.4 Hadoop Cluster VM)

5 CCA175 Practice Exams | Spark Hadoop Cluster VM | Realistic Exam Qs| Cloudera Spark Hadoop Developer | Fully Solved Qs

Created by Matthew Barr - Data Scientist & Founder of Verulam Blue


Students: 2314, Price: $39.99


5 fully solved practice tests to help you prepare for the CCA Spark & Hadoop Developer certification &  pass the CCA175 exam on your first attempt.

Students enrolling on this course can be 100% confident that after working on the test questions contained here they will be in a great position to pass the CCA175 exam on their first attempt.

As the number of vacancies for big data, machine learning & data science roles continue to grow, so too will the demand for qualified individuals to fill those roles.

It’s often the case that to stand out from the crowd, it’s necessary to get certified.

This exam preparation series has been designed to help YOU pass the Cloudera certification CCA175. This is a hands-on, practical exam where the primary focus is on using Apache Spark to solve Big Data problems.

On solving the questions contained here you’ll have all the necessary skills & the confidence to handle any questions that come your way in the exam.

(a) There are 5 practice tests contained in this course. All of the questions are directly related to the CCA175 exam syllabus.

(b) Fully worked out solutions to all the problems.

(c) Also included is the Verulam Blue virtual machine, an environment with a Spark Hadoop cluster already installed so that you can practice working on the problems.

• The VM contains a Spark stack which allows you to read and write data to & from the Hadoop file system as well as to store metastore tables on the Hive metastore.

• All the datasets you need for the problems are already loaded onto HDFS, so you don’t have to do any extra work.

• The VM also has Apache Zeppelin installed with fully executed Zeppelin notebooks that contain solutions to all the questions.

Students will get hands-on experience working in a Spark Hadoop environment as they practice:

• Converting a set of data values in a given format stored in HDFS into new data values or a new data format and writing them into HDFS.

• Loading data from HDFS for use in Spark applications & writing the results back into HDFS using Spark.

•  Reading and writing files in a variety of file formats.

• Performing standard extract, transform, load (ETL) processes on data using the Spark API.

• Using metastore tables as an input source or an output sink for Spark applications.

• Applying the understanding of the fundamentals of querying datasets in Spark.

• Filtering data using Spark.

• Writing queries that calculate aggregate statistics.

• Joining disparate datasets using Spark.

• Producing ranked or sorted data.

Cloudera Hadoop | Big Data | Authentication With Kerberos

Hadoop Administrator | Cloudera | Cloudera Hadoop Secure Cluster | Kerberos Authentication | MIT Kerberos

Created by Imran Chaush - Hadoop Administrator


Students: 2273, Price: $19.99


Cloudera Hadoop | Big Data | Secure Cloudera Manager With Kerberos Authentication

You will Learn in This course.

1:- Hadoop 2 Prerequisites.

2:- Cloudera Manager Deployment.

3:- Add New Node To Cloudera Cluster.

4:- Kerberos Authentication Steps.

5:- Secure Cloudera Cluster

I demonstrate the Hadoop 2 prerequisites and the Cloudera Manager installation; after installation, I enable Kerberos authentication on Cloudera Manager and run a job on the cluster to check whether Kerberos is working. I also show how to create an EC2 instance, create an image of an EC2 instance, and use spot and on-demand instances. If you want to secure your Hadoop environment, you will learn how in this course.
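Once a cluster is Kerberized, day-to-day checks use the standard MIT Kerberos client workflow. A minimal sketch with a placeholder principal and realm:

```shell
# Obtain a ticket-granting ticket for a (placeholder) principal.
kinit hdfs-user@EXAMPLE.COM

# List cached tickets to confirm authentication succeeded.
klist

# Destroy the ticket cache when finished.
kdestroy
```

On a secured cluster, HDFS and YARN commands fail with authentication errors until a valid ticket is present, which is how you verify Kerberos is actually enforced.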

CCA175 Exam Prep Qs pt A (With Spark 2.4 Hadoop Cluster VM)

Practice for CCA175 Test | ETL Qs | Spark 2.4 Hadoop Cluster VM | Cloudera Spark & Hadoop Developer | Includes Data

Created by Matthew Barr - Data Scientist & Founder of Verulam Blue


Students: 1465, Price: $34.99


Prepare for the transform, stage and store section of the CCA Spark & Hadoop Developer certification and pass the CCA175 exam on your first attempt.

Students enrolling on this course can be confident that, after working through the problems here, they will be in a great position to pass the transform, stage and store section of the CCA175 exam on their first attempt.

As the number of vacancies for big data, machine learning and data science roles continues to grow, so too will the demand for qualified individuals to fill them.

It’s often the case that to stand out from the crowd, it’s necessary to get certified.

This exam preparation series has been designed to help YOU pass the Cloudera CCA175 certification: a hands-on, practical exam whose primary focus is on using Apache Spark to solve big data problems.

On solving the problems contained here you’ll have all the necessary skills & the confidence to handle any transform, stage & store related questions that come your way in the exam.

(a) There are 30 problems in this part of the exam preparation series, all of which are directly related to the transform, stage & store component of the CCA175 exam syllabus.

(b) Fully worked out solutions to all the problems.

(c) Also included is the Verulam Blue virtual machine, an environment with a Spark Hadoop cluster already installed, so that you can practice working on the problems.

• The VM contains a Spark stack which allows you to read and write data to & from the Hadoop file system as well as to store metastore tables on the Hive metastore.

• All the datasets you need for the problems are already loaded onto HDFS, so you don’t have to do any extra work.

• The VM also has Apache Zeppelin installed with fully executed Zeppelin notebooks that contain solutions to the problems.

Real World Hadoop – Automating Hadoop install with Python!

Deploy a Hadoop cluster (Zookeeper, HDFS, YARN, Spark) with Cloudera Manager's Python API. Hands on.

Created by Toyin Akin - Big Data Engineer, Capital Markets FinTech Developer


Students: 402, Price: $89.99


Note: This course is built on top of the "Real World Vagrant - Automate a Cloudera Manager Build - Toyin Akin" course

Deploy a Hadoop cluster (Zookeeper, HDFS, YARN, Spark) with Python! Instruct Cloudera Manager to do the work! Hands on. Here we use Python to instruct an already installed Cloudera Manager to deploy your Hadoop Services.

The Cloudera Manager API provides configuration and service lifecycle management, service health information and metrics, and allows you to configure Cloudera Manager itself. The API is served on the same host and port as the Cloudera Manager Admin Console, and does not require an extra process or extra configuration. The API supports HTTP Basic Authentication, accepting the same users and credentials as the Cloudera Manager Admin Console.

.

Here are some of the cool things you can do with Cloudera Manager via the API:

• Deploy an entire Hadoop cluster programmatically. Cloudera Manager supports HDFS, MapReduce, YARN, ZooKeeper, HBase, Hive, Oozie, Hue, Flume, Impala, Solr, Sqoop, Spark and Accumulo.

• Configure various Hadoop services and get config validation.

• Take admin actions on services and roles, such as start, stop, restart, failover, etc. Also available are the more advanced workflows, such as setting up high availability and decommissioning.

• Monitor your services and hosts, with intelligent service health checks and metrics.

• Monitor user jobs and other cluster activities.

• Retrieve time-series metric data.

• Search for events in the Hadoop system.

• Administer Cloudera Manager itself.

• Download the entire deployment description of your Hadoop cluster in a JSON file.

Additionally, with the appropriate licenses, the API lets you:

• Perform rolling restart and rolling upgrade.

• Audit user activities and accesses in Hadoop.

• Perform backup and cross-data-center replication for HDFS and Hive.

• Retrieve per-user HDFS usage reports and per-user MapReduce resource usage reports.
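Under the hood these are plain REST calls over the same host and port as the Admin Console, secured with HTTP Basic Authentication, which the Cloudera Manager Python client wraps for you. As a rough sketch using only the standard library (the hostname, credentials and the `v19` version string below are placeholders; a real client would first ask `GET /api/version` for the highest supported version):

```python
import base64
import urllib.request

CM_HOST = "cm.example.com"  # assumption: your Cloudera Manager host
CM_PORT = 7180              # default Cloudera Manager Admin Console port
API_VERSION = "v19"         # placeholder; query GET /api/version for the real one

def cm_request(path, user="admin", password="admin"):
    """Build an HTTP Basic Auth request for a Cloudera Manager API path."""
    url = "http://%s:%d/api/%s%s" % (CM_HOST, CM_PORT, API_VERSION, path)
    req = urllib.request.Request(url)
    token = base64.b64encode(("%s:%s" % (user, password)).encode()).decode()
    req.add_header("Authorization", "Basic " + token)
    return req

# Example: a request that would list clusters (actually sending it
# requires a live Cloudera Manager server, which is not assumed here).
req = cm_request("/clusters")
print(req.full_url)  # http://cm.example.com:7180/api/v19/clusters
```

The course itself uses the official Python client rather than raw HTTP; the sketch is only meant to show what that client is doing on your behalf.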

.

Here is a curriculum reflecting the current state of my Cloudera courses.

My Hadoop courses are based on Vagrant so that you can practice and destroy your virtual environment before applying the installation onto real servers/VMs.

.

For those with little or no knowledge of the Hadoop ecosystem
Udemy course : Big Data Intro for IT Administrators, Devs and Consultants

.

I would first practice with Vagrant so that you can carve out a virtual environment on your local desktop. You don't want to corrupt your physical servers if you do not understand the steps or make a mistake.
Udemy course : Real World Vagrant For Distributed Computing

.

I would then, on the virtual servers, deploy Cloudera Manager plus agents. Agents sit on all the slave nodes, ready to deploy your Hadoop services.
Udemy course : Real World Vagrant - Automate a Cloudera Manager Build

.

Then deploy the Hadoop services across your cluster (via the installed Cloudera Manager in the previous step). We look at the logic regarding the placement of master and slave services.
Udemy course : Real World Hadoop - Deploying Hadoop with Cloudera Manager

.

If you want to play around with HDFS commands (Hands on distributed file manipulation).
Udemy course : Real World Hadoop - Hands on Enterprise Distributed Storage.

.

You can also automate the deployment of the Hadoop services via Python (using the Cloudera Manager Python API). But this is an advanced step and thus I would make sure that you understand how to manually deploy the Hadoop services first.
Udemy course : Real World Hadoop - Automating Hadoop install with Python!

.

There is also the upgrade step. Once you have a running cluster, how do you upgrade to a newer Hadoop cluster (both for Cloudera Manager and the Hadoop services)?
Udemy course : Real World Hadoop - Upgrade Cloudera and Hadoop hands on

Real World Vagrant – Automate a Cloudera Manager Build

Build a Distributed Cluster of Cloudera Manager and any number of Cloudera Manager Agent nodes with a single command!

Created by Toyin Akin - Big Data Engineer, Capital Markets FinTech Developer


Students: 382, Price: $89.99


Note: This course is built on top of the "Real World Vagrant For Distributed Computing - Toyin Akin" course

"NoSQL", "Big Data", "DevOps" and "In Memory Database" technologies are hot and highly valuable skills to have, and this course will teach you how to quickly create a distributed environment on which to deploy them.

A combination of VirtualBox and Vagrant will transform your desktop machine into a virtual cluster. However, this needs to be configured correctly: simply enabling multinode within Vagrant is not good enough; it needs to be tuned. Developers and operators within large enterprises, including investment banks, all use Vagrant to simulate production environments.

After all, if you are developing against or operating a distributed environment, it needs to be tested, both in terms of the code deployed and the deployment code itself.

You'll learn the same techniques these enterprise guys use on your own Microsoft Windows computer/laptop.

Vagrant provides easy to configure, reproducible, and portable work environments built on top of industry-standard technology, controlled by a single consistent workflow to help maximize the productivity and flexibility of you and your team.

This course will use VirtualBox to carve out your virtual environment. However, the same skills learned with Vagrant can be used to provision virtual machines on VMware, AWS, or any other provider.

If you are a developer, this course will help you isolate dependencies and their configuration within a single disposable, consistent environment, without sacrificing any of the tools you are used to working with (editors, browsers, debuggers, etc.). Once you or someone else creates a single Vagrantfile, you just need to vagrant up and everything is installed and configured for you to work. Other members of your team create their development environments from the same configuration. Say goodbye to "works on my machine" bugs.

If you are an operations engineer, this course will help you build a disposable environment and consistent workflow for developing and testing infrastructure management scripts. You can quickly test your deployment scripts and more using local virtualization such as VirtualBox or VMware (VirtualBox for this course). Ditch your custom scripts to recycle EC2 instances, stop juggling SSH prompts to various machines, and start using Vagrant to bring sanity to your life.

If you are a designer, this course will help you with distributed installation of software so you can focus on doing what you do best: design. Once a developer configures Vagrant, you do not need to worry about how to get that software running ever again. No more bothering other developers to help you fix your environment so you can test designs. Just check out the code, vagrant up, and start designing.


CCA175 Spark & Hadoop Developer Exam Practice Sets

Practice sets to boost your confidence and get ready for Cloudera CCA175 certification!

Created by Samir Pal - Sr. Big Data Engineer


Students: 67, Price: $19.99


Note: This course is personally designed from my own experience and has no affiliation with any person or organization. This practice course and its videos, documents and other associated content have been produced by myself. This is unofficial content and has no relation with Cloudera or anyone else. The content in these practice sets is solely meant to help students build confidence for the real Cloudera CCA175 Spark & Hadoop Developer Exam.

Course Description:

This course is designed for Cloudera CCA175 Spark & Hadoop Developer exam practice. To work through these sets, all you need is a Cloudera QuickStart VM with Spark 2.4 or higher installed. The course contains scenario-based problems of the kind you can expect in the real exam, and each set runs against the clock. At the beginning of each set you're given instructions on how to prepare the input data for all the problems you'll be solving, along with expected sample output wherever possible. All problems are explained with step-by-step solutions, and you're guided on how to validate the results to make sure your answer covers the output requirements. Each practice set is thoughtfully designed to cover what you might encounter in the real exam, so this course will boost your confidence and help you sit the exam without fear. Every practice set covers all of the topics below from the latest CCA175 exam curriculum.

Transform, Stage, and Store

Convert a set of data values in a given format stored in HDFS into new data values or a new data format and write them into HDFS.

  • Load data from HDFS for use in Spark applications

  • Write the results back into HDFS using Spark

  • Read and write files in a variety of file formats

  • Perform standard extract, transform, load (ETL) processes on data using the Spark API

Data Analysis

Use Spark SQL to interact with the metastore programmatically. Generate reports by using queries against loaded data.

  • Use metastore tables as an input source or an output sink for Spark applications

  • Understand the fundamentals of querying datasets in Spark

  • Filter data using Spark

  • Write queries that calculate aggregate statistics

  • Join disparate datasets using Spark

  • Produce ranked or sorted data
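Since Spark SQL accepts standard SQL, the aggregate, join and ranking queries in this syllabus can be rehearsed against any SQL engine before you touch a cluster. The sketch below uses Python's built-in sqlite3 with made-up tables and data; in the exam environment you would run essentially the same query text via spark.sql(...) against metastore tables.

```python
import sqlite3

# In-memory stand-ins for metastore tables (names and data are illustrative).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INT, cust_id INT, amount REAL);
    CREATE TABLE customers (cust_id INT, name TEXT);
    INSERT INTO orders VALUES (1, 10, 50.0), (2, 10, 30.0), (3, 20, 70.0);
    INSERT INTO customers VALUES (10, 'Ada'), (20, 'Lin');
""")

# Aggregate + join + sort in one query: total spend per customer, highest first.
rows = conn.execute("""
    SELECT c.name, SUM(o.amount) AS total
    FROM orders o JOIN customers c ON o.cust_id = c.cust_id
    GROUP BY c.name
    ORDER BY total DESC
""").fetchall()
print(rows)  # [('Ada', 80.0), ('Lin', 70.0)]
```

The GROUP BY, JOIN and ORDER BY clauses map one-to-one onto the aggregate-statistics, disparate-dataset-join and ranked-data bullets above.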