Hadoop Administration

Topics Covered

Big Data

  • What is Big Data?
  • Where does Big Data come from, and what are its use cases?
  • What are the 3 V’s of Big Data?
  • What are the challenges in Big Data Storage & Access?

Hadoop

  • What is Hadoop?
  • What is the history of Hadoop?
  • What are Hadoop distributions?
  • What are Hadoop components?
  • Hadoop Architecture

HDFS

  • Understanding Hadoop Distributed File System (HDFS)
  • HDFS Features
  • HDFS Design Assumptions
  • File Systems supported by Hadoop
  • How is a file stored on HDFS?
  • How is metadata maintained in Hadoop?
  • Checkpointing Mechanism
  • Metadata Memory Allocation
  • Communication between NameNode and DataNode
  • Anatomy of a File Write into HDFS
  • Anatomy of a File Read from HDFS
  • Hadoop Replication
  • HDFS Block Replication Strategy
  • Dealing with Data Corruption
  • HDFS Rebalancing & Space Reclamation
  • Compression Formats supported by Hadoop
  • Lab:
    • Understanding Hadoop Installation Prerequisites
    • Hadoop 2.X installation from scratch
    • Understanding the VERSION file, FsImage, and EditLog
    • Hadoop Admin Commands (FSCK & Block Scanner Report)
    • HDFS Replication (by XML file, by Host, by individual file)
    • Increase & Decrease Replication
    • Hadoop Rack Awareness
    • Default Hadoop Settings
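
To make the block storage and replication topics above concrete, here is a minimal sketch of the arithmetic involved. It assumes the Hadoop 2.X defaults of a 128 MB block size (dfs.blocksize) and a replication factor of 3 (dfs.replication); the function name hdfs_footprint is hypothetical and is not part of any Hadoop API.

```python
import math

# Assumed defaults for Hadoop 2.X; a real cluster may override both.
BLOCK_SIZE_MB = 128   # dfs.blocksize
REPLICATION = 3       # dfs.replication


def hdfs_footprint(file_size_mb, block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Return (block_count, total_replicas, raw_storage_mb) for one file.

    The last block only occupies as much space as the data it holds,
    so raw storage is the file size times the replication factor.
    """
    blocks = math.ceil(file_size_mb / block_size_mb) if file_size_mb > 0 else 0
    total_replicas = blocks * replication
    raw_storage_mb = file_size_mb * replication
    return blocks, total_replicas, raw_storage_mb


if __name__ == "__main__":
    # A 1 GB file and a 200 MB file under the assumed defaults.
    print(hdfs_footprint(1024))
    print(hdfs_footprint(200))
```

Changing the replication argument shows how raising or lowering replication changes the raw disk space a file consumes across the cluster.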

Hadoop Ecosystem Tools

  • Introduction to Sqoop
  • Introduction to Pig
  • Introduction to Hive
  • Introduction to HBase
  • Introduction to Oozie
  • Introduction to other important ecosystem tools

Real Time Concepts

  • Day-to-day admin activities on real clusters
  • Demonstration on a Hortonworks cluster
  • Frequently occurring issues on real clusters
  • Roles and Responsibilities
  • Building a resume for a Hadoop admin with 2+ years of experience
  • Real-world interview questions and answers

Map Reduce

  • Map Reduce Introduction
  • How does Map Reduce work?
  • Communication between JobTracker and TaskTracker
  • Anatomy of a Map Reduce Job Submission
  • Hadoop Schedulers
    • FIFO Scheduler
    • Fair Scheduler
    • Capacity Scheduler
  • Lab:
    • Setting up Mappers & Reducers
    • Setting up Fair Scheduler
    • Setting up Capacity Scheduler
    • Setting up topology
    • Setting up Logs and Logging mechanism

Install and Configure Cloudera or Hortonworks

  • Understand the minimum hardware and software requirements
  • Understand the Cloudera Manager/Ambari Architecture
  • Understand how to install CDH using Cloudera Manager or HDP using Ambari
  • Understand differences between master and slave services
  • Understand complete deployment layout
  • Understand how to configure and manage different services
  • Understand different configuration parameters

Monitoring and Administering Clusters

  • Monitor using the Cloudera Manager (CM) UI
  • Monitor using the Ambari UI
  • Monitor using the MapR Control System (MCS) UI
  • Commission and decommission nodes
  • Back up and recover Hadoop data
  • Use Hadoop snapshots (hands-on)
  • Understand rack awareness and topology
  • Understand NameNode high availability
  • Understand ResourceManager high availability
  • Use the “hdfs haadmin” commands
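
As an illustration of the rack awareness and topology topic above, the following is a minimal sketch of a topology script. Hadoop can call an executable named in the net.topology.script.file.name property, passing host names or IP addresses as arguments and reading back one rack path per argument. The host-to-rack table and the resolve helper below are hypothetical; a real cluster would load its own mapping.

```python
#!/usr/bin/env python3
import sys

# Rack assigned to any host that is missing from the table.
DEFAULT_RACK = "/default-rack"

# Hypothetical host-to-rack table, for illustration only.
RACKS = {
    "10.0.1.11": "/dc1/rack1",
    "10.0.1.12": "/dc1/rack1",
    "10.0.2.21": "/dc1/rack2",
}


def resolve(hosts, table=RACKS):
    """Map each host or IP to its rack path, preserving argument order."""
    return [table.get(host, DEFAULT_RACK) for host in hosts]


if __name__ == "__main__":
    # Print one rack path per argument, in the same order as the arguments.
    print("\n".join(resolve(sys.argv[1:])))
```

Running the script by hand with a few addresses is a quick way to check the mapping before pointing the cluster configuration at it.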

Hadoop 2.X

  • Hadoop 2.X Architecture
  • Difference between Hadoop 1.X and Hadoop 2.X
  • Understand the architecture of YARN
  • Understand the components of the YARN ResourceManager
  • Demonstrate the relationship between NodeManagers and ApplicationMasters
  • Demonstrate the relationship between ResourceManagers and ApplicationMasters
  • Explain the relationship between Containers and ApplicationMasters
  • Job Flow in YARN
  • NameNode High Availability
    • Using Shared Edits
    • Using ZooKeeper Quorum
  • Lab:
    • NameNode High Availability using Shared Edits
    • NameNode High Availability using Journal Nodes
    • ResourceManager High Availability

Cluster Planning

  • Understanding Hardware Components
    • Master Hardware
    • Slave Hardware
    • CPU
    • I/O
    • Network
  • Plan your cluster growth
  • Managing Users & Groups
  • Cluster sharing across multiple use cases