Hadoop Administration
Topics Covered
Big Data
- What is Big Data?
- Where does Big Data come from, and what are its use cases?
- What are the 3 V’s of Big Data?
- What are the challenges in Big Data Storage & Access?
Hadoop
- What is Hadoop?
- What is the history of Hadoop?
- What are Hadoop distributions?
- What are Hadoop components?
- Hadoop Architecture
HDFS
- Understanding Hadoop Distributed File System (HDFS)
- HDFS Features
- HDFS Design Assumptions
- File Systems supported by Hadoop
- How is a file stored on HDFS?
- How is metadata maintained in Hadoop?
- Checkpointing Mechanism
- Metadata Memory Allocation
- Communication between NameNode and DataNode
- Anatomy of a File Write into HDFS
- Anatomy of a File Read from HDFS
- Hadoop Replication
- HDFS Block Replication Strategy
- How to deal with Data Corruption?
- HDFS Rebalancing & Space Reclamation
- Compression Formats supported by Hadoop
- Lab:
- Understanding Hadoop Installation Prerequisites
- Hadoop 2.X installation from scratch
- Understanding the VERSION, FSImage and EditLog files
- Hadoop Admin Commands (fsck & Block Scanner Report)
- HDFS Replication (by XML file, by Host, by individual file)
- Increase & Decrease Replication
- Hadoop Rack Awareness
- Default Hadoop Settings
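The HDFS storage topics above (how a file is stored, replication, metadata memory allocation) come down to simple arithmetic. The Python sketch below makes three assumptions: the Hadoop 2.X defaults of a 128 MB block size (`dfs.blocksize`), a replication factor of 3 (`dfs.replication`), and the commonly quoted rule of thumb of roughly 150 bytes of NameNode heap per file or block object. The function name `hdfs_footprint` is hypothetical, and the results are estimates rather than measurements.

```python
import math

# Assumed Hadoop 2.X defaults: dfs.blocksize = 128 MB, dfs.replication = 3.
BLOCK_SIZE_MB = 128
REPLICATION = 3
# Rule of thumb (an approximation, not an exact figure): each file, directory
# or block object costs roughly 150 bytes of NameNode heap.
BYTES_PER_OBJECT = 150


def hdfs_footprint(file_size_mb: float,
                   block_size_mb: int = BLOCK_SIZE_MB,
                   replication: int = REPLICATION) -> dict:
    """Estimate block count, raw disk usage and NameNode heap for one HDFS file."""
    # The file is cut into fixed-size blocks; an empty file has no blocks.
    blocks = math.ceil(file_size_mb / block_size_mb)
    # The last block only occupies the remaining bytes on disk, so raw usage
    # is the file size times the number of replicas.
    raw_mb = file_size_mb * replication
    # One namespace object for the file itself plus one per block.
    namenode_bytes = (1 + blocks) * BYTES_PER_OBJECT
    return {
        "blocks": blocks,
        "block_replicas": blocks * replication,
        "raw_mb": raw_mb,
        "namenode_bytes": namenode_bytes,
    }


if __name__ == "__main__":
    print(hdfs_footprint(1000))
```

For example, `hdfs_footprint(1000)` gives 8 blocks, 24 block replicas, 3000 MB of raw disk and about 1350 bytes of NameNode heap. The same 1000 MB stored as 1000 files of 1 MB each needs 1000 blocks and about 300,000 bytes of heap, which is why many small files strain the NameNode.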
Hadoop Ecosystem Tools
- Introduction to Sqoop
- Introduction to Pig
- Introduction to Hive
- Introduction to HBase
- Introduction to Oozie
- Introduction to other important ecosystem tools
Real-Time Concepts
- Day-to-day real-time Admin Activities
- Demonstration from Hortonworks cluster
- Frequently occurring issues on real-time clusters
- Roles and Responsibilities
- Building a resume for a Hadoop admin with 2+ years of experience
- Real-time interview questions and answers
MapReduce
- Introduction to MapReduce
- How does MapReduce work?
- Communication between JobTracker and TaskTracker
- Anatomy of a MapReduce Job Submission
- Hadoop Schedulers
- FIFO Scheduler
- Fair Scheduler
- Capacity Scheduler
- Lab:
- Setting up Mappers & Reducers
- Setting up Fair Scheduler
- Setting up Capacity Scheduler
- Setting up topology
- Setting up logs and the logging mechanism
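MapReduce can be illustrated without a cluster. The plain-Python sketch below simulates the three stages of the classic word-count job (map, shuffle/sort, reduce) in a single process. It does not use the Hadoop API, and the function names are hypothetical. On a real cluster the mapper and reducer would be Java `Mapper`/`Reducer` classes or Hadoop Streaming scripts, and the framework would perform the shuffle across the network.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input record."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle/sort: group values by key, as the framework does between map and reduce."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    # Reducers receive keys in sorted order.
    return dict(sorted(grouped.items()))


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reducer: sum the counts for one word."""
    return key, sum(values)


def word_count(lines: list[str]) -> dict[str, int]:
    # Each line plays the role of one record from an input split.
    mapped = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    print(word_count(["the quick fox", "the lazy dog", "The fox"]))
    # prints {'dog': 1, 'fox': 2, 'lazy': 1, 'quick': 1, 'the': 3}
```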
Install and Configure Cloudera or Hortonworks
- Understand the minimum hardware and software requirements
- Understand the Cloudera Manager/Ambari Architecture
- Understand how to install CDH using Cloudera Manager, or HDP using Ambari
- Understand the differences between master and slave services
- Understand the complete deployment layout
- Understand how to configure and manage different services
- Understand different configuration parameters
Monitoring and Administering Clusters
- Monitor using the Cloudera Manager (CM) UI
- Monitor using the Ambari UI
- Monitor using the MapR Control System (MCS) UI
- Commissioning and decommissioning nodes
- Back up and recover Hadoop data
- Use Hadoop snapshots (hands-on)
- Understand rack awareness and topology
- Understand NameNode high availability
- Understand ResourceManager high availability
- Use the “hdfs haadmin” commands
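Rack awareness and topology are usually configured by pointing `net.topology.script.file.name` in core-site.xml at an executable script. Hadoop invokes the script with one or more host names or IP addresses as arguments and expects one rack path per argument in return. The Python sketch below uses a hypothetical, hard-coded IP-to-rack table. A real deployment would generate the mapping from its own inventory.

```python
#!/usr/bin/env python3
"""Sketch of a Hadoop rack-topology script (hypothetical mapping).

Hadoop runs the script named by net.topology.script.file.name, passing host
names or IP addresses as arguments, and reads back one rack path per argument.
"""
import sys

# Hypothetical host-to-rack mapping; replace with your own inventory data.
RACKS = {
    "10.0.1.11": "/dc1/rack1",
    "10.0.1.12": "/dc1/rack1",
    "10.0.2.21": "/dc1/rack2",
    "10.0.2.22": "/dc1/rack2",
}
# Rack Hadoop uses when a node's location is unknown.
DEFAULT_RACK = "/default-rack"


def resolve(hosts: list[str]) -> list[str]:
    """Return one rack path per host, falling back to the default rack."""
    return [RACKS.get(host, DEFAULT_RACK) for host in hosts]


if __name__ == "__main__":
    # Print rack paths space-separated, in the same order as the arguments.
    print(" ".join(resolve(sys.argv[1:])))
```

For example, running the script (saved under a hypothetical name such as `topology.py`) as `python3 topology.py 10.0.1.11 10.0.2.22 10.9.9.9` would print `/dc1/rack1 /dc1/rack2 /default-rack`.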
Hadoop 2.X
- Hadoop 2.X Architecture
- Differences between Hadoop 1.X and Hadoop 2.X
- Understand the architecture of YARN
- Understand the components of the YARN ResourceManager
- Demonstrate the relationship between NodeManagers and ApplicationMasters
- Demonstrate the relationship between the ResourceManager and ApplicationMasters
- Explain the relationship between Containers and ApplicationMasters
- Job Flow in YARN
- NameNode High Availability
- Using Shared Edits
- Using ZooKeeper Quorum
- Lab:
- NameNode High Availability using Shared Edits
- NameNode High Availability using Journal Nodes
- ResourceManager High Availability
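The relationship between NodeManagers, ApplicationMasters and Containers comes down to memory arithmetic. Each NodeManager advertises a fixed amount of memory, and the ResourceManager carves it into containers for ApplicationMasters and their tasks.

The Python sketch below makes several assumptions:
- illustrative values for `yarn.nodemanager.resource.memory-mb`, `yarn.scheduler.minimum-allocation-mb` and `yarn.scheduler.maximum-allocation-mb`;
- memory only (vcores are ignored);
- requests are rounded up to a multiple of the minimum allocation, as the CapacityScheduler's default resource calculator does.

The function names are hypothetical.

```python
import math

# Illustrative yarn-site.xml values (assumptions, not recommendations).
NODE_MEMORY_MB = 49152   # yarn.nodemanager.resource.memory-mb
MIN_ALLOC_MB = 1024      # yarn.scheduler.minimum-allocation-mb
MAX_ALLOC_MB = 8192      # yarn.scheduler.maximum-allocation-mb


def normalize_request(request_mb: int,
                      min_mb: int = MIN_ALLOC_MB,
                      max_mb: int = MAX_ALLOC_MB) -> int:
    """Round a container memory request up to a multiple of the minimum allocation."""
    if request_mb > max_mb:
        # The ResourceManager rejects requests above the maximum allocation.
        raise ValueError(f"request of {request_mb} MB exceeds maximum {max_mb} MB")
    return max(min_mb, math.ceil(request_mb / min_mb) * min_mb)


def containers_per_node(request_mb: int, node_mb: int = NODE_MEMORY_MB) -> int:
    """How many containers of this size fit on one NodeManager (memory only)."""
    return node_mb // normalize_request(request_mb)


if __name__ == "__main__":
    print(normalize_request(1500), containers_per_node(1500))
```

With these settings, a 1500 MB request is rounded up to 2048 MB, so at most 24 such containers fit on one NodeManager.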
Cluster Planning
- Understanding Hardware Components
- Master Hardware
- Slave Hardware
- CPU
- I/O
- Network
- Plan your cluster growth
- Managing Users & Groups
- Cluster sharing across multiple use cases
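Planning cluster growth is mostly capacity arithmetic. The Python sketch below estimates how many slave nodes are needed as data grows. The 25% allowance for intermediate data, the 75% maximum disk fill and the compound yearly growth model are planning assumptions chosen for illustration, not Hadoop defaults. The function names are hypothetical.

```python
import math


def raw_storage_tb(data_tb: float, replication: int = 3,
                   temp_overhead: float = 0.25, max_fill: float = 0.75) -> float:
    """Raw disk needed to hold data_tb of data in HDFS.

    temp_overhead: assumed extra space for intermediate/shuffle output.
    max_fill: assumed fraction of disk to fill before adding nodes.
    """
    return data_tb * replication * (1 + temp_overhead) / max_fill


def nodes_needed(data_tb: float, disk_tb_per_node: float, **kwargs) -> int:
    """Number of slave nodes required, rounded up."""
    return math.ceil(raw_storage_tb(data_tb, **kwargs) / disk_tb_per_node)


def growth_plan(initial_tb: float, yearly_growth: float, years: int,
                disk_tb_per_node: float) -> list[int]:
    """Nodes required at the end of each year, assuming compound data growth."""
    return [nodes_needed(initial_tb * (1 + yearly_growth) ** year, disk_tb_per_node)
            for year in range(1, years + 1)]


if __name__ == "__main__":
    print(growth_plan(100, 0.5, 3, 48))
```

For example, `growth_plan(100, 0.5, 3, 48)` (100 TB today, 50% yearly growth, 48 TB of disk per node) returns `[16, 24, 36]` nodes for years one to three.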
Copyright © 2024 IngeniousFusionTek | All Rights Reserved