Big Data & Hadoop

About Course

Big Data & Hadoop training will make you an expert in HDFS, MapReduce, HBase, Hive, Pig, YARN, Flume, and Sqoop.

Hadoop is an open-source software framework for storing data and running applications on clusters of commodity hardware. It provides massive storage for any kind of data, enormous processing power and the ability to handle virtually limitless concurrent tasks or jobs.

Fee: INR 14,500/-

Duration: 45 hours

Mode of Training: Online

Upcoming Batch:

  • Online Batch: 16th August, 7:00 PM – 10:00 PM IST (Mon-Fri)

PREREQUISITES:

A system with Linux (Ubuntu) preinstalled, either natively or in a virtual machine, with at least 4 GB of RAM.

Key Attractions:
  • Live installation 
  • Hadoop Ecosystem: Hive, Apache Pig Programming
  • Projects on various forms of datasets
  • Live Projects
  • Online Streaming Projects

Course Content

  • STARTING UP WITH BIG DATA

    • Introduction to Big Data
    • Use cases of Big Data
    • Core components of Big Data
    • Requirements and an overview of the analyst job profile

    SETTING UP ENVIRONMENT

    • Setting up a Linux Environment on Commodity Computers
    • Linux Commands & Various Use Cases

    CASE FOR HADOOP ECOSYSTEM

    • A Brief History of Hadoop
    • Core Hadoop Components
    • What Is the Hadoop Ecosystem?
    • Integration Tools
    • Analysis Tools
    • Data Storage & Retrieval Tools
    • Fundamental Concepts

    HADOOP INSTALLATION

    • Deployment Types
    • Installing Hadoop
    • Using Hadoop Manager for Easy Installation
    • Basic Configuration Parameters
    • Hands-On Exercise
    • Installing Hadoop in Pseudo-Distributed and Fully-Distributed Mode

    PLANNING YOUR HADOOP CLUSTER

    • General Planning Considerations
    • Choosing the Right Hardware
    • Network Considerations
    • Configuring Nodes

  • The Hadoop Distributed File System (HDFS)

    • HDFS Features
    • HDFS Design Assumptions
    • Overview of HDFS Architecture
    • Writing and Reading files
    • DataNode and NameNode Considerations
    • HDFS Federation
    • HDFS High-Availability
    • The Command-Line Interface
    • Basic Filesystem Operations
    • Hadoop File Systems
    • Interfaces
    • Working with Various HDFS Commands
    • An Overview of HDFS Security
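
    The basic filesystem operations above can also be driven from Java through the Hadoop FileSystem API. Below is a minimal sketch, assuming the Hadoop client libraries are on the classpath and fs.defaultFS points at a running HDFS (or the local filesystem); the paths used are illustrative only.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsBasics {
        public static void main(String[] args) throws IOException {
            // Picks up core-site.xml / hdfs-site.xml from the classpath
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            // Create a directory and copy a local file into it
            fs.mkdirs(new Path("/user/demo"));
            fs.copyFromLocalFile(new Path("sample.txt"), new Path("/user/demo/sample.txt"));

            // List the directory, similar to 'hdfs dfs -ls /user/demo'
            for (FileStatus status : fs.listStatus(new Path("/user/demo"))) {
                System.out.println(status.getPath() + "\t" + status.getLen() + " bytes");
            }

            fs.close();
        }
    }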

    CLUSTER MAINTENANCE

    • Checking HDFS status
    • Copying Data Between clusters
    • Adding & Removing Cluster Nodes
    • Rebalancing the Cluster
    • NameNode Metadata Backup

    MAPREDUCE

    • What Is MapReduce?
    • Features of MapReduce
    • Basic MapReduce Concepts
    • Architectural Overview
    • MapReduce Version 2
    • Hands-On Exercise using Java.

    MANAGING & SCHEDULING JOBS

    • Managing & Running Jobs
    • The FIFO scheduler
    • The Fair Scheduler

    MAPREDUCE WITH AN EXAMPLE

    • A Weather Dataset
    • Data Format
    • Analyzing the Data with Unix Tools
    • Analyzing the Data with Hadoop
    • Map and Reduce
    • Java MapReduce
    • Scaling Out
    • Data Flow
    • Combiner Functions
    • Running a Distributed MapReduce Job
    • Hadoop Streaming
    • Compiling and Running
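
    To make the example concrete, here is a minimal Java MapReduce sketch in the spirit of the weather analysis above. It assumes a simplified input in which each line holds a year and a temperature separated by whitespace (the real NCDC records are fixed-width and need more parsing); class and path names are illustrative.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class MaxTemperature {

        // Map: emit (year, temperature) for every valid line
        public static class TempMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                String[] parts = value.toString().trim().split("\\s+");
                if (parts.length == 2) {
                    context.write(new Text(parts[0]), new IntWritable(Integer.parseInt(parts[1])));
                }
            }
        }

        // Reduce: keep the maximum temperature seen for each year
        public static class MaxReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int max = Integer.MIN_VALUE;
                for (IntWritable v : values) {
                    max = Math.max(max, v.get());
                }
                context.write(key, new IntWritable(max));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "max temperature");
            job.setJarByClass(MaxTemperature.class);
            job.setMapperClass(TempMapper.class);
            job.setCombinerClass(MaxReducer.class); // max can safely be pre-aggregated on the map side
            job.setReducerClass(MaxReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

    Package the class into a JAR and submit it with the hadoop jar command, as covered under Compiling and Running.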

  • DEVELOPING A MAPREDUCE APPLICATION

    • The Configuration API
    • Combining Resources
    • Variable Expansion
    • Configuring the Development Environment
    • Managing Configuration
    • GenericOptionsParser, Tool, and ToolRunner
    • Writing a Unit Test
    • Mapper
    • Reducer
    • Running Locally on Test Data
    • Running a Job in a Local Job Runner
    • Testing the Driver
    • Running on a Cluster
    • Packaging
    • Launching a Job
    • The MapReduce Web UI
    • Retrieving the Results
    • Debugging a Job
    • Hadoop Logs
    • Tuning a Job
    • Profiling Tasks
    • MapReduce Workflows
    • Decomposing a Problem into MapReduce Jobs
    • JobControl
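
    As a small illustration of the GenericOptionsParser, Tool, and ToolRunner items above, a driver can extend Configured and implement Tool so that generic options such as -D and -conf are parsed automatically. This is only a sketch; the actual job wiring is elided and the class name is made up.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class DemoDriver extends Configured implements Tool {

        @Override
        public int run(String[] args) throws Exception {
            // getConf() already reflects any -D or -conf options given on the command line
            Configuration conf = getConf();
            System.out.println("mapreduce.job.reduces = " + conf.get("mapreduce.job.reduces", "(default)"));
            // ... build and submit a Job here, returning 0 on success ...
            return 0;
        }

        public static void main(String[] args) throws Exception {
            // ToolRunner runs GenericOptionsParser before calling run()
            System.exit(ToolRunner.run(new Configuration(), new DemoDriver(), args));
        }
    }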

    HOW MAPREDUCE WORKS

    • Anatomy of a MapReduce Job Run
    • Classic MapReduce (MapReduce 1)
    • Failures
    • Failures in Classic MapReduce
    • Failures in YARN
    • Job Scheduling
    • The Capacity Scheduler
    • Shuffle and Sort
    • The Map Side
    • The Reduce Side
    • Configuration Tuning
    • Task Execution
    • The Task Execution Environment
    • Speculative Execution
    • Output Committers
    • Task JVM Reuse
    • Skipping Bad Records

  • MAPREDUCE TYPES & FORMATS

    • MapReduce Types
    • The Default MapReduce Job
    • Input Formats
    • Input Splits and Records
    • Text Input
    • Binary Input
    • Multiple Inputs
    • Database Input (and Output)
    • Output Formats
    • Text Output
    • Binary Output
    • Multiple Outputs
    • Lazy Output
    • Database Output
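
    The input and output formats above are all selected on the Job object. A minimal sketch, assuming tab-separated key/value text as input and leaving the mapper and reducer at their identity defaults (paths and the job name are illustrative):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.LazyOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

    public class FormatsDemo {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "formats demo");
            job.setJarByClass(FormatsDemo.class);

            // Text input where each line is split into key and value at the first tab
            job.setInputFormatClass(KeyValueTextInputFormat.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));

            // Binary (SequenceFile) output, created lazily so empty tasks leave no part files
            LazyOutputFormat.setOutputFormatClass(job, SequenceFileOutputFormat.class);
            FileOutputFormat.setOutputPath(job, new Path(args[1]));

            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }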

    MAPREDUCE FEATURES

    • Counters
    • Built-in Counters
    • User-Defined Java Counters
    • User-Defined Streaming Counters
    • Sorting: Preparation, Partial Sort, Total Sort
    • Secondary Sort, Joins
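
    User-defined Java counters, for example, are just an enum referenced from a mapper or reducer; their totals appear in the job's counter report alongside the built-in counters. A minimal sketch (the enum, field layout, and class name are illustrative):

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class ParsingMapper extends Mapper<LongWritable, Text, Text, LongWritable> {

        // A user-defined counter group with two named counters
        enum RecordQuality { VALID, MALFORMED }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(",");
            if (fields.length == 3) {
                context.getCounter(RecordQuality.VALID).increment(1);
                context.write(new Text(fields[0]), new LongWritable(1));
            } else {
                // Counted and skipped; the total is visible in the job's counter report
                context.getCounter(RecordQuality.MALFORMED).increment(1);
            }
        }
    }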

  • PIG

    • Installing and Running Pig
    • Execution Types
    • Running Pig Programs
    • Grunt Shell
    • Creating and running Pig scripts
    • Pig Latin Editors
    • An Example
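
    Pig Latin can be typed interactively in the Grunt shell, kept in a script, or embedded in Java through the PigServer API. A minimal word-count sketch in local execution mode (file and alias names are illustrative):

    import org.apache.pig.ExecType;
    import org.apache.pig.PigServer;

    public class PigWordCount {
        public static void main(String[] args) throws Exception {
            // LOCAL runs against the local filesystem; MAPREDUCE targets a cluster
            PigServer pig = new PigServer(ExecType.LOCAL);
            pig.registerQuery("lines = LOAD 'input.txt' AS (line:chararray);");
            pig.registerQuery("words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;");
            pig.registerQuery("grouped = GROUP words BY word;");
            pig.registerQuery("counts = FOREACH grouped GENERATE group, COUNT(words);");
            // store() triggers execution and writes the result directory
            pig.store("counts", "wordcount_out");
            pig.shutdown();
        }
    }

    The same four Pig Latin statements can be pasted into the Grunt shell or saved as a .pig script and run with pig -x local.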

    HIVE

    • Installing Hive
    • The Hive Shell
    • An Example
    • Running Hive, Configuring Hive
    • Hive Services
    • Comparison with Traditional Databases
    • Schema on Read Versus Schema on Write
    • Updates, Transactions, and Indexes
    • Derby DB / MySQL DB as the Metastore
    • Hive Query Language (HiveQL)
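
    HiveQL can be issued from the Hive shell or programmatically; the sketch below uses the standard JDBC API against a local HiveServer2, assuming the hive-jdbc driver is on the classpath and no authentication is configured (the URL, table, and file names are illustrative).

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class HiveQlDemo {
        public static void main(String[] args) throws Exception {
            String url = "jdbc:hive2://localhost:10000/default";
            try (Connection conn = DriverManager.getConnection(url, "", "");
                 Statement stmt = conn.createStatement()) {

                // Schema on read: the table definition only describes how to interpret the files
                stmt.execute("CREATE TABLE IF NOT EXISTS employees (id INT, name STRING, dept STRING) "
                        + "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','");
                stmt.execute("LOAD DATA LOCAL INPATH '/tmp/employees.csv' INTO TABLE employees");

                // Hive compiles the query into MapReduce (or Tez/Spark) jobs behind the scenes
                try (ResultSet rs = stmt.executeQuery(
                        "SELECT dept, COUNT(*) FROM employees GROUP BY dept")) {
                    while (rs.next()) {
                        System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
                    }
                }
            }
        }
    }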

  • INTRODUCTION TO HBASE & SQOOP

    • Overview of NoSQL
    • HBase Overview and Architecture
    • HBase Administration and Testing
    • HBase Data Access
    • Sqoop Installation
    • Importing Data in Sqoop
    • Exporting Data in Sqoop
    • Practical Session on HBase & Sqoop
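
    A minimal sketch of HBase data access from the Java client API, assuming a running HBase whose configuration is on the classpath and an existing table 'users' with a column family 'info' (all names are illustrative). Sqoop, by contrast, is normally driven from the command line (sqoop import / sqoop export), which the practical session covers.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBaseAccess {
        public static void main(String[] args) throws Exception {
            // Reads hbase-site.xml from the classpath for the ZooKeeper quorum
            Configuration conf = HBaseConfiguration.create();
            try (Connection conn = ConnectionFactory.createConnection(conf);
                 Table table = conn.getTable(TableName.valueOf("users"))) {

                // Write one cell: row key "u1", column info:name
                Put put = new Put(Bytes.toBytes("u1"));
                put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("Asha"));
                table.put(put);

                // Read it back by row key
                Result result = table.get(new Get(Bytes.toBytes("u1")));
                System.out.println(Bytes.toString(
                        result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"))));
            }
        }
    }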

    ZOOKEEPER & FLUME

    • Overview of Zookeeper
    • Overview of Flume
    • Installation of Flume
    • How Flume Works
    • Connecting Flume to HDFS
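
    For a feel of the ZooKeeper client API, here is a minimal Java sketch that creates and reads back a znode, assuming a ZooKeeper server on localhost:2181 (Flume itself is configured through an agent properties file rather than code; the znode path and data are illustrative).

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class ZkDemo {
        public static void main(String[] args) throws Exception {
            // Connect with a 5-second session timeout; the watcher here simply ignores events
            ZooKeeper zk = new ZooKeeper("localhost:2181", 5000, event -> { });

            // Create a persistent znode (this fails if /demo already exists)
            zk.create("/demo", "hello".getBytes(), ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);

            // Read the data back
            byte[] data = zk.getData("/demo", false, null);
            System.out.println(new String(data));

            zk.close();
        }
    }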

  • R PROGRAMMING

    • Overview & Introduction
    • Installation
    • Project Development using R Programming

    PROJECTS AND QUERIES:

    • Installing Hadoop
    • Interaction with the Hadoop Distributed File System (HDFS)
    • Word count of a whole plain-text file using MapReduce
    • Interaction with the Hive Query Language (HiveQL)
    • SQL & Derby Databases
    • Hive with MySQL DB
    • Writing PIG Scripts
    • Word Count with PIG
    • Handling Twitter Data using Flume
    • Working with NoSQL Databases
    • Finding the monthly average value using a year's national temperature data
    • Finding the hottest day of a month using a year's national temperature data
    • Twitter data ingestion using Flume
    • Importing & exporting data with Sqoop
    • Loading data into MySQL
    • Implementation of Zookeeper
    • Social Data Analytics using R Programming