Course Catalog: Hadoop for Developers and Administrators Training
        Course Outline:

        Module 1. Introduction to Hadoop
        The Hadoop Distributed File System (HDFS)
        The Read Path and The Write Path
        Managing Filesystem Metadata
        The Namenode and the Datanode
        Namenode High Availability
        Namenode Federation
        The Command-Line Tools
        Understanding REST Support
        Module 2. Introduction to MapReduce
        Analyzing the Data with Hadoop
        Map and Reduce Pattern
        Java MapReduce
        Scaling Out
        Data Flow
        Developing Combiner Functions
        Running a Distributed MapReduce Job
        Module 3. Planning a Hadoop Cluster
        Picking a Distribution and Version of Hadoop
        Versions and Features
        Hardware Selection
        Master and Worker Hardware Selection
        Cluster Sizing
        Operating System Selection and Preparation
        Deployment Layout
        Setting up Users, Groups, and Privileges
        Disk Configuration
        Network Design
        Module 4. Installation and Configuration
        Installing Hadoop
        Configuration: An Overview
        The Hadoop XML Configuration Files
        Environment Variables and Shell Scripts
        Logging Configuration
        Managing HDFS
        Optimization and Tuning
        Formatting the Namenode
        Creating a /tmp Directory
        Thinking About Namenode High Availability
        The Fencing Options
        Automatic Failover Configuration
        Format and Bootstrap the Namenodes
        Namenode Federation
        Module 5. Understanding Hadoop I/O
        Data Integrity in HDFS
        Understanding Codecs
        Compression and Input Splits
        Using Compression in MapReduce
        The Serialization Mechanism
        File-Based Data Structures
        The SequenceFile Format
        Other File Formats and Column-Oriented Formats
        Module 6. Developing a MapReduce Application
        The Configuration API
        Setting Up the Development Environment
        Managing Configuration
        GenericOptionsParser, Tool, and ToolRunner
        Writing a Unit Test with MRUnit
        The Mapper and Reducer
        Running Locally on Test Data
        Testing the Driver
        Running on a Cluster
        Packaging and Launching a Job
        The MapReduce Web UI
        Tuning a Job
        Module 7. Identity, Authentication, and Authorization
        Managing Identity
        Kerberos and Hadoop
        Understanding Authorization
        Module 8. Resource Management
        What Is Resource Management?
        HDFS Quotas
        MapReduce Schedulers
        Anatomy of a YARN Application Run
        Resource Requests
        Application Lifespan
        YARN Compared to MapReduce 1
        Scheduling in YARN
        Scheduler Options
        Capacity Scheduler Configuration
        Fair Scheduler Configuration
        Delay Scheduling
        Dominant Resource Fairness
        Module 9. MapReduce Types and Formats
        MapReduce Types
        The Default MapReduce Job
        Defining the Input Formats
        Managing Input Splits and Records
        Text Input and Binary Input
        Managing Multiple Inputs
        Database Input (and Output)
        Output Formats
        Text Output and Binary Output
        Managing Multiple Outputs
        The Database Output
        Module 10. Using MapReduce Features
        Using Counters
        Reading Built-in Counters
        User-Defined Java Counters
        Understanding Sorting
        Using the Distributed Cache
        Module 11. Cluster Maintenance and Troubleshooting
        Managing Hadoop Processes
        Starting and Stopping Processes with Init Scripts
        Starting and Stopping Processes Manually
        HDFS Maintenance Tasks
        Adding a Datanode
        Decommissioning a Datanode
        Checking Filesystem Integrity with fsck
        Balancing HDFS Block Data
        Dealing with a Failed Disk
        MapReduce Maintenance Tasks
        Killing a MapReduce Job
        Killing a MapReduce Task
        Managing Resource Exhaustion
        Module 12. Monitoring
        The Available Hadoop Metrics
        The Role of SNMP
        Health Monitoring
        Host-Level Checks
        HDFS Checks
        MapReduce Checks
        Module 13. Backup and Recovery
        Data Backup
        Distributed Copy (distcp)
        Parallel Data Ingestion
        Namenode Metadata