Course catalog: Scalable Machine Learning on Big Data using Apache Spark (Training)

        Course outline:

        Week 1: Introduction

        This is an introduction to Apache Spark.

        You'll learn how Apache Spark works internally and how to use it for data processing.

        The RDD (Resilient Distributed Dataset), Spark's low-level API, is introduced together with the basics of parallel and functional programming.
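The following is a minimal plain-Python sketch (no Spark installation assumed) of the functional, partition-wise style the RDD API encourages. The partition helper and its stride-based split are hypothetical stand-ins for how Spark distributes data across executors; in PySpark the same computation would be roughly sc.parallelize(range(1, 11), 3).filter(lambda x: x % 2 == 0).map(lambda x: x * x).reduce(lambda a, b: a + b).

```python
from functools import reduce


def partition(data, n):
    # Hypothetical helper: split a list into n "partitions" by striding,
    # standing in for how Spark spreads an RDD over executors.
    return [data[i::n] for i in range(n)]


def is_even(x):
    return x % 2 == 0


def square(x):
    return x * x


def add(a, b):
    return a + b


data = list(range(1, 11))
parts = partition(data, 3)

# filter, then map, applied to each partition independently
# (no data moves between partitions), like RDD.filter / RDD.map.
mapped = [[square(x) for x in p if is_even(x)] for p in parts]

# Reduce inside each partition first, then merge the partial results.
partials = [reduce(add, p, 0) for p in mapped]
total = reduce(add, partials, 0)
print(total)  # 220
```

Because each partition is reduced on its own before the partial results are merged, the function passed to RDD.reduce must be associative and commutative; subtraction, for example, would give results that depend on how the data was partitioned.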

        Next, different types of data storage solutions are compared. Finally, Apache Spark SQL is explained, along with the Catalyst query optimizer and the Tungsten execution engine.
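Catalyst optimizes a query by repeatedly applying rewrite rules to a tree that represents the query plan; constant folding is one such rule. The sketch below is a hypothetical, heavily simplified illustration in plain Python: the Lit, Col, and Add classes and fold_constants are invented for this example and are not Catalyst's actual (Scala) classes. In PySpark, the plans Catalyst produces for a DataFrame query can be inspected with DataFrame.explain(True).

```python
from dataclasses import dataclass
from typing import Union


# A tiny, hypothetical expression tree (not Catalyst's real classes).
@dataclass(frozen=True)
class Lit:
    value: int


@dataclass(frozen=True)
class Col:
    name: str


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


Expr = Union[Lit, Col, Add]


def fold_constants(e: Expr) -> Expr:
    """One rewrite rule: replace Add(Lit, Lit) with a single Lit, bottom-up."""
    if isinstance(e, Add):
        left = fold_constants(e.left)
        right = fold_constants(e.right)
        if isinstance(left, Lit) and isinstance(right, Lit):
            return Lit(left.value + right.value)
        return Add(left, right)
    # Literals and column references are left unchanged.
    return e


# x + (1 + 2)  ->  x + 3
expr = Add(Col("x"), Add(Lit(1), Lit(2)))
print(fold_constants(expr))  # Add(left=Col(name='x'), right=Lit(value=3))
```

Rules like this are cheap, local tree transformations; an optimizer gains its power from composing many of them and applying them until the plan stops changing.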

        Week 2: Scaling Math for Statistics on Apache Spark

        Apply basic statistical calculations using the Apache Spark RDD API to see how parallelization in Apache Spark works.
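As a minimal sketch of this kind of exercise, the code below computes the mean and population variance in one pass by folding each partition into a (count, sum, sum of squares) triple and then merging the triples. The aggregate function here is a hypothetical plain-Python stand-in that mimics the semantics of PySpark's RDD.aggregate(zeroValue, seqOp, combOp); the partitioned data is made up for illustration.

```python
import math
from functools import reduce


def seq_op(acc, x):
    # Fold one element into a partition-local (count, sum, sum of squares).
    n, s, ss = acc
    return (n + 1, s + x, ss + x * x)


def comb_op(a, b):
    # Merge two partial results; this is associative, so partitions can merge in any order.
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def aggregate(partitions, zero, seq, comb):
    # Mimics RDD.aggregate: seq within each partition, comb across partitions.
    partials = [reduce(seq, p, zero) for p in partitions]
    return reduce(comb, partials, zero)


partitions = [[2.0, 4.0, 4.0], [4.0, 5.0], [5.0, 7.0, 9.0]]
n, s, ss = aggregate(partitions, (0, 0.0, 0.0), seq_op, comb_op)
mean = s / n
variance = ss / n - mean ** 2  # population variance: E[x^2] - (E[x])^2
stddev = math.sqrt(variance)
print(mean, variance, stddev)  # 5.0 4.0 2.0
```

The sum-of-squares formula parallelizes easily but can lose precision when values are large relative to their spread; numerically stabler variants merge per-partition counts, means, and centered sums of squares using the same aggregate pattern.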

        Week 3: Introduction to Apache SparkML

        Learn the concept of machine learning pipelines in order to understand how Apache SparkML works programmatically.
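A pipeline chains stages: Transformers map one dataset to another, while Estimators are fitted on data and return a Transformer (a model). The plain-Python sketch below mirrors those roles with hypothetical classes (Doubler, Centerer, and simplified Pipeline/PipelineModel); it is not the SparkML API, whose real counterparts are pyspark.ml.Pipeline, pyspark.ml.Transformer, pyspark.ml.Estimator, and pyspark.ml.PipelineModel, and which operate on DataFrames rather than lists of dicts.

```python
from typing import Dict, List

Row = Dict[str, float]


class Transformer:
    def transform(self, rows: List[Row]) -> List[Row]:
        raise NotImplementedError


class Estimator:
    def fit(self, rows: List[Row]) -> Transformer:
        raise NotImplementedError


class Doubler(Transformer):
    """Hypothetical stateless transformer: adds column 'x2' = 2 * x."""
    def transform(self, rows):
        return [{**r, "x2": 2 * r["x"]} for r in rows]


class CenteringModel(Transformer):
    """Model produced by Centerer: subtracts the learned mean from 'x2'."""
    def __init__(self, mean):
        self.mean = mean

    def transform(self, rows):
        return [{**r, "x2_centered": r["x2"] - self.mean} for r in rows]


class Centerer(Estimator):
    """Hypothetical estimator: learns the mean of 'x2' during fit."""
    def fit(self, rows):
        return CenteringModel(sum(r["x2"] for r in rows) / len(rows))


class Pipeline(Estimator):
    def __init__(self, stages):
        self.stages = stages

    def fit(self, rows):
        fitted = []
        for stage in self.stages:
            # Estimators are fitted on the current data; Transformers are used as-is.
            model = stage.fit(rows) if isinstance(stage, Estimator) else stage
            rows = model.transform(rows)
            fitted.append(model)
        return PipelineModel(fitted)


class PipelineModel(Transformer):
    def __init__(self, stages):
        self.stages = stages

    def transform(self, rows):
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


train = [{"x": 1.0}, {"x": 2.0}, {"x": 3.0}]
model = Pipeline([Doubler(), Centerer()]).fit(train)
print(model.transform([{"x": 5.0}]))  # [{'x': 5.0, 'x2': 10.0, 'x2_centered': 6.0}]
```

Fitting the pipeline returns a model whose stages are all Transformers, so the statistic learned from the training data (here, the mean of x2) is reused unchanged on new data.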

        Week 4: Supervised and Unsupervised learning with SparkML

        Apply supervised and unsupervised machine learning tasks using SparkML.
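To make the distinction concrete, here is a minimal plain-Python sketch of one supervised task (fitting a line by ordinary least squares, which uses the labels ys) and one unsupervised task (1-D k-means via Lloyd's algorithm, which uses no labels). The function names and toy data are hypothetical; in SparkML, comparable estimators include pyspark.ml.regression.LinearRegression and pyspark.ml.clustering.KMeans, which work on DataFrames with a vector features column.

```python
def fit_line(xs, ys):
    """Supervised: ordinary least squares for y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    a = cov / var
    return a, my - a * mx


def kmeans_1d(points, centers, iterations=10):
    """Unsupervised: Lloyd's algorithm on 1-D points; no labels are used."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        # Assignment step: each point joins its nearest center.
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to its cluster mean (unchanged if empty).
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers


a, b = fit_line([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
print(a, b)  # 2.0 1.0
print(kmeans_1d([1.0, 1.5, 2.0, 8.0, 8.5, 9.0], [0.0, 10.0]))  # [1.5, 8.5]
```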