Course catalog: Big Data Fundamentals Training

        Course outline:

        Section 1: The basics of working with big data

        Understand the three V’s of Big Data (Volume, Velocity, and Variety); Build models for data; Understand the occurrence of rare events in random data.
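
Rare events in random data can be made concrete with a back-of-the-envelope count, often called the Bonferroni principle (a name the outline itself does not use): if a "suspicious" pattern is expected to appear many times in purely random data, finding it proves little. The sketch below computes that expected count for a hypothetical hotel-visit scenario; the function name and every parameter value are illustrative assumptions.

```python
from math import comb


def expected_coincidences(people, hotels, days, visit_prob):
    """Expected number of (pair of people, pair of days) combinations in which two
    people acting independently at random are at the same hotel on both days."""
    # Two given people are at the same hotel on one given day if both visit a
    # hotel (visit_prob each) and happen to pick the same one (1 / hotels).
    p_one_day = visit_prob * visit_prob / hotels
    # The same coincidence must happen on both days of a given pair of days.
    p_two_days = p_one_day ** 2
    # Multiply by the number of pairs of people and the number of pairs of days.
    return comb(people, 2) * comb(days, 2) * p_two_days


# Hypothetical parameters: 10**9 people, 100,000 hotels, 1,000 days,
# and each person visits some hotel on 1% of days.
count = expected_coincidences(10**9, 10**5, 1000, 0.01)
print(round(count))
```

With these assumed numbers the expected count is roughly 250,000, so a few hundred thousand "suspicious" pairs would be consistent with pure chance.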

        Section 2: Web and social networks

        Understand characteristics of the web and social networks; Model social networks; Apply algorithms for community detection in networks.
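
One widely taught community-detection method is Girvan-Newman; the outline does not say which algorithms the course covers, so this choice is an assumption. It computes the betweenness of every edge with one breadth-first search per node, then repeatedly removes the edge with the highest betweenness until the network splits into communities. A minimal sketch, assuming an undirected, unweighted graph stored as a dict of neighbour sets; all function names are illustrative.

```python
from collections import deque, defaultdict


def edge_betweenness(adj):
    """Edge betweenness of an undirected, unweighted graph {node: set(neighbours)}."""
    betweenness = defaultdict(float)
    for root in adj:
        # BFS from root, counting the number of shortest paths to each node.
        dist, paths, order = {root: 0}, {root: 1}, []
        queue = deque([root])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v], paths[v] = dist[u] + 1, 0
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    paths[v] += paths[u]
        # Walk back from the deepest nodes, splitting each node's credit among
        # its BFS parents in proportion to their shortest-path counts.
        credit = {u: 1.0 for u in order}
        for v in reversed(order):
            for u in adj[v]:
                if dist[u] == dist[v] - 1:
                    share = credit[v] * paths[u] / paths[v]
                    credit[u] += share
                    betweenness[frozenset((u, v))] += share
    # Every shortest path was counted once from each of its two endpoints.
    return {edge: value / 2 for edge, value in betweenness.items()}


def connected_components(adj):
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        component, stack = set(), [start]
        seen.add(start)
        while stack:
            u = stack.pop()
            component.add(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        components.append(component)
    return components


def girvan_newman_split(adj):
    """Remove highest-betweenness edges until the graph gains a component."""
    adj = {u: set(vs) for u, vs in adj.items()}  # work on a copy
    initial = len(connected_components(adj))
    while len(connected_components(adj)) == initial:
        betweenness = edge_betweenness(adj)
        if not betweenness:
            break  # no edges left to remove
        u, v = max(betweenness, key=betweenness.get)
        adj[u].discard(v)
        adj[v].discard(u)
    return connected_components(adj)


# Hypothetical graph: two triangles joined by the single edge C-D.
graph = {
    "A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"},
    "D": {"C", "E", "F"}, "E": {"D", "F"}, "F": {"D", "E"},
}
communities = girvan_newman_split(graph)
```

The bridge C-D lies on all nine shortest paths between the two triangles, so it has the highest betweenness and is removed first, leaving the communities {A, B, C} and {D, E, F}.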

        Section 3: Clustering big data

        Cluster social networks; Apply hierarchical clustering; Apply k-means clustering.
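
The k-means step can be illustrated with Lloyd's algorithm: assign every point to its nearest centroid, move each centroid to the mean of its points, and repeat until nothing changes. This is a minimal sketch; the deterministic "first k points" initialisation and the function name are assumptions chosen for readability.

```python
import math


def kmeans(points, k, max_iters=100):
    """Lloyd's k-means on a list of equal-length numeric tuples."""
    # Deterministic initialisation: the first k points. Real systems usually
    # choose well-separated starting points (e.g. k-means++) instead.
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its old centroid).
        new_centroids = [
            tuple(sum(coords) / len(cluster) for coords in zip(*cluster))
            if cluster else centroids[j]
            for j, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, clusters


points = [(0, 0), (0, 1), (10, 10), (10, 11)]
centroids, clusters = kmeans(points, k=2)
print(centroids)  # [(0.0, 0.5), (10.0, 10.5)]
```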

        Section 4: Google web search

        Understand the concept of PageRank; Implement the basic PageRank algorithm for strongly connected graphs; Implement PageRank with taxation for graphs that are not strongly connected.
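
Both PageRank variants in this section fit in one short power-iteration sketch of v' = βMv + (1 − β)e/n. It assumes the graph is a Python dict from each page to its out-links and that no page is a dead end; the function name and the default β = 0.8 are illustrative choices, not taken from the course.

```python
def pagerank(links, beta=0.8, iterations=100):
    """Power iteration v' = beta * M v + (1 - beta) * e / n.

    links maps each page to the list of pages it links to. With beta = 1 this is
    basic PageRank (suitable for strongly connected graphs); beta < 1 adds taxation.
    Assumes every page has at least one out-link (no dead ends).
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Taxation: every page receives (1 - beta) / n from random teleports.
        new_rank = {p: (1 - beta) / n for p in pages}
        # The remaining fraction beta of each page's rank flows along its out-links.
        for p in pages:
            share = beta * rank[p] / len(links[p])
            for target in links[p]:
                new_rank[target] += share
        rank = new_rank
    return rank


# A four-page graph in which C is a spider trap (it links only to itself).
web = {"A": ["B", "C", "D"], "B": ["A", "D"], "C": ["C"], "D": ["B", "C"]}
print({p: round(r, 4) for p, r in pagerank(web).items()})
# {'A': 0.1014, 'B': 0.1284, 'C': 0.6419, 'D': 0.1284}
```

Without taxation the spider trap C would eventually absorb all of the rank; with β = 0.8 it still collects most of it (95/148), but the other pages keep a non-zero share.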

        Section 5: Parallel and distributed computing using MapReduce

        Understand the architecture for massive distributed and parallel computing; Apply MapReduce using Hadoop; Compute PageRank using MapReduce.
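
The map and reduce steps of one PageRank iteration can be sketched in plain Python, with the shuffle that Hadoop performs between the two phases simulated by an in-memory dictionary. This is not Hadoop's API: the function names are hypothetical, the taxed update with β = 0.8 is an assumption, and in a real multi-iteration job the mapper would typically also re-emit each page's link list so the next round can read the graph.

```python
def pagerank_map(page, rank, out_links):
    """Map: emit (target, share of this page's rank) for each out-link."""
    for target in out_links:
        yield target, rank / len(out_links)


def pagerank_reduce(page, shares, beta, n):
    """Reduce: sum the shares a page received and apply taxation."""
    return beta * sum(shares) + (1 - beta) / n


def run_iteration(links, ranks, beta=0.8):
    """One PageRank iteration expressed as map -> shuffle -> reduce."""
    n = len(links)
    # Shuffle: group every mapped value by its key (the target page).
    # Pages with no in-links still get an empty group so they reach a reducer.
    grouped = {page: [] for page in links}
    for page, rank in ranks.items():
        for target, share in pagerank_map(page, rank, links[page]):
            grouped[target].append(share)
    return {page: pagerank_reduce(page, shares, beta, n)
            for page, shares in grouped.items()}


links = {"A": ["B", "C", "D"], "B": ["A", "D"], "C": ["C"], "D": ["B", "C"]}
ranks = {page: 1 / len(links) for page in links}
for _ in range(100):
    ranks = run_iteration(links, ranks)
```

Each iteration is one full MapReduce job; because map calls are independent per page and reduce calls are independent per key, both phases parallelise across machines.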

        Section 6: Computing similar documents in big data

        Measure the importance of words in a collection of documents; Measure the similarity of sets and documents; Apply locality-sensitive hashing to compute similar documents.
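
The similar-documents pipeline (shingle each document, compare the shingle sets with Jaccard similarity, compress them into MinHash signatures, then band the signatures so that only likely-similar pairs are compared) can be sketched as follows. Character 5-shingles, CRC32 as the base hash, 100 hash functions, 20 bands of 5 rows, and all function names are illustrative assumptions.

```python
import random
import zlib
from collections import defaultdict
from itertools import combinations

PRIME = (1 << 61) - 1  # a Mersenne prime larger than any 32-bit shingle hash


def shingles(text, k=5):
    """The set of all length-k character substrings of text."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def make_hash_functions(num_hashes, seed=0):
    """Random linear hash functions h(x) = (a*x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(PRIME)) for _ in range(num_hashes)]


def minhash_signature(shingle_set, hash_functions):
    """Per hash function, the minimum hash value over the (non-empty) shingle set."""
    ids = [zlib.crc32(s.encode("utf-8")) for s in shingle_set]
    return [min((a * x + b) % PRIME for x in ids) for a, b in hash_functions]


def lsh_candidate_pairs(signatures, bands, rows):
    """Pairs of documents whose signatures agree on every row of at least one band."""
    candidates = set()
    for band in range(bands):
        buckets = defaultdict(list)
        for doc_id, sig in signatures.items():
            buckets[tuple(sig[band * rows:(band + 1) * rows])].append(doc_id)
        for doc_ids in buckets.values():
            candidates.update(combinations(sorted(doc_ids), 2))
    return candidates


docs = {
    "d1": "the quick brown fox jumps over the lazy dog",
    "d2": "the quick brown fox jumps over the lazy cat",
    "d3": "lorem ipsum dolor sit amet, consectetur",
}
hashes = make_hash_functions(100)
sigs = {d: minhash_signature(shingles(t), hashes) for d, t in docs.items()}
candidates = lsh_candidate_pairs(sigs, bands=20, rows=5)
```

With b bands of r rows, a pair with Jaccard similarity s becomes a candidate with probability 1 − (1 − s^r)^b, so d1 and d2 (s ≈ 0.86) are almost certain to be paired, while d3, which shares no shingles with them, should not be.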

        Section 7: Products frequently bought together in stores

        Understand the importance of frequent itemsets; Design association rules; Implement the A-Priori algorithm.
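
The A-Priori idea, that a set can only be frequent if all of its subsets are, lets each pass count only the candidates built from the previous pass's frequent sets. A minimal in-memory sketch, assuming baskets are Python sets and support is an absolute count; the basket data and function names are hypothetical.

```python
from itertools import combinations


def apriori(baskets, min_support):
    """Return {frozenset(itemset): support count} for all frequent itemsets."""
    baskets = [frozenset(b) for b in baskets]
    # Pass 1: count individual items.
    counts = {}
    for basket in baskets:
        for item in basket:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)
    size = 2
    while frequent:
        # Candidate generation (monotonicity): a set of this size can only be
        # frequent if every one of its (size - 1)-subsets is frequent.
        candidates = set()
        for a, b in combinations(frequent, 2):
            union = a | b
            if len(union) == size and all(
                frozenset(sub) in frequent for sub in combinations(union, size - 1)
            ):
                candidates.add(union)
        # Next pass: count only the candidates.
        counts = {c: 0 for c in candidates}
        for basket in baskets:
            for c in candidates:
                if c <= basket:
                    counts[c] += 1
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        size += 1
    return result


def confidence(itemsets, lhs, rhs):
    """Confidence of lhs -> rhs: support(lhs ∪ rhs) / support(lhs).
    Both itemsets must be frequent, i.e. present in itemsets."""
    lhs, rhs = frozenset(lhs), frozenset(rhs)
    return itemsets[lhs | rhs] / itemsets[lhs]


baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
frequent = apriori(baskets, min_support=3)
print(confidence(frequent, {"beer"}, {"diapers"}))  # 1.0
```

Here {bread, milk, diapers} is counted as a candidate but appears in only two baskets, and {bread, diapers, beer} is never counted at all because its subset {bread, beer} is not frequent.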

        Section 8: Movie and music recommendations

        Understand the differences between types of recommendation systems; Design content-based recommendation systems; Design collaborative filtering recommendation systems.
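
A collaborative-filtering recommender predicts a user's rating for an item from the ratings of similar users. The sketch below is a minimal user-user version; treating missing ratings as 0 in the cosine similarity, averaging over the k most similar users, and the ratings data are all assumptions for illustration. Practical systems often first subtract each user's mean rating.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two sparse rating vectors (missing ratings count as 0)."""
    dot = sum(u[item] * v[item] for item in u.keys() & v.keys())
    norm_u = math.sqrt(sum(r * r for r in u.values()))
    norm_v = math.sqrt(sum(r * r for r in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def predict_rating(ratings, user, item, k=2):
    """Similarity-weighted average of the k most similar users' ratings for item.
    Returns None if no similar user has rated it."""
    neighbours = sorted(
        ((cosine_similarity(ratings[user], ratings[other]), other)
         for other in ratings if other != user and item in ratings[other]),
        reverse=True,
    )[:k]
    total = sum(sim for sim, _ in neighbours)
    if total <= 0:
        return None
    return sum(sim * ratings[other][item] for sim, other in neighbours) / total


# Hypothetical ratings on a 1-5 scale.
ratings = {
    "alice": {"m1": 5, "m2": 3},
    "bob": {"m1": 5, "m2": 3, "m3": 4},
    "carol": {"m1": 1, "m2": 5, "m3": 2},
}
print(predict_rating(ratings, "alice", "m3", k=1))  # 4.0
```

With k = 1 the prediction copies Bob's rating, because his tastes match Alice's most closely; with k = 2 Carol's lower rating pulls the estimate down.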

        Section 9: Google's AdWords™ System

        Understand the AdWords System; Analyse online algorithms in terms of competitive ratio; Use online matching to solve the AdWords problem.
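
In the online AdWords problem each search query must be assigned to an advertiser the moment it arrives, without knowing future queries. The sketch compares a greedy rule with the Balance rule (pick the advertiser with the most budget left) in a simplified setting that is assumed here: every bid is worth 1 unit and budgets are integers. The advertisers, queries, and function names are hypothetical.

```python
def assign_queries(queries, bids, budgets, choose):
    """Assign each query on arrival; return total revenue (1 unit per assigned query)."""
    remaining = dict(budgets)
    revenue = 0
    for query in queries:
        eligible = [a for a in bids.get(query, []) if remaining[a] > 0]
        if eligible:
            winner = choose(eligible, remaining)
            remaining[winner] -= 1
            revenue += 1
    return revenue


def greedy(eligible, remaining):
    # Any eligible advertiser will do; take the first one listed.
    return eligible[0]


def balance(eligible, remaining):
    # Balance: the advertiser with the most remaining budget
    # (ties go to the first one listed).
    return max(eligible, key=lambda a: remaining[a])


bids = {"x": ["B", "A"], "y": ["B"]}  # A bids only on x; B bids on x and y
budgets = {"A": 4, "B": 4}
queries = ["x"] * 4 + ["y"] * 4
print(assign_queries(queries, bids, budgets, greedy))   # 4
print(assign_queries(queries, bids, budgets, balance))  # 6
```

An offline algorithm that sees the whole sequence would give every x to A and every y to B for a revenue of 8, so on this input greedy earns half of the optimum and Balance three quarters, which is what a competitive-ratio analysis bounds over all inputs.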

        Section 10: Mining rapidly arriving data streams

        Understand types of queries for data streams; Analyse sampling methods for data streams; Count distinct elements in data streams; Filter data streams.
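
Two of the stream techniques listed here, filtering and counting distinct elements, can be sketched with a Bloom filter and the Flajolet-Martin estimator. The bit-array size, number of hash functions, SHA-256-based hashing, and all names are assumptions for illustration. A single Flajolet-Martin estimate is very noisy; in practice many hash functions are combined, for example by taking the median of group averages.

```python
import hashlib


def _hash(item, salt):
    """A stable 64-bit hash of item, varied by salt (SHA-256 used for convenience)."""
    digest = hashlib.sha256(f"{salt}:{item}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class BloomFilter:
    """Admits items that were (probably) added; never rejects an added item."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, item):
        return [_hash(item, i) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        return all(self.bits[pos] for pos in self._positions(item))


def flajolet_martin(stream, salt="fm"):
    """Estimate the number of distinct elements as 2**R, where R is the longest
    run of trailing zero bits in the hash of any element seen."""
    max_tail = 0
    for item in stream:
        h = _hash(item, salt)
        tail = (h & -h).bit_length() - 1 if h else 64
        max_tail = max(max_tail, tail)
    return 2 ** max_tail


allowed = BloomFilter(num_bits=8000, num_hashes=5)
for address in ["alice@example.com", "bob@example.com"]:
    allowed.add(address)
stream = ["alice@example.com", "mallory@example.com", "bob@example.com"]
passed = [a for a in stream if allowed.might_contain(a)]
estimate = flajolet_martin(range(1000))
```

The Bloom filter can produce false positives but never false negatives, and repeated elements cannot change a Flajolet-Martin estimate, since a duplicate has the same hash as its first occurrence.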