Hadoop and MapReduce
In this class, we start with an overview of the Big Data ecosystem, placing Hadoop, NoSQL databases, and Business Intelligence (BI) tools in context. We then cover Hadoop and HDFS (the Hadoop Distributed File System) in detail, using a simple MapReduce example.
- Introduction to Big Data and its ecosystem (1h)
  - What is Big Data?
  - Legacy "Big Data" ecosystem
  - Big Data use cases
  - From Big Data to Machine Learning
- Big Data platforms: Hadoop and beyond (2h)
  - Hadoop, HDFS and MapReduce
  - Data lakes and data pipelines
  - From HPC to Big Data to Cloud and High Performance Data Analytics
  - BI vs. Big Data
  - Hadoop's legacy: Spark, Dask, object storage, ...
The class also includes a short interactive exercise implementing MapReduce in Python.
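To show the shape of a MapReduce job before the exercise, here is a minimal in-memory word count in plain Python. It is a sketch under stated assumptions, not the course's exercise: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical, and the shuffle step is simulated with a dictionary on one machine, whereas Hadoop groups and sorts intermediate keys across the cluster between the map and reduce tasks.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit a (word, 1) pair for every word in one line of text."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/sort (simulated in memory): group intermediate values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    # Hadoop delivers keys to reducers in sorted order; mimic that here.
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: sum all the counts emitted for one word."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # Map each line independently (on a cluster, map tasks run in parallel,
    # roughly one per input split of the file stored in HDFS).
    intermediate = chain.from_iterable(map_phase(line) for line in lines)
    # Group every value under its key, as the framework does between phases.
    grouped = shuffle_phase(intermediate)
    # Reduce each key's list of values to a single result.
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    text = ["the quick brown fox", "the lazy dog", "The fox"]
    print(word_count(text))
    # {'brown': 1, 'dog': 1, 'fox': 2, 'lazy': 1, 'quick': 1, 'the': 3}
```

The mapper and reducer only see their own inputs, which is what lets the framework run many copies of them in parallel. On a real cluster, the same logic would typically be packaged as separate mapper and reducer scripts (for example via Hadoop Streaming, which pipes records through standard input and output); that wiring is not shown here.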