
Databases Project

This project is detailed in the ETL class.

You are part of a 4-person data engineering team at a startup, tasked with designing and implementing an ETL/ELT pipeline. Your assignment is to submit a 2-4 page report detailing the choices made for the ETL/ELT pipeline and to provide a demo of an example database.

In your report, you need to clearly explain and justify your decisions for each phase of the pipeline:

  1. Extract (E): Identify and explain where the data is coming from. Discuss the sources and why they were chosen.

  2. Transform (T): Explain how the data is being transformed. Describe the processes, tools, and techniques used to clean, aggregate, or modify the data to make it useful for its intended purpose.

  3. Load (L): Detail how the data is loaded into the system, how it is stored, and how it will be used or queried. Discuss the database or storage options chosen, and explain how the data will be utilized by the organization or application.
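The three phases above can be sketched as three small functions. This is only a minimal illustration, not a required design: the order data, the `orders` table, and the function names are all hypothetical, and SQLite (from Python's standard library) stands in for whichever database system your team chooses.

```python
import csv
import io
import sqlite3

# Hypothetical raw export. A real pipeline would read from a file, an API,
# or an upstream database instead of an inline string.
RAW_CSV = """order_id,customer,amount,currency
1, alice ,19.99,usd
2,Bob,5.00,USD
3,alice,,usd
4,Carol,42.50,eur
"""


def extract(text):
    """Extract: parse the raw CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows without an amount, normalize names and currencies."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip incomplete records
        cleaned.append((
            int(row["order_id"]),
            row["customer"].strip().title(),
            float(amount),
            row["currency"].strip().upper(),
        ))
    return cleaned


def load(rows, conn):
    """Load: store the cleaned rows in a table that can be queried later."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        "order_id INTEGER PRIMARY KEY, customer TEXT NOT NULL, "
        "amount REAL NOT NULL, currency TEXT NOT NULL)"
    )
    conn.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def run_pipeline(conn):
    """Run E, T, and L in order, then return the total amount per customer."""
    load(transform(extract(RAW_CSV)), conn)
    return conn.execute(
        "SELECT customer, SUM(amount) FROM orders "
        "GROUP BY customer ORDER BY customer"
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    for customer, total in run_pipeline(conn):
        print(customer, total)
    conn.close()
```

Keeping each phase in its own function makes it easier to justify each decision separately in the report, since every choice (source, cleaning rule, storage schema) lives in one place.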

Along with the report, you are expected to provide a demo of an example database. You can use PostgreSQL, MongoDB, or another database system of your choice. The demo should include:

  • Documented scripts that load and manipulate example data to demonstrate the choices made for the ETL/ELT pipeline.
  • Example data that is sufficient to illustrate the key decisions in the ETL/ELT process; it does not need to be exhaustive.
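As one possible shape for such a demo script, the sketch below follows an ELT-style flow: a small sample is loaded unchanged into a staging table, and the transformation is then done in SQL inside the database. The event data, the `raw_events` and `daily_logins` tables, and the function names are assumptions made for illustration, and SQLite is used only because it ships with Python; the same steps apply to PostgreSQL or another system.

```python
import sqlite3

# Hypothetical sample events, kept deliberately small. The inconsistent
# casing of "LOGIN" is included so the SQL transformation has something to fix.
SAMPLE_EVENTS = [
    ("2024-10-01", "signup", "alice"),
    ("2024-10-01", "login", "alice"),
    ("2024-10-02", "login", "bob"),
    ("2024-10-02", "LOGIN", "alice"),
]


def build_demo(conn):
    """Load raw events into a staging table, then derive a summary table."""
    # Step 1 (load): store the raw events exactly as received.
    conn.execute("DROP TABLE IF EXISTS raw_events")
    conn.execute("CREATE TABLE raw_events (day TEXT, kind TEXT, username TEXT)")
    conn.executemany("INSERT INTO raw_events VALUES (?, ?, ?)", SAMPLE_EVENTS)

    # Step 2 (transform in SQL): count distinct users who logged in each day,
    # normalizing the event kind to lower case first.
    conn.execute("DROP TABLE IF EXISTS daily_logins")
    conn.execute(
        """
        CREATE TABLE daily_logins AS
        SELECT day, COUNT(DISTINCT username) AS active_users
        FROM raw_events
        WHERE LOWER(kind) = 'login'
        GROUP BY day
        """
    )
    conn.commit()


def daily_logins(conn):
    """Query the derived table the way a downstream report might."""
    return conn.execute(
        "SELECT day, active_users FROM daily_logins ORDER BY day"
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_demo(conn)
    for day, active_users in daily_logins(conn):
        print(day, active_users)
    conn.close()
```

Dropping and recreating the tables at the start of each step lets the script be rerun from a clean state, which is convenient when presenting a demo.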

Grading Criteria:

  • Report Rigor (6 points): Depth and thoroughness in explaining your ETL/ELT choices.
  • Report Clarity (6 points): How clearly and effectively your report communicates the ETL/ELT pipeline.
  • Demo Data (4 points): Appropriateness and accuracy of the example data used in the demo.
  • Demo Manipulation (4 points): Functionality and quality of the data manipulation demonstrated in the example.

Deadline:

  • The report and demo must be submitted to the LMS by the end of the day on October 11, 2024.