Databases Project
This project is detailed in the ETL class.
You are part of a 4-person data engineering team at a startup, tasked with designing and implementing an ETL/ELT pipeline. Your assignment is to submit a 2-4 page report detailing the choices made for the ETL/ELT pipeline and to provide a demo of an example database. Your startup ideas will be defined in the AI Business Models class; the goal of this project is to build a proof of concept of the data pipeline for part of your startup.
In your report, you need to clearly explain and justify your decisions for each phase of the pipeline:
- Extract (E): Identify and explain where the data is coming from. Discuss the sources and why they were chosen.
- Transform (T): Explain how the data is being transformed. Describe the processes, tools, and techniques used to clean, aggregate, or modify the data to make it useful for its intended purpose.
- Load (L): Detail how the data is loaded into the system, how it is stored, and how it will be used or queried. Discuss the database or storage options chosen, and explain how the data will be used by the organization or application.
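To make the three phases concrete, here is a minimal Python sketch of an ETL flow, where cleaning happens before the data reaches the database. It assumes the standard-library `sqlite3` module as a stand-in for PostgreSQL or MongoDB; the order data, the `orders` table, and the function names `extract`, `transform`, and `load` are hypothetical illustrations, not requirements of the assignment.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a source system (assumed stand-in for an API or CSV dump).
RAW_CSV = """order_id,customer,amount,date
1,alice,19.99,2025-09-01
2,Bob ,5.00,2025-09-01
3,alice,,2025-09-02
4,carol,42.50,2025-09-03
"""


def extract(raw: str) -> list[dict]:
    """Extract: parse the rows from the source text into dictionaries."""
    return list(csv.DictReader(io.StringIO(raw)))


def transform(rows: list[dict]) -> list[tuple]:
    """Transform: drop rows with a missing amount, normalise names, cast types."""
    clean = []
    for row in rows:
        if not row["amount"]:
            continue  # skip incomplete records instead of loading them
        clean.append((
            int(row["order_id"]),
            row["customer"].strip().lower(),  # "Bob " -> "bob"
            float(row["amount"]),
            row["date"],
        ))
    return clean


def load(conn: sqlite3.Connection, rows: list[tuple]) -> None:
    """Load: create the target table and insert the cleaned rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        "order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL, order_date TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", rows)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # in-memory database for the demo
    load(conn, transform(extract(RAW_CSV)))
    # Example query showing how the loaded data might be used: revenue per customer.
    for customer, total in conn.execute(
        "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
    ):
        print(customer, total)
```

Each phase is a separate function, which mirrors the structure the report asks for: the choices behind `extract`, `transform`, and `load` can each be justified on their own.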
Along with the report, you are expected to provide a demo of an example database. You can use PostgreSQL, MongoDB, or another database system of your choice. The demo should include:
- Documented scripts that load and manipulate example data, demonstrating the choices made for the ETL/ELT pipeline.
- The data used in the demo does not need to be exhaustive, but it should be sufficient to illustrate the key decisions in the ETL process.
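If your team chooses ELT instead, the demo scripts would load raw data first and transform it inside the database. A minimal sketch, again assuming `sqlite3` in place of your chosen database; the `raw_events` and `events` tables, the event data, and the function names are hypothetical.

```python
import sqlite3

# Hypothetical raw events as they might arrive from an app log (assumed example data).
RAW_EVENTS = [
    ("u1", "signup", "2025-09-01T10:00:00"),
    ("u1", "purchase", "2025-09-02T12:30:00"),
    ("u2", "signup", "2025-09-02T09:15:00"),
    ("u2", "SIGNUP", "2025-09-02T09:15:00"),  # duplicate with inconsistent casing
]


def load_raw(conn: sqlite3.Connection) -> None:
    """Load: store the raw rows unchanged in a staging table."""
    conn.execute("CREATE TABLE raw_events (user_id TEXT, event TEXT, ts TEXT)")
    conn.executemany("INSERT INTO raw_events VALUES (?, ?, ?)", RAW_EVENTS)


def transform_in_db(conn: sqlite3.Connection) -> None:
    """Transform in SQL: lowercase event names, truncate timestamps to days, deduplicate."""
    conn.execute(
        "CREATE TABLE events AS "
        "SELECT DISTINCT user_id, LOWER(event) AS event, DATE(ts) AS day "
        "FROM raw_events"
    )


def daily_counts(conn: sqlite3.Connection) -> list[tuple]:
    """Manipulate: count each event type per day."""
    return conn.execute(
        "SELECT day, event, COUNT(*) FROM events "
        "GROUP BY day, event ORDER BY day, event"
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_raw(conn)
    transform_in_db(conn)
    for row in daily_counts(conn):
        print(row)
```

Keeping the untouched `raw_events` table means the transformation can be revised and rerun without re-extracting from the source, which is one common argument for ELT over ETL.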
Grading Criteria:
- Report Rigor (6 points): Depth and thoroughness in explaining your ETL/ELT choices.
- Report Clarity (6 points): How clearly and effectively your report communicates the ETL/ELT pipeline.
- Demo Data (4 points): Appropriateness and accuracy of the example data used in the demo.
- Demo Manipulation (4 points): Functionality and quality of the data manipulation demonstrated in the example.
Deadline:
- There will be a 3-hour work session on this project on September 30, 2025.
- The report and demo must be submitted to the LMS by end of day on October 23, 2025.