Recruitment Website Management
- Tech Stack: PySpark, Cassandra DB, MySQL, Jupyter Notebook, Kafka, ETL.
- GitHub URL: Project Link
Summary: Collect, process, and store data for the recruitment website system; build a data flow to ETL data from the server (Data Lake) into a Data Warehouse for analysis.
- Build a Data Lake of server log data, using Cassandra DB as the storage layer.
- Use PySpark to connect to the Data Lake and build an ETL flow that loads data into the Data Warehouse (see the batch ETL sketch after this list).
- Data in the Data Warehouse is used for later analysis and querying.
- Build ETL scripts to publish data to Kafka topics (see the Kafka publishing sketch below).
- Configure a Kafka sink connector to automatically push data into MySQL (see the connector config sketch below).
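A minimal sketch of the batch ETL step, assuming the spark-cassandra-connector package is available at submit time; the keyspace, table, and column names (`logs.tracking`, `ts`, `job_id`) and the warehouse URL/credentials are placeholders, not the project's actual values:

```python
from pyspark.sql import SparkSession, functions as F

# Spark session wired to the Cassandra data lake; the connector is added at
# submit time, e.g. spark-submit --packages com.datastax.spark:spark-cassandra-connector_2.12:3.4.1
spark = (
    SparkSession.builder
    .appName("recruitment-etl")
    .config("spark.cassandra.connection.host", "localhost")
    .getOrCreate()
)

# Read raw log events from the Cassandra data lake (assumed keyspace/table names).
logs = (
    spark.read
    .format("org.apache.spark.sql.cassandra")
    .options(keyspace="logs", table="tracking")
    .load()
)

# Example transform: daily event counts per job posting.
daily_stats = (
    logs
    .withColumn("date", F.to_date("ts"))  # "ts" is an assumed timestamp column
    .groupBy("job_id", "date")
    .agg(F.count("*").alias("events"))
)

# Load the aggregates into the MySQL data warehouse over JDBC.
(
    daily_stats.write
    .format("jdbc")
    .option("url", "jdbc:mysql://localhost:3306/warehouse")
    .option("driver", "com.mysql.cj.jdbc.Driver")
    .option("dbtable", "job_daily_stats")
    .option("user", "etl")
    .option("password", "***")
    .mode("append")
    .save()
)
```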
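For the streaming path, a sketch of the ETL script that publishes log events to Kafka, reusing the `logs` DataFrame from the previous sketch; it assumes the spark-sql-kafka-0-10 package, a local broker, and a hypothetical `job_events` topic:

```python
from pyspark.sql import functions as F

# Serialize each event as a JSON string in the Kafka "value" column and
# publish it to the (assumed) job_events topic.
(
    logs
    .select(F.to_json(F.struct(*logs.columns)).alias("value"))
    .write
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("topic", "job_events")
    .save()
)
```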
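The Kafka-to-MySQL hop can be wired up by registering a JDBC sink connector with the Kafka Connect REST API. A sketch assuming Confluent's kafka-connect-jdbc plugin is installed; the connector name, topic, hosts, and credentials are placeholders:

```python
import requests

# JDBC sink connector definition; once registered, Kafka Connect streams
# new records from the topic into MySQL automatically.
sink = {
    "name": "mysql-sink",
    "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
        "topics": "job_events",
        "connection.url": "jdbc:mysql://localhost:3306/warehouse",
        "connection.user": "etl",
        "connection.password": "***",
        "insert.mode": "upsert",
        "pk.mode": "record_value",
        "pk.fields": "event_id",  # assumed primary-key field in the events
        "auto.create": "true",
    },
}

# Register the connector with the Kafka Connect REST API (default port 8083).
resp = requests.post("http://localhost:8083/connectors", json=sink)
resp.raise_for_status()
```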
Batch Processing: Data Lake => ETL Scripts => Data Warehouse => Analysis and reporting.
Stream Processing: Data Lake => ETL Scripts => Kafka Topics => Data Warehouse => Analysis and reporting.