Recruitment Website Management
- Tech Stack: PySpark, Cassandra DB, MySQL, Jupyter Notebook, Kafka, ETL.
- GitHub URL: Project Link
Summary: Collect, process, and store data for the recruitment website system; build a data flow to ETL data from the server (Data Lake) into a Data Warehouse for analysis.
- Build a Data Lake of server log data, using Cassandra DB as the storage layer.
- Use PySpark to connect to the Data Lake and build an ETL flow that loads data into the Data Warehouse (see the batch ETL sketch after this list).
- Data in the Data Warehouse is used for later analysis and querying.
- Build ETL scripts to publish data to Kafka topics (see the Kafka publishing sketch below).
- Configure a Kafka sink connector to automatically push data into MySQL (see the connector config sketch below).
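A minimal sketch of the batch ETL step, assuming the spark-cassandra-connector package is available at submit time; the keyspace, table, and column names (`logs.tracking`, `ts`, `job_id`) and the warehouse URL/credentials are placeholders, not the project's actual values:

```python
from pyspark.sql import SparkSession, functions as F

# Spark session wired to the Cassandra data lake; the connector is added at
# submit time, e.g. spark-submit --packages com.datastax.spark:spark-cassandra-connector_2.12:3.4.1
spark = (
    SparkSession.builder
    .appName("recruitment-etl")
    .config("spark.cassandra.connection.host", "localhost")
    .getOrCreate()
)

# Read raw log events from the Cassandra data lake (assumed keyspace/table names).
logs = (
    spark.read
    .format("org.apache.spark.sql.cassandra")
    .options(keyspace="logs", table="tracking")
    .load()
)

# Example transform: daily event counts per job posting.
daily_stats = (
    logs
    .withColumn("date", F.to_date("ts"))  # "ts" is an assumed timestamp column
    .groupBy("job_id", "date")
    .agg(F.count("*").alias("events"))
)

# Load the aggregates into the MySQL data warehouse over JDBC.
(
    daily_stats.write
    .format("jdbc")
    .option("url", "jdbc:mysql://localhost:3306/warehouse")
    .option("driver", "com.mysql.cj.jdbc.Driver")
    .option("dbtable", "job_daily_stats")
    .option("user", "etl")
    .option("password", "***")
    .mode("append")
    .save()
)
```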
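For the streaming path, a sketch of the ETL script that publishes log events to Kafka, reusing the `logs` DataFrame from the previous sketch; it assumes the spark-sql-kafka-0-10 package, a local broker, and a hypothetical `job_events` topic:

```python
from pyspark.sql import functions as F

# Serialize each event as a JSON string in the Kafka "value" column and
# publish it to the (assumed) job_events topic.
(
    logs
    .select(F.to_json(F.struct(*logs.columns)).alias("value"))
    .write
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("topic", "job_events")
    .save()
)
```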
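The Kafka-to-MySQL hop can be wired up by registering a JDBC sink connector with the Kafka Connect REST API. A sketch assuming Confluent's kafka-connect-jdbc plugin is installed; the connector name, topic, hosts, and credentials are placeholders:

```python
import requests

# JDBC sink connector definition; once registered, Kafka Connect streams
# new records from the topic into MySQL automatically.
sink = {
    "name": "mysql-sink",
    "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
        "topics": "job_events",
        "connection.url": "jdbc:mysql://localhost:3306/warehouse",
        "connection.user": "etl",
        "connection.password": "***",
        "insert.mode": "upsert",
        "pk.mode": "record_value",
        "pk.fields": "event_id",  # assumed primary-key field in the events
        "auto.create": "true",
    },
}

# Register the connector with the Kafka Connect REST API (default port 8083).
resp = requests.post("http://localhost:8083/connectors", json=sink)
resp.raise_for_status()
```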
Batch Processing: Data Lake => ETL Scripts => Data Warehouse => Analysis and reporting.
Stream Processing: Data Lake => ETL Scripts => Kafka Topics => Data Warehouse => Analysis and reporting.