Recruitment Website Management

  • Tech Stack: PySpark, Cassandra DB, MySQL, Jupyter Notebook, Kafka, ETL.
  • GitHub URL: Project Link

Summary: Collect, process, and store data for the recruitment website system. Built data flows that ETL data from the server's data lake into a data warehouse for analysis.

    Batch Processing: Data Lake => ETL Scripts => Data Warehouse => Analysis and reporting

  • Built a data lake that holds the log data coming from the server, using Cassandra DB as the storage layer.
  • Used PySpark to connect to the data lake and built an ETL flow that loads the data into the data warehouse (a minimal sketch follows this list).
  • Data in the data warehouse is then used for analysis and ad-hoc queries.
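The batch flow could be sketched in PySpark roughly as below. This is a minimal illustration, not the project's actual code: the keyspace and table names (datalake, tracking_logs), the columns (event_type, ts, job_id), and all connection settings are assumptions, MySQL is taken as the warehouse per the tech stack, and the spark-cassandra-connector plus a MySQL JDBC driver are presumed to be on the Spark classpath.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # Assumed connection settings.
    spark = (
        SparkSession.builder.appName("recruitment-batch-etl")
        .config("spark.cassandra.connection.host", "127.0.0.1")
        .getOrCreate()
    )

    # Extract: read raw log events from the Cassandra data lake
    # (keyspace/table names are hypothetical).
    logs = (
        spark.read.format("org.apache.spark.sql.cassandra")
        .options(keyspace="datalake", table="tracking_logs")
        .load()
    )

    # Transform: e.g. daily job-view counts (columns are hypothetical).
    daily_views = (
        logs.filter(F.col("event_type") == "job_view")
        .groupBy(F.to_date("ts").alias("day"), "job_id")
        .agg(F.count("*").alias("views"))
    )

    # Load: append the aggregate into the MySQL data warehouse over JDBC.
    (
        daily_views.write.format("jdbc")
        .option("url", "jdbc:mysql://localhost:3306/warehouse")
        .option("dbtable", "job_daily_views")
        .option("user", "etl")
        .option("password", "***")
        .option("driver", "com.mysql.cj.jdbc.Driver")
        .mode("append")
        .save()
    )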

    Stream Processing: Data Lake => ETL Scripts => Kafka Topics => Data Warehouse => Analysis and reporting

  • Using Cassandra DB to store the log data, as in the batch flow.
  • Built ETL scripts that publish the log data to Kafka topics (first sketch below).
  • Configured a Kafka sink connector to automatically push topic data into MySQL (second sketch below).
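A minimal sketch of the ETL step that publishes lake data to a Kafka topic. The names are again assumptions (tracking_logs table, recruitment-logs topic, log_id key column), and the spark-sql-kafka package is presumed available; Spark's Kafka writer expects "key" and "value" columns.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (
        SparkSession.builder.appName("recruitment-kafka-etl")
        .config("spark.cassandra.connection.host", "127.0.0.1")  # assumed host
        .getOrCreate()
    )

    # Read log rows from the Cassandra data lake (hypothetical names).
    events = (
        spark.read.format("org.apache.spark.sql.cassandra")
        .options(keyspace="datalake", table="tracking_logs")
        .load()
    )

    # Serialize each row to JSON, keyed by a hypothetical log_id column,
    # and publish to the (assumed) recruitment-logs topic.
    (
        events.select(
            F.col("log_id").cast("string").alias("key"),
            F.to_json(F.struct(*events.columns)).alias("value"),
        )
        .write.format("kafka")
        .option("kafka.bootstrap.servers", "localhost:9092")
        .option("topic", "recruitment-logs")
        .save()
    )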
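One common way to realize the "Kafka sink to MySQL" step is Kafka Connect with Confluent's JDBC sink connector, registered over the Connect REST API. The connector name, topic, and connection settings below are assumptions, not the project's actual configuration, and the JDBC sink needs schema-aware records (e.g. JSON with an embedded schema, or Avro with a Schema Registry).

    import json
    import requests

    # Hypothetical config for Confluent's JDBC sink connector: it consumes
    # the log topic and writes each record into MySQL automatically.
    connector = {
        "name": "mysql-sink-recruitment-logs",
        "config": {
            "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
            "topics": "recruitment-logs",
            "connection.url": "jdbc:mysql://localhost:3306/warehouse",
            "connection.user": "etl",
            "connection.password": "***",
            "insert.mode": "upsert",
            "pk.mode": "record_key",
            "pk.fields": "log_id",
            "auto.create": "true",  # let the connector create the target table
            # The JDBC sink requires schema'd records, so the producer must
            # embed a schema in the JSON (or use Avro + Schema Registry).
            "value.converter": "org.apache.kafka.connect.json.JsonConverter",
            "value.converter.schemas.enable": "true",
        },
    }

    # Register the connector with the Kafka Connect REST API (default port 8083).
    resp = requests.post(
        "http://localhost:8083/connectors",
        headers={"Content-Type": "application/json"},
        data=json.dumps(connector),
    )
    resp.raise_for_status()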