📄️ Serverless Streaming Data on AWS
https://youtu.be/abaVSFFdU-c
📄️ ACLED
Objective
📄️ Citi Bike Trip Histories
The goal of this capstone is to build an end-to-end data pipeline:
📄️ Air Pollution Pipeline
📄️ Airflow - City Traffic Drone
City vehicle trajectory data extraction and warehousing for traffic analysis
📄️ Global Historical Climatology Network Daily Data Pipeline
Objective
📄️ Building End to end data pipeline in AWS
Architecture Diagram
📄️ README
Objective
📄️ DigitalSkola
Objective
📄️ Disaster Response Pipeline
📄️ Funflix
You are working as a data engineer at Funflix, an Australian media company. You have been given the following requirements and tasks to solve.
📄️ Datalake Schema Correction
Objective
📄️ Log Analytics and Processing in Real-Time
Lab 1: Apache Flink on Amazon Kinesis Data Analytics
📄️ Streaming ETL pipeline with Apache Flink and Amazon Kinesis Data Analytics
We will create an Amazon Kinesis Data Analytics for Apache Flink application with Amazon Kinesis Data Streams as the source and an Amazon S3 bucket as the sink. Random data is ingested using the Amazon Kinesis Data Generator. The Apache Flink application code performs a word count on the streaming random data using a 5-minute tumbling window. The resulting word counts are then stored in the specified Amazon S3 bucket. Amazon Athena is used to query the data in the Amazon S3 bucket to validate the end results.
📄️ Kortex
Objectives
📄️ Lufthansa API
Objective
📄️ Movie Review Sentiment Analysis Pipeline
Build a pipeline that expresses the fact artist review sentiment and film review sentiment, based on the data provided by IMDb and TMDb.
📄️ Multi-touch Attribution
Abstract
📄️ Building Recommender System from Scratch
Overview
📄️ Reddit Submissions, Authors and Subreddits analysis
Problem Statement
📄️ AWS Kafka and DynamoDB for real time fraud detection
Problem Statement
📄️ Data Pipeline with dbt, Airflow and Great Expectations
In this project, we will learn how to combine three open-source tools (Airflow, dbt, and Great Expectations) to build, test, validate, document, and orchestrate an entire pipeline, end to end, from scratch. We are going to load the NYC Taxi data into a Redshift warehouse and then transform and validate the data using dbt and Great Expectations.
📄️ Smartcity
Files
📄️ Sparkify
Sparkify SQL Data Modeling with Postgres
📄️ Spotify
Implement a complete data engineering pipeline project using Spotify
📄️ Twitter data Topic Analysis and Realtime Sentiment Analysis
Problem Statement
📄️ US Immigration analysis and data pipeline
Problem Statement