ACLED
Objective
Build a data pipeline using the Armed Conflict Location & Event Data Project (ACLED) API
Problem Statement
The Armed Conflict Location & Event Data Project (ACLED) collects real-time data on the locations, dates, actors, fatalities, and types of all reported political violence and protest events around the world. Imagine that you are working for a media organization and want to bring a processed version of the ACLED data to your analysts so that they can generate war- and conflict-related insights for their stories.
Your goal is to design and develop a data pipeline for it.
Architecture Diagram
What you'll build
- Pull data from the ACLED API
- Ingest the CSV data into a Postgres database
- Transform the data with PySpark
- Store the intermediary and final data in S3
- Orchestrate the pipeline with Airflow
- Create and trigger a Glue crawler using an Airflow operator
- Run the analysis in Athena
- Set up and use connections and variables in Airflow
- Send an email to stakeholders about the pipeline execution status
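The extraction step above can be sketched as a small standalone script. The endpoint URL and query parameters (`key`, `email`, `country`, `limit`) are assumptions based on ACLED's public API documentation, and the country value is illustrative; adjust them to match your registered account and query needs:

```python
# Minimal sketch of the ACLED pull + CSV landing step (endpoint and
# parameter names are assumptions; verify against the ACLED API docs).
import csv
import json
import urllib.parse
import urllib.request

ACLED_URL = "https://api.acleddata.com/acled/read"  # assumed endpoint


def build_params(api_key, email, country, limit=500):
    """Assemble the query parameters ACLED expects (API key + registered email)."""
    return {
        "key": api_key,
        "email": email,
        "country": country,
        "limit": limit,
    }


def fetch_events(api_key, email, country, limit=500):
    """Call the API and return the list of event records."""
    query = urllib.parse.urlencode(build_params(api_key, email, country, limit))
    with urllib.request.urlopen(f"{ACLED_URL}?{query}", timeout=30) as resp:
        payload = json.load(resp)
    return payload.get("data", [])


def write_csv(events, path):
    """Land the raw events as CSV for the Postgres ingestion step."""
    if not events:
        return
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=events[0].keys())
        writer.writeheader()
        writer.writerows(events)


if __name__ == "__main__":
    import os

    # Credentials come from the environment variables set up below.
    events = fetch_events(os.environ["ACLED_KEY"], os.environ["ACLED_USER"], "Ukraine")
    write_csv(events, "acled_raw.csv")
```

In the pipeline this logic would run inside an Airflow task rather than as a script, with the CSV then copied into Postgres and S3.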
DAG Run instructions
- Create a free account on ACLED
- Get the API key and add it to the DAG
- Install the Python libraries mentioned in the DAG
- Set the environment variables: the ACLED key and user, the S3 bucket, and the AWS credentials
- Create a Glue crawler named acled
- Modify the DAG: update the name and owner, and change catchup, scheduling, and other settings as required
- Run the DAG
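Putting the setup steps together, the DAG wiring might look like the sketch below. This is an illustrative configuration, not the project's actual DAG: the task IDs, variable names (`acled_key`, `acled_user`), recipient address, and schedule are assumptions, and only the pull, crawler, and notification tasks are shown. The Glue crawler name `acled` matches the crawler created in the steps above.

```python
# Hedged sketch of the ACLED DAG: pull data, trigger the Glue crawler,
# then email stakeholders. Names and schedule are illustrative assumptions.
from datetime import datetime

from airflow import DAG
from airflow.models import Variable
from airflow.operators.email import EmailOperator
from airflow.operators.python import PythonOperator
from airflow.providers.amazon.aws.operators.glue_crawler import GlueCrawlerOperator


def pull_acled_data(**_):
    # Read credentials from Airflow Variables instead of hard-coding them.
    api_key = Variable.get("acled_key")
    user = Variable.get("acled_user")
    ...  # call the ACLED API and land the CSV in Postgres / S3


with DAG(
    dag_id="acled_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # adjust scheduling and catchup as required
    catchup=False,
) as dag:
    pull = PythonOperator(
        task_id="pull_acled_data",
        python_callable=pull_acled_data,
    )

    crawl = GlueCrawlerOperator(
        task_id="run_glue_crawler",
        config={"Name": "acled"},  # the crawler created in the setup steps
    )

    notify = EmailOperator(
        task_id="notify_stakeholders",
        to="analysts@example.com",  # placeholder recipient
        subject="ACLED pipeline run complete",
        html_content="The ACLED DAG finished; data is queryable in Athena.",
    )

    pull >> crawl >> notify
```

The `GlueCrawlerOperator` requires the `apache-airflow-providers-amazon` package and an AWS connection configured in Airflow; the `EmailOperator` requires SMTP settings in the Airflow config.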
NOTE
We first tried pipeline 1, but due to the ACLED API request limits we went with pipeline 2 instead.