Lab: Apache Beam Getting Started

Pipeline 1 - Simple Ingest Data Pipeline

[Image: pipeline1]

Notebook: 01-sa-ingest-data-pipeline.ipynb

Pipeline 2 - Wordcount

This example demonstrates a simple pipeline that runs on the Direct Runner: it reads a text file, applies transforms to tokenize the lines and count the words, and writes the counts to an output text file.

[Image: pipeline2]

Key Concepts:

Creating the Pipeline
Applying transforms to the Pipeline
Reading input
Applying ParDo transforms
Applying SDK-provided transforms (in this example: Count)
Writing output
Running the Pipeline

Notebook: 02-sa-wordcount-pipeline.ipynb

Apache Beam Basic Operations

In this tutorial, we will learn how to:

Create and print input data
Read data from files
Write data to files
Read data from a SQLite database
Apply Map, FlatMap, Reduce, and Combine functions

Notebook: 03-sa-basic-operations.ipynb

Windowing

In this tutorial, we will learn about:

Global windows
Fixed-time windows
Sliding-time windows
Session windows

Notebook: 04-sa-windowing.ipynb

DataFrames

In this tutorial, we will learn how to convert:

pandas DataFrame to Beam DataFrame
pandas DataFrame to PCollection
Beam DataFrame to pandas DataFrame
PCollection to pandas DataFrame
Beam DataFrame to PCollection
PCollection to Beam DataFrame

Notebook: 05-sa-dataframes.ipynb

Files

├── [ 22K]  01-sa-ingest-data-pipeline.ipynb
├── [ 21K]  02-sa-wordcount-pipeline.ipynb
├── [ 26K]  03-sa-basic-operations.ipynb
├── [ 29K]  04-sa-windowing.ipynb
├── [ 18K]  05-sa-dataframes.ipynb
├── [ 113]  Makefile
├── [2.0K]  README.md
├── [158K]  data
│   ├── [ 62K]  kinglear.txt.zip
│   ├── [8.0K]  moon-phases.db
│   ├── [ 529]  penguins.csv
│   ├── [ 121]  sample1.txt
│   ├── [  72]  sample2.txt
│   ├── [ 160]  solar_events.csv
│   └── [ 87K]  sp500.csv.zip
├── [115K]  output
│   ├── [ 66K]  pipe2-00000-of-00001
│   ├── [  76]  result.txt-00000-of-00001
│   ├── [ 153]  sample-00000-of-00001.txt
│   └── [ 48K]  wordcount-00000-of-00001
└── [5.4K]  src
    ├── [3.5K]  pipeline1.py
    └── [1.9K]  pipeline2.py

 397K used in 3 directories, 20 files
