Resources
Courses
Python
- Learn Python for free
- https://mode.com/python-tutorial
SQL
- Learn SQL
- MySQL Tutorial
NoSQL
- Getting Started with Amazon DynamoDB
Data Warehouses
- Data Warehouse Tutorial for Beginners: Learn Basic Concepts
PySpark
- Spark with Python (PySpark) Tutorial For Beginners
Software Engineering
- Git: Become an Expert in Git & GitHub in 4 Hours
DevOps
- CI/CD Pipeline: Learn with Example
Blog Posts and Journals
- The Modern Data Stack Repository
- Medium Blog Posts - Data Engineering
- Start Data Engineering Blog Posts
- High Scalability
- The GitHub Blog
- Engineering at Quora
- Yelp Engineering Blog
- Twitter Engineering
- Facebook Engineering
- Yammer Engineering
- Etsy Code as Craft
- Foursquare Engineering Blog
- Airbnb Engineering
- WebEngage Engineering Blog
- LinkedIn Engineering
- The Netflix Tech Blog
- BankSimple Simple Blog
- Square The Corner
- SoundCloud Backstage Blog
- Flickr Code
- Instagram Engineering
- Dropbox Tech Blog
- Cloudera Developer Blog
- Bandcamp Tech
- Oyster Tech Blog
- The Reddit Blog
- Groupon Engineering Blog
- Songkick Technology Blog
- Google AI Blog
- Google Developers Blog
- Pinterest Engineering Blog
- Twilio Engineering Blog
- Bitly Engineering Blog
- Uber Engineering Blog
- GoDaddy Engineering
- Splunk Blog
- Coursera Engineering Blog
- PayPal Engineering Blog
- Nextdoor Engineering Blog
- Booking.com Development Blog
- Microsoft Engineering Blog
- Scalyr Engineering Blog
- Myntra Engineering Blog
- Fastly Blog
- AWS Architecture Blog
- Lyft Engineering Blog
- Wish Engineering
- DoorDash Engineering
- Snowflake Blog
- Palantir Blog
- Awesome Data Engineering
Data Engineering
- 97 Things Every Data Engineer Should Know
- Data Engineering with AWS [code]
- Data Engineering with Google Cloud Platform [code]
- Scalable Data Streaming with Amazon Kinesis [code]
- Fundamentals of Data Engineering
- Designing Data-Intensive Applications [code]
- Data Engineering with Python [code]
- Simplifying Data Engineering and Analytics with Delta [code]
- Azure Data Engineering Cookbook [code]
- Data Engineering with Apache Spark, Delta Lake, and Lakehouse [code]
- Data Pipelines Pocket Reference [code]
- Serverless Analytics with Amazon Athena [code]
- Modern Data Engineering with Apache Spark: A Hands-On Guide for Building Mission-Critical Streaming Applications [code]
- Apache Spark 3 for Data Engineering and Analytics with Python [code]
- Data Pipelines with Apache Airflow
Spark
- Mastering Big Data Analytics with PySpark [code]
- PySpark Cookbook [code]
- Learning Spark, 2nd Edition [code] [Alternative]
- Spark: The Definitive Guide [code]
- Spark Programming in Python for Beginners with Apache Spark 3 [code]
- Beginning Apache Spark 3: With DataFrame, Spark SQL, Structured Streaming, and Spark Machine Learning Library
- Data Algorithms with Spark [code]
- Scaling Machine Learning with Spark
- Modern Data Engineering with Apache Spark: A Hands-On Guide for Building Mission-Critical Streaming Applications
- Apache Spark 3 Advance Skills for Cracking Job Interviews
- Apache Spark 3 for Data Engineering and Analytics with Python [code]
- Advanced Analytics with PySpark [code]
- Spark in Action, Second Edition [code]
- High Performance Spark [code]
- Real-Time Stream Processing Using Apache Spark 3 for Python Developers [code]
Hadoop
- The Ultimate Hands-On Hadoop [code]
- Hadoop: The Definitive Guide, 4th Edition [code]
- Mastering Hadoop 3 [code]
- Sams Teach Yourself Hadoop in 24 Hours
- Modern Big Data Processing with Hadoop [code]
- Moving Hadoop to the Cloud [code]
- Hadoop with Python [code]
Python
- Fluent Python, 2nd Edition [code]
- Python Crash Course, 2nd Edition
- Introducing Python, 2nd Edition [code]
- Python Workout [code]
- Python for Data Analysis, 3rd Edition [code]
- Python Distilled
- Python in a Nutshell, 4th Edition [code]
- Robust Python [code]
- Automate the Boring Stuff with Python, 2nd Edition
- Expert Python Programming - Fourth Edition [code]
- Python Data Science Handbook, 2nd Edition [code]
- Python Crash Course, 3rd Edition
- Python for DevOps [code]
- Hypermodern Python Tooling
- Dead Simple Python
- Clean Code in Python - Second Edition [code]
- Python for Programmers [code]
- Beyond the Basic Stuff with Python
- Python Object-Oriented Programming - Fourth Edition [code]
- Modern Python Cookbook - Second Edition [code]
- Learn Python Programming - Third Edition [code]
- Advanced Python Programming - Second Edition [code]
- Pandas for Everyone: Python Data Analysis, 2nd Edition
- Pandas in Action
- Hands-On Data Analysis with Pandas - Second Edition [code]
- The Pandas Workshop [code]
Useful articles
Talks
Algorithms & Data Structures
SQL
Programming
Databases
Distributed Systems
Books
Courses
Blogs
- Martin Kleppmann, author of Designing Data-Intensive Applications
- BaseDS by Vaidehi Joshi, about distributed systems
- Apache Airflow is a platform to programmatically author, schedule, and monitor workflows in Python.
- Apache Spark is a unified analytics engine for large-scale data processing.
- Apache Kafka is a distributed streaming platform.
- Luigi is a Python package that helps you build complex pipelines of batch jobs.
- Dagster.io is a system for building modern data applications.
- Prefect includes everything you need to create and run data applications.
- Metaflow helps you build and manage real-life data science projects with ease.
- lakeFS lets you build repeatable, atomic, and versioned data lake operations, from complex ETL jobs to data science and analytics.
Communities
Data Engineering Jobs
Other
Newsletters & Digests