Resources

Courses

Python

  1. Learn Python for free
  2. https://mode.com/python-tutorial

SQL

  1. Learn SQL
  2. MySQL Tutorial

NoSQL

  1. Getting Started with Amazon DynamoDB

Data Warehouses

  1. Data Warehouse Tutorial for Beginners: Learn Basic Concepts

PySpark

  1. Spark with Python (PySpark) Tutorial For Beginners

Software Engineering

  1. Git: Become an Expert in Git & GitHub in 4 Hours

DevOps

  1. CI/CD Pipeline: Learn with Example

Blog Posts and Journals

  1. The Modern Data Stack Repository
  2. Medium Blog Posts - Data Engineering
  3. Start Data Engineering Blog Posts
  4. High Scalability
  5. The GitHub Blog
  6. Engineering at Quora
  7. Yelp Engineering Blog
  8. Twitter Engineering
  9. Facebook Engineering
  10. Yammer Engineering
  11. Etsy Code as Craft
  12. Foursquare Engineering Blog
  13. Airbnb Engineering
  14. WebEngage Engineering Blog
  15. LinkedIn Engineering
  16. The Netflix Tech Blog
  17. BankSimple Simple Blog
  18. Square The Corner
  19. SoundCloud Backstage Blog
  20. Flickr Code
  21. Instagram Engineering
  22. Dropbox Tech Blog
  23. Cloudera Developer Blog
  24. Bandcamp Tech
  25. Oyster Tech Blog
  26. THE REDDIT BLOG
  27. Groupon Engineering Blog
  28. Songkick Technology Blog
  29. Google AI Blog
  30. Google Developers Blog
  31. Pinterest Engineering Blog
  32. Twilio Engineering Blog
  33. Bitly Engineering Blog
  34. Uber Engineering Blog
  35. GoDaddy Engineering
  36. Splunk Blog
  37. Coursera Engineering Blog
  38. PayPal Engineering Blog
  39. Nextdoor Engineering Blog
  40. Booking.com Development Blog
  41. Microsoft Engineering Blog
  42. Scalyr Engineering Blog
  43. Myntra Engineering Blog
  44. Fastly Blog
  45. AWS Architecture Blog
  46. Lyft Engineering Blog
  47. Wish Engineering
  48. DoorDash Engineering
  49. Snowflake Blog
  50. Palantir Blog
  51. Awesome Data Engineering

Data Engineering

  1. 97 Things Every Data Engineer Should Know
  2. Data Engineering with AWS [code]
  3. Data Engineering with Google Cloud Platform [code]
  4. Scalable Data Streaming with Amazon Kinesis [code]
  5. Fundamentals of Data Engineering
  6. Designing Data-Intensive Applications [code]
  7. Data Engineering with Python [code]
  8. Simplifying Data Engineering and Analytics with Delta [code]
  9. Azure Data Engineering Cookbook [code]
  10. Data Engineering with Apache Spark, Delta Lake, and Lakehouse [code]
  11. Data Pipelines Pocket Reference [code]
  12. Serverless Analytics with Amazon Athena [code]
  13. Modern Data Engineering with Apache Spark: A Hands-On Guide for Building Mission-Critical Streaming Applications [code]
  14. Apache Spark 3 for Data Engineering and Analytics with Python [code]
  15. Data Pipelines with Apache Airflow

Spark

  1. Mastering Big Data Analytics with PySpark [code]
  2. PySpark Cookbook [code]
  3. Learning Spark, 2nd Edition [code] [Alternative]
  4. Spark: The Definitive Guide [code]
  5. Spark Programming in Python for Beginners with Apache Spark 3 [code]
  6. Beginning Apache Spark 3: With DataFrame, Spark SQL, Structured Streaming, and Spark Machine Learning Library
  7. Data Algorithms with Spark [code]
  8. Scaling Machine Learning with Spark
  9. Modern Data Engineering with Apache Spark: A Hands-On Guide for Building Mission-Critical Streaming Applications
  10. Apache Spark 3 Advance Skills for Cracking Job Interviews
  11. Apache Spark 3 for Data Engineering and Analytics with Python [code]
  12. Advanced Analytics with PySpark [code]
  13. Spark in Action, Second Edition [code]
  14. High Performance Spark [code]
  15. Real-Time Stream Processing Using Apache Spark 3 for Python Developers [code]

Hadoop

  1. The Ultimate Hands-On Hadoop [code]
  2. Hadoop: The Definitive Guide, 4th Edition [code]
  3. Mastering Hadoop 3 [code]
  4. Sams Teach Yourself Hadoop in 24 Hours
  5. Modern Big Data Processing with Hadoop [code]
  6. Moving Hadoop to the Cloud [code]
  7. Hadoop with Python [code]

Python

  1. Fluent Python, 2nd Edition [code]
  2. Python Crash Course, 2nd Edition
  3. Introducing Python, 2nd Edition [code]
  4. Python Workout [code]
  5. Python for Data Analysis, 3rd Edition [code]
  6. Python Distilled
  7. Python in a Nutshell, 4th Edition [code]
  8. Robust Python [code]
  9. Automate the Boring Stuff with Python, 2nd Edition
  10. Expert Python Programming - Fourth Edition [code]
  11. Python Data Science Handbook, 2nd Edition [code]
  12. Python Crash Course, 3rd Edition
  13. Python for DevOps [code]
  14. Hypermodern Python Tooling
  15. Dead Simple Python
  16. Clean Code in Python - Second Edition [code]
  17. Python for Programmers [code]
  18. Beyond the Basic Stuff with Python
  19. Python Object-Oriented Programming - Fourth Edition [code]
  20. Modern Python Cookbook - Second Edition [code]
  21. Learn Python Programming - Third Edition [code]
  22. Advanced Python Programming - Second Edition [code]
  23. Pandas for Everyone: Python Data Analysis, 2nd Edition
  24. Pandas in Action
  25. Hands-On Data Analysis with Pandas - Second Edition [code]
  26. The Pandas Workshop [code]

Useful articles

Talks

Algorithms & Data Structures

SQL

Programming

Databases

Distributed Systems

Books

Courses

Blogs

  • Martin Kleppmann, author of Designing Data-Intensive Applications
  • BaseDS by Vaidehi Joshi, about distributed systems

Tools

  • Apache Airflow is a platform to programmatically author, schedule, and monitor workflows in Python.
  • Apache Spark is a unified analytics engine for large-scale data processing.
  • Apache Kafka is a distributed streaming platform.
  • Luigi is a Python package that helps you build complex pipelines of batch jobs.
  • Dagster.io is a system for building modern data applications.
  • Prefect includes everything you need to create and run data applications.
  • Metaflow helps you build and manage real-life data science projects with ease.
  • lakeFS lets you build repeatable, atomic, and versioned data lake operations, from complex ETL jobs to data science and analytics.

Cloud Platforms

Communities

Data Engineering Jobs

Other

Newsletters & Digests