Intuit
Complete Financial Confidence
Scaling with Databricks to Run Thousands of Data Pipelines: Design and Architecture Best Practices
Superglue Pipelines, a self-serve platform for data analysts at Intuit, uses a homegrown ETL framework called QuicKETL, a configuration-driven framework for defining and executing Spark and Presto ETL workflows. Join us to hear how we scaled our platform with Databricks to run thousands of Spark workloads.
Here are some of the architecture considerations you should take into account:
- Capacity planning: best practices for designing your cloud environment with Databricks.
- Data compatibility: data formats and commit protocols across EMR, Athena, and Databricks.
- Databricks autoscaling: how to get the most out of autoscaling, and which related configurations to tune for specific jobs.
- Default configuration: default configurations, node sizing, and instance types.
- Databricks API considerations.
- Specific issues to look out for.
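As a rough illustration of the autoscaling and node-sizing items above, the sketch below builds a cluster specification of the kind the Databricks Clusters and Jobs APIs accept. The field names (`spark_version`, `node_type_id`, `autoscale.min_workers`, `autoscale.max_workers`) come from the public Databricks REST API; the helper `build_cluster_spec` and all the values passed to it are hypothetical and are not taken from the talk.

```python
import json


def build_cluster_spec(min_workers, max_workers, node_type_id, spark_version):
    """Return a cluster spec dict with an autoscaling worker range.

    Hypothetical helper for illustration; it only assembles and validates
    the payload and does not call any Databricks endpoint.
    """
    if not 1 <= min_workers <= max_workers:
        raise ValueError("expected 1 <= min_workers <= max_workers")
    return {
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        # Databricks scales the worker count between these two bounds.
        "autoscale": {
            "min_workers": min_workers,
            "max_workers": max_workers,
        },
    }


# Illustrative values only; pick instance types and runtime versions
# that match your own workspace and workload.
spec = build_cluster_spec(
    min_workers=2,
    max_workers=8,
    node_type_id="i3.xlarge",
    spark_version="7.3.x-scala2.12",
)
print(json.dumps(spec, indent=2))
```

Keeping the bounds in one validated helper makes it easier to apply a platform-wide default and override it per job, which is one way to approach the per-job tuning mentioned above.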
What you will take away from this session:
- Things to consider when planning for a production workload in Databricks.
- Best practices while designing the architecture to leverage Databricks at scale.
- How to handle compatibility challenges between EMR/Athena and Databricks.