Proof of concept example - Redshift to snowflake migration
A large UK retailer asked us to investigate how their existing data platform can be optimised.
The platform was using Redshift as a data warehouse and had over 100 ELT processes running on it every day. Alongside these processes, several teams were running their daily analytics, BI and other (ad-hoc) queries to feed reports, dashboards, CRM and various insight-generating activities.
The platform was becoming too slow and unreliable for all these tenants. ELT was taking longer and longer to run, queries were waiting in queues for many minutes, and crucial daily reporting was getting in as late as 3pm as opposed to early morning when it is most needed.
Infinite Lambda engineers conducted an in-depth analysis of the situation and discovered over a dozen points of improvement and consideration.
For example, Redshift tables with billions of records had no distribution / sort keys, or when they had them, often they were not picked to best serve the joins and group-bys of the heaviest queries. The ELT processes used INSERT statements to add new records, even when there were dozens of millions of records being added to a table (a COPY command would have introduced a 10x speed increase due to parallelisation). Many tables were never being VACUUMed after inserts / deletes.
To top it all, the client had a shortage of data engineering skills and, even though our analysis was able to pinpoint exactly what processes, tables and scripts need improvement, there was not enough capacity to make these changes and then introduce the required ongoing maintenance.
Clearly, the problem stemmed from a lack of engineering capacity - there were simply not enough data engineers to maintain this complex data warehouse while developing new ELT processes and solving new business needs.
Furthermore, a large amount of frustration was building up in the business due to the platform's lack of performance. The business was becoming more and more data-driven, and the platform simply didn't have the performance and concurrency capabilities to keep up.
To solve these issues, the technical leadership team decided to work with Infinite Lambda data engineers towards prototyping a migration to the Snowflake data warehouse. Snowflake requires significantly less maintenance due to its ability to automatically pick the most efficient setup for a data table, has better vertical and horizontal scaling than Redshift, and separates compute from storage. This allows ELT processes to run on separate compute clusters to analytics, and separate analytics teams to not step on each other's toes.
Infinite Lambda engineers picked the two most complicated, data-heavy ELT pipelines the client had and worked to migrate them to Snowflake. This involved setting up Snowflake, re-shaping the SQL scripts to be Snowflake-compliant, rewriting the ELT scripts, recreating the schemas and tables and populating them with all the respective data from Redshift. In the end, we were able to demonstrate that the client will gain a 12x performance on their ELT alone, while actually reducing the cost of compute and storage and allowing reporting and analytics processes to execute much faster.
Have a similar use case? Contact us below.