6 Things You Need to Know Before Using Snowflake Data Warehouse
Snowflake is a leading cloud-based “software as a service” data warehousing solution. Because it was built for the cloud, there’s no hardware or software that you need to purchase and maintain.
Snowflake’s unique architecture and easy administration make it a workhorse for modern businesses. Serving as your organization’s “do it all” data lake and data warehouse, Snowflake can compress and compute large data sets for complex queries in a flash. Best yet, their “pay-for-what-you-use” model makes it a manageable solution for tight organizational budgets.
intro to snowflake
So how does Snowflake work, and what makes it unique? Here, we dive into six key characteristics of Snowflake that business leaders should know before implementing a data warehouse.
Separation of Compute and Storage
This is perhaps the biggest concept to understand when evaluating what Snowflake brings to the market and potentially your organization.
Snowflake consists of three separate layers:
- Cloud Services (access, security, optimizer)
- Compute (query processing)
- Data Storage
Each layer runs on separate resources but is accessible to the others. As such, each layer easily scales up or down independently to meet your performance and cost requirements as they change over time.
Traditional data warehouse systems are limited to a single storage layer, accessed by multiple nodes, and are often subject to bottlenecks and resource constraints. They must also be sized upfront based on the heaviest anticipated workload. This makes them traditionally more expensive, as most organizations do not have a clear understanding of their compute needs and end up paying for unneeded bandwidth. Conversely, Snowflake scales up or down at any time based on need, and you only pay for what you process.
By separating compute and storage, a Snowflake data warehouse enables organizations to pay for only the features they need while still offering robust scalability. Other technologies, such as Amazon Redshift, bundle these features together and offer significantly less flexibility.
Additionally, Snowflake handles tasks such as data compression automatically whereas other technologies require more manual administrative work to manage these tasks.
Snowflake refers to their compute layer as “virtual warehouses.” These are made of clusters of computing resources that include CPU, memory, and temporary storage. This middle layer is where you run your data processing. It is very flexible, as we will demonstrate later in this article.
Virtual warehouses can scale both vertically and horizontally. You can think of scaling vertically as increasing the size of the computer you are using for extra power to run a particular query. Meaning, you are able to run that query faster based on breadth. On the other side, scaling horizontally is like adding new sets of computers to be able to process multiple queries at the same time, adding depth of resources.
What’s great about Snowflake’s model is that both these functions turn on and off as needed to accommodate both your budget and workload needs.
See how this helps your total cost of ownership
Similar to other data technologies, a Snowflake “database” refers to a group of schemas, tables, and views. With a Snowflake data warehouse, a single database is accessible by any number of your virtual warehouses. This means that your organization could have a large ETL (extract, transform, load) warehouse that transforms your data, and then a separate reporting warehouse that creates reports and dashboards from the same data. The advantage here comes back to Snowflake’s separation of storage and compute. Leveraging the same database for multiple virtual data warehouses allows you to separate cost and usage by function.
Warehouse Scalability and Auto-Suspension
For many organizations, Snowflake’s main selling point is the cost savings. These cost savings come from the ability to scale up or down at sub-second speeds. With Snowflake’s automatic suspend feature, you will only billed for compute resources when you’re actively running queries. The remainder of the time, the data warehouse will be in idle mode, saving you money.
Take, for example, a company running a very large, complex daily ETL process. To quickly perform the transformations in under an hour, their Snowflake warehouse size is set to X-Large. At 16 credits per hour, this warehouse would cost 384 credits a day (see pricing section below for more details). With auto-suspension enabled, the cluster will turn off after five minutes of inactivity. This would reduce the daily credit cost from 384 credits to just 16.
Now you may be wondering, “What about report and analytical queries run throughout the day?”
In this example, let’s say that for the desired performance, reports and ad-hoc analytics can achieve reasonable performance with a Medium-size warehouse, or 4 credits per hour. To run this analytics warehouse during business hours would cost 40 credits per day. Again, by separating different levels of compute and usage, you can see significant cost savings. In this case, 56 credits per day compared to the original 384.
To further this idea, let’s say that report and analytical queries run, on average, once every five minutes during business hours with large periods of inactivity. With Snowflake’s auto-suspension set as low as one minute, there would be even more cost savings to the organization.
Snowflake provides a powerful user interface called a “worksheet”. This queries data and performs DDL (data definition language) and DML (data manipulation language) tasks. Some of Snowflake’s other features include:
- A user-friendly interface for querying data directly from the website
- A native tool
- An overview of database objects, data preview, execution results, and execution time breakdown (by compiling, executing, and data transfer views)
- Ability to quickly and easily change role, warehouse, database, and schema depending on needs
In contrast to other cloud data platforms that often have hidden costs and unclear usage quotas, Snowflake offers simple, upfront pricing based on just two items: data storage and compute resources.
Data storage prices range from $25-$40/TB per month, and it accrues daily. Compute usage is billed per-second, based on processing units – or, what Snowflake calls “credits”. The number of credits used is based on the size of the warehouse selected, as explained in our examples above. This provides organizations with control over their budget. Any constraints can be easily managed while also maximizing data warehouse performance. Snowflake requires a 60-second minimum when a warehouse is started, but no additional costs will be incurred while the warehouse is idle and not in use.
As of this writing, the cost per-second per-credit for Snowflake’s Standard edition is $0.00056. This means that the cost for a query to run for one minute on an X-Small Snowflake warehouse would be about $0.03.
What is the ROI of a Snowflake Data Warehouse?
When determining which warehouse size to use, Snowflake recommends starting with the smallest. From there, you can work your way up through larger sizes until you reach the optimum size for your performance needs and budget. In our testing, we found that even an X-Small warehouse performed well against a traditional SQL Server-based system. When working with our clients on their Snowflake data warehouse needs, many do not feel the need to upgrade in size once basic needs and data structure have been met.
summing up Snowflake
Snowflake offers some very real performance and cost-saving advantages over older, legacy systems. Its cloud data platform is worth considering if your organization is serious about tackling your data chaos and data processing pain points. Snowflake has created a product that gives users elasticity and control. It removes the headaches and limitations that often come with maintaining on-premises data systems. When implemented with a strong data strategy and focus on data structure, it can be a game-changer for your organization.
Eide Bailly’s approach to data warehousing