Snowflake uses Python to take on Teradata, Google BigQuery and AWS Redshift


Cloud-based data warehouse company Snowflake on Tuesday showed off a new set of tools and integrations at its annual Snowflake Summit, aimed at taking on rival companies like Teradata and services like Google BigQuery and Amazon Redshift.

The new features, which include data access tools and support for Python on the company’s Snowpark application development system, are aimed at data scientists, data engineers and developers, with the goal of accelerating their machine learning work and, in turn, application development.

Snowpark, launched a year ago, is a dataframe-style development environment designed to allow developers to deploy their preferred tools in a serverless manner on Snowflake’s virtual warehouse compute engine. Python support is in public preview.

“Python is probably the most requested feature from our customers,” said Christian Kleinerman, senior vice president of product at Snowflake.

The demand for Python makes sense, as it is a language of choice for data scientists, analysts say.

“Snowflake is catching up on this front as rivals such as Teradata, Google BigQuery and Vertica already have Python support,” said Doug Henschen, principal analyst at Constellation Research.

In one of the updates announced at the summit, the company said it was adding a Streamlit integration for app development and iteration. Streamlit, an open source Python application framework that helps machine learning engineering and data science teams visualize, modify and share data, was acquired by Snowflake in March.

The integration will allow users to remain within the Snowflake environment not only to access, secure and govern data, but also to develop data science applications to model and analyze data, said Tony Baer, principal analyst at dbInsight.

Snowflake launches Python-related integrations

Other Python-related integrations include Snowflake Worksheets for Python, Large Memory Warehouses, and SQL Machine Learning.

Snowflake Worksheets for Python, which is in private preview, is designed to enable companies to develop pipelines, machine learning models and applications in the company’s web interface, dubbed Snowsight, the company said, adding that it has capabilities such as code completion and custom logic generation.

In order to help data scientists and development teams perform memory-intensive operations such as feature engineering and model training on large datasets, the company said it is working on a feature called Large Memory Warehouses.

Currently in the development phase, Large Memory Warehouses will provide support for Python libraries through integration with the Anaconda data science platform, the company added.

“Several rivals offer configurable options to support large-memory warehouses as well as Python functions and language support, so this is Snowflake keeping up with market demands,” Henschen said.

Snowflake is also offering SQL Machine Learning, starting with time series data, in private preview. The service will help companies integrate machine learning-based predictions and analytics into business intelligence applications and dashboards, the company said.

According to Henschen, many analytical database vendors have built machine learning models that run inside the database.

“Snowflake’s rationale for starting with the analysis of time series data is [that it is] among the most popular machine learning analyses, because it is about predicting future values based on previously observed values,” Henschen said, adding that time series analysis has many use cases in the financial sector.
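
The idea Henschen describes, predicting future values from previously observed ones, can be illustrated with a minimal sketch. This is not Snowflake’s SQL Machine Learning service; it is a plain-Python moving-average forecast, and the function name, the window size and the price figures are all hypothetical, chosen only for illustration.

```python
from statistics import mean


def moving_average_forecast(history, window=3):
    """Forecast the next value as the mean of the last `window` observations.

    Hypothetical helper for illustration; real forecasting models are
    considerably more sophisticated than a simple moving average.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    if len(history) < window:
        raise ValueError("need at least `window` observations")
    return mean(history[-window:])


# Invented daily closing prices, not real market data.
prices = [100.0, 102.0, 101.0, 105.0, 107.0]

# The forecast for the next day averages the three most recent prices.
forecast = moving_average_forecast(prices, window=3)
print(f"Next-day forecast: {forecast:.2f}")
```

A financial-sector use case of the kind Henschen mentions would follow the same pattern, with past observations going in and an estimate of the next value coming out, only with a far richer model.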

Snowflake updates enable increased access to data

On the premise that faster data access could lead to faster application development, Snowflake on Tuesday also introduced new features including support for streaming data, Apache Iceberg tables in Snowflake, and external tables for on-premises storage.

Support for streaming data, which is in private preview, will help break down the boundaries between streaming and batch pipelines with Snowpipe Streaming. Snowpipe is the company’s streaming data ingestion service.

The rationale for launching the feature, according to Henschen, is the strong interest in supporting low-latency options, including near-real-time and real-time streaming, and the fact that most vendors in this market have already ticked the streaming box.

“This feature gives engineering teams an integrated way to analyze the stream alongside historical data, so data engineers don’t have to tinker with something themselves. It’s a time saver,” Henschen said.

In order to meet the demand for more open-source table formats, the company said it is developing Apache Iceberg Tables to run in its environment.

“Apache Iceberg is a very popular open-source table format and it is rapidly gaining traction for analytical data platforms. Table formats like Iceberg provide metadata that helps with performance consistency and scalability. Iceberg was also recently adopted by Google for its BigLake offering,” Henschen said.

Meanwhile, in an effort to keep its on-premises customers engaged while trying to get them to adopt its cloud data platform, Snowflake is introducing external tables for on-premises storage. Currently in private preview, the tool allows users to access their data in on-premises storage systems from companies such as Dell Technologies and Pure Storage, the company said.

“Snowflake had a ‘cloud only’ policy for a while, so they clearly had large, important customers who wanted a way to analyze data on-premises without moving everything into Snowflake,” Henschen said.

Additionally, Henschen said competitors such as Teradata, Vertica, and Yellowbrick offer on-premises deployment as well as hybrid and multicloud deployment.

Copyright © 2022 IDG Communications, Inc.

