Serverless wasn’t the only new feature announced last week. AWS also announced the preview of automated materialized views, which treat the creation of these views much as cost-based query optimizers treat query plans: Redshift automatically generates the views based on data hot spots. Nonetheless, serverless grabbed the limelight.
When AWS launched Redshift back in 2013, it was one of the first cloud data warehousing services. Starting with technology acquired from ParAccel, AWS both profited from and paid the price for being among the first to market. That early entry, along with the portfolio of other AWS analytics services, enabled Redshift to carve out a large customer base that today numbers in the tens of thousands. AWS forked the acquired ParAccel technology, but from the get-go, Redshift followed a conventional data warehousing architecture with locally attached storage. By contrast, Google Cloud BigQuery, launched back in 2010, pioneered the cloud-native data warehouse. Nonetheless, it was the launch of Snowflake in 2014 that put the elastic cloud data warehouse on the map.
Incidentally, in our post last spring, we put serverless on our wish list for what we wanted to see next. Once in a blue moon, we get it right.
Let’s make a couple of disclaimers. First of all, don’t confuse data sharing with federated queries. Redshift can run remote queries against data sitting in RDS and Aurora databases for MySQL and PostgreSQL and, via Redshift Spectrum, against data in EMR and S3. But that’s quite similar to what Google already offers with BigQuery. Secondly, don’t assume that AWS is abandoning provisioned instances; it will keep offering them for Redshift because some customers prefer predictable, level billing. Google learned that lesson when it later introduced flat-rate slots for BigQuery.
With a cloud-native architecture and serverless support, AWS has opportunities to score some firsts. For instance, it could move more analytic and AI processing in-database. But in-database machine learning has already become table stakes for cloud data warehouses.
AWS already does so with Redshift ML, where you can use SQL commands to trigger model development in SageMaker, then bring the models back in-database as user-defined functions (UDFs) to run inference. Google also provides in-database ML for BigQuery, but it is limited to specific, curated models, while Microsoft lets you run ML models within Azure Synapse Spark pools. And with Snowpark, you can use non-SQL languages to push down processing, such as ML models, as UDFs directly into the Snowflake database.
Our wish list is to bring Spark directly into Redshift. Today, you’d have to fire up a separate EMR cluster to run Spark (though at least now, it can be run serverless as well). Of course, nothing is preventing AWS from breaking out Spark as a separate serverless service, just as Google Cloud recently did. But today, Azure Synapse Analytics lets you run a curated (subset) version of Spark in-database without firing up a separate cluster; we’d like to see AWS follow through.
But let’s not stop there. Serverless also provides the opportunity to fire up workloads with third-party tools, especially BI reporting and visualization. Redshift currently integrates with AWS’s own QuickSight and with popular tools like Tableau, but you have to move data and process it in separate clusters.
So let’s cut to the chase. We’d love to see AWS add a “Redshift-native” mode for third parties willing to run capabilities like ELT or visualization as containerized microservices directly inside Redshift RA3 compute nodes, or whatever next-generation nodes come out in future years. By comparison, Snowflake provides common APIs for third parties to access Snowflake data, but the data is processed in separate clusters. Imagine running an ELT service from Informatica or Fivetran as a microservice in a Redshift compute node. AWS could then promote Redshift as the cheapest, fastest data warehouse in the cloud.
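To make the Redshift ML workflow mentioned above a bit more concrete, here is a minimal Python sketch that composes the two SQL statements involved: a CREATE MODEL statement, which hands training off to SageMaker, and a query that calls the resulting prediction function like any other SQL function. The helper names (ModelSpec, create_model_sql, inference_sql), the table, columns, function name, and S3 bucket are all hypothetical. The clauses follow the basic CREATE MODEL form in AWS’s Redshift ML documentation, but treat this as an illustration and check the current docs before relying on it.

```python
"""Minimal sketch of the two SQL statements behind the Redshift ML workflow.

All table, column, function, and bucket names are hypothetical placeholders.
"""

from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ModelSpec:
    model_name: str      # name of the Redshift ML model object (hypothetical)
    training_query: str  # SELECT returning the feature columns plus the target
    target_column: str   # column the model learns to predict
    function_name: str   # SQL function Redshift creates for in-database inference
    iam_role: str        # IAM role ARN, or the literal "default"
    s3_bucket: str       # bucket used to stage training data and artifacts


def create_model_sql(spec: ModelSpec) -> str:
    """Compose a basic CREATE MODEL statement.

    Running this statement in Redshift exports the training data to S3 and
    hands model training off to SageMaker. No quoting or escaping is done
    here, so only pass trusted, well-formed values.
    """
    role = "default" if spec.iam_role == "default" else f"'{spec.iam_role}'"
    return (
        f"CREATE MODEL {spec.model_name}\n"
        f"FROM ({spec.training_query})\n"
        f"TARGET {spec.target_column}\n"
        f"FUNCTION {spec.function_name}\n"
        f"IAM_ROLE {role}\n"
        f"SETTINGS (S3_BUCKET '{spec.s3_bucket}');"
    )


def inference_sql(spec: ModelSpec, features: Sequence[str], table: str) -> str:
    """Compose a query that calls the model's prediction function like a UDF."""
    cols = ", ".join(features)
    return f"SELECT {cols}, {spec.function_name}({cols}) AS prediction FROM {table};"


if __name__ == "__main__":
    churn = ModelSpec(
        model_name="customer_churn_model",
        training_query=(
            "SELECT age, monthly_spend, support_calls, churned "
            "FROM customer_activity"
        ),
        target_column="churned",
        function_name="predict_churn",
        iam_role="default",
        s3_bucket="my-redshift-ml-bucket",
    )
    print(create_model_sql(churn))
    print(inference_sql(churn, ["age", "monthly_spend", "support_calls"], "customer_activity"))
```

Once training finishes, the generated function (predict_churn in this sketch) is called from ordinary SQL, which is what lets the model score data inside the warehouse rather than in a separate cluster. How the statements get submitted is left open; any Redshift SQL client would do.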
Disclosure: AWS and Google Cloud are dbInsight clients.