AWS Announces New Analytics Capabilities to Help Customers Embrace Data at Scale
Amazon Redshift RA3 instances let customers scale compute and storage separately and deliver 3x better performance than other cloud data warehouse providers (available today)
AQUA (Advanced Query Accelerator) for Amazon Redshift provides a new hardware-accelerated cache that delivers up to 10x better query performance than other cloud data warehouse providers (available mid-2020)
Amazon Redshift Data Lake Export allows customers to export data directly from Amazon Redshift to Amazon S3 in an open data format (Apache Parquet) optimized for analytics (available today)
Amazon Redshift Federated Query lets customers analyze data across their Amazon Redshift data warehouse, Amazon Simple Storage Service (S3) data lake, and Amazon RDS and Aurora (PostgreSQL) databases (available in preview)
UltraWarm offers a new warm storage tier for Amazon Elasticsearch Service at up to one-tenth the current cost that makes it easier for customers to retain any amount of current and historical log data (available in preview)
Customers today are regularly trying to operate on petabytes and even exabytes of data. This new scale of data, along with new application requirements, means that analytics tools will have to change significantly to scale effectively. Customers want to be able to perform analytics across all of their data, regardless of the format or where the data lives, and scale their applications to support millions of users anywhere in the world. AWS provides the broadest and deepest set of analytics services of any cloud provider, and is constantly innovating based on customer needs for this new scale of data.
Amazon Redshift RA3 instances with Managed Storage allow customers to cost-effectively scale and run 3x faster than any other cloud data warehouse
As the scale of data continues to grow, reaching petabytes per week, customers are ingesting even more data into their Amazon Redshift data warehouses. To scale their data warehouse, customers use Redshift’s Elastic resize capability to add instances to their cluster. Today, Redshift’s instances include a fixed amount of compute and storage, so customers can end up over-provisioned on either and pay for capacity they don’t use. Customers have asked for the ability to grow their storage without over-provisioning compute, and for more flexibility to grow their compute capacity without increasing their storage costs.
New Amazon Redshift RA3 instances with Managed Storage (available today) allow customers to optimize their data warehouse by scaling and paying for compute and storage independently. With Amazon Redshift RA3 instances, customers choose the number of instances they need based on their data warehousing workload’s performance requirements, and only pay for the managed storage that they use. Redshift Managed Storage uses large, high-performance SSDs in each Amazon Redshift RA3 instance for fast local storage and Amazon S3 for longer-term durable storage. If the data in an instance grows beyond the size of the large local storage, Redshift Managed Storage automatically offloads that data to Amazon S3. Customers pay the same low rate for Redshift Managed Storage regardless of whether the data sits in high-performance local storage or in Amazon S3, and they pay only for the storage they actually use, so they don’t waste spend on unused storage capacity. For workloads that require a lot of storage, but not as much compute capacity, customers can automatically scale their data warehouse storage capacity without adding and paying for additional instances. Redshift Managed Storage uses a variety of advanced data management techniques to optimize how efficiently data is offloaded to and retrieved from Amazon S3. In addition, Amazon Redshift RA3 instances are built on the AWS Nitro System and feature high-bandwidth networking that further reduces the time taken for data to be offloaded to and retrieved from Amazon S3. Together, these capabilities enable Amazon Redshift RA3 instances with Managed Storage to deliver 3x the performance of any other cloud data warehouse service, and existing Amazon Redshift customers using Dense Storage (DS2) instances will get up to 2x better performance and 2x more storage capacity at the same cost.
RA3 16xlarge instances are generally available today to support workloads with petabytes of data (up to 8 PB compressed), with RA3 4xlarge instances coming early next year. To get started with Redshift RA3 instances, visit https://aws.amazon.com/redshift.
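The difference between fixed compute-plus-storage instances and independently billed compute and storage can be sketched with a small cost model. This is an illustrative sketch only: the function names, the per-node capacity, and every price below are hypothetical placeholders, not AWS pricing.

```python
import math


def coupled_nodes_needed(compute_nodes: int, storage_tb: float, tb_per_node: float) -> int:
    """With fixed-ratio instances, the node count must cover BOTH the compute
    requirement and the storage requirement, whichever is larger."""
    return max(compute_nodes, math.ceil(storage_tb / tb_per_node))


def coupled_monthly_cost(compute_nodes: int, storage_tb: float,
                         tb_per_node: float, node_price: float) -> float:
    """Cost when storage can only grow by adding whole instances."""
    return coupled_nodes_needed(compute_nodes, storage_tb, tb_per_node) * node_price


def decoupled_monthly_cost(compute_nodes: int, storage_tb: float,
                           node_price: float, storage_price_per_tb: float) -> float:
    """Cost when compute is sized for performance and storage is billed per TB used."""
    return compute_nodes * node_price + storage_tb * storage_price_per_tb


# Hypothetical workload: needs 4 nodes of compute but holds 100 TB of data.
coupled = coupled_monthly_cost(compute_nodes=4, storage_tb=100, tb_per_node=10, node_price=1000)
decoupled = decoupled_monthly_cost(compute_nodes=4, storage_tb=100,
                                   node_price=1000, storage_price_per_tb=25)
print(f"coupled: {coupled}, decoupled: {decoupled}")
```

In this made-up case the storage-heavy workload forces ten fixed-ratio nodes even though four would satisfy its compute needs, which is the over-provisioning the section describes.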
AQUA (Advanced Query Accelerator) for Amazon Redshift brings compute to the storage layer for 10x faster performance than any other cloud data warehouse
Rapid growth in the volume of data that customers need to process in their data warehouse has led to a difficult balancing act between performance and cost-effective scaling. The prevailing approach to data warehousing has been to build an architecture in which large amounts of data are moved from centralized storage to waiting compute nodes for processing. The challenge with this approach is that a lot of data moves between the shared storage and the compute nodes. As data volumes continue to grow rapidly, this data movement saturates available networking bandwidth and slows down performance. Even if the networking bottleneck can be overcome, SSD storage throughput to and from storage nodes has scaled 6x faster over the last seven years than the ability of CPUs to process data from memory. Absent some significant change, CPUs won't be able to keep up with the faster storage, and will either become a performance bottleneck themselves or create more cost as customers are forced to provision more compute to get the work done quickly.
AQUA (Advanced Query Accelerator) for Amazon Redshift (available mid-2020) is a new distributed and hardware-accelerated cache for Amazon Redshift that provides the next phase of performance improvement and innovation for analytics at the new scale of data. AQUA brings compute to the storage layer, so data doesn’t have to move back and forth between the two, enabling Redshift to run 10x faster than any other cloud data warehouse. AQUA is a big, high-speed cache architecture on top of Amazon S3 that can scale out and process data in parallel across many nodes. Each node has a hardware module composed of AWS-designed analytics processors that dramatically accelerate data compression, encryption, and data processing (including filtering and aggregation). This new architecture makes queries run so much faster than today’s cloud data warehouses that customers will be able to query raw data directly, even at scale, giving them more up-to-date dashboards, less development time, and easier-to-maintain systems. AQUA-powered Amazon Redshift will remain 100% compatible with the current version of Amazon Redshift, so customers can easily migrate existing data warehouses with no code changes. To learn more about AQUA, visit https://pages.awscloud.com/AQUA_Preview.html.
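The general idea of bringing compute to the storage layer can be illustrated with a toy model that counts how many values cross the network under each approach. This is a conceptual sketch under simplifying assumptions, not a description of how AQUA is implemented; the function names and the sample rows are hypothetical.

```python
def pull_then_process(storage_nodes, predicate):
    """Move every row to the compute layer, then filter and sum there.
    Returns (result, number of rows moved over the network)."""
    moved = [row for node in storage_nodes for row in node]
    total = sum(row["amount"] for row in moved if predicate(row))
    return total, len(moved)


def process_at_storage(storage_nodes, predicate):
    """Filter and partially aggregate on each storage node, then move only
    one partial sum per node. Returns (result, number of values moved)."""
    partials = [
        sum(row["amount"] for row in node if predicate(row))
        for node in storage_nodes
    ]
    return sum(partials), len(partials)


def is_eu(row):
    return row["region"] == "EU"


# Two hypothetical storage nodes holding five rows in total.
nodes = [
    [{"region": "EU", "amount": 10}, {"region": "US", "amount": 5}],
    [{"region": "EU", "amount": 7}, {"region": "EU", "amount": 3},
     {"region": "US", "amount": 1}],
]

pulled_total, rows_moved = pull_then_process(nodes, is_eu)
pushed_total, values_moved = process_at_storage(nodes, is_eu)
print(pulled_total, rows_moved, pushed_total, values_moved)
```

Both approaches return the same answer; the second moves one value per storage node instead of every row, and that gap widens as the data grows.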
Amazon Redshift Data Lake Export makes it easy to save query results directly to a data lake
Customers require data to be combined across their data warehouse and data lake, and don’t want data locked in silos and proprietary formats. For example, an organization may want to understand what their customer was browsing before they made a purchase, which requires them to combine the order history sitting in the data warehouse with the clickstream data sitting in an Amazon S3 data lake. Amazon Redshift enables customers to directly query and join data across both their Amazon Redshift data warehouse and Amazon S3 data lake, giving customers a ‘lake house’ approach to data warehousing. In this lake house world, where data is stored both in Amazon Redshift and Amazon S3, customers also need an easy way to get the results from Amazon Redshift queries back into Amazon S3 in an open format that can be used by other services.
Amazon Redshift Data Lake Export (available today) allows customers to export data directly from Amazon Redshift to Amazon S3 in an open data format (Apache Parquet) that is optimized for analytics. Customers can now save the results of a query they have run in Amazon Redshift into their data lakes in open formats so that they can analyze that data with other analytics services like Amazon SageMaker, Amazon Athena, and Amazon EMR. No other cloud data warehouse makes it as easy to both query data and write data back to a data lake in open formats. To get started with Amazon Redshift Data Lake Export, visit https://aws.amazon.com/redshift.
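In practice, an export of this kind is expressed with Redshift's UNLOAD statement and the FORMAT AS PARQUET option. The sketch below only assembles the SQL text; the helper name, table, bucket, and IAM role ARN are hypothetical placeholders, and the statement should be checked against the current Redshift documentation before use.

```python
def build_unload_to_parquet(query, s3_prefix, iam_role_arn, partition_by=()):
    """Assemble a Redshift UNLOAD statement that writes query results to S3
    as Apache Parquet. Single quotes inside the query are doubled, because
    UNLOAD takes the query as a quoted string literal.
    Note: values are interpolated as-is, so pass only trusted input."""
    escaped_query = query.replace("'", "''")
    parts = [
        f"UNLOAD ('{escaped_query}')",
        f"TO '{s3_prefix}'",
        f"IAM_ROLE '{iam_role_arn}'",
        "FORMAT AS PARQUET",
    ]
    if partition_by:
        parts.append("PARTITION BY (" + ", ".join(partition_by) + ")")
    return "\n".join(parts) + ";"


# Hypothetical table, bucket, and role, for illustration only.
statement = build_unload_to_parquet(
    query="SELECT order_id, order_date, total FROM orders WHERE status = 'shipped'",
    s3_prefix="s3://example-data-lake/orders/",
    iam_role_arn="arn:aws:iam::123456789012:role/ExampleRedshiftUnloadRole",
    partition_by=("order_date",),
)
print(statement)
```

The resulting Parquet files can then be read by services such as Amazon Athena or Amazon EMR without a separate conversion step.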
Amazon Redshift Federated Query allows customers to analyze data across data warehouses, data lakes, and operational databases
Aggregating, transforming, and uploading large amounts of data from a relational database to a data warehouse can be resource-intensive and time-consuming, which is why many customers choose to do so only once a day. This can create problems when customers need to query their data warehouse for certain types of timely information that is initially stored in an operational database. For example, a customer service representative helping a customer resolve an issue with a recent order might be served day-old results when they pull up the customer’s purchase history, making the information out of date. Customers can work around this problem today by writing custom application code to query the operational database directly, but building integrated systems that do this is expensive, time-consuming, and difficult to maintain.
Amazon Redshift Federated Query (available in preview) gives customers the ability to run queries in Amazon Redshift on live data across their Amazon Redshift data warehouse, their Amazon S3 data lake, and their Amazon RDS and Amazon Aurora (PostgreSQL) operational databases. This simplifies application development by allowing customers to use familiar SQL statements to combine all of this data across their various data stores. With this capability, Amazon Redshift queries can now provide timely and up-to-date data from operational databases to drive better insights and decisions. To get the best possible performance, the Redshift query optimizer intelligently distributes as much work as possible to the underlying databases. To learn more about Amazon Redshift Federated Query, visit https://aws.amazon.com/redshift.
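As a rough illustration of what this looks like in SQL, the sketch below assembles a CREATE EXTERNAL SCHEMA ... FROM POSTGRES statement and a join across warehouse and operational tables. The syntax follows the Redshift documentation as best understood here and may differ in the preview; the helper name, schema names, host, ARNs, and table columns are all hypothetical.

```python
def build_external_schema_ddl(local_schema, database, remote_schema,
                              host, port, iam_role_arn, secret_arn):
    """Assemble DDL that exposes a PostgreSQL schema (Amazon RDS or Aurora)
    inside Redshift as an external schema. Values are interpolated as-is,
    so pass only trusted input."""
    return "\n".join([
        f"CREATE EXTERNAL SCHEMA IF NOT EXISTS {local_schema}",
        "FROM POSTGRES",
        f"DATABASE '{database}' SCHEMA '{remote_schema}'",
        f"URI '{host}' PORT {port}",
        f"IAM_ROLE '{iam_role_arn}'",
        f"SECRET_ARN '{secret_arn}';",
    ])


# Hypothetical endpoint and ARNs, for illustration only.
ddl = build_external_schema_ddl(
    local_schema="ops",
    database="orders_db",
    remote_schema="public",
    host="example-aurora.cluster-abc123.us-east-1.rds.amazonaws.com",
    port=5432,
    iam_role_arn="arn:aws:iam::123456789012:role/ExampleFederatedQueryRole",
    secret_arn="arn:aws:secretsmanager:us-east-1:123456789012:secret:example-db-creds",
)

# Join warehouse history (analytics.customers) with live operational rows (ops.orders).
LIVE_ORDERS_QUERY = """
SELECT c.customer_id, c.lifetime_value, o.order_id, o.status
FROM analytics.customers AS c
JOIN ops.orders AS o ON o.customer_id = c.customer_id
WHERE o.created_at > CURRENT_DATE - 1;
"""

print(ddl)
print(LIVE_ORDERS_QUERY)
```

Once the external schema exists, the customer service scenario above becomes an ordinary SQL join rather than custom application code against the operational database.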
UltraWarm for Amazon Elasticsearch Service provides fast, interactive analytics on log data at one-tenth the cost
As more and more applications are built using microservices, containers, and purpose-built data stores, they produce an ever-increasing amount of log data. Amazon Elasticsearch Service makes it simple to collect, analyze, and visualize machine-generated log data from websites, mobile devices, and sensors. Amazon Elasticsearch Service is fully managed, so customers can deploy production-ready clusters in minutes, scale clusters up and down, and secure data at rest and in transit. However, given the explosive growth of log data, storing and analyzing months’ or years’ worth of data is cost-prohibitive at scale. This has led customers to use multiple analytics tools, or delete valuable data, missing out on important insights that the longer-term data could yield.
To solve this challenge, AWS built a new storage tier for Amazon Elasticsearch Service called UltraWarm, which gives Elasticsearch customers a warm storage tier that both stores large amounts of data cost-effectively and provides the type of snappy, interactive experience that Elasticsearch customers expect. UltraWarm offers a distributed cache for more frequently accessed data, while using advanced placement techniques to determine which blocks of data are less frequently accessed and should be moved outside of the cache to Amazon S3. UltraWarm also uses high-performance EC2 instances to interact with data stored in S3, providing 50% faster query execution versus competing warm-tier solutions, and giving customers the same interactive analytics experience with all their log data. UltraWarm reduces the cost of storing the same amount of data in Elasticsearch today by up to 90%, and costs 80% less than warm-tier storage from other managed Elasticsearch offerings. With UltraWarm, customers can manage up to 3 PB of log data with a single Amazon Elasticsearch Service cluster, and with the ability to query across multiple clusters, customers can effectively retain any amount of current and historical log data for interactive operational analysis and visualization. UltraWarm is a seamless extension of Amazon Elasticsearch Service. Customers can easily query and visualize across both their recent and longer-term operational data, all from their Kibana interface, at a fraction of today’s cost. This allows developers, DevOps engineers, and InfoSec experts to use Amazon Elasticsearch Service for the analysis of recent (weeks) and longer-term (months or years) operational data without needing to spend days restoring data from archives (Amazon S3 or Amazon Glacier) to an active searchable state in an Elasticsearch cluster. UltraWarm is available in preview today.
To learn more about UltraWarm, visit https://aws.amazon.com/elasticsearch-service/features.
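The effect of a warm tier priced at one-tenth of hot storage can be worked through with simple arithmetic. The sketch below is illustrative only: the function name, the data volumes, and the per-TB prices are hypothetical, with the warm price set to one-tenth of the hot price to mirror the "up to one-tenth the cost" figure above.

```python
def tiered_storage_cost(total_tb, hot_tb, hot_price_per_tb, warm_price_per_tb):
    """Monthly cost when only recent data stays hot and the rest moves to warm."""
    if not 0 <= hot_tb <= total_tb:
        raise ValueError("hot_tb must be between 0 and total_tb")
    warm_tb = total_tb - hot_tb
    return hot_tb * hot_price_per_tb + warm_tb * warm_price_per_tb


# Hypothetical cluster: 100 TB of logs, of which the most recent 10 TB stay hot.
HOT_PRICE = 100   # placeholder price per TB per month, not an AWS rate
WARM_PRICE = 10   # one-tenth of the hot price

all_hot = tiered_storage_cost(100, 100, HOT_PRICE, WARM_PRICE)
tiered = tiered_storage_cost(100, 10, HOT_PRICE, WARM_PRICE)
savings_pct = 100 * (all_hot - tiered) / all_hot
print(f"all hot: {all_hot}, tiered: {tiered}, savings: {savings_pct:.0f}%")
```

The overall savings approach the 90% ceiling only as the share of data kept hot shrinks, which is why the benefit is largest for long retention periods.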
“Our customers tell us they are regularly dealing with petabytes, and even exabytes of data, and their existing analytics systems can’t keep up,” said
Duolingo is the most popular language-learning platform and the most downloaded education app in the world, with more than 300 million users. The company's mission is to make education free, fun, and accessible to all. “We use Amazon Redshift to analyze the events from our app to gain insight into how users learn with Duolingo. We load billions of events each day into Amazon Redshift, have hundreds of terabytes of data, and that is expected to double every year. While we store and process all of our data, most of the analysis only uses a subset of that data,” said
Yelp’s mission is to connect people with great local businesses; to do so, data mining and efficient data analysis are important for building the best user experience. “We continue to adopt new Redshift features and are thrilled with the new RA3 instance type,” said
Intuit, makers of TurboTax, QuickBooks and Mint, is a global financial platform company designed to empower consumers, self-employed, and small businesses to improve their financial lives. “We are looking forward to exploring how AQUA can empower our team to spend more time innovating on behalf of customers,” said
Ancestry is the global leader in family history and consumer genomics, empowering journeys of personal discovery to enrich lives. “With Amazon Elasticsearch Service we collect and analyze our company’s operational logs in real time,” says
About Amazon Web Services
For 13 years, Amazon Web Services has been the world’s most comprehensive and broadly adopted cloud platform. AWS offers over 165 fully featured services for compute, storage, databases, networking, analytics, robotics, machine learning and artificial intelligence (AI),