Sunday, June 15, 2025

Architecture patterns to optimize Amazon Redshift performance at scale

Tens of thousands of customers use Amazon Redshift as a fully managed, petabyte-scale data warehouse service in the cloud. As an organization's enterprise data grows in volume, its data analytics needs grow as well. Amazon Redshift performance needs to be optimized at scale to achieve faster, near real-time business intelligence (BI). You might also consider optimizing Amazon Redshift performance when your data analytics workloads or user base increase, or to meet a data analytics performance service level agreement (SLA). You can also look for ways to optimize Amazon Redshift data warehouse performance after you complete an online analytical processing (OLAP) migration from another system to Amazon Redshift.

In this post, we show you five Amazon Redshift architecture patterns that you can consider to optimize your Amazon Redshift data warehouse performance at scale, using features such as Amazon Redshift Serverless, Amazon Redshift data sharing, Amazon Redshift Spectrum, zero-ETL integrations, and Amazon Redshift streaming ingestion.

Use Amazon Redshift Serverless to automatically provision and scale your data warehouse capacity

To start, let's review using Amazon Redshift Serverless to automatically provision and scale your data warehouse capacity. The architecture is shown in the following diagram and consists of different components within Amazon Redshift Serverless, such as ML-based workload monitoring and automatic workload management.

Amazon Redshift Serverless architecture diagram

Amazon Redshift Serverless is a deployment model that you can use to run and scale your Redshift data warehouse without managing infrastructure. Amazon Redshift Serverless automatically provisions and scales your data warehouse capacity to deliver fast performance for even the most demanding, unpredictable, or large workloads.

Amazon Redshift Serverless measures data warehouse capacity in Redshift Processing Units (RPUs). You pay for the workloads you run in RPU-hours on a per-second basis. You can optionally configure your Base, Max RPU-Hours, and MaxRPU parameters to manage your warehouse performance and costs. This post dives deep into the cost mechanisms to consider when managing Amazon Redshift Serverless.

Amazon Redshift Serverless scaling is automatic and based on your RPU capacity. To further optimize scaling operations for large datasets, Amazon Redshift Serverless offers AI-driven scaling and optimization. It uses AI to scale automatically with workload changes across key metrics such as data volume changes, concurrent users, and query complexity, accurately meeting your price-performance targets.

There is no maintenance window in Amazon Redshift Serverless, because software version updates are applied automatically. This maintenance occurs without interruptions to any existing connections or query executions. Make sure to consult the considerations guide to better understand the operation of Amazon Redshift Serverless.

You can migrate from an existing provisioned Amazon Redshift data warehouse to Amazon Redshift Serverless by creating a snapshot of your current provisioned data warehouse and then restoring that snapshot in Amazon Redshift Serverless. Amazon Redshift automatically converts interleaved keys to compound keys when you restore a provisioned data warehouse snapshot to a serverless namespace. You can also get started with a brand new Amazon Redshift Serverless data warehouse.

Amazon Redshift Serverless use cases

You can use Amazon Redshift Serverless for:

  • Self-service analytics
  • Auto scaling for unpredictable or variable workloads
  • New applications
  • Multi-tenant applications

With Amazon Redshift, you can access and query data stored in Amazon S3 Tables, which are fully managed Apache Iceberg tables optimized for analytics workloads. Amazon Redshift also supports querying data stored in Apache Iceberg tables and other open table formats such as Apache Hudi and Linux Foundation Delta Lake. For more information, see External tables for Redshift Spectrum and Expand data access through Apache Iceberg using Delta Lake UniForm on AWS.

You can also use Amazon Redshift Serverless with Amazon Redshift data sharing, which can automatically scale your large dataset across independent datashares while maintaining workload isolation controls.

Amazon Redshift data sharing to share live data between separate Amazon Redshift data warehouses

Next, we look at an Amazon Redshift data sharing architecture pattern, shown in the following diagram, to share data between a hub Amazon Redshift data warehouse and spoke Amazon Redshift data warehouses, and to share data across multiple Amazon Redshift data warehouses with one another.

Amazon Redshift data sharing architecture patterns diagram

With Amazon Redshift data sharing, you can securely share access to live data between separate Amazon Redshift data warehouses without manually moving or copying the data. Because the data is live, all users see the most up-to-date and consistent information in Amazon Redshift as soon as it's updated, while using separate dedicated resources. Because the compute accessing the data is isolated, you can size the data warehouse configurations to individual workload price-performance requirements rather than the aggregate of all workloads. This also provides additional flexibility to scale with new workloads without affecting the workloads already running on Amazon Redshift.

A datashare is the unit of sharing data in Amazon Redshift. A producer data warehouse administrator can create datashares and add datashare objects to share data with other data warehouses, referred to as outbound shares. A consumer data warehouse administrator can receive datashares from other data warehouses, referred to as inbound shares.

To get started, a producer data warehouse administrator adds all objects (and the relevant permissions) that need to be accessed by another data warehouse to a datashare, and shares that datashare with a consumer. After the consumer creates a database from the datashare, the shared objects can be accessed using three-part notation consumer_database_name.schema_name.table_name on the consumer, using the consumer's compute.
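The following is a minimal SQL sketch of that flow. The datashare name sales_share, the public.sales table, the sales_db consumer database, and the namespace GUIDs are all hypothetical; substitute your own object names and the namespace IDs of your warehouses.

```sql
-- On the producer: create the datashare and add objects to it
CREATE DATASHARE sales_share;
ALTER DATASHARE sales_share ADD SCHEMA public;
ALTER DATASHARE sales_share ADD TABLE public.sales;

-- Grant the datashare to the consumer's namespace (hypothetical GUID)
GRANT USAGE ON DATASHARE sales_share
  TO NAMESPACE 'a1b2c3d4-5678-90ab-cdef-EXAMPLE11111';

-- On the consumer: create a database from the datashare,
-- referencing the producer's namespace (hypothetical GUID)
CREATE DATABASE sales_db FROM DATASHARE sales_share
  OF NAMESPACE 'f0e9d8c7-6543-21ba-fedc-EXAMPLE22222';

-- Query the live shared data with three-part notation,
-- using the consumer's own compute
SELECT COUNT(*) FROM sales_db.public.sales;
```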

Amazon Redshift data sharing use cases

Amazon Redshift data sharing, along with multi-warehouse writes in Amazon Redshift, can be used to:

  • Support different kinds of business-critical workloads, including workload isolation and chargeback for individual workloads.
  • Enable cross-group collaboration across teams for broader analytics, data science, and cross-product impact analysis.
  • Deliver data as a service.
  • Share data between environments such as development, test, and production to improve team agility by sharing data at different levels of granularity.
  • License access to data in Amazon Redshift by listing Amazon Redshift data sets in the AWS Data Exchange catalog so that customers can find, subscribe to, and query the data in minutes.
  • Update business source data on the producer. You can share data as a service across your organization, and consumers can then also perform actions on the source data.
  • Insert additional records on the producer. Consumers can add records to the original source data.

Several AWS blog posts provide examples of how you can use Amazon Redshift data sharing to scale performance.

Amazon Redshift Spectrum to query data in Amazon S3

You can use Amazon Redshift Spectrum to query data in Amazon S3, as shown in the following diagram, using the AWS Glue Data Catalog.

Amazon Redshift Spectrum architecture diagram

You can use Amazon Redshift Spectrum to efficiently query and retrieve structured and semi-structured data from files in Amazon S3 without having to load the data into Amazon Redshift tables. Using the massively parallel scale of the Amazon Redshift Spectrum layer, you can run large, fast, parallel queries against large datasets while most of the data remains in Amazon S3. This can significantly improve the performance and cost-effectiveness of large analytics workloads, because you can use the scalable storage of Amazon S3 to hold large volumes of data while still benefiting from the powerful query processing capabilities of Amazon Redshift.

Amazon Redshift Spectrum uses separate infrastructure independent of your Amazon Redshift data warehouse, offloading many compute-intensive tasks such as predicate filtering and aggregation. This means these queries can use significantly less of your data warehouse's processing capacity than other queries. Amazon Redshift Spectrum can also automatically scale to potentially thousands of instances, based on the demands of your queries.

When implementing Amazon Redshift Spectrum, make sure to consult the considerations guide, which details how to configure your networking, external table creation, and permissions requirements.

Review the best practices guide and related blog posts, which outline recommendations on how to optimize performance, including the impact of different file types, how to design around the scaling behavior, and how to efficiently partition files. You can see an example architecture in Accelerate self-service analytics with Amazon Redshift Query Editor V2.

To get started with Amazon Redshift Spectrum, you define the structure of your files and register them as an external table in an external data catalog (AWS Glue, Amazon Athena, and Apache Hive metastore are supported). After creating your external table, you can query your data in Amazon S3 directly from Amazon Redshift, as shown in the sketch below.
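As a minimal sketch, assuming a hypothetical AWS Glue database spectrum_db, an IAM role with access to Amazon S3 and the Data Catalog, and CSV files partitioned by sale date under s3://my-bucket/sales/ (all names and ARNs are illustrative):

```sql
-- Register the AWS Glue Data Catalog database as an external schema
CREATE EXTERNAL SCHEMA spectrum_schema
FROM DATA CATALOG
DATABASE 'spectrum_db'
IAM_ROLE 'arn:aws:iam::123456789012:role/MySpectrumRole'
CREATE EXTERNAL DATABASE IF NOT EXISTS;

-- Define the structure of the files in Amazon S3 as an external table
CREATE EXTERNAL TABLE spectrum_schema.sales (
  sale_id BIGINT,
  item_id INT,
  amount  DECIMAL(10, 2)
)
PARTITIONED BY (sale_date DATE)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 's3://my-bucket/sales/';

-- Register a partition (new partitions are added as data lands)
ALTER TABLE spectrum_schema.sales
ADD PARTITION (sale_date = '2025-06-01')
LOCATION 's3://my-bucket/sales/sale_date=2025-06-01/';

-- A selective query that benefits from partition pruning
-- and predicate pushdown in the Spectrum layer
SELECT item_id, SUM(amount)
FROM spectrum_schema.sales
WHERE sale_date = '2025-06-01'
GROUP BY item_id;
```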

Amazon Redshift Spectrum use cases

You can use Amazon Redshift Spectrum in the following use cases:

  • High-volume but less frequently accessed data; build a lake house architecture to query exabytes of data in an S3 data lake
  • Heavy scan- and aggregation-intensive queries
  • Selective queries that can use partition pruning and predicate pushdown, so the output is fairly small

Zero-ETL to unify all data and achieve near real-time analytics

You can use zero-ETL integration with Amazon Redshift to integrate with your transactional databases, such as Amazon Aurora MySQL-Compatible Edition, so you can run near real-time analytics in Amazon Redshift, BI in Amazon QuickSight, or machine learning workloads in Amazon SageMaker AI, as shown in the following diagram.

Zero-ETL integration with Amazon Redshift architecture diagram

Zero-ETL integration with Amazon Redshift removes the undifferentiated heavy lifting of building and managing complex extract, transform, and load (ETL) data pipelines; unifies data across databases, data lakes, and data warehouses; and makes data available in Amazon Redshift in near real time for analytics, artificial intelligence (AI), and machine learning (ML) workloads.

Amazon Redshift currently supports zero-ETL integrations from several sources, including Amazon Aurora MySQL-Compatible Edition, Amazon Aurora PostgreSQL-Compatible Edition, Amazon RDS for MySQL, and Amazon DynamoDB.

To create a zero-ETL integration, you specify an integration source, such as an Amazon Aurora DB cluster, and an Amazon Redshift data warehouse as the target, either an Amazon Redshift Serverless workgroup or a provisioned data warehouse (including Multi-AZ deployments on RA3 clusters, which automatically recover from infrastructure or Availability Zone failures and help make sure that your workloads remain uninterrupted). The integration replicates data from the source to the target and makes data available in the target data warehouse within seconds. The integration also monitors the health of the integration pipeline and recovers from issues when possible.
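You create the integration itself from the console, AWS CLI, or SDKs. On the Amazon Redshift side, a minimal sketch of consuming it looks like the following, assuming a hypothetical integration ID and a hypothetical replicated demo_schema.orders table (the actual ID comes from the integration you created):

```sql
-- Create a destination database from the zero-ETL integration;
-- replicated source tables then appear inside this database
CREATE DATABASE aurora_zeroetl_db
FROM INTEGRATION 'a1b2c3d4-5678-90ab-cdef-EXAMPLE11111';

-- Query replicated data within seconds of it changing at the source
SELECT order_status, COUNT(*)
FROM aurora_zeroetl_db.demo_schema.orders
GROUP BY order_status;
```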

Make sure to review the considerations, limitations, and quotas for both the data source and the target when using zero-ETL integrations with Amazon Redshift.

Zero-ETL integration use cases

You can use zero-ETL integration with Amazon Redshift as an architecture pattern to boost analytical query performance at scale, enabling an easy and secure way to run near real-time analytics on petabytes of transactional data with continuous change data capture (CDC). Plus, you can use other Amazon Redshift capabilities such as built-in machine learning, materialized views, data sharing, and federated access to multiple data stores and data lakes. You can see more zero-ETL integration use cases at What is ETL.

Ingest streaming data into the Amazon Redshift data warehouse for near real-time analytics

You can ingest streaming data with Amazon Kinesis Data Streams or Amazon Managed Streaming for Apache Kafka (Amazon MSK) into Amazon Redshift and run near real-time analytics in Amazon Redshift, as shown in the following diagram.

Amazon Redshift data streaming architecture diagram

Amazon Redshift streaming ingestion provides low-latency, high-speed data ingestion directly from Amazon Kinesis Data Streams or Amazon MSK to an Amazon Redshift provisioned or Amazon Redshift Serverless data warehouse, without staging data in Amazon S3. You can connect to and access the data from the stream using standard SQL, and simplify data pipelines by creating materialized views in Amazon Redshift on top of the data stream. For best practices, review the related AWS blog posts on streaming ingestion.

To get started with Amazon Redshift streaming ingestion, you create an external schema that maps to the streaming data source and create a materialized view that references the external schema, as shown in the sketch below. For details on how to set up Amazon Redshift streaming ingestion for Amazon Kinesis Data Streams, see Getting started with streaming ingestion from Amazon Kinesis Data Streams. For details on how to set up Amazon Redshift streaming ingestion for Amazon MSK, see Getting started with streaming ingestion from Apache Kafka sources.
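As a minimal sketch for Kinesis Data Streams, assuming a hypothetical stream named my-clickstream carrying JSON payloads and an IAM role that can read the stream (names and the ARN are illustrative):

```sql
-- Map the Kinesis source to an external schema
CREATE EXTERNAL SCHEMA kds
FROM KINESIS
IAM_ROLE 'arn:aws:iam::123456789012:role/MyStreamingRole';

-- Materialize the stream; AUTO REFRESH keeps ingesting new records
CREATE MATERIALIZED VIEW clickstream_mv AUTO REFRESH YES AS
SELECT
  approximate_arrival_timestamp,
  partition_key,
  shard_id,
  sequence_number,
  JSON_PARSE(FROM_VARBYTE(kinesis_data, 'utf-8')) AS payload
FROM kds."my-clickstream";

-- Query near real-time data with standard SQL
SELECT COUNT(*)
FROM clickstream_mv
WHERE approximate_arrival_timestamp > DATEADD(minute, -5, GETDATE());
```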

Amazon Redshift streaming ingestion use cases

You can use Amazon Redshift streaming ingestion to:

  • Improve the gaming experience by analyzing real-time data from gamers
  • Analyze real-time IoT data and use machine learning (ML) within Amazon Redshift to improve operations, predict customer churn, and grow your business
  • Analyze clickstream user data
  • Conduct real-time troubleshooting by analyzing streaming data from log files
  • Perform near real-time retail analytics on streaming point of sale (POS) data

Other Amazon Redshift features to optimize performance

There are other Amazon Redshift features that you can use to optimize performance.

  • You can resize Amazon Redshift provisioned clusters to optimize data warehouse compute and storage use.
  • You can use concurrency scaling, where Amazon Redshift automatically adds additional capacity to process increases in both read operations, such as dashboard queries, and write operations, such as data ingestion and processing.
  • You can also consider materialized views in Amazon Redshift, applicable to both provisioned and serverless data warehouses, which contain a precomputed result set based on a SQL query over one or more base tables. They are especially useful for speeding up queries that are predictable and repeated.
  • You can use auto-copy for Amazon Redshift to set up continuous file ingestion from your Amazon S3 prefix and automatically load new files to tables in your Amazon Redshift data warehouse without the need for additional tools or custom solutions (a sketch of both of these last two features follows this list).
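The following is a minimal sketch of the last two items, assuming a hypothetical public.sales table, an S3 staging prefix, and an IAM role with read access to the bucket (names and the ARN are illustrative):

```sql
-- Materialized view: precompute a repeated aggregation over a base table
CREATE MATERIALIZED VIEW mv_daily_sales AUTO REFRESH YES AS
SELECT sale_date, SUM(amount) AS total_amount
FROM public.sales
GROUP BY sale_date;

-- Auto-copy: a COPY job continuously loads new files from the S3 prefix
COPY public.sales
FROM 's3://my-bucket/sales-staging/'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyLoadRole'
FORMAT AS CSV
JOB CREATE sales_copy_job
AUTO ON;
```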

Cloud security at AWS is the highest priority. Amazon Redshift offers broad security-related configurations and controls to help make sure your information is appropriately protected. See Amazon Redshift Security Best Practices for a comprehensive guide to Amazon Redshift security.

Conclusion

In this post, we reviewed Amazon Redshift architecture patterns and features that you can use to help scale your data warehouse to dynamically accommodate different workload combinations, volumes, and data sources to achieve optimal price performance. You can use them alone or together, choosing the best infrastructure setup for your use case requirements, and scale to accommodate any future growth.

Get started with these Amazon Redshift architecture patterns and features today by following the instructions provided in each section. If you have questions or suggestions, leave a comment below.


About the authors

Eddie Yao is a Principal Technical Account Manager (TAM) at AWS. He helps enterprise customers build scalable, high-performance cloud applications and optimize cloud operations. With over a decade of experience in web application engineering, digital solutions, and cloud architecture, Eddie currently focuses on the Media & Entertainment (M&E) and Sports industries and on AI/ML and generative AI.

Julia Beck is an Analytics Specialist Solutions Architect at AWS. She supports customers in validating analytics solutions by architecting proof of concept workloads designed to meet their specific needs.

Scott St. Martin is a Solutions Architect at AWS who is passionate about helping customers build modern applications. Scott uses his decade of experience in the cloud to guide organizations in adopting best practices around operational excellence and reliability, with a focus on the manufacturing and financial services areas. Outside of work, Scott enjoys traveling, spending time with family, and playing piano.
