![]() However at Equinox there are many membership actions every day. Next, we set up the dblink handshake to Redshift with the following SQL:Įxpand Collapse Adding Change-Data Capture for Large Tablesįor a small table, using a materialized view like the one above will work fine. Our new m4.xlarge RDS instance was placed in the same availability zone as Redshift with security group rules that allow traffic from Redshift to Postgres. With a Redshift instance already up and running, the first step was to create an RDS Postgres (9.5+) instance (no reason to leave the AWS ecosystem). * Named after the bodily version of Iron Man’s computer Implementation Setting up Postgres Outlined below are the steps we took to implement this solution within our data infrastructure in a way that serves our current needs and can also scale to meet future ones. We quickly realized this feature was an elegant solution to our sales reporting problem and thus project Vision* at Equinox was born. Given Redshift’s compatibility with Postgres, using dblink could allow for querying Redshift data on a Postgres database natively with minimal ETL work. It presented the Postgres module dblink that supports connections to other Postgres-compatible databases. Luckily in our research we found a post by Tony Gibbs on the AWS Big Data Blog titled JOIN Amazon Redshift AND Amazon RDS PostgreSQL WITH dblink. While Redis would have provided many benefits-like extremely fast performance for retrieval of data-it would also require designing and maintaining a custom ETL solution to map data from a relational structure to Redis’ key-value model, as well as a reporting solution to consume data in this format. The first proposed solution involved using Redis as a cache layer to hold commonly used data. ![]() As great as it is for performing computations on data, Redshift is not an ideal solution for making that data available to downstream consumers. But when switching to Redshift, we were mindful of its limitation in this regard. With our legacy warehouse backed by SQL server, traffic of this sort was not as immediate an issue. *There is concurrency scaling feature in preview, but that is triggered by wait time in your query queue by WLM settings, so not very helpful for small transactions. Given Redshift’s concurrency limitations*, this potential load carries the risk of overwhelming the system and drastically degrading performance (or forcing us to overprovision resources). And every time another user views or refreshes a sales report, a query is dispatched to be executed on Redshift to pull the latest data. member_contract_idĭuring peak hours, there can be hundreds of internal users opening sales reports at one time. One of the most widely-consumed metrics at Equinox is membership sales data, which is derived from what we call contract actions. Once landed, transaction blocks of SQL queries are executed on the raw data to transform it into the various fact and dimension tables that serve as the home for all metrics we keep about our users and business. To perform this ELT we use a mix of home-grown tooling and the proprietary Informatica Cloud service. The general process involves first landing data from various sources like production databases and our Salesforce instance into our Redshift warehouse (nicknamed Jarvis after Iron Man’s computer). On the data team, we use Redshift’s storage and compute capabilities extensively to support our analytic workloads. Nevertheless, the common ancestry of the two technologies does allow for some interoperability between them, one example of which will be explored in this article.įirst, some quick background about our usage of Redshift here at Equinox, and a problem we were able to solve with it. “ Based on” is very general phrasing and the truth is that Redshift’s being based on Postgres has little practical significance. It should come as a little bit surprising, then, that an oft-mentioned fact states that Redshift is “based on” Postgres version 8.0.2. One has a vertical data storage model, the other horizontal. One is optimized for analytic workloads (Redshift) the other performs better when requests come as frequent, small transactions (Postgres). On the surface, it would appear there is not much similar between Redshift-AWS’s cloud data warehouse service launched in 2012-and Postgres-one of the most popular open source databases first introduced in 1989.
0 Comments
Leave a Reply. |
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |