NASDAQ, the American stock exchange, was grappling with a major data challenge in 2014. The company was struggling to process the high-volume data sets that arrived every evening at market close.
The company’s data intelligence team spotted the problem quickly: the stock exchange needed a better data warehouse, one that could operate at massive scale and process billions of records overnight. Budget and operational efficiency were key factors too, and the company made the call to move from an on-premises data warehouse to a cloud-based solution from AWS.
In 2014, the exchange migrated its data warehouse from legacy, on-premises infrastructure to an Amazon Redshift cluster on Amazon Web Services (AWS). Today, the company ingests 70 billion records a day, with peaks of 113 billion, using the lake house architecture it adopted in 2019.
Redshift for Advanced Analytics at Scale
Businesses today need quick insights from real-time, streaming data to be responsive to changing trends. For this, they need access not only to internal data but to external sources as well. The cost of building data warehouses on on-premises, legacy systems can be formidable, while also restricting scalability and flexibility.
A cloud-based, fully managed data warehouse service such as Amazon Redshift is a scalable solution: you can start with gigabytes of data and grow to petabytes or more. Redshift is a column-oriented, OLAP-style (Online Analytical Processing) database based on PostgreSQL 8.0.2, and it lets you use SQL to analyse structured and semi-structured data stored across data warehouses, data lakes, and operational databases.
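As a toy illustration (plain Python, not Redshift code) of why column-oriented storage suits analytical workloads: an aggregate over one column can scan just that column’s contiguous values, without touching the rest of each row. The record layout below is invented for illustration.

```python
# Toy sketch of row-oriented vs column-oriented layouts. In a columnar
# store (the core idea behind OLAP engines such as Redshift), each
# column's values are kept together, so an aggregate over one column
# never reads the others.

# Row-oriented layout: one record at a time.
rows = [
    {"symbol": "ABC", "price": 101.5, "volume": 2000},
    {"symbol": "XYZ", "price": 55.0, "volume": 1200},
    {"symbol": "ABC", "price": 102.0, "volume": 1800},
]

# Column-oriented layout: one list per column.
columns = {
    "symbol": [r["symbol"] for r in rows],
    "price": [r["price"] for r in rows],
    "volume": [r["volume"] for r in rows],
}

# An analytical query -- total traded volume -- reads a single column.
total_volume = sum(columns["volume"])
print(total_volume)  # 5000
```

At billions of records a day, reading one column instead of every row is the difference between scanning gigabytes and scanning terabytes.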
As a Data Engineer at Merit explains, “It can handle complex queries and can scale quickly without any need for on-prem data warehouse infrastructure. Redshift uses Massively Parallel Processing (MPP) technology to enable the processing of high-volume data in real time. It is important to note here that this is not batch processing of data. Rather, it is garnering intelligence on large-scale streaming data – as data comes in.”
Amazon Redshift – Features and Capabilities that Data Engineers Look for
Today, data engineers are looking for managed BI workloads that are elastic in nature. In some cases it is not possible to predict the volume of data that will arrive for analysis, and a flexible solution like Redshift is ideal: the underlying cloud infrastructure can be scaled up or down on the fly.
Query Editor v2
Query Editor v2 is a web-based analyst workbench that lets data analysts, data engineers, and other SQL users explore and analyse data in the data lake using SQL. With a single click, you can visualise query results, load data visually, create schemas and tables, and browse database objects.
Automated Table Design
Amazon Redshift monitors user workloads and uses sophisticated algorithms to improve the physical layout of data, maximising query speed. Automatic Table Optimisation selects the most efficient sort and distribution keys for the cluster’s workload, without any intervention by the administrator.
Features such as Automatic Vacuum Delete, Automatic Table Sort, and Automatic Analyze further eliminate manual maintenance, tuning both new and production clusters for the best performance.
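For an existing table, handing these decisions over to Automatic Table Optimisation is plain DDL: `ALTER TABLE … ALTER DISTSTYLE AUTO` and `ALTER TABLE … ALTER SORTKEY AUTO`. A minimal sketch that generates those statements (the table name is hypothetical):

```python
def auto_optimise_statements(table: str) -> list[str]:
    """Return the Redshift DDL that switches a table's distribution
    style and sort key to AUTO, letting Automatic Table Optimisation
    choose them from the observed workload."""
    return [
        f"ALTER TABLE {table} ALTER DISTSTYLE AUTO;",
        f"ALTER TABLE {table} ALTER SORTKEY AUTO;",
    ]

# "trades" is an illustrative table name, not one from the article.
for stmt in auto_optimise_statements("trades"):
    print(stmt)
```

New tables created without explicit sort or distribution keys default to AUTO, so in practice these statements are only needed for tables that were created with hand-picked keys.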
Query with Your Own Tools
In Amazon Redshift, you can run queries directly in the console, or connect your preferred SQL client tools, libraries, and BI platforms, such as Amazon QuickSight, Querybook, Tableau, Power BI, and Jupyter Notebook.
Interact with Redshift Using Simple APIs
The Amazon Redshift Data API simplifies access, ingestion, and egress of data from a range of programming languages and platforms. It removes the need to configure drivers and manage database connections: instead, you run SQL commands against a secure API endpoint, and the Data API handles connection management and data buffering for you. Because it is asynchronous, you can retrieve results later; they are stored for up to 24 hours.
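The asynchronous flow is: submit SQL with `ExecuteStatement` (which returns an Id immediately), poll `DescribeStatement` until the status is terminal, then fetch rows with `GetStatementResult`. The sketch below shows that loop in Python; the `StubDataAPI` class is a stand-in written for this example so it runs without AWS credentials — with boto3 you would pass `boto3.client("redshift-data")` instead, which exposes the same three operations.

```python
import time

class StubDataAPI:
    """Stand-in for boto3's 'redshift-data' client, so the polling
    pattern below runs without AWS credentials. The three methods
    mirror the real Data API operations of the same names."""
    def execute_statement(self, Sql, Database):
        return {"Id": "stmt-1"}
    def describe_statement(self, Id):
        # A real statement may report SUBMITTED/STARTED before FINISHED.
        return {"Status": "FINISHED"}
    def get_statement_result(self, Id):
        return {"Records": [[{"stringValue": "ABC"}, {"longValue": 5000}]]}

def run_query(client, sql, database="dev", poll_seconds=1.0):
    # 1. Submit the statement; the call returns immediately with an Id.
    stmt_id = client.execute_statement(Sql=sql, Database=database)["Id"]
    # 2. Poll until the statement reaches a terminal status
    #    (results remain retrievable for up to 24 hours).
    while True:
        status = client.describe_statement(Id=stmt_id)["Status"]
        if status in ("FINISHED", "FAILED", "ABORTED"):
            break
        time.sleep(poll_seconds)
    if status != "FINISHED":
        raise RuntimeError(f"statement ended with status {status}")
    # 3. Retrieve the buffered result set.
    return client.get_statement_result(Id=stmt_id)["Records"]

records = run_query(StubDataAPI(),
                    "SELECT symbol, SUM(volume) FROM trades GROUP BY symbol")
print(records)
```

Because the submit step never blocks, the same pattern works from Lambda functions or event-driven pipelines where holding a JDBC/ODBC connection open would be impractical.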
Reliability and Fault Tolerance
Amazon Redshift keeps data warehouse clusters reliable by continuously monitoring cluster health, automatically replicating data from failed drives, and replacing nodes as needed. It also allows clusters to be relocated to other Availability Zones (AZs) without data loss or changes to applications.
Amazon Redshift for Data Intelligence
Today, businesses have a compelling need to make informed decisions based on real-time data and respond to trends in a timely manner. Redshift’s use of MPP technology speeds up access to data and analytics, enabling businesses to meet customer demands, retain competitive advantage, and improve throughput quickly.
Another benefit is that Redshift enables drawing insights not only from structured data but also from semi-structured and unstructured data. It can provide a unified view of enterprise-wide data, collected from sources including web applications, desktops, mobile phones, and tablets.
It also provides detailed reports and information dashboards through an intuitive user interface and can be used with tools such as Amazon QuickSight. User behaviour analytics is another area where Redshift can be leveraged, drawing insights into user behaviour, interactions, and similar information.
Merit Group’s expertise in cloud BI
At Merit Group, we work with some of the world’s leading B2B intelligence companies like Wilmington, Dow Jones, Glenigan, and Haymarket. Our data and engineering teams work closely with our clients to build data products and business intelligence tools. Our work directly impacts business growth by helping our clients to identify high-growth opportunities.
Our specific services include high-volume data collection, data transformation using AI and ML, web watching, BI, and customised application development.
We’re experts in Cloud BI, helping companies streamline and migrate to a truly next-generation BI stack.
Our team also brings to the table deep expertise in building real-time data streaming and data processing applications; our 15 years of experience in data engineering is especially useful in this context. Our data engineering team brings to the fore specific expertise in a wide range of data tools, including Airflow, Kafka, Python, PostgreSQL, MongoDB, Apache Spark, Snowflake, Redshift, Athena, Looker, and BigQuery.
If you’d like to learn more about our service offerings or speak to a Redshift expert, please contact us here: https://www.meritdata-tech.com/contact-us/
Related Case Studies
Bespoke Data Engineering Solution for High Volume Salesforce Data Migration
A global market leader in credit risk and ratings needed a data engineering solution for Salesforce data migration.
A Unified Data Management Platform for Processing Sports Deals
A global intelligence service provider was facing challenges due to the lack of a centralised data management system, which led to duplication of data, increased effort, and the risk of manual errors.