AWS Data Lake

Coca-Cola Andina, the licensed producer and distributor of Coca-Cola products within South America, employs 17,500 people. The company serves 267,000 sub-distributors and 54 million consumers across the continent. Needless to say, the company was generating extremely high volumes of data across the supply chain.

Often, this data was stored in different applications or data warehouses. It was also a mix of structured and unstructured data, making it all the more complex to derive insights.

After some deliberation, the company opted for an AWS data lake to make data management easier, with unlimited storage and processing capacity and fast response times. The overall data lake architecture was built as platform-as-a-service (PaaS), enabling solutions to be built and dismantled quickly and cost-efficiently.

A well-designed data lake architecture on AWS will include several components: AWS Lambda microservices to perform specific functions; Amazon OpenSearch Service for search capabilities; Amazon Cognito for access and authorisation; AWS Glue for data transformation; and Amazon Athena for data analysis.

With the AWS data lake, the company was able to accelerate several innovation initiatives, which had a direct impact on financial performance. The analytics team experienced an 80% increase in productivity, as it could ingest and analyse more than 95 per cent of the data – of multiple types, including images, documents, PDF files, and numerical data – from various sources.

AWS Data Lake – A Centralized Repository on the Cloud  

A data lake acts as a centralised repository where all of a business’s structured and unstructured data (at any scale) can be stored.  

A data lake can be used to store multiple data types without first creating a structure or schema. This data can then be used for analytics: running SQL queries to answer business questions, feeding machine learning algorithms, and even powering full-text search.
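To illustrate the kind of SQL analytics a data lake enables (on AWS, Amazon Athena runs similar queries directly against files in S3), here is a minimal local sketch using Python's built-in sqlite3 module; the table and records are hypothetical stand-ins:

```python
import sqlite3

# Hypothetical sales records of the kind a data lake might hold.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, units INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("North", "cola", 120), ("South", "cola", 340), ("North", "water", 75)],
)

# A typical analytical query: total units sold per region.
totals = list(
    conn.execute("SELECT region, SUM(units) FROM sales GROUP BY region ORDER BY region")
)
print(totals)  # [('North', 195), ('South', 340)]
```

The same style of aggregation query, written in Athena's SQL dialect, would run serverlessly over the raw files in the lake without loading them into a database first.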

An AWS data lake is a popular solution for businesses leveraging the cloud for analytics and digital transformation, as it offers a secure, comprehensive, scalable, and cost-effective portfolio of services. AWS Lake Formation makes it quick to set up a secure data lake.

Some of the unique features of AWS Data Lake include:  

Out-of-the-Box or Customised: The AWS data lake can be leveraged as an out-of-the-box solution or as a reference implementation that can be customised to meet the varying needs of different businesses for data management, search, processing, analysis and access.  

Intuitive UI for data management: A web-based console UI is created and hosted on Amazon S3 and delivered by Amazon CloudFront. It helps administrators efficiently manage data lake users and policies, add or remove data packages, and perform additional analytics by creating manifests of datasets.

Data Integration: A command line interface (CLI) is available for the AWS data lake, enabling the automation of data management activities, including unifying inbound and outbound data into a central repository.

Managed Storage Layer: Data storage, retrieval, management, and security are handled in a managed Amazon S3 bucket, while a solution-specific AWS Key Management Service (KMS) key encrypts data at rest.

Data Access Flexibility: Pre-signed Amazon S3 URLs or an appropriate AWS Identity and Access Management (IAM) role can be used for controlled, direct access to datasets in Amazon S3.

Data Transformation and Analysis: Uploaded datasets are tagged with searchable metadata and integrated with AWS Glue and Amazon Athena, enabling data transformation and analysis.
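AWS Glue jobs are typically written in PySpark; as a plain-Python sketch of the kind of normalisation a Glue transform performs (the field names and records below are hypothetical), consider:

```python
# Hypothetical raw records arriving from different source systems,
# with inconsistent field names and value types.
raw = [
    {"Customer": "Acme", "REVENUE": "1200.50"},
    {"customer_name": "Zenith", "revenue": 980},
]

def normalise(record):
    # Map source-specific field names onto one target schema
    # and coerce revenue to a float.
    name = record.get("Customer") or record.get("customer_name")
    revenue = float(record.get("REVENUE") or record.get("revenue"))
    return {"customer": name, "revenue": revenue}

clean = [normalise(r) for r in raw]
print(clean)
```

Once records share a consistent schema like this, they can be catalogued with searchable metadata and queried with Athena.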

Federated Sign-in: Users can also sign in using a SAML identity provider (IdP) such as Okta or Microsoft Active Directory Federation Services (AD FS).

Key Benefits of AWS Data Lake 

One of the key benefits of the AWS Lake Formation solution is the speed at which data can be moved, stored, cataloged, and cleaned. Lake Formation crawls through the data sources and enables the following:  

  • Moving data into the Amazon S3 data lake 
  • Organising data based on frequently used query terms  
  • Increasing efficiency by creating right-sized chunks of data 
  • Converting data to formats such as Apache Parquet and ORC for faster analytics 
  • Deduplicating and finding matching records using machine learning to improve data quality 
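Lake Formation performs the last of these with machine learning (its FindMatches transform); as a much simpler illustration of the deduplication idea, here is a sketch using Python's difflib to flag near-duplicate records. The records and similarity threshold are hypothetical:

```python
from difflib import SequenceMatcher

# Hypothetical customer records from different source systems.
records = ["Coca-Cola Andina S.A.", "Coca Cola Andina SA", "Embotelladora Andina"]

def similar(a, b, threshold=0.8):
    # Normalise case and punctuation, then compare by similarity ratio.
    def norm(s):
        return "".join(ch for ch in s.lower() if ch.isalnum() or ch == " ")
    return SequenceMatcher(None, norm(a), norm(b)).ratio() >= threshold

# Keep the first occurrence of each near-duplicate group.
deduped = []
for rec in records:
    if not any(similar(rec, kept) for kept in deduped):
        deduped.append(rec)
print(deduped)  # ['Coca-Cola Andina S.A.', 'Embotelladora Andina']
```

A production-grade matcher would learn which fields and variations matter from labelled examples rather than relying on a fixed string-similarity threshold.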

“A data lake becomes extremely useful when there is a very large volume of data to process, and when a lot of it is unstructured. Once the lake is set up, it is possible to cull out data with ease – irrespective of whether it is a small or massive data set being searched,” says Mohamed Aslam, Merit’s Senior Service Delivery Manager. 

AWS Data Lake also brings to the fore the following capabilities: 

Simplified Security Management: Access controls can be defined and enforced at the table, column, row, or cell level from a single location, for all users and services accessing the data. Consistent implementation of policies eliminates the need to configure them manually across security services and improves compliance.
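Lake Formation enforces these permissions natively across the analytics services; purely to illustrate the concept of column-level filtering, here is a minimal sketch in which the roles, columns, and records are all hypothetical:

```python
# Hypothetical column-level permissions per role.
PERMISSIONS = {
    "analyst": {"region", "units"},           # no access to customer identities
    "admin": {"region", "units", "customer"},  # full access
}

rows = [{"region": "North", "units": 120, "customer": "Acme"}]

def filter_columns(rows, role):
    # Return each row with only the columns the role is allowed to see.
    allowed = PERMISSIONS[role]
    return [{k: v for k, v in row.items() if k in allowed} for row in rows]

print(filter_columns(rows, "analyst"))  # [{'region': 'North', 'units': 120}]
```

The point of defining such rules centrally, as Lake Formation does, is that every query engine reading the lake sees the same filtered view without per-service configuration.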

Empowering Business Users with Self-Service: Users can access the data relevant to them without depending on IT, since the datasets are catalogued – thereby improving productivity. Because security is enforced consistently, users and analysts can use the analytics services of their choice and combine them to access data across different silos.

Merit to help you implement a Data Lake on AWS  

At Merit Group, we work with some of the world’s leading B2B intelligence companies like Wilmington, Dow Jones, Glenigan, and Haymarket. Our data and engineering teams work closely with our clients to build data products and business intelligence tools. Our work directly impacts business growth by helping our clients to identify high-growth opportunities. 

Merit’s team brings to the table extensive experience in building next-generation data lakes on the cloud, using AWS or any other cloud platform preferred by the customer.

Our data engineers can help you achieve faster time-to-insight with the robust data lake architecture that large-scale operations need.

Get in touch with our data and business intelligence teams for strategic guidance on building the right data ecosystem, custom-designed for your business. We’ll also help you choose the right data cloud platform based on your volume and/or types of data to be processed. 

If you’d like to learn more about our service offerings or speak to a data lake or AWS expert, please contact us here: 

Related Case Studies

  • A Unified Data Management Platform for Processing Sports Deals

    A global intelligence service provider was facing challenges due to the lack of a centralised data management system, which led to duplicated data, increased effort, and the risk of manual errors.

  • Bespoke Data Engineering Solution for High Volume Salesforce Data Migration

    A global market leader in credit risk and ratings needed a data engineering solution for Salesforce data migration.