How does an organisation make effective decisions if it can’t refer to a single version of the truth? And how does it generate that single version of the truth if its data contradicts itself across distributed clusters and nodes? It can’t.
As the infrastructure used to store and process data becomes more complex, in an effort to reduce execution time and deliver actionable insights more quickly, enterprises must deploy strategies and tools to identify and remedy anomalies in the data.
In short, they must rely on master data management.
Principles of master data management
Master data management (MDM) isn’t a technology in its own right, but a discipline underpinned by – and facilitated by – technological solutions. As soon as data is duplicated, as is frequently the case when using distributed processing platforms, measures must be implemented to ensure that the data remains aligned, and that updates to one copy are reflected in each of the others. That way, a process will always deliver valid output, wherever it takes place, and whichever copy of the data it’s using.
Effective master data management requires not only that data points agree, regardless of their location, but that they have been “de-duplicated, reconciled and enriched, becoming a consistent, reliable source,” says Informatica. “Once created, this master data serves as a trusted view of business-critical data that can be managed and shared across the business to promote accurate reporting, reduce data errors, remove redundancy, and help workers make better-informed business decisions.”
It might once have been possible to perform these actions by hand, but the speed at which data is generated and amended today means that’s no longer the case. Only a master data management platform has the capacity and agility to cope.
How master data management platforms work
With an MDM platform, the enterprise can define a single master record (sometimes called a golden record), to which all other sources refer. This master record can be shared in several ways. Most commonly:
- Through data propagation, in which the data is copied directly from one system to another.
- Through data consolidation, in which data is retrieved from multiple sources within the organisation’s tech infrastructure and consolidated in a single location. The data residing at the single central location is considered the master record and, from here, copies are propagated to other nodes within the network.
- Through data federation, in which there can be multiple sources and multiple destinations, with data moving between them.
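To make the consolidation route more concrete, here is a minimal Python sketch. It assumes each source system supplies records keyed by a customer ID and that, when sources disagree, the most recently updated copy wins. The `Record` type, the `consolidate` and `propagate` functions and the "latest update wins" rule are all hypothetical and chosen for illustration. They are not taken from any particular MDM product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    customer_id: str
    email: str
    updated_at: int  # hypothetical last-modified timestamp


def consolidate(sources: dict[str, list[Record]]) -> dict[str, Record]:
    """Pull records from every source and keep one master record per customer.

    Assumed rule: when sources disagree, the most recently updated copy wins.
    """
    master: dict[str, Record] = {}
    for records in sources.values():
        for rec in records:
            current = master.get(rec.customer_id)
            if current is None or rec.updated_at > current.updated_at:
                master[rec.customer_id] = rec
    return master


def propagate(master: dict[str, Record], nodes: list[str]) -> dict[str, dict[str, Record]]:
    """Push an identical copy of the master records to every node."""
    return {node: dict(master) for node in nodes}


sources = {
    "crm": [Record("c1", "ann@example.com", 100)],
    "billing": [
        Record("c1", "ann@newmail.example", 200),
        Record("c2", "bob@example.com", 150),
    ],
}
master = consolidate(sources)
copies = propagate(master, ["eu-node", "us-node"])
print(master["c1"].email)  # ann@newmail.example
```

Every node receives the same copy of the consolidated data. As a result, a process run on "eu-node" or "us-node" sees the same version of the truth.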
Whichever method is chosen, the result should be the same: a single version of the truth which, when processed at any location where it might reside, delivers a predictable, reliable, and repeatable result.
While managing the maintenance and transmission of master records, MDM platforms can simultaneously identify and remedy errors in source data, standardise formats, and identify and remove duplicate records using a series of rules. Combined, these actions cleanse the data.
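Standardisation and de-duplication rules can be illustrated with a short Python sketch. The rules below are assumptions made for the example, not a real platform's rule set. They collapse whitespace in names, lower-case email addresses and strip non-digits from phone numbers. After standardisation, two records that share an email address are treated as duplicates.

```python
import re


def standardise(record: dict) -> dict:
    """Apply simple, illustrative formatting rules to one record."""
    return {
        "name": " ".join(record["name"].split()).title(),
        "email": record["email"].strip().lower(),
        "phone": re.sub(r"\D", "", record["phone"]),  # keep digits only
    }


def deduplicate(records: list[dict]) -> list[dict]:
    """Standardise every record, then drop any whose email was already seen."""
    seen: set[str] = set()
    unique = []
    for rec in map(standardise, records):
        if rec["email"] not in seen:
            seen.add(rec["email"])
            unique.append(rec)
    return unique


raw = [
    {"name": "jane  doe", "email": " Jane.Doe@Example.com", "phone": "+44 (0)20 7946 0000"},
    {"name": "Jane Doe", "email": "jane.doe@example.com ", "phone": "020-7946-0000"},
]
clean = deduplicate(raw)
print(clean)
```

The two raw rows look different, but they describe the same person. They collapse to a single record once the formatting rules have been applied.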
Accelerating MDM through automation
“Cleansing involves the correction of errors, matching, standardisation, and enhancing data,” explains open-source MDM platform developer, Pimcore. “For instance, data cleansing functions can be created to apply simultaneous changes to data records and reduce physical effort. So, if you want to change ‘Co.’ (denoting company) mentioned in every record, to ‘Company,’ then a data cleansing function can be created to automatically apply this change in every record.”
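Pimcore's "Co." example could look something like the following Python sketch. This is not Pimcore's API. The assumption is that the abbreviation is always written "Co." with a capital C and is followed by a space or the end of the value. The word boundary stops names such as "Costco" or "Cobalt" from being altered.

```python
import re

# Matches "Co." as a standalone abbreviation (assumed spelling and casing).
CO_PATTERN = re.compile(r"\bCo\.(?=\s|$)")


def expand_company(value: str) -> str:
    """Cleansing function: replace the abbreviation 'Co.' with 'Company'."""
    return CO_PATTERN.sub("Company", value)


def apply_to_all(records: list[dict], field: str) -> list[dict]:
    """Apply the cleansing function to one field in every record at once."""
    return [{**rec, field: expand_company(rec[field])} for rec in records]


records = [{"name": "Acme Co."}, {"name": "Cobalt Co. Ltd"}, {"name": "Costco"}]
print(apply_to_all(records, "name"))
```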
Implementing comprehensive rules thus introduces a degree of automation to the MDM process, but not all inconsistencies can be resolved in this manner. The process may therefore require a degree of human intervention or, with a platform like IBM InfoSphere Master Data Management, machine-learning-assisted data stewardship.
IBM’s approach to MDM
IBM’s implementation can be tasked with comparing massive data sets and identifying candidates for amalgamation, based on records demonstrating a high degree of correlation. Users can drag and drop identified records onto one another to merge them, or onto separate entities if they are a better fit elsewhere. This should deliver considerable time-savings when compared to manual re-keying.
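Identifying merge candidates by similarity can be sketched with Python's standard `difflib`. This is a stand-in technique and not IBM's matching algorithm. The 0.85 threshold is an arbitrary assumption. The function only proposes pairs. As in the drag-and-drop workflow above, a person still decides whether to merge them.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Score two strings between 0.0 and 1.0 after light normalisation."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def merge_candidates(names: list[str], threshold: float = 0.85) -> list[tuple[str, str, float]]:
    """Return pairs similar enough to be proposed to a data steward for merging."""
    pairs = []
    for a, b in combinations(names, 2):
        score = similarity(a, b)
        if score >= threshold:
            pairs.append((a, b, round(score, 2)))
    return sorted(pairs, key=lambda p: p[2], reverse=True)


candidates = merge_candidates(["Jon Smith", "John Smith", "Jane Smyth", "Acme Ltd"])
print(candidates)
```

Comparing every pair is fine for a handful of names. At the "massive data set" scale described above, real platforms narrow the comparisons first, for example by only comparing records that share a postcode or surname prefix.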
Where MDM platforms are required
The larger the organisation, the more likely it is to need an MDM platform. This is particularly the case if several departments, which may not directly communicate with one another, rely on the same data set to perform seemingly contradictory tasks. In a retail environment, for instance, a sales department may generate customer data, which the marketing department uses to upsell new products. Unless the master record for each customer also includes a contemporaneous record of their past purchases, marketing effort – and expenditure – may be wasted when the marketing teams dispatch promotions for new products that customers have already bought.
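A minimal Python sketch shows why the purchase history needs to sit in the master record. The `CustomerMaster` type and its `purchased_skus` field are hypothetical. Once the history is there, marketing can exclude customers who already own the promoted product before a campaign goes out.

```python
from dataclasses import dataclass, field


@dataclass
class CustomerMaster:
    customer_id: str
    email: str
    # Purchase history held alongside the contact details (assumed field).
    purchased_skus: set[str] = field(default_factory=set)


def promotion_targets(customers: list[CustomerMaster], sku: str) -> list[str]:
    """Return emails of customers who have not already bought the promoted SKU."""
    return [c.email for c in customers if sku not in c.purchased_skus]


customers = [
    CustomerMaster("c1", "ann@example.com", {"SKU-100"}),
    CustomerMaster("c2", "bob@example.com"),
]
print(promotion_targets(customers, "SKU-100"))  # ['bob@example.com']
```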
According to Merit’s Senior Service Delivery Manager: ‘An effective MDM is critical for continued sales relations and to generate new business.
A good example would be the whitespace created between parent organisations and their subsidiaries. If a parent organisation is a client, it is easier to upsell to its subsidiaries than to reach out to them with standard prospect marketing material.
It is also critical for the business to monitor its B2B vs. B2C management, as some individuals would prefer to subscribe via their personal email.
If the data structure is well defined and managed in silos/modules, analytics and business intelligence can be applied to the data with maximum flexibility, which is the ultimate goal for massive datasets in CRMs. This method generates smart money with easy wins and significantly aids customer retention.’
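The parent/subsidiary whitespace described above can be found with a simple lookup, provided the master data records corporate hierarchies. In the Python sketch below, the `parent_of` mapping and the account IDs are hypothetical. The function lists, for each client parent, the subsidiaries that are not yet clients.

```python
def whitespace_subsidiaries(parent_of: dict[str, str], clients: set[str]) -> dict[str, list[str]]:
    """Map each client parent to its subsidiaries that are not yet clients.

    parent_of maps a subsidiary's account ID to its parent's account ID
    (a hypothetical hierarchy held in the master data).
    """
    opportunities: dict[str, list[str]] = {}
    for subsidiary, parent in sorted(parent_of.items()):
        if parent in clients and subsidiary not in clients:
            opportunities.setdefault(parent, []).append(subsidiary)
    return opportunities


parent_of = {"acme-uk": "acme-group", "acme-de": "acme-group", "beta-fr": "beta-holdings"}
clients = {"acme-group", "acme-uk"}
print(whitespace_subsidiaries(parent_of, clients))  # {'acme-group': ['acme-de']}
```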
Where multiple data silos exist, so must MDM
Each of the data stores to which these departments refer is effectively a silo. And, where multiple silos exist, MDM is key to synchronisation.
“Data silos are also organisational silos,” says Stibo Systems, developer of a master data management platform. “Decentralised decision making across departments, lines of business and geographies, often leads to data being duplicated in multiple locations. The result: a broad spectrum of hybrid environments with widely distributed, siloed data, all on their own unique path to becoming a customer-centric, responsive and agile enterprise. As you might guess, mergers and acquisitions are a common root cause of data silos.”
Should an organisation acquire a close competitor, there’s a high likelihood they will share some customers. De-duplication, as part of a broader MDM process, will create a unified view of each customer, taking valid data points from each source to create a single master record, which can be shared across the expanded corporate entity.
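"Taking valid data points from each source" is often handled with field-level survivorship rules. The Python sketch below assumes one such rule: for each field, keep the most recently updated non-empty value. The field names and the `survive` function are hypothetical. Real MDM platforms let you configure different rules per field.

```python
def survive(records: list[dict], fields: list[str]) -> dict:
    """Build one master record from duplicate records of the same customer.

    Assumed survivorship rule: for each field, keep the most recently
    updated non-empty value.
    """
    newest_first = sorted(records, key=lambda r: r["updated_at"], reverse=True)
    return {f: next((r[f] for r in newest_first if r.get(f)), None) for f in fields}


acquirer = {"email": "kim@example.com", "phone": "", "address": "1 High St", "updated_at": 300}
acquired = {"email": "kim@example.com", "phone": "07700900123", "address": "9 Old Rd", "updated_at": 100}
master = survive([acquirer, acquired], ["email", "phone", "address"])
print(master)
```

Here the acquirer's newer address survives. The phone number, which only the acquired company held, fills the gap in the acquirer's record.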
The risks of poorly implemented MDM
Within a single organisation, effective MDM has an important role to play. “If customer information lives in multiple locations instead of a master database, then employees have no single source of truth for this essential data,” explains Tableau. “A customer could, for example, update their contact information online in a logged-in account — but, if a different system handles email distribution, then the customer would no longer receive marketing communications.”
Worse, should customers opt out of receiving communications, but this change not be propagated across the organisation’s various data resources, the organisation could find itself in breach of data-use regulations, like GDPR, and subject to penalties. The penalty could be more costly than any savings made by failing to implement an effective MDM platform in the first place.
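One way to avoid that gap is to record consent once, in the master record, and fan the change out to every downstream system. The Python sketch below is hypothetical. `SystemStore` and `ConsentHub` are in-memory stand-ins, not a real product's API. The `out_of_sync` check flags any system whose opt-out list has drifted from the master.

```python
class SystemStore:
    """In-memory stand-in for one downstream system (CRM, email tool, etc.)."""

    def __init__(self, name: str):
        self.name = name
        self.opted_out: set[str] = set()


class ConsentHub:
    """Hypothetical hub that records opt-outs in the master and fans them out."""

    def __init__(self, systems: list[SystemStore]):
        self.master_opt_outs: set[str] = set()
        self.systems = systems

    def opt_out(self, customer_id: str) -> None:
        """Record the opt-out centrally, then push it to every system."""
        self.master_opt_outs.add(customer_id)
        for system in self.systems:
            system.opted_out.add(customer_id)

    def out_of_sync(self) -> list[str]:
        """Names of systems whose opt-out list differs from the master."""
        return [s.name for s in self.systems if s.opted_out != self.master_opt_outs]


crm, mailer = SystemStore("crm"), SystemStore("mailer")
hub = ConsentHub([crm, mailer])
hub.opt_out("c42")
print(hub.out_of_sync())  # []
```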
Merit Group’s expertise in MDM
At Merit Group, we work with some of the world’s leading B2B intelligence companies like Wilmington, Dow Jones, Glenigan, and Haymarket. Our data and engineering teams work closely with our clients to build data products and business intelligence tools. Our work directly impacts business growth by helping our clients to identify high-growth opportunities.
The Merit team also brings to the table deep expertise in building real-time data streaming and data processing applications as part of an MDM environment. Our data engineering team brings to the fore specific expertise in a wide range of data tools including Airflow, Kafka, Python, PostgreSQL, MongoDB, Apache Spark, Snowflake, Tableau, Redshift, Athena, Looker, and BigQuery.
If you’d like to learn more about our service offerings or speak to a Looker expert, please contact us here: https://www.meritdata-tech.com/contact-us