Semantic layer data

“The importance of data to today’s businesses can’t be overstated,” write Janice M. Zdankus and Anthony Delli Colli at HP Enterprise. “Studies show data-driven companies are 58% more likely to beat revenue goals than non-data-driven companies and 162% more likely to significantly outperform laggards. Data analytics are helping nearly half of all companies make better decisions about everything, from the products they deliver to the markets they target.”

However, they warn, “Used optimally, data is nothing less than a critically important asset. Problem is, it’s not always easy to put data to work.” 

Often, this difficulty comes about not because users don’t know which questions to ask, but because of the way the data is stored and catalogued behind the scenes. 

Data management vs record keeping  

Business record keeping was traditionally a job for ledgers and record cards. Businesses would accumulate stacks of paper, on which the data essential to their success was recorded in a range of often incompatible formats. These could be spread across multiple locations – and even several sites – which only made the job of answering questions more difficult for the organisation’s decision makers. 

Some organisations have carried those practices forward to the digital age, without fully realising it. This is a problem when the range of data available to organisations of every size is increasing exponentially. Between 2022 and 2025, the amount of data in global use is projected to grow by 86%, from 97 to 181 zettabytes, having already increased more than 48-fold since 2010. 

Getting insight from data overload 

“Data has become so readily available to business leaders that there is already a danger of there being so much of it produced that it can be difficult for many to know what to analyse and to extract anything worthwhile,” notes Matthew Burrows at The Consultancy Group.

If those business leaders are to convert raw metrics into the insight and intelligence that will drive effective decision-making, that data must be ordered, accessible, and understandable.

The importance of data virtualisation 

Because organisations now rely on so much data – and frequently integrate third-party data sources with their own – it is often impossible to store it all in a single location, not to mention cost-inefficient to opt for anything other than the cloud.

Unless they want to replicate the card and ledger model of old, it is at this point that such enterprises must consider how they can best gather those diverse data sources and present them to decision-makers as a single, unified resource. Today, the most effective solution is data virtualisation.

The benefits of data virtualisation 

As we have written elsewhere, “through data virtualisation, these diverse data sets can be presented as though they were physically located in a single location, even though they may be drawn from a diverse range of sources – and locations.” 

The benefits are immediately apparent: stakeholders no longer need to actively query multiple data sources, or even know (or care) where those sources are located. Through virtualisation, every source is accessible through a single tool, allowing stakeholders to self-serve more easily.
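To make the idea concrete, below is a minimal sketch, in Python, of what a virtualised access layer might look like behind the scenes. The dataset names, file paths and connection details are hypothetical, and a production platform would of course be far more capable; the point is simply that the caller asks for data by name and never deals with the plumbing.

# A minimal sketch of the idea behind data virtualisation: callers ask for a
# dataset by name and never need to know where or how it is stored.
# All names, paths and queries below are hypothetical.
import sqlite3
import pandas as pd

# Registry mapping business-facing dataset names to their physical sources.
SOURCES = {
    "customers": {"type": "csv", "location": "customers.csv"},
    "orders": {"type": "sqlite", "location": "orders.db",
               "query": "SELECT * FROM orders"},
}

def fetch(dataset: str) -> pd.DataFrame:
    """Return a dataset as a DataFrame, hiding where and how it is stored."""
    source = SOURCES[dataset]
    if source["type"] == "csv":
        return pd.read_csv(source["location"])
    if source["type"] == "sqlite":
        with sqlite3.connect(source["location"]) as conn:
            return pd.read_sql_query(source["query"], conn)
    raise ValueError(f"Unknown source type: {source['type']}")

# A stakeholder's view: one call, one tool, no knowledge of the plumbing.
# combined = fetch("orders").merge(fetch("customers"), on="customer_id")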

A recent example of this lies in the handling of the Covid-19 pandemic. Merit’s Senior Delivery Manager, Mohamed Aslam, says that “Data virtualization has helped in managing the Covid-19 situation by providing solutions for reviewing massive data sets passed between scientists and epidemiologists.” This kind of visibility can be harnessed in any industry.

Efficiency through data virtualisation 

Further, by removing the need to maintain a working knowledge of where the data they need is stored, businesses allow decision makers to focus on the task at hand: they can test multiple scenarios more quickly, identify the most promising outcomes in less time, and gain an edge over competitors. Moreover, because the data sources are available simultaneously, they can be combined, compared, and contrasted for greater insight.

Data virtualisation vs ETL  

By virtualising data, organisations can skip several steps that were once essential. Crucially, ETL is no longer required, since the data remains where it is, rather than being extracted, transformed, and loaded into a second system for use.

This doesn’t only save bandwidth, storage, and processor cycles. ETL could only ever give access to data captured at a particular moment in time. As soon as it arrived on the client system for analysis, it was already starting to age – and the longer it remained there, the less effective it would become. Repeating the ETL operation would only ever be a partial solution, since BI dashboards using existing data would remain inaccurate until the next scheduled refresh. 

Data virtualisation, on the other hand, always gives decision makers a live view of the source material. Dashboards will update in sync with the source, giving stakeholders the ability to trust what they are being told, so they can be both confident and timely in their decision-making. 
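The difference is easy to illustrate. The toy Python sketch below contrasts an ETL-style snapshot, which starts ageing from the moment it is extracted, with a virtualised view that resolves each query against the source at read time; the "source" dictionary and the figures in it are purely illustrative.

# A toy contrast between an ETL-style snapshot and a virtualised live view.
# "source" stands in for any operational system; all figures are illustrative.
import datetime

source = {"open_orders": 120}          # the live operational system

# ETL approach: copy the data once; the copy ages until the next refresh.
snapshot = {"data": dict(source), "extracted_at": datetime.datetime.now()}

def dashboard_from_snapshot() -> int:
    return snapshot["data"]["open_orders"]       # stale once the source changes

# Virtualised approach: resolve the query against the source at read time.
def dashboard_from_virtual_view() -> int:
    return source["open_orders"]                 # always the current value

source["open_orders"] = 95                       # the business moves on
print(dashboard_from_snapshot())                 # 120 - yesterday's picture
print(dashboard_from_virtual_view())             # 95  - the live position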

Inconsistencies from multiple data silos  

Yet ETL did more than merely move data from one server to another. It also played a significant role in moulding that data to meet an organisation’s ongoing practices, making it more comprehensible to those who would use it.

Presenting live data from a diverse range of sources – particularly if the range includes third-party data – risks exposing decision makers to a wide range of formats and conventions. These could range from the simple, like one data source using Ave and Rd where another employs Avenue and Road, to the more fundamental, like standardisation on incompatible measurement systems. Such inconsistencies can be expensive, as NASA discovered when it lost a $125m space probe after engineers confused metric and imperial measurements.
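To make the problem concrete, the short Python sketch below shows the kind of reconciliation work that would otherwise fall to each data user: expanding abbreviated street names and converting imperial pound-force into metric newtons. The abbreviation map and sample values are illustrative, not exhaustive.

# A sketch of the reconciliation work that inconsistent sources create.
# The abbreviation map and the unit conversion are illustrative, not exhaustive.
ABBREVIATIONS = {"Ave": "Avenue", "Rd": "Road", "St": "Street"}

def normalise_address(address: str) -> str:
    """Expand common street abbreviations so both sources read the same way."""
    return " ".join(ABBREVIATIONS.get(word, word) for word in address.split())

def pound_force_to_newtons(pound_force: float) -> float:
    """Convert imperial pound-force to metric newtons (1 lbf = 4.44822 N)."""
    return pound_force * 4.44822

print(normalise_address("10 Downing Rd"))        # -> 10 Downing Road
print(pound_force_to_newtons(1.0))               # -> 4.44822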

How semantic layers overcome inconsistencies of multiple data silos  

If organisations are to retain their competitive edge, it should not be incumbent upon their data users to overcome inconsistencies within the data themselves – particularly when they may not be aware that such inconsistencies exist. If two data sources record the performance of a range of equities, but neither specifies the currency used to quantify that performance, how is the decision maker to reconcile the data? 

Thus, a semantic layer must be employed to standardise the data and provide a consistent dialect for asking questions – and to make the various data sources easily identifiable. 

The benefits of using semantic layers for one version of truth 

As Datameer illustrates, a semantic layer “is a business representation of data. It enables end-users to quickly discover and access data using standard search terms — like customer, recent purchase, and prospect. It also provides human-readable terms to data sources that otherwise would be impossible to discover (e.g., table slsqtq121 becomes Sales West 1st Quarter 2021).”
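Taking the Datameer example above, a semantic layer can be pictured as a naming and lookup layer that translates business-friendly terms into the physical schema. The Python sketch below is a simplified illustration; the mapping and the SQL it produces are assumptions for the example, not any particular product’s API.

# A sketch of a semantic layer as a naming and lookup layer, reusing the
# cryptic table name from the Datameer example. The mapping and the SQL it
# produces are illustrative assumptions, not a specific product's API.
SEMANTIC_MODEL = {
    "Sales West 1st Quarter 2021": {
        "table": "slsqtq121",                    # physical, hard-to-discover name
        "columns": {"customer": "cust_id", "recent purchase": "last_ord_amt"},
    },
}

def translate(business_term: str, columns: list) -> str:
    """Turn a business-friendly request into SQL against the physical schema."""
    entry = SEMANTIC_MODEL[business_term]
    physical = ", ".join(entry["columns"][c] for c in columns)
    return f"SELECT {physical} FROM {entry['table']}"

print(translate("Sales West 1st Quarter 2021", ["customer", "recent purchase"]))
# -> SELECT cust_id, last_ord_amt FROM slsqtq121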

By implementing a semantic layer, organisations can establish a single data view, so that all parts of the business are working with the same data and a single version of truth – and using the same terms to both access and discuss those metrics.

This should not only avoid disagreement between departments over the value of gathered metrics, or the story they tell; it should also enable more effective discussion, easier collaboration, and the ability to consult more widely throughout an organisation where everyone speaks the same data language.
