It is rare for data-centric businesses to store all of their data in a single location. The most effective model is usually a combination of on-premises, cloud, and third-party data that has been virtualised to disguise the joins. This model has emerged at a time when business leaders are increasingly generating their own insights using dashboards and BI tools.
This trend is set to accelerate, with the market for self-service BI projected to grow at a CAGR of 15.44% through 2028. Organisations that want to reap its benefits, which include agility and a reduced time to value when working with real-time data, must therefore ensure that each new data source they connect does not increase complexity for their stakeholders.
The solution is most often a semantic layer, which maps data-specific terms to the business terminology they already understand.
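In its simplest form, such a mapping can be pictured as a dictionary that resolves business terminology to physical column references. The sketch below uses hypothetical table and field names purely for illustration; a production semantic layer would of course be far richer.

```python
# A minimal sketch of a semantic-layer mapping. Table and column
# names here are hypothetical. Each business term resolves to a
# physical column, so users never see the underlying schema.
SEMANTIC_MODEL = {
    "Customer Name": "crm.contacts.full_name",
    "Monthly Revenue": "finance.invoices.amount_gbp",
    "Order Date": "sales.orders.created_at",
}

def translate(business_terms):
    """Resolve business terminology to physical column references."""
    return [SEMANTIC_MODEL[term] for term in business_terms]

print(translate(["Customer Name", "Monthly Revenue"]))
# ['crm.contacts.full_name', 'finance.invoices.amount_gbp']
```

The decision maker asks for "Monthly Revenue"; only the layer knows which system, table, and column actually hold it.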
The semantic layer and data fabric design
The idea of a semantic layer isn’t new. In 1992, Business Objects obtained a patent on a “new data representation and a query technique which allows information system end users to access (query) relational databases without knowing the relational structure or the structured query language (SQL)”. The technique described by the patent used semantically dynamic objects.
But SQL isn’t the only database model in common use. So, the semantic layer must work harder if it is to continue serving the needs of business decision makers – particularly with the architects of the systems they use adopting the principles of data fabric design.
Talend likens data fabric to “a weave that is stretched over a large space that connects multiple locations, types, and sources of data, with methods for accessing that data. The data can be processed, managed, and stored as it moves within the data fabric. The data can also be accessed by or shared with internal and external applications for a wide variety of analytical and operation use cases for all organizations”.
Designed for intelligence
Data fabric is therefore a concept rather than a tangible ‘thing’: as well as managing data, it gives that data context. It enriches that context over time by linking and binding individual data points based on their relationship with other points in any of the data sets available to the organisation.
The ability for data to be processed within the fabric, rather than merely stored, is key. As Gartner explains, “think of data fabric like a human brain – that can store information (data and metadata from participating systems captured for graph analysis) and process information (the decision engines).”
As humans process the data they store in their brains, they enrich it through experience and calculation. Children who touch a hot kettle, for example, quickly learn not to do it again. The kettle could be considered a data point, and the act of touching it would be a process. The painful outcome of connecting one to the other – the thread, in this instance – is an additional data point that is generated within the data fabric and contextualises the other two data points.
“The ‘thread’ holding together data fabric is metadata,” explains AtScale’s Dave Mariani. “Metadata is created throughout the lifecycle of data – from capture, to preparation, modelling and serving. At each stage, metadata is both an input and an output of the tools interacting with the data pipeline. The ultimate goal of a mature data fabric is to create a fully autonomous data factory where each data ‘processor’ can leverage metadata created from other components in the ecosystem.”
In this respect, the threads themselves become valuable metrics, which describe and explain the data. As these metrics are generated through use, rather than being entered or imported, they are known as ‘inferred’ metadata. “With semantic layers, you have consistency in the way data is being analysed across an enterprise because of the uniform schema of the data fields,” says Mohamed Aslam, Senior Service Delivery Manager at Merit.
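To make ‘inferred’ metadata concrete, the sketch below derives a thread between two fields purely from a (hypothetical) query log: nobody entered the relationship, it emerges from how the data is used.

```python
from collections import Counter

# Sketch: 'inferred' metadata generated through use rather than
# entered by hand. The field names and query log are hypothetical.
query_log = [
    ("Monthly Revenue", "Order Date"),
    ("Monthly Revenue", "Customer Name"),
    ("Monthly Revenue", "Order Date"),
]

# Count how often each pair of fields is queried together.
co_use = Counter()
for fields in query_log:
    for a in fields:
        for b in fields:
            if a < b:  # count each unordered pair once
                co_use[(a, b)] += 1

# The most frequent pairing is a 'thread' linking the two fields,
# inferred purely from usage patterns.
print(co_use.most_common(1))
# [(('Monthly Revenue', 'Order Date'), 2)]
```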
A model that learns and improves
As the data grows, the systems within which it is employed can become more capable. Paul Warburg at Trifacta explains, “in a data fabric, the goal is for metadata to both connect interoperable components and serve as the barometer for the success of the data fabric and recommend areas of improvement… [and it] can even help automate data discovery tasks, depending on the unique needs of the organization, to accelerate a data asset’s time to value.”
Such a model is central to the concept of graph databases. We have demonstrated the benefits of this, explaining how “as humans, we already have a tendency to ‘picture’ ideas in a similar manner. We are all familiar with the family tree or a business organisation chart, which use lines to define the connections (called ‘edges’ in a graph database) between the people (data points, or ‘nodes’ in graph speak). Without the lines, the chart wouldn’t be either as rich or as informative, since they’re an integral part of the data being presented – just like the edges in a graph database.”
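The family-tree analogy can be sketched in a few lines: people are nodes, the connecting lines are edges, and the edges carry meaning of their own. The names and relationship labels below are purely illustrative.

```python
# Sketch of the family-tree analogy as a graph. People are nodes,
# the 'lines' on the chart are edges, and each edge carries its own
# information (the relationship type). All names are illustrative.
nodes = {"Alice", "Bob", "Carol"}
edges = [
    ("Alice", "Bob", "parent_of"),
    ("Alice", "Carol", "parent_of"),
]

def neighbours(node):
    """Follow the edges out of a node, returning (target, relation)."""
    return [(dst, rel) for src, dst, rel in edges if src == node]

print(neighbours("Alice"))
# [('Bob', 'parent_of'), ('Carol', 'parent_of')]
```

Remove the `edges` list and the nodes still exist, but, as with a chart stripped of its lines, most of the information is gone.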
Enabling just-in-time aggregation
When the threads between data points become data themselves, the potential for complexity increases. So, where business leaders are already being tasked with self-serving the queries that underpin their decision-making, this could lead to delay and, potentially, errors as users who are not trained data analysts work with more deeply textured resources.
A semantic layer thus remains essential in business analytics, as it can help identify and distinguish between a diverse range of data sources, wherever they may be located, and implement a standard language to describe both the raw data points and the threads that link them.
The semantic layer also allows organisations to preserve the data in its native state, rather than reformatting or otherwise transforming it to conform to the requirements of the immediate use case. Such transformation could reduce the data’s future utility, or mean that additional work is required to reformat it at a later date.
Effectively, the semantic layer acts as a lingua franca that doesn’t rewrite the original data, but parses it using terms the decision maker will understand in each context. This allows them to serve the needs of the business, rather than adapting those needs to fit within the constraints of the data they hold.
As Dave Mariani explains, “instead of aggregating data early for a specific use case, a semantic layer defines graph-based data relationships to build aggregations only when, and if, they are needed. This ‘late binding’ approach means that data is accessible to your business users and data scientists at whatever granularity they need. Even better, your data consumers never need to know where the data is or how to query it using its particular data platform dialect.”
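The ‘late binding’ idea can be sketched with deferred, cached aggregation: nothing is rolled up in advance, and a total is computed only when a consumer first asks for it at a given granularity. The data set and measure below are hypothetical.

```python
import functools

# Sketch of 'late binding': aggregates are built only when, and if,
# they are requested, then cached for reuse. Data is hypothetical.
ORDERS = [
    {"region": "EMEA", "amount": 120},
    {"region": "EMEA", "amount": 80},
    {"region": "APAC", "amount": 200},
]

@functools.lru_cache(maxsize=None)
def total_by(dimension):
    """Aggregate on demand at whatever granularity was requested."""
    totals = {}
    for row in ORDERS:
        key = row[dimension]
        totals[key] = totals.get(key, 0) + row["amount"]
    return totals

# Nothing is aggregated until this first call; a repeat call at the
# same granularity is served from the cache.
print(total_by("region"))
# {'EMEA': 200, 'APAC': 200}
```

The consumer asks for totals "by region" without knowing where `ORDERS` lives or what dialect queries it; the aggregation exists only because it was requested.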
As data itself becomes more intelligent, and the concepts of data fabric design look set to accelerate the rate at which new data is generated, this will become ever more important.