Written by Tomaz Lago, Data Engineer, on 04/11/2021
6-minute read
How to use your data smartly in open data initiatives
Extracting value from data is fundamental to offering the best experience and the right products and services through more accessible channels.
A world awash with shared data brings not just opportunities but a host of challenges. In this view, the focus shifts away from the advantage a company enjoys by holding information and toward transparency and customer comfort. In Brazil, some sectors are already making concrete progress toward open data, such as Open Banking and the still-incipient Open Insurance.
Although the movements linked to Open Finance are the ones in the spotlight, open data initiatives are gaining strength in several segments, extending beyond financial services and insurance. Regardless of the sector, extracting value from data to offer users the best experience and the right products and services through more accessible channels is fundamental to the success of these initiatives.
The business impact of these new rules is significant, and much of it is still unexplored or even undiscovered. Below I present some possible paths from a data engineering perspective, so that business teams can build smart solutions and stay competitive in open data.
Understand and consider the company’s current setting
As already mentioned, many approaches and possibilities with open data have not yet been discovered, as we are still learning to deal with this type of sharing. Thus, the first big challenge is building an architecture that can support any source, whatever it may be. Following this logic, a natural solution is to build a Data Lake: a large repository of structured and unstructured data that supports decision making.
The technological setting and the organizational culture are extremely important and must be taken into account when creating a data foundation that has a Data Lake at its core. The technological setting refers to the tools the company has and to its environment, whether on-premises or in the cloud. Tools that support multiple connectors, along with end-to-end flow orchestrators, are crucial to this work.
When we talk about organizational culture, we are not necessarily talking about a Data-Driven culture, since several departments may make decisions based on information while using only their own data. The cultural aspect goes a little further, and it starts with simple questions such as: “Are people open to working with other sources of information?” and “Would they know how to work with more sources of information?”
Once the technological options are mapped, along with how the main consumers of this new information format will receive it, it becomes possible to structure data pipelines, Data Lakes, DWs and Data Marts far more accurately. And that brings us to the next topic: governance.
DataOps and data governance
Today, governance is much closer to DataOps philosophy and practices than to IT management. And when it comes to DataOps practices, it is critical to recognize that data has become one of the most important assets within organizations, if not the most important one. In this sense, it is the responsibility of engineering to move and process the data, ensuring quality, security, integration and correct modeling, as well as constant optimization.
In other words, when I speak of governance, I am referring to DataOps best practices. In the open data universe, for example, information will arrive from many different sources, and some of it may be repeated or equivalent across them.
Multiple sources and the same information? This is precisely one of the scenarios we have with Open Finance and other open data movements, since each institution and system has its own way of registering and storing data. Therefore, data governance needs to provide guidelines on how to handle these cases, and to make those guidelines clear to end users. Once the end user has confidence in the process, everyone wins.
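As an illustration of such a guideline, here is a minimal sketch of reconciling the same customer received from two sources. Everything in it is hypothetical: the source names (`bank_a`, `insurer_b`), the `SOURCE_PRIORITY` rule, and the choice of a tax ID as the matching key are assumptions made for the example, not part of any Open Finance specification.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple

# Hypothetical governance guideline: lower number = more trusted source.
SOURCE_PRIORITY = {"bank_a": 0, "insurer_b": 1}


@dataclass
class CustomerRecord:
    tax_id: str
    name: str
    # Provenance: every source that reported this customer.
    sources: List[str] = field(default_factory=list)


def normalize_tax_id(raw: str) -> str:
    """Keep only digits so '123.456.789-09' and '12345678909' match."""
    return "".join(ch for ch in raw if ch.isdigit())


def reconcile(rows: Iterable[Tuple[str, str, str]]) -> Dict[str, CustomerRecord]:
    """Merge (source, raw_tax_id, name) rows that describe the same customer."""
    merged: Dict[str, CustomerRecord] = {}
    # Process the most trusted sources first so their values win.
    for source, raw_id, name in sorted(rows, key=lambda r: SOURCE_PRIORITY[r[0]]):
        key = normalize_tax_id(raw_id)
        record = merged.setdefault(
            key, CustomerRecord(tax_id=key, name=name.strip().title())
        )
        record.sources.append(source)
    return merged


rows = [
    ("insurer_b", "123.456.789-09", "MARIA SILVA "),
    ("bank_a", "12345678909", "maria silva"),
    ("bank_a", "98765432100", "joao souza"),
]
customers = reconcile(rows)
```

Keeping the list of sources on each merged record is one way to make the rule visible to end users: they can see both that a duplicate was detected and which source the final value came from.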
Deliveries according to the data consumer’s need
If the previous two points are in place, it becomes much easier to guarantee assertive deliveries to data consumers.
What is an assertive delivery in the context of data? It means delivering the dataset according to the consumer’s need: at the right update frequency and in a format that fits the reality of whoever will consume it, be it an API, a data science team, a system or an analyst. With a good foundation and data governance, whatever the endpoint format may be, the team will be fully able to make a high-quality delivery to the consumer.
When we have control over how information is captured and stored, we know how to handle multiple sources that carry similar, identical or equivalent information, and we can deliver data assertively, so that whoever consumes it can confirm that the process is correct. This is what gives business users the confidence to consider and analyze the information, enabling innovative solutions for customers and competitive advantages in the market.
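To make the idea of delivering at the right frequency and in the right format more concrete, here is a minimal sketch using only the Python standard library. The `DeliverySpec` structure, its field names, and the two supported formats are assumptions made for illustration; a real platform would likely express this through its orchestrator or a data contract.

```python
import csv
import io
import json
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List


@dataclass(frozen=True)
class DeliverySpec:
    """Hypothetical description of what one consumer needs."""
    consumer: str
    fmt: str            # "json" (e.g. for an API) or "csv" (e.g. for an analyst)
    max_age: timedelta  # how stale the delivered data is allowed to be


def render(rows: List[Dict[str, object]], spec: DeliverySpec) -> str:
    """Serialize the same dataset in the format the consumer asked for."""
    if spec.fmt == "json":
        return json.dumps(rows, sort_keys=True)
    if spec.fmt == "csv":
        if not rows:
            return ""
        buffer = io.StringIO()
        writer = csv.DictWriter(
            buffer, fieldnames=sorted(rows[0]), lineterminator="\n"
        )
        writer.writeheader()
        writer.writerows(rows)
        return buffer.getvalue()
    raise ValueError(f"unsupported format: {spec.fmt}")


def needs_refresh(last_refresh: datetime, now: datetime, spec: DeliverySpec) -> bool:
    """True when the last delivery is older than the consumer tolerates."""
    return now - last_refresh > spec.max_age


dataset = [{"account": "001", "balance": 10.5}]
api_spec = DeliverySpec("pricing_api", "json", timedelta(minutes=15))
analyst_spec = DeliverySpec("risk_analyst", "csv", timedelta(days=1))
```

The same dataset serves both consumers; only the specification changes. That separation is what lets the team add a new endpoint format without touching how the data is captured or governed.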
Constant evolutions to support solutions
When a solution is designed only for the short term, decisions may be made that reduce execution time but compromise quality and create technical debt. Some technical debt will always exist, however, since newer tools and methodologies are constantly being released in every area. Furthermore, those implementing the solutions may not have all the knowledge and resources needed to make a high-quality delivery.
That said, to ensure that your environment can support data-based decisions in a scalable and resilient way, it is necessary not only to invest in new solutions but also to maintain and constantly evolve those already delivered. As a good practice, every time you find technical debt in your environment, add it to your backlog so it is always on the team’s radar.
There are still many uncertainties within organizations about how to deal with shared data in open data initiatives, whether cultural, intellectual or technical. However, by following the steps above, I am absolutely certain that the environment will support any open data demands and that the stress common to this type of process will be minimal, favoring an environment in which data can become the basis for important business decisions.