Written by Tomaz Lago, Data Engineer
5-minute read
How to use your data intelligently to boost your business value
Extracting value from data has become essential to offering the best experience and the right products and services through channels that are more accessible to the user.
A world flooded with shared data brings not only challenges but also many opportunities. From this perspective, the focus is no longer on the advantage a company gains by holding data, but on transparency and convenience for the customer. In Brazil, some sectors are already making concrete progress in the open data movement, such as Open Banking and the incipient Open Insurance.
Although developments related to Open Finance are in the spotlight, initiatives that use available data to make better decisions are gaining strength in several segments beyond the niches of financial services and insurance companies. In this sense, regardless of the sector, extracting value from data to offer the best experience and the right products and services, through channels that are more accessible to the user, becomes essential to the success of such initiatives.
The business impacts of these new rules are huge; many are still unexplored, others undiscovered. Below, I offer some possible paths in the data engineering context, so that business teams can take advantage of open data to create intelligent solutions and remain competitive.
Understand and consider the current context of the company
As already mentioned, many of the approaches and possibilities of open data have not yet been discovered, since we are still learning how to deal with this type of sharing. Thus, the first major challenge is to build an architecture that can support any and all sources, whatever they may be. Following this logic, a natural solution is the construction of a Data Lake: a large repository of structured and unstructured data that can be used for decision making.
The technological context and the organizational culture are extremely important points to take into account when creating a data foundation that has a Data Lake at the core of its information. Technological context refers to the tools the company has and its environment, be it on-premises or in the cloud. Tools that support multiple connectors and end-to-end flow orchestrators are crucial for this task.
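As a minimal sketch of that idea, assume a local directory standing in for the lake's raw zone and two hypothetical sources (a partner API and a legacy CSV export); the landing step stays agnostic to where the data comes from, and in practice an orchestrator would schedule these steps end to end:

```python
import json
from datetime import date
from pathlib import Path

# Local folder standing in for the Data Lake's raw zone (it could be S3, ADLS, GCS, HDFS...).
RAW_ZONE = Path("datalake/raw")

def land_raw(source_name: str, records: list) -> Path:
    """Write records exactly as received, partitioned by source and ingestion date."""
    target = RAW_ZONE / source_name / f"ingestion_date={date.today().isoformat()}"
    target.mkdir(parents=True, exist_ok=True)
    path = target / "data.json"
    path.write_text(json.dumps(records, ensure_ascii=False, indent=2))
    return path

# Any connector (API, database, file drop) only needs to hand over a list of records.
land_raw("partner_api", [{"customer_id": "123", "product": "auto insurance"}])
land_raw("legacy_csv", [{"customer_id": "123", "product": "seguro auto"}])
```

Keeping the landing step this generic is what lets the same foundation absorb whatever new open data source appears next.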
When we talk about organizational culture, we are not necessarily talking about a Data-Driven culture, as we may have several departments that make decisions based on information but use only their own data. We have to be aware that the cultural aspect goes a little further, with simple questions such as: “Are people open to working with other sources of information?” and “Would they know how to work with more sources of information?”
Once the technological options are mapped and you know how the main consumers will receive this new information format, it becomes fully possible to structure data pipelines, Data Lakes, DWs, and Data Marts more accurately. And that brings us to the next topic: governance.
DataOps and Data Governance
Currently, governance is much closer to a DataOps philosophy and IT management practices. And when it comes to DataOps practices, it is essential to be aware that data has become one of the most important assets within organizations, if not the most important. In this sense, we understand that engineering is responsible for moving and processing data, ensuring quality, security, correct integration, and modeling, as well as constant optimization.
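“Ensuring quality”, for instance, can be as concrete as validating a batch before promoting it from the raw zone to the curated zone. A minimal sketch, with illustrative rules and a hypothetical batch:

```python
import pandas as pd

def basic_quality_checks(df: pd.DataFrame) -> list:
    """Return the problems found; an empty list means the batch can be promoted."""
    problems = []
    if df.empty:
        problems.append("batch is empty")
    if df["customer_id"].isna().any():
        problems.append("null customer_id found")
    if df.duplicated(subset="customer_id").any():
        problems.append("duplicated customer_id found")
    return problems

# Hypothetical batch arriving from the raw zone.
batch = pd.DataFrame([
    {"customer_id": "123", "balance": 1000.0},
    {"customer_id": None, "balance": 250.0},
])

issues = basic_quality_checks(batch)
if issues:
    print("Blocking promotion to the curated zone:", issues)
```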
That is, when I speak of governance, I am referring to good DataOps practices. In the open data universe, for example, information will be connected and received from several different sources; however, that information may be repeated or equivalent.
Multiple sources and the same information? This is exactly one of the scenarios we have with Open Finance and other open data movements, since each institution and system has a different way of recording and storing it. Therefore, within the data governance scenario, there is a need to provide guidelines on how to handle these cases, and to make this clear to end users. Once the end user gains confidence in the process, everyone wins.
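A minimal sketch of one such guideline, assuming two hypothetical extracts that describe the same customer under different field names and conventions; the canonical schema, the normalization, and the precedence rule below are illustrative, not a standard:

```python
import pandas as pd

# Hypothetical extracts from two institutions describing the same customer.
bank_a = pd.DataFrame([{"document": "123.456.789-00", "name": "MARIA SILVA", "updated_at": "2023-05-01"}])
bank_b = pd.DataFrame([{"cpf": "12345678900", "full_name": "Maria Silva", "updated_at": "2023-06-10"}])

# Guideline 1: map every source to a single canonical schema.
bank_a = bank_a.rename(columns={"document": "cpf", "name": "full_name"})

# Guideline 2: normalize the matching key and the comparable fields.
combined = pd.concat([bank_a, bank_b], ignore_index=True)
combined["cpf"] = combined["cpf"].str.replace(r"\D", "", regex=True)
combined["full_name"] = combined["full_name"].str.title()

# Guideline 3: when sources disagree, keep the most recently updated record.
deduplicated = combined.sort_values("updated_at").drop_duplicates(subset="cpf", keep="last")
print(deduplicated)
```

Whatever the rule chosen, documenting it and exposing it to the end user is what turns two conflicting records into one trustworthy answer.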
Deliveries according to data consumers' needs
If the two previous points are in place, it becomes much easier to guarantee accurate deliveries to data consumers.
What does an accurate delivery mean in a data context? It means delivering the dataset according to the consumer's needs: at the correct update frequency and in a format that fits the consumer's reality, be it an API, a data science team, a system, or an analyst. Therefore, with a good data foundation and governance, regardless of the endpoint format, the team will be fully able to make a high-quality delivery to the consumer.
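As a minimal sketch, assume a hypothetical curated dataset and local files standing in for the real endpoints; the formats (and the Parquet export, which requires pyarrow) are illustrative:

```python
from pathlib import Path
import pandas as pd

# Hypothetical curated dataset coming out of the governed layer.
curated = pd.DataFrame([
    {"customer_id": "123", "product": "auto insurance", "monthly_premium": 89.90},
    {"customer_id": "456", "product": "life insurance", "monthly_premium": 45.00},
])

Path("deliveries").mkdir(exist_ok=True)

# Same data, different deliveries, each matching its consumer's reality.
curated.to_json("deliveries/products_api.json", orient="records")  # API: JSON payload
curated.to_parquet("deliveries/products_science.parquet")          # data science team: columnar file
curated.to_csv("deliveries/products_analyst.csv", index=False)     # analyst: spreadsheet-friendly export

# The update frequency is part of the contract too: the API export might run hourly
# and the analyst export daily, scheduled by the same orchestrator as the pipelines.
```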
When you have control over how information is collected and stored, you know how to handle it when connecting multiple sources (similar, identical, or equivalent), and delivering data accurately to those who will consume it confirms that the process is correct. This is what gives business users the confidence to consider and analyze the information, enabling them to innovate solutions for customers and create competitive advantages in the marketplace.
Constant evolution to support the solutions
When a solution is conceived only for the short term, decisions may be made that shorten execution time but compromise quality and open the door to technical debt. Some technical debt will always exist in a technological context, since more modern tools and methodologies are constantly being released in every area. In addition, whoever implements the solutions may not have all the knowledge and resources needed to make a high-quality delivery.
That said, to ensure that your environment can support data-based decisions in a scalable and resilient way, you need not only to invest in new solutions but also to maintain and constantly evolve the ones already delivered. As a good practice, every time you find technical debt in your environment, add it to your backlog so that it stays on the team's radar.
There are still many uncertainties within organizations about how to deal with shared data in open data initiatives, whether in cultural, intellectual, or technical aspects. However, by following the steps above, I am confident that your environment will support any open data demand and that the stress common to this type of process will be minimal, favoring an environment in which data can become the basis for important business decisions.
Also read: Data Engineer: why is this professional so relevant in a Data Science project?