Over the last decade, we have heard a lot about Big Data and how technological advances have driven data generation year after year. I recall an IBM study, published in 2013, with the curious finding that 90% of all data in human history had been generated in the two years preceding its publication.
The term Big Data, framed in the early 2000s by Doug Laney, then an analyst at META Group (later acquired by Gartner), refers to datasets too large to be analyzed with traditional Business Intelligence tools and is characterized by the "3 V's" family:
- Volume: Refers to the vast amount of data generated and stored, potentially reaching the scale of exabytes (1 exabyte equals 1 billion gigabytes).
- Velocity: Relates to the high speed at which data is generated through various sources, including sensors, mobile phones, streaming, social networks, IoT, hardware, among others.
- Variety: Data comes in different formats: structured, as in traditional tabular databases, and unstructured, such as images, audio, video, and text.
As the discussion matured, other terms were added to the "3 V's" family, such as Veracity, Variability, Visualization, and Value. In addition, the emergence of distributed computing and technologies like Hadoop enabled significant advances in the analysis of these large data volumes.
However, few organizations can truly structure themselves to leverage the benefits of this practice. Two factors in particular raised red flags and led organizations to reconsider their data strategies: high investments in costly databases that often add no value to the analysis, and an excessive number of hours spent by highly qualified professionals on repetitive standardization and cleaning tasks to build models that generate no practical results. In 2021, Gartner predicted that 70% of organizations would shift their focus from Big Data to Small and Wide Data by 2025.
We can define Small Data as data whose format and volume are suitable for human comprehension, that does not require complex analysis, and that can be stored on a standard server or computer. Here, the focus is on the quality of the data collected, not the volume. This does not prevent the strategy from giving organizations a comprehensive view of their business: by planning the systematic, automated, and organized collection of key information, we can gather data from virtually any area of the company.
Although it does not require advanced knowledge of cloud platforms, distributed computing, or machine learning, handling Small Data still takes a certain level of expertise. A common practice in many organizations is for managers and analysts to download numerous spreadsheets and CSV files from different systems and integrate everything by hand in spreadsheets. This laborious, repetitive process is prone to errors that compromise the reliability of the final indicators.
Identifying data sources and automating the extraction, cleaning, integration, and updating of this data can be considered the foundation that supports an organization's entire Business Intelligence structure. These processes are part of data engineering.
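To make the idea concrete, here is a minimal sketch of automating the cleaning and integration of two CSV exports, using only the Python standard library. The column names (`customer_id`, `amount`, `name`), the sample data, and the helper functions `load_rows` and `integrate` are hypothetical, chosen only for illustration; real exports will differ.

```python
import csv
import io


def load_rows(csv_text):
    """Parse one CSV export and normalize header names and values."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for raw in reader:
        # Exports from different systems rarely agree on casing or spacing,
        # so trim and lowercase the column names and trim the values.
        rows.append({key.strip().lower(): (value or "").strip()
                     for key, value in raw.items()})
    return rows


def integrate(sales_rows, crm_rows):
    """Total revenue per customer, enriched with the customer name."""
    names = {row["customer_id"]: row["name"] for row in crm_rows}
    totals = {}
    for row in sales_rows:
        cid = row["customer_id"]
        totals[cid] = totals.get(cid, 0.0) + float(row["amount"])
    return [
        {"customer_id": cid, "name": names.get(cid, "UNKNOWN"), "revenue": total}
        for cid, total in sorted(totals.items())
    ]


# Hypothetical exports: note the inconsistent headers and stray spaces.
SALES_CSV = "Customer_ID, Amount\n101, 250.00\n102, 99.50\n101, 50.00\n"
CRM_CSV = "customer_id,Name\n101,Acme Ltd\n102,Beta SA\n"

report = integrate(load_rows(SALES_CSV), load_rows(CRM_CSV))
for line in report:
    print(line)
```

In practice the text would come from files or system exports rather than inline strings, but the same normalize-then-join steps replace the manual spreadsheet work.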
Once this foundation is solid, it is time to devise the best strategy to make the most of this previously hidden treasure. Understanding the business context and the activities of each area, knowing the profile of the solution's end users, and applying programming skills for data analysis all come into play when creating custom Business Intelligence solutions such as dashboards, push notifications, and reports.
These solutions, in turn, need to be adopted and promoted by people across the organization to build confidence among business users. Some uses of this information include:
- Supporting Data-Driven Management: With reliable, secure, and easily accessible data, decisions based on assumptions give way to decisions grounded in concrete facts and evidence. This type of management enables a cycle of creating action plans, monitoring their outcomes through indicators designed for each situation, and learning which initiatives generated the most results.
- Defining and Monitoring Key Indicators: From dashboards for senior management that encompass macro indicators from all areas to simpler models that support operations.
- 360° View of the Business: Integrating data from different systems and areas of an organization (Commercial, Marketing, Logistics, Operations, Finance, Human Resources, Purchasing, S&OP, etc.).
- Analyzing and Improving Processes: Identifying process improvement opportunities, measuring and monitoring efficiency, automating them, and establishing new management practices.
According to reports from companies like McKinsey & Company and Forbes, we are approaching a new era focused on verified, well-defined data that is relevant to the business and ready to be analyzed and incorporated into predictive models: the era of Smart Data.
Here, the focus is on optimizing the generation of insights and information to achieve business results, whether by using Small Data efficiently or by refining Big Data to extract what truly adds value.
We observe a shift from a strategy that attempts to embrace a world of data without clear objectives (in the style of “boiling the ocean”) to one that seeks good results with what is already available. The expression “getting your house in order before trying to change the world” also applies to the business context. The starting point is to survey the main systems and digital data sources the organization has and to observe how this data is structured and made available, whether through an Excel export or direct access to a database (even basic SQL knowledge enables this).
After this initial assessment, the challenge is to automate the data extraction, transformation, and loading process, integrating all data in a very organized, secure, and practical manner. Once this support structure is established, it is time to start thinking about the strategy for using this valuable resource.
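As an illustration only, the extraction, transformation, and loading steps could be sketched as follows with Python's built-in `sqlite3` module. The `orders` table, its columns, the sample records, and the `run_etl` helper are assumptions made for this example, not a prescribed design.

```python
import sqlite3


def run_etl(raw_orders, conn):
    """Clean raw (order_id, area, amount) records and load them into SQLite."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        "order_id INTEGER PRIMARY KEY, area TEXT NOT NULL, amount REAL NOT NULL)"
    )
    cleaned = []
    for order_id, area, amount in raw_orders:
        if amount is None:
            continue  # Transform: skip records with no amount.
        # Transform: standardize types and the spelling of the area name.
        cleaned.append((int(order_id), area.strip().title(), float(amount)))
    with conn:  # Load: one transaction, committed on success.
        # INSERT OR REPLACE keyed on order_id keeps reruns from duplicating rows.
        conn.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", cleaned)
    return len(cleaned)


# Extract: hypothetical records as they might arrive from a system export.
raw = [
    ("1", " finance ", "120.5"),
    ("2", "LOGISTICS", "80"),
    ("3", "finance", None),
    ("4", "Finance", "30"),
]

conn = sqlite3.connect(":memory:")
loaded = run_etl(raw, conn)

# Basic SQL is enough to turn the loaded data into an indicator per area.
totals = conn.execute(
    "SELECT area, SUM(amount) FROM orders GROUP BY area ORDER BY area"
).fetchall()
print(loaded, totals)
```

Keying the load on `order_id` is one simple way to make the job safe to schedule and rerun, which matters once the process is automated rather than manual.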
From this point, the most diverse and unique solutions can be created, influenced by the desired perspectives, needs, people, management models, culture, and other unique characteristics of each organization. Some companies may start with a “Small & Wide” strategy to gain a 360° view of the key indicators in their organization.
Others may prefer to monitor and improve processes in a specific area, eliminating costly rework related to inconsistent reports and information compilation. In some cases, they may even pursue large volumes of data to apply a predictive model and attempt to solve a well-defined problem.
Have questions about how to start this journey? Count on equal | BI & Data Engineering to support your organization in this transformation.