Benefits of a specialized “Process Data Lake” for industrial data
What about a global data lake for management and industrial data?
Let’s revisit the appealing idea of standardization: integrating production tool data into a single data lake shared by the entire enterprise. This approach offers several advantages:
- Managing a single data lake architecture
- Centralized storage of data
- Standardized tools for all business functions within the organization
However, as we have seen, production process data is used in ways that can make this approach less attractive:
- The infrastructure and data architecture may not be suitable for the data types and performance requirements necessary for the intended uses.
- Processing the data requires a large amount of custom development to perform the necessary transformations, which can result in a complex, hard-to-maintain system.
- There are no industry-specific tools for the needs of production, quality, and industrial processes, so additional custom development may be required.
- The data has to be managed as one large, complex whole rather than divided into more coherent and manageable subsets.
- Ultimately, implementation and maintenance costs are high and deployment times are long.
Industrial data sets share common structural elements, including:
- Temporal data (sensors, online controls, etc.)
- Data related to batches, operations, campaigns, cycles (recipes, quality controls, indicators, teams, tools, etc.)
- Event-related data (planned or unplanned shutdowns, alerts, tool or consumable changes, etc.)
- Traceability data (where, when, and how an operation or batch was performed) and genealogy data (how different units of work are interconnected, how an operation involves one or multiple batches from previous operations).
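As a rough illustration of the categories listed above, the following Python sketch models them as simple record types. All class, field, and function names here are hypothetical and chosen only for this example; they do not describe any particular product's schema.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class SensorReading:
    """Temporal data: one timestamped value from a sensor or online control."""
    tag: str
    timestamp: datetime
    value: float


@dataclass
class Batch:
    """Batch-level data: recipe, quality results, and genealogy links."""
    batch_id: str
    recipe: str
    start: datetime
    end: datetime
    quality: dict[str, float] = field(default_factory=dict)
    parent_ids: list[str] = field(default_factory=list)  # batches consumed by this one


@dataclass
class Event:
    """Event data: a shutdown, alert, or tool change with a time window."""
    kind: str
    start: datetime
    end: datetime
    planned: bool = False


def readings_for_batch(readings, batch):
    """Select the sensor readings recorded while the batch was running."""
    return [r for r in readings if batch.start <= r.timestamp <= batch.end]
```

The helper at the end shows why these categories are usually handled together: a typical question ("what did the sensors read during this batch?") joins temporal data with batch data through their time windows.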
Data lakes for industrial data – Specificities
Each of these data types is processed differently.
Industrial data is therefore very different from the data in other business areas, most of which is transactional.
Storing temporal data or traceability data, for example, requires different approaches to meet performance, cost, and usage requirements.
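To make the contrast concrete, here is a minimal sketch of the two access patterns. It uses an in-memory SQLite database for both, purely to keep the example self-contained; this is an assumption of the sketch, and in practice the time series would more likely live in a store optimized for that purpose. Table names, column names, and batch identifiers are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE readings (tag TEXT, ts INTEGER, value REAL);
    CREATE INDEX idx_readings ON readings (tag, ts);
    CREATE TABLE genealogy (parent_id TEXT, child_id TEXT);
""")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [("TEMP_01", t, 80.0 + t) for t in range(10)],
)
conn.executemany(
    "INSERT INTO genealogy VALUES (?, ?)",
    [("RAW-1", "MIX-1"), ("RAW-2", "MIX-1"), ("MIX-1", "FIN-1")],
)


def readings_between(tag, t0, t1):
    """Temporal access pattern: a range scan over one tag, ordered by time."""
    rows = conn.execute(
        "SELECT ts, value FROM readings "
        "WHERE tag = ? AND ts BETWEEN ? AND ? ORDER BY ts",
        (tag, t0, t1),
    )
    return rows.fetchall()


def ancestors(batch_id):
    """Traceability access pattern: walk the genealogy graph upstream."""
    rows = conn.execute(
        """
        WITH RECURSIVE up(id) AS (
            SELECT parent_id FROM genealogy WHERE child_id = ?
            UNION
            SELECT g.parent_id FROM genealogy g JOIN up ON g.child_id = up.id
        )
        SELECT id FROM up ORDER BY id
        """,
        (batch_id,),
    )
    return [row[0] for row in rows]
```

The first query reads a contiguous slice of many similar rows, while the second follows relationships of unknown depth. These are different workloads, which is why a single storage layout rarely serves both well.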
The “Process Data Lake” for industrial data
The “Process Data Lake” is a specialized data lake designed for industrial data, accompanied by industry-specific tools that meet operational needs quickly.
It provides an environment tailored to the storage, processing, and utilization of industrial data, offering:
- An architecture suited to industrial data and its uses: it combines different types of databases to meet the requirements of time series data (NoSQL databases optimized for time series) as well as traceability data (relational databases for genealogy management).
- Performance aligned with users' processing needs, which depends directly on how the data is structured and stored according to its specific characteristics.
- Rapid deployment, as the most frequent data processing tasks in process industries (resampling, extrapolation, aggregation, material balance calculations, yield calculations, genealogy construction, etc.) are already pre-configured.
- Controlled implementation and operating costs, thanks to experience accumulated with multiple industrial clients and to ongoing maintenance and evolution provided through our SaaS architecture.
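To give a sense of the kind of processing tasks mentioned above, the sketch below shows a simple version of two of them, resampling with aggregation and a yield calculation, in plain Python. It is an illustration only: the function names and the mass-based definition of yield are assumptions made for this example, not a description of how the pre-configured tasks are actually implemented.

```python
from collections import defaultdict
from statistics import mean


def resample_mean(samples, bucket_seconds):
    """Average (timestamp_seconds, value) samples over fixed-width time buckets.

    Returns a dict mapping each bucket's start time to the mean of its values.
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        # Align each timestamp to the start of its bucket.
        buckets[ts - ts % bucket_seconds].append(value)
    return {start: mean(values) for start, values in sorted(buckets.items())}


def batch_yield(input_mass_kg, output_mass_kg):
    """Yield as the fraction of input mass recovered as product (assumed definition)."""
    if input_mass_kg <= 0:
        raise ValueError("input mass must be positive")
    return output_mass_kg / input_mass_kg
```

For example, resample_mean([(0, 10.0), (30, 20.0), (60, 30.0), (90, 50.0)], 60) groups the four samples into two one-minute buckets and averages each. Having such operations available out of the box is what removes the need to redevelop them for every project.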