
Data & Big Data architecture

The data / big data architecture is essential for managing the collection of raw data, which may be more or less structured, may arrive in varying volumes, and may come from different sources (internal or external). Once this inventory is complete, the storage infrastructure must then be created and optimized.

We create secure and flexible data architectures, whether cloud-based, on-premise or hybrid, that promote the use of high-quality, relevant and accessible data. Designed to grow with your business, a robust data architecture supports your analytics needs, including business intelligence, data science, custom applications and regulatory reporting.

We build data warehouses on modern platforms using proven techniques to provide a central, governed location for structured and semi-structured data assets. We advise you on where your data should reside: in a data warehouse, a data lake, or a combination of both.

We can migrate your data assets to a modern, scalable cloud-based database platform, such as Snowflake or one of the database platforms available on AWS and Azure.

Your customized migration plan will include platform set-up and configuration, technical migration details for all environments, and training and commissioning procedures.

Get an assessment of your existing Microsoft Azure or AWS environments for operational excellence, security, reliability, performance efficiency and cost optimization. We'll provide you with detailed recommendations and best practices to improve in these five areas and get the most out of your investment.

For a Data Lake to reveal the full potential of your data, you need a clear, standardized view of your data sources.

Mastering these flows is a crucial first line of defense to ensure the proper exploitation of inherently heterogeneous data.

You should also be very vigilant about data security and the organization of your data.

  • IT systems urbanization anticipates the place the Data Lake will occupy in the application landscape and in the organization of the information system.
  • Data modeling and the establishment of reference frameworks keep you in control of your Data Lake and prevent the lake (Data Lake) from turning into a swamp (Data Swamp); a minimal sketch of such a framework follows this list.
  • Change management focused on data and usage turns the innovative ideas that emerge from your Data Lake into real competitive advantages for the organization.
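
To make the second point concrete, here is a minimal sketch in Python of what a lightweight reference framework might look like: each dataset entering the lake is registered with an owner and an expected schema, and incoming records can be checked against it. The DatasetEntry and DataCatalog names, and all field names, are hypothetical illustrations rather than a specific product or our actual tooling.

```python
from dataclasses import dataclass

# Hypothetical, minimal reference framework: every dataset landing in the
# lake is registered with an owner and an expected schema, so the lake
# stays navigable instead of drifting into a swamp.

@dataclass
class DatasetEntry:
    name: str
    owner: str
    schema: dict[str, type]          # expected column -> Python type
    description: str = ""

class DataCatalog:
    def __init__(self) -> None:
        self._entries: dict[str, DatasetEntry] = {}

    def register(self, entry: DatasetEntry) -> None:
        self._entries[entry.name] = entry

    def validate(self, name: str, record: dict) -> list[str]:
        """Return a list of schema violations for one incoming record."""
        entry = self._entries[name]
        issues = []
        for column, expected in entry.schema.items():
            if column not in record:
                issues.append(f"missing column: {column}")
            elif not isinstance(record[column], expected):
                issues.append(f"{column}: expected {expected.__name__}")
        return issues

catalog = DataCatalog()
catalog.register(DatasetEntry(
    name="sensor_readings",
    owner="production-it",
    schema={"sensor_id": str, "timestamp": str, "value": float},
))
print(catalog.validate("sensor_readings",
                       {"sensor_id": "TT-101", "value": "12.3"}))
# ['missing column: timestamp', 'value: expected float']
```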

What modules are in the Data Lake ecosystem? 

Other structural elements of your IS must be taken into account in the urbanization of the Data Lake:

  • The "data catalog," associated repositories, and their lifecycles,
  • The orchestration of processes surrounding the management of the creation, evolution, or disappearance of sources and destinations,
  • The physical transport of data, management of transaction integrity and uniqueness, error recovery...
  • Data normalization must find its place around a Data Lake that promotes raw original data. Pushing it downstream in the processing chain or coexisting old and new chains in parallel, the choices depend on constraints and expectations.
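
As a hedged illustration of the transport concerns in the third point, the Python sketch below moves a batch of records with retries on transient failures and deduplicates on the destination side to preserve uniqueness. The transfer_batch and IdempotentDestination names are invented for this example; a real pipeline would typically sit on an orchestration tool.

```python
import time

# Sketch of the transport concerns listed above: a batch of records is
# retried on transient failure (at-least-once delivery) and deduplicated
# on the destination side via a record key (uniqueness).

def transfer_batch(records, destination, max_retries=3):
    """Push records to a destination, retrying on transient errors."""
    for attempt in range(1, max_retries + 1):
        try:
            destination.write(records)
            return
        except ConnectionError:
            if attempt == max_retries:
                raise                      # error recovery exhausted
            time.sleep(2 ** attempt)       # exponential backoff

class IdempotentDestination:
    """Destination that ignores records it has already seen."""
    def __init__(self):
        self._seen: set[str] = set()
        self.stored: list[dict] = []

    def write(self, records):
        for rec in records:
            if rec["id"] not in self._seen:   # enforce uniqueness
                self._seen.add(rec["id"])
                self.stored.append(rec)

dest = IdempotentDestination()
batch = [{"id": "r1", "value": 1.0}, {"id": "r2", "value": 2.0}]
transfer_batch(batch, dest)
transfer_batch(batch, dest)                   # replay after a failure
print(len(dest.stored))                       # 2, duplicates ignored
```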

The value of a specialized data lake, or "Process Data Lake," for industrial data

Let's revisit the appealing idea of standardization: integrating production tool data into a single, enterprise-wide data lake. Naturally, it brings a number of advantages:

  • Only one data lake architecture to manage;
  • Centralized storage of data;
  • Standard tools for all business areas within the company.

However, as we have seen, the uses of production process data have a number of characteristics that can make this approach less attractive:

  • An infrastructure and data architecture adapted neither to the types of data processed nor to the performance levels required for the expected uses.
  • Data processing needs that require numerous specific developments to build the expected information, at the risk of ending up with a complex, hard-to-maintain system.
  • A lack of business tools for the specific needs of production, quality, and industrial process teams... and, potentially, the need for specific developments to cover them.
  • The complexity of managing one large system, rather than dividing it into more coherent and easier-to-manage sets.
  • Ultimately, high implementation and maintenance costs, and long deployment times before the needs are successfully met.

We observe recurring patterns in the structure of industrial data. In particular, we find the following kinds of data (a sketch of these structures follows this list):

  • Time-series data (sensors, online controls...);
  • Data related to batches, operations, campaigns, and cycles (recipes, quality controls, indicators, teams, tools...);
  • Event data (scheduled or unscheduled shutdowns, alerts, tooling changes, consumable changes...);
  • Traceability data (where, when, and how an operation or batch was carried out) and genealogy data (how the various work units are linked together, how an operation uses one or more batches from previous operations...).
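
The sketch below, in Python, gives one possible shape for each of these four families; every field name is an illustrative assumption, not a fixed standard.

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical shapes for the four recurring families of industrial data.

@dataclass
class TimeSeriesPoint:          # sensors, online controls
    sensor_id: str
    timestamp: datetime
    value: float

@dataclass
class BatchRecord:              # batches, operations, campaigns, cycles
    batch_id: str
    operation: str
    recipe: str
    team: str
    started: datetime
    ended: datetime

@dataclass
class Event:                    # shutdowns, alerts, tooling changes
    event_type: str
    timestamp: datetime
    planned: bool

@dataclass
class GenealogyLink:            # how work units are linked together
    parent_batch: str           # input batch from a previous operation
    child_batch: str            # batch produced by the current operation
    operation: str
```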

The way this data is processed also differs from one type to another.

These data are therefore very different from those of other business areas within the company, most of which are transactional.

Storing time-series data or traceability data, for example, requires different approaches to meet performance, cost, and usage requirements.
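
As a hedged illustration of the traceability side, the sketch below stores genealogy links in a small relational table and walks a batch's ancestry with a recursive SQL query, something a store optimized purely for time series handles poorly. The table and batch identifiers are invented for the example.

```python
import sqlite3

# Genealogy fits a relational model: batch ancestry can be walked with a
# recursive query over parent/child links.

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE genealogy (parent_batch TEXT, child_batch TEXT);
INSERT INTO genealogy VALUES
  ('RAW-01', 'MIX-10'),
  ('RAW-02', 'MIX-10'),
  ('MIX-10', 'PACK-77');
""")

# All upstream ancestors of the finished batch PACK-77.
rows = conn.execute("""
WITH RECURSIVE ancestors(batch) AS (
    SELECT parent_batch FROM genealogy WHERE child_batch = 'PACK-77'
    UNION
    SELECT g.parent_batch FROM genealogy g
    JOIN ancestors a ON g.child_batch = a.batch
)
SELECT batch FROM ancestors;
""").fetchall()

print([r[0] for r in rows])   # ['MIX-10', 'RAW-01', 'RAW-02']
```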

What we call the "Process Data Lake" is a data lake specialized for industrial data, accompanied by business tools that provide quick, highly operational responses to needs.

It is a framework suited to the storage, processing, and use of industrial data, and it offers:

  • An architecture adapted to industrial data and its uses: as explained earlier, a juxtaposition of different types of databases meeting the constraints of time-series data (NoSQL databases optimized for time series) as well as those of traceability data (relational databases allowing genealogy management).
  • Performance in line with users' processing needs: this follows directly from the way the data has been structured and stored according to its specific characteristics.
  • Rapid deployment: the treatments most frequently used in the process industries mentioned above (resampling, extrapolation, aggregations, material balance calculations, yield, genealogy construction...) are already pre-configured; a minimal sketch of two of them follows this list.
  • Controlled implementation and operating costs: you benefit from the experience accumulated with multiple industrial clients, and we handle maintenance and evolution through our SaaS architecture.
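
To illustrate the kind of pre-configured treatment mentioned in the third point, here is a minimal Python sketch of two of them: resampling an irregular sensor signal onto a regular grid, and computing a simple per-batch yield. Column names and the one-minute grid are assumptions made for the example, not a description of our actual implementation.

```python
import pandas as pd

# Irregular sensor readings -> regular 1-minute grid (resampling).
readings = pd.Series(
    [20.1, 20.4, 20.9, 21.3],
    index=pd.to_datetime([
        "2024-05-01 08:00:10", "2024-05-01 08:00:50",
        "2024-05-01 08:01:40", "2024-05-01 08:02:20",
    ]),
)
regular = readings.resample("1min").mean().interpolate()  # fill any gaps
print(regular)

# Material balance / yield: output mass over input mass per batch.
batches = pd.DataFrame({
    "batch_id": ["MIX-10", "PACK-77"],
    "input_kg": [1000.0, 940.0],
    "output_kg": [940.0, 915.0],
})
batches["yield_pct"] = 100 * batches["output_kg"] / batches["input_kg"]
print(batches)
```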