Data & Big Data Architecture

A data/big data architecture is essential for managing the collection of raw data, which may be structured to varying degrees, arrive in varying volumes, and come from different sources (internal or external). Once this inventory is complete, the storage infrastructure must be created and optimized.

Your Data & Big Data architecture concerns

  • Develop new growth drivers by taking control of your information capital.
  • Bring out new needs and use cases by capitalizing on the company’s data.
  • Improve the analysis of information using a large volume of data and a wide variety of data sources.
  • Govern your data in compliance with applicable rules and regulations (especially the GDPR).
  • Guarantee data quality and security in a Data Lake context.
  • Leverage linear scalability to control the costs of data growth.

A modern data architecture eases the access to – and promotes usage of – large volumes of data from traditional and non-traditional sources.

We build secure and adaptable architectures to help customers overcome these common challenges that arise from rigid or aging architectures:

  • Inability to address business requirements quickly enough
  • Inability to process data in real-time or near real time
  • Difficulty handling big data (huge volumes, streaming, and multitude of data sources and types)
  • Discrepancies in how data is gathered, processed, and used
  • Lack of the infrastructure needed to support advanced analytics

Our Data Architecture Services

Enterprise Data Architecture

Cloud-based, on-premises, or hybrid – we build secure and flexible data architectures that promote the use of high-quality, relevant, and accessible data. Built to grow along with your business, a solid data architecture supports your analytics needs, including business intelligence, data science, custom applications, and regulatory reporting.

Data Warehouses and Data Marts

We build data warehouses on modern platforms using tried-and-true techniques to provide a central, governed location for structured and semi-structured data assets. We advise where your data should reside – in a data warehouse, a data lake, or a combination of the two.

Cloud Data Migrations

We can migrate your data assets to a modern, scalable cloud-based database platform such as Snowflake or any of the database platforms available on AWS and Azure. Your custom migration plan will include stand-up and configuration of the platform with technical migration details for all environments, training, and go-live procedures.

Platform Health Checks

Get an evaluation of your existing Microsoft Azure or AWS environments for operational excellence, security, reliability, performance efficiency, and cost optimization. We’ll provide detailed recommendations and guiding best practices to improve on these five areas and get the most from your investment.

A Data Lake without enterprise architecture is a leap into the unknown.

A Data Lake unlocks the full potential of your data, but to achieve this, you need to have a clear and standardized vision of data sources.

Mastering these data flows is a crucial first step to ensure the proper utilization of diverse data.

Furthermore, you must be vigilant in data security and data organization.

The essential contributions of enterprise architecture to the value and success of a Data Lake fall into three distinct areas:

  • Information system urbanization (planning of the enterprise IT landscape), which lets you anticipate the role the Data Lake will play in the application landscape and in the organization of the information system.
  • Data modeling and the establishment of repositories to maintain control of your Data Lake and prevent it from transforming into a Data Swamp.
  • Change management activity focused on data and usage, enabling you to transform future innovative ideas from your Data Lake into real competitive advantages for the organization.

Which modules are included in the Data Lake ecosystem?

Other structural elements of your IT system must be considered when urbanizing the Data Lake:

  • The “data catalog,” associated repositories, and their life cycles.
  • Orchestrating processes related to the management of source and destination creation, evolution, or removal.
  • Physical data transportation, integrity and uniqueness of transactions, error recovery, etc.
  • Data normalization, which must find a new place around a Data Lake that favors raw data in its original form: whether to push normalization downstream in the processing chain, or to run old and new chains in parallel, depends on your constraints and expectations.
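To make the "data catalog" and life-cycle point concrete, here is a minimal sketch of a catalog entry with a retention rule. The field names (`source`, `owner`, `retention_days`, etc.) are illustrative assumptions, not a standard catalog schema:

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class CatalogEntry:
    """One dataset registered in a (hypothetical) data catalog."""
    name: str                 # logical dataset name
    source: str               # producing system
    owner: str                # accountable team
    data_format: str          # raw storage format (csv, parquet, ...)
    created: date
    retention_days: int       # life-cycle rule: candidate for purge after this delay
    tags: list = field(default_factory=list)

    def is_expired(self, today: date) -> bool:
        """True when the retention window has elapsed."""
        return (today - self.created).days > self.retention_days

entry = CatalogEntry(
    name="sensor_raw",
    source="plant_historian",
    owner="data-platform",
    data_format="parquet",
    created=date(2023, 1, 1),
    retention_days=365,
    tags=["raw", "industrial"],
)
print(entry.is_expired(date(2024, 6, 1)))  # retention elapsed → True
```

Even this small amount of ownership and life-cycle metadata per dataset is what separates a governed Data Lake from a Data Swamp.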

Benefits of a specialized “Process Data Lake” for industrial data

What about a global data lake for management and industrial data?

Let’s revisit the appealing idea of standardization by integrating production tool data into a global data lake for the entire enterprise. Naturally, it offers several advantages:

  • Managing a single data lake architecture
  • Centralized storage of data
  • Standardized tools for all business functions within the organization

However, as we have seen, the usage of production process data has certain characteristics that can make this approach less attractive:

  • The infrastructure and data architecture may not be suitable for the data types and performance requirements necessary for the intended uses.
  • Processing the data requires numerous specific developments to perform the necessary transformations, which can result in a complex and hard-to-maintain system.
  • Lack of industry-specific tools to address the specific needs of production, quality, industrial processes, etc., potentially requiring additional custom development.
  • Managing the complexity of a large set of data as a whole rather than dividing it into more coherent and manageable subsets.
  • Ultimately, high implementation and maintenance costs, as well as long deployment times, to meet the required needs.

There are common elements in terms of industrial data structure, including:

  • Temporal data (sensors, online controls, etc.)
  • Data related to batches, operations, campaigns, cycles (recipes, quality controls, indicators, teams, tools, etc.)
  • Event-related data (planned or unplanned shutdowns, alerts, tool or consumable changes, etc.)
  • Traceability data (where, when, and how an operation or batch was performed) and genealogy data (how different units of work are interconnected, how an operation involves one or multiple batches from previous operations).
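The four categories above can be sketched as distinct record shapes; the class and field names below are illustrative assumptions, not a fixed industry schema:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class SensorReading:      # temporal data (sensors, online controls)
    tag: str              # measurement point identifier
    timestamp: datetime
    value: float

@dataclass
class BatchRecord:        # batch / operation / campaign data
    batch_id: str
    operation: str
    recipe: str
    team: str

@dataclass
class PlantEvent:         # event data
    timestamp: datetime
    kind: str             # "shutdown", "alert", "tool_change", ...
    planned: bool

@dataclass
class GenealogyLink:      # traceability / genealogy data
    child_batch: str      # batch produced by the operation
    parent_batch: str     # input batch it consumed
```

Note that only the first shape is naturally a time series; the others are relational records, which is why they end up in different storage engines.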

Data lakes for industrial data – Specificities

The way these data are processed differs depending on the data type.

This data is thus very different from the data found in other business areas, most of which is transactional.

Storing temporal data or traceability data, for example, requires different approaches to meet performance, cost, and usage requirements.
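A sketch of why traceability data calls for a relational or graph-style approach: answering "which raw batches fed this finished batch?" is a recursive walk over parent/child links, not a range scan over timestamps. The batch identifiers and link data below are invented for illustration:

```python
# child batch -> parent batches it consumed (invented example data)
links = {
    "FG-100": ["IM-20", "IM-21"],
    "IM-20": ["RM-1"],
    "IM-21": ["RM-1", "RM-2"],
}

def upstream(batch, links):
    """All ancestor batches of `batch`, found by iterative graph traversal."""
    seen = set()
    stack = [batch]
    while stack:
        for parent in links.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

print(sorted(upstream("FG-100", links)))  # ['IM-20', 'IM-21', 'RM-1', 'RM-2']
```

A time-series store has no efficient primitive for this kind of traversal, which is the practical reason the two data types end up on different engines.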

The “Process Data Lake” for industrial data

The “Process Data Lake” refers to a specialized data lake designed for industrial data, accompanied by industry-specific tools to meet operational needs quickly.

It provides an environment tailored to the storage, processing, and utilization of industrial data, offering:

  • An architecture suitable for industrial data and its uses: combining different types of databases to accommodate the requirements of time series data (optimized NoSQL databases for time series) as well as traceability data (relational databases for genealogy management).
  • Performance aligned with user processing needs, which directly relates to how the data is structured and stored based on its specific characteristics.
  • Rapid deployment, as the most frequent data processing tasks used in process industries (resampling, extrapolation, aggregation, material balance calculations, yield calculations, genealogy construction, etc.) are already pre-configured.
  • Controlled implementation and operating costs, benefiting from accumulated experience with multiple industrial clients, and ensuring ongoing maintenance and evolution through our SaaS architecture.
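As a minimal sketch of one of the pre-configured treatments listed above, here is time-series resampling by averaging readings into fixed buckets, written in plain Python; the readings and the one-minute step are invented for illustration:

```python
from datetime import datetime, timedelta
from collections import defaultdict

# invented sensor readings: (timestamp, value)
readings = [
    (datetime(2024, 1, 1, 0, 0, 30), 10.0),
    (datetime(2024, 1, 1, 0, 0, 50), 12.0),
    (datetime(2024, 1, 1, 0, 1, 10), 20.0),
]

def resample_mean(readings, step=timedelta(minutes=1)):
    """Average sensor values per `step`-wide time bucket."""
    buckets = defaultdict(list)
    for ts, value in readings:
        # floor the timestamp to the start of its bucket
        offset = (ts - datetime.min) % step
        buckets[ts - offset].append(value)
    return {ts: sum(v) / len(v) for ts, v in sorted(buckets.items())}

for ts, mean in resample_mean(readings).items():
    print(ts.isoformat(), mean)
# 2024-01-01T00:00:00 11.0
# 2024-01-01T00:01:00 20.0
```

In practice a time-series database performs this server-side; the point is that such treatments are generic enough to ship pre-configured rather than redeveloped per project.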