Discussing the Data Layer in Regulated Environments

As regulatory agencies catch up with the information age, thoughtful data historization is becoming more vital to the normal operation of any good-practice facility. Pharmaceutical companies now hoard truckloads of data, devoting significant resources and capital to ensuring data integrity and 21 CFR Part 11 compliance. Many appear content to limit archived data to a historian system, creating a sprawling data lake that end users are left to make sense of. But how is all that data beneficial outside of compliance, and where is the return on investment? Why stop at compliance if a little effort will put that data back to work for you? That’s where the data layer comes in!

So, what is the “data layer,” and why is it important? The data layer consists of the automated, intelligent functionality that collects and assesses the data a system produces, then contextualizes and links associated data so it can be used for higher-level processing such as process analytical technology (PAT) or machine learning. It also includes delivering readable data to end users through reports, whether generated automatically or manually. It may sound complicated, but the key principles of developing a data layer are simple: collection, contextualization, association, and reporting.

Data collection covers two primary types of data: time series and relational. Most process data, such as temperature and pH, is time series data, meaning points collected by a historian system along with timestamps and other metadata. Equally important is relational data, which is typically stored in an ELN, LIMS, or MES, or in a SQL database. Examples of relational data include raw material lot information, batch execution schedules, and sample data.
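To make the distinction between the two data types concrete, here is a minimal Python sketch. It assumes a historian sample can be modeled as a tag/timestamp/value record with a metadata dictionary, and it uses an in-memory SQLite table as a stand-in for an ELN, LIMS, or MES store. The `TimeSeriesPoint` class, the helper functions, and all tag names, batch IDs, and lot numbers are hypothetical.

```python
import sqlite3
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class TimeSeriesPoint:
    """One historian sample: a tag value at a timestamp, plus metadata."""
    tag: str
    timestamp: datetime
    value: float
    metadata: dict = field(default_factory=dict)  # e.g. units, data quality


def make_relational_store() -> sqlite3.Connection:
    """Hypothetical stand-in for an ELN/LIMS/MES table of lots consumed per batch."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE raw_material_lots (lot_id TEXT PRIMARY KEY, material TEXT, batch_id TEXT)"
    )
    conn.executemany(
        "INSERT INTO raw_material_lots VALUES (?, ?, ?)",
        [
            ("LOT-001", "Glucose", "B100"),
            ("LOT-002", "Yeast extract", "B100"),
            ("LOT-003", "Glucose", "B101"),
        ],
    )
    return conn


def lots_for_batch(conn: sqlite3.Connection, batch_id: str) -> list[dict]:
    """Relational lookup: records are keyed by batch ID rather than by time."""
    rows = conn.execute(
        "SELECT lot_id, material FROM raw_material_lots WHERE batch_id = ? ORDER BY lot_id",
        (batch_id,),
    ).fetchall()
    return [{"lot_id": lot_id, "material": material} for lot_id, material in rows]


# Hypothetical historian samples for one bioreactor at a single instant.
t0 = datetime(2024, 1, 1, 8, 0, tzinfo=timezone.utc)
samples = [
    TimeSeriesPoint("BR01.pH", t0, 7.02, {"units": "pH", "quality": "good"}),
    TimeSeriesPoint("BR01.Temp", t0, 36.9, {"units": "degC", "quality": "good"}),
]

conn = make_relational_store()
print(lots_for_batch(conn, "B100"))
```

Time series points are naturally retrieved by tag and time range, while relational records are retrieved by keys such as a batch ID; contextualization and association are what let the data layer bridge those two access patterns.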

Contextualization is the step of giving context to data archived in a historian system. Because data storage is cheap, cost isn’t a barrier to collecting data that may prove useful in the future; as a result, most historized data isn’t relevant to end users at any given moment. A critical step, therefore, is defining what data is important and when it is important. This entails clearly defining your process and any vital substeps, then listing the critical process parameters (CPPs) for each step. Doing so establishes the context for the data and characterizes its relationship to each process step.
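The idea of defining both which data matters and when it matters can be sketched in Python under simplifying assumptions: each process step is a half-open time window with its own set of CPP tags, and historian samples arrive as (tag, timestamp, value) tuples. The `ProcessStep` class, the `contextualize` function, and every tag name and time window are hypothetical. A sample survives only if it is a CPP of the step that was active when it was recorded.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class ProcessStep:
    name: str
    start: datetime  # inclusive
    end: datetime    # exclusive
    cpps: set[str] = field(default_factory=set)  # tags designated CPPs for this step


def contextualize(samples, steps):
    """Keep only samples that are CPPs of the step active at their timestamp,
    and label each kept sample with that step's name."""
    kept = []
    for tag, ts, value in samples:
        for step in steps:
            if step.start <= ts < step.end and tag in step.cpps:
                kept.append({"step": step.name, "tag": tag, "timestamp": ts, "value": value})
                break
    return kept


day = datetime(2024, 1, 1)
steps = [
    ProcessStep("inoculation", day.replace(hour=8), day.replace(hour=9), {"BR01.Temp"}),
    ProcessStep("growth", day.replace(hour=9), day.replace(hour=20),
                {"BR01.Temp", "BR01.pH", "BR01.DO"}),
]

# Hypothetical (tag, timestamp, value) tuples as a historian query might return them.
samples = [
    ("BR01.pH", day.replace(hour=8, minute=30), 7.10),    # not a CPP during inoculation
    ("BR01.Temp", day.replace(hour=8, minute=30), 37.0),
    ("BR01.pH", day.replace(hour=10), 7.00),
    ("BR01.Agitator", day.replace(hour=10), 200.0),       # never designated a CPP
    ("BR01.Temp", day.replace(hour=21), 4.0),             # outside every step
]

for record in contextualize(samples, steps):
    print(record["step"], record["tag"], record["value"])
```

In practice, step boundaries would more likely come from batch or phase start and end events than from hard-coded times, but the filtering logic is the same.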

Association ties related parts of the process together through automated functionality. This is accomplished by building a hierarchical structure from the overall process and its substeps, essentially a logical hierarchy of data bins, and configuring automation within the data layer to recognize and link those bins. For instance, the largest bins would encompass the whole process. Enclosed within them would be individual unit operations which, in turn, contain the logical divisions of a batch, such as media prep, inoculation, growth, and harvest. Any batch event, such as an alarm, log event, sample result, or lot record, would be placed in the batch container. CPP data sits in the bottom layer of the hierarchy, making it available to all higher levels of the process.

Selecting software that can access and link both historian and relational databases is vital. Contrary to popular belief, there are cost-effective, reliable, 21 CFR Part 11 compliant software options that fit the bill. Establishing these logical ties enables the data layer to reference data in its final step: reporting.
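A minimal Python sketch of the bin hierarchy, assuming a simple in-memory tree, shows the two properties the association step relies on: batch events are attached to the batch container, and CPP data stored in bottom-level bins remains reachable from every level above. The `Bin` class, its methods, and the batch, tag, and lot names are all hypothetical; here the top bin represents one batch of the whole process and doubles as the batch container.

```python
from dataclasses import dataclass, field


@dataclass
class Bin:
    """One level of a hypothetical association hierarchy (process, unit operation, or phase)."""
    name: str
    children: list["Bin"] = field(default_factory=list)
    events: list[str] = field(default_factory=list)      # alarms, log events, samples, lot info
    cpp_data: list[tuple] = field(default_factory=list)  # filled in on bottom-level bins

    def add(self, child: "Bin") -> "Bin":
        self.children.append(child)
        return child

    def walk(self):
        """Yield this bin and every bin beneath it, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()

    def find(self, name: str):
        return next((b for b in self.walk() if b.name == name), None)

    def all_cpp_data(self) -> list[tuple]:
        """CPP data lives at the bottom but is reachable from any ancestor bin."""
        return [point for b in self.walk() for point in b.cpp_data]

    def all_events(self) -> list[str]:
        return [event for b in self.walk() for event in b.events]


batch = Bin("Process run, batch B100")
batch.events.append("Lot LOT-001 (glucose) charged")
reactor = batch.add(Bin("BR01 bioreactor"))
for phase in ("media prep", "inoculation", "growth", "harvest"):
    reactor.add(Bin(phase))

growth = batch.find("growth")
growth.cpp_data += [("BR01.pH", 7.00), ("BR01.Temp", 37.0)]
growth.events.append("ALARM: dissolved oxygen below low limit")

print(batch.all_cpp_data())
print(batch.all_events())
```

In a real deployment, the hierarchy would typically live in the software that links the historian and relational databases, with bins populated automatically from batch and phase events rather than built by hand.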

Reporting is where everything comes together! Contextualized and associated data lets reporting tools easily acquire the related data needed to build trends, tables, and other outputs. Auto-generated reports can be developed and executed via event triggers, such as the completion of a batch, using the association hierarchy to acquire all relevant data and events. Typically, these are exception reports that include log events, sample data, any alarms that occurred, and other information related to batch release. Reports can also be generated automatically from machine learning triggers that identify when a process trends outside normal operating parameters or when equipment requires preventative maintenance. This delivers important information not just to operators on the plant floor but also to the individuals responsible for making process decisions and planning maintenance, and it does so much faster than processing manually acquired data. End users can also apply manual report templates to the association hierarchy: the user selects an enumerated event from the hierarchy, and the system acquires all information and data applicable to the selected level. This greatly reduces the hours otherwise spent collecting data for reports.
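An event-triggered exception report can be sketched in Python, assuming the association hierarchy is available as nested dictionaries and that the reporting tool offers some way to publish a finished report. The `on_event` trigger, the `publish` callback, the event types, and all batch data are hypothetical. The same `build_report` function also covers the manual case: calling it on whichever level the user selects returns only the events under that level.

```python
from collections import defaultdict


def walk(node):
    """Yield a hierarchy node (a dict) and all of its descendants, depth first."""
    yield node
    for child in node.get("children", []):
        yield from walk(child)


def build_report(node):
    """Group every event at or below `node` by type, tagging each with its bin name."""
    sections = defaultdict(list)
    for n in walk(node):
        for event in n.get("events", []):
            sections[event["type"]].append(f'{n["name"]}: {event["text"]}')
    return dict(sections)


def on_event(event, batches, publish):
    """Hypothetical trigger: when a batch completes, build and publish its exception report."""
    if event.get("kind") == "batch_complete":
        batch_id = event["batch_id"]
        publish(batch_id, build_report(batches[batch_id]))


# Hypothetical association hierarchy for one batch.
batches = {
    "B100": {
        "name": "B100",
        "events": [{"type": "lot", "text": "LOT-001 glucose charged"}],
        "children": [
            {
                "name": "growth",
                "events": [
                    {"type": "alarm", "text": "DO below low limit"},
                    {"type": "sample", "text": "viable cell density recorded"},
                ],
            },
            {"name": "harvest", "events": [{"type": "log", "text": "harvest delayed 30 min"}]},
        ],
    }
}

published = []
on_event({"kind": "batch_complete", "batch_id": "B100"}, batches,
         lambda batch_id, report: published.append((batch_id, report)))
print(published)

# Manual report: the user selects one level of the hierarchy, here the growth phase.
print(build_report(batches["B100"]["children"][0]))
```

Keeping report assembly separate from the trigger means the automatic and manual paths share one definition of "everything related to this level," which is the main payoff of the association hierarchy.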

Implementing software and hardware within the data layer to accomplish collection, contextualization, association, and reporting can streamline post-process analysis and expedite batch release. Planning and developing the data layer lets end users spend less time acquiring and processing raw data, making them more effective and shifting operating costs away from manual legwork.

Eric Reisz is Lead Engineer at Panacea Technologies, a certified member of the Control System Integrators Association (CSIA). For more information about Panacea, visit its profile on the Industrial Automation Exchange.