What is a Good Approach to Industrial SaaS?

“A chain is only as strong as its weakest link,” goes the old saying. How true that is in industrial control systems. One small glitch on an assembly line can force a shutdown. Lack of a single ingredient in a chemical plant or food processing system might wreak havoc. Factory automation, power generation, resource extraction and processing, transportation and logistics are all supported by chains of mechanisms, hardware, and software, as well as operators and engineers who must each carry out their mission to produce the expected output, product, or service.

From time to time, new technologies come along that provide cost savings, better performance, and ease of use—new links for the chain. Electrical relays, pneumatic controls, DCSs, PLCs, RTUs, SCADA, plant networks, and fieldbus were all at one time proposals on the drawing board. Each had its evangelists and skeptics, and each was thoroughly tested. Once its value was proven, each one has become a strong link in the automation chain.

One of the latest technologies to be proposed is software as a service (SaaS). SaaS provides access to hosted software over a network, typically the Internet, and is closely related to the concepts of smart factories, cloud computing, industrial Internet, machine-to-machine (M2M), and the Internet of Things (IoT). Adding SaaS to an industrial process control system means adding data collection, integration, and/or distribution capabilities beyond the limits of most existing in-house systems.

SaaS can open wider access to plant data in real time, which can support real-time monitoring of processes, supply the big data needed to drive predictive maintenance programs, provide the ability to outsource customer care facilities, deliver real-time KPIs, and otherwise leverage the value of new or existing SCADA investments. Implemented well, software as a service should also provide significant cost savings over the traditional avenue of software ownership.

To be truly useful, though, industrial software as a service should be secure, quick, and robust, as well as adaptable and convenient to use.

Secure

Industrial systems require the highest possible level of security. “IT security was the most oft-cited obstacle to setting up smart factories,” according to Lopez Research in their January 2014 article Building Smarter Manufacturing With The Internet of Things (IoT). A comprehensive report from GE titled Industrial Internet: Pushing the Boundaries of Minds and Machines states, “The Industrial Internet will require putting in place a set of key enablers and catalysts,” including, “a robust cyber security system and approaches to manage vulnerabilities and protect sensitive information and intellectual property.” Achieving this level of security requires a comprehensive approach, including secure servers, authorization and authentication of users, encrypted data transport mechanisms, and keeping all firewall ports closed at the plant level.

Quick and Robust

Industrial software as a service should provide as close to real-time performance as the network or Internet infrastructure will support. This means that data updates should be in milliseconds, not seconds or minutes. It should be able to handle thousands of data changes per second, and support redundant connections with hot switchover capability.

Adaptable

Industrial systems are diverse: they are built from a wide range of equipment and controls, use various data protocols, and come in all sizes. Industrial SaaS should be able to connect seamlessly to any new or installed system at any number of locations with no changes to hardware or software. It should use open data protocols and APIs. Ideally it should work with any size of system, from complete DCS and SCADA systems down to a single machine or embedded device. Running as a cloud-based service, it should also readily scale up or down depending on user needs.

Convenient

To gain acceptance in the market, industrial SaaS should be convenient to use. It should be easy to demo, sign up for a service plan, configure connections, and monitor usage and costs. It should offer off-the-shelf tools to get your data to and from the cloud with no programming, provide the ability to easily integrate data from multiple sources, and include options like data storage and HMI displays, all without disrupting the industrial process in any way.

Redesigning for Security

Among these requirements, the most challenging is security. Without the ability to fully protect mission-critical processes and their data, industrial SaaS is simply a non-starter. And yet, a fundamental characteristic of virtually all industrial systems presents a significant security risk for any cloud-based system: a firewall port must be kept open.

The current approach to this problem is to implement VPN (virtual private network) technology. A VPN creates a secure space on a network that is isolated from all other traffic. However, this is not an ideal solution because a VPN allows every connected device and user full access to every other device and user. For a single control room, this may not seem to be too much of an issue. But in the world of cloud computing, operators and field engineers will expect to have access to data on tablets and cell phones, which can easily fall into the wrong hands.

Ironically, using a VPN might even turn a plus into a minus. A strong selling point of SaaS is its potential to act as a platform for sharing limited data sets with authorized suppliers, customers, and other parties. Few IT departments would be willing to hand over the keys to the store by providing these players with access to the corporate VPN.

A better approach is needed. Although VPN might be useful under certain circumstances, it doesn’t address the fundamental design issue of virtually all industrial data communication, which is the client-to-server architecture. To get data out of an industrial system, a client needs to request it from a server. So, if any kind of cloud service needs access to process data from a server located behind a plant firewall, a port on that firewall must be kept open at all times. And open firewall ports are inherent security risks.

What is required is a new design. SaaS transmits data over the Internet, and there is a protocol, running over TCP, that supports a different connection model: WebSocket. With the right kind of engineering this protocol can be applied to industrial data communications in a way that allows only out-bound connections from the plant to the cloud. No in-bound connections are necessary; no plant firewall ports need to be left open. Once the connection is established, the data can flow in both directions. Or you can choose to make all or some of your data read-only, preventing any write-back from the cloud. Whichever approach you take, the data flows out through a closed firewall, making your plant effectively invisible to Internet attacks.
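
To make this concrete, here is a minimal sketch of the out-bound-only pattern in Python. It assumes the third-party websockets package, a hypothetical cloud endpoint URL, and a hypothetical read_next_update() function supplied by the plant-side data layer; it illustrates the connection model only, not any particular product's implementation.

    # Sketch: the plant opens an out-bound, encrypted WebSocket to the cloud,
    # so no in-bound firewall port is ever opened. The endpoint and message
    # layout are hypothetical.
    import json
    import websockets  # third-party package: pip install websockets

    CLOUD_URI = "wss://cloud-service.example.com/plant-feed"   # hypothetical

    async def push_plant_data(read_next_update):
        # read_next_update() is assumed to return (name, value, quality, timestamp)
        async with websockets.connect(CLOUD_URI) as ws:
            while True:
                name, value, quality, ts = await read_next_update()
                # Data flows out over the already-established connection; the
                # cloud can reply on the same socket if writes are permitted.
                await ws.send(json.dumps(
                    {"name": name, "value": value, "quality": quality, "ts": ts}))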

In addition to protecting the plant, this design means that no primary or mission-critical control needs to be performed by the service. All local control can remain untouched. The system manager has complete flexibility over what data gets passed to the service, and the connection can be configured as read-only if desired.

Real-Time Data

Next to security, a good industrial SaaS solution should perform well. When you mention anything related to cloud computing, most people conjure up an image of a giant database sitting up in the air somewhere, into which you put data and from which you pull it out when you need it, like Gmail or Dropbox. Although that model works fine for storing static data, industrial systems function in real time. The true value of industrial SaaS should be realized through real-time performance, which requires a fundamentally different architecture on the cloud.

One good approach is for the service provider to host a real-time, memory-resident database that can receive and retransmit data at speeds of tens of thousands of data changes per second. Achieving these speeds is possible through a publish/subscribe method of data delivery, an event-driven model in which a client registers for data changes one time and then receives subsequent updates immediately after they occur. This kind of low-latency system adds almost nothing to the overall data transmission time, effectively keeping total transit time to just a few milliseconds more than network propagation time.
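
As a simple illustration of the publish/subscribe idea, here is a small in-memory hub in Python: a client registers once for a tag, and every later change is pushed to it as an event. The class and tag names are invented for the example and do not represent any specific service.

    # Sketch of an event-driven publish/subscribe hub: register once,
    # then receive every subsequent change immediately.
    from collections import defaultdict
    from typing import Any, Callable

    class RealTimeHub:
        def __init__(self) -> None:
            self._values = {}                      # latest value per tag
            self._subscribers = defaultdict(list)  # tag -> list of callbacks

        def subscribe(self, tag: str, callback: Callable[[str, Any], None]) -> None:
            # One-time registration; no polling afterwards.
            self._subscribers[tag].append(callback)

        def publish(self, tag: str, value: Any) -> None:
            # Store the latest value and push the change to every subscriber.
            self._values[tag] = value
            for callback in self._subscribers[tag]:
                callback(tag, value)

    hub = RealTimeHub()
    hub.subscribe("Line3.Yield", lambda tag, value: print(tag, "=", value))
    hub.publish("Line3.Yield", 98.6)   # subscriber is notified at once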

To further speed up throughput, all data should be handled in the simplest possible format, by taking a data-centric approach. This kind of system is able to work with all kinds of data sources and users, such as control systems, OPC servers, databases, spreadsheets, web pages, and embedded devices. When a data source connects, its data gets stripped of all unnecessary formatting (XML, HTML, OPC, SQL, etc.) and is added to a universal data set comprising data from all connected sources. Updates to any point in this data set are immediately passed to any client registered for them. At the receiving end the data can be transformed back into its original format, or into whatever other format the client might need.
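
One way to picture that data-centric normalization, purely as a sketch: every incoming update, whatever its source format, is reduced to the same minimal record before it joins the universal data set. The field names below are illustrative.

    # Sketch: an OPC item, SQL row, XML element, or spreadsheet cell all
    # reduce to the same minimal record before being merged into the
    # universal data set. Field names are illustrative.
    from dataclasses import dataclass
    from datetime import datetime, timezone

    @dataclass
    class PointUpdate:
        name: str          # e.g. "Plant1.Line3.Temperature"
        value: float
        quality: str       # e.g. "Good", "Bad", "Not Connected"
        timestamp: datetime

    universal_data_set = {}    # name -> latest PointUpdate

    def merge(update: PointUpdate) -> None:
        universal_data_set[update.name] = update
        # ...here the hub would notify any clients registered for this point...

    merge(PointUpdate("Plant1.Line3.Temperature", 72.4, "Good",
                      datetime.now(timezone.utc)))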

Making it Work

Anyone who has spent any time in industrial automation soon discovers that every system is unique. Different industries, plants, and project requirements demand a wide range of tools, machines, and other devices, provided by hundreds of independent suppliers and installed by a multitude of diverse system integrators and plant engineers worldwide.

Good industrial SaaS should fit as many of these different types of systems, protocols, and brands of equipment as possible. It should use open, standard protocols like OPC, TCP, and ODBC. If it is completely vendor agnostic, it is in the best position to leverage investments in existing equipment or enhance new installations with real-time data connectivity. Ideally it should be capable of being added to a SCADA system, function as an HMI for an individual machine, or access RTUs and even individual embedded devices.

As a cloud-based system, the service should be able to scale up or down to meet the needs of its users. This means the ability to handle bursts of high-speed activity in the data flow at any given moment, as well as the capacity for quick reconfiguration to support the expansion requirements of a growing system. Users should be able to add data points to a particular device, or bring on new devices, new SCADA systems, even new locations and installations, through an easy-to-use interface.

Finally, data from the service should be readily available to key decision-makers in the way they can best use it, be it an operator using an HMI for monitoring and supervisory control, a field engineer picking up the latest figures from the home plant, an analyst running real-time scenarios from facilities spread across three continents, a just-in-time supplier whose system is keyed to current production levels, or a plant manager responsible for production at a group of isolated facilities. Good industrial software as a service should be a solid link in the chain, reducing costs, meeting the needs of all players, and doing it securely, quickly, and conveniently.

Can I Store and Forward OPC Data?

Every once in a while we get asked: “Can I store and forward my OPC data?” It seems like a reasonable question. I’m connecting an OPC server and client across a network, possibly with OPC UA, or more likely OPC DA, using DCOM or an OPC tunnelling product. In any case, should there be a network problem and the connection gets broken, I don’t want to lose any of my data. Why not store it, and forward it later when the connection is back up? A little understanding of the purpose of OPC, and how it works, should make it clear that this is impossible. The answer to this question is effectively “No, not real-time data, anyway.”

Let’s look at the OPC DA scenario first, since this is where the question comes up most often. Put simply, you cannot store and forward OPC DA data. The OPC DA protocol is for live, real-time data. OPC DA clients need to know what is happening right now in the system. An OPC DA server does not maintain a history of past values, and an OPC DA client only works with the most recent value of an OPC tag.

Despite this reality, suppose some bright spark decided he wanted to go ahead and add store-and-forward capability to OPC DA. When the connection to the client breaks, somehow the OPC server would need to start storing values, and this raises questions. How would the client identify which items in the server need to replay past values? How would the server know how long to store the data? What should the server do if this limit is exceeded? There is no facility in OPC for the client to specify any of this information, and it could not be robustly specified at the server, since not all clients would have the same requirements.

Then, when the connection gets re-established, the OPC server would start sending those values to the OPC client. Here it runs into more trouble. How would the server know that this was the same client reconnecting? How long has the client been disconnected? Has the client configuration changed since the last connection and therefore the previous store-and-forward configuration is out of date? How quickly should the values get sent? Should the OPC server replicate the time gaps between value changes? Then it would never catch up to the present. Or should it send all the stored values in a sudden burst of changes?

But the biggest problem is that old values are, by definition, wrong: only the most recent value is correct. In a store-and-forward scenario the client would experience a series of changes that could manifest as wrong indications in an HMI, or worse, wrong control signals to a physical system. Would you really want a piece of equipment turning on and off repeatedly as the data history catches up on the client? That could be inefficient, damaging, or dangerous. In practice, most OPC clients can only hold one value for any given tag, and only now matters. The client doesn’t need and can’t use a replay of historical data; it needs to know what’s happening this instant.

It is a fact of life that network connections drop occasionally. When this happens, the client should present the data with a Bad or Not Connected quality, or show a Last Known Value warning until the connection is restored. Once the connection is back the client should receive the current data values. Whatever happened in the meantime is unimportant or downright misleading.
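
For illustration only, the rule just described can be sketched in a few lines of Python; no real OPC client API is used, and the tag names and quality strings are made up.

    # Sketch of the client-side rule: while disconnected, mark tags Bad (or
    # show a last-known-value warning); on reconnect, take current values only.
    tags = {"Line3.Yield": {"value": 98.6, "quality": "Good"}}

    def on_connection_lost():
        for tag in tags.values():
            tag["quality"] = "Bad (Not Connected)"   # keep last value for display only

    def on_reconnected(current_values):
        # current_values: {name: value} as read right now from the server;
        # nothing that happened during the outage is replayed.
        for name, value in current_values.items():
            tags[name] = {"value": value, "quality": "Good"}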

The idea of store-and-forward makes perfect sense, though, for data logging, archiving, and historical data. This is where it is possible to maintain a complete record of everything that has taken place on the OPC server. Although OPC DA itself does not offer this capability, a user can add an OPC HDA server, or connect a database to the OPC DA server. The responsibility for store-and-forward is now in the hands of the historian, where it belongs.
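
As a rough sketch of where store-and-forward does belong, here is a logging layer buffering records locally (SQLite in this example) while the archive link is down, then forwarding them later. The table layout and the send_to_archive callback are hypothetical, not any particular historian's interface.

    # Sketch: store-and-forward handled by the logging layer, not by OPC DA.
    import sqlite3

    buffer = sqlite3.connect("local_buffer.db")
    buffer.execute("""CREATE TABLE IF NOT EXISTS pending
                      (name TEXT, value REAL, quality TEXT, ts TEXT)""")

    def log_value(name, value, quality, ts):
        # Called for every change received from the OPC DA server.
        buffer.execute("INSERT INTO pending VALUES (?, ?, ?, ?)",
                       (name, value, quality, ts))
        buffer.commit()

    def forward_pending(send_to_archive):
        # send_to_archive(record) is assumed to deliver one record to the historian.
        rows = buffer.execute(
            "SELECT rowid, name, value, quality, ts FROM pending").fetchall()
        for rowid, name, value, quality, ts in rows:
            send_to_archive((name, value, quality, ts))
            buffer.execute("DELETE FROM pending WHERE rowid = ?", (rowid,))
        buffer.commit()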

Turning now to OPC UA, from a fundamental design perspective the store-and-forward story is actually pretty much the same as it was for OPC DA. Nothing has really changed for an OPC UA client: it still expects to receive the latest data possible. The only slight difference is that the OPC UA architecture integrates historical data access better than OPC Classic does. This makes it easier for OPC UA servers to support historical data, and for OPC UA clients to request historical information. But whether data storage or archiving abilities are built into the server or provided separately, the reality of real-time data communications is that a real-time client cannot handle data that has been stored and forwarded. Nor is that its purpose. There are other tools designed for that.

So, the answer to the question, “Can I store and forward my OPC data?” is two-fold. On the one hand, you can record your OPC data, and then store and forward it as archived data. But you cannot store and forward data in real time.

 

Relational Database or Real-Time Historian for Logging Process Data?

Quite a few years ago, while living in a part of the world where personal computers were a relatively new phenomenon, I happened to be in an office watching a secretary busily typing away at her new PC. She was thrilled to have such a powerful tool to use. “Look!” she said excitedly. “Now I can write and correct my work so easily!” I looked at the screen, and had to smile. She was composing an entire business letter within a single cell of an Excel spreadsheet.

What determines the right tool for the job? For that secretary, the right tool was the one that was available and that she knew how to use. What’s the right tool for logging data from a process control application? Sometimes a CSV file is all that is needed. Sometimes Excel will do. More often, though, engineers and system integrators will use either a relational database or a real-time historian to store permanent records of process data.

Relational databases have the advantage of availability and familiarity. SQL Server, MySQL, Oracle, and other relational databases, including Microsoft Access, are usually already installed at the company. They offer a common interface, ODBC, and the IT department is familiar with them. It’s no surprise that relational databases are used for logging process data, particularly when the requests for data come from management personnel familiar with this kind of database.

But a relational database may not be the ideal choice for all process control applications. Designed for the needs of corporations, businesses, and banks to store transactional data, relational databases are optimized for analyzing complex relationships between data. These databases can afford to focus processing power on these relationships because the data itself gets updated relatively infrequently, usually in terms of hours, days, or months. While analytical power is good for business applications, process control applications typically don’t need it. What they do need is speed.

A real-time historian, on the other hand, is like a flight recorder for process data. Rather than relational, it is a temporal database, storing its records in a flat file as simply the name, value, quality, and timestamp for a data point. The historian is designed for speed of storage and retrieval of data, and can typically process millions of transactions per second. This kind of performance is valuable for processes with variables that may change many times per second, and where capturing every event over the course of each eight-hour shift is vital.
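
As a rough picture of that flat, temporal record, each entry can be reduced to a fixed-size row and simply appended, which is part of what makes the storage so fast. This is an illustration of the storage model only, not any particular historian's file format.

    # Sketch: a fixed-size binary row of (timestamp, value, quality) appended
    # to a per-point file. Append-only writes keep logging overhead minimal.
    import struct, time

    RECORD = struct.Struct("<ddH")          # timestamp, value, quality code

    def append_record(path, value, quality=192):        # 192 = OPC "Good"
        with open(path, "ab") as f:
            f.write(RECORD.pack(time.time(), value, quality))

    append_record("Line3.Yield.hst", 98.6)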

Despite the performance advantages of a real-time historian, some companies opt for using relational databases for logging process data. This is completely understandable, since those are the tools that company IT staff and upper management are most familiar with. But there are three important reasons why this approach may not be sufficient:

  1. A real-time historian logs every data change for a point, even when the values change rapidly. Using highly efficient storage algorithms, the complete data set can be stored for long time periods. A relational database, in contrast, typically needs to drop some or most of the data as it is being logged, because it is not optimized for storing high volumes of data. Unfortunately, the data is dropped regardless of its importance. So you might end up logging routine changes and throwing away the unusual events that could lead to alarm conditions. In addition to detecting every change, big or small, the high volume capabilities of a real-time historian are useful for detecting subtle trends that may only appear over months or years.
  2. A strong advantage of a real-time historian is its native ability to process time-based queries. For example, you might need the standard deviation of a point that changes on average 25 times per second, in 10-second windows for the last two minutes (a simple sketch of such a windowed query follows this list). A good historian will provide an easy way to submit such a query, and will return the results quickly, with a minimum drain on system resources. Built-in query functions typically allow you to select any time period, from a few seconds to weeks or more, and retrieve averages, percentages of good and bad quality, time correlations, regressions, standard deviations, and so on. All of this may be possible through SQL queries on a relational database, but only with much more programming effort and greater use of system resources.
  3. The two above advantages of a real-time historian may perhaps best be appreciated when working with live trend displays. Calculating and displaying a moving line that updates several times per second requires not just the ability to log all the data points in real time, but also to re-emit them quickly and repeatedly for the moving display. And if a user wants to scroll backwards and forwards through the data set as it is being logged, the historian has to be able to manage rapid, continuous queries to the data set. This kind of task is nearly impossible for an off-the-shelf relational database, unless the screen refresh rate is annoyingly slow.
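
To make point 2 concrete, here is a plain-Python sketch of that windowed standard-deviation query, computed directly from (timestamp, value) samples; a real historian would expose this as a built-in query rather than requiring application code.

    # Sketch: standard deviation of a fast-changing point in 10-second windows
    # over the last two minutes, from time-ordered (timestamp, value) samples.
    from statistics import pstdev

    def windowed_stddev(samples, window=10.0, span=120.0):
        samples = list(samples)              # assumed non-empty and time-ordered
        now = samples[-1][0]
        results = []
        start = now - span
        while start < now:
            values = [v for t, v in samples if start <= t < start + window]
            results.append(pstdev(values) if values else 0.0)
            start += window
        return results                       # one result per 10-second window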

Even with these points in mind, there are many applications for which logging process data in a relational database works just fine. In fact, sometimes logging to a CSV file is sufficient. To be fair, these are really not the same level of technology mismatch as writing a complete business letter in one cell of a spreadsheet. The well-informed system integrator or engineer will understand the value of each approach, look at the needs of the project and the resources available, and employ the right tool for the job.

Redundancy for OPC

Early one morning, Mel Farnsworth was sitting in the control booth at the Hardy Automotive Parts assembly line, drinking his final cup of coffee before the end of the shift. Watching the line meter graph in his HMI console, he noticed that the yield and efficiency trends for Line 3 had dropped to zero. He looked down through the control-room window, but Line 3 seemed to be rolling right along. What was the problem?

The line was running smoothly, but Mel wasn’t getting the data he needed. Somewhere between the PLCs and his HMI display there was a data disconnect. Maybe it was a fieldbus problem, or a bad network connection. Perhaps it was caused by his OPC server, or possibly even his HMI system itself. Whatever the reason, since Mel’s data connection was a single chain, one break in the chain meant that he didn’t get his data. To minimize this kind of risk and ensure the highest possible availability, mission-critical systems often use redundancy.

What is Redundancy?

Redundancy in a process control system means that some or all of the system is duplicated, or redundant. The goal is to eliminate, as much as possible, any single point of failure. When a piece of equipment or a communication link goes down, a similar or identical component is ready to take over. There are three types of redundant systems, categorized by how quickly a replacement (or standby) can be brought online. These are cold standby, warm standby, and hot standby.

Cold standby implies that there will be a significant time delay in getting the replacement system up and running. The hardware and software are available, but may have to be booted up and loaded with the appropriate data. Picture the olden days of steam locomotives: the cold standby was the extra engine in the roundhouse that had to be fired up and brought into service. Cold standby is not usually used for control systems unless the data changes very infrequently.

Warm standby has a faster response time, because the backup (redundant) system is always running and regularly updated with a recent copy of the data set. When a failure occurs on the primary system, the client can disconnect from the failed system and connect instead to the backup. This allows the system to recover fairly quickly (usually within seconds) and continue the work. Some data will be lost during this disconnect/reconnect cycle, but warm standby can be an acceptable solution where some data loss can be tolerated.

Hot standby means that both the primary and secondary data systems run simultaneously, and both are providing identical data streams to the downstream client. The underlying physical system is the same, but the two data systems use separate hardware to ensure that there is no single point of failure. When the primary system fails, the switchover to the secondary system is intended to be completely seamless, or “bumpless”, with no data loss. Hot standby is the best choice for systems that cannot tolerate the data loss of a cold or warm standby system.

A Typical Redundant OPC System

What does redundancy look like in an OPC-based system? A typical scenario would have two OPC servers connected either to a single device or PLC, or possibly duplicate devices or PLCs. Those two OPC servers would then connect to some kind of OPC redundancy management software which, in turn, offers a single connection to the OPC client, such as an HMI. The redundancy manager is responsible for switching to the secondary OPC server when any problem arises with the data coming from the primary OPC server. This scenario creates a redundant data stream from the physical system all the way to the HMI.

The most common use of redundancy for OPC is with OPC DA or UA, but it is possible to configure redundant OPC A&E systems as well. The principles are the same.  Sometimes, on large systems, it is necessary to configure multiple redundant pairs. Redundancy can also be configured over a network, using DCOM or OPC tunneling. For a networked configuration, the redundancy manager would normally reside on the OPC client machine, to minimize the number of potential points of failure.

Although cold or warm standby may be useful under some circumstances, typically an engineer or system integrator implementing a redundant OPC system is looking for hot standby. This is the most useful kind of redundancy in a process control system, and at the same time the most difficult to achieve. Let’s look a little more closely at that all-important task of the OPC redundancy manager in a hot-standby system—making the switch.

Making the Switch

Put simply, a hot-standby redundancy manager receives data from two identical inputs, and sends a single output to the OPC client. It is the redundancy manager’s job to determine at all times which of the two data streams is the best, and to switch from one to the other as soon as possible whenever the status changes. The switch can be triggered by a number of different kinds of events (a small sketch of this trigger logic follows the list):

  • Single point value change – to or from a certain value, achieving a threshold, etc.
  • Single point quality change – for example, from “Good” to any other OPC quality.
  • Multiple item monitoring – if the quality or value of any point in a group goes bad.
  • Rate of change monitoring – if points change value more slowly than expected.
  • Network breaks and timeouts – checked with some kind of heartbeat mechanism.
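
As a small illustration of this kind of trigger logic, the sketch below watches quality and data-flow stall on both sources and switches as soon as the active one looks unhealthy, falling back to the primary when it recovers. All names and thresholds are invented for the example.

    # Sketch of hot-standby switching triggers: bad quality or a stalled data
    # flow on the active source causes an immediate switch to the other source.
    import time

    class Source:
        def __init__(self, name):
            self.name = name
            self.quality = "Good"
            self.last_update = time.monotonic()

        def healthy(self, timeout=0.5):
            # Bad quality or no data within the timeout both count as failures.
            return (self.quality == "Good"
                    and time.monotonic() - self.last_update < timeout)

    class RedundancyManager:
        def __init__(self, primary, secondary):
            self.primary, self.secondary = primary, secondary
            self.active = primary

        def check(self):
            if not self.active.healthy():
                standby = self.secondary if self.active is self.primary else self.primary
                if standby.healthy():
                    self.active = standby       # switch to the good source
            elif self.active is self.secondary and self.primary.healthy():
                self.active = self.primary      # fall back to the preferred primary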

Once the switch has occurred, the system or the redundancy manager itself might have the ability to send an alarm or email message, or even launch some kind of diagnostic or investigative program. It might also be able to log diagnostic information about the state of the primary OPC server or network connection. And in a system that distinguishes between primary and secondary inputs, there will often be a means to favor the primary input, and switch back to it when possible, sometimes referred to as a fallback.

Practical Considerations

The idea of redundancy for OPC is not difficult to grasp, but implementing it takes some thought. An initial decision on cold, warm or hot standby will impact all aspects of the implementation. The choice of proper hardware and software is critical for a well-functioning system. Robust system architecture is also important, especially if the connection is across a network. In addition to selecting OPC servers and planning the network infrastructure (if necessary), an important decision will be the software used to manage the redundancy. Good redundancy management software should be easy to use, with no programming necessary. The technology should be up to date, capable of running on the latest version of Windows. There should be an absolute minimum chance of data loss during a switchover, even over a network.

The Timer Pitfall

In practice it is not possible to achieve a completely seamless switchover in all cases, even with a hot standby system. For example, if a network failure occurs on the primary connection, a certain amount of time will pass before a redundancy manager can detect that failure. Data transmitted during this period will fail to arrive, but the redundancy manager will not be able to distinguish between a failure and a normal pause in data flow.

Many redundancy managers implement timers to periodically check the network connection status to try to minimize this delay, but a switchover mechanism based on periodic timers will always suffer from data loss. Systems with multiple timing parameters will often result in additive delays, where the fastest possible switchover for the system is the sum of these timing delays. In addition, the use of timers to detect network failure can result in a configuration problem where the system integrator must trade off switchover latency against false-positive network failure detection. This effectively becomes a trade-off between system stability and responsiveness.

Using timers to periodically check data values or qualities, or to poll the OPC servers, is also problematic, because timers introduce unnecessary latency into the system. Whereas a network failure must be detected based on timing, a data value or quality change can be detected immediately, as the event occurs. It is generally best to avoid systems based on time-based value change detection, and to use event-based object monitoring instead.

Object and Link Monitoring

A good redundancy manager should be able to support both object monitoring and link monitoring. Object monitoring means the ability to monitor individual points, and make a switchover based on an event. For example, if a designated watchdog tag changes in a significant way, such as turning negative or going over a specified threshold, it can trigger a switch to the secondary OPC server. Or maybe you’d like to monitor a group of points, and if the quality of any of them goes to “Bad” or “Unconnected”, you can switch.

Link monitoring is especially useful for networked connections. Your system will need a way to detect a network break very quickly, to prevent data loss. For hot standby on high-speed systems with fast data update rates, timeout detection with a sub-second response rate is essential. In any event, the system should be able to detect a timeout for a failed network connection, as well as a failure to receive data. This distinction is important. It may take seconds or even minutes to detect a communication failure, but a redundancy manager should be able to detect a stoppage of data flow in an amount of time very close to the true data rate from the physical system. The redundancy manager should be able to switch from one source to the other based solely on an observation that data has not arrived from the primary connection, but has arrived from the backup system.
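
One way to sketch that last idea: rather than waiting on a long network timeout, the redundancy manager can treat the primary as stalled as soon as the backup has delivered several updates with nothing arriving from the primary. This is a simplified illustration, with an arbitrary threshold.

    # Sketch: detect a stalled primary by comparing data arrival on the two
    # streams, so the response time tracks the true data rate of the system.
    class LinkMonitor:
        def __init__(self, stall_updates=3):
            self.stall_updates = stall_updates
            self.backup_since_primary = 0

        def on_primary_update(self):
            self.backup_since_primary = 0

        def on_backup_update(self):
            self.backup_since_primary += 1

        def primary_stalled(self):
            # True once the backup has produced several updates while the
            # primary has produced none.
            return self.backup_since_primary >= self.stall_updates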

Some systems use COM timeouts for link monitoring. This may be acceptable for circumstances where relatively long data outages are tolerable, but we do not recommend relying on COM timeouts for hot or warm standby.

Smart Switchover

The behavior of the redundancy system during a switchover can be significant. For example, suppose the primary and secondary connections have both failed for some reason. A typical redundancy manager will begin a cycle of attempting to attach to one and then the other OPC server until one of them responds. The redundancy manager will flip-flop between the two indefinitely, injecting sleep periods between each flip-flop to reduce system resource load. This sleep period is itself a source of latency. A smarter switchover model is to maintain a source health status that allows the redundancy manager to only switch over when a source status changes. This allows the redundancy manager to effectively idle, or perform simultaneous reconnection attempts, until a source status changes, then immediately respond without introducing extra latency. Smarter switching logic can result in substantially reduced system load and switchover times.

Forced Switching vs Preferred Source

It is useful to be able to select one data source over another, even if the currently attached source is healthy. A naïve redundancy manager will simply force the switch when the user requests it, even if the backup system is not available. This will again result in flip-flop behavior as the redundancy manager attempts to switch to the unavailable backup source. A much better approach is for the redundancy manager to understand the concept of a preferred source that can be changed at runtime. If the preferred source is available, the redundancy manager will switch to it. If the user wants to switch from one source to another, he simply changes the preferred source. If that source is available, the switch will be made. If it is not, the redundancy manager will make the switch only when it becomes available. This eliminates the flip-flop behavior while at the same time eliminating the data loss associated with the minimum of two switch cycles that the naïve redundancy manager would impose.
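
A brief sketch of the preferred-source idea, combined with the source-health status described above: changing the preference never forces a blind switch, and the manager only reacts when a source's status actually changes. The class and method names are invented for the example.

    # Sketch: event-driven switching with a runtime-changeable preferred source.
    class PreferredSourceManager:
        def __init__(self, sources):
            self.health = {name: False for name in sources}   # name -> available?
            self.preferred = None
            self.active = None

        def set_preferred(self, name):
            self.preferred = name
            self._reconsider()

        def on_health_change(self, name, available):
            # Called only when a source's status changes, so the manager idles
            # between events instead of flip-flopping on a timer.
            self.health[name] = available
            self._reconsider()

        def _reconsider(self):
            if self.preferred and self.health.get(self.preferred):
                self.active = self.preferred              # preferred and available
            elif self.active is None or not self.health.get(self.active):
                available = [n for n, ok in self.health.items() if ok]
                self.active = available[0] if available else None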

Accessing Raw Data

A good hot redundancy system will give the client application access not just to the redundant data, but also to the raw data from both sources. This gives the client application the option of presenting diagnostic information about the system on the “far side” of the redundancy manager. Most redundancy managers hide this information so that a client application would have to make and manage multiple connections to access the raw data, if it is possible at all.

Other options and features

In addition to the above capabilities, a good redundancy manager may offer additional features for your convenience. It might provide the option to refresh the entire data set at switchover. Maybe it will send out emails or even launch additional programs at each switchover. This can be useful for notifying key personnel of the system status. It may log diagnostics to provide valuable information about the reasons for making the switch. Some redundancy managers can connect to multiple servers, and create multiple redundant connections. Others can let you work with subsets of the data. Another desirable feature is the ability to assign the primary and secondary data sources, and to trigger a fallback from the secondary to the primary data source once the problem that caused the switchover has been resolved.

As control systems continue to grow in complexity, and as we rely more and more on them, Mel Farnsworth’s situation will become more common, and more costly. If data connectivity is crucial to the success of the company, it would be wise to consider the possibility of installing a redundant system, and to take into account the above considerations when implementing redundancy for OPC.

Advanced Tunnelling for OPC with Cogent DataHub

OPC has become a leading standard for industrial process control and automation systems.  Among several OPC standards, the one most widely used throughout the world is OPC DA, or OPC Data Access. Many hardware manufacturers offer an OPC DA interface to their equipment, and OPC DA servers are also offered by third-party suppliers.  Likewise, most HMI vendors build OPC DA client capabilities into their software.  Thus data from most factory floor devices and equipment can connect to most HMIs and other OPC DA clients.  This universal connectivity has greatly enhanced the flexibility and efficiency of industrial automation systems.

But OPC DA has a major drawback: it does not network well. OPC DA is based on the COM protocol, which uses DCOM (Distributed COM) for networking. DCOM was not designed for real-time industrial applications. It is neither as robust nor as secure as industrial systems require, and it is very difficult to configure. To overcome these limitations, Cogent offers a “tunnelling” solution, as an alternative to DCOM, to transfer OPC data over a network. Let’s take a closer look at how tunnelling solves the issues associated with DCOM, and how the Cogent DataHub from Cogent Real-Time Systems provides a secure, reliable, and easy-to-use tunnelling solution with many advanced features.

Making Configuration Easy and Secure

The first problem you will encounter with DCOM is that it is difficult to configure. It can take a DCOM expert hours, and sometimes days, to get everything working properly. It is difficult to find good documentation on DCOM because configuration is not a simple, step-by-step process. Even if you are successful, the next Windows Update or an additional new setting may break your working system. Although it is not recommended practice, many companies “solve” the problem by simply bypassing DCOM security settings altogether. But granting broad access permissions in this way is becoming less and less viable in today’s security-conscious world, and most companies cannot risk lowering their guard just to allow DCOM to function.

Tunnelling with the Cogent DataHub eliminates DCOM completely, along with all of its configuration and security issues.  The Cogent DataHub uses the industry standard TCP/IP protocol to network data between an OPC server on one computer and an OPC client on another computer, thus avoiding all of the major problems associated with using the DCOM protocol.

The Cogent DataHub offers this tunnelling feature by effectively ‘mirroring’ data from one Cogent DataHub running on the OPC server computer to another Cogent DataHub running on the OPC client computer. This method results in very fast data transfer between Cogent DataHub nodes.

Better Network Communication

When a DCOM connection is broken, there are very long timeout delays before either side is notified of the problem, because DCOM has hard-coded timeout periods which cannot be adjusted by the user. In a production system, these long delays without warning can be a very real problem. Some OPC clients and OPC client tools have internal timeouts to work around this one problem, but that approach does not deal with the other issues discussed in this paper.

The Cogent DataHub has a user-configurable heartbeat and timeout feature which allows it to react immediately when a network break occurs.  As soon as this happens, the Cogent DataHub begins to monitor the network connection and when the link is re-established, the local Cogent DataHub automatically reconnects to the remote Cogent DataHub and refreshes the data set with the latest values.  Systems with slow polling rates over long distance lines can also benefit from the user-configurable timeout, because DCOM timeouts might have been too short for these systems.

Whenever there is a network break, it is important to protect the client systems that depend on data being delivered.  Because each end of the tunnelling connection is an independent Cogent DataHub, the client programs are protected from network failures and can continue to run in isolation using the last known data values.  This is much better than having the client applications lose all access to data when the tunnelling connection goes down.

The Cogent DataHub uses an asynchronous messaging system that further protects client applications from network delays. In most tunnelling solutions, the synchronous nature of DCOM is preserved over the TCP link. This means that when a client accesses data through the tunnel, it must block waiting for a response. If a network error occurs, the client will continue to block until a network timeout occurs. The Cogent DataHub removes this limitation by releasing the client immediately and then delivering the data over the network. If a network error occurs, the data will be delivered once the network connection is re-established.
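
The asynchronous pattern can be pictured with a short, generic sketch: a client write goes into a local queue and returns immediately, while a background thread delivers it across the link and retries after a failure. This is not the Cogent DataHub's actual implementation, just an illustration of non-blocking delivery; send_over_network is a stand-in for the real transport.

    # Sketch: non-blocking delivery; the client never waits on the network.
    import queue, threading, time

    outgoing = queue.Queue()

    def client_write(tag, value):
        outgoing.put((tag, value))        # returns immediately

    def delivery_loop(send_over_network):
        while True:
            item = outgoing.get()
            while True:
                try:
                    send_over_network(item)
                    break
                except OSError:
                    time.sleep(1.0)       # link down; retry until it returns

    # Start the background sender with a placeholder transport.
    threading.Thread(target=delivery_loop,
                     args=(lambda item: None,), daemon=True).start()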

How the Cogent DataHub compares with other tunnelling products:

  • Local OPC transactions – The Cogent DataHub keeps all OPC transactions local to the computer, fully protecting the client programs from any network irregularities. Other products expose OPC transactions to network irregularities, making client programs subject to timeouts, delays, and blocking behavior; link monitoring can reduce these effects, while the Cogent DataHub eliminates them.
  • Mirrored data sets – The Cogent DataHub mirrors data across the network, so that both sides maintain a complete set of all the data. This shields the clients from network breaks, as it lets them continue to work with the last known values from the server; when the connection is re-established, both sides synchronize the data set. Other products pass data across the network on a point-by-point basis and maintain no knowledge of the current state of the points in the system, so a network break leaves the client applications stuck with no data to work with.
  • Shared tunnel connections – A single tunnel can be shared by multiple client applications. This significantly reduces network bandwidth and means the customer can reduce licensing costs, since all clients (or servers) on the same computer share a single tunnel connection. Other tunnelling products require a separate network connection for each client-server connection, which increases the load on the system and the network, and increases licensing costs.

These features make it much easier for client applications to behave in a robust manner when communications are lost, saving time and reducing frustration.  Without these features, client applications can become slow to respond or completely unresponsive during connection losses or when trying to make synchronous calls.

Securing the System

Recently, DCOM networking has been shown to have serious security flaws that make it vulnerable to hackers and viruses. This is particularly worrying to companies who network data across Internet connections or other links outside the company.

To properly secure your communication channel, the Cogent DataHub offers secure SSL connections over the TCP/IP network. SSL tunnelling is fully encrypted, which means the data is completely safe for transmission over open network links outside the company firewalls. In addition, the Cogent DataHub provides access control and user authentication through optional password protection. This ensures that only authorized users can establish tunnelling connections. Having these features built into the Cogent DataHub is a significant advantage, since other methods of data encryption can require complicated operating system configuration and more expensive server PCs, neither of which is required for the Cogent DataHub.

Advanced Tunnelling for OPC

While there are a few other products on the market that offer tunnelling capabilities to replace DCOM, the Cogent DataHub is unique in combining tunnelling with a wide range of advanced and complementary features that provide even more added benefits.

Significant reduction in network bandwidth

The Cogent DataHub reduces the amount of data being transmitted across the network in two ways:

  1. Rather than using a polling cycle to transmit the data, the Cogent DataHub only sends a message when a new data value is received.  This significantly improves performance and reduces bandwidth requirements.
  2. The Cogent DataHub can aggregate both client and server connections.  This means that the Cogent DataHub can collect data from multiple OPC servers and send it across the network using a single connection.  On the client side, any number of OPC clients can attach to the Cogent DataHub and they all receive the latest data as soon as it arrives.  This eliminates the need for each OPC client to connect to each OPC server using multiple connections over the network.

Non-Blocking

While it may seem simple enough to replace DCOM with TCP/IP for networking OPC data, the Cogent DataHub also replaces the inherent blocking behaviour experienced in DCOM communication.  Client programs connecting to the Cogent DataHub are never blocked from sending new information.  Some vendors of tunnelling solutions for OPC still face this blocking problem, even though they are using TCP/IP.

Supports slow network and Internet links

Because the Cogent DataHub reduces the amount of data that needs to be transmitted over the network, it can be used over a slow network link.  Any interruptions are dealt with by the Cogent DataHub while the OPC client programs are effectively shielded from any disturbance caused by the slow connection.

Access to data on network computers running Linux

Another unique feature of the Cogent DataHub is its ability to mirror data between Cogent DataHubs running on other operating systems, such as Linux and QNX.  This means you can have your own custom Linux programs act as OPC servers, providing real-time data to OPC client applications running on networked Windows computers.  The reverse is also true.  You can have your Linux program access data from OPC servers running on networked Windows computers.

Load balancing between computers

The Cogent DataHub also offers the unique ability to balance the load on the OPC server computers. You may have a system where multiple OPC clients connect to the OPC server at the same time, causing the server computer to experience high CPU loads and slower performance. The solution is to mirror data from the Cogent DataHub on the OPC server computer to a Cogent DataHub on another computer, and then have some of your OPC clients connect to this second ‘mirrored’ computer. This reduces the load on the original OPC server computer and provides faster response to all OPC client computers.

Advanced Tunnelling for OPC Example – TEVA Pharmaceuticals (Hungary)

TEVA Pharmaceuticals in Hungary recently used the Cogent DataHub to combine tunnelling and aggregation, networking OPC data across the plant network and through the company firewall.

Laszlo Simon is the Engineering Manager for the TEVA API plant in Debrecen, Hungary. He had a project that sounded simple enough. He needed to connect new control applications through several OPC stations to an existing SCADA network. The plant was already running large YOKOGAWA DCS and GE PLC control systems, connected to a number of distributed SCADA workstations. However, Mr. Simon did face a couple of interesting challenges in this project:

  • The OPC servers and SCADA systems were on different computers, separated by a company firewall. This made it extremely difficult to connect OPC over a network, because of the complexities of configuring DCOM and Windows security permissions.
  • Each SCADA system needed to access data from all of the new OPC server stations. This meant Mr. Simon needed a way to aggregate data from all the OPC stations into a single common data set on each SCADA computer.

After searching the web, Mr. Simon downloaded and installed the Cogent DataHub. Very quickly he had connected the Cogent DataHub to his OPC servers and determined that he was reading live process data from the new control systems. He was also able to easily set up the tunnelling link between the OPC server stations and the SCADA workstations, by simply installing another Cogent DataHub on the SCADA computer and configuring it to connect to the OPC server stations.

“I wanted to reduce and simplify the communication over the network because of our firewall. It was very easy with the Cogent DataHub,” said Mr. Simon after the system was up and running. Currently about 7,000 points are being transferred across the network, in real time, using the Cogent DataHub. “In the future, the additional integration of the existing or new OPC servers will be with the Cogent DataHub.”