Power Edge RL – Control of electric power systems via edge computing based reinforcement learning

Idea and relation to VEDLIoT

Reinforcement learning (RL) is becoming an increasingly popular approach for controlling complex dynamic systems. However, the learning process itself requires extensive computational power, which conflicts with the real-time requirements of control processes on cost-sensitive embedded hardware. To overcome this issue, splitting the learning process and the control policy inference across an edge computing / IoT framework is a viable solution, as policy inference is typically much less computationally demanding. In this setup, the computationally heavy data handling and gradient descent-based learning steps are executed in soft real time on an appropriate edge hardware device, while only the RL policy inference runs in hard real time on the embedded control hardware. The challenges of this edge computing RL approach are to keep the delay between the learning process on the edge device and the data-generating / policy inference process on the embedded controller as small as possible, to ensure reliable and safe data communication between both entities, and to minimize the energy consumption of the entire learning and control process. Against this background, the project will investigate to what extent VEDLIoT hardware and software technologies can contribute to these objectives. The application scope is power electronic-based energy conversion systems, studied on real-world laboratory test benches.

Power-Edge-RL will contribute to extending the VEDLIoT ecosystem in four main ways:

  1. The integration of this project within the VEDLIoT consortium will bring into focus an extremely relevant field of work with a likely high impact in key European industries such as industrial automation, energy conversion, automotive,…. The attractiveness of VEDLIoT technologies for further users will be directly increased by the demonstrated use cases and real-world implementations.
  2. Countless dynamic systems within these industries are operated under closed-loop control, and a corresponding amount of energy must be expended to operate the control electronics. The VEDLIoT focus on energy efficiency can help to reduce this energy consumption in the future and to operate the overall systems more efficiently.
  3. Central work results will be shared with the scientific community in the form of open-source software and data.
  4. Scientific publications reporting on the methods used and the empirical findings of the project will also help to make VEDLIoT technologies more popular within the scientific and industrial community.

Objectives

  1. Transferring, adapting, and improving an existing Edge RL rapid control prototyping (RCP) toolchain for VEDLIoT hardware and software technologies.
  2. Adapting VEDLIoT training toolchains for RL applications with a focus on power electronic systems.
  3. Assessing the benefits of the deployed VEDLIoT technologies through empirical tests against established toolchains and off-the-shelf hardware.

Approach

Although an RL-based controller requires both training and inference steps to learn optimal control policies, only the policy inference step (ANN forward execution) must be implemented on the embedded hardware device under hard real-time constraints. End-of-line embedded hardware is already capable of performing the inference of small ANNs on microprocessor architectures, and of up to medium-sized ANNs on cost-efficient FPGAs, in hard real time, i.e., this part of the problem is manageable with today's technology. In contrast, the much more computationally demanding parts of the learning algorithm (i.e., updating the ANNs' weights) cannot be executed in hard real time on such embedded hardware. This is also not necessary, since state-of-the-art RL algorithms update the ANNs' weights cautiously during training, with a comparatively small step size. Hence, a small delay, in a soft real-time sense, between receiving new data from the control plant and updating the weights of the controller policy ANN can be tolerated. This enables RL training to be performed remotely from the embedded hardware, using edge computing / IoT technologies on specialized hardware (such as that developed within the VEDLIoT project), asynchronously to the embedded controller process. The result is a partitioning, or decoupling, of the most computationally intensive parts of the RL algorithms from the embedded hardware, which in turn allows innovative deep RL approaches to be used for fast, dynamic power systems.
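The decoupling described above can be illustrated with a small, deterministic toy simulation. This is only a sketch under stated assumptions and not the project's toolchain: the scalar plant x_next = a*x + b*u, the linear policy u = w*x, the one-step gradient update (a stand-in for a real RL update), the re-excitation rule, and all names and parameter values are hypothetical. It shows that the controller only ever performs policy inference, while new weights computed by the learner reach it a fixed number of control steps later.

```python
from collections import deque


def policy(weight, state):
    """Policy inference: the only computation done on the embedded controller."""
    return weight * state


def learner_update(weight, batch, step_size, b):
    """One gradient step on the mean one-step cost x_next**2.

    Stand-in for an RL update on the edge device; assumes the input gain b is known.
    """
    grad = sum(2.0 * x_next * b * x for x, _u, x_next in batch) / len(batch)
    return weight - step_size * grad


def simulate(steps=200, delay_steps=3, batch_size=4, step_size=0.05, a=1.0, b=0.5):
    """Run the control loop and return the controller's weight at every step."""
    ctrl_weight = 0.0    # weight currently deployed on the embedded controller
    edge_weight = 0.0    # weight held by the edge learner
    buffer = []          # transitions sent from the controller to the edge
    in_flight = deque()  # (arrival_step, weight) pairs still on their way back
    history = []
    x = 1.0
    for k in range(steps):
        # Apply every weight update that has arrived by control step k.
        while in_flight and in_flight[0][0] <= k:
            ctrl_weight = in_flight.popleft()[1]

        # Hard real-time part: inference and plant step.
        u = policy(ctrl_weight, x)
        x_next = a * x + b * u
        buffer.append((x, u, x_next))

        # Soft real-time part: the learner updates once a batch is complete,
        # and the result arrives delay_steps control periods later.
        if len(buffer) == batch_size:
            edge_weight = learner_update(edge_weight, buffer, step_size, b)
            buffer = []
            in_flight.append((k + 1 + delay_steps, edge_weight))

        history.append(ctrl_weight)
        # Hypothetical re-excitation so the toy plant keeps producing data.
        x = x_next if abs(x_next) > 1e-3 else 1.0
    return history


if __name__ == "__main__":
    weights = simulate()
    print("controller weight after 200 steps:", weights[-1])
```

Increasing delay_steps postpones the moment at which the controller first sees updated weights, which is the effect the soft real-time argument above relies on keeping small.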

Nevertheless, this innovative learning and training process is not risk-free, and several challenges have to be addressed when edge computing-based RL solutions are applied to safety-critical control tasks:

  • The latency with which new ANN weights are sent from the edge / IoT platform to the embedded controller should be kept as low as possible to ensure fast and stable learning. This requires specialized hardware accelerators on the edge / IoT hardware that calculate the training update step as fast as possible.
  • For the same reason, the communication link between the embedded device and the IoT device must operate with the lowest possible latency and without data loss.
  • Since the energy efficiency of power systems is usually of great importance to the user, the learning task must also be realized with minimal energy conversion losses. The edge / IoT device should therefore be as energy-efficient as possible.
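The latency requirements in the list above can be made concrete with a simple budget calculation: how many control periods elapse between a batch of data leaving the embedded controller and the corresponding weight update arriving back. The following is a minimal sketch; the function name and all timing values are hypothetical placeholders, not measurements from the project.

```python
def policy_staleness_steps(control_period_us, uplink_us, update_us, downlink_us):
    """Return (total delay in microseconds, delay in whole control periods).

    Integer microseconds are used so that the ceiling division is exact.
    """
    total_us = uplink_us + update_us + downlink_us
    steps = -(-total_us // control_period_us)  # ceiling division
    return total_us, steps


# Hypothetical example: 10 kHz control loop (100 us period), 400 us per
# transfer direction, 1500 us for one training update on the edge device.
total_us, steps = policy_staleness_steps(100, 400, 1500, 400)
print(total_us, steps)  # 2300 23
```

In this illustrative budget, the policy running on the controller lags the learner by 23 control periods, so shortening either the training update or the communication path directly reduces how stale the deployed policy is.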

Expected Impact

The integration of this project within the VEDLIoT consortium will bring into focus an extremely relevant field of work with a likely high impact in key European industries such as industrial automation, energy conversion, automotive,…. The attractiveness of VEDLIoT technologies for further users will be directly increased by the demonstrated use cases and real-world implementations.

Further info/links