FPGA-based real-time simulation for EV station with multiple high-frequency chargers based on C-EMTP algorithm

The electric vehicle (EV) charging station is a critical part of the infrastructure for the wide adoption of EVs. Real-time simulation of an EV station plays an essential role in testing its operation under different operating modes. However, the large number of high-frequency power electronic switches contained in EV chargers poses great challenges for real-time simulation. This paper proposes a compact electromagnetic transient program (C-EMTP) algorithm for FPGA-based real-time simulation of an EV station with multiple high-frequency chargers. The C-EMTP algorithm transforms the traditional EMTP algorithm into two parallel sub-tasks consisting only of simple matrix operations, to fully utilize the high parallelism of the FPGA. The simulation time step can be greatly reduced compared with that of the traditional EMTP algorithm, so the simulation accuracy for high-frequency power electronics is improved. The EV chargers can be decoupled from each other and simulated in parallel. A CPU-FPGA-based real-time simulation platform is developed and the proposed simulation of the EV station is implemented. The control strategy is simulated in a CPU with a 100 μs time-step, while the EV station circuit topology is simulated in a single FPGA with a 250 ns time-step. In the case studies, the EV station consists of a two-level rectifier and five dual-active bridge (DAB) EV chargers. It is tested under different scenarios, and the real-time simulation results are validated using PSCAD/EMTDC.


Introduction
The electric vehicle (EV) is an effective way to tackle environmental challenges such as carbon emission [1,2]. Due to limited battery capacity, increasing utilization of EVs requires widely installed charging stations [3]. Real-time simulation plays an essential role in the study of the electromagnetic transient characteristics of EV charging stations, and can accelerate the design of control and protection systems. In addition, real-time simulation with hardware-in-the-loop (HIL) simulation can realize joint simulations of an EV charging station model and an actual control prototype. This can reduce test costs and shorten the development cycle compared with offline simulation [4,5]. However, simulating multiple high-frequency EV chargers in real time with small time steps is a significant challenge due to the need for detailed device-level modeling of power electronic converters (PECs) and accurate interactions [6,7]. Therefore, it is valuable to develop an efficient real-time simulation algorithm and implementation method for the system.
The typical time step of traditional real-time simulators is in the range of 20 μs to 100 μs, and interpolation algorithms are required to accurately reflect the switching events [8,9]. However, the interpolation algorithms incur a great computing burden in real-time simulation. Therefore, in recent years, the field programmable gate array (FPGA) has been used for small time step real-time simulation of detailed models because its numerous logic resources enable highly parallel computing [10-13]. Thus, a fast-switching circuit topology can be simulated in the FPGA while the control strategy is still performed in the CPU [14]. In this work, a lightweight CPU-FPGA architecture real-time simulation platform is established for simulating an EV station with multiple high-frequency EV chargers in a single FPGA.
There are currently two main categories of EMTP algorithm: the state-space method [15,16] and the nodal analysis method [17], both of which have been applied in real-time simulation. To achieve high accuracy and efficiency in device-level real-time simulation, a compact EMTP (C-EMTP) algorithm is proposed for the EV station simulated on the FPGA. The C-EMTP algorithm compresses the serial computing process of a traditional EMTP into two parallel sub-tasks consisting of simple matrix calculations, to make full use of the high parallelism of the FPGA. In addition, the C-EMTP algorithm avoids the calculation of intermediate variables such as injection current, and thus the computational complexity in each step is greatly reduced. The simulation time step is thus reduced and hardware resources are saved by optimizing the simulation process. In addition to optimizing the simulation algorithm, there have been some studies on the decoupling of the simulation circuit [18,19]. A system decoupling method is proposed for the EV station in this paper. This decouples the EV station into multiple EV chargers to be simulated in parallel. Unlike in the traditional EMTP algorithm, using this method in C-EMTP divides large matrix operations into small matrix operations, further reducing the computing burden of the FPGA. In addition to the C-EMTP algorithm, the associated discrete circuit (ADC) switch model [20,21] is adopted to accurately reflect the switching moments and further improve the performance of the real-time simulation.
Using high-level synthesis tools for FPGA implementation can improve the efficiency of the simulation [22]. In addition to the lightweight hardware platform and the C-EMTP algorithm mentioned above, implementation methodologies are proposed to optimize the EV station simulation. Given the fast-switching characteristic of the EV chargers and the computing complexity of the control strategy, the circuit is simulated at 250 ns time-steps while the control signals are updated every 100 μs. The simulation loop operation of each subsystem can be pipelined to reduce resource consumption and latency. The idle time is eliminated to further reduce the time step of the simulation, while the use of a fixed-point matrix operation method saves hardware resources. This paper is organized as follows. The structure of the real-time simulation system for the EV station is discussed in Section 2. The C-EMTP algorithm and system decoupling method are proposed in Section 3, while Section 4 gives the details of the implementation. The simulation study and resource consumption are discussed in Section 5, where the results are verified by PSCAD/EMTDC simulations. Finally, the conclusion is drawn in Section 6.

The structure of the simulation system
The EV station, which is integrated with five EV chargers and one central rectifier, is chosen as the study case to perform on the real-time simulation platform. Each EV charger uses the dual active bridge (DAB) topology, while the central rectifier is a two-level converter. The main parameters are shown in Table 1. In this paper, the central rectifier maintains the DC bus voltage and each EV charger controls its battery charging power, as shown in Fig. 1.
The main architecture of the lightweight CPU-FPGA-based platform is shown in Fig. 2. It integrates the CPU and FPGA resources. The PXIe-8135 is the master board, which contains a quad-core Intel i7-3610QE processor, a dual-channel DDR3 1600 MHz memory controller, all the standard I/O, and an integrated hard drive, and is connected to the Kintex-7 XC7K410T FPGA board and the host PC. The control strategies are performed in the CPU, which receives the instantaneous circuit parameters from the FPGA and sends the modulation wave signals to the FPGA through the PXIe bus. The cores of the CPU run at 2.3 GHz. The FPGA board is a slave board, which performs the real-time simulation of the EV station circuit and the generation of PWM switch signals under the instruction of the master board. The clock frequency of the FPGA board is set to 160 MHz. There are 254,200 look-up tables (LUTs) and 508,400 flip-flops (FFs), which constitute the configurable logic blocks (CLBs) that realize both the combinational logic and the sequential logic. The FPGA contains 1540 digital signal processing (DSP) slices and 28,620 Kb of block RAM, which provide 25 × 18 multipliers and memory resources mainly used in the matrix operations for the history current update and circuit parameter calculation. In addition, the FPGA I/O interfaces can be connected to oscilloscopes to observe the simulation results. Communication between the CPU controller and the FPGA plays a vital role in real-time simulation. PXI Express increases the available bandwidth to 8 GB/s, enabling low-latency data exchange and improving the real-time simulation system performance, particularly when a sub-microsecond time step is required. The host PC provides a human-machine interface (HMI), receiving sampling data from the CPU and displaying the control and simulation waveforms.

The C-EMTP algorithm
The traditional EMTP proposed by Professor Dommel is mainly based on a nodal analysis method, which involves complex serial calculations. The traditional EMTP algorithm also contains the calculation of intermediate variables such as injection current. However, only the node voltages and branch currents are required from the simulation. To improve simulation efficiency, the C-EMTP algorithm redesigns the simulation process so that these circuit variables are obtained directly. At the same time, the serial calculations are transformed into two parallel sub-tasks, which only contain simple matrix calculations. As in the traditional EMTP, all the branches in the network are transformed into Norton equivalent circuits. The following steps show the process of the simulation adopting the C-EMTP algorithm.
Step 1: The incidence matrix $M_{NAM}$ is formed in such a way that its elements correspond to the nodes and branches in the circuit, as:
$$M_{NAM} = [m_{ij}] \in \mathbb{R}^{N_n \times N_b} \tag{1}$$
where $N_n$ is the number of nodes and $N_b$ is the number of branches in the circuit. When node i is connected to branch j and the current of branch j flows away from node i, the entry $m_{ij}$ is 1. On the other hand, when the current of branch j flows into node i, $m_{ij}$ is −1. When node i and branch j have no connection in the network, $m_{ij}$ is 0.
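The construction in Step 1 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the helper name `incidence_matrix`, the use of `None` for the ground (reference) node, and the three-branch example circuit are all assumptions made here.

```python
import numpy as np

def incidence_matrix(n_nodes, branches):
    """Build the N_n x N_b node-branch incidence matrix.

    branches: list of (from_node, to_node) pairs; the branch current is
    assumed to flow from from_node to to_node. None denotes the ground
    (reference) node, which has no row in the matrix.
    """
    M = np.zeros((n_nodes, len(branches)), dtype=int)
    for j, (frm, to) in enumerate(branches):
        if frm is not None:
            M[frm, j] = 1    # current of branch j flows away from node frm
        if to is not None:
            M[to, j] = -1    # current of branch j flows into node to
    return M

# Hypothetical example: source branch (node 0 -> ground),
# resistor (node 0 -> node 1), capacitor (node 1 -> ground).
branches = [(0, None), (0, 1), (1, None)]
M_NAM = incidence_matrix(2, branches)
print(M_NAM)
```

Each column describes one branch, so a branch connected to ground has a single non-zero entry and every other branch has exactly one 1 and one −1.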
Step 2: Voltage source or current source branches are represented by an equivalent admittance and a parallel current source, while other branches are represented by an equivalent admittance and a parallel history current source, so that the branch current is $i_b(t) = Y_b v_b(t) + I_h(t)$. The history current of each branch can be expressed by the branch voltage and current of the previous step as:
$$I_h(t) = \alpha Y_b v_b(t-\Delta t) + \beta i_b(t-\Delta t) \tag{2}$$
where $Y_b$ is the equivalent admittance of the branch, α is the voltage coefficient, β is the current coefficient and $\Delta t$ is the time step. If the backward Euler method is applied for numerical integration, the equivalent admittance of all branches can be obtained. For the inductance branch L, there are:
$$Y_b = \frac{\Delta t}{L}, \quad \alpha = 0, \quad \beta = 1 \tag{3}$$
For the capacitance branch C, there are:
$$Y_b = \frac{C}{\Delta t}, \quad \alpha = -1, \quad \beta = 0 \tag{4}$$
For the switch branch, α and β vary with the switch state. The switch is equivalent to a small inductance $L_s$ when it is on, and to a small capacitance $C_s$ when it is off. Thus, according to (3) and (4), there are:
$$(\alpha, \beta) = \begin{cases} (0, 1), & \text{switch on} \\ (-1, 0), & \text{switch off} \end{cases} \tag{5}$$
To improve the efficiency of the simulation, the inductance $L_s$ and capacitance $C_s$ are chosen to keep $Y_b$ constant, as:
$$Y_b = \frac{\Delta t}{L_s} = \frac{C_s}{\Delta t} \tag{6}$$
In this step, only α and β need to be updated during the simulation (Fig. 3).
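Rewriting (2) in matrix form yields:
$$I_h(t) = \alpha Y_b V_b(t-\Delta t) + \beta I_b(t-\Delta t) \tag{7}$$
where $I_h$, $V_b$, and $I_b$ are the $N_b \times 1$ vectors consisting of the history currents, branch voltages and branch currents of all branches, $Y_b$ is the $N_b \times N_b$ diagonal matrix of branch admittances, and α and β are the $N_b \times N_b$ diagonal matrices consisting of the branch coefficients α and β, respectively.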
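As an illustration of Step 2, the sketch below computes the backward Euler companion parameters and the history current for a small set of branches. It assumes the Norton convention $i_b = Y_b v_b + I_h$ (so an inductor has α = 0, β = 1 and a capacitor has α = −1, β = 0); the sign convention, the helper names and all component values are assumptions made for this example and are not taken from the paper's tables.

```python
import numpy as np

DT = 250e-9  # circuit time step used in the paper (250 ns)

def inductor(L):
    """Backward Euler companion of an inductor: (Y_b, alpha, beta)."""
    return DT / L, 0.0, 1.0

def capacitor(C):
    """Backward Euler companion of a capacitor: (Y_b, alpha, beta)."""
    return C / DT, -1.0, 0.0

def switch(y_fixed, is_on):
    """ADC switch: Y_b stays constant, only (alpha, beta) follow the state."""
    return (y_fixed, 0.0, 1.0) if is_on else (y_fixed, -1.0, 0.0)

def history_current(alpha, beta, y_b, v_b, i_b):
    """History current for all branches; the diagonal matrices
    alpha, beta and Y_b are stored as 1-D arrays."""
    return alpha * y_b * v_b + beta * i_b

# Hypothetical branch set: 1 mH inductor, 1 uF capacitor, one closed switch.
Y_SWITCH = 1.0                      # assumed fixed switch admittance (S)
branches = [inductor(1e-3), capacitor(1e-6), switch(Y_SWITCH, True)]
y_b, alpha, beta = (np.array(col) for col in zip(*branches))

# Small inductance and capacitance that give the same switch admittance.
L_s = DT / Y_SWITCH
C_s = DT * Y_SWITCH

v_b = np.array([2.0, 5.0, 0.1])     # branch voltages of the last step
i_b = np.array([0.3, 1.0, 4.0])     # branch currents of the last step
I_h = history_current(alpha, beta, y_b, v_b, i_b)
print(I_h)
```

Because `Y_SWITCH` is fixed, a switching event only changes two coefficients per switch, which is what allows the network matrices to be computed once before the simulation starts.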
Step 3: The injection current vector $I_{inj}$ ($N_b \times 1$) is formed from the history current $I_h$ and the equivalent current $I_s$ of the voltage/current sources. The nodal voltage vector $V_n$, the branch voltage vector $V_b$, and the branch current vector $I_b$ are then expressed in terms of $I_{inj}$ and the $N_n \times N_n$ nodal admittance matrix $Y_n$. Equations (7)-(11) reveal that the simulation requires many serial steps. When implemented on hardware, the intermediate variables of the traditional EMTP algorithm consume large resources and reduce efficiency. Therefore, by analyzing the relationship between branch voltage and current, a compact simulation loop is derived in which the coefficient matrices K and J can be pre-calculated and pre-stored. Thus, the C-EMTP algorithm saves the resources occupied by intermediate variables during the simulation loop. This benefits the simulation speed and may also reduce error propagation.
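To make the elimination of the intermediate variables concrete, the following derivation sketches one way the serial steps can be collapsed. It is a reconstruction under assumed conventions (branch current $I_b = Y_b V_b + I_{inj}$, incidence entries equal to +1 for currents leaving a node), and the matrix name $G$ is introduced here only for illustration; it is not claimed to coincide with the paper's coefficient matrices.

```latex
\begin{aligned}
I_{inj} &= I_h + I_s, \qquad V_b = M_{NAM}^{T} V_n, \qquad I_b = Y_b V_b + I_{inj} \\
M_{NAM} I_b = 0 \;&\Rightarrow\; \underbrace{M_{NAM} Y_b M_{NAM}^{T}}_{Y_n} V_n = -M_{NAM} I_{inj} \\
V_n &= -Y_n^{-1} M_{NAM}\, I_{inj} \\
I_b &= \left(\mathbf{1} - Y_b M_{NAM}^{T} Y_n^{-1} M_{NAM}\right) I_{inj} \\
\begin{bmatrix} V_n \\ I_b \end{bmatrix} &= G\,(I_h + I_s), \qquad
G = \begin{bmatrix} -Y_n^{-1} M_{NAM} \\ \mathbf{1} - Y_b M_{NAM}^{T} Y_n^{-1} M_{NAM} \end{bmatrix}
\end{aligned}
```

Since $Y_b$ is kept constant by the switch model, $G$ has $N_n + N_b$ rows and $N_b$ columns and does not change during the simulation, so each time step reduces to matrix-vector products with pre-stored coefficients.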
In the simulation loop, the node voltages and branch currents are determined purely by the history current of the last time step and the source equivalent current. Ordering the unknowns such that the first $N_n$ entries are the node voltages and the remaining $N_b$ entries are the branch currents, the update of both can be written as a single matrix equation. The simulation loop is therefore more compact, and the number of operations required in a single step is reduced compared to the traditional EMTP method.
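A compact loop of this kind can be sketched in Python for a small RC circuit. This is an illustrative sketch under stated assumptions, not the paper's implementation: the branch convention $I_b = Y_b V_b + I_h + I_s$, the matrix names `G` and `H`, and the circuit values (a 100 V source with 0.1 Ω series resistance, a 10 Ω resistor and a 1 μF capacitor) are all chosen here for the example.

```python
import numpy as np

DT = 250e-9                                  # time step (s)
E, R_SRC, R, C = 100.0, 0.1, 10.0, 1e-6      # hypothetical circuit values

# Branch 0: source (node 0 -> ground), branch 1: R (node 0 -> node 1),
# branch 2: C (node 1 -> ground). +1 means the current leaves the node.
M = np.array([[1.0, 1.0, 0.0],
              [0.0, -1.0, 1.0]])
n_nodes, n_branches = M.shape

Y_b = np.diag([1.0 / R_SRC, 1.0 / R, C / DT])
alpha = np.diag([0.0, 0.0, -1.0])            # only the capacitor has history
beta = np.zeros((n_branches, n_branches))

# Pre-computed coefficient matrices (constant during the simulation).
Y_n = M @ Y_b @ M.T
Yn_inv_M = np.linalg.solve(Y_n, M)
G = np.vstack([-Yn_inv_M,
               np.eye(n_branches) - Y_b @ M.T @ Yn_inv_M])  # x = G (I_h + I_s)
H = np.hstack([alpha @ Y_b @ M.T, beta])                    # I_h = H x

# Norton equivalent of the voltage source in the branch direction.
I_s = np.array([-E / R_SRC, 0.0, 0.0])
I_h = np.zeros(n_branches)                   # capacitor initially discharged

for _ in range(400):                         # 400 x 250 ns = 100 us
    x = G @ (I_h + I_s)                      # stacked [V_n; I_b]
    I_h = H @ x                              # history current for next step

v_c = x[1]                                   # capacitor (node 1) voltage
print(v_c)
```

Each iteration contains only two matrix-vector products with pre-stored matrices and no explicit injection-current or nodal solve step. For this circuit the loop reproduces the backward Euler recursion of a series RC charge-up, which gives a simple way to check the sketch.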

System decoupling method
The EV station with multiple high-frequency EV chargers consists of detailed converter models, and the switching frequency of the EV chargers is as high as 50 kHz. Thus, to guarantee that the simulation loop can be completed within a time step, a system decoupling method is adopted to further improve the real-time simulation performance. This method decouples the system into multiple EV chargers, which interact through an interface. It realizes the parallel simulation of each EV charger and makes full use of the parallelism of the FPGA. In comparison, the non-decoupled system involves large-scale non-diagonal matrix operations, which limit the minimum simulation time step and consume more hardware resources. Dividing a system into multiple subsystems can effectively reduce the complexity of the matrix operations.
The capacitor C is selected as the interface between multiple subsystems as shown in Fig. 4. It is equivalent to two identical voltage sources in branch p and branch q. The interface voltage is then obtained by discretizing the capacitor with the Euler method, as detailed in the Appendix, where the error analysis of the decoupling method is also given. The third-order local truncation error caused by the decoupling method is relatively small compared to the non-decoupling method and the influence on the results is negligible.
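The effect of decoupling at a capacitor can be illustrated with a toy two-subsystem circuit. In this sketch, which is an assumption-laden illustration rather than the paper's model, the left subsystem is a 100 V source behind 1 Ω, the right subsystem is a 10 Ω load, and the interface capacitor is 1 mF. The decoupled version holds the capacitor voltage as a source for both sides and updates it with a forward Euler step (assumed here to be the "Euler method" of the Appendix), while the reference solves the coupled circuit with backward Euler.

```python
DT = 250e-9                                # time step (s)
E, R1, R2, C = 100.0, 1.0, 10.0, 1e-3      # hypothetical circuit values
N_STEPS = 40_000                           # 10 ms of simulated time

v_dec = 0.0      # interface voltage, decoupled (forward Euler) update
v_ref = 0.0      # interface voltage, coupled backward Euler reference
max_diff = 0.0

for _ in range(N_STEPS):
    # Each subsystem sees the interface capacitor as a voltage source
    # holding the value of the previous step, so both can be solved
    # independently (in parallel on hardware).
    i_p = (E - v_dec) / R1                 # current delivered by the left side
    i_q = v_dec / R2                       # current drawn by the right side
    v_dec += DT / C * (i_p - i_q)          # interface update

    # Coupled reference: capacitor solved together with both sides.
    v_ref = (v_ref + DT * E / (R1 * C)) / (1.0 + DT / C * (1.0 / R1 + 1.0 / R2))

    max_diff = max(max_diff, abs(v_dec - v_ref))

print(v_dec, v_ref, max_diff)
```

With a 250 ns step and a time constant in the millisecond range, the two trajectories stay within a few millivolts of each other over the whole transient, which is consistent with the claim that the decoupling error is negligible for this kind of interface.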
Based on the C-EMTP algorithm, the real-time simulation procedure is shown in Fig. 5. The green part shows the initialization of the electrical circuit, in which each subsystem performs its own initialization and the coefficient matrices and variables are loaded to the corresponding simulation steps. The subsystems are updated according to (15) in the blue part, while the blue and yellow parts together indicate a closed loop between the circuit and control. The node voltage $V_n$ and branch current $I_b$ are sent to the CPU, and the control signals are translated into α and β to control the circuit. The purple part shows the interface calculation between the subsystems.
The whole EV station circuit is modeled in one single FPGA board for maximum resource utilization. This avoids exchanging data between boards and reduces simulation latency. Normally, the subsystems are decoupled at the capacitor, so each EV charger forms a subsystem and the central rectifier is also defined as a subsystem. With this decoupling method, the scalability of the EV charging station simulation is improved, and the number of simulated EV chargers can be adjusted according to actual needs.

Simulation loop
To improve the efficiency of simulating a larger EV station in one FPGA board, a CPU-FPGA-based real-time simulation platform is constructed in the National Instruments (NI) PXI chassis, in which the control strategy and circuit topology are simulated at different time steps. Figure 6 shows the system simulation loop between the electrical and control systems. This indicates the cooperation between the FPGA and CPU.
For the EV station, all the calculations of the circuit and control subsystems start simultaneously. The control signal calculation on the CPU usually takes more time than the circuit update on the FPGA depending on the actual execution times of the CPU and FPGA. As a result, the FPGA must wait for the CPU until the next time step is reached. In order to improve real-time simulation performance, the control signal calculation in the CPU has a 100 μs time-step and the circuit simulation is calculated in the FPGA with a 250 ns time-step. In this way, the FPGA and CPU operations are relatively independent while the data exchange is performed every 100 μs. When the control signals are not updated, the FPGA runs the next simulation cycle with the control signals of the previous calculation. When a simulation loop starts, the FPGA reads the control signal and the CPU reads the electrical variables of the last step. Then, all the modules of the electrical calculation begin simultaneously. At the same time, the CPU begins to update the control signal. During each time step in the FPGA, the node voltages, branch currents, history current sources and switch status are updated. By adopting the pipeline method, not only can each subsystem be calculated in parallel, but the calculation within the subsystem can also be performed in parallel, which further reduces the simulation time step in the FPGA.
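The multi-rate exchange described above can be sketched as a simple loop. This is a software analogy only: on the real platform the CPU and FPGA run concurrently, whereas here the control update is evaluated at the start of each exchange window and applied one window later. The function names and the toy controller are hypothetical.

```python
CPU_STEP_NS = 100_000      # control time step (100 us)
FPGA_STEP_NS = 250         # circuit time step (250 ns)
STEPS_PER_EXCHANGE = CPU_STEP_NS // FPGA_STEP_NS   # 400 circuit steps

def co_simulate(n_exchanges, circuit_step, control_step, state, control):
    """Run n_exchanges control periods of the two-rate loop."""
    pending = control
    for _ in range(n_exchanges):
        control = pending                  # FPGA reads the finished control signal
        pending = control_step(state)      # CPU starts from the last electrical state
        for _ in range(STEPS_PER_EXCHANGE):
            # The circuit keeps using the previous control signal
            # until the next exchange.
            state = circuit_step(state, control)
    return state

# Toy example: the "circuit" integrates the control signal and the
# "controller" switches off once the state reaches 1000.
final_state = co_simulate(
    n_exchanges=6,
    circuit_step=lambda s, u: s + u,
    control_step=lambda s: 1 if s < 1000 else 0,
    state=0,
    control=0,
)
print(STEPS_PER_EXCHANGE, final_state)
```

The toy run overshoots the threshold because the control signal is always one exchange period old, which illustrates why the 100 μs control step has to be chosen with the control dynamics in mind.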

FPGA implementation
The whole simulation implementation in the FPGA is shown in Fig. 7. As shown, data exchanges between the CPU and FPGA are carried out through a FIFO. The FPGA receives the modulation wave signal and the voltage or current source signal of the circuit from the CPU, while transmitting the calculation results of the node voltage and branch current to the CPU. Since the control signal calculation on the CPU and the circuit simulation on the FPGA run with different time steps, the FIFO provides asynchronous communication to ensure accurate interaction between the FPGA and CPU.
In the FPGA, the C-EMTP algorithm simulates the electrical system, which mainly involves updating the switch states and the history current, and calculating the node voltage and the branch current. In offline simulation, the system can be solved by floating-point operation to obtain high-accuracy results. However, fixed-point operation is more suitable for meeting the clock requirement and limiting the use of FPGA multiplier resources. The fixed-point format of the variables is set to <±,25,12>, which is chosen according to the DSP multiplier width. A DSP slice performs a 25-bit × 18-bit multiplication, and a <±,25,12> fixed-point multiplication uses two DSP slices, fewer than a single-precision floating-point multiplication.
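The numerical behavior of such a format can be explored with a small Python model. The sketch assumes that <±,25,12> follows the common convention of sign, total word length and integer word length (including the sign bit), which leaves 13 fractional bits; this interpretation, the saturation behavior and the helper names are assumptions of the example.

```python
WORD_BITS = 25                       # total word length
INT_BITS = 12                        # integer bits, assumed to include the sign
FRAC_BITS = WORD_BITS - INT_BITS     # 13 fractional bits
SCALE = 1 << FRAC_BITS
RAW_MAX = (1 << (WORD_BITS - 1)) - 1
RAW_MIN = -(1 << (WORD_BITS - 1))

def saturate(raw):
    """Clamp a raw integer to the signed 25-bit range."""
    return max(RAW_MIN, min(RAW_MAX, raw))

def to_fixed(x):
    """Quantize a real number to the raw fixed-point integer."""
    return saturate(round(x * SCALE))

def to_float(raw):
    """Convert a raw fixed-point integer back to a real number."""
    return raw / SCALE

def fx_mul(a, b):
    """Multiply two raw values and rescale the product to the same format."""
    return saturate((a * b) >> FRAC_BITS)

resolution = to_float(1)                      # smallest representable step
product = to_float(fx_mul(to_fixed(1.5), to_fixed(2.0)))
clipped = to_float(to_fixed(5000.0))          # outside the range, saturates
print(resolution, product, clipped)
```

Under this interpretation the representable range is roughly ±2048 with a resolution of about 1.2e-4, so voltages and currents need to be scaled to fit within that range before quantization.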
While the switch states are being updated to regenerate α and β, the multiplier reads the matrices K and J pre-stored in block RAM for the next calculation. The calculation of the node voltages and branch currents consists mainly of matrix multiplications. For example, the EV station contains 48 nodes and 73 branches, which generates a 73 × 73 matrix to be solved. Using the system decoupling method, the 73 × 73 matrix can be divided into one 13 × 13 matrix and five 12 × 12 matrices, which can be solved in parallel. The multiplications involving non-diagonal matrices are performed by a multiply-accumulate module, which is executed serially and takes the most clock cycles in each time step. The pipeline method is also adopted to execute the multiplication and accumulation. This saves nearly half of the execution time in each step.
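The saving from decoupling can be quantified with a short calculation. The sketch below counts multiplications for the full and the decoupled matrices using the sizes quoted above, and checks on random data that a block-diagonal product equals the independent per-block products. It idealizes the decoupled system as exactly block-diagonal and ignores the interface calculation; the random matrices are placeholders, not circuit data.

```python
import numpy as np

sizes = [13] + [12] * 5                  # central rectifier + five chargers
n_total = sum(sizes)                     # 73 branches in total

mults_full = n_total ** 2                # one dense 73 x 73 product
mults_blocks = sum(n * n for n in sizes) # six small independent products

# With one serial multiply-accumulate unit per row, the number of
# columns sets the cycle count of a matrix-vector product.
serial_cycles_full = n_total
serial_cycles_blocks = max(sizes)        # blocks run in parallel

rng = np.random.default_rng(0)
blocks = [rng.standard_normal((n, n)) for n in sizes]
x = rng.standard_normal(n_total)

# Assemble the equivalent block-diagonal matrix for comparison.
A = np.zeros((n_total, n_total))
offsets = np.cumsum([0] + sizes[:-1])
for off, B in zip(offsets, blocks):
    A[off:off + B.shape[0], off:off + B.shape[0]] = B

y_full = A @ x
y_blocks = np.concatenate(
    [B @ x[off:off + B.shape[0]] for off, B in zip(offsets, blocks)]
)
print(mults_full, mults_blocks, serial_cycles_full, serial_cycles_blocks)
```

In this idealized count the decoupled form needs about one sixth of the multiplications, and the serial accumulation length drops from 73 to 13 terms.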
The history current update mainly uses logic blocks. Since the coefficients α and β, which represent the switch states, only take the values 1, 0 or −1, logic resources replace multipliers in these calculations. This improves the history current update speed and saves multiplier resources.
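The idea of replacing the coefficient multiplications by selection logic can be illustrated as follows. The sketch assumes the switch coefficients (α, β) = (0, 1) when on and (−1, 0) when off, and that the product $Y_b v_b$ is already available as a raw fixed-point integer; both function names are hypothetical.

```python
def history_current_select(switch_on, y_times_v, i_b):
    """Multiplier-free update: pick i_b when on, negate Y_b*v_b when off."""
    return i_b if switch_on else -y_times_v

def history_current_multiply(switch_on, y_times_v, i_b):
    """Reference update using explicit coefficient multiplications."""
    alpha, beta = (0, 1) if switch_on else (-1, 0)
    return alpha * y_times_v + beta * i_b

# Raw fixed-point integers chosen arbitrarily for the comparison.
samples = [(True, 1200, -350), (False, 1200, -350), (False, -77, 5)]
results = [history_current_select(*s) for s in samples]
print(results)
```

On hardware the selection corresponds to a multiplexer and a sign inversion, which is why no DSP slice is needed for this part of the update.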

Simulation results and discussion
The tested EV station consists of a two-level rectifier and five high-frequency DAB EV chargers. This section applies the C-EMTP algorithm to the CPU-FPGA-based real-time simulation platform to simulate the EV station. The results of the real-time simulation and corresponding PSCAD simulation are displayed in Fig. 8.

Resource consumption
According to the FPGA implementation details, the execution time and the hardware resource utilization of each subsystem are presented in Table 2. Look-up tables (LUTs), flip-flops (FFs), digital signal processing (DSP) slices and block RAMs are the main resources consumed.
As discussed in Sections 3 and 4, the latencies of the additional steps in each subsystem are the same, and only the inconsistency of matrix dimensions leads to different latency and resource consumption. The number of rows of the matrix determines the DSP slice needed for calculation, and the number of matrix columns determines the time required for matrix multiplication. For example, the central rectifier subsystem that contains 8 nodes and 13 branches consumes the most simulation time and hardware resource because the matrices P and Q have a larger size (21 × 13) than other subsystems (20 × 12).
In the EV station real-time simulation, the execution time is constant and equal to that of the central rectifier subsystem. This demonstrates that the latency is determined by the central rectifier subsystem, which has the largest latency of 38 clock cycles. When new EV chargers are integrated, the execution time will not increase as long as the scale of the new subsystem is not larger than that of the existing ones. If the traditional EMTP method is used, the real-time simulation can only be realized with a simulation step of 1 μs. However, when the EV charging station operates at a switching frequency of 50 kHz, the simulation step must be 250 ns or less to achieve the required simulation accuracy. Therefore, the C-EMTP algorithm must be used. In addition, to attain a smaller time step, the subsystems are kept similar in scale so that no single subsystem increases the system latency. The parallel structure ensures that more EV chargers can be integrated to take full advantage of the FPGA resources without increasing the time step.
For comparison, the central rectifier is simulated using the C-EMTP algorithm and the traditional algorithm, and the latencies of the simulations are shown in Table 3. It can be seen that the latency of the C-EMTP algorithm is much smaller than that of the traditional EMTP algorithm, showing that the C-EMTP algorithm greatly improves simulation efficiency. When trying to simulate the EV station with the traditional EMTP algorithm, the execution time exceeds 2 μs, which is too large for the DAB switched at 50 kHz. However, when the EV station is simulated with the C-EMTP algorithm, real-time simulation with a 250 ns time-step is realized. Since the system clock is 160 MHz, the circuit simulation of the EV station in one time step must be accomplished within 40 clock cycles.
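The cycle budget follows directly from the clock frequency and the time step, as the short check below shows. The numbers are taken from the text; the variable names are chosen for this example.

```python
CLOCK_HZ = 160_000_000       # FPGA clock frequency (160 MHz)
TIME_STEP_NS = 250           # circuit time step (250 ns)
LATENCY_CYCLES = 38          # largest subsystem latency (central rectifier)
SWITCHING_HZ = 50_000        # DAB switching frequency (50 kHz)

# Clock cycles available in one time step.
cycle_budget = CLOCK_HZ * TIME_STEP_NS // 1_000_000_000

# Cycles left over after the slowest subsystem has finished.
slack_cycles = cycle_budget - LATENCY_CYCLES

# Simulation steps per switching period at 250 ns.
steps_per_switching_period = 1_000_000_000 // (SWITCHING_HZ * TIME_STEP_NS)

print(cycle_budget, slack_cycles, steps_per_switching_period)
```

The 38-cycle latency fits within the 40-cycle budget, and each 20 μs switching period is resolved by 80 simulation steps.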

Plug-and-play scenarios
The EV charger is designed to achieve plug-and-play capability and can adjust the output power according to different vehicles. This greatly improves the convenience of charging. Figure 8 shows the transient waveforms of the plug-and-play scenarios. EV chargers 4 and 5 start charging the EVs at powers of 4 kW and 5 kW at 2 s, and the charging power of EV charger 1 changes from 8 kW to 3 kW at 3 s. At 4 s, EV chargers 3 and 5 are cut off from the charging station. When there is an abrupt power change, both the AC and DC buses experience disturbances. The charging current of the EV station and the input DC bus voltage of the EV chargers observed from the oscilloscope are shown in Fig. 8 (a1) and (a2), respectively. Since the central rectifier is responsible for maintaining the DC bus voltage, the DC voltage returns to the rated voltage quickly after each disturbance. Figure 8 (a3) shows the power tracking performance of the EV chargers observed from the host PC; all chargers follow the control signals well. From Fig. 8 (a1)-(a3) it can be seen that the output power of the EV chargers tracks the power commands immediately after the EVs are plugged in or disconnected. When the charging power changes, the input current changes accordingly. Offline simulations in PSCAD/EMTDC are conducted for validation, and the results are shown in Fig. 8 (b1)-(b3).

Conclusion
In this paper, a C-EMTP algorithm has been proposed to achieve real-time simulation of an EV station on a lightweight CPU-FPGA-based platform. The simulation results of the C-EMTP algorithm for an EV station with multiple high-frequency EV chargers, at a switching frequency of 50 kHz, are in good agreement with the simulation results in PSCAD. The latency study shows that, compared with the traditional EMTP algorithm, the C-EMTP algorithm for the EV station reduces the simulation execution time by more than 65%, which means that a small time step of 250 ns can be used.
The system decoupling method ensures that the time step remains constant when the scale of the simulation increases. The proposed C-EMTP algorithm and platform implementation in high switching-frequency simulation has great potential for high-precision simulation of power systems with high converter penetration.

Appendix
The error of the decoupling method in Section 3.2 has a negligible effect on the simulation results. According to the decoupling method in Fig. 4, the capacitor voltage is expressed as:
$$v_C(t) = v_C(t-\Delta t) + \frac{\Delta t}{C}\, i_C(t-\Delta t)$$
where $i_C$ is the capacitor current. It can be seen from the above equation that the capacitor is discretized by the Euler method. At the same time, the rest of the circuit still uses the backward Euler method.
For the Euler method, the local truncation error is:
$$e_{E} = \frac{\Delta t^2}{2}\, v_C''(\xi), \quad \xi \in (t-\Delta t,\, t)$$
If the decoupling method is not used, the whole circuit adopts the backward Euler method and the capacitor voltage is expressed as:
$$v_C(t) = v_C(t-\Delta t) + \frac{\Delta t}{C}\, i_C(t)$$
For the backward Euler method, the local truncation error is:
$$e_{BE} = -\frac{\Delta t^2}{2}\, v_C''(\xi), \quad \xi \in (t-\Delta t,\, t)$$
The local truncation errors relative to the analytical solution under both methods are therefore of order $\Delta t^2$. It can be seen from the above analysis that the use of the decoupling method has little effect on the system simulation error, while both methods have first-order accuracy.
The local truncation error of the decoupling method relative to the non-decoupling method is: In the same way, the error caused by the decoupling method is negligibly small compared to the non-decoupling method.