Jointly improving energy efficiency and smoothing power oscillations of integrated offshore wind and photovoltaic power: a deep reinforcement learning approach
Protection and Control of Modern Power Systems volume 8, Article number: 25 (2023)
Abstract
This paper proposes a novel deep reinforcement learning (DRL) control strategy for an integrated offshore wind and photovoltaic (PV) power system for improving power generation efficiency while simultaneously damping oscillations. A variable-speed offshore wind turbine (OWT) with electrical torque control is used in the integrated offshore power system whose dynamic models are detailed. By considering the control system as a partially observable Markov decision process, an actor-critic architecture model-free DRL algorithm, namely, deep deterministic policy gradient, is adopted and implemented to explore and learn the optimal multi-objective control policy. The potential and effectiveness of the integrated power system are evaluated. The results imply that an OWT can respond quickly to sudden changes of the inflow wind conditions to maximize total power generation. Significant oscillations in the overall power output can also be well suppressed by regulating the generator torque, which further indicates that complementary operation of offshore wind and PV power can be achieved.
1 Introduction
Decarbonization of electricity production is critical for addressing the issues of climate change and global warming. Improvements in the efficiency and stability of renewable power generation such as wind and photovoltaic (PV) energy enable a more rapid and lower-cost transition to a decarbonized energy system. The joint operation of wind and PV power systems is especially effective in increasing renewable power production [1]. Based on their complementary nature, the integration and coordinated operation of wind and PV power ensure a reliable and adequate power supply, while adhering to the grid requirements by enhancing the peak power shaving capacity. The integration of an offshore wind turbine (OWT) with offshore PV can potentially enhance energy output and reduce overall project cost because of the shared seawater space, power infrastructure and mooring system. In addition, offshore PV can be complementary to OWTs by increasing the planned cable capacity factor and improving the turbine design life span [2]. Therefore, the integration of offshore wind and PV systems will be beneficial in both economic and technical terms.
Recently, a variety of methods have been investigated, primarily directed at the optimization and coordinated operation of complementary wind-PV systems to ensure power generation quality and efficiency. A multi-objective model is established to maximize power generation and minimize output fluctuations of a hydro-wind-photovoltaic power system in [3], whereas in [4], the feasibility of adding an offshore floating solar farm to an existing Dutch offshore wind farm under the constraint of a certain fixed cable capacity is evaluated. The optimal size and operation of complementary hydro-wind-PV power systems are explored to provide reliable and adequate power for the grid in [5], while [6] presents a model for estimating emissions of electricity from systems that couple photovoltaic and wind generation with lithium-ion and vanadium redox flow batteries. In [7], the potential of combining offshore wind and solar power is explored based on the technical specifications of commercial wind turbines and PV panels, while in [8], a two-stage evaluation mode-based fuzzy multi-criteria decision-making method is proposed to select the optimal site of wind-PV power plants given the different attitudes of decision makers. Reference [9] suggests a multi-objective particle swarm optimization method to enhance the operation of a wind farm and a PV array with a battery energy storage system, while [10] presents the analyses of large-scale integration of wind and PV power into a Danish reference energy system by considering certain ancillary services to identify optimal mixtures from a technical point of view. Reference [11] uses a cost optimization model to show that renewables could provide a source of power competitive with fossil-based alternatives in India.
Reference [12] uses a mathematical tool, Copula, to unscramble the dependencies between the power of wind and PV plants and introduces a probability method to analyze how power and energy are compensated at a certain confidence level. In [13], fault detection, classification, and location for a PV-wind-based microgrid are presented, whereas [14] proposes a damping method for low frequency oscillations by incorporating a supplementary damping controller with a PV generating station whose parameters are coordinated with a power system stabilizer. Reference [15] proposes a two-degree-of-freedom combined proportional-integral-derivative (PID) control scheme for the frequency and power control of a wind-integrated interconnected power system.
However, most recent works mainly focus on the economic/technical planning and feasibility analysis of onshore wind-PV systems at the planning level, while the real-time operation and optimization of the offshore integrated wind-PV system to improve energy efficiency and to smooth power oscillations have not been well investigated. There has been limited attention to the multi-objective real-time operations of the integrated offshore wind and PV power system by considering the intermittent and stochastic nature of wind and solar PV facilities, and hence the real-time complementary potential has not been fully explored. In addition, the conventional proportional-integral-derivative control methods, which have been widely employed in the current literature, may not fulfill the high-quality joint control objectives of the optimal wind power capture and power oscillation smoothing for an integrated OWT and PV power system.
This paper proposes a deep reinforcement learning (DRL) approach and deep deterministic policy gradient (DDPG) algorithm for the joint operations of the integrated offshore wind and PV power system. Inspired by behavioral psychology, DRL has exhibited high potential in dealing with sequential complex decision-making problems such that the cumulative reward can be maximized when interacting with an uncertain environment. DRL is also adaptive and model-free, and does not need prior knowledge of the environment, as it can learn the generalized optimal control strategy from historical data. By using DDPG for real-time control, the OWT rotor speed can be varied according to inflow wind conditions while the generator rotation speed can be synchronized with the grid frequency, and the complementary operation and oscillation suppression of the OWT and the photovoltaic power can be well achieved. The potential and effectiveness of the integrated OWT and PV power system are evaluated based on design experiments.
The main novelties and contributions of this work are as follows:

(a) A control framework for the real-time operation and optimization of an offshore integrated wind-PV system to jointly improve energy efficiency and smooth power oscillations.

(b) The DDPG methodology design for achieving a trade-off between the power generation and power oscillation damping performance of the integrated OWT-PV power system.

(c) Verifications of the DDPG approach in simultaneously improving the power capture efficiency and power smoothing capability of the integrated power system using only the generator torque control.
2 Integrated offshore wind and photovoltaic power
Figure 1 shows a sketch of the integrated offshore wind and PV power system which consists of an offshore variable-speed wind turbine and PV panels that are arranged in an array. All the PV panels are connected to each other and are fixed to the floating wind turbine platform (not shown). By regulating the generator torque, the turbine rotor speed can be continuously varied according to inflow wind conditions, while the generator rotation speed can be kept constant. The PV power system also includes a boost converter connected with the DC link and a DC-AC inverter tied to the grid through a transformer. The boost converter acts as a step-up converter and performs maximum power point tracking (MPPT) operation of the PV panels. In order to develop the DRL control strategy to promote the complementary operation of the offshore wind and PV resources and to reduce energy instability, it is essential to consider the system dynamic response and construct a control-oriented model.
2.1 Dynamic model of the offshore wind turbine
As shown in Fig. 1, the dual-stage mechanical transmission and the electrical generator are important elements for the integrated power system. By varying the generator torque, the rotational speed of the turbine rotor varies, and hence the combined power output of the integrated system can be continuously varied accordingly. When the rotational speed of the wind turbine is changed according to the inflow wind condition to maximize the wind power capture, the rotational speed of the generator is kept constant for connection with the grid. When there is insufficient PV power, the turbine rotor speed and generator power can be directly regulated such that the overall power generation of the integrated power system will be kept stable.
In this paper, the NREL offshore 5 MW baseline wind turbine is considered for the integrated power system. The wind turbine has a rotor radius of about 63 m, the rated rotor speed is 12.1 rpm, with a rated generator speed of 1173.7 rpm, and gearbox ratio of 97:1. The control of the wind turbine and the integrated power system is focused on region 2 in which the generator torque is controlled to be proportional to the square of the filtered generator speed to maintain a constant (optimal) tip-speed ratio for optimizing power capture and smoothing the overall power output. In this region, the peak power coefficient of 0.482 occurs at a tip-speed ratio of around 8 and a rotor-collective blade-pitch angle of 0.0°.
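The region 2 torque law described above can be illustrated with a short sketch. It assumes steady-state operation at the optimal tip-speed ratio, so that the generator torque equals \(k\omega_{g}^{2}\) with \(k = \rho \pi R^{5} C_{p,\max } /(2\lambda_{opt}^{3} i_{g}^{3})\); the air density of 1.225 kg/m^{3} and the function names are assumptions made for illustration, while the radius, gearbox ratio, peak power coefficient and tip-speed ratio are the values quoted in this section.

```python
import math

RPM_TO_RAD_S = math.pi / 30.0


def region2_gain(rho, radius, cp_max, tsr_opt, gear_ratio):
    """Gain k of T_g = k * w_g**2 in N*m/(rad/s)^2, referred to the high-speed shaft.

    Follows from P = 0.5*rho*pi*R^2*v^3*Cp with v = w_r*R/tsr and w_r = w_g/gear_ratio.
    """
    return 0.5 * rho * math.pi * radius**5 * cp_max / (tsr_opt**3 * gear_ratio**3)


def region2_torque(gain, gen_speed):
    """Generator torque demand (N*m) for a generator speed in rad/s."""
    return gain * gen_speed**2


# Values from the text, except rho, which is an assumed sea-level air density.
k = region2_gain(rho=1.225, radius=63.0, cp_max=0.482, tsr_opt=8.0, gear_ratio=97.0)
torque_at_rated_speed = region2_torque(k, 1173.7 * RPM_TO_RAD_S)
```

Because the gain scales with the inverse cube of the assumed optimal tip-speed ratio, small changes in that value shift the torque demand noticeably.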
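By neglecting the damping of the mechanical transmission, the dynamic model of the mechanical transmission can be generally formulated as:
$$\left\{ {\begin{array}{*{20}l} {J_{{\text{r}}} \dot{\omega }_{{\text{r}}} = T_{{\text{a}}} - T_{{\text{L}}} } \hfill \\ {J_{{\text{g}}} \dot{\omega }_{{\text{g}}} = T_{{\text{h}}} - T_{{\text{g}}} } \hfill \\ \end{array} } \right.$$(1)
where T_{a} is the low-speed shaft aerodynamic torque, T_{g} is the high-speed shaft generator torque, T_{L} and T_{h} are the respective torques at the gearbox ends of the low-speed and high-speed shafts. ω_{r} and ω_{g} are the rotational speeds of the turbine rotor and generator shaft, respectively.
By assuming rigid mechanical transmission and a fixed gear transmission ratio, the model of the transmission can be described as:
$$\omega_{{\text{g}}} = i_{{\text{g}}} \omega_{{\text{r}}} ,\quad T_{{\text{L}}} = i_{{\text{g}}} T_{{\text{h}}}$$(2)
where i_{g} denotes the transmission ratio of the wind turbine system.
By combining (1) and (2), the dynamic model of the turbine system cast to the low-speed turbine rotor side can be described as:
$$\left( {J_{{\text{r}}} + i_{{\text{g}}}^{2} J_{{\text{g}}} } \right)\dot{\omega }_{{\text{r}}} = T_{{\text{a}}} - i_{{\text{g}}} T_{{\text{g}}}$$(3)
where J_{r} and J_{g} are the turbine rotor inertia relative to the turbine shaft and generator inertia relative to the high-speed shaft, respectively.
The aerodynamic torque T_{a} can be described as:
$$T_{{\text{a}}} = \frac{{\rho \pi R^{3} v(t)^{2} C_{{\text{p}}} (\lambda ,\beta )}}{2\lambda }$$(4)
where v(t) is the inflow wind speed, R is the turbine radius, ρ is the sea air density, β is the turbine pitch angle which is set to zero in the control region 2 of this work, C_{p} denotes the power coefficient which indicates the power extraction performance of the turbine, and λ is the tip speed ratio, defined as λ = ω_{r}R/v(t).
The power coefficient can be defined as [16]:
$$\left\{ {\begin{array}{*{20}l} {C_{{\text{p}}} (\lambda ,\beta ) = c_{1} \left( {\frac{{c_{2} }}{{\lambda_{i} }} - c_{3} \beta - c_{4} } \right)e^{{ - c_{5} /\lambda_{i} }} + c_{6} \lambda } \hfill \\ {\frac{1}{{\lambda_{i} }} = \frac{1}{\lambda + 0.08\beta } - \frac{0.035}{{\beta^{3} + 1}}} \hfill \\ \end{array} } \right.$$(5)
where the constant coefficients are c_{1} = 0.5176, c_{2} = 116, c_{3} = 0.4, c_{4} = 5, c_{5} = 21, c_{6} = 0.0068.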
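The power coefficient and the aerodynamic torque can be evaluated numerically with a short sketch. It assumes the widely used six-coefficient form of C_{p} with the auxiliary variable \(1/\lambda_{i} = 1/(\lambda + 0.08\beta ) - 0.035/(\beta^{3} + 1)\), an air density of 1.225 kg/m^{3}, and a simple grid search for the optimum; the function names are illustrative and are not taken from the paper.

```python
import math

# Coefficients c1..c6 as listed in the text.
C1, C2, C3, C4, C5, C6 = 0.5176, 116.0, 0.4, 5.0, 21.0, 0.0068


def power_coefficient(tsr, pitch_deg=0.0):
    """Cp(lambda, beta) in the assumed six-coefficient form."""
    inv_lambda_i = 1.0 / (tsr + 0.08 * pitch_deg) - 0.035 / (pitch_deg**3 + 1.0)
    return (C1 * (C2 * inv_lambda_i - C3 * pitch_deg - C4)
            * math.exp(-C5 * inv_lambda_i) + C6 * tsr)


def aerodynamic_torque(wind_speed, rotor_speed, radius=63.0, rho=1.225, pitch_deg=0.0):
    """Low-speed shaft torque (N*m): captured power divided by rotor speed (rad/s)."""
    tsr = rotor_speed * radius / wind_speed
    power = 0.5 * rho * math.pi * radius**2 * wind_speed**3 * power_coefficient(tsr, pitch_deg)
    return power / rotor_speed


def optimal_tsr(lo=2.0, hi=14.0, steps=12000):
    """Grid search for the tip speed ratio that maximizes Cp at zero pitch."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return max(grid, key=power_coefficient)
```

Calling `optimal_tsr()` and then `power_coefficient()` at the returned value gives a quick check against the peak read from the power coefficient curve.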
As shown in Fig. 2, the power coefficient has a clearly nonlinear relationship with respect to the tip speed ratio, and there exists an optimal tip speed ratio at which the power coefficient reaches its maximum value. The maximum power coefficient is around 0.48 and the optimal tip speed ratio is around 8.11. Thus, the maximum power capture can be achieved if appropriate generator torque control is applied. In addition, the generator torque is regulated to compensate for variations of the PV power such that the overall power generation of the integrated system can be relatively smooth.
By observing (1)–(5), it is clear that the turbine rotor speed, and hence the tip speed ratio, can be continuously varied by regulating the generator torque according to the turbine rotation speed and inflow wind condition, and hence the maximum power tracking control can be readily achieved while the power compensation of the PV is accomplished simultaneously.
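To make the role of the generator torque concrete, the rotor-side dynamics can be stepped forward in time with a simple explicit Euler scheme. This is a minimal sketch that assumes the rigid-drivetrain form \((J_{r} + i_{g}^{2} J_{g} )\dot{\omega }_{r} = T_{a} - i_{g} T_{g}\); the inertia values are placeholders chosen only for illustration and are not taken from the paper.

```python
def rotor_acceleration(t_aero, t_gen, j_rotor, j_gen, gear_ratio):
    """Angular acceleration of the rotor (rad/s^2) for a rigid drivetrain."""
    j_equivalent = j_rotor + gear_ratio**2 * j_gen  # inertia seen at the low-speed shaft
    return (t_aero - gear_ratio * t_gen) / j_equivalent


def euler_step(rotor_speed, t_aero, t_gen, dt,
               j_rotor=3.5e7, j_gen=534.0, gear_ratio=97.0):
    """Advance the rotor speed by one time step dt (s).

    j_rotor and j_gen (kg*m^2) are assumed placeholder values.
    """
    return rotor_speed + dt * rotor_acceleration(t_aero, t_gen, j_rotor, j_gen, gear_ratio)
```

Raising the generator torque above T_{a}/i_{g} decelerates the rotor and lowering it accelerates the rotor, which is the mechanism the controller uses to move the tip speed ratio.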
2.2 Model of the photovoltaic power
A PV array is composed of multiple series and parallel connected PV modules. Each PV module consists of a number of solar cells connected in series and parallel to obtain the desired output voltage and current. Each solar cell is basically a p-n diode that can convert the incident energy from sunlight directly into electrical energy.
The power generation of the PV cells is dependent on weather conditions including irradiation and temperature. For convenience, the single-diode model is used to describe the PV cell [17], so the relationship between the PV current and voltage can be modelled as:
where
I_{ph}: the light-generated cell photocurrent at the nominal condition of 25 °C, 1000 W/m^{2}.
R_{sh}: the intrinsic shunt resistance or the equivalent parallel resistance, 500 Ω.
R_{s}: the equivalent series resistance, 0.1 Ω.
K_{i}: the short-circuit current/temperature coefficient, 0.0017 A/K.
I_{sc}: the short-circuit current, A.
T_{k}: the actual temperature, 298 K.
T_{ref}: the reference temperature, 298 K.
λ_{pv}: the irradiation (W/m^{2}); the nominal irradiation is 1200 W/m^{2}.
q: the electron charge, 1.6 × 10^{−19} C.
k: the Boltzmann constant, 1.3805 × 10^{−23} J/K.
V_{oc}: the open-circuit voltage, V.
N_{s}: the number of cells connected in series in the given photovoltaic module.
N_{p}: the number of cells connected in parallel in the given photovoltaic module.
n: the ideality factor, 1.6.
T: the module operating temperature, 323 K.
E_{g0}: the bandgap energy of the semiconductor, 1.1 eV (1 eV = 1.6 × 10^{–19} J).
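As an illustration of how a single-diode model yields a current-voltage curve, the sketch below solves the generic textbook relation \(I = I_{ph} - I_{0} (e^{{(V + IR_{s} )/(nN_{s} kT/q)}} - 1) - (V + IR_{s} )/R_{sh}\) by bisection. This generic form is an assumption and may differ from the array-level expression used in this work; the photocurrent, saturation current I_{0} and cell count are hypothetical values, while R_{s}, R_{sh}, n, q and k follow the list above.

```python
import math

Q = 1.6e-19       # electron charge (C), as listed above
K_B = 1.3805e-23  # Boltzmann constant (J/K), as listed above


def module_current(voltage, i_ph=8.0, i_0=1e-7, n_s=36, ideality=1.6,
                   r_s=0.1, r_sh=500.0, temp_k=298.0, iterations=80):
    """Solve the implicit single-diode equation for the current (A) by bisection.

    i_ph, i_0 and n_s are assumed illustrative values; intended for 0 <= voltage <= 28 V.
    """
    a = ideality * n_s * K_B * temp_k / Q  # modified thermal voltage of the module

    def residual(current):
        v_d = voltage + current * r_s
        return i_ph - i_0 * (math.exp(v_d / a) - 1.0) - v_d / r_sh - current

    lo, hi = -50.0, i_ph  # the residual is strictly decreasing in the current
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def max_power_point(v_max=27.0, step=0.1):
    """Scan the voltage range and return (voltage, power) at the maximum power point."""
    voltages = [i * step for i in range(int(round(v_max / step)) + 1)]
    best_v = max(voltages, key=lambda v: v * module_current(v))
    return best_v, best_v * module_current(best_v)
```

A voltage scan of this kind is the brute-force counterpart of the MPPT function performed by the boost converter.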
3 The DDPG methodology
By considering the control system as an extension to a partially observable Markov decision process (MDP), an actor-critic architecture DRL algorithm, i.e., deep deterministic policy gradient (DDPG), is adopted herein to explore and learn the optimal multi-objective control policy. Unlike the existing model-based control algorithms that are heavily reliant on accurate system modeling, the DDPG algorithm is a model-free approach that is well-suited to sequential uncertain optimal control problems.
3.1 The problem formulation
For the integrated power system, the multi-objective value function to be minimized at time t can be formulated as a trade-off between the power generation and power oscillation damping performance, as:
where ER(t) denotes the operational value function, \(\alpha_{1}\) and \(\alpha_{2}\) denote the penalty coefficients. \(R_{{\text{p}}} (t)\) denotes the normalized variation rate of the power output of the integrated power system.
The normalized variation rate of the power output of the integrated power system \(R_{{\text{p}}} (t)\) can be defined as:
where \(\Delta t\) denotes the time interval for evaluating the total power oscillations, \(\Delta P_{{\text{g,pv}}}\) denotes the integrated power variation during a time interval, and \(R_{{{\text{p}},\max }}\) denotes the maximum variation rate of the power output of the integrated power system.
As a stochastic control process, an MDP provides the framework for modelling sequential decision-making problems. An MDP can be represented by a tuple consisting of a finite set of states, control actions and the state transition probability, and hence can model the interactions between the integrated power system and the DDPG agent.
The main components for MDP can be described as follows:

(1) Agent: The DDPG agent aims to generate control actions by gaining experience through repeated interactions with the environment. A well-trained DDPG agent can generate the optimal or near-optimal control policy for the integrated system in real time. The control action is:
$$a(t) \triangleq T_{{\text{g}}} (t)$$(9)

(2) States: As real-time information from the environment, the states are used to indicate the status of the environment. DDPG will make control decisions based on the obtained state information through interaction with the environment. The state of the integrated system is the turbine rotation speed, as:
$$s(t) \triangleq \{ \omega_{{\text{r}}} (t)\}$$(10)

(3) Reward: As the evaluation index of the DDPG agent in MDP, the reward is designed to guide the agent to learn the optimal control policy by minimizing the control objective in (7). The reward is designed as the performance value, as:
$$r(a(t),s(t)) \triangleq {\text{ER}}(t)$$(11)

(4) Constraints: The essential constraints of the integrated system include the constraints for the control actions and states, i.e.:
$$\left\{ {\begin{array}{*{20}l} {0 < \omega_{{\text{r}}} < \omega_{{\text{r,rated}}} } \hfill \\ {T_{{{\text{g}},\min }} < T_{{\text{g}}} < T_{{{\text{g}},\max }} } \hfill \\ {0 < C_{{\text{p}}} < 1} \hfill \\ \end{array} } \right.$$(12)
where \(\omega_{{{\text{r}},{\text{rated}}}}\) denotes the rated turbine rotation speed (1.267 rad/s), \(T_{{{\text{g}},\min }}\) and \(T_{{{\text{g}},\max }}\) denote the minimum and maximum generator torques, respectively.
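One simple way to respect such constraints in an implementation is to saturate the torque command before it reaches the plant and to flag states that leave the admissible region. The sketch below is illustrative only: the helper names are hypothetical and the torque limits are left as arguments because their values are not stated here.

```python
OMEGA_R_RATED = 1.267  # rated turbine rotation speed (rad/s), from the text


def clip_torque(t_gen, t_min, t_max):
    """Saturate the generator torque command to [t_min, t_max]."""
    return min(max(t_gen, t_min), t_max)


def state_is_feasible(rotor_speed, cp):
    """Check the rotor speed and power coefficient against their open bounds."""
    return 0.0 < rotor_speed < OMEGA_R_RATED and 0.0 < cp < 1.0
```

An infeasible state could, for example, terminate the episode or add a penalty to the reward; which option is used here is not specified in the text.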
As a discounted sum of the reward function, the accumulated reward R(t) of the exploration process beginning with the state s(t) can be formulated as:
where \(\gamma \in \left[ {0,1} \right]\) denotes a discount factor, and T denotes the time period of the exploration process.
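A common way to write such a discounted return, assumed here, is \(R(t) = \sum\nolimits_{k = t}^{T} {\gamma^{k - t} r(k)}\). It can be computed from a recorded reward sequence with a single backward pass, as in this minimal sketch (the function name is illustrative).

```python
def discounted_return(rewards, gamma):
    """Return sum_k gamma**k * rewards[k], accumulated from the last reward backwards."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total
```

With a discount factor close to 1 the agent weighs rewards late in the episode almost as much as immediate ones, while a small value makes the policy short-sighted.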
For the policy \(\pi\) generated by the agent, the value function \(Q^{\pi } \left( {s,a} \right)\) for performing the exploration process can be described as:
The DDPG agent approximates the long-term reward and hence the optimal control policy \(\pi^{*}\) can be designed as:
At each discrete time step in MDP, the agent decides a possible action as input for the environment by observing the current state. This results in the next state and a reward from the environment. Consequently, through continuous interactions with the environment and mapping the local states to the control actions at finite time steps, the DDPG agent can derive the optimal control policy \(\pi^{*}\) such that the maximum accumulative reward over time can be achieved sequentially.
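For reference, the action-value function and the optimal policy are commonly written in the following standard forms, which are given here as an assumption about the notation rather than as the exact expressions of this work:

$$Q^{\pi } \left( {s,a} \right) = {\mathbb{E}}_{\pi } \left[ {R(t)\left| {s(t) = s,\;a(t) = a} \right.} \right]$$

$$\pi^{*} = \arg \mathop {\max }\limits_{\pi } Q^{\pi } \left( {s,a} \right)$$

Here the expectation is taken over the trajectories generated by following the policy \(\pi\) after the first action a.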
3.2 The implementation procedure
The DDPG agent approximates the long-term reward given actions and observations by using two function representations with different purposes: the critic function provides the judgement of the actor, and the actor function learns and updates the control policy by using the value function estimate of the critic [18].
As illustrated in Fig. 3, the overall control architecture comprises three parts, i.e., the Actor networks (online actor and target actor networks), the Critic networks (online critic and target critic networks), and the experience replay buffer [19]. The Actor networks are used to map the states to the control action, the Critic networks are employed for estimating the value of state and state-action, while the replay buffer takes charge of storing experiences. The two copy networks (target actor and target critic networks) are employed to improve the stability of the DDPG algorithm by calculating the target values. In addition, the experience replay buffer is adopted to store a large number of transitions and can randomly sample a mini-batch of data from the memory to help break the correlation among training data when updating. The actor network, which is parameterized by \(\theta^{\mu }\), has the output \(a(t) = \mu (s(t);\theta^{\mu } )\), while the output of the target actor network (parameterized by \(\theta^{\mu \prime }\)) is \(a(t)^{\prime } = \mu^{\prime } \left( {s(t + 1);\theta^{{\mu^{\prime } }} } \right)\). The output of the critic network (parameterized by \(\theta^{q}\)) is \(q = Q\left( {s(t),a(t);\theta^{q} } \right)\), and the output of the target critic network (parameterized by \(\theta^{q\prime }\)) is \(q^{\prime } = Q^{\prime } \left( {s(t + 1),a(t)^{\prime } ;\theta^{q\prime } } \right)\).
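The experience replay buffer described above can be sketched with the Python standard library. This is a minimal illustration, assuming transitions are stored as tuples and sampled uniformly; the class and method names are hypothetical.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity store of transitions; the oldest entries are discarded first."""

    def __init__(self, capacity):
        self._storage = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        """Store one transition observed from the environment."""
        self._storage.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        """Draw a uniformly random mini-batch without replacement."""
        return random.sample(list(self._storage), batch_size)

    def __len__(self):
        return len(self._storage)
```

Uniform sampling from a large buffer is what breaks the temporal correlation between consecutive transitions during training.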
During the MDP training process, the parameters of the actor networks are updated by the gradient descent method in the direction of reducing the loss. Hence:
where m denotes the batch size.
The parameters \(\theta^{q}\) of the critic network are updated by minimizing the following loss function:
Additionally, a soft update strategy is employed to update the parameters of the target actor and critic networks, as:
where \(\theta^{\mu \prime } ,\theta^{q\prime }\) denote the target actor and critic network parameters, respectively, and \(\tau \in (0,1]\) denotes the coefficient of the soft update.
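The critic and target updates can be illustrated with plain Python lists standing in for network outputs and parameters. The sketch assumes the standard DDPG forms: a temporal-difference target \(y = r + \gamma q^{\prime }\), a mean squared error critic loss over a batch of size m, and the soft update \(\theta^{\prime } \leftarrow \tau \theta + (1 - \tau )\theta^{\prime }\). The terminal-state flag and the function names are additions made for illustration.

```python
def td_targets(rewards, next_q_values, dones, gamma):
    """Targets y = r + gamma * q' for each sample; bootstrapping is cut at terminal steps."""
    return [r + gamma * q_next * (0.0 if done else 1.0)
            for r, q_next, done in zip(rewards, next_q_values, dones)]


def critic_loss(targets, q_values):
    """Mean squared error between targets and critic outputs over the batch."""
    m = len(targets)
    return sum((y - q) ** 2 for y, q in zip(targets, q_values)) / m


def soft_update(target_params, online_params, tau):
    """Blend online parameters into the target parameters with coefficient tau."""
    return [tau * w + (1.0 - tau) * w_target
            for w, w_target in zip(online_params, target_params)]
```

A small τ makes the target networks change slowly, which keeps the regression targets of the critic nearly stationary between updates.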
The clipped Gaussian noise (or the target policy noise) is also added to the actions to improve the randomness of the control actions, as:
where \(\mathcal{N}\) represents the Gaussian process, which makes the DDPG agent more effectively explore the continuous action domain.
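A clipped Gaussian perturbation of the actor output can be sketched as follows. The standard deviation, the clipping range and the action limits are hypothetical arguments, since their values are not stated in the text.

```python
import random


def noisy_action(mean_action, sigma, noise_clip, a_min, a_max, rng=random):
    """Add zero-mean Gaussian noise, clipped to +/- noise_clip, then saturate the action."""
    noise = rng.gauss(0.0, sigma)
    noise = max(-noise_clip, min(noise_clip, noise))
    return max(a_min, min(a_max, mean_action + noise))
```

Passing a seeded `random.Random` instance as `rng` makes the exploration sequence reproducible between training runs.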
Afterwards, the algorithm starts running with episodic iteration and contains three stages: exploration, learning and convergence. In order that the critic can provide a more accurate judgement and the actor can learn a better control strategy, the actor and critic networks are trained against each other and the main algorithm is designed as follows:
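The episodic structure of such a training procedure can be outlined with a generic loop. This is a skeleton under assumed interfaces, not the listing of this work: `env` is expected to offer `reset()` and `step(action)` returning `(next_state, reward, done)`, and `agent` is expected to offer `act`, `remember` and `learn`, all of which are hypothetical names.

```python
def train(env, agent, episodes, steps_per_episode):
    """Run episodic training and return the accumulated score of every episode."""
    scores = []
    for _ in range(episodes):
        state = env.reset()
        score = 0.0
        for _ in range(steps_per_episode):
            action = agent.act(state)                      # actor output plus exploration noise
            next_state, reward, done = env.step(action)    # plant response
            agent.remember(state, action, reward, next_state, done)
            agent.learn()                                  # critic, actor and target updates
            state = next_state
            score += reward
            if done:
                break
        scores.append(score)
    return scores
```

The per-episode scores returned by the loop are the quantities usually plotted to judge convergence.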
4 Results and discussions
The potential and effectiveness of the DDPG control for the integrated OWT and offshore PV power system are evaluated based on design experiments. The performance of the optimal power generation and power oscillation suppression is also verified. The validations are conducted based on comparison with the results obtained by using a conventional proportional-integral (PI) controller widely used in the wind energy industry.
4.1 The design experiments
The employed OWT is a three-bladed 5 MW offshore upwind horizontal-axis variable-speed wind turbine with a monopile type platform. The power coefficient is a function of the tip-speed ratio and blade-pitch angle [20], while the peak power coefficient is 0.48 at the tip speed ratio of 8.116. The minimum generator speed is 670 rpm which corresponds to the minimum turbine rotation speed of 6.9 rpm. The PV cells are designed and grouped in larger units called PV modules, which are further interconnected in a series–parallel configuration to form a PV array.
The main parameters for the case study are provided in Table 1.
The OWT is initially operated under below-rated wind conditions (7–10 m/s) and the PV panels operate in the maximum power point tracking mode. The DDPG algorithm and MDP are implemented by using TensorFlow in Python. The MDP model is designed with a discrete time step of 0.1 min and the simulation time period is 6 min. All the DDPG parameters have been carefully tuned to obtain satisfactory performance of the integrated power system. Both the actor and critic networks have five fully-connected layers with 50 neurons each and rectified linear unit activations. These layers are then fed into the fully-connected output layer of the actor and critic networks to approximate the action and Q value, respectively. The dropout layers are also employed to avoid the vanishing gradient problem. The DDPG algorithm is trained for 300 episodes to learn the optimal generator torque control strategy.
As shown in Fig. 4, the turbulent inflow wind speed for the OWT is generated within the range of (7, 10) m/s, and is employed for the optimal peak power capture and power smoothing of the integrated power system.
As shown in Figs. 5 and 6, 100 PV modules are connected together to form a PV array for validating the power generation control of the integrated OWT and PV power. For each PV module, the PV current varies within 30–320 A and the PV voltage varies within 1–48 V, while the PV power output varies around 0.5 MW. The typical PV array characteristics are employed as the exogenous power input for testing the DDPG method since the PV power acts as external perturbation for the control design.
4.2 The power generation performance
As shown in Fig. 7, the turbine rotor speed can be well regulated such that the maximum active power from the wind can be extracted when the DDPG control is employed, while the rotation speed cannot be well regulated to track the optimal value when the conventional control is employed. The result suggests a promising potential in the use of the DDPG in achieving smooth wind power generation and regulation, which is particularly important for electric power systems with a high penetration of intermittent renewables.
As shown in Fig. 8, the integrated power system with the DDPG control can quickly respond to sudden changes of the inflow wind condition. Because of the sufficient capability, the power coefficient of the OWT can be maintained around the optimal value using the conventional control while the power coefficient will vary such that the overall power variations can be compensated for when the DDPG control is used. The result indicates that it is possible to adaptively regulate the power coefficient by using the DDPG control as compared with the conventional control when considering more control objectives.
As shown in Fig. 9, the total power generated from the OWT and PV array has significant oscillations because of the intermittent wind inflow and the PV power variations when using the conventional control method. The rapid variations in the generated power will in turn cause mechanical vibrations on the OWT blades and tower. By using the DDPG control for the power oscillation damping, it is clear that variations of the integrated power generation can be reduced, making the OWT complement the PV power outputs and smoothing the total power fluctuations.
4.3 The training performance of DDPG
As shown in Fig. 10, the episode score converges quickly to around 0.45 at about 50 episodes for the DDPG training despite the initial oscillations of the episode score values. Therefore, the DDPG method has good convergence performance and it is possible to design and apply a well-trained DDPG agent in the control of the integrated power system.
As illustrated in Fig. 11, the rolling score for DDPG also converges eventually to around 0.45, which is the same as the episode score. In practice, the rolling score is a filtered version of the episode score and it is clear that the DDPG control method will converge to the optimal rolling score value. The results clearly reveal that it is promising to apply the DDPG control in the integrated power system such that the optimal rolling score can be achieved.
As observed from Fig. 12, the normalized variation rate of the power output of the integrated power system per episode decreases and reaches a steady state value of around 0 quickly, within about 50 episodes, indicating that the DDPG control of the integrated power system is closed-loop stable and can well achieve the joint operation of power efficiency maximization and power oscillation damping.
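A rolling score of this kind can be obtained by averaging the most recent episode scores. The sketch below uses a simple moving average; the window length of 10 episodes is an assumption, since the filter actually used is not specified.

```python
from collections import deque


def rolling_scores(episode_scores, window=10):
    """Moving average of the episode scores over the last `window` episodes."""
    recent = deque(maxlen=window)
    averaged = []
    for score in episode_scores:
        recent.append(score)
        averaged.append(sum(recent) / len(recent))
    return averaged
```

A longer window gives a smoother curve at the cost of a slower response to genuine changes in the episode score.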
As illustrated in the above validation results, it is clear that the DDPG method has relatively good performance and the following advantages:

(a) The proposed DDPG method has a faster response than the conventional control method. By using the well-trained DDPG model, it is possible to generate the control action within a very short time interval and hence the trained agent in DDPG will be actively involved in the control of the integrated power system.

(b) The DDPG method can provide a feasible solution to the constrained finite time optimal multi-objective control problem. Hence, by using the DDPG method, the control objectives of improving energy efficiency and smoothing power oscillation of integrated offshore wind and photovoltaic power can be both taken into consideration.

(c) The proposed RL method is incorporated with a feasible or allowable range of the control actions such that the safety of the integrated power system can be checked in a timely manner in case the agent generates wrong actions and causes failed future power generation. Therefore, the proposed DDPG method is safer than the conventional control algorithm.

(d) The proposed DDPG method is model-free and does not require prior domain knowledge or a predefined rule to decide how to choose an action.
Considering the above advantages of the proposed method, it is promising to use the DDPG control method in real-world applications. In order to implement the DDPG method, the hyperparameters of the DDPG model can be adaptively tuned to match the real-world control plant. Also, the valuable knowledge from the control plant can be collected before the implementation such that the decision-making problems can be quickly addressed. In addition, a laboratory-level testbed using Raspberry Pi boards [21] could first be constructed by specifying key technical details and the implementation procedure, over which the DDPG algorithm can be tested and validated such that appropriate control actions can be generated before practical application.
5 Conclusion
The paper has presented the design, dynamics and DDPG optimal control for an integrated OWT and PV power system, considering the damping of power oscillations in particular. A variable-speed OWT with electrical generator torque control has been used in the integrated offshore system for active power regulation and oscillation damping. The results from design experiments indicate that the OWT responds quickly to maximize the total power generation when using the DDPG control, while the power oscillations can also be better damped by regulating the generator torque when using the DDPG control. The results indicate the potential for synergies between offshore wind and PV power utilization, and it is clear that the complementary operations of the integrated system can achieve peak power shaving performances. In future work, a scaled-down prototype of the offshore WT and PV station will be built and experimental verifications of the proposed control approach will be conducted.
Availability of data and materials
Data and materials will be available upon reasonable request.
References
Parastegari, M., Hooshmand, R. A., Khodabakhshian, A., et al. (2015). Joint operation of wind farm, photovoltaic, pump-storage and energy storage devices in energy and reserve markets. International Journal of Electrical Power & Energy Systems, 64, 275–284.
Wu, Y., & Zhang, T. (2021). Risk assessment of offshore wave-wind-solar-compressed air energy storage power plant through fuzzy comprehensive evaluation model. Energy, 223, 120057.
Wang, X., Mei, Y., Kong, Y., et al. (2017). Improved multi-objective model and analysis of the coordinated operation of a hydro-wind-photovoltaic system. Energy, 134, 813–839.
Golroodbari, S. Z. M., Vaartjes, D. F., Meit, J. B. L., et al. (2021). Pooling the cable: A techno-economic feasibility study of integrating offshore floating photovoltaic solar technology within an offshore wind park. Solar Energy, 219, 65–74.
Tang, Y., Fang, G., Tan, Q., et al. (2020). Optimizing the sizes of wind and photovoltaic power plants integrated into a hydropower station based on power output complementarity. Energy Conversion and Management, 206, 112465.
Miller, I., Gençer, E., & O’Sullivan, F. M. (2018). A general model for estimating emissions from integrated power generation and energy storage. Case study: Integration of solar photovoltaic power and wind power with batteries. Processes, 6(12), 267.
López, M., Rodríguez, N., & Iglesias, G. (2020). Combined floating offshore wind and solar PV. Journal of Marine Science and Engineering, 8(8), 576.
Wu, Y., Zhang, T., Xu, C., et al. (2019). Optimal location selection for offshore wind-PV-seawater pumped storage power plant using a hybrid MCDM approach: A two-stage framework. Energy Conversion and Management, 199, 112066.
Elgammal, A., & Jagessar, M. (2020). Optimal control strategy for a marine current farm integrated with a hybrid PV system/offshore wind/battery energy storage system. European Journal of Electrical Engineering and Computer Science, 4(4).
Lund, H. (2006). Large-scale integration of optimal combinations of PV, wind and wave power into the electricity supply. Renewable Energy, 31(4), 503–515.
Lu, T., Sherman, P., Chen, X., et al. (2020). India’s potential for integrating solar and on- and offshore wind power into its energy system. Nature Communications, 11(1), 1–10.
Feng, L., Zhang, J., Li, G., et al. (2016). Cost reduction of a hybrid energy storage system considering correlation between wind and PV power. Protection and Control of Modern Power Systems, 1(1), 1–9.
Anjaiah, K., Dash, P. K., & Sahani, M. (2022). A new protection scheme for PV-wind based DC-ring microgrid by using modified multifractal detrended fluctuation analysis. Protection and Control of Modern Power Systems, 7(1), 1–24.
El-Kareem, A., Hesham, A., Abd Elhameed, M., et al. (2021). Effective damping of local low frequency oscillations in power systems integrated with bulk PV generation. Protection and Control of Modern Power Systems, 6(1), 1–13.
Karanam, A. N., & Shaw, B. (2022). A new two-degree of freedom combined PID controller for automatic generation control of a wind integrated interconnected power system. Protection and Control of Modern Power Systems, 7(1), 1–16.
Xia, Y., Ahmed, K. H., & Williams, B. W. (2012). Wind turbine power coefficient analysis of a new maximum power point tracking technique. IEEE Transactions on Industrial Electronics, 60(3), 1122–1132.
Pandiarajan, N., Ramaprabha, R., & Muthu, R. (2012). Application of circuit model for photovoltaic energy conversion system. International Journal of Photoenergy, 2012.
Rui, X., Su, R., Wu, X., et al. (2014). The conceptual design of grid-connected wind turbine based on speed regulating differential mechanism. Journal of Mechanical Science and Technology, 28(6), 2215–2220.
Liu, T., Hu, X., Hu, W., et al. (2019). A heuristic planning reinforcement learning-based energy management for power-split plug-in hybrid electric vehicles. IEEE Transactions on Industrial Informatics, 15(12), 6436–6445.
Kühne, P., Pöschke, F., & Schulte, H. (2018). Fault estimation and fault-tolerant control of the FAST NREL 5-MW reference wind turbine using a proportional multi-integral observer. International Journal of Adaptive Control and Signal Processing, 32(4), 568–585.
Zhang, X., Lu, R., Jiang, J., et al. (2021). Testbed implementation of reinforcement learning-based demand response energy management system. Applied Energy, 297, 117131.
Acknowledgements
The work was supported by the Fundamental Research Funds for the Central Universities (No. 2042022gf0008), the Guangxi Science and Technology Base and Talent Special Project (No. 2019AC20266), the Natural Science Foundation of Guangxi (Grant No. 2019JJB160062) and the Guangdong Basic and Applied Basic Research Foundation (Grant No. 2019A1515110709).
Funding
Not applicable.
Author information
Contributions
XY: Conceptualization, Methodology, Writing—original draft, Data curation, Visualization, Investigation, Validation. Both authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Yin, X., Lei, M. Jointly improving energy efficiency and smoothing power oscillations of integrated offshore wind and photovoltaic power: a deep reinforcement learning approach. Prot Control Mod Power Syst 8, 25 (2023). https://doi.org/10.1186/s41601-023-00298-7
DOI: https://doi.org/10.1186/s41601-023-00298-7