A power fluctuation evaluation method of PV plants based on RankBoost ranking

Fluctuation evaluation is an important task in promoting the accommodation of photovoltaic (PV) power generation. This paper proposes an evaluation method to quantify the power fluctuation of PV plants. The method consists of an index system and a ranking method based on the RankBoost algorithm. Eleven indices are devised and included in the index system to fully cover diverse fluctuation features. By handling missing and invalid data effectively, the ranking method fuses multiple indices automatically and provides a systematic and comprehensive comparison of power fluctuation. Simulation results based on power data from six PV plants indicate that the evaluation list obtained by the RankBoost ranking method is more representative and comprehensive than that derived by the equal weight method.


Introduction
With the aggravation of the energy crisis and environmental pollution, PV power, as one of the most important renewable resources, has developed rapidly and attracted significant attention [1]. Nevertheless, the uncertainty and fluctuation of PV power negatively impact power quality and power system reliability, and pose huge challenges for energy dispatching and renewable energy accommodation [2]. PV plants differ in their power fluctuation and in the influence they exert on the power system, which leads operators and market participants to prefer some PV plants over others. Hence, it is necessary to quantify and evaluate fluctuation characteristics, and to compare and rank PV plants by their power fluctuation. For power system operators and electricity market participants, such a ranking list can indicate the priority with which PV power should be accommodated or traded.
Fluctuation evaluation consists of a quantified index system and a comprehensive method for fusing the individual indices. Adequate selection of evaluation indices is of great significance for reducing computational complexity and improving evaluation credibility. The indices used in PV power evaluation can be sorted into numerical indices, probability indices [3,4] and time indices. They also differ in scope and can be classified as overall or local indices. The mean [1] of PV power gives an overview of the power level, while the variance and solar variability focus on overall fluctuation features. Apart from numerical descriptions of fluctuation, skewness and kurtosis [1] describe the probability distribution of PV power. However, since the probability distribution of PV power usually departs starkly from a normal distribution, skewness and kurtosis are inappropriate for measuring differences between probability distributions. It is more appropriate to describe the probability distribution with reference to a typical PV probability distribution pattern. Focusing on local features, the fluctuation ratio and ramp ratio [5,6] have been proposed to quantify fluctuation between adjacent moments in the time series. However, these two indices carry so much detailed information on the fluctuation that they cannot provide a single, synthetic evaluation based on a large amount of historical data. To produce an overall evaluation of PV plants, the deviation ratio and sum of gradient are proposed in [2], and an overall fluctuation tendency index $K_{fluc}$ of solar radiation is proposed in [7]. However, these indices miss some extreme and unusual features. Reference [8] considers the time feature of solar irradiance by adopting the time spent at high-level/low-level irradiance to measure controlled solar irradiance fluctuation. However, in practice it is difficult to set a suitable threshold separating high-level and low-level PV power. It is more practical to distinguish between zero output and nonzero output.
No single index is sufficient to characterize the fluctuation of PV power. However, none of the studies described above has formed a complete, scientific and reasonable index system for fluctuation evaluation of PV power. In addition, the impact of PV fluctuation on load regulation is of great concern, so quantified indices reflecting load regulation ability need to be investigated.
Fluctuation analysis and quantization methods can generally be divided into the following four groups:
1. Grading and classifying the fluctuation based on a particular index, as in [5], where PV plants are assigned high or low fluctuation levels based on the ramp rate. However, although quantified indices are employed, the analysis reverts to a qualitative one after quantification.
2. Probability models, such as the Gaussian mixture or empirical probability model [8], are used to analyze the probability features of PV power. This gives an overview of all the power data, but it lacks information on time order and cannot analyze the fluctuation between two adjacent sampling points.
3. The Fourier transform and wavelet transform are introduced to quantify the fluctuation on different time scales in the wavelet spectrum [9]. Although this method differentiates the diffusive and jumpy characteristics of PV power, it is complex and not intuitive enough for operators and market participants, since the raw data are transformed.
4. A typical step waveform [10] is proposed to simulate solar irradiance. This method gives a theoretical account of fluctuation, but it is also not intuitive enough for understanding the fluctuation.
For combining multiple indices, the most widely used approach is to allocate a weight to each index and sum the weighted results to obtain an overall score. However, subjectively selecting weights for different parameters is extremely difficult in practice. References [11,12] put forward a minimum variance and weighted geometric averaging operator to determine the weight of each indicator, while references [5,12] determine the weight of each index from its deviation and mean value. However, such artificially allocated weights lack a theoretical foundation. With recent developments in machine learning, the RankBoost algorithm has been employed to produce maintenance priorities for circuit breakers in a power system by combining multiple indices with different ranges and dimensions [13,14].
The main contributions of this study on PV output evaluation are as follows:
1. It provides an in-depth study of power fluctuation features.
2. Considering the characteristics of PV output and PV operational issues, a systematic and complete index system is established to quantify power fluctuation from three different aspects.
3. An effective approach is developed to screen out invalid data and errors in PV power data collection.
4. The machine learning method RankBoost is applied to produce a synthetic evaluation by automatically fusing indices with different ranges and dimensions.
The completeness and rationality of the proposed evaluation method are discussed through a case study in Section 3.

Methods
This section presents the index system and the RankBoost ranking method used to evaluate power fluctuation, after first elucidating the characteristics of PV power.

The characteristics of PV power
PV power is more deterministic than wind power on a daily scale. Driven by the daily cycle of solar irradiance, the day curve of PV power assumes a unimodal pattern whose shape resembles the positive half-period of a cosine curve, as shown in Fig. 1. PV power usually rises around sunrise, reaches a maximum around midday and falls to zero near sunset. The earliest time within a day at which PV power starts to rise from zero is defined as the initiation time, and the time at which PV power falls to zero is the ending time. The period between the initiation time and the ending time is called the effective period of PV power. If PV output stays at zero or below throughout a day, the initiation time, ending time and effective period do not exist, and that day's output data are excluded from fluctuation evaluation.
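The definitions of initiation time, ending time and effective period can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: it assumes 15-min samples (the interval of the case study data), places sample i at i × 0.25 h after midnight, and takes the first and last strictly positive samples as $t_{initiation}$ and $t_{ending}$. The function name `effective_period` is hypothetical.

```python
from typing import Optional, Sequence, Tuple

SAMPLE_HOURS = 0.25  # assumed 15-min sampling interval


def effective_period(day_power: Sequence[float]) -> Optional[Tuple[float, float, float]]:
    """Return (t_initiation, t_ending, effective_period) in hours, or None.

    Assumption: t_initiation is the time of the first strictly positive sample
    and t_ending the time of the last strictly positive sample.
    """
    positive = [i for i, p in enumerate(day_power) if p > 0]
    if not positive:
        # Output is zero or negative all day: the day is excluded from evaluation.
        return None
    t_init = positive[0] * SAMPLE_HOURS
    t_end = positive[-1] * SAMPLE_HOURS
    return t_init, t_end, t_end - t_init
```

For a day whose output is positive from sample 24 to sample 71, this returns an initiation time of 6:00, an ending time of 17:45 and an effective period of 11.75 h.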
The fluctuation of PV power is reflected in the effective period, the power and the electricity generation. The initiation time, ending time and effective period all vary, since the power day curve shifts horizontally (transversely) from day to day. In addition, vertical movements of the power day curve should also be measured and included in the index system. Finally, variation in the integral of the power day curve represents the fluctuation of the electricity generated by the PV plant. Based on these three aspects, eleven indices are proposed to describe and assess the features of PV power fluctuation.

Index system for PV power fluctuation evaluation
The proposed index system has the following advantages:
1) The fluctuation characteristics are divided into three aspects, providing a systematic and comprehensive perspective on the fluctuation.
2) Every aspect of the fluctuation is clearly described and quantified by specific indices.
3) Every index is intuitive, concise and easy for power system operators to understand.
4) Fluctuation is measured not only against adjacent sampling points but also against load fluctuation, since indices measuring the impact of PV power on load regulation are also included.
Since fluctuation must be assessed relative to reference values, different reference values are used for the different aspects of fluctuation. The fluctuation of effective time is compared with the sunrise and sunset times, whereas both power and electricity generation are compared with their adjacent sampling points and with the load. The index system covering these three aspects of fluctuation is shown in Fig. 2 and is described in detail below.

(1) Fluctuation evaluation of effective time
Since solar radiation has a strong effect on PV power, the initiation and ending times of PV power are closely related to the times of sunrise and sunset. In addition, the sunrise and sunset times in different regions are usually recorded and accessible to researchers. Hence, the short time displacement rate (STDR) is proposed to measure the probability that the difference between the initiation time and the sunrise time, or between the ending time and the sunset time, is negligible. The time displacements are
$$TD_{starting} = t_{initiation} - t_{up}, \qquad TD_{terminal} = t_{ending} - t_{down}$$
where $t_{up}$ and $t_{down}$ represent the sunrise and sunset times, respectively, $t_{initiation}$ denotes the initiation time of the PV plant, and $t_{ending}$ is its ending time. $TD_{starting}$ and $TD_{terminal}$ are the initiation time displacement and the ending time displacement, respectively. If $TD_{starting}$ or $TD_{terminal}$ is positive, the initiation time or ending time lags the sunrise or sunset; otherwise, it precedes the sunrise or sunset. To evaluate the probability distribution of the time displacement, 15 min is chosen as the threshold for calculating STDR, where D is the number of days evaluated and $F_D[con]$ is a function returning the number of days on which the condition con is met. This index helps to evaluate the uncertainty and extreme fluctuation of the effective period, and the rationality of ignoring the gaps between the initiation time and sunrise or between the ending time and sunset. The more likely the time displacement is to be less than 15 min, the more accurately the initiation and ending times can be predicted or described by the sunrise and sunset times.
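A minimal sketch of the time displacements and STDR, assuming all times are in hours and that a day counts toward $F_D[con]$ only when both $|TD_{starting}|$ and $|TD_{terminal}|$ are below 15 min. The original STDR formula is not reproduced in this text, so combining the two conditions this way is an assumption; the function names are hypothetical.

```python
from typing import List, Tuple

THRESHOLD_H = 0.25  # 15 min, expressed in hours


def time_displacements(t_init: float, t_end: float,
                       t_up: float, t_down: float) -> Tuple[float, float]:
    """TD_starting and TD_terminal; positive values mean the PV event lags the sun event."""
    return t_init - t_up, t_end - t_down


def stdr(days: List[Tuple[float, float, float, float]]) -> float:
    """Share of days whose initiation and ending times are both within 15 min of sunrise/sunset.

    Each entry is (t_initiation, t_ending, t_up, t_down). Requiring BOTH
    displacements to be small is an assumption of this sketch.
    """
    if not days:
        raise ValueError("no valid days to evaluate")
    hits = 0  # plays the role of F_D[con]
    for t_init, t_end, t_up, t_down in days:
        td_start, td_term = time_displacements(t_init, t_end, t_up, t_down)
        if abs(td_start) < THRESHOLD_H and abs(td_term) < THRESHOLD_H:
            hits += 1
    return hits / len(days)  # F_D[con] / D
```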
(2) Fluctuation evaluation of PV power

A. Fluctuation range ratio
The fluctuation range ratio $f_r$ is defined over the N sampling points of one day, where $P_i$ represents the power of the PV plant at sampling point i and $P_{max}$ is the maximum $P_i$ in that day. The fluctuation range ratio reveals the overall level of the fluctuation and gives operators a quantified overview of the fluctuation range. For example, $f_r = 2$ indicates a perfectly unimodal day curve: the PV power keeps rising until it reaches the peak and then falls continuously, without any oscillation. The larger $f_r$ is, the sharper the fluctuation. Because it is normalized by $P_{max}$, the fluctuation range ratio excludes the overall rising and falling tendency from the fluctuation, which makes it suitable for evaluating PV power fluctuation.
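The explicit formula for $f_r$ is not reproduced above. One form consistent with the stated property that $f_r = 2$ for a perfectly unimodal day curve is the total variation of the day's power normalized by $P_{max}$, that is $f_r = \sum_{i=1}^{N-1} |P_{i+1} - P_i| / P_{max}$. The sketch below uses that assumed form; it illustrates the idea and is not the authors' definition.

```python
from typing import Sequence


def fluctuation_range_ratio(day_power: Sequence[float]) -> float:
    """Assumed form: f_r = sum_i |P_{i+1} - P_i| / P_max over one day's N samples."""
    p_max = max(day_power)
    if p_max <= 0:
        raise ValueError("day has no positive output; exclude it before evaluation")
    # Total variation: sum of absolute changes between adjacent samples.
    total_variation = sum(abs(b - a) for a, b in zip(day_power, day_power[1:]))
    return total_variation / p_max
```

Under this form a smooth rise and fall gives exactly 2, any extra oscillation pushes $f_r$ above 2, and a curve that only rises gives a value below 2, which matches the error rule for $f_r < 2$ in Table 1.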

B. Maximum positive fluctuation and minimum negative fluctuation
Maximum positive fluctuation $\Delta P_{max}$ and minimum negative fluctuation $\Delta P_{min}$ indicate the extreme degree of the fluctuation, which helps operators evaluate whether the range of PV plant fluctuation is acceptable for the power system and examine the obstacles it could pose to power dispatching. Normally, PV plant power is either positive or zero. However, the plant consumes some auxiliary power, so the minimum negative power represents the maximum auxiliary power that the PV plant draws from the grid.
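As an assumption for illustration, the sketch below takes $\Delta P_{max}$ and $\Delta P_{min}$ as the largest rise and the largest drop between adjacent samples of one day; the original formula is not reproduced here and may, for instance, also normalize by plant capacity. Because daily indices are averaged over the year to form the annual index system, a hypothetical helper `annual_mean_extremes` also computes the annual means.

```python
from typing import List, Sequence, Tuple


def extreme_fluctuations(day_power: Sequence[float]) -> Tuple[float, float]:
    """Assumed form: (max_i(P_{i+1} - P_i), min_i(P_{i+1} - P_i)) for one day."""
    diffs = [b - a for a, b in zip(day_power, day_power[1:])]
    if not diffs:
        raise ValueError("need at least two samples per day")
    return max(diffs), min(diffs)


def annual_mean_extremes(days: List[Sequence[float]]) -> Tuple[float, float]:
    """Annual means of the daily Delta P_max and Delta P_min (hypothetical helper)."""
    daily = [extreme_fluctuations(day) for day in days]
    n = len(daily)
    return sum(d[0] for d in daily) / n, sum(d[1] for d in daily) / n
```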

C. Transition degree
As described above, the fluctuation range ratio provides an overview of the fluctuation range. For real-time power dispatching, however, the variation between adjacent sampling times is also of great interest. Thus, to evaluate how often PV power changes direction, the coefficient of transition $D_T$ is defined in terms of the function $F_P(P_i)$. If $F_P(P_i) > 0$, the PV power changes direction and reaches an extremum at sampling time i. The larger $D_T$ is, the more frequently the PV power changes its direction, which means the PV power is more fluctuating and less predictable. Thus, the transition degree gives an overall view of how often PV power changes its direction of fluctuation.
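A sketch of the transition degree, assuming $F_P(P_i) = 1$ when the power changes direction at sample i, i.e. $(P_i - P_{i-1})(P_{i+1} - P_i) < 0$, and 0 otherwise, and assuming $D_T$ averages $F_P$ over the $N-2$ interior samples. The normalization and the treatment of flat segments (not counted as direction changes here) are assumptions of this sketch.

```python
from typing import Sequence


def transition_degree(day_power: Sequence[float]) -> float:
    """Assumed form: D_T = (1/(N-2)) * sum over interior samples of F_P(P_i)."""
    n = len(day_power)
    if n < 3:
        raise ValueError("need at least three samples")
    flips = 0
    for prev, cur, nxt in zip(day_power, day_power[1:], day_power[2:]):
        # F_P(P_i) = 1 at a local peak or valley (direction flips), 0 otherwise
        if (cur - prev) * (nxt - cur) < 0:
            flips += 1
    return flips / (n - 2)
```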

D. Load matching degree
Power system load varies and fluctuates, but exhibits some daily and seasonal regularity. The fluctuation of load and the consistency between its trend and that of PV power are closely related to the difficulty of dispatching, and this is captured by the load matching degree, where $L_i$ is the total power of the load in the region of the PV plants and $F_L(\cdot)$ is a function that identifies the trend consistency between PV output and load [1]. Load matching helps operators assess the difficulty of peak modulation that the fluctuation of a PV plant could bring once it is connected to the system.
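The function $F_L(\cdot)$ is defined in [1] and is not reproduced here. Purely as a hypothetical stand-in, the sketch below measures trend consistency as the share of adjacent sampling intervals in which PV power $P_i$ and load $L_i$ move in the same direction; treat it as an illustration of the idea, not as the index used in the paper.

```python
from typing import Sequence


def load_matching_degree(pv: Sequence[float], load: Sequence[float]) -> float:
    """HYPOTHETICAL stand-in for the F_L-based index: share of adjacent sampling
    intervals in which PV power and load move in the same direction."""
    if len(pv) != len(load) or len(pv) < 2:
        raise ValueError("pv and load must be aligned series of length >= 2")
    same = 0
    for i in range(len(pv) - 1):
        dp = pv[i + 1] - pv[i]
        dl = load[i + 1] - load[i]
        if dp * dl > 0:  # both rise or both fall
            same += 1
    return same / (len(pv) - 1)
```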

E. Extreme fluctuation frequency
As mentioned above, the extreme boundary of fluctuation is reflected by the maximum positive fluctuation and the minimum negative fluctuation. However, power system operators are also interested in how frequently the fluctuation reaches this extreme margin. Thus, the extreme fluctuation frequency is given as follows, where $\Delta P_{avg+}$ is the annual average of the daily

(3) Fluctuation evaluation of electricity generation

A. 95% effective hours and 95% assuring hours
The daily electricity generated by the PV plant can be calculated as
$$E_d = \int_{\text{day } d} P(t)\,dt$$
where d is the sequence number of the day in 1 year and P(t) is the power of the PV plant at time t on day d. The values $E_d$ (d = 1, 2, …, D) can be sorted into a descending sequence, and a curve can be drawn to describe how the sequence decreases, as shown in Fig. 3. This curve is called the annual sustained electricity curve, and its abscissa is the cumulative probability of electricity generation. The curve can be represented as a function E(p), where p is the cumulative probability. The 95% effective hours (95% EH) and 95% assuring hours (95% AH) are defined as
$$95\%\,EH = \frac{E(5\%)}{P_N}, \qquad 95\%\,AH = \frac{E(95\%)}{P_N}$$
where $P_N$ is the rated power of the PV plant. E(5%) means that there is a 5% probability that the daily electricity generated by the PV plant exceeds E(5%); in other words, there is a 95% probability that the daily electricity is less than E(5%). Similarly, there is a 95% probability that the daily electricity generated by the PV plant is above E(95%). E(5%) and E(95%) thus mark the top and bottom boundaries of effective and reliable electricity, and the 95% effective hours and 95% assuring hours describe the daily generation of the PV plant from a probabilistic perspective. Both indices are normalized by $P_N$ to eliminate the influence of plant size. The 95% EH helps operators evaluate the effective utilization of the PV plant, while the 95% AH gives operators a fair idea of the assured generation, which can be regarded as the constant power generated by the PV plant despite the fluctuation of daily power.
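The daily electricity, the annual sustained electricity curve and the 95% EH and 95% AH can be sketched as below. Assumptions: 15-min samples, a rectangle-rule approximation of the integral for $E_d$, and an empirical reading of $E(p)$ in which the k-th largest of D daily values has cumulative probability k/D, with no interpolation. All function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

SAMPLE_HOURS = 0.25  # assumed 15-min sampling interval


def daily_energy(day_power: Sequence[float]) -> float:
    """E_d approximated by a rectangle-rule sum of P(t) over the day (power unit x hours)."""
    return sum(day_power) * SAMPLE_HOURS


def sustained_curve_value(energies: Sequence[float], p: float) -> float:
    """E(p) read from the annual sustained electricity curve.

    Daily energies are sorted in descending order and the k-th value (1-based)
    is given cumulative probability k/D; the first value reaching p is returned.
    """
    ordered = sorted(energies, reverse=True)
    # Small tolerance guards against floating-point fuzz in p * D.
    k = max(1, math.ceil(p * len(ordered) - 1e-9))
    return ordered[k - 1]


def effective_and_assuring_hours(energies: Sequence[float],
                                 rated_power: float) -> Tuple[float, float]:
    """95% EH = E(5%) / P_N and 95% AH = E(95%) / P_N."""
    return (sustained_curve_value(energies, 0.05) / rated_power,
            sustained_curve_value(energies, 0.95) / rated_power)
```

With real data, the energy unit divided by the power unit of $P_N$ yields hours, which is where the index names come from.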

B. Electricity ratios at peak and valley
To reflect the impact of PV power on load modulation, the load matching degree has been proposed to calculate the consistency between the trends of PV power and load. However, the difficulty of load modulation is also associated with electricity generation: higher electricity generation by the PV plant at peak time means less difficulty for the power system in modulating the load. Before the electricity ratios at peak and valley are introduced, the peak period and valley period are defined.
Let $L_{avg}$ be the average load power over one day and $L(t)$ the total load power in the region of the PV plant at time t. For $t \in [t_{1i}, t_{2i}]$, the interval from $t_{1i}$ to $t_{2i}$ is called a peak period if $L(t) > L_{avg}$, or a valley period if $L(t) < L_{avg}$. The electricity ratios at peak and valley are then defined, where ERP represents the electricity ratio at peak, ERV denotes the electricity ratio at valley, and $N_p$ and $N_v$ are the numbers of peak periods and valley periods in one day. It should be noted that:
1) All the 11 indices evaluate the fluctuation on a daily scale except for the extreme fluctuation frequency, which produces an annual result. Therefore, at least one year of power data should be collected to use the proposed index system.
2) For the daily indices, the annual mean should be further calculated to form the annual evaluation index system.
3) Power systems and loads have different preferences regarding different features of fluctuation. For some indices, such as the 95% effective hours, the larger the value, the more favorable the PV plant; accordingly, they are called maximum indices. Conversely, for other indices such as the maximum positive fluctuation, the larger the value, the less favorable the PV plant, so they are classified as minimum indices.
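A sketch of ERP and ERV, assuming each ratio is the share of the day's PV energy produced while the load is above (ERP) or below (ERV) its daily average $L_{avg}$. With a constant sampling interval, summing sample by sample is equivalent to summing over the $N_p$ peak periods and $N_v$ valley periods. The exact formula is not reproduced in the text above, so this form and the function name are assumptions.

```python
from typing import Sequence, Tuple


def peak_valley_ratios(pv: Sequence[float], load: Sequence[float]) -> Tuple[float, float]:
    """Assumed form: ERP (ERV) = share of the day's PV energy produced while the
    load is above (below) its daily average L_avg."""
    if len(pv) != len(load) or not pv:
        raise ValueError("pv and load must be aligned, non-empty series")
    l_avg = sum(load) / len(load)
    total = sum(pv)
    if total <= 0:
        raise ValueError("no PV energy on this day")
    # The constant sampling interval cancels in the ratios, so plain sums suffice.
    peak = sum(p for p, l in zip(pv, load) if l > l_avg)
    valley = sum(p for p, l in zip(pv, load) if l < l_avg)
    return peak / total, valley / total
```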

RankBoost based ranking method for PV power fluctuation evaluation
The index system consists of 11 indices that describe different aspects of fluctuation and have different ranges and units. To compare the fluctuation of different PV plants and generate an overall list, the RankBoost algorithm [14] is applied to combine the individual indices. The method has two main advantages:
1) Fluctuation characteristics described by indices with different ranges and units can be combined through a learning process, which automatically assigns fluctuation preference scores to the PV plants and generates an overall ranking list.
2) Missing data, invalid data and errors can be handled effectively and automatically [14].

Principle of the RankBoost ranking method
The detailed procedure of the RankBoost ranking method is given in [13,14], so only its main concept is described here.
The RankBoost ranking method is a pairwise ranking method [13]: it determines a comprehensive ranking by analyzing the relative positions of every pair of objects in the ranking sequence, rather than by calculating ranking scores for individual objects. To find the best comprehensive ranking, the probability of mis-ordering between the best ranking and the individual rankings is defined as the ranking loss, and the best ranking is found when the ranking loss reaches its minimum. It can be proved that minimizing the ranking loss is equivalent to minimizing the Z function [14], in which r is the iteration index, $S_k(i)$ denotes the result of the k-th index for the i-th PV plant, and m denotes the number of ranking objects. $a_r$ is a factor that is a function of $h_r$, while $h_r$ is a 0-1 valued function calculated from the individual rankings. $D_r$ is the distribution over pairs of PV plants formed from the individual indices; it reflects the rationality of the relative position of a pair of objects in the ranking list. By iterative solution, as shown in Fig. 4, $Z_r$ can be optimized and the comprehensive ranking obtained. Note that the absolute values of the ranking score H are not important, since the RankBoost algorithm provides only a relative ranking order. The ranking results are nonlinear: a change in a single input value can change the ranking order.
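To make the pairwise idea concrete, the sketch below follows the generic RankBoost scheme of Freund et al. with threshold weak rankers. It is not the authors' code: the weak-ranker family, the initial pair weights and the handling of missing values are assumptions and may differ from [13,14]. Each index contributes preference pairs between plants; a plant whose value for an index is missing (`None`) contributes no pairs for that index, which is one way RankBoost can tolerate missing data. The variables map loosely onto the text: `dist` plays the role of $D_r$, `h` of $h_r$, `alpha` of $a_r$, `z` of $Z_r$, and the returned list holds the ranking score H.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Pair = Tuple[int, int]  # (lo, hi): plant `hi` should outrank plant `lo`


def rankboost(scores: Sequence[Sequence[Optional[float]]], rounds: int = 20) -> List[float]:
    """Combine individual index rankings into one score H per plant.

    scores[k][i] is index k for plant i, oriented so that larger is better
    (negate minimum indices first); None marks missing or invalid data.
    """
    m = len(scores[0])
    weights: Dict[Pair, float] = {}
    for row in scores:  # every index contributes its own pairwise preferences
        for lo in range(m):
            for hi in range(m):
                a, b = row[lo], row[hi]
                if a is not None and b is not None and b > a:
                    weights[(lo, hi)] = weights.get((lo, hi), 0.0) + 1.0
    total = sum(weights.values())
    if total == 0:
        return [0.0] * m  # no index distinguishes any pair of plants
    dist = {pair: w / total for pair, w in weights.items()}  # initial D_1
    H = [0.0] * m
    for _ in range(rounds):
        best_r, best_h = 0.0, None
        for row in scores:
            for theta in sorted({v for v in row if v is not None}):
                # weak ranker h_r: 1 if the index reaches the threshold, else 0
                h = [1 if v is not None and v >= theta else 0 for v in row]
                r = sum(w * (h[hi] - h[lo]) for (lo, hi), w in dist.items())
                if abs(r) > abs(best_r):
                    best_r, best_h = r, h
        if best_h is None:
            break  # no weak ranker correlates with the remaining pairs
        r = max(min(best_r, 1 - 1e-9), -1 + 1e-9)  # keep alpha finite
        alpha = 0.5 * math.log((1 + r) / (1 - r))  # a_r
        H = [H[i] + alpha * best_h[i] for i in range(m)]
        # Up-weight pairs the current weak ranker orders wrongly, then normalize by Z_r
        dist = {(lo, hi): w * math.exp(alpha * (best_h[lo] - best_h[hi]))
                for (lo, hi), w in dist.items()}
        z = sum(dist.values())
        dist = {pair: w / z for pair, w in dist.items()}
    return H
```

Only the order of the returned scores matters, consistent with the remark above that absolute H values are not important.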

Data preprocessing
The raw data for fluctuation evaluation are sampled power sequences covering a minimum period of 1 year. However, sampling errors, empty data, invalid power data, etc. can occur and degrade the accuracy of the fluctuation evaluation. Therefore, to improve the accuracy and reliability of the ranking results, the existing data preprocessing process is improved to account for the features of PV power data, using the following procedures.
(1) Data cleansing: When a sampled power value is missing, the record shows a null value, which is invalid for fluctuation evaluation. Therefore, days containing null data are eliminated from the fluctuation evaluation.
(2) Error screening: Potential errors can arise in the sampling process, for example from the measuring equipment or the data transmission process. Hence, raw data should be screened for errors that do not conform to the characteristics of PV power described in Section 2.1. By calculating the index results in the index system and combining them with the features of error data shown in Table 1, the error data can be located and omitted.
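The cleansing and screening steps can be combined into a single per-day validity check. The sketch below applies the null-data rule and the four rules of Table 1. It assumes 15-min samples, times in hours, the first and last strictly positive samples as $t_{initiation}$ and $t_{ending}$, and an assumed total-variation form of the fluctuation range ratio, $f_r = \sum |P_{i+1} - P_i| / P_{max}$. The names `is_valid_day` and `screen` are hypothetical.

```python
from typing import List, Optional, Sequence

SAMPLE_HOURS = 0.25  # assumed 15-min sampling interval


def is_valid_day(day: Sequence[Optional[float]], t_up: float, t_down: float) -> bool:
    """True if the day survives data cleansing and the Table 1 error rules."""
    if any(p is None for p in day):  # (1) data cleansing: null samples
        return False
    positive = [i for i, p in enumerate(day) if p > 0]
    if not positive:  # output zero or negative all day
        return False
    t_init = positive[0] * SAMPLE_HOURS  # assumed definition of t_initiation
    t_end = positive[-1] * SAMPLE_HOURS  # assumed definition of t_ending
    if t_init <= 0 or t_end <= 0:  # Table 1, row 1
        return False
    td_start, td_term = t_init - t_up, t_end - t_down
    if td_start < -1 or td_term > 1:  # Table 1, row 2: output > 1 h outside daylight
        return False
    if td_start > 4 and td_term < -4:  # Table 1, row 3: effective time too short
        return False
    # Table 1, row 4, with the assumed form f_r = sum|P_{i+1} - P_i| / P_max
    f_r = sum(abs(b - a) for a, b in zip(day, day[1:])) / max(day)
    return f_r >= 2


def screen(days: List[Sequence[Optional[float]]],
           sunrise: List[float], sunset: List[float]) -> List[int]:
    """Indices of the days kept for fluctuation evaluation."""
    return [d for d, day in enumerate(days) if is_valid_day(day, sunrise[d], sunset[d])]
```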

Procedure of the proposed ranking method for power fluctuation evaluation
The procedure of the proposed RankBoost method for determining the fluctuation priority of PV plant power includes the following steps:
(1) Collect the power data of all the target PV plants, covering a minimum period of 1 year.
(2) Cleanse the null data.
(3) Calculate $t_{initiation}$, $t_{ending}$, $TD_{starting}$, $TD_{terminal}$ and $f_r$, and screen out the error data according to the features shown in Table 1.
(4) Input the cleansed and screened data to the index system to calculate every index for all the objects to be ranked.

Results and discussion
In this section, the historical data collected from six PV plants are used to demonstrate the correctness and effectiveness of the proposed fluctuation evaluation method.

Simulation data
The historical power of the six PV plants and the total load power were collected from a regional power grid from 2017 to 2019 at a sampling interval of 15 min. Simulations are performed in MATLAB (R2016a).

Table 1 The features of indices for error data
Characteristics of error data | Problems of error data
$t_{initiation} \le 0$ or $t_{ending} \le 0$ | The PV power neither rises nor falls and stays constant or zero during the whole day, which does not conform to the daily features of PV power described in Section 2.1.
$TD_{starting} < -1$ h or $TD_{terminal} > 1$ h | The initiation time precedes sunrise, or the ending time lags sunset, by more than one hour, i.e. the PV plant starts generating more than one hour before sunrise or continues more than one hour after sunset, which is considered unnatural.
$TD_{starting} > 4$ h and $TD_{terminal} < -4$ h | The initiation time lags sunrise and the ending time precedes sunset by more than 4 h, so the effective time of the PV plant is so short that analyzing its fluctuation becomes meaningless.
$f_r < 2$ | If $f_r \ge 2$, the day curve of power generally assumes a unimodal pattern. If $f_r < 2$, the power increases continuously during the whole day or stays constant, which is considered error data.

Data preprocessing
As shown in Table 2, the proportions of data removed by cleansing and error screening differ among the PV plants, and Plant 6 exhibits the best reliability and correctness in data measurement and collection. Although invalid data account for relatively small proportions for all six plants, Plant 1 has the largest proportions of null data and error data among the six plants. This indicates that the data measurement and collection system of Plant 1 needs improvement. After data preprocessing, the indices for all the PV plants are derived, as listed in Table 3. When the PV plants are ranked by each individual index, their distribution across the individual ranking lists is as shown in Fig. 5.

Fluctuation evaluation for PV plant power based on the proposed index system
PV plants perform differently on the different fluctuation features described by the indices. Nevertheless, some consistencies remain: 1) Maximum positive fluctuation and minimum negative fluctuation, and 95% effective hours and 95% assuring hours, can be considered two pairs of indices, and all the PV plants perform consistently on the two indices of each pair. 2) The individual rankings show that Plant 6 performs badly in most of them. Hence, the last position in the comprehensive ranking list for fluctuation is assigned to Plant 6, whereas the other positions are hard to arrange directly. Table 3 reveals that the plant indices have different ranges and units. In addition, the plants differ markedly on some indicators but only slightly on others. For example, the load matching degrees of the six PV plants are very close to each other, while the 95% effective hours differ significantly.

Fluctuation ranking of PV plants based on the proposed ranking method
As shown in Fig. 6, the RankBoost ranking method orders the six PV plants by fluctuation preference according to the ranking score H. A PV plant with a higher H value fluctuates less in power and has less negative impact on power prediction and load regulation.
The comprehensive ranking results show that Plant 3 performs best in terms of power fluctuation, while Plant 5 performs worst. This is consistent with the distribution pattern of the individual rankings shown in Fig. 5.
Equal weighting is a traditional approach to combining multiple parameters for ranking. Figure 7 shows the ranking results obtained with an equal weight for all 11 indices. Minimum indicators can be converted into maximum indicators by $x^* = 1/x$, whereas negative indicators can be converted into positive ones by $x^* = -x$. Equal weights are then used to combine the parameters, and a plant with a higher weighted score performs better in terms of fluctuation.
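The equal weight baseline can be sketched as follows. The conversions $x^* = 1/x$ and $x^* = -x$ follow the text; labelling each index as `'max'`, `'min'` or `'neg'` and applying no further rescaling are assumptions of this sketch, and the function name is hypothetical. Because converted values are summed without rescaling, an index whose converted values are an order of magnitude larger than the others dominates the score.

```python
from typing import List, Sequence


def equal_weight_scores(values: Sequence[Sequence[float]], kinds: Sequence[str]) -> List[float]:
    """values[i][k] is index k of plant i; kinds[k] is 'max' (larger is better),
    'min' (smaller is better, converted by x* = 1/x) or 'neg' (negative-valued,
    converted by x* = -x). Returns the equally weighted score of each plant."""
    convert = {"max": lambda x: x, "min": lambda x: 1.0 / x, "neg": lambda x: -x}
    weight = 1.0 / len(kinds)
    # No normalization: large converted values dominate the weighted sum.
    return [weight * sum(convert[kind](x) for kind, x in zip(kinds, row)) for row in values]
```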
Comparing the two approaches, Plants 1 and 2 happen to hold the same positions in both lists, whereas the other four plants rank differently. In particular, Plant 5 is the worst plant in the RankBoost list but rises to third place in the equal weight list. This is because, after minimum indicators are converted into maximum indicators and negative indicators into positive ones, the values of maximum positive fluctuation and minimum negative fluctuation are almost ten times larger than those of the other indicators. Hence, these two indicators dominate the equal weight results and amplify Plant 5's good performance on them. This indicates that the RankBoost ranking method yields a more balanced and comprehensive evaluation than the equal weight method.

Conclusions
This paper proposes an evaluation method to quantify the power fluctuation of PV plants, which consists of an index system and a ranking method based on RankBoost. By handling missing and invalid data effectively and combining the fluctuation indices, the proposed method provides a systematic and comprehensive comparison of PV plant power fluctuation. The resulting ranking of PV plants is valuable to power system operators and electricity market participants.
The historical power data of six PV plants are used to verify the effectiveness and correctness of the proposed method. The data preprocessing results show that the data collection process is reliable, and that the screening and cleansing of power data can also serve to evaluate the reliability and correctness of data collection devices. Maximum positive fluctuation and minimum negative fluctuation, and 95% effective hours and 95% assuring hours, can be considered two pairs of indices on which all the PV plants perform consistently. The results also show that the fluctuation of PV power can be comprehensively quantified and precisely described by the RankBoost ranking method. Moreover, the RankBoost ranking method yields a more balanced and comprehensive evaluation than the equal weight method, since the latter may let some indices dominate while weakening the others.