Data-driven next-generation smart grid towards sustainable energy evolution: techniques and technology review

Meteorological changes urge engineering communities to look for sustainable and clean energy technologies to keep the environment safe by reducing CO2 emissions. The structure of these technologies relies on the deep integration of advanced data-driven techniques which can ensure efficient energy generation, transmission, and distribution. After more than a decade of thorough research, the concept of the smart grid (SG) has emerged, and its practice around the world paves the way for efficient use of reliable energy technology. However, many developing features evoke keen interest and their improvements can be regarded as the next-generation smart grid (NGSG). Also, to deal with non-linearity and uncertainty, the emergence of data-driven NGSG technology can become a great initiative to reduce the adverse impact of non-linearity. This paper presents the conceptual framework of NGSG by enabling some intelligent technical features to ensure its reliable operation, including intelligent control, agent-based energy conversion, edge computing for energy management, internet of things (IoT) enabled inverter, agent-oriented demand side management, etc. Also, a study on the development of data-driven NGSG is discussed to facilitate the use of emerging data-driven techniques (DDTs) for the sustainable operation of the SG. The prospects of DDTs in the NGSG and their adaptation challenges in real-time are also explored in this paper from various points of view including engineering, technology, etc. Finally, the trends of DDTs towards securing sustainable and clean energy evolution from the NGSG technology in order to keep the environment safe are also studied, while some major future issues are highlighted. This paper can offer extended support for engineers and researchers in the context of data-driven technology and the SG.


Introduction
Data-driven technologies have become a widely used set of techniques in the field of scientific research and engineering where data are being used for understanding, maintaining, and turning typical systems into smart sustainable systems. The use of data-driven techniques (DDTs) is gaining popularity in various engineering sectors because of their contribution to decision making, transparency, reliability, and sustainability. For example, data-driven machine learning (ML) techniques are used for analysis, prediction, control and diagnosis in medical research [1], precision agriculture [2], quantum finance [3], risk management in the supply chain [4], etc. These techniques may be supervised, semi-supervised or unsupervised depending on the availability and condition of collected data, and can obtain a higher rate of success than typical methods used in the various fields of science and business. Because of the increasing trends of data-driven methodologies, researchers have started contemplating the presence of DDTs in conventional power systems. This has enabled the construction of a next-generation smart grid (NGSG) from the typical smart grid (SG). It also accelerates the traditional SG to unlock the full potential of future SGs with zero carbon emission and lifelong sustainability.
The conventional SG is an improved version of the traditional power grid and microgrid, where advanced technologies are used to enable communication, simulation, sensing, decision-making, etc. A comparative study of the microgrid and different versions of SG in terms of technical features associated with them is of great importance. An SG allows the components of the grid, e.g., smart meters, renewable energy sources (RESs), advanced communication systems, closed-loop feedback systems, distributed generation, storage, etc., to communicate with each other. The grid ensures the production of sufficient high-quality power while integrating other benefits such as self-healing capabilities, fault assessment, consumer friendliness, cyber and physical security [5]. Because of the extended features as compared to the microgrid, some countries have successfully enabled the SG with good annual growth rate, as shown in Fig. 1 [6]. This is feasible because the existing SGs all around the globe are still operated based on conventional power systems to produce power from the kilowatt to gigawatt scale.
The conventional SG cannot fully meet requirements that change continuously as advanced technologies emerge. The need for clean energy has increased globally over the past decade as a result of changing environmental conditions and expanding populations and technology that may impose non-linear dynamics on the SG. The non-linearity in the smart power grid transmission and distribution systems may add new congestion, outages, and fluctuations in voltage and frequency that lead to blackouts as a result of the increasing demand for electricity [7]. Although non-renewable energy sources offer an easier, quicker, and cheaper path to power generation, they directly obstruct a green environment because of their high emissions [8]. Renewable energy sources are on the rise to reduce dependency on fossil fuel-based power generation [9]. However, the uncertainty and complexity of SGs are increasing with the addition of more distributed generation (DG), increased market size, and renewable sources [10].

Fig. 1 Countries contributing to SG world market [6]
Again, the existing SGs are not yet sustainable in long-term power generation and distribution, because of the lack of absolute compatibility between grid components [11], programmable sensors deployment [12], fast real-time monitoring, analysis and decision making with minimized latency [13], and integrating maximum intermittent generation [14]. To make SG operation sustainable, researchers are interested in formulating the next-generation smart grid (NGSG). An NGSG will have the ability to address the above shortcomings through the integration of advanced DDTs, blockchain technology, and other edge computing techniques based on collecting and analyzing conventional SG data. As the datasets are getting massive because of increasing complexity in the SG systems, a better storage system with a secured high-speed data transfer system may also need to be integrated in an NGSG, where the data storage should be encrypted with blockchain technology and managed with advanced data management algorithms.
Further, the preservation of data privacy and data security also needs advancement in the conventional SG domain, where the security of massive amounts of datasets in the NGSG domain will be handled with next-generation blockchain technology and data techniques [15,16]. Additionally, an NGSG may also have several extended features including interoperability, less transmission loss, decreased latency, large sources handling capability, grid mobility, ease of renovation, and advanced resilience, all features that are quite dependent on the adaptation of data-driven technology.
Thus, it can be concluded that an NGSG is the improved version of the existing SG which enables some extended features to work on minimizing the shortcomings of the conventional SG. It can be an automated grid driven by data where the control operation, energy management, condition monitoring, forecasting, fraud characterization, energy transaction and its security may be done in an improved manner on the basis of collecting and analyzing data, and implementing advanced data-driven techniques. A comparative study between the conventional SG and an NGSG is reported in Table 1 in terms of operation and technologies. From Table 1, it can be seen that the use of highly computationally efficient DDTs, edge computing devices, next-generation blockchain technology, advanced interoperability, and agent-oriented techniques in the NGSG framework makes explicit differences between the conventional SG and the NGSG. The purpose of considering these technologies is to ensure sustainable energy evolution in the NGSG. Thus, it can be stated that the framework of NGSG focuses on sustainable energy technologies.
An NGSG may be largely dependent on the use of DDTs to achieve sustainable energy evolution worldwide. Sustainable evolution refers to the integration of DDTs in data analysis from datasets of multiple decentralized RESs and energy storage systems (ESSs), enabling internet of things (IoT) devices, load forecasting, energy trading, security systems, grid faults, and losses. The ongoing research in the SG domain states that DDTs have been successfully implemented in characterizing grid faults and energy trading. However, this may impose new challenges in terms of security constraints as the energy demand increases and as cyber security threats gradually increase around the world. The solution to these challenges requires a revision in the SG structure based on enabling data-driven modeling and planning. The primary benefit of the data-driven NGSG is the availability of faster and more reliable operation and more accurate data that allow the use of advanced DDTs towards enabling efficient and sustained electricity flow from generation to distribution. Additionally, increased management and monitoring capabilities across the entire power system, as well as more affordable, adaptable, and effective operation, are offered by revolutionary developments in data-driven analysis models and algorithms, mostly inspired by advanced data science.
From the critical surveys addressed in Table 2, it can be seen that there exists much scope for and many applications of DDTs in the SG domain. Many of these surveys also show the additional challenges that may arise while implementing DDTs in an NGSG. The purpose of DDTs is to enable advanced features towards securing the sustainable operation for energy evolution from the NGSG, as the absence of these features may hinder the scalability, availability, security, and other issues in the SG. At present, there are many loopholes in SG systems and it is necessary to study these drawbacks to remove them by improving the present SG technology. The main contributions of this study are:

• Studying conventional SG features: A study of the technical features of a conventional SG is done to explore improvement potential. Also, some current SG projects around the world with their capacities are studied.
• Developing a technical framework for a data-driven NGSG: First, a technical framework for an NGSG is developed by integrating new advanced technical features into the SG domain. A study is then presented on the development of a data-driven NGSG along with the necessary analytics required to be performed before the implementation of DDTs.
• Investigating the scope of data-driven techniques in the NGSG: This study also explores the possible prospects of DDTs in an NGSG and discusses the adaptation challenges of data-driven NGSGs in reality.
• Exploring the role of DDTs in sustainable energy evolution: A brief discussion about the trends of DDTs towards obtaining sustainable energy evolution from an NGSG is also incorporated in this study to highlight the significance of data-driven SG modeling.

Smart grid at present: technical architecture
An SG enables bidirectional flow of electricity between the utility and its end users, with its smart framework structured by combining information, power technologies, and telecommunication with the prevailing electricity system. This energy technology also supports automation mechanization for efficient power distribution, storage elements, fault detection, electric vehicles, grid data supervision, combination of hybrid RESs, and flexibility of grid networks [23]. The various components shown in Fig. 2 can be used to build the SG energy technology. They include renewable sources, a smart supervision system, a smart information system, an advanced storage system, a smart security system, sensors, and grid-lines.

Smart distributed generation sources
An SG uses a "smart distributed generation" unit which refers to the process of producing electricity efficiently in small-scale implementations close to the place of consumer usage. The primary technologies for SG application are RESs in addition to ESSs. It offers excellent prospects for controlling frequency and voltage deviations, responding to emergency situations when the load exceeds the generation, and decarbonizing targeted areas. Plug-in hybrid electric vehicles (PHEVs) have the potential to reduce emissions while also lowering transportation costs [24]. The potential of PHEVs to integrate onboard energy storage devices with the power grid can increase grid efficiency and dependability. The power grid can also increase its acceptance of intermittent renewable energy generation with the sole use of energy storage devices like battery ESSs. To achieve this, effective coordination among ESSs, the grid, and renewable generation units is needed [25]. A crucial prototype for power generation is the DG units that have improved reliability and power quality, and can lower system capacity margin. Executing DGs in practice may be difficult for several reasons including: (1) large fluctuation in terms of availability of RESs; (2) very different generation and demand patterns; and (3) higher execution costs of DGs than the conventional power plants [26]. The development of DG units has also introduced the idea of a virtual power plant (VPP) that collects capacities of diverse distributed energy resources (DERs) to increase electricity generation. In a VPP, a controller controls a large group of DGs, and thus, VPPs provide more efficiency and flexibility, and can handle fluctuations better than conventional power plants. However, VPPs require complex optimization, secure communication and intelligent control [27].
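The idea of a VPP controller that pools the capacities of many DG units can be illustrated with a minimal sketch. It is not the method of [27]: the greedy cheapest-first (merit-order) rule, the `DerUnit` type, the unit names, and all numbers are assumptions chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class DerUnit:
    name: str
    available_kw: float   # power the unit can deliver right now
    cost_per_kwh: float   # hypothetical marginal cost, used only for ordering


def dispatch(units, demand_kw):
    """Greedy merit-order dispatch: use the cheapest units first until demand is met."""
    schedule = {}
    remaining = demand_kw
    for unit in sorted(units, key=lambda u: u.cost_per_kwh):
        if remaining <= 0:
            break
        share = min(unit.available_kw, remaining)
        schedule[unit.name] = share
        remaining -= share
    # Second value is the demand the pooled units could not cover.
    return schedule, max(remaining, 0.0)


# Hypothetical pool of distributed units aggregated by the VPP controller.
units = [
    DerUnit("rooftop_pv", 40.0, 0.00),
    DerUnit("battery", 25.0, 0.05),
    DerUnit("diesel_backup", 100.0, 0.30),
]
schedule, unmet = dispatch(units, demand_kw=80.0)
print(schedule, unmet)
```

A practical VPP would replace the greedy rule with the complex optimization mentioned above and would also account for forecasts, ramp limits, and communication constraints.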
An SG consists of many DG units, and therefore electricity generation flexibility increases while the flow control becomes complex. There are two domestic electricity distribution systems, i.e., (1) AC (Alternating Current) power distribution; and (2) DC (Direct Current) power dispatch [28]. The DC power distribution is more practicable because it makes domestic power distribution well organized and easier to control. Several technologies, including the microgrid and vehicle-to-grid (V2G), have emerged to distribute DC power. The microgrid can generate electricity of low voltage, even if it is islanded from the main grid. In the islanded mode, the users do not get electricity from any external sources. The microgrid simplifies the execution of SG functions, e.g., better dependability, significant renewable energy penetration, self-healing, and effective load control systems [29]. V2G usually enables getting power from stored electricity, such as the battery packs of vehicles. It enables a novel method of storing and delivering electrical energy and enhances power quality by providing electrical energy stored in PHEV batteries to the grid during peak hours.

Smart metering, measurement and monitoring
Any information technology that is concerned with distributed automation, such as data exchange compatibility and combination with current and future devices or systems, should be addressed in SG technology. As a result, in the framework of an SG, a smart information subsystem is employed to enable information production, simulation, analysis, integration, and optimization.
Smart metering technology is the most vital means of obtaining information from consumers. The advanced metering infrastructure (AMI) uses automatic meter reading (AMR) technology to logically fit with an SG. The AMR system works on automatic data gathering, diagnostics, as well as collecting data from smart metering devices and sending data to the main database for accounting, troubleshooting, and analysis. In contrast to a typical AMR, AMI allows bidirectional communication with the meters [30]. The advantage of advanced smart metering is that the consumers can predict their approximate bills and manage the power usage to lower bills. It is also beneficial for utilities because smart metering enables real-time pricing [31].
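How interval readings combined with real-time prices let a consumer predict a bill, and lower it by shifting usage, can be shown with a short arithmetic sketch. The function name `estimate_bill`, the hourly readings, and the tariff values are hypothetical and are not taken from [30] or [31].

```python
def estimate_bill(readings_kwh, prices_per_kwh):
    """Sum of energy used in each interval times the price valid in that interval."""
    if len(readings_kwh) != len(prices_per_kwh):
        raise ValueError("one price is needed for every metering interval")
    return sum(e * p for e, p in zip(readings_kwh, prices_per_kwh))


# Hypothetical hourly data: the last interval is the expensive peak hour.
hourly_price = [0.10, 0.10, 0.25, 0.40]
hourly_kwh = [0.5, 0.4, 1.2, 2.0]

# Same total energy, but 1 kWh moved from the peak hour to the first hour.
shifted_kwh = [1.5, 0.4, 1.2, 1.0]

bill = estimate_bill(hourly_kwh, hourly_price)
shifted_bill = estimate_bill(shifted_kwh, hourly_price)
print(round(bill, 2), round(shifted_bill, 2))
```

The difference between the two totals equals the shifted energy multiplied by the price gap between the two intervals, which is the incentive that real-time pricing creates.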
Again, measuring and monitoring the system's current status at various places are essential for the smooth operation of the SG system. The topology of phasor measurement units (PMUs) and sensors is important for advanced monitoring. The status of an electrical grid is measured by PMUs to be used to analyze system health. A high number of PMUs as well as the capability to compare the measurements taken from the grid can enable use of the collected data to track the state of the power system and rapidly respond to system circumstances. The existing frequency monitoring network system architecture is designed to handle large amounts of data flows, processing, storage, and usage [32]. Accordingly, the sensor networks provide practicable and low cost sensing as well as communication media for distant monitoring and identification of the system.

Smart management of information
An SG can manage big datasets efficiently by extracting the most useful information and rejecting the false data. Data management is a process of examining, evaluating, integrating, and optimizing data obtained from a large network of data-gathering devices. The goal of data modeling is to make information interchangeable among multiple devices that are standard for diverse working environments and conditions. Data modeling is required for device forward and backward compatibilities, which means that the device is compatible with its previous and future versions. The goal of information integration is to combine data from several sources with distinct theoretical, contextual, and graphical representations. Information optimization is a technique for increasing the effectiveness of information. Singular value decomposition analysis is used to investigate the coupling architecture of an energy grid in order to uncover chances for lowering network traffic by determining which data must be exchanged between portions of the infrastructure to implement a control action [33].
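The role of singular value decomposition in reducing exchanged data can be sketched as follows. This is an illustrative construction, not the procedure of [33]: the coupling matrix is synthetic and deliberately built with rank 2, and the interpretation of rows as control variables in one area and columns as measurements in another is an assumption.

```python
import numpy as np

# Hypothetical coupling block: row i is a control variable in area A,
# column j is a measurement taken in area B. Built to have rank 2.
rng = np.random.default_rng(0)
coupling = rng.normal(size=(6, 2)) @ rng.normal(size=(2, 8))

U, s, Vt = np.linalg.svd(coupling, full_matrices=False)

# Number of singular values that are not numerically zero.
rank = int(np.sum(s > 1e-9 * s[0]))

# Area B only needs to send `rank` combined signals instead of 8 raw values.
measurements = rng.normal(size=8)
signals = Vt[:rank] @ measurements

# Area A rebuilds the full effect on its control variables from those signals.
rebuilt_effect = (U[:, :rank] * s[:rank]) @ signals
full_effect = coupling @ measurements

print(rank, np.allclose(rebuilt_effect, full_effect))
```

When the coupling is only approximately low-rank, the same truncation trades a bounded approximation error for less network traffic.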

Smart data transmission
Modern technology has enabled the availability of various communication modules suited for SG systems. It is complicated to choose a suitable model as SGs tend to have different preferences for data transmission. However, the data transmission system of an SG must be of the highest quality to support quality of service. The data transmitted should be accurate, secure, complete, and private. Wireless data transmission uses radio waves to transmit signals and data. Wireless data transmission holds several advantages compared to wired data transmission, including remote access, low maintenance and installation cost, high-speed data transfer, etc. The wireless data transmission category is subdivided into four subcategories described in the following sub-sections [34].
1. Wireless mesh system
A wireless mesh system follows the method of mesh topology. A mesh ensures that all the data transmission modules are interconnected. The modules form nodes and gateways. In a particular area, a wireless mesh system will provide a very cost-effective communication system that needs little to no mobility. This data transmission system is highly reliable for communication. This wireless data transmission method is suitable for remote places where complications arise from other data transmission methods. FSO data transmission is highly feasible in urban destinations whereas microwave communications face blockades in particular places [34].

Smart supervision/regulation technology
Supervision of an SG is essential for high-performance output and efficient management of all the subsystems. The flow of energy and information, being bidirectional, needs to be handled by ensuring the completion of various supervision objectives. An SG is easier to manage than typical power grids. This is mainly because SG enables bidirectional flows of electricity and information. The active participation of electricity customers is also the main feature of an SG. Supervision of an SG can be done based on electricity demand, rather than supply. The objectives of SG supervision and management may include ensuring maximum efficiency, enhancing power production, easy monitoring and analysis, control of emissions, waste management, gaining maximum profit, etc. Reference [37] proposes an optimized control technique from analyzing the profiles of a large group of customers to shave off energy consumption, while [38] presents a pricing method to incentivize customers.
To properly manage the supervision objectives of an SG, different methods have been adopted ranging from game theory to machine learning. Optimization of an SG using both convex and dynamic programming is proposed in [39], while another technique for optimization, swarm intelligence, shows promising performance in the field of energy distribution resources optimization, which has no dimensional limitation. Data gathered from an SG using sensors and PMUs may also be used to predict the behavior of the SG system through properly developed machine learning algorithms.
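As one concrete instance of swarm intelligence, a basic particle swarm optimization (PSO) loop is sketched below for a toy two-generator dispatch problem. PSO is chosen here only as a representative swarm method; the cost coefficients, penalty weight, demand, and the hyperparameters `w`, `c1`, and `c2` are all assumed values, and the problem is not taken from the cited works.

```python
import random


def cost(p, demand=100.0):
    """Hypothetical quadratic fuel costs plus a penalty for missing the demand."""
    g1, g2 = p
    fuel = 0.02 * g1 ** 2 + 0.05 * g2 ** 2
    return fuel + 10.0 * abs(g1 + g2 - demand)


def pso(objective, bounds, n_particles=30, n_iter=200, w=0.7, c1=1.5, c2=1.5, seed=1):
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                      # best position seen by each particle
    pbest_cost = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_cost[i])
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]   # best position seen by the swarm
    initial_cost = gbest_cost
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull towards personal best + pull towards swarm best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            c = objective(pos[i])
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], c
                if c < gbest_cost:
                    gbest, gbest_cost = pos[i][:], c
    return gbest, gbest_cost, initial_cost


best, best_cost, initial_cost = pso(cost, bounds=[(0.0, 100.0), (0.0, 100.0)])
print(best, best_cost)
```

The same loop works for any number of decision variables by extending `bounds`, which reflects the absence of a dimensional limitation noted above, although convergence generally slows as the dimension grows.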

Smart security system
Avoiding cyber security breaches is imperative to ensure the security of an SG. A smart security system protects the information of an SG and increases the integrity of privacy. The security of an SG can be maintained in three categories: solidity, failure detection and protection. The solidity of a system promises that the system can perform consistent behavior in various situations and changed working conditions. The integration of locally generated power can ensure fewer future failures, including both electrical and mechanical failures [40]. Again, an SG has integrated failure detection methods that can detect failures when they occur. This also helps to diagnose and recover from failures in an SG. The failures can be branched out to various faults which occur in an SG, and their protection using digital methods. All the above-mentioned features pave the way to feasibly adapt an SG in the real-time environment. Some countries have already implemented SG projects, as shown in Table 3.

Add-on technology towards the next-generation smart grid
NGSGs have the possibility of enabling enhanced features in the SG landscape as compared to conventional SG technologies. The security and privacy issues of the current SG systems may be better covered by an NGSG in the context of integrating more advanced features. The advance of an NGSG entirely depends on the use of data-driven techniques in its different parts. A conceptual framework of an NGSG is illustrated in Fig. 3. From Fig. 3, it can be seen that the framework of an NGSG may consist of integrating edge computing devices, IoT enabled inverters, blockchain-based energy trading, and computationally efficient DDTs in monitoring, controlling, and forecasting. It can also be noted that a data center may appear in an NGSG to collect data from the interconnected technologies and share the data among them to ensure its interoperability. By applying DDTs, the collected data from the different sources can be analyzed intelligently to help make decisions towards sustainable energy evolution. The detailed explanation of the intelligent technologies used in an NGSG framework can be found in the following sub-sections.

Intelligent agent-based modeling of energy sources
To digitize the energy generation process in an NGSG, there has been a significant rise in the use of agent-oriented software. An NGSG can be modularized by assigning data-driven autonomous software that may virtually control the individual components of an NGSG and convert the centralized SG technology to a scalable and adaptable decentralized technology. In a multi-agent system, the complete NGSG need not be known at any single node, while the individual components can work towards predefined goals to achieve optimized performance, where the agent-backed components can interact with the system as well as each other [57]. However, the characteristics of an agent depend on the goal, which can be assigned to be cooperative or competitive with the other agents' characteristics. The aggregated characteristics of the agents may be able to determine the generation characteristics of the whole NGSG system. These characteristics of agents can be tweaked and redefined for better optimization of an NGSG. As a result, the failure of components in an SG does not result in total system failure because each individual agent works automatically with the initial knowledge it possesses.

Intelligent agent-oriented energy conversion unit integration
Agent-based energy models can be used to optimize energy conversion to minimize loss and maximize output. A conventional SG is designed to convert energy on different levels. However, in each step of conversion, energy loss may occur. Analyzed data gathered from such energy conversion systems can be used to construct agent software for efficient energy conversion. At the time of converting energy from one state to another, sometimes losses occur in the form of energy other than the required output energy. Such losses can be recovered by reusing the excess energy through efficient agent-based conversion. The synchronization between different conversion devices may have to be done using a multi-agent system for both monitoring and integrating the devices into the main power system [58].

Edge computing for energy data management
To eradicate the issues from IoT in conventional SG, edge computing (EC) is a technology of great significance for an NGSG. The IoT approach for SG collects massive datasets that are difficult to process because the cloud servers are situated in a distant geographic area. The networking system is stressed when raw data collected from IoT devices are transmitted to the cloud because of the increases in latency and reaction time. The data collected from an SG may contain private data, and as the data are sent to a third-party cloud server it may pose the risk of privacy breach. The EC solution shows huge potential to remove these problems which are presented by typical IoT systems. It takes the data close to the collection point where they are to be processed [59]. Another perk of EC is that it can reduce the network load to a great extent by shrinking the volume of transmitted data. This creates a low-latency high-response network system essential for the forthcoming SG systems. EC creates a hierarchical architecture, as shown in Fig. 4.
The architecture consists of multiple processing layers where all the IoT devices in the SG are located. In EC, some data processing tasks are shifted from the clouds to the multiple layers. Processing is done at a lower-level layer unless it needs more computation than that layer can provide, in which case it is offloaded to the higher layers. For example, some embedded IoT devices can perform small preprocessing tasks like noise filtration. However, the computation capabilities of these devices are limited, and sometimes they cannot fully process the data. Thus, the data are sent for processing to a higher layer with more computational power and gateways. The gateways in EC provide local computation work with the IoT devices parallel to their conventional work. Different data-driven models may have to be used to process data in the EC structure, such as a prediction algorithm based on reinforcement learning for energy price estimation and home scheduling [60], and the heuristic evolutionary model for advanced demand side management by load shifting, a model which aims to reduce peak load and cost in the SG domain [61].
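The offloading rule of the hierarchical EC architecture, i.e., process at the lowest layer that can handle the task and otherwise pass it upward, can be sketched as below. The `Layer` type, the layer names, the capacity figures, and the task sizes are hypothetical and use arbitrary units.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    capacity: float  # hypothetical compute budget per task, arbitrary units


def place_task(required, layers):
    """Return the name of the lowest layer whose capacity covers the task."""
    for layer in layers:  # layers are ordered from the device upward
        if required <= layer.capacity:
            return layer.name
    raise ValueError("task exceeds the capacity of every layer")


layers = [
    Layer("iot_device", 1.0),
    Layer("edge_gateway", 10.0),
    Layer("cloud", float("inf")),
]

# Hypothetical compute demand of three tasks.
tasks = {"noise_filter": 0.5, "load_forecast": 6.0, "model_retraining": 500.0}
placement = {task: place_task(req, layers) for task, req in tasks.items()}
print(placement)
```

Only the tasks that exceed the local budget travel upward, which is how EC reduces both the latency and the volume of data sent to the cloud.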

Interoperability between multiple energy hubs
The connectivity between different energy components in an NGSG will play a vital role in sustainable energy evolution. Market interoperability also needs to be explored to achieve overall connected operations over the entire system. A system's interoperability refers to its capacity to collaborate with other systems in order to share resources [62]. The multiple levels of interoperability in an NGSG can be divided into the following segments:
1. User interoperability to ensure that customers can choose among various commercial and technological options.
2. Commercial interoperability to ensure that value can flow to where it is needed. Driven by market forces, it is important to confirm that incentives are matched across the energy system.
3. Interoperability of data to ease portability and data sharing between the components of energy sources, consumers and suppliers.
4. Equipment interoperability to ensure that the equipment is replaceable or exchangeable when there are changing demands, to allow energy consumers to make intelligent and informed choices.
5. Vector interoperability to make sure that timely coordination takes place and that energy provisions across various components of the energy system are compatible with each other.

Fig. 4 Hierarchical architecture of edge computing consisting of multiple processing layers

Internet-based inverter control technology
Intelligent data-driven inverter technology plays a significant role in the root-level controlling of an NGSG by ensuring the mutual connection between generators and loads. These smart inverters have the capacity to connect with IoT devices with more embedded intelligent data-driven software. This emerging technology ensures the devices perform more intelligently in relation to quick response, effective fault diagnosis, automated maintenance, etc. [63]. The inverters in an NGSG will work autonomously without intervention and take a sophisticated step towards the control of power conversion. Smart inverters will be aware of their adjacent environment and guarantee quick adaptation to sudden changes in the context of an SG. They will also have the ability to learn from the accumulated data to enhance future adaptability and control management.

Self-healing grid enabled by agent-based control
The most crucial traits of an SG include self-healing capacity in the presence of unexpected conditions. When defects are found, the power system networks may have the ability to automatically restore the information. Although it is inevitable to have defects and disruptions in power systems, the potential dangers mainly depend on the fault magnitude, nature, duration, and location. The integration of sensors, self-operating sophisticated controllers, and cutting-edge software tools makes up the agent-based self-healing grid. It will use real-time data to locate and isolate issues, restructure the system and reduce the number of impacted consumers. To attain the control of self-healing under faulty conditions, an agent-oriented control technique based on optimization is required for the SG domain which will mitigate the effect of over-voltage by enabling the automatic restoration of the sound condition of the power network [64]. In terms of the multi-agent control systems, fuzzy logic is used to make decisions.
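The locate, isolate, and restructure sequence can be illustrated on a toy feeder modeled as a graph. This sketch is not the agent-oriented controller of [64] and includes neither over-voltage mitigation nor fuzzy decision making; the bus names, the line list, the normally open tie line, and the greedy tie-closing rule are all assumptions.

```python
from collections import deque


def energized_buses(source, closed_lines):
    """Breadth-first search over closed lines, starting at the supply bus."""
    neighbours = {}
    for a, b in closed_lines:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    seen = {source}
    queue = deque([source])
    while queue:
        bus = queue.popleft()
        for nxt in neighbours.get(bus, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def self_heal(source, buses, closed_lines, tie_lines, faulted_line):
    """Open the faulted line, then close tie lines that restore lost buses."""
    closed = set(closed_lines) - {faulted_line}          # isolate the fault
    lost = set(buses) - energized_buses(source, closed)
    for tie in tie_lines:                                # restructure the network
        if not lost:
            break
        trial = closed | {tie}
        still_lost = set(buses) - energized_buses(source, trial)
        if len(still_lost) < len(lost):
            closed, lost = trial, still_lost
    return closed, lost


# Hypothetical feeder: S is the substation, (D, C) is a normally open tie line.
buses = {"S", "A", "B", "C", "D"}
closed_lines = [("S", "A"), ("A", "B"), ("B", "C"), ("S", "D")]
tie_lines = [("D", "C")]

closed, lost = self_heal("S", buses, closed_lines, tie_lines, faulted_line=("A", "B"))
print(sorted(closed), lost)
```

In this example the fault on line A-B initially cuts off buses B and C, and closing the tie line D-C restores both, so no consumer remains affected. A real scheme would also check line loading and voltage limits before closing a tie.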

Agent-based holonic approach on the demand side
To balance the demand and supply sides of an NGSG, multi-agent-based holarchies consisting of various abstraction layers of the distribution grid may have to be proposed as a holonic approach [65]. The holon concept may be applied as a holonic multi-agent approach to manage the information technology-based infrastructure of NGSG. This leads the path to efficient data transfer and robust communication security.

Data-driven next-generation smart grid

Critical steps for data-driven NGSG development
The framework of a data-driven NGSG may depend on the forming of the critical steps as shown in Fig. 5, which demonstrates how a data-driven NGSG solves critical issues and develops the final model for a data-driven NGSG. The bottom of the pyramid is the first step and the top is the last step of the process. Every step in developing the NGSG framework shown in Fig. 5 is discussed in detail in the following sub-sections.

Identifying problems
First and foremost, the SG power system needs to be thoroughly studied to understand the issues to be solved for system sustainability. Understanding the problem plays an important role in data management modeling. The SG power system may produce a large number of datasets that can be analyzed using different data science tools. Most of the data may prove irrelevant when coming to the goal of data science modeling, and thus, datasets related to the problem to be analyzed are of the most significance [66]. Intensive studies of the power system incentivize data collection, as it simplifies understanding of the type of data that is needed for further analyzing the data algorithms.

Data requirement and data collection
Data science methods need a huge amount of data to properly analyze certain system characteristics.The more data are available from a system, the easier it is to generate the final output.Data from an SG can be generated by enabling smart meters, sensors, and PMUs.Automation in data collection is an important aspect in the sustainable and robust modeling of the data science method [66].The required data can be found in the first step, "Identifying problems".Additional datasets, such as the power system configuration, voltage and current levels, transformer and generator information, security system, load flow, etc. may need to be added to improve the data science modeling.

Comprehension of data
The data should be studied after collection, and categorized based on the different characteristics of the system.The accuracy of data measured or collected should be high because data science methods require accurate data for smooth analysis.If the measured datasets are far from their actual values, the final output from the data algorithms may not be satisfactory [66].The comprehension of data will significantly enhance the process of acquiring data as well as understanding which data are needed most for the system model.Nonetheless, several characteristics of data such as data type, data quantity, data accessibility, data features, the combination of multiple datasets, previous datasets, etc., should be given attention for better data-driven NGSG modeling.

Exploration and pre-processing of data
Exploration of data involves the analysis of a dataset to summarize its key aspects.Data are explored at first to understand the essence of data towards assessing the quality and characteristics of the data.Various statistical representations can be used to process these datasets with different points of interest.This helps to understand initial trends and attributes of the data.The quality of data may be further enhanced by using various pre-processing methods consisting of noise reduction, finding missing data, smart labeling of data, data filtration, and data formatting.One of the main goals of data preprocessing is to solidify the quality of data by correcting, reformatting, and combining datasets [66].Some of the processes for enriching the available data include data cleansing, data transformation, finding missing values, unbalanced data handling, bias issues handling, distribution of data, detecting anomalies in data, etc.
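
As a minimal sketch of two of the pre-processing steps named above (handling missing values and detecting anomalies), the following Python fragment may be considered. The function names `fill_missing` and `flag_anomalies`, the linear-interpolation rule, the median-based outlier score, the threshold of 3.5, and the sample voltage readings are illustrative assumptions and are not taken from [66].

```python
from statistics import median


def fill_missing(values):
    """Fill None entries by linear interpolation between known neighbours.

    Gaps at either end copy the nearest known value (an assumed convention).
    """
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("at least one known value is required")
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            weight = (i - left) / (right - left)
            filled[i] = values[left] + weight * (values[right] - values[left])
    return filled


def flag_anomalies(values, threshold=3.5):
    """Flag points whose median-based (modified) z-score exceeds the threshold."""
    med = median(values)
    mad = median(abs(v - med) for v in values)  # median absolute deviation
    if mad == 0:
        return [False] * len(values)
    return [0.6745 * abs(v - med) / mad > threshold for v in values]


# Hypothetical voltage readings (V) with gaps and one implausible spike.
readings = [230.0, None, 232.0, 231.0, None, None, 228.0, 400.0, 229.0]
cleaned = fill_missing(readings)
flags = flag_anomalies(cleaned)
print(cleaned)
print(flags)
```

A median-based score is used here instead of a mean-based one because a single large spike would otherwise inflate the standard deviation and hide itself.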

Data modeling and evaluation
Data-driven and machine-learning models should be chosen for data analysis according to the type of analytics and how well they fit the data. The data are typically separated into training data and test data either by dividing the available datasets in a ratio of 8:2 or by using the k-fold method for data splitting. Splitting the data and observing the performance on the held-out part is necessary to maximize model performance [67]. To test model performance, several model validation and assessment benchmarks can be used. These can help data scientists choose or build the learning method or model. These benchmarks include true positive, true negative, false positive, false negative, error rate, accuracy, precision, recall, receiver operating characteristic analysis, F-score, applicability analysis, etc. In addition, researchers may use sophisticated analytics, which may include feature selection and extraction, feature engineering, tuning algorithms, ensemble methods, modification of existing models, etc., to improve the final data-driven model for smart decision-making to handle specific system problems.
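
The 8:2 split, the k-fold method, and several of the benchmarks listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names (`train_test_split`, `k_fold_indices`, `classification_metrics`), the binary labels, and the sample predictions are hypothetical and do not come from [67].

```python
def train_test_split(samples, train_fraction=0.8):
    """Split an ordered list into training and test parts (8:2 by default)."""
    cut = int(round(len(samples) * train_fraction))
    return samples[:cut], samples[cut:]


def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs so that each sample is tested once."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size


def classification_metrics(y_true, y_pred):
    """Compute confusion-matrix counts and the derived scores for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return {"tp": tp, "tn": tn, "fp": fp, "fn": fn,
            "accuracy": accuracy, "error_rate": 1 - accuracy,
            "precision": precision, "recall": recall, "f_score": f_score}


train, test = train_test_split(list(range(10)))
folds = list(k_fold_indices(10, 5))

# Hypothetical labels: 1 = faulty feeder, 0 = healthy feeder.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

For time-ordered grid data, a chronological split such as the one above is usually preferable to a shuffled one, since shuffling can leak future information into the training set.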

Final product and data automation
The final product is the outcome of the system after processing and analyzing all the data.It can be a recommendation, a comprehension, or a forecast.The obtained data product is used to make the best decision on various problems.In practical application, several data products have made considerable contributions to make the system intelligent and self-activated [67].In the case of energy trade, information gained through data analysis, such as churn prediction and customer segmentation, can be used to make smart decisions towards sustainable energy trade.Finally, the whole process of collection, comprehension, processing, and modeling data should be run through an automated algorithm system, thus eliminating the need for manual handling and ultimately reducing data processing time and increasing efficiency.

Data-driven techniques used in NGSGs
Properly designed data-driven techniques can upgrade an SG and solve existing problems related to insufficient, incorrect, and unreliable data. These techniques consist of different types of algorithms, which are broadly divided into three categories: supervised, semi-supervised, and unsupervised [67][68][69][70][71][72][73][74][75][76][77][78][79][80][81]. A summary of various data-driven algorithms used in SG for executing and improving different functions is reported in "Appendix 1.1". However, the theory behind the development of DDTs can also be split into numerous categories, as discussed in the following sub-sections.

Bayes concept-based learning technique
As a practical data-driven technique, the theory of the Bayes concept establishes the connection between the model and the dataset. Deep learning-driven processes adhere to the Bayesian framework, and its methods exist to measure uncertainty. The Bayesian approaches can be applied to forecast net load in NGSG systems, while a deep long short-term memory (LSTM) network and Bayesian theory can be combined to anticipate the aggregated load in SG systems. A recurrent neural network (RNN) with memory cells, which can store important information for a long time, can perform effectively for loads characterized by long-term dependence, significant volatility, and unpredictability. Conversely, fully Bayesian inference can be used to select models for both the evidence-based and predictive frameworks. Several studies have shown that the predictive approach, which displays data overfitting, does not perform as well as the evidence framework in this area [82].
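
How a Bayesian update connects a model to a dataset and measures uncertainty can be illustrated with the simplest conjugate case, a Gaussian prior on the mean load with Gaussian measurement noise of known variance. This sketch is not the LSTM-based method mentioned above; the function names, the prior, the noise variance, and the observations are assumed values chosen for illustration.

```python
import math


def gaussian_posterior(prior_mean, prior_var, observations, noise_var):
    """Posterior of an unknown mean under a Gaussian prior and Gaussian noise.

    Precisions (inverse variances) add, and the posterior mean is the
    precision-weighted combination of the prior mean and the observations.
    """
    n = len(observations)
    posterior_precision = 1.0 / prior_var + n / noise_var
    posterior_var = 1.0 / posterior_precision
    posterior_mean = posterior_var * (
        prior_mean / prior_var + sum(observations) / noise_var
    )
    return posterior_mean, posterior_var


def credible_interval(mean, var, z=1.96):
    """Central interval of a Gaussian (about 95 % for z = 1.96)."""
    half_width = z * math.sqrt(var)
    return mean - half_width, mean + half_width


# Assumed prior belief: mean net load 100 kW with variance 25 kW^2.
# Assumed data: four noisy measurements of 110 kW, noise variance 25 kW^2.
post_mean, post_var = gaussian_posterior(100.0, 25.0, [110.0] * 4, 25.0)
low, high = credible_interval(post_mean, post_var)
print(post_mean, post_var, (low, high))
```

The posterior variance is smaller than the prior variance, which is the sense in which additional data reduce the uncertainty of the estimate.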

Probabilistic learning technique
The probabilistic learning concept for smart energy systems includes binary and Bernoulli, univariate Gaussian, and multinomial and categorical distributions. The binomial distribution expresses the probability of a given number of successes in a set of independent trials for a given set of parameters. The probability distribution of the intelligent power system has been significantly influenced by the binary and the Bernoulli distribution model. For plug-in electric vehicles, several methods have been developed to ascertain the probability distribution for their charging patterns at various periods of usage [83].
To reduce uncertainty and volatility in power systems, most studies have recently embraced grey Bernoulli approaches, and as a result, prediction now requires less functional data and research effort, especially when predicting long-term development.
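
As a hedged illustration of the Bernoulli and binomial models in the electric vehicle setting, suppose each of n vehicles charges independently in a given period with probability p. The number charging at the same time is then binomially distributed. The values of n and p and the function names below are assumptions made for the sketch and are not taken from [83].

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability that exactly k of n independent Bernoulli(p) trials succeed."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_at_least(k_min, n, p):
    """Probability that at least k_min of the n trials succeed."""
    return sum(binomial_pmf(k, n, p) for k in range(k_min, n + 1))


# Assumed scenario: 4 vehicles, each charging in the evening peak with p = 0.5.
n_vehicles, p_charge = 4, 0.5
p_exactly_two = binomial_pmf(2, n_vehicles, p_charge)
p_three_or_more = prob_at_least(3, n_vehicles, p_charge)
print(p_exactly_two, p_three_or_more)
```

The independence assumption is a simplification, since real charging behaviour tends to be correlated through tariffs and commuting patterns.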

Common univariate distribution technique
In the data-driven process, probability studies typically address common distributions individually, e.g., Student's t, Gamma, Cauchy, and Beta distributions, Laplace irradiance, etc. The Cauchy distribution is heavily used in the analysis of power system harmonics, estimation of wind power uncertainty, prediction-based models, and real-time dispatch of wind-based power plants. The Gamma and Weibull distributions are widely used to model wind speed in dispersed generation [83].
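
The use of the Weibull distribution for wind speed can be sketched with its closed-form cumulative distribution function, F(v) = 1 - exp(-(v/c)^k), and its mean, c·Γ(1 + 1/k). The shape k = 2, the scale c = 8 m/s, and the cut-in and cut-out speeds of 3 m/s and 25 m/s are assumed values for illustration only.

```python
import math


def weibull_cdf(v, shape, scale):
    """Probability that the wind speed is at most v (two-parameter Weibull)."""
    if v <= 0:
        return 0.0
    return 1.0 - math.exp(-((v / scale) ** shape))


def weibull_mean(shape, scale):
    """Mean wind speed implied by the Weibull parameters."""
    return scale * math.gamma(1.0 + 1.0 / shape)


def prob_between(v_low, v_high, shape, scale):
    """Probability that the wind speed lies between v_low and v_high."""
    return weibull_cdf(v_high, shape, scale) - weibull_cdf(v_low, shape, scale)


# Assumed site parameters and turbine limits (not taken from the source).
shape_k, scale_c = 2.0, 8.0
mean_speed = weibull_mean(shape_k, scale_c)
p_operating = prob_between(3.0, 25.0, shape_k, scale_c)
print(mean_speed, p_operating)
```

In practice, the shape and scale would be estimated from measured wind speed data rather than assumed.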

Optimized learning technique
Power systems frequently call for diverse optimization strategies for various issues such as non-linearity, sensitivity to uncertainty, and large scale. The constrained [84], bound, and black-box free optimizations [85] are some of the techniques used in the SG domain. There are also first-order and second-order approaches. The first-order approach covers numerical optimization strategies that use the first derivative, while the second-order approach, often called the Newton technique, applies the second derivative in a scalar problem. These approaches have a significant impact on the power system's optimal power flow problem [83]. Optimal power flow is an optimization tool for running power systems and controlling energy. Linear programming, the Karush-Kuhn-Tucker conditions, quadratic programming, and estimation of wind power uncertainty can be applied to SG systems in many ways, including but not limited to power generation planning, power system expansion, advanced energy systems, power flow analysis modeling and heuristic methodologies, threats, unpredictability measures, and demand response.
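
The difference between the first-order and second-order approaches can be seen on a scalar problem. The sketch below minimizes the assumed test function f(x) = (x - 2)^2 + 1 with plain gradient descent (first derivative only) and with Newton's method (first and second derivatives). The function, the starting point, the step size, and the tolerance are illustrative choices and do not represent an optimal power flow formulation.

```python
def gradient_descent(grad, x0, step=0.1, tol=1e-8, max_iter=10_000):
    """First-order method: move against the gradient with a fixed step size."""
    x = x0
    for iteration in range(max_iter):
        g = grad(x)
        if abs(g) < tol:
            return x, iteration
        x -= step * g
    return x, max_iter


def newton(grad, hess, x0, tol=1e-8, max_iter=100):
    """Second-order method: scale the gradient step by the inverse curvature."""
    x = x0
    for iteration in range(max_iter):
        g = grad(x)
        if abs(g) < tol:
            return x, iteration
        x -= g / hess(x)
    return x, max_iter


def grad_f(x):
    return 2.0 * (x - 2.0)  # derivative of (x - 2)^2 + 1


def hess_f(x):
    return 2.0  # second derivative, constant for a quadratic


x_gd, iters_gd = gradient_descent(grad_f, x0=10.0)
x_newton, iters_newton = newton(grad_f, hess_f, x0=10.0)
print(x_gd, iters_gd)
print(x_newton, iters_newton)
```

On a quadratic, Newton's method reaches the minimizer in a single step, whereas gradient descent needs many iterations; the price is that the second derivative must be available and, in higher dimensions, a linear system must be solved at each step.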

Key analytics to adopt data-driven techniques in NGSG
Numerous processes of analytics shown in Fig. 6 can significantly aid the data-driven techniques used for an SG [86]. Figure 6 shows the process of prescriptive analytics, predictive analytics, decision intelligence, and data mining.

Predictive analytics
Predictive analytics, a form of advanced analytics, uses statistical modeling, past data, data mining methods, machine learning, etc., to forecast future events.In Fig. 6, data are gathered from numerous datasets and analyzed to comprehend the reasons for and results of every occurrence.A pattern is created, and then all the data are evaluated statistically.Finally, predictive analytics predicts the outcome.

Prescriptive analytics
The practice of using data to decide the best action is known as prescriptive analytics.This form of analysis generates recommendations for the next moves by taking into account all the essential aspects.As shown in Fig. 6, prescriptive analytics methods analyze the model and extract knowledge from the data.Then, possible future outcomes are generated.Observing the possible outcomes, prescriptive analytics gives optimized decision to the systems, or devices, or people, on the action they should take.

Data mining
The process of going through massive datasets to uncover patterns and links in order to forecast outcomes by data analysis is known as data mining.The process of data mining from collection and selection of data to acquiring knowledge via processing target data and interpreting patterns is shown in Fig. 6.

Cohort and cluster analytics
Cluster analytics refers to the grouping of similar data into a finite number of clusters. This is a type of behavioral analytics that divides data into clusters before analysis. The clusters carry similar characteristics or experiences over a period of time. Multiple clusters are made when the substance of a group of data varies from another group of data. Being an unsupervised analysis method, cluster analysis does not assure the number of clusters beforehand, while the number of clusters is only revealed after the clustering algorithm is completed. It tends to find a core similarity within data, thus separating groups of data based on the differences among the groups.

Fig. 6 Variable processes of advanced analytics using data-driven techniques

Decision intelligence analytics
Decision intelligence is a trending method that uses data-driven techniques to make a decision based on cause and effect. It uses various models and algorithms of data science assisted by social and managerial sciences. This method is important for designing, modeling, and tuning the decision-making process of power systems. Figure 6 shows that decision intelligence blends multiple decision-making methodologies with AI, ML, automation, and relevant information.

Operationalizing and scaling
Operationalizing refers to the materialization of an abstract idea or concept into a measurable form.This method is valuable for collecting data on abstract or unobservable systems, e.g., future power systems, in a systematic way.This can quantify different parameters of an NGSG as such power systems are not yet available in a practical environment and can only be observed as an idea or a simulation.Conversely, the scaling of an NGSG refers to a comparatively small size prototype and is essential for running various operations on a small scale to identify the characteristics of that test before conducting it on a large scale in the original power system.
5 Data-driven techniques in NGSG: prospects and adaptation challenges

DDTs in intelligent energy materials processing
Energy materials production is on the verge of a breakthrough as per the advancement in data-driven technologies for materials research.Significant growth in the field of materials science can be found in [87].The recent improvements in data-driven techniques for materials engineering show that ML innovates intelligent energy materials' production and design process.Additionally, it can be used for measuring the electronic properties of power systems.In using data-driven techniques like ML, the first step is to gather the objectives and set goals to achieve them.This is the most important step as the goals must be specific and achievable from the available datasets or information.The data-driven ML models are useful for enabling a low-cost and reliable approach toward predictions where computational or experimental approaches increase expense.

DDTs in intelligent energy systems component
A smart energy system is made of multiple components for the generation, storage, distribution, and consumption of energy.These aspects of energy systems can all be subjected to data-driven techniques such as ML or artificial intelligence (AI) for the performance improvement of an NGSG.With the enabling of this technology, data gathering from connected devices has provided a better understanding of system characteristics and improvement in various details.The data-driven ML methods have the ability to allow better simulations to construct prediction and forecasting models.The energy storage system of an NGSG should be improved for efficient charging and discharging of the storage devices [88].

DDTs towards intelligent demand-side management
Data-driven ML technologies play a significant role in demand-side management by allowing energy consumers to try out different market mechanisms in practical scenarios.A sophisticated integration of demand side devices, such as solar PV, battery storage, and smart meters, is done through linking with the internet, being associated with ML techniques, and following advance in data collection and data sharing.The concept of "smart homes" is very popular now and the number of smart homes has seen a spike in recent years [89].A Swedish pilot project was done on reducing peak energy usage significantly by implementing data-driven ML techniques in the field of demand response management [90], in which a multi-agent approach offers demand responses in the NGSG by allowing coordination among its components.Energy devices can communicate with the power grid and exchange information by giving access to the dynamic communication system [90].The demand response programs are categorized into two groups, i.e., price-based and incentive-based.Real-time pricing, rate per usage, critical peak pricing, etc., are included in price-based demand response, whereas emergency response, direct load control, ancillary market services, market capacity arrangement, and buyback programs are included in incentive-based demand responses.These categories can be subjected to data-driven management techniques for better demand-side management of an NGSG.
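
A price-based demand response can be illustrated with a small sketch in which a consumer facing hourly prices moves part of a flexible load from the most expensive period to the cheapest one. The prices, the load profile, the amount of flexible energy, and the function names are assumptions for illustration and are not taken from the pilot project in [90].

```python
def energy_cost(load_kwh, price_per_kwh):
    """Total cost of a load profile under a time-varying price."""
    return sum(load * price for load, price in zip(load_kwh, price_per_kwh))


def shift_flexible_load(load_kwh, price_per_kwh, flexible_kwh):
    """Move up to flexible_kwh from the dearest period to the cheapest period."""
    periods = range(len(price_per_kwh))
    dearest = max(periods, key=lambda i: price_per_kwh[i])
    cheapest = min(periods, key=lambda i: price_per_kwh[i])
    shifted = list(load_kwh)
    moved = min(flexible_kwh, shifted[dearest])
    shifted[dearest] -= moved
    shifted[cheapest] += moved
    return shifted


# Assumed four-period day: two off-peak and two peak periods.
prices = [0.10, 0.10, 0.30, 0.30]   # currency units per kWh
load = [1.0, 1.0, 2.0, 2.0]         # kWh per period

baseline_cost = energy_cost(load, prices)
shifted_load = shift_flexible_load(load, prices, flexible_kwh=1.0)
shifted_cost = energy_cost(shifted_load, prices)
print(baseline_cost, shifted_load, shifted_cost)
```

The total energy consumed is unchanged; only its timing responds to the price signal, which is the mechanism that price-based programs rely on to reduce peak usage.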

DDTs towards smart manufacturing in NGSG
The fourth industrial revolution has enabled the production and collection of data from connected machines in industry.Data-driven ML techniques can be used to analyze the collected data as an approach to smart manufacturing [91].Some of the various ML models used for smart manufacturing include: (1) support vector machine  (10) additive models.The newer business models also require smart manufacturing.This is enabled by the technical advance of Industry 4.0.In a data-driven smart manufacturing system, the benefits of real-time data analysis, advanced decision-making, better plant efficiency, and increased production may be crucial for NGSG modeling.

DDT in intelligent energy resource planning
Energy forecasting and management is a significant field of interest for energy resource allocation and demand-side handling [92]. Decision-makers can be assisted by different data-driven decision-making techniques constructed by data experts. These contribute a lot to designing energy plans, choosing optimal decisions, and finding alternatives. Robust energy systems enabled by intelligent planning allow the use of data-driven algorithms to identify market conditions and aid the building of advanced energy devices. The real-time applications of data-driven methods in the field of energy are commonly seen in various energy systems. A key aspect of data-guided techniques is the use of AI to improve NGSG performance [91]. The incorporation of the IoT in intelligent energy planning and management is also one of the most significant aspects of data-driven techniques used in the energy industry. The IoT can enable remote access to and control of an NGSG with a smart tracking system.
Here, smart meters inform consumers about the volume of energy consumption, while local infrastructures like microgrids can be connected to cloud servers to exchange information to enable significantly better load forecasting.

DDTs in integrating the large-scale heterogeneous energy sources
Policy makers have already been focusing on the upscaling of renewable energy.This will affect the energy market.Thus, power grid operators and engineers are putting emphasis on data-driven techniques and models to achieve a seamless transition from fossil fuel to renewable energy.Harnessing energy from renewables on a large scale requires enabling multiple green sources of energy at the same time, which signifies the importance of heterogeneous energy sources.The synchronization of such sources can be guaranteed using data-driven algorithms, including collecting and analyzing data from the sources with specific ML models.For example, solar and wind power plants already generate a huge amount of data which allows data-guided techniques to forecast different levels of energy with the help of sensor integration [93].These energy consumption datasets can be analyzed to predict peak and low demand times, and design the production rate to minimize losses.However, the upscaling of green energy sources also opens a door in an NGSG for cyber attackers.Thus, the security of an NGSG should be ensured by updating the data-driven ML models regularly to increase integrity.

Challenges to implementing DDTs in the NGSG
The development of a data-driven smart grid system toward achieving sustainable energy transition has some challenges from various points of view.In the following sub-sections, a thorough discussion on the challenges during the adaptation of DDTs in the NGSG is conducted.

Engineering point of view
1. Overfitting mechanism: When a model tries to forecast a trend in excessively noisy data, overfitting may occur. This is the result of a model that is too complicated, possibly with a large number of parameters, so that it fits the noise and does not accurately reflect the reality in the data. A typical data-driven ML network may contain millions of variables. The training dataset typically consists of a large number of records. However, even when a network recognizes the training set and gives answers that are one hundred percent precise and correct, it may fail entirely when faced with new data. This mechanism is known as overfitting, and is one of the limitations of data-driven techniques [94].

2. Installation of intelligent energy processing unit: Intelligent processing methods need complex thermochemical operations and multi-component frameworks. These generate a lot of data quickly. The best scenario is for operators to receive rapid data on the properties of energy material manufacture and process parameters in real-time, allowing them to identify novel processes and phenomena more quickly and react effectively and efficiently. Existing techniques, however, provide "postmortem" data yearly after the manufacturing process has ended. To improve and assess the production process, data-driven ML techniques can be applied [95]. However, the existing data-driven techniques may demand a revision in their structure to maintain the energy materials and electric infrastructure at the energy distribution level.

Feasible energy storage material
The enormous amount of background data and the increasing complexity of energy storage systems provide significant hurdles for the current methodologies and algorithms.For greater precision, stability, and efficiency, emerging cutting-edge technologies can address the shortcomings of traditional approaches.First, the development of energy storage encompasses invention and breakthrough, long-term storage, a high amount of protection for electro-chemical backups, and cheap cost.This low-cost technology is also necessary for high efficiency and physical storage.Secondly, research is focused on modeling energy storage and streamlining the procedure in different energy systems, supporting the use of energy storage technologies, and developing innovative structures and thorough evaluations for modernizing and advertising energy storage [96].
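
The overfitting mechanism described in item 1 of the engineering challenges above can be reproduced with a deliberately small example. A 1-nearest-neighbour classifier memorizes every training record, including two mislabelled ones, and is compared with a simple threshold rule. The data, the noise positions, and the assumed true rule (label 1 when x is at least 5) are hypothetical and serve only to show perfect training accuracy alongside poor accuracy on new data.

```python
def true_rule(x):
    """Assumed ground truth used to label the new (test) data."""
    return 1 if x >= 5 else 0


# Training records; the labels at x = 3 and x = 8 are deliberately noisy.
train_set = [(1.0, 0), (2.0, 0), (3.0, 1), (4.0, 0),
             (6.0, 1), (7.0, 1), (8.0, 0), (9.0, 1)]
test_set = [(x, true_rule(x)) for x in (1.5, 2.9, 3.2, 6.4, 8.1)]


def one_nn_predict(x):
    """Complex model: copy the label of the closest training record."""
    nearest = min(train_set, key=lambda sample: abs(sample[0] - x))
    return nearest[1]


def threshold_predict(x):
    """Simple model: a single decision threshold."""
    return 1 if x >= 5 else 0


def accuracy(predict, samples):
    """Fraction of samples whose predicted label matches the given label."""
    return sum(predict(x) == y for x, y in samples) / len(samples)


print("1-NN      train/test:", accuracy(one_nn_predict, train_set),
      accuracy(one_nn_predict, test_set))
print("threshold train/test:", accuracy(threshold_predict, train_set),
      accuracy(threshold_predict, test_set))
```

The memorizing model scores perfectly on the training set because it also reproduces the noisy labels, and it is exactly those memorized errors that mislead it on nearby new points.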

Technology point of view
1. Tech advancement: Argonne scientists are trying to develop optimization approaches that combine ML and AI to simulate the intricacy of various electrical system challenges much more quickly than the current methodologies. The primary focus is to accelerate load flow analysis and daily computation of the electricity system [97].

2. Improved energy efficiency: Future difficulties in sustainable 5G and 6G power management hold significant potential for data-driven methods. For the cost-effective design and optimization of network operations, data-driven ML approaches, like federated learning, deep learning, and optimization may be considered. By gaining flexible network structure and altering traffic conditions, it is possible to construct 5G or 6G air interfaces. Using a variety of 5G and 6G technologies, including SG, intelligent transmission and distribution of network lines, smart buildings, and industrial automation, data-driven ML will be more widespread and crucial than simply conserving energy. On the other hand, these approaches typically require coordination and computing, which can pose significant challenges for the design and implementation of power-efficient data-driven techniques and for upcoming 5G and 6G networks [83].

for demand response are designed to monitor and use real-time data on energy consumption to offer energy pricing for thousands of customers via the utility power grid. Customers can adjust their energy consumption in response to grid conditions and the rates. By assisting end-users to think about how they need power grid improvements, ongoing growth can increase reliability, cost-effectiveness, and sustainability. The prospective integration of renewable energy directly into the power grid will be encouraged by the corresponding knowledge and such resilience [99].

Economic challenges
The energy storage industry is now facing difficulties in several countries, including weak legislative support, high price, doubt in value, unsound business practices, etc.In the coming years, it will be crucial because of two factors: first, the suggestion of substitutes to the energy storage plan including power generators and electrical firms; and second, the development of a suitable business competitive structure and arrangement of sufficient funding schemes for fresh data-driven advanced technologies [100].According to Woori's forecast, the cost of energy storage will increase globally by 26 percent in a year.Although there are various market variables for energy storage, the primary obstacles continue to be high costs, poor subsidy programs, a median cost configuration, and lack of a business prototype.

Trained consumers: Many companies have to deal with the challenges of training their consumers on how to use cutting-edge technologies. The same task may be required of data engineers. Investors, developers, and managers overestimate the existing capabilities of data-driven techniques, while anticipating that the algorithms will comprehend difficult issues with ease and make reliable predictions.

3. Lack of expert manpower: Even though the market for data-driven methodologies is attractive to many people and the energy sector, further developing this research field requires more expert manpower. In the energy sector, power utilities face challenges when innovating new technologies because of a lack of skilled employees.

Trends of DDT towards sustainable energy evolution in NGSG
Utility firms may allocate resources more effectively, reduce costs, and find better ways to serve customers with the help of the proper analytical platform.Additionally, the appropriate data analytic platform enables them to maximize the value of the generated data.This can help the sustainable energy evolution through the improvement of the following aspects in an NGSG.

Securing reliable control operation
The most crucial aspect of SG energy systems is their ability to operate securely and reliably. The SG has already benefited from the involvement of data-driven techniques in terms of stability, security, and dependability [90]. These techniques are well recognized for providing timely and efficient stability analysis, which supports the implementation of automatic control. The use of data-driven techniques, like machine learning, reinforcement learning, and deep learning in stability and control analysis has been the subject of extensive research in recent decades, as shown in Table 4. It is realized that the implementation of the data-driven techniques in an NGSG may offer a reliable solution to address the control issues in terms of frequency, voltage, preventive and restorative measures, and enable a sustainable energy evolution through the reduction of CO2 emissions in the environment.

Definitive energy management
Energy management is associated with the control, planning, and monitoring of energy-related processes to conserve energy resources, reduce energy costs, and safeguard the environment by minimizing CO2 emissions. Energy management through advanced data-driven methodologies has already started in SGs, as shown in Table 5. The advantage of using the advanced methods is the ability to perform work in less time, while offering a realistic solution to manage energy from a small amount of data. This is done through enabling DDTs in SG planning and management, including grid synchronization, active and reactive power management, ancillary services, and techno-economic modeling. From Table 5, it may be predicted that the emergence of DDTs in an NGSG also paves the way to contributing to sustainable energy evolution [112].

Precise asset condition monitoring
Old assets are a prime cause for uncertainty in load and demand management, affecting optimal operation and the overall health of the NGSG.Thus, constant monitoring of all the assets of an NGSG is needed to reduce the risk of equipment failure [120].Obsolete technologies are also to be replaced with advanced technologies.Various data-guided methods, as shown in Table 6, can be prime examples of asset monitoring systems where data taken from an NGSG are analyzed to understand the asset conditions.

Accurate fault prediction and characterization
Traditional fault detection algorithms, like impedance based and wave-based techniques, cannot adjust with the penetration of distributed renewable power generation [130].On the other hand, AI-based data-driven approaches can bypass challenging modeling and fault mechanism analysis.A fault classification approach based on a data-driven CNN fed with features retrieved by the Hilbert-Huang Transform (HHT) in power distribution systems is proposed in [131].This approach performs admirably in fault classification thanks to the CNN's strong feature learning capabilities.Another data-driven Graph Convolutional Networks (GCN)-based method for addressing fault location is suggested in [132].It keeps the spatial information of buses in the GCN structure, which allows improved fault detection accuracy.
To achieve fault detection and location, the voltage and frequency signals are used, respectively.Additionally, a fault contour map that groups the buses into several tiers based on the severity of the impacts is provided.A short summary on the recent progress of data-driven techniques for precise SG fault characterization, detection, and location identification is shown in Table 7.It is seen that the data-driven approaches can satisfactorily perform fault diagnosis, though their performances may suffer because of a lack of sufficient data.By developing the data-driven NGSG infrastructure, data can be gathered from various sources and then combined and used to increase the precision of defect diagnosis.This can help improve sustainable energy evolution.

Accurate forecasting and uncertainty estimation
The increasing integration of RESs, such as tidal, solar, wind, etc., demands more effort to schedule and operate an SG. Load forecasting (LF) is a crucial component for planning and running modern power systems since it helps to preserve stability, and keep the environment safe by reducing CO2. Faultless load forecasting is useful for decreasing production costs, as it enables reducing utility risks by predicting future consumption of products that the utility will transport or deliver. However, it is highly challenging as the load is stochastic in nature [161]. Conventional forecasting models frequently do not disclose the degree of uncertainty in their forecasting, which can result in expensive and dangerous choices, and compromise attempts to develop dependable SG systems [162]. Before digging into the data-driven deep learning approaches of load forecasting, it is essential to categorize load forecasting techniques. The objective of short-time load forecasting (STLF) is to forecast the load over horizons ranging from one hour to a few weeks [163]. STLF is essential for the generation, transmission, and distribution of SG power. The data-driven techniques in Table 8 are used for improving STLF. The methods of Table 9 are used for analyzing the data for very-short-time load forecasting (VSTLF). For longer periods, such as medium-time load forecasting (MTLF) and long-time load forecasting (LTLF), the techniques shown in Tables 10 and 11 are used, respectively. It can be shown that the data-driven method can provide accurate forecasting for the NGSG model, and can also conveniently improve the possibility of achieving sustainable energy evolution.

Asset: Wind turbine (table rows; the last field of each row is "Problems and challenges")
ANN, Bayesian network, Support vector regression, RF, KNN [127] | Data on vibration taken from the wind turbine are combined with data acquired from supervisory control and data acquisition (SCADA) systems and analyzed using machine learning methods to build a condition monitoring system | Successful | Able to develop individual prediction using historical data samples | Needs high computational power
Bidirectional gated recurrent unit (BiGRU), CNN [128] | CNN and BiGRU methods are used on data acquired from SCADA systems for condition monitoring | Successful | Quick response while using very little memory | Slight inaccuracies due to quick processing of data
Deep convolutional generative adversarial networks (DCGAN) [129] | A health condition monitoring (HCM) system for wind turbines using DCGAN | Successful | Generates high quality artificial data which further enhances the training sequence | Requires a large quantity of data and is also difficult to train
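
Since conventional forecasts often do not disclose their degree of uncertainty, a minimal sketch of a short-time forecast with an attached error band may help to fix ideas. It uses a seasonal naive forecast (the same period of the previous day) and a crude band taken from the largest past residual. This baseline is not one of the deep learning methods in Tables 8 to 11; the load values, the season length of four periods, the band rule, and the function names are illustrative assumptions.

```python
def seasonal_naive_forecast(history, season_length):
    """Forecast the next season by repeating the most recent season."""
    return history[-season_length:]


def seasonal_residuals(history, season_length):
    """Errors the seasonal naive rule would have made on the past data."""
    return [history[i] - history[i - season_length]
            for i in range(season_length, len(history))]


def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs(a - f) / abs(a)
                       for a, f in zip(actual, forecast)) / len(actual)


# Assumed load history (MW): three days with four periods per day.
history = [10.0, 20.0, 30.0, 20.0,
           12.0, 22.0, 28.0, 20.0,
           11.0, 21.0, 29.0, 21.0]

forecast = seasonal_naive_forecast(history, season_length=4)
residuals = seasonal_residuals(history, season_length=4)
band = max(abs(r) for r in residuals)  # crude, conservative error band
intervals = [(f - band, f + band) for f in forecast]

print(forecast)
print(intervals)
print(mape([10.0, 20.0], [11.0, 18.0]))
```

Simple baselines of this kind are commonly used as a reference point, so that a data-driven model is only credited with the improvement it achieves beyond them.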

Load forecasting techniques (table rows; techniques [reference]: objective)
[techniques missing]: Similar day identification and selection based on reinforcement learning on BPNN
RBN, MI-ANN, Genetic Wind Driven Optimization (GWDO) [173]: Load forecasting for linear and non-linear power systems
Singular Spectrum Analysis (SSA), Fuzzy ARTMAP, Neuro-fuzzy, BP [174]: Reducing the cost of computational energy and data requirements
Ensemble Empirical Mode Decomposition (EEMD), Multivariable Linear Regression (MLR) [175]: Analyzing large datasets for electric load
Kalman Filtering, Clustering techniques, Weightless Neural Network (WNN) [176]: Use of different clustering techniques to cluster load forecasting data
ELM, Genetic Algorithm, Support Vector Machine, XGBoost, Decision Tree [177]: Tuning hyperparameters and extracting features for load forecasting
2019
BP, LSTM, CNN [178]: Using LSTM and CNN for coupling electric load
XGBoost, Decision Tree, Support Vector Regression (SVR), BP, CNN [179]: Predicting load forecasting within the price of electricity
WaveNet, CNN, BP, LSTM [180]: Improving performance of different error detection of load forecasting
LSTM, Ensemble learning, Quantile forecasting, Quantile method, ENN, Parallel computing [181]: Diminishing the need for feature extraction in load forecasting
[techniques missing]: Error reduction for unsupervised load forecasting
Dropout technique, Fuzzy logic, CNN [183]: Feature extraction improvement with high accuracy and the over-fitting issue resolved
2018
SVR, Auto Encoders, Denoising Autoencoders [161]: Achieving high features of load forecasting from lower-level datasets

Precise fraud characterization
Electricity utilities must deal with non-technical losses incurred through fraud and theft committed by their customers or third parties. Certain approaches have been developed to detect potential scammers among consumers and third-party interference, as listed in Table 12. Many data analysis-based approaches are taken toward detecting and diminishing fraud. Table 12 shows that fraud characterization may become more accurate and convenient by enabling the data-driven SG model. This can create a reliable security layer for diminishing and characterizing fraud, which may itself accelerate the sustainable energy evolution process.
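As a concrete illustration of data-driven fraud characterization, the sketch below flags meters whose recent consumption has dropped sharply relative to their own historical baseline. It is a deliberately simple rule and is not one of the techniques listed in Table 12; the function name, the 30-day window, and the 50% drop threshold are hypothetical choices. A flagged meter is only a candidate for inspection, since a drop in consumption can have legitimate causes.

```python
import statistics


def flag_suspicious_meters(readings, recent_days=30, drop_threshold=0.5):
    """Return sorted IDs of meters whose recent mean daily consumption fell
    below drop_threshold times their earlier (baseline) mean.

    readings maps a meter ID to a list of daily consumption values (kWh),
    ordered from oldest to newest. All parameter defaults are assumptions.
    """
    flagged = []
    for meter_id, series in readings.items():
        if len(series) <= recent_days:
            continue  # not enough history to form a baseline
        baseline = statistics.fmean(series[:-recent_days])
        recent = statistics.fmean(series[-recent_days:])
        if baseline > 0 and recent < drop_threshold * baseline:
            flagged.append(meter_id)
    return sorted(flagged)


if __name__ == "__main__":
    demo = {
        "A": [10.0] * 90,                # stable consumption
        "B": [10.0] * 60 + [3.0] * 30,   # sudden sustained drop
        "C": [10.0] * 10,                # too little history to judge
    }
    print(flag_suspicious_meters(demo))
```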

Safe energy trading (blockchain)
The highest priorities of every system are security, privacy, and trust. In the same vein, the upcoming SG should have a good level of security, including: (1) ensuring that an unauthorized third party cannot acquire any information; (2) ensuring the use of established cryptographic techniques; (3) preventing information changes by unauthorized entities; (4) denying access without permission; and (5) ensuring authorized access to those with rights and privileges. Reference [229] presents a revolutionary consensus technique that makes Bitcoin the most popular application of blockchain to date, resolving the issue of creating trust in a distributed system. Additional approaches are also being used, including cryptographically secured data structures, digital signatures, time stamps, and incentive schemes. The majority of current solutions are based on centralized models. To decentralize energy trading, blockchain technology has emerged and has been used successfully to trade energy among consumers, prosumers, and suppliers. Although these technologies are mature and functioning properly, the existing blockchain-enabled SG system has a number of problems, including consumer priority, security, and time consumption. Table 13 lists the blockchain-based techniques and algorithms for safe energy trading. It is concluded that implementing DDTs in an NGSG can drive the world toward sustainable energy evolution.
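The role of cryptographically secured data structures in preventing unauthorized changes can be illustrated with a minimal hash-chained ledger of energy trades. This sketch covers only the tamper-evidence property; it has no consensus protocol, digital signatures, or networking, and it is not any of the systems in Table 13. The record fields (`seller`, `buyer`, `kwh`) and function names are hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder predecessor for the first block


def block_hash(index, prev_hash, trade):
    """SHA-256 digest over a block's position, predecessor hash, and trade."""
    payload = json.dumps(
        {"index": index, "prev_hash": prev_hash, "trade": trade},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_trade(chain, trade):
    """Append a trade record, linking it to the hash of the previous block."""
    index = len(chain)
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    chain.append({
        "index": index,
        "prev_hash": prev_hash,
        "trade": trade,
        "hash": block_hash(index, prev_hash, trade),
    })
    return chain


def is_chain_valid(chain):
    """Recompute every hash and link; any edited record breaks the chain."""
    prev_hash = GENESIS_HASH
    for index, block in enumerate(chain):
        if block["index"] != index or block["prev_hash"] != prev_hash:
            return False
        if block["hash"] != block_hash(index, prev_hash, block["trade"]):
            return False
        prev_hash = block["hash"]
    return True


if __name__ == "__main__":
    ledger = []
    append_trade(ledger, {"seller": "prosumer-1", "buyer": "consumer-7", "kwh": 5})
    append_trade(ledger, {"seller": "prosumer-2", "buyer": "consumer-7", "kwh": 3})
    print("valid before tampering:", is_chain_valid(ledger))
    ledger[0]["trade"]["kwh"] = 50  # simulate an unauthorized change
    print("valid after tampering:", is_chain_valid(ledger))
```

Because each block's hash covers the previous block's hash, altering any earlier trade invalidates that block and every later link, which is what makes unauthorized changes detectable.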

Load forecasting techniques (table rows; techniques [reference]: objective)
[techniques missing]: Over-fitting issue reduction and computational time reduction using CNN, KPCA, MI, etc.
LSTM, Bayesian deep learning, Bayesian Theory [82]: Probabilistic residential load forecasting for PV systems
2019
BPNN, Bayesian Regularization, Levenberg-Marquardt algorithm [190]: Load forecasting for individual district buildings
DBN, BP, Phase Space Reconstruction (PSR), Levenberg-Marquardt algorithm [191]: Predicting load forecasting of bus-load forecasting and distributed energy penetration
KNN-ANN, FFNN, Euclidean theory [192]: Load forecasting for hydro-thermal unit generation combining ANN and KNN
2018
Neuro-fuzzy, ANFIS, Genetic algorithm, Particle Swarm Optimization [193]: Decreasing execution or training time as well as reducing feature selection complexity

Load forecasting techniques (table rows, continued)
[techniques missing] [194]: Hourly energy demand prediction of a municipality; (1) over-fitting issue, (2) systems precision, (3) huge calculation time
SARIMA (seasonal auto-regressive integrated moving average) and ES (Exponential Smoothing) [195]: Predicts yearly consumption of electricity for the agriculture sector
ISSA-SVM (improved sparrow search algorithm-Support Vector Machine) [196]: Error index of load forecasting is kept optimal, which results in better prediction accuracy
2021
LSTM network [197]: Load forecasting with minimal error for industrial power consumption
Support Vector Regression (SVR) [198]: Mean absolute percentage error (MAPE) and root mean square error (RMSE) are kept to a minimum
2020
BPNN, Singular Spectrum Analysis (SSA), Weightless Neural Network (WNN), Cuckoo Search algorithm [199]: Surveying load forecasting for wavelet disintegration to learn about the reduction of the stochastic part
Grasshopper Optimization Algorithm, BP, Regressive Model [200]: Daily and hourly continuous load forecasting
Load Range Discretization (LRD), CNN, BP [201]: Probability distribution generation for load forecasting
Mutual Information-ANN, Jaya algorithm [202]: Removes feature selection redundancy
[techniques missing]: Mid-term load forecasting in terms of green environment and peak load
BPNN [210]: Identification of max power load at photovoltaic power generation and power capacity

Future research directions
The research conducted on DDTs, and its results for various aspects of the SG, highlights the significance of these methods for achieving sustainability across an NGSG as a whole. Reliable control operations powered by data-driven technologies may cover all the control problems of a future SG. The management models used in an SG can be improved by increasing the computational capability to analyze large datasets simultaneously. This improvement can also ensure even lower carbon emissions and energy consumption, ultimately aiding the goal of sustainability. Next-generation blockchain-enabled trading can eradicate the chance of energy theft by keeping decentralized records of all the simultaneous energy transactions happening in a certain time frame. Further, advances in data-enabled asset monitoring can ensure a robust energy grid by eliminating the chance of component failure, improving NGSG integrity, and prolonging its lifetime. However, the development of a techno-economic model for a data-driven NGSG system in terms of operational cost, time consumption, manufacturing cost, and computational efficiency imposes additional challenges, which open the following research directions for further improvement.

Increased robustness in techniques
Future SGs based on multiple renewable energy sources will need to depend on data techniques that satisfy multidisciplinary constraints, as the system complexity is increasing gradually with changing requirements. Failure to satisfy any of the requirements of an NGSG may result in disruption of power generation and transmission, increased operational cost, damaged components, and long blackouts.

Enhanced data preprocessing and handling efficiency
Various circumstances, such as climate change, taxes, regulation, and economic growth, can affect the supply and demand requirements for energy in the future. As a result, the data acquired from an SG may differ from previously acquired data, and data techniques trained on historical datasets may be unable to generate accurate results. The newly collected data can contain information unknown to the algorithm, which may be analyzed with higher efficiency after advanced preprocessing. This will increase the demand for better and quicker preprocessing techniques.
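One simple way to notice that newly acquired data no longer resemble the historical training data is to measure how far the recent mean has moved, in units of the historical standard deviation. The sketch below is a minimal illustration under that assumption; the function names and the threshold of 2.0 are hypothetical, and practical drift detection would normally examine more than the mean.

```python
import statistics


def drift_score(historical, recent):
    """Absolute shift of the recent mean, in historical standard deviations."""
    mu = statistics.fmean(historical)
    sigma = statistics.pstdev(historical)
    if sigma == 0:
        raise ValueError("historical data have zero variance")
    return abs(statistics.fmean(recent) - mu) / sigma


def needs_retraining(historical, recent, threshold=2.0):
    """Flag a model for retraining when the drift score exceeds the threshold.

    The default threshold is an assumption chosen for illustration.
    """
    return drift_score(historical, recent) > threshold


if __name__ == "__main__":
    past_load = [98.0, 100.0, 102.0, 100.0] * 25   # hypothetical historical load
    similar = [99.0, 101.0] * 10                   # same regime as before
    shifted = [110.0, 112.0] * 10                  # demand level has changed
    print("similar data:", needs_retraining(past_load, similar))
    print("shifted data:", needs_retraining(past_load, shifted))
```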

Conclusion
With the advance of technologies, the need for a sustainable and green environment is increasing. As well as increasing the amount of intermittent renewable generation, a data-driven technology may boost the capacity of clean energy sources, like solar, wind, and photovoltaic systems. An NGSG promotes energy-efficient power systems and improves the effectiveness of power consumption and energy sustainability. In this paper, the conceptual data-driven NGSG framework for sustainable energy evolution is discussed. The main findings of this paper can be summarized as:
• A comparative study of the conventional SG and the NGSG is carried out in terms of their operation and technology. The critical steps to build the data-driven NGSG are also demonstrated and briefly discussed.
• All the intelligent features of a data-driven NGSG are reported and discussed to identify the scope of DDTs.
• Several challenges in initiating the implementation of DDTs are explored and addressed for the growth towards sustainable evolution in an NGSG.
• Advanced DDTs in the conventional SG for management, condition monitoring, fault prediction, advanced forecasting, and precise fraud characterization are summarized. These lead to the purpose of using DDTs in an NGSG.
In conclusion, it can be seen that a variety of challenging problems in NGSGs, problems which resist even the most determined efforts of conventional mechanism-based solutions, are successfully resolved by data-driven techniques. These techniques improve NGSG security, increase effectiveness, and reduce dependency on labor- and knowledge-intensive human tasks.

Load forecasting techniques (table rows; techniques [reference]: objective)
[techniques missing]: Over-fitting issue reduction and computational time reduction using CNN, KPCA, MI, etc.
2019
Parallel deep learning, DC-DC converter [207]: Ensuring control of hybrid energy storing system in a distributed system using parallel deep learning
2018
Neuro-fuzzy, ANFIS, BPNN, Levenberg-Marquardt algorithm [217]: Effectively predicting long term load forecasting using ANN
BPNN [210]: Identification of max power load at photovoltaic power generation and power capacity

Blockchain-based safe energy trading techniques (table rows)
[…] The data are immutable throughout the whole process
P2P power trading [239]. Techniques: smart contract, Redundant Byzantine Fault Tolerance (RBFT). Description: An industrial control architecture based on blockchain to guarantee effective data, an ICS BlockOpS system consistency. Blockchain type: permissioned blockchain. Advantage: a robust system is used for ensuring increased security. Limitation: efficiency falls in the case of a closed-loop system.

Fig. 3 A conceptual framework for next-generation smart grid energy system

Fig. 5 Critical steps to develop a data-driven next-generation smart grid

Data-driven technique descriptions (table rows)
[…] learning representations from data, takes into consideration the knowledge about the data's structure and generates strong representations; nonetheless, the robustness of the GCN depends on the caliber of the feature matrix and the original graph. Application: electric parameter identification.
CNN [69]: CNN is beneficial when the reduction of parameters is necessary in ANN. An important aspect of CNN is that the problems may not have spatially dependent features. Application: electricity theft detection.
Decision Tree [70]: Decision tree is a tree-structured algorithm that is useful for classification and regression. A decision tree consists of three parts: internal nodes, branches, and leaves. The dataset attributes are represented in nodes. Small […]
[…] for classification as well as regression problems. SVM is popular in the sectors of data mining, machine learning, and pattern recognition because of its remarkable generalization ability. Application: stealthy false data injection detection.
Semi-supervised learning
Graph Neural Network (GNN) [76]: GNN has been proposed as a new deep learning model to learn non-Euclidean material. Application: false data injection attack detection.
Q-Learning [77]: Updates are made via bootstrapping in the off-policy algorithm known as Q-learning. Application: vulnerability analysis.
Particle Swarm Optimization [78]: The Particle Swarm Optimization technique is easy to implement and use, adaptable, and has a small number of controlling parameters (cognitive ratio, inertia weight, and social ratio).
[…] HMMs to connect chains of observations with an inherent Markov process, whose unseen states serve as the focus of inference, explains their widespread use. Because HMMs can handle discontinuous time series, such as hourly data, they are particularly well suited for describing and forecasting failures. Application: islanding prediction.
K-means clustering [81]: K-means clustering is the most basic, widely used, and computationally efficient clustering technique. This method has been heavily applied in a variety of fields, including the categorization of documents, ride data analysis, in-depth call record analysis, customer classification, criminal network analysis, and others. Application: privacy preserving.

Table 2
Comparison between the current study and related existing literature

Fig. 2 Conventional smart grid architecture

Ahsan et al. Protection and Control of Modern Power Systems (2023) 8:43

Table 3
Current smart grid projects in different countries

Table 4
Secure control operation techniques

Table 5
Energy management techniques

Table 6
Asset condition monitoring techniques

Table 7
Fault prediction and characterization techniques

Table 12
Fraud characterization techniques

Table 13
Blockchain-based safe energy trading techniques