Integrate Your Expertise with the Data

Your data science model is only as good as the data you use.  You can get better, cleaner data by using DataSignals to streamline everything from data corrections to standardizing over reporting intervals.  But what about the data not published by the ISO?  And how does your own insight and knowledge fit into data science models?  This is where feature engineering comes in!

Feature engineering is the process of creating additional variables in your analysis based on your expertise and knowledge of the energy markets.  A quick example: many of the ISOs don’t publish net load, but net load is often just as important, if not more important, than each of the variables that make it up (actual load and renewable generation).  Other important user-driven features you can create include lagged variables, leading variables, statistical summaries, binned/categorical features, normalization, and log transformations.
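As a quick sketch of what these transformations look like in practice (shown here in Python with pandas; the same ideas carry over to R), with made-up values and illustrative column names, not actual ISO field names:

```python
import numpy as np
import pandas as pd

# Hypothetical hourly series; column names are illustrative, not actual ISO fields.
df = pd.DataFrame({
    "load_mw": [42000, 45000, 51000, 56000, 54000, 48000],
    "wind_mw": [9000, 8000, 6000, 4000, 5000, 7000],
    "solar_mw": [0, 1500, 4000, 5000, 3000, 0],
})

# Net load: actual load minus renewable generation (often not published by the ISO).
df["net_load_mw"] = df["load_mw"] - df["wind_mw"] - df["solar_mw"]

# Lagged and leading variables.
df["load_lag_1"] = df["load_mw"].shift(1)
df["load_lead_1"] = df["load_mw"].shift(-1)

# Rolling statistical summary.
df["load_roll_mean_3"] = df["load_mw"].rolling(3).mean()

# Log transform, useful for right-skewed series.
df["log_load"] = np.log(df["load_mw"])
```

Note that the shift- and rolling-based features leave NaNs at the edges of the series, which most modeling libraries expect you to drop or impute before fitting.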

Here is an ERCOT-specific application of feature engineering to improve our understanding of the ERCOT price adder.  We started our analysis by bringing in three series from the API: 

  • Load

  • Online capacity

  • Adder

We can run some quick statistical analyses to summarize our data and build very basic histograms to learn about the trends in our data. 
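A minimal sketch of this step, assuming the three series have already been pulled into one DataFrame (all values here are made up for illustration):

```python
import numpy as np
import pandas as pd

# Made-up values standing in for the three API series.
df = pd.DataFrame({
    "load_mw": [48000, 52000, 56000, 61000, 58000, 50000],
    "online_capacity_mw": [19000, 17000, 14000, 9000, 12000, 18000],
    "adder_usd": [0.0, 0.0, 15.0, 310.0, 40.0, 0.0],
})

# Quick statistical summary: count, mean, std, min, quartiles, max per series.
summary = df.describe()

# Basic histogram counts; most of the adder mass sits in the lowest bin (right skew).
counts, bin_edges = np.histogram(df["adder_usd"], bins=5)
```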

We can see similar patterns in online capacity and load, and it is also noticeable how right-skewed the adder appears. But at this point, the data isn’t actionable. We still don’t know how changes in data drivers, like load or online capacity, impact whether or not the adder appears in a given market interval. To start to understand this, we can look for a simple linear correlation between two of our items, load and online capacity, and then shade by the adder.
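The linear correlation itself can be checked directly; a sketch with illustrative values (the shading by adder is a charting detail omitted here):

```python
import pandas as pd

# Illustrative values for the two drivers.
df = pd.DataFrame({
    "load_mw": [48000, 52000, 56000, 61000, 58000, 50000],
    "online_capacity_mw": [19000, 17000, 14000, 9000, 12000, 18000],
})

# Pearson correlation; a value near -1 means a strong negative linear relationship.
r = df["load_mw"].corr(df["online_capacity_mw"])
```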

What can we learn?

  • The relationship between load and online capacity exists

  • It is a negative correlation, though perhaps not as strong as we initially suspected based on the histograms alone

  • There is a lot of noise when load is lower and capacity is higher

  • The adder appears to hit at some level where load is high and the online capacity is low 

We still need to dig further into this data to understand when we should expect adders on the ERCOT LMP.  For that, let’s start by establishing a new feature, or variable: specifically, a boolean indicator for whether or not the adder bound.  This will allow us to set thresholds for alerts and to identify early binding intervals.
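A sketch of that boolean feature, assuming an adder column in $/MWh (the column names are hypothetical):

```python
import pandas as pd

# Hypothetical adder series ($/MWh); the column name is illustrative.
df = pd.DataFrame({"adder_usd": [0.0, 0.0, 15.0, 310.0, 40.0, 0.0]})

# New feature: a boolean indicator for whether the adder bound in each interval.
df["adder_bound"] = df["adder_usd"] > 0
```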

Now we have a strong starting point for analyzing when the adder will hit:

  • Load needs to be above 55 GW, and

  • Online capacity needs to be below 15 GW.  
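Those two thresholds translate directly into a combined flag; a sketch with hypothetical column names:

```python
import pandas as pd

# Hypothetical interval data (column names illustrative).
df = pd.DataFrame({
    "load_mw": [48000, 56000, 61000],
    "online_capacity_mw": [19000, 14000, 9000],
})

# Flag intervals meeting both observed binding conditions:
# load above 55 GW AND online capacity below 15 GW.
df["likely_adder"] = (df["load_mw"] > 55000) & (df["online_capacity_mw"] < 15000)
```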

There are some areas where load could be lower and the adder is even more likely. In this case, we want to treat load as a category so we can drill into this further. For that, let’s create one more feature, load category.  This will have five values:

  • Extreme-valley

  • Valley

  • Mid

  • Peak

  • Super-peak
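A sketch of the load-category feature using `pd.cut`; the breakpoints below are purely illustrative and would be tuned to the market being studied:

```python
import pandas as pd

df = pd.DataFrame({"load_mw": [29000, 38000, 47000, 56000, 66000]})

# Bin load into the five categories; these breakpoints are purely
# illustrative and would be tuned to the market being studied.
df["load_category"] = pd.cut(
    df["load_mw"],
    bins=[0, 33000, 42000, 52000, 62000, float("inf")],
    labels=["extreme-valley", "valley", "mid", "peak", "super-peak"],
)
```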

Creating this as a facet, or additional partition, on our data also allows us to easily display other data drivers.  Let’s bring in the ERCOT HASL data now. In this example, we’ve updated our results to display HASL on the y-axis.
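Once the category exists, partitioning by it is a one-liner; a sketch with made-up values, where `hasl_mw` stands in for the ERCOT HASL series:

```python
import pandas as pd

# Made-up values; hasl_mw stands in for the ERCOT HASL series.
df = pd.DataFrame({
    "load_category": ["valley", "valley", "peak", "peak", "super-peak"],
    "hasl_mw": [68000, 70000, 64000, 62000, 60000],
    "adder_usd": [0.0, 0.0, 20.0, 150.0, 400.0],
})

# Partition by the categorical feature; each group corresponds to one facet,
# letting us compare HASL and the adder within each load regime.
by_cat = df.groupby("load_category")[["hasl_mw", "adder_usd"]].mean()
```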

Now we can further break down our data to understand the relationship between HASL, load, and capacity on the adder. Based on just these charts, we can start to set thresholds for multi-conditional alerts in QuickSignals. We can even take this a step further by building machine learning models on our data, but that’s for another blog. This is the first in a series of data science blog posts, so please subscribe to our blog to get those sent to your inbox!

If you’re a Yes Energy customer and would like to receive code samples (in R) to try this on your own, click here and we’ll send you the code. Unfortunately, the code will not work if you aren’t a Yes Energy customer, BUT if you’re interested in learning more, we’d love to chat! Fill out this form and we’ll be in touch!

Feeding Hungry Data Models: How to Ensure Accurate & Consistent Data for Your Data Models

Traders and Generation Managers rely on accurate generation & transmission flow data to make decisions. With the increasing use of, and reliance on, automation & modeling, it is ever more important to ensure the data is clean, accurate & consistent. Here is a quick overview of how our users can rely on & utilize Live Power generation & transmission data to make decisions and feed their models.

STEP 1: START WITH ACCURATE DATA

Live Power, Yes Energy’s data partner, monitors, calibrates and delivers real-time generation & transmission flow data at 60-second intervals. This data is closely observed and calibrated against other generation data sources. This attention to accuracy means that their real-time data closely matches sources like SCED & CEMS data (see video 1).

Video 1. Time series chart showing Live Power generation along with SCED & CEMS data for the same power plant, Rio Nogales in ERCOT.

STEP 2: ENSURE A CONSISTENT FLOW OF DATA

In addition to the data Yes Energy collects & maintains, we integrate data from our partners, helping traders see the full picture. Real-time generation & transmission data is only available in Yes Energy products through a subscription to Live Power data, and Planned Generation Outage intelligence is provided through a subscription to IIR Energy. SCED & CEMS generation data is available as part of Yes Energy tools, but both have a 45-60 day reporting lag. Blending the real-time & historical generation data sources provides multiple benefits:

 

  • Provides users with a consistent flow of data on which to base their analysis - even if SCED or CEMS reports aren’t yet published (see video 2).

 

Video 2. SCED & CEMS data have nothing published past October 5th. Live Power real-time generation data (blue) continues and indicates that this plant has come offline. IIR Energy (black line) has reported an outage for this plant. The real-time generation confirms that the plant does indeed go offline - and actually turns off a bit early and comes back on a bit late. Real-time LMP for HB_HOUSTON shows some volatility during this time, and generation appears to be a fundamental driver of the market. Access to the Live Power real-time generation data & IIR Energy’s plant status provides this insight.

 

  • Allows users to set a priority level for each of these data series - setting the order in which each data source is used (see figure 1).

Figure 1. Use the NVL function with the three data sources for generation data in ERCOT - SCED, CEMS and Live Power. This function creates a data series that displays the available data sources in order of priority, ensuring that when one data series has a null value, other sources fill in the gaps.

  • Provides a complete data set with which to feed models.
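The NVL priority idea in figure 1 maps naturally onto a coalesce in analysis code; a sketch using pandas `combine_first`, with made-up values (None marks intervals a source has not yet published):

```python
import pandas as pd

# Three generation sources in priority order; None marks intervals a source
# has not yet published (all values are made up).
sced = pd.Series([310.0, None, None])
cems = pd.Series([312.0, 305.0, None])
live_power = pd.Series([311.0, 306.0, 298.0])

# NVL-style coalesce: take SCED when available, then CEMS, then Live Power,
# so null values in one source are filled from the next in priority.
best_gen = sced.combine_first(cems).combine_first(live_power)
```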

STEP 3: FEED YOUR SOFTWARE WITH ACCURATE, CONSISTENT DATA FROM YES ENERGY

Video 3. Use the Data Table View in the Time Series Analysis module to export the data to Excel. This will pull all of your data series, including the blended data series, ensuring that your models are not fed null values.

Video 4. DataSignals users can build queries using these generation & transmission flow data series. To address null values, users can use Yes Energy’s “Best Generation Data Series,” create a coalesce statement in their modeling software, or configure the data so it is customized for each generator. Then use this latest & greatest generation & transmission data to feed your power flow modeling or any other downstream data model.

INTERESTED IN LEARNING MORE?

SIGN UP TO RECEIVE A GETTING STARTED GUIDE - INCLUDING SAMPLE API CALLS, HOW TO FIND CRITICAL DATA SETS & TIPS FOR USING POPULAR DATA SCIENCE PROGRAMMING LANGUAGES:
