Data science deep dive: Moving beyond R-squared to p-value for better energy analysis

2021.12.17

Bookkeeping

NADECICA Editorial Team

    When adding more variables to a model, you need to think about the cause-and-effect assumptions that implicitly go with them, and you should also look at how their addition changes the estimated coefficients of other variables. And do the residual statistics and plots indicate that the model’s assumptions are OK? If they aren’t, then you shouldn’t be obsessing over small improvements in R-squared anyway.

    With an R-squared of about 90%, the standard deviation of the regression model’s errors is about 1/3 the size of the standard deviation of the errors that you would get with a constant-only model. That’s very good, but it doesn’t sound quite as impressive as “NINETY PERCENT EXPLAINED!” So, the next time you run a regression analysis on energy data, calculate its CV(RMSE) to understand the model’s predictive accuracy.

    Arthur Berman (aeberman) Arthur E. Berman is a petroleum geologist with 35 years of oil and gas industry experience. He worked 20 years for Amoco (now BP) and 15 years as a consulting geologist. He gives keynote addresses for energy conferences, boards of directors and professional societies. He has been interviewed about oil and gas topics on CBS, CNBC, CNN, Platt’s overhead business Energy Week, BNN, Bloomberg, Platt’s, Financial Times, The Wall Street Journal, Rolling Stone and The New York Times. These pages have hosted over 7,500 articles covering every aspect of the global energy system. It was not unusual for a post to attract over 600 comments, many of which were well informed and contained charts and links to other internet sources.

    The Blog

    However, as we saw, R-squared doesn’t tell us the entire story. You should evaluate R-squared values in conjunction with residual plots, other model statistics, and subject area knowledge in order to round out the picture (pardon the pun). An increase in R-squared from 75% to 80% would reduce the error standard deviation by about 10% in relative terms. That begins to rise to the level of a perceptible reduction in the widths of confidence intervals. But don’t forget, confidence intervals are realistic guides to the accuracy of predictions only if the model’s assumptions are correct.
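    The arithmetic behind these percentages can be checked with a short sketch. It assumes the usual approximation that the error standard deviation scales with the square root of (1 − R-squared), ignoring degrees-of-freedom adjustments; the function names are illustrative, not taken from any library.

```python
import math

def error_sd_ratio(r_squared: float) -> float:
    """Approximate ratio of the model's error SD to that of a constant-only model.

    Assumption: degrees-of-freedom adjustments are ignored.
    """
    if not 0.0 <= r_squared <= 1.0:
        raise ValueError("r_squared must be between 0 and 1")
    return math.sqrt(1.0 - r_squared)

def relative_sd_reduction(r2_old: float, r2_new: float) -> float:
    """Fractional drop in error SD when R-squared rises from r2_old to r2_new.

    r2_old must be below 1, otherwise there is no error left to reduce.
    """
    return 1.0 - error_sd_ratio(r2_new) / error_sd_ratio(r2_old)

print(round(error_sd_ratio(0.90), 3))               # 0.316, roughly 1/3
print(round(relative_sd_reduction(0.75, 0.80), 3))  # 0.106, roughly 10%
```

    Seen this way, a five-point gain in R-squared buys only a modest tightening of the prediction error.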

    As per ASHRAE Guideline 14, a CV(RMSE) of 25% or below indicates a good model fit with acceptable predictive capabilities. For the dataset given above, the CV(RMSE) was found to be 6%, implying that the model is reliably predictive. Again, quantify the “errors” of this model by measuring the vertical distance of each data value from the regression line and squaring it. Investors use the R-squared measurement to compare a portfolio’s performance with the broader market and predict trends that might occur in the future. For instance, let’s assume that an investor wants to purchase an investment fund that is strongly correlated with the S&P 500.
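    As a minimal sketch, CV(RMSE) can be computed as the root-mean-square error divided by the mean of the observed values. The version below assumes the common convention of dividing the squared errors by n − p, where p is the number of fitted parameters; the data are hypothetical and are not the dataset discussed in this article.

```python
import math

def cv_rmse(observed, predicted, n_params=2):
    """CV(RMSE) as a fraction: RMSE divided by the mean of the observed values.

    n_params is the number of fitted model parameters (2 for a straight line:
    intercept and slope). Using n - n_params in the denominator is an
    assumed convention, not something stated in the article.
    """
    n = len(observed)
    if n != len(predicted) or n <= n_params:
        raise ValueError("need matching series with more points than parameters")
    sse = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))
    rmse = math.sqrt(sse / (n - n_params))
    return rmse / (sum(observed) / n)

# Hypothetical energy use (kWh) and model predictions, for illustration only.
observed = [100.0, 120.0, 140.0, 160.0, 180.0, 200.0]
predicted = [104.0, 118.0, 137.0, 163.0, 178.0, 200.0]
print(f"CV(RMSE) = {cv_rmse(observed, predicted):.1%}")
```

    The result can then be compared against whatever threshold applies to the analysis, such as the 25% figure quoted above.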

    • A high R-squared does not necessarily indicate that the model has a good fit.
    • Founded in 1998, kW Engineering delivers well-engineered, energy efficiency solutions to lower operating costs, optimize building operations and achieve carbon reduction goals in commercial, industrial and agricultural facilities.
    • The decisions that depend on the analysis could have either narrow or wide margins for prediction error, and the stakes could be small or large.
    • In 2009 I was appointed as Honorary Research Fellow at The University of Aberdeen and teach occasional courses there.

    Once you have a list of squared errors, you can add them up and run them through the R-squared formula. Jérôme à Paris is an investment banker in Paris, specialised in structured finance for energy projects, in particular in the wind power sector. He is the editor of the European Tribune, a community website on European politics and energy issues. He has written extensively about energy issues, usually from an economic or geopolitical angle for the European Tribune and for DailyKos where he led a collective effort to draft an energy policy for the USA, Energize America. In the case of our dataset, the null hypothesis states that outside the sample, i.e. in the population, there is no relationship between OAT and metered energy use. The problem with both of these questions is that it is just a bit silly to work out if a model is good or not based on the value of the R-Squared statistic.
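    The step from a list of errors to R-squared can be sketched as follows. This is an illustrative implementation of the standard definition (one minus the ratio of the residual sum of squares to the total sum of squares), fitted with ordinary least squares on a small made-up dataset; the helper names are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def r_squared(observed, predicted):
    """1 - SS_res / SS_tot, where the squared errors are vertical distances."""
    mean_y = sum(observed) / len(observed)
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R-squared is undefined")
    return 1.0 - ss_res / ss_tot

# Made-up data for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
intercept, slope = fit_line(x, y)
predicted = [intercept + slope * xi for xi in x]
print(round(r_squared(y, predicted), 3))  # 0.6
```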

    Road Trip – Thoughts on the Satsop Nuclear Power Station

    Sometimes people take point 1 a bit further, and suggest that R-Squared is always bad. Or, that it is bad for special types of models (e.g., don’t use R-Squared for non-linear models). There are quite a few caveats, but as a general statistic for summarizing the strength of a relationship, R-Squared is awesome. All else being equal, a model that explains 95% of the variance is likely to be a whole lot better than one that explains 5% of the variance, and will likely produce much, much better predictions. The regression model on the left accounts for 38.0% of the variance while the one on the right accounts for 87.4%. The more variance that is accounted for by the regression model, the closer the data points will fall to the fitted regression line.

    Analysis and Interpretation

    But why would we regard the right-hand model as worse than the left? If we were to use either model to predict expected consumption, the absolute error in the estimates would be the same. The p-value is the probability of observing data at least as favorable to the alternative hypothesis as the sample dataset, if the null hypothesis were true. At its core, the p-value attempts to clarify whether the correlation between the variables as seen in the sample is purely due to chance, or if an actual relationship exists. When interpreting the R-Squared it is almost always a good idea to plot the data. That is, create a plot of the observed data and the predicted values of the data.
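    One way to make this definition concrete is a permutation test, sketched below. Note that this is a swapped-in technique for illustration: regression summaries normally report a p-value from a t-test on the slope, whereas here the null hypothesis of no relationship is simulated directly by shuffling the energy values. The OAT and energy figures are hypothetical.

```python
import random

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def permutation_p_value(x, y, n_permutations=5000, seed=0):
    """Share of shuffled datasets whose |r| is at least as large as the observed |r|."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break any real link between x and y
        if abs(pearson_r(x, shuffled)) >= observed - 1e-12:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)

# Hypothetical outside air temperature (deg C) and metered energy use (kWh).
oat = [5, 8, 12, 15, 18, 22, 25, 28, 30, 33]
energy = [310, 295, 270, 262, 240, 228, 205, 199, 180, 171]
p = permutation_p_value(oat, energy)
print(f"estimated p-value: {p:.4f}")
```

    A small value means that shuffled data almost never show a correlation as strong as the one in the sample, which is evidence against the null hypothesis.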

    Use R-Squared to work out overall fit

    It’s always a good idea to evaluate your data using a variety of statistics. Then interpret the composite results based on the context and objectives of your specific application. If you understand how a statistic is actually calculated, you’ll better understand its strengths and limitations. Another common misconception is that a low value of R-squared in the case of heating fuel signifies poor control of the building. Suppose that a building’s fuel consumption is being monitored against locally measured degree days. Now suppose that the local weather monitoring fails and you switch to using published degree-day figures from a meteorological station 35 km away.

    Going by the popular rule of thumb that an R-squared value should be at least 0.75, one would deem this model ‘bad’ and rush to discard its summary output. But before we do, let’s pause and divert our attention to the p-value (highlighted in red above). Let’s use our understanding from the previous sections to walk through an example.

    REVIEW: R-SQUARED ENERGY BLOG REVIEWS GUSHER OF LIES

    His interest in the limitations to oil supply dates back to about 1962, when he was at school watching a promotional film from an oil company. The subject of the film was oil exploration, and this caused him to wonder about the dependence of our society on oil and the limits to supply. Other interests are canoeing, kayaking, skiing, hiking, camping, keeping planted aquaria and learning Mandarin Chinese.

    If the stakes are high and an error would be costly, the p-value threshold may be lowered to 0.01 (1% error rate), or even to 0.001 (0.1% error rate). The p-value ranges between 0 and 1: a value near 0 means the sample would be very unlikely if there were no relationship, while a value near 1 means the sample is entirely consistent with no relationship.

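    We accept no responsibility whatsoever for the accuracy, completeness, usefulness, fitness for a particular purpose, or any other aspect of the information in this article (including personal impressions), or for any judgments users make based on that information. Decisions about actions taken using the information in this article are the user’s own responsibility, and we recommend consulting a specialist yourself as needed.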
