Do not go softly into that good R²

Moody's Analytic Services

While ordinary least squares regression can be an unreasonably effective tool, it can pay dividends to dig deeper. The original chart shows a simple linear fit between AI subscription rates and private payroll changes, yielding a modest R² of 0.2111. However, running the diagnostic plots revealed a couple of hidden issues.

using CairoMakie
using DataFrames
using GLM
using OLSPlots
using StatsModels

# Linear fit: AI subscription rate (x) vs. private payroll change (y)
x = [6.0, 7.5, 9.5, 11.0, 11.5, 12.0, 12.5, 13.5, 14.0, 14.5, 15.0, 15.5, 16.0, 16.5, 17.5, 17.5, 18.5, 19.0, 19.5, 20.5, 21.0, 22.5, 23.0, 24.0, 26.5, 27.5, 34.0, 40.5, 42.0, 42.0, 45.0, 45.5, 46.5, 47.0, 48.0, 48.5]
y = [0.24, 0.18, 0.02, 0.13, 0.09, 0.15, 0.11, 0.06, 0.08, 0.09, 0.07, 0.10, 0.08, 0.05, 0.06, 0.11, 0.04, 0.03, -0.01, -0.02, 0.11, -0.01, 0.08, 0.16, -0.06, 0.03, 0.05, 0.01, -0.04, 0.07, 0.01, 0.05, -0.02, 0.05, 0.13, 0.05]
model = GLM.lm(@formula(y ~ x), DataFrame(x=x, y=y))
r2(model)                # 0.2111
diagnostic_plots(model)

[Figure: diagnostic plots for the linear model]

First, a single outlier (a firm/sector with very high AI adoption and unusually high payroll changes) was acting as a drag on the model's strength. Second, even after removing it, the "Residuals vs Fitted" plot showed a distinct, sharp U-shaped pattern. This meant the straight-line model was systematically missing the curve of the data—under-predicting at the extremes and over-predicting in the middle.
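The post identifies the outlier by eye from the plots; one way to confirm it numerically is Cook's distance, which GLM.jl exposes as `cooksdistance`. A minimal sketch on the same 36-point data, using the common 4/n rule-of-thumb cutoff (the cutoff choice is my assumption, not the author's):

```julia
using DataFrames
using GLM

x = [6.0, 7.5, 9.5, 11.0, 11.5, 12.0, 12.5, 13.5, 14.0, 14.5, 15.0, 15.5, 16.0,
     16.5, 17.5, 17.5, 18.5, 19.0, 19.5, 20.5, 21.0, 22.5, 23.0, 24.0, 26.5,
     27.5, 34.0, 40.5, 42.0, 42.0, 45.0, 45.5, 46.5, 47.0, 48.0, 48.5]
y = [0.24, 0.18, 0.02, 0.13, 0.09, 0.15, 0.11, 0.06, 0.08, 0.09, 0.07, 0.10,
     0.08, 0.05, 0.06, 0.11, 0.04, 0.03, -0.01, -0.02, 0.11, -0.01, 0.08, 0.16,
     -0.06, 0.03, 0.05, 0.01, -0.04, 0.07, 0.01, 0.05, -0.02, 0.05, 0.13, 0.05]

model = GLM.lm(@formula(y ~ x), DataFrame(x=x, y=y))

# Cook's distance measures how much deleting each point would shift the
# fitted coefficients; points above 4/n deserve a closer look.
d = cooksdistance(model)
threshold = 4 / length(x)
flagged = findall(>(threshold), d)
```

Point 35, the (48.0, 0.13) observation, combines high leverage (extreme x) with a large residual, which is exactly the combination Cook's distance is built to catch.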

To address this non-linearity, I first tried a log transformation on the x-axis. While this successfully bumped the R² to 0.3785, the stubborn U-shape in the residuals persisted. The real breakthrough came from fitting a quadratic polynomial model (adding an x² term) without the outlier.
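The log-transform attempt isn't shown in the original code, but it can be reproduced along these lines. The text doesn't say whether the outlier was already excluded at that step, so this sketch assumes the same 35-point data used for the quadratic fit below:

```julia
using DataFrames
using GLM

x = [6.0, 7.5, 9.5, 11.0, 11.5, 12.0, 12.5, 13.5, 14.0, 14.5, 15.0, 15.5, 16.0,
     16.5, 17.5, 17.5, 18.5, 19.0, 19.5, 20.5, 21.0, 22.5, 23.0, 24.0, 26.5,
     27.5, 34.0, 40.5, 42.0, 42.0, 45.0, 45.5, 46.5, 47.0, 48.5]
y = [0.24, 0.18, 0.02, 0.13, 0.09, 0.15, 0.11, 0.06, 0.08, 0.09, 0.07, 0.10,
     0.08, 0.05, 0.06, 0.11, 0.04, 0.03, -0.01, -0.02, 0.11, -0.01, 0.08, 0.16,
     -0.06, 0.03, 0.05, 0.01, -0.04, 0.07, 0.01, 0.05, 0.05]

# Regress y on log(x) so the fitted line can bend along the x-axis
df = DataFrame(x = x, logx = log.(x), y = y)
linmodel = lm(@formula(y ~ x), df)
logmodel = lm(@formula(y ~ logx), df)

r2(linmodel), r2(logmodel)  # the log model fits better, but the U-shape remains
```

A log transform only lets the curve bend one way, which is why it improves the fit without fixing the U-shaped residuals; a quadratic term can bend both ways.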

# Quadratic model, without point 35 (outlier: 48.0, 0.13)
x = [6.0, 7.5, 9.5, 11.0, 11.5, 12.0, 12.5, 13.5, 14.0, 14.5, 15.0, 15.5, 16.0, 16.5, 17.5, 17.5, 18.5, 19.0, 19.5, 20.5, 21.0, 22.5, 23.0, 24.0, 26.5, 27.5, 34.0, 40.5, 42.0, 42.0, 45.0, 45.5, 46.5, 47.0, 48.5]
y = [0.24, 0.18, 0.02, 0.13, 0.09, 0.15, 0.11, 0.06, 0.08, 0.09, 0.07, 0.10, 0.08, 0.05, 0.06, 0.11, 0.04, 0.03, -0.01, -0.02, 0.11, -0.01, 0.08, 0.16, -0.06, 0.03, 0.05, 0.01, -0.04, 0.07, 0.01, 0.05, -0.02, 0.05, 0.05]
df = DataFrame(x=x, x2=x.^2, y=y)
model = GLM.lm(@formula(y ~ x + x2), df)
r2(model)                # 0.4306
diagnostic_plots(model)

[Figure: diagnostic plots for the quadratic model]

This allowed the regression line to bend, capturing the actual "diminishing returns" dynamic between AI subscriptions and job market impacts. The results were striking:

  • The problematic U-shaped residual pattern flattened out.
  • Both the linear and quadratic terms proved highly statistically significant (p = 0.0009 and p = 0.0048, respectively).
  • The predictive power of the model (R²) more than doubled, from 0.2111 in the original chart to 0.4306!
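The per-term p-values quoted above come from the model's coefficient table, which GLM.jl exposes via `coeftable`. A minimal sketch pulling them out programmatically (the quadratic fit on the 35-point data from above):

```julia
using DataFrames
using GLM

x = [6.0, 7.5, 9.5, 11.0, 11.5, 12.0, 12.5, 13.5, 14.0, 14.5, 15.0, 15.5, 16.0,
     16.5, 17.5, 17.5, 18.5, 19.0, 19.5, 20.5, 21.0, 22.5, 23.0, 24.0, 26.5,
     27.5, 34.0, 40.5, 42.0, 42.0, 45.0, 45.5, 46.5, 47.0, 48.5]
y = [0.24, 0.18, 0.02, 0.13, 0.09, 0.15, 0.11, 0.06, 0.08, 0.09, 0.07, 0.10,
     0.08, 0.05, 0.06, 0.11, 0.04, 0.03, -0.01, -0.02, 0.11, -0.01, 0.08, 0.16,
     -0.06, 0.03, 0.05, 0.01, -0.04, 0.07, 0.01, 0.05, 0.05]

df = DataFrame(x = x, x2 = x .^ 2, y = y)
model = lm(@formula(y ~ x + x2), df)

# coeftable reports estimate, std. error, t statistic, and p-value per term;
# pvalcol records which column holds the p-values.
ct = coeftable(model)
pvals = ct.cols[ct.pvalcol]  # order: intercept, x, x²
```

Using `ct.pvalcol` rather than a hard-coded column index keeps the lookup robust if the table layout ever changes.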

The takeaway: A straight line is a great starting point, but always check your residual plots! The real story is often hiding in the curves.
