{ "cells": [ { "cell_type": "markdown", "id": "cbd9acbb-0ce2-404c-affa-d943c74ef43b", "metadata": {}, "source": [ "# Regression using scikit-learn" ] }, { "cell_type": "code", "execution_count": 1, "id": "a1cf9139-688e-4fef-b814-bd386ed56638", "metadata": {}, "outputs": [], "source": [ "from sklearn.datasets import make_regression\n", "from sklearn.preprocessing import scale\n", "import matplotlib.pyplot as plt\n", "from sklearn.model_selection import train_test_split\n", "from sklearn.linear_model import LinearRegression\n", "from sklearn.metrics import mean_squared_error\n", "import pandas as pd\n", "%matplotlib inline" ] }, { "cell_type": "markdown", "id": "4058a0d6-cba5-415c-b336-cc5835b9d9fb", "metadata": {}, "source": [ "## Make some fake regression data\n", "\n", "We generate data with 5 features, only 3 of which help us predict the target. One of the features is redundant: it is a scaled copy of an existing feature. We are going to follow our usual process here:\n", "\n", "1. Build model\n", "2. Fit data\n", "3. Evaluate fit\n", "4. Visualize\n", "\n", "Before we start the analysis, though, we should look at the data." ] }, { "cell_type": "code", "execution_count": 2, "id": "932de82b-e135-4467-b5c8-2117cfb53fc2", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| | Feature 1 | Feature 2 | Feature 3 | Feature 4 | Feature 5 | Target |
|---|---|---|---|---|---|---|
| 0 | -1.867265 | -186.726519 | -1.612716 | 2.314659 | -0.471932 | 67.011794 |
| 1 | -0.493001 | -49.300093 | 0.849602 | -0.208122 | 0.357015 | -63.011106 |
| 2 | -0.059525 | -5.952536 | -1.024388 | -0.926930 | -0.252568 | -178.808987 |
| 3 | -1.124642 | -112.464209 | 1.277677 | 0.711615 | 0.332314 | -72.663818 |
| 4 | -0.465730 | -46.572975 | -1.913280 | -0.463418 | -1.724918 | -47.971879 |