Code Implementation of Simple Linear Regression 🧑💻🖥️
- Published: 13 Sep 2024
- Simple linear regression:
Simple linear regression fits a straight line that describes the relationship between one independent variable and one dependent variable.
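To make the idea of "fitting a straight line" concrete, here is a minimal sketch of the least-squares formulas for one feature, written with plain `numpy`. The toy values and the names `years`, `salary`, `slope`, and `intercept` are made up for illustration and are not taken from the Salary.csv dataset.

```python
import numpy as np

# Hypothetical toy data (made-up values, not from Salary.csv).
years = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
salary = np.array([40000.0, 45000.0, 50000.0, 55000.0, 60000.0])

# Ordinary least squares for a single feature:
#   slope     = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2)
#   intercept = mean_y - slope * mean_x
x_mean, y_mean = years.mean(), salary.mean()
slope = np.sum((years - x_mean) * (salary - y_mean)) / np.sum((years - x_mean) ** 2)
intercept = y_mean - slope * x_mean

print(f"salary = {intercept:.1f} + {slope:.1f} * years")
```

`LinearRegression` from `sklearn` solves the same least-squares problem, so on data like this it should recover the same line.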
Importing Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
- `numpy` is used for numerical operations.
- `pandas` is used for data manipulation and analysis.
- `matplotlib.pyplot` is used for plotting graphs.
- `train_test_split` from `sklearn` is used to split the dataset into training and testing sets.
- `LinearRegression` from `sklearn` is used to create and train the linear regression model.
Loading the Dataset
data = pd.read_csv('/content/drive/MyDrive/Data sets ml/Salary.csv')
print(data.head())
- The dataset is loaded using `pd.read_csv`.
- `data.head()` displays the first few rows of the dataset to understand its structure.
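Beyond `data.head()`, a few quick checks can help confirm the file was read as expected. The sketch below uses an in-memory stand-in for the CSV so it runs without Google Drive; the column names `YearsExperience` and `Salary` and all values are assumptions, not the real file contents.

```python
import io

import pandas as pd

# Stand-in for Salary.csv: column names and values are made up for illustration.
csv_text = """YearsExperience,Salary
1.0,40000
2.0,45000
3.0,50000
4.0,55000
"""
data = pd.read_csv(io.StringIO(csv_text))

print(data.shape)           # number of rows and columns
print(data.dtypes)          # check that both columns were parsed as numbers
print(data.isnull().sum())  # count missing values per column
```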
Extracting Dependent and Independent Variables
x = data.iloc[:, :-1].values
y = data.iloc[:, 1].values
print(x)
print(y)
- `x` contains the independent variable(s) (in this case, years of experience).
- `y` contains the dependent variable (salary).
- `data.iloc[:, :-1]` selects all columns except the last one for `x`.
- `data.iloc[:, 1]` selects the second column for `y`, which is also the last column when the dataset has exactly two columns.
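The two slices return arrays of different shapes, which matters because `sklearn` estimators expect a 2-D feature array and a 1-D target. A small sketch, assuming a two-column frame with the hypothetical column names `YearsExperience` and `Salary` and made-up values:

```python
import pandas as pd

# Hypothetical two-column frame standing in for the salary dataset.
data = pd.DataFrame({
    "YearsExperience": [1.0, 2.0, 3.0],
    "Salary": [40000, 45000, 50000],
})

x = data.iloc[:, :-1].values  # all columns except the last -> 2-D array
y = data.iloc[:, 1].values    # second column only -> 1-D array

print(x.shape)  # (3, 1)
print(y.shape)  # (3,)

# Selecting by name gives the same arrays and does not depend on column order.
x_named = data[["YearsExperience"]].values
y_named = data["Salary"].values
```

Selecting by name is one possible alternative if the file ever gains extra columns.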
Splitting the Dataset into Training and Testing Sets
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)
- The dataset is split into training and testing sets.
- `test_size=0.2` means 20% of the data is used for testing, and 80% for training.
- `random_state=0` ensures reproducibility of the split.
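The effect of `test_size` and `random_state` can be checked directly on a small synthetic array. This is only a sketch; the ten data points below are made up.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Ten made-up samples with one feature each.
x = np.arange(10, dtype=float).reshape(-1, 1)
y = 2.0 * x.ravel() + 1.0

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=0
)
print(x_train.shape, x_test.shape)  # (8, 1) (2, 1)

# Repeating the call with the same random_state gives the same split.
x_train2, x_test2, y_train2, y_test2 = train_test_split(
    x, y, test_size=0.2, random_state=0
)
print(np.array_equal(x_test, x_test2))  # True
```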
Fitting the Model
regressor = LinearRegression()
regressor.fit(x_train, y_train)
- A `LinearRegression` model is created.
- The model is trained using the `fit` method on the training data (`x_train` and `y_train`).
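After `fit`, the learned line can be read from the model's `coef_` and `intercept_` attributes. The sketch below uses made-up training values, so the printed numbers describe the toy data only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up training data: salary rises by 5000 per year of experience.
x_train = np.array([[1.0], [2.0], [3.0], [4.0]])
y_train = np.array([40000.0, 45000.0, 50000.0, 55000.0])

regressor = LinearRegression()
regressor.fit(x_train, y_train)

slope = regressor.coef_[0]        # change in salary per extra year
intercept = regressor.intercept_  # predicted salary at zero years

print(f"slope={slope:.1f}, intercept={intercept:.1f}")
```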
Predicting on the Testing Set
y_pred = regressor.predict(x_test)
- The trained model predicts salaries (`y_pred`) based on the testing set (`x_test`).
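The predictions can also be compared with the actual test salaries numerically. One possible way, sketched here on synthetic data, is to use `mean_absolute_error` and `r2_score` from `sklearn.metrics`; the original walkthrough does not compute these metrics, and the generated data is an assumption made for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the salary data (made-up linear trend plus noise).
rng = np.random.default_rng(0)
x = rng.uniform(1, 10, size=(30, 1))
y = 35000 + 5000 * x.ravel() + rng.normal(0, 2000, size=30)

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=0
)
regressor = LinearRegression().fit(x_train, y_train)
y_pred = regressor.predict(x_test)

mae = mean_absolute_error(y_test, y_pred)  # average absolute error in salary units
r2 = r2_score(y_test, y_pred)              # 1.0 would be a perfect fit

print(f"MAE: {mae:.1f}")
print(f"R^2: {r2:.3f}")
```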
Plotting the Training Data Graph
plt.scatter(x_train, y_train, color="red")
plt.plot(x_train, regressor.predict(x_train), color="green")
plt.title("Salary vs Experience (Training set)")
plt.xlabel("Years of Experience")
plt.ylabel("Salary")
plt.show()
- A scatter plot of the training data (`x_train`, `y_train`) is created, with red dots representing actual data points.
- A green line represents the predicted salaries based on the training data.
- Titles and labels are added for clarity.
Plotting the Testing Data Graph
plt.scatter(x_test, y_test, color="red")
plt.plot(x_train, regressor.predict(x_train), color="green")  # same fitted line as in the training plot
plt.title("Salary vs Experience (Testing set)")
plt.xlabel("Years of Experience")
plt.ylabel("Salary")
plt.show()
- A scatter plot of the testing data (`x_test`, `y_test`) is created, with red dots representing actual data points.
- The green regression line, fitted on the training data, is drawn again so the actual test points can be compared against the model's predictions.
- Titles and labels are added for clarity.
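Because the fitted line is the same in both plots, the two graphs can also be combined into one. The sketch below is one possible variation, not the walkthrough's own method: it uses synthetic data, a hypothetical evenly spaced grid `x_line` for drawing the line, and the non-interactive `Agg` backend so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the salary data (made-up values).
rng = np.random.default_rng(0)
x = rng.uniform(1, 10, size=(30, 1))
y = 35000 + 5000 * x.ravel() + rng.normal(0, 2000, size=30)

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=0
)
regressor = LinearRegression().fit(x_train, y_train)

# Evenly spaced x values spanning the data, used only to draw the line.
x_line = np.linspace(x.min(), x.max(), 50).reshape(-1, 1)

fig, ax = plt.subplots()
ax.scatter(x_train, y_train, color="red", label="Training set")
ax.scatter(x_test, y_test, color="blue", label="Testing set")
ax.plot(x_line, regressor.predict(x_line), color="green", label="Fitted line")
ax.set_title("Salary vs Experience")
ax.set_xlabel("Years of Experience")
ax.set_ylabel("Salary")
ax.legend()
plt.show()
```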
Summary:
1. **Libraries**: Imported necessary libraries.
2. **Data Loading**: Loaded and displayed the dataset.
3. **Variable Extraction**: Extracted independent (`x`) and dependent (`y`) variables.
4. **Data Splitting**: Split the data into training and testing sets.
5. **Model Fitting**: Created and trained a linear regression model.
6. **Prediction**: Predicted salaries using the test set.
7. **Visualization**: Plotted graphs for both training and testing sets to visualize the model's performance.
Dataset:
drive.google.com/file/d/13n-nQF6oSmLucFKZQ8KAYvTseUVbi-qg/view?usp=drivesdk