Code Implementation of Simple Linear Regression 🧑‍💻🖥️

  • Published: 13 Sep 2024
  • Simple linear regression:
    Simple linear regression fits a straight line that describes how a dependent variable changes with a single independent variable.
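
In symbols, the fitted line can be written as below. The coefficient names $\beta_0$ (intercept) and $\beta_1$ (slope) are notation introduced here for illustration and do not appear in the original. `LinearRegression` from `sklearn` estimates them by ordinary least squares, i.e. by minimizing the sum of squared residuals over the $n$ training points.

```latex
\hat{y} = \beta_0 + \beta_1 x,
\qquad
(\beta_0, \beta_1) = \arg\min_{b_0,\, b_1} \sum_{i=1}^{n} \left( y_i - b_0 - b_1 x_i \right)^2
```
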
    Importing Libraries
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LinearRegression
    - `numpy` is used for numerical operations.
    - `pandas` is used for data manipulation and analysis.
    - `matplotlib.pyplot` is used for plotting graphs.
    - `train_test_split` from `sklearn` is used to split the dataset into training and testing sets.
    - `LinearRegression` from `sklearn` is used to create and train the linear regression model.
    Loading the Dataset
    data = pd.read_csv('/content/drive/MyDrive/Data sets ml/Salary.csv')
    print(data.head())
    - The dataset is loaded using `pd.read_csv`.
    - `data.head()` displays the first few rows of the dataset to understand its structure.
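
The Drive path above only resolves inside the author's own Colab session. As a minimal sketch for trying the loading step elsewhere, the snippet below reads a CSV of the same general shape from an in-memory string. The column names `YearsExperience` and `Salary` and every value are assumptions invented for illustration, not the contents of the real `Salary.csv`.

```python
import io

import pandas as pd

# Hypothetical stand-in for Salary.csv: column names and values are
# invented for illustration and are NOT taken from the real dataset.
csv_text = """YearsExperience,Salary
1.1,39000
2.0,43500
3.2,54500
4.5,61000
5.9,81000
"""

# pd.read_csv accepts a file path or any file-like object.
data = pd.read_csv(io.StringIO(csv_text))

print(data.head())   # first rows (up to five by default)
print(data.shape)    # (number of rows, number of columns)
print(data.dtypes)   # helps spot numeric columns that were parsed as text
```

Checking `shape` and `dtypes` right after loading is a cheap way to confirm that the file has the expected columns before slicing it with `iloc`.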
    Extracting Dependent and Independent Variables
    x = data.iloc[:, :-1].values
    y = data.iloc[:, 1].values
    print(x)
    print(y)
    - `x` contains the independent variable(s) (in this case, years of experience).
    - `y` contains the dependent variable (salary).
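    - `data.iloc[:, :-1]` selects all columns except the last one for `x`. The result stays two-dimensional (rows × features), which is the shape `fit` expects for the inputs.
    - `data.iloc[:, 1]` selects the second column for `y`. In a two-column file (experience, salary) this is also the last column.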
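
A small sketch of what the two slices return, assuming a two-column frame. The column names and values here are made up for illustration.

```python
import pandas as pd

# Hypothetical two-column frame standing in for the salary dataset.
data = pd.DataFrame(
    {
        "YearsExperience": [1.1, 2.0, 3.2, 4.5],
        "Salary": [39000, 43500, 54500, 61000],
    }
)

x = data.iloc[:, :-1].values  # slice of columns -> 2-D array
y = data.iloc[:, 1].values    # single column    -> 1-D array

print(x.shape)  # (4, 1)
print(y.shape)  # (4,)
```

The slice `:-1` keeps `x` as a column matrix even though there is only one feature, so no extra `reshape(-1, 1)` is needed before fitting.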
    Splitting the Dataset into Training and Testing Sets
    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)
    - The dataset is split into training and testing sets.
    - `test_size=0.2` means 20% of the data is used for testing, and 80% for training.
    - `random_state=0` ensures reproducibility of the split.
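
To see the effect of `test_size` and `random_state`, here is a minimal sketch on ten made-up samples (the data is synthetic and chosen only to make the sizes easy to check).

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Ten synthetic samples with one feature each (illustrative values only).
x = np.arange(10, dtype=float).reshape(-1, 1)
y = 2.0 * x.ravel() + 1.0

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=0
)
print(len(x_train), len(x_test))  # 8 2

# Repeating the call with the same random_state gives the same split.
again = train_test_split(x, y, test_size=0.2, random_state=0)
print(np.array_equal(x_test, again[1]))  # True
```

Without a fixed `random_state`, each run shuffles differently, so results downstream of the split would change from run to run.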
    Fitting the Model
    regressor = LinearRegression()
    regressor.fit(x_train, y_train)
    - A `LinearRegression` model is created.
    - The model is trained using the `fit` method on the training data (`x_train` and `y_train`).
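
After `fit`, the learned line is available through the `coef_` and `intercept_` attributes. The sketch below uses made-up, noise-free points on the line y = 5000x + 30000, so the recovered parameters are easy to check. The variable names `slope` and `intercept` are introduced here for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic, noise-free training data on the line y = 5000 * x + 30000.
x_train = np.array([[1.0], [2.0], [3.0], [4.0]])
y_train = 5000.0 * x_train.ravel() + 30000.0

regressor = LinearRegression()
regressor.fit(x_train, y_train)

slope = regressor.coef_[0]        # one coefficient per feature
intercept = regressor.intercept_  # value predicted at x = 0

# Expect values very close to 5000 and 30000 (up to floating-point error).
print(round(slope, 2), round(intercept, 2))
```

On real salary data the points will not lie exactly on a line, so the fitted slope and intercept are the least-squares compromise rather than an exact match.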
    Predicting on the Testing Set
    y_pred = regressor.predict(x_test)
    - The trained model predicts salaries (`y_pred`) based on the testing set (`x_test`).
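
For a single feature, `predict` evaluates the fitted line, intercept plus slope times `x`. The sketch below checks that on made-up numbers and, as one possible way to quantify the fit alongside the plots, prints two standard metrics from `sklearn.metrics`. The data values and the choice of metrics are assumptions added here and are not part of the original walkthrough.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score

# Synthetic data for illustration only.
x_train = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y_train = np.array([35000.0, 41000.0, 44000.0, 51000.0, 54000.0])
x_test = np.array([[2.5], [6.0]])
y_test = np.array([42000.0, 60000.0])

regressor = LinearRegression().fit(x_train, y_train)
y_pred = regressor.predict(x_test)

# predict() is equivalent to applying the fitted line by hand.
manual = regressor.intercept_ + regressor.coef_[0] * x_test.ravel()
print(np.allclose(y_pred, manual))  # True

# Error measured on data the model has not seen during training.
print(mean_absolute_error(y_test, y_pred))  # average absolute error, in salary units
print(r2_score(y_test, y_pred))             # 1.0 would be a perfect fit
```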
    Plotting the Training Data Graph
    plt.scatter(x_train, y_train, color="red")
    plt.plot(x_train, regressor.predict(x_train), color="green")
    plt.title("Salary vs Experience (Training set)")
    plt.xlabel("Years of Experience")
    plt.ylabel("Salary")
    plt.show()
    - A scatter plot of the training data (`x_train`, `y_train`) is created, with red dots representing the actual data points.
    - The green line is the fitted regression line, drawn from the model's predictions for `x_train`.
    - A title and axis labels are added for clarity.
    Plotting the Testing Data Graph
    plt.scatter(x_test, y_test, color="red")
    plt.plot(x_train, regressor.predict(x_train), color="green") # Use x_train for the line
    plt.title("Salary vs Experience (Testing set)")
    plt.xlabel("Years of Experience")
    plt.ylabel("Salary")
    plt.show()
    - A scatter plot of the testing data (`x_test`, `y_test`) is created, with red dots representing the actual data points.
    - The same green regression line, fitted on the training data, is drawn so the test points can be compared against the model's predictions.
    - A title and axis labels are added for clarity.
    Summary:
    1. **Libraries**: Imported necessary libraries.
    2. **Data Loading**: Loaded and displayed the dataset.
    3. **Variable Extraction**: Extracted independent (`x`) and dependent (`y`) variables.
    4. **Data Splitting**: Split the data into training and testing sets.
    5. **Model Fitting**: Created and trained a linear regression model.
    6. **Prediction**: Predicted salaries using the test set.
    7. **Visualization**: Plotted graphs for both training and testing sets to visualize the model's performance.
    #ml #machinelearningbasics #machinelearningtutorialforbeginners #linearregression #code #python #machinelearning

Comments • 1

  • @LR-EduSphere  2 months ago

    Dataset:
    drive.google.com/file/d/13n-nQF6oSmLucFKZQ8KAYvTseUVbi-qg/view?usp=drivesdk