How to make a scatter plot with a trendline and stats in Python

  • Published: 19 Sep 2023
  • Get a chart with a linear-regression line of best fit, annotated with the equation of the line, the R-squared value and the p-value.
    ---------------------------------------------------------------------------------------------------------------------
    import numpy as np
    import matplotlib.pyplot as plt
    from scipy import stats
    x = np.array([2, 4, 5, 7, 10])
    y = np.array([1, 5, 4, 8, 11])
    plt.scatter(x, y)
    # Least-squares fit; rvalue is Pearson's r, so R^2 = rvalue**2
    slope, intercept, rvalue, pvalue, stderr = stats.linregress(x, y)
    plt.plot(x, slope * x + intercept)  # the trendline
    # "\n" starts a new line inside the annotation text
    plt.annotate("y=%.3fx+%.3f\nR$^2$=%.3f\np=%.3f" % (slope, intercept, rvalue**2, pvalue),
                 xy=(0.15, 0.7), xycoords='figure fraction')
    plt.show()
  • Science
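
To see what stats.linregress is reporting, here is a minimal sketch that recomputes the slope, intercept and R-squared for the same five points with the ordinary least-squares formulas (slope = Sxy / Sxx, intercept = mean(y) - slope * mean(x), r = Sxy / sqrt(Sxx * Syy)) and prints them next to SciPy's values. The helper name fit_line is hypothetical and not part of SciPy.

```python
import numpy as np
from scipy import stats

def fit_line(x, y):
    """Hypothetical helper: ordinary least squares for y = slope * x + intercept."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    sxy = np.sum(dx * dy)
    sxx = np.sum(dx ** 2)
    syy = np.sum(dy ** 2)
    slope = sxy / sxx                        # Sxy / Sxx
    intercept = y.mean() - slope * x.mean()  # the line passes through (mean x, mean y)
    r = sxy / np.sqrt(sxx * syy)             # Pearson correlation coefficient
    return slope, intercept, r ** 2

x = np.array([2, 4, 5, 7, 10])
y = np.array([1, 5, 4, 8, 11])

slope, intercept, r_squared = fit_line(x, y)
res = stats.linregress(x, y)
print(f"by hand:    y={slope:.3f}x+{intercept:.3f}, R^2={r_squared:.3f}")
print(f"linregress: y={res.slope:.3f}x+{res.intercept:.3f}, R^2={res.rvalue**2:.3f}")
```

The p-value that linregress also returns comes from a test of the null hypothesis that the slope is zero (two-sided by default), which is why it is not reproduced by the formulas above.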

Comments • 4

  • @karinaadcock
    9 months ago · 1 like

    If you are doing this with dates, note that scipy can't handle dates, so you have to use:
    import matplotlib.dates as mdates
    new_x = mdates.date2num(x)
    to convert your dates into numbers.
    The slope will then be in units of y per day.
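
A minimal sketch of the date approach described in the comment above. The dates and values are made-up example data. The dates are converted with mdates.date2num only for the regression, while the plot itself keeps the real datetimes on the x-axis.

```python
import datetime as dt
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from scipy import stats

# Made-up example data: one reading every 3 days
dates = [dt.datetime(2023, 9, 1) + dt.timedelta(days=3 * i) for i in range(6)]
values = np.array([1.0, 2.5, 4.1, 5.4, 7.2, 8.4])

# date2num turns datetimes into floats counting days since Matplotlib's date epoch
x_num = mdates.date2num(dates)
res = stats.linregress(x_num, values)

fig, ax = plt.subplots()
ax.scatter(dates, values)                          # datetimes can be plotted directly
ax.plot(dates, res.slope * x_num + res.intercept)  # line evaluated on the numeric x
ax.annotate(f"slope = {res.slope:.3f} per day\nR$^2$ = {res.rvalue**2:.3f}",
            xy=(0.05, 0.85), xycoords='axes fraction')
fig.autofmt_xdate()                                # tilt the date labels so they don't overlap
plt.show()
```

Because x is measured from the date epoch, the intercept is the fitted value at that epoch and is rarely meaningful on its own; the slope (y units per day) is the useful number.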

  • @Oceansteve
    9 months ago · 1 like

    Thanks, this gave a really nice step-by-step construction. It got me playing for a few hours after watching, and here is my update:
    import numpy as np
    import matplotlib.pyplot as plt
    from scipy import stats
    def plot_regression(x, y, x_label='X', y_label='Y', title='Linear Regression'):
        x, y = np.asarray(x), np.asarray(y)  # so slope * x also works when plain lists are passed
        plt.scatter(x, y, c='black', marker='x', label='Data Points', alpha=0.7, s=20)
        # Linear regression with dark blue line and narrower width
        slope, intercept, rvalue, pvalue, stderr = stats.linregress(x, y)
        plt.plot(x, slope * x + intercept, c='darkblue', label='Regression Line', linewidth=1)
        # 1st to 100th percentiles of x plotted against the same percentiles of y
        quantile_values = np.arange(1, 101)
        x_quantiles = np.percentile(x, quantile_values)
        y_quantiles = np.percentile(y, quantile_values)
        plt.scatter(x_quantiles, y_quantiles, c='red', marker='x', s=20, label='Quantiles')
        # Annotation ("\n" puts each statistic on its own line)
        annotation_text = (f"y = {slope:.3f}x + {intercept:.3f}\n"
                           f"R$^2$ = {rvalue**2:.3f}\n"
                           f"p = {pvalue:.3f}")
        plt.annotate(annotation_text, xy=(0.15, 0.7), xycoords='axes fraction', fontsize=10,
                     bbox=dict(facecolor='white', edgecolor='black', boxstyle='square,pad=0.5'))
        # Labels, title, and legend
        plt.xlabel(x_label)
        plt.ylabel(y_label)
        plt.title(title)
        # Position legend outside the plot area to the right
        plt.legend(loc='upper left', bbox_to_anchor=(1, 1))
        # Add light grid lines
        plt.grid(True, which='both', linestyle='--', linewidth=0.5, color='grey', alpha=0.5)
        # Adjust layout to accommodate the legend
        plt.tight_layout(rect=[0, 0, 0.85, 1])
        # Show plot
        plt.show()
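
One possible variant of the function above, sketched here as an assumption rather than taken from the comment: it draws on a given Matplotlib Axes instead of the global pyplot state and returns the linregress result, so it can be reused in subplots and the statistics can be checked afterwards. The name plot_regression_ax is hypothetical, and the quantile overlay is left out for brevity.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

def plot_regression_ax(ax, x, y, x_label='X', y_label='Y', title='Linear Regression'):
    """Hypothetical Axes-based variant of plot_regression; returns the fit."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    res = stats.linregress(x, y)
    ax.scatter(x, y, c='black', marker='x', s=20, alpha=0.7, label='Data Points')
    # A straight line only needs its two end points, so unsorted x is fine
    x_line = np.array([x.min(), x.max()])
    ax.plot(x_line, res.slope * x_line + res.intercept, c='darkblue', linewidth=1,
            label='Regression Line')
    ax.annotate(f"y = {res.slope:.3f}x + {res.intercept:.3f}\n"
                f"R$^2$ = {res.rvalue**2:.3f}\n"
                f"p = {res.pvalue:.3f}",
                xy=(0.05, 0.7), xycoords='axes fraction', fontsize=10,
                bbox=dict(facecolor='white', edgecolor='black', boxstyle='square,pad=0.5'))
    ax.set_xlabel(x_label)
    ax.set_ylabel(y_label)
    ax.set_title(title)
    ax.grid(True, linestyle='--', linewidth=0.5, color='grey', alpha=0.5)
    ax.legend(loc='upper left', bbox_to_anchor=(1, 1))  # legend outside, to the right
    return res

fig, ax = plt.subplots(figsize=(7, 4))
res = plot_regression_ax(ax, [2, 4, 5, 7, 10], [1, 5, 4, 8, 11])
fig.tight_layout(rect=[0, 0, 0.85, 1])  # leave room on the right for the legend
plt.show()
```

Returning the result object rather than only drawing keeps the numbers available, for example to print res.stderr or to compare fits across several subplots.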

  • @itsame2649
    6 months ago

    How can I do this for a log-log plot?

    • @Dawson-qs9zc
      4 months ago · 1 like

      numpy.log10(x) takes the base-10 log of every element in your x array, and you can do the same with y. numpy.log(x) takes the natural log (ln) and numpy.log2(x) the base-2 log; for any other base b, use numpy.log(x) / numpy.log(b). The rest should stay the same; just make it clear that you are plotting, and doing statistics on, the log of your data.
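
A minimal sketch of the log-log approach from the reply above, using made-up power-law data. The regression is run on log10 of both arrays; a straight line there corresponds to the power law y = 10**intercept * x**slope, which is drawn on log-scaled axes so it appears straight.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Made-up example data: y = 3 * x**1.5 with a little multiplicative noise
rng = np.random.default_rng(0)
x = np.array([1, 2, 5, 10, 20, 50, 100], dtype=float)
y = 3.0 * x ** 1.5 * rng.lognormal(mean=0.0, sigma=0.05, size=x.size)

# Fit a straight line to the logs: log10(y) = slope * log10(x) + intercept
res = stats.linregress(np.log10(x), np.log10(y))

# Back in the original units, that line is the power law y = 10**intercept * x**slope
x_fit = np.logspace(np.log10(x.min()), np.log10(x.max()), 50)
y_fit = 10 ** res.intercept * x_fit ** res.slope

fig, ax = plt.subplots()
ax.scatter(x, y)
ax.plot(x_fit, y_fit)
ax.set_xscale('log')  # log axes keep the original values on the tick labels
ax.set_yscale('log')
ax.annotate(f"log10(y) = {res.slope:.3f} log10(x) + {res.intercept:.3f}\n"
            f"R$^2$ = {res.rvalue**2:.3f} (on the logged data)",
            xy=(0.05, 0.85), xycoords='axes fraction')
plt.show()
```

Note that R-squared and the p-value here describe the fit to the logged data, not to the raw values.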