World-Happiness Multiple Linear Regression
Project 3 - DSC680
Happiness 2020
Soukhna Wade, 11/01/2020
Introduction
This report has three parts:
Cleaning
Visualization
Multiple Linear Regression in Python
The purpose of this work is to find out which factors matter most for living a happier life, so that people and countries can focus on the most significant factors to achieve a higher happiness level. We will also implement several machine learning algorithms to predict the happiness score and compare the results to discover which algorithm works best for this specific dataset.
Data sources:
https://www.kaggle.com/mathurinache/world-happiness-report?select=2020.csv
https://www.kaggle.com/pinarkaya/world-happiness-eda-visualization-ml#2019-Data
Import necessary Libraries
# Standard library imports for basic operations
import pandas as pd
import numpy as np                # linear algebra
import matplotlib.pyplot as plt   # for graphics
import seaborn as sns             # for visualizations
plt.style.use('fivethirtyeight')
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn import metrics
# Use to configure display of graph
%matplotlib inline
#stop unnecessary warnings from printing to the screen
import warnings
warnings.simplefilter('ignore')
# for interactive visualizations
import plotly.offline as py
from plotly.offline import init_notebook_mode, iplot
import plotly.graph_objs as go
init_notebook_mode(connected = True)
Import and read the dataset from a local directory
#The following command imports the CSV dataset using pandas:
import pandas as pd
happyness_2020 = pd.read_csv("happyness_2020.csv")
happyness_2020.head(1)
 | Country name | Regional indicator | Ladder score | Standard error of ladder score | upperwhisker | lowerwhisker | Logged GDP per capita | Social support | Healthy life expectancy | Freedom to make life choices | Generosity | Perceptions of corruption | Ladder score in Dystopia | Explained by: Log GDP per capita | Explained by: Social support | Explained by: Healthy life expectancy | Explained by: Freedom to make life choices | Explained by: Generosity | Explained by: Perceptions of corruption | Dystopia + residual
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Finland | Western Europe | 7.8087 | 0.031156 | 7.869766 | 7.747634 | 10.639267 | 0.95433 | 71.900825 | 0.949172 | -0.059482 | 0.195445 | 1.972317 | 1.28519 | 1.499526 | 0.961271 | 0.662317 | 0.15967 | 0.477857 | 2.762835 |
Looking at the current shape of the dataset under consideration
# Check the dimension of the table (rows, columns)
print("The dimension of the table is: ",happyness_2020.shape)
The dimension of the table is: (153, 20)
Cleaning - Are there any missing or null values in this dataset (happyness_2020)?
In this section, we load our dataset and examine the structure of the happiness variables. Our dataset is pretty clean, and we will make a few adjustments to improve it.
#check for any missing values or null values (NA or NaN)
happyness_2020.isnull().sum()
Country name 0
Regional indicator 0
Ladder score 0
Standard error of ladder score 0
upperwhisker 0
lowerwhisker 0
Logged GDP per capita 0
Social support 0
Healthy life expectancy 0
Freedom to make life choices 0
Generosity 0
Perceptions of corruption 0
Ladder score in Dystopia 0
Explained by: Log GDP per capita 0
Explained by: Social support 0
Explained by: Healthy life expectancy 0
Explained by: Freedom to make life choices 0
Explained by: Generosity 0
Explained by: Perceptions of corruption 0
Dystopia + residual 0
dtype: int64
**Note that the above result shows no missing values, so the dataset is already clean.**
Exploratory Data Analysis
Prints information of all columns:
happyness_2020.info()  # prints information for all columns
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 153 entries, 0 to 152
Data columns (total 20 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Country name 153 non-null object
1 Regional indicator 153 non-null object
2 Ladder score 153 non-null float64
3 Standard error of ladder score 153 non-null float64
4 upperwhisker 153 non-null float64
5 lowerwhisker 153 non-null float64
6 Logged GDP per capita 153 non-null float64
7 Social support 153 non-null float64
8 Healthy life expectancy 153 non-null float64
9 Freedom to make life choices 153 non-null float64
10 Generosity 153 non-null float64
11 Perceptions of corruption 153 non-null float64
12 Ladder score in Dystopia 153 non-null float64
13 Explained by: Log GDP per capita 153 non-null float64
14 Explained by: Social support 153 non-null float64
15 Explained by: Healthy life expectancy 153 non-null float64
16 Explained by: Freedom to make life choices 153 non-null float64
17 Explained by: Generosity 153 non-null float64
18 Explained by: Perceptions of corruption 153 non-null float64
19 Dystopia + residual 153 non-null float64
dtypes: float64(18), object(2)
memory usage: 24.0+ KB
Display some statistical summaries of the numerical columns data. To see the statistical details of the dataset, we can use describe():
happyness_2020.describe().head(1)  # statistical summary (only the 'count' row is displayed)
 | Ladder score | Standard error of ladder score | upperwhisker | lowerwhisker | Logged GDP per capita | Social support | Healthy life expectancy | Freedom to make life choices | Generosity | Perceptions of corruption | Ladder score in Dystopia | Explained by: Log GDP per capita | Explained by: Social support | Explained by: Healthy life expectancy | Explained by: Freedom to make life choices | Explained by: Generosity | Explained by: Perceptions of corruption | Dystopia + residual
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 153.0 | 153.0 | 153.0 | 153.0 | 153.0 | 153.0 | 153.0 | 153.0 | 153.0 | 153.0 | 153.0 | 153.0 | 153.0 | 153.0 | 153.0 | 153.0 | 153.0 | 153.0 |
Let us examine the happiest country and the least happy one.
maxSupport=np.max(happyness_2020["Social support"])
maxSupport
0.9746695759999999
maxEconomy=np.max(happyness_2020["Logged GDP per capita"])
maxEconomy
11.45068073
happyness_2020[happyness_2020['Ladder score']==np.max(happyness_2020['Ladder score'])]['Country name']
0 Finland
Name: Country name, dtype: object
minSupport=np.min(happyness_2020["Social support"])
minSupport
0.31945985600000004
minEconomy=np.min(happyness_2020["Logged GDP per capita"])
minEconomy
6.492642403
happyness_2020[happyness_2020['Ladder score']==np.min(happyness_2020['Ladder score'])]['Country name']
152 Afghanistan
Name: Country name, dtype: object
Renaming columns
happyness_2020.rename(columns={"Country name": "Country",
                               "Logged GDP per capita": "GDP per capita",
                               "Explained by: Healthy life expectancy": "Health",
                               "Freedom to make life choices": "Freedom",
                               "Overall rank": "Happiness Rank"},  # not present in the 2020 data; silently ignored
                      inplace=True)
Removing unnecessary columns (whisker bounds, standard error, and most of the "Explained by" components)
# Drop multiple columns by name
d2020=happyness_2020.drop(['Regional indicator','Standard error of ladder score', 'upperwhisker', 'lowerwhisker',
'Ladder score in Dystopia', 'Explained by: Social support',
'Explained by: Freedom to make life choices',
'Explained by: Generosity', 'Explained by: Perceptions of corruption',
'Dystopia + residual','Explained by: Log GDP per capita'], axis='columns')
d2020.head(1)
 | Country | Ladder score | GDP per capita | Social support | Healthy life expectancy | Freedom | Generosity | Perceptions of corruption | Health
---|---|---|---|---|---|---|---|---|---|
0 | Finland | 7.8087 | 10.639267 | 0.95433 | 71.900825 | 0.949172 | -0.059482 | 0.195445 | 0.961271 |
d2020.shape
(153, 9)
d2020.columns
Index(['Country', 'Ladder score', 'GDP per capita', 'Social support',
'Healthy life expectancy', 'Freedom', 'Generosity',
'Perceptions of corruption', 'Health'],
dtype='object')
Visualization
The correlation of the variables of the dataset - Heatmap
fig, ax = plt.subplots()
fig.set_size_inches(15, 10)
sns.heatmap(happyness_2020.corr(), cmap='coolwarm', ax=ax, annot=True, linewidths=2);
The "Ladder score" correlates positively with most of the other numerical variables: the higher a country's GDP per capita, social support, and healthy life expectancy, the higher its happiness score. "Perceptions of corruption" is the notable exception, correlating negatively with the score.
The correlation of the new dataset after renaming and dropping columns - Heatmap
#The correlation of the new dataset
fig, ax = plt.subplots()
fig.set_size_inches(15, 10)
sns.heatmap(d2020.corr(), cmap='coolwarm', ax=ax, annot=True, linewidths=2);
According to the above correlation plot, GDP per capita, social support, and healthy life expectancy play the most significant roles in contributing to happiness, while perceptions of corruption has the lowest impact on the happiness score.
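To back this reading with numbers, a quick check (a sketch using the cleaned d2020 frame) is to rank the features by their correlation with the happiness score:
# Rank the features by their correlation with the happiness score
d2020.corr()['Ladder score'].drop('Ladder score').sort_values(ascending=False)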
Let’s look at the relationship between each feature and the happiness score.
1. GDP per capita
GDP per capita (the economy of a country) has a strong positive relationship with the happiness score. So if the GDP per capita of a country is high, the happiness score of that country is also more likely to be high.
#https://www.kaggle.com/dgtech/world-happiness-with-basic-visualization-and-eda
import matplotlib.pyplot as plt
import seaborn as sb
import warnings
warnings.filterwarnings('ignore')
import pandas as pd
happyness_2020 = pd.read_csv("happyness_2020.csv")  # re-read: restores the original column names
plt.figure(figsize=(14,7))
plt.title("Ladder Score vs Logged GDP per capita")
sb.regplot(data=happyness_2020, x='Logged GDP per capita', y='Ladder score');
Histograms make the data easier to reason about by showing how each variable is distributed.
d2020.hist();  # plot histograms of all numeric columns; the semicolon suppresses the axes-array output
How is the happiness score distributed?
As you can see below, the ladder score (happiness score) ranges from about 2.5 up to 7.81, so no country has a happiness score above 8.
sns.distplot(d2020['Ladder score'])
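A quick numeric check of that range (a sketch):
# Confirm the minimum and maximum happiness scores
d2020['Ladder score'].agg(['min', 'max'])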
Top 6 Countries with high GDP (Economy)
plt.figure(figsize=(14,7))
plt.title("Top 6 Countries with High GDP")
sb.barplot(data = happyness_2020.sort_values('Logged GDP per capita', ascending= False).head(6), y='Logged GDP per capita', x='Country name')
plt.xticks(rotation=90);
2. Perceptions of corruption
The distribution of perceptions of corruption is right-skewed: very few countries have a high perceptions-of-corruption score, yet most countries report some corruption problem.
Corruption is a very big problem for the world. How can corruption impact the happiness score?
The perceptions-of-corruption data is highly skewed, so it is no wonder it has a weak linear relationship with the happiness score. As the scatter plot shows, most of the data points sit on the left side, and most countries with a low perceptions-of-corruption score have a happiness score between 4 and 6.
Countries with a high perceptions-of-corruption score tend to have a happiness score above 7.
plt.figure(figsize= (15,7))
plt.subplot(1,2,1)
plt.title("Perceptions of corruption distribution")
sb.distplot(a=happyness_2020['Perceptions of corruption'], bins =np.arange(0, 0.45+0.2,0.05))
plt.ylabel('Count')
plt.subplot(1,2,2)
plt.title("Happiness Score vs Perceptions of corruption")
sb.regplot(data=happyness_2020, x='Perceptions of corruption', y='Ladder score');
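The skewness claim can also be checked numerically (a sketch); a positive value indicates a right-skewed distribution:
# Sample skewness of the perceptions-of-corruption scores
happyness_2020['Perceptions of corruption'].skew()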
Top 6 Countries with high Perceptions of corruption in the year 2020
plt.figure(figsize=(14,7))
plt.title("Top 6 Countries with High Perceptions of corruption in the year")
sb.barplot(data =happyness_2020.sort_values('Perceptions of corruption', ascending= False).head(6), x='Country name', y='Perceptions of corruption')
plt.xticks(rotation=90);
3. Healthy life expectancy
Healthy life expectancy has a strong, positive relationship with the happiness score. If a country has a high life expectancy, it is also likely to have a high happiness score. This makes sense: anyone who lives a long, healthy life has good reason to be happy. Everyone would like a long, healthy life, wouldn’t you?
plt.figure(figsize=(14,7))
plt.title("Happiness Score vs Healthy life expectancy")
sb.regplot(data=happyness_2020, x='Healthy life expectancy', y='Ladder score');
Top 6 Countries with high Healthy life expectancy in the year 2020
plt.figure(figsize=(14,7))
plt.title("Top 6 Countries with High Healthy life expectancy in the year 2020")
sb.barplot(data = happyness_2020.sort_values('Healthy life expectancy', ascending= False).head(6), x='Country name', y='Healthy life expectancy')
plt.xticks(rotation=90);
4. Social Support
Social support also has a strong, positive relationship with the happiness score. This makes sense: the more help people give and receive socially, the happier they tend to be.
Social support measures the perception that one has assistance available, the assistance actually received, or the degree to which a person can integrate into a social network. Support can come from many sources: family, friends, pets, neighbors, coworkers, and so on.
import matplotlib.pyplot as plt
import seaborn as sb
import warnings
import pandas as pd
happyness_2020 = pd.read_csv("happyness_2020.csv")
warnings.filterwarnings('ignore')
plt.figure(figsize=(14,7))
plt.title("Happiness Score vs Social Support")
sb.regplot(data=happyness_2020, x='Social support', y='Ladder score');
Top 6 Countries with high Social Support in the year 2020
plt.figure(figsize=(14,7))
plt.title("Top 6 Countries with Social Support")
sb.barplot(data = happyness_2020.sort_values('Social support', ascending= False).head(6), x='Country name', y='Social support')
plt.xticks(rotation=90);
5. Freedom to make life choices
“Freedom to make life choices” is the national average of responses to the question “Are you satisfied or dissatisfied with your freedom to choose what you do with your life?”
Freedom to make life choices has a positive relationship with the happiness score. This relationship makes sense: the freer you are to make decisions about your life, the happier you will be.
plt.figure(figsize=(14,7))
plt.title("Happiness Score vs Freedom to make life choices")
sb.regplot(data=happyness_2020, x='Freedom to make life choices', y='Ladder score');
The top 6 countries with high freedom to make life choices
plt.figure(figsize=(14,7))
plt.title("Top 6 Countries with High Freedom to make life choices")
sb.barplot(data = happyness_2020.sort_values('Freedom to make life choices', ascending= False).head(6), x='Country name', y='Freedom to make life choices')
plt.xticks(rotation=90);
6. Generosity
Generosity, like healthy life expectancy, is among the six variables considered when compiling the World Happiness Report.
Generosity has a weak linear relationship with the ladder score. One can ask: why does generosity not have a linear relationship with the happiness score?
The reason is that the generosity score depends on how much countries give to nonprofits around the world. Countries that are not generous are not necessarily unhappy.
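One way to quantify how weak the relationship is (a sketch) is the Pearson correlation:
# Correlation between generosity and the happiness score
happyness_2020['Generosity'].corr(happyness_2020['Ladder score'])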
plt.figure(figsize=(14,7))
plt.title("Happiness Score vs Generosity")
sb.regplot(data=happyness_2020, x='Generosity', y='Ladder score');
The top 6 countries with high generosity
plt.figure(figsize=(14,7))
plt.title("The Top 6 Countries with High Generosity")
sb.barplot(data = happyness_2020.sort_values('Generosity', ascending= False).head(6), x='Country name', y='Generosity')
plt.xticks(rotation=90);
How is one feature related to another?
p = sb.PairGrid(happyness_2020)
p.map_diag(plt.hist)
p.map_offdiag(plt.scatter);
Multiple Linear Regression in Python
Multiple linear regression is the most common form of linear regression. It describes how a single response variable Y depends linearly on a number of predictor variables. Here we consider 'Ladder score' as the dependent variable and the remaining attributes as independent variables.
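In equation form, with p predictors, the model is

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \varepsilon,$$

where y is the ladder score, the x_i are the predictors (GDP per capita, social support, and so on), \beta_0 is the intercept, and \varepsilon is the error term. Fitting the model means estimating the coefficients \beta_i from the data.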
Step 1: Have a glance at the shape
d2020.shape
(153, 9)
# Have a look at the first five rows
d2020.head()
 | Country | Ladder score | GDP per capita | Social support | Healthy life expectancy | Freedom | Generosity | Perceptions of corruption | Health
---|---|---|---|---|---|---|---|---|---|
0 | Finland | 7.8087 | 10.639267 | 0.954330 | 71.900825 | 0.949172 | -0.059482 | 0.195445 | 0.961271 |
1 | Denmark | 7.6456 | 10.774001 | 0.955991 | 72.402504 | 0.951444 | 0.066202 | 0.168489 | 0.979333 |
2 | Switzerland | 7.5599 | 10.979933 | 0.942847 | 74.102448 | 0.921337 | 0.105911 | 0.303728 | 1.040533 |
3 | Iceland | 7.5045 | 10.772559 | 0.974670 | 73.000000 | 0.948892 | 0.246944 | 0.711710 | 1.000843 |
4 | Norway | 7.4880 | 11.087804 | 0.952487 | 73.200783 | 0.955750 | 0.134533 | 0.263218 | 1.008072 |
Step 2: Have a glance at the dependent and independent variables
import pandas as pd
# Predictors (independent variables): 'GDP per capita', 'Social support',
# 'Healthy life expectancy', 'Freedom', 'Generosity',
# 'Perceptions of corruption', and 'Health'
x = pd.DataFrame(d2020.iloc[:, 2:])
x.head(3)
 | GDP per capita | Social support | Healthy life expectancy | Freedom | Generosity | Perceptions of corruption | Health
---|---|---|---|---|---|---|---|
0 | 10.639267 | 0.954330 | 71.900825 | 0.949172 | -0.059482 | 0.195445 | 0.961271 |
1 | 10.774001 | 0.955991 | 72.402504 | 0.951444 | 0.066202 | 0.168489 | 0.979333 |
2 | 10.979933 | 0.942847 | 74.102448 | 0.921337 | 0.105911 | 0.303728 | 1.040533 |
# Target (dependent variable): 'Ladder score'
y = pd.DataFrame(d2020.iloc[:, 1])
y.head(3)
 | Ladder score
---|---
0 | 7.8087
1 | 7.6456
2 | 7.5599
Step 3: Visualize the change in the variables
# Step 3: Visualize the change in the variables
import matplotlib.pyplot as plt
d2020.plot(x='Health', y='Ladder score', style='o')
plt.xlabel('Health')
plt.ylabel('Ladder score')
plt.show()
Step 4: Divide the data into train and test sets
# Divide the data into train and test sets:
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=10, shuffle=True)
Step 5: Have a glance at the shape of the train and test sets:
#Have a glance at the shape of the train and test sets:
print(x_train.shape)
print(x_test.shape)
print(y_train.shape)
print(y_test.shape)
(107, 7)
(46, 7)
(107, 1)
(46, 1)
Step 6: Train the algorithm
from sklearn.linear_model import LinearRegression
# Regression model
regressor = LinearRegression()
# Fit the model to the training data
regressor.fit(x_train, y_train)
LinearRegression()
Step 7: Have a look at the coefficients the model has chosen
import pandas as pd
v = pd.DataFrame(regressor.coef_, index=['Coefficient']).transpose()
w = pd.DataFrame(x.columns, columns=['attribute'])
Step 8: Concatenating the DataFrames to compare
coeff_d2020 = pd.concat([w,v], axis=1, join='inner')
coeff_d2020
 | attribute | Coefficient
---|---|---
0 | GDP per capita | 1.166126e-01 |
1 | Social support | 2.181179e+00 |
2 | Healthy life expectancy | 1.063904e+05 |
3 | Freedom | 2.092947e+00 |
4 | Generosity | 3.345929e-01 |
5 | Perceptions of corruption | -9.598309e-01 |
6 | Health | -2.955160e+06 |
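The extremely large, opposite-signed coefficients on 'Healthy life expectancy' and 'Health' are a red flag: 'Health' is just the renamed 'Explained by: Healthy life expectancy' column, so the two predictors are nearly collinear and the fit spreads huge offsetting weights across them. A quick check (a sketch):
# Nearly collinear predictors inflate each other's coefficients
d2020[['Healthy life expectancy', 'Health']].corr()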
Step 9: Compare the predicted values to the actual values
import numpy as np
y_pred = regressor.predict(x_test)
y_pred = pd.DataFrame(y_pred, columns=['Predictions'])
y_pred.head(3)
 | Predictions
---|---|
0 | 5.395633 |
1 | 5.862217 |
2 | 5.259823 |
from sklearn import metrics
import numpy as np
print('Mean Absolute Error:', metrics.mean_absolute_error(y_test, y_pred))
print('Mean Squared Error:', metrics.mean_squared_error(y_test, y_pred))
print('Root Mean Squared Error:', np.sqrt(metrics.mean_squared_error(y_test, y_pred)))
Mean Absolute Error: 0.3962943704844692
Mean Squared Error: 0.2837987519155517
Root Mean Squared Error: 0.5327276526664931
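Beyond the raw error metrics, one could also report the test-set R-squared (a sketch):
from sklearn.metrics import r2_score
# Proportion of variance in the test targets explained by the predictions
print('R-squared:', r2_score(y_test, y_pred))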
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.sandbox.regression.predstd import wls_prediction_std
Fit and summary:
# No constant term is added here (sm.add_constant), so statsmodels reports the uncentered R-squared
model = sm.OLS(y, x)
results = model.fit()
print(results.summary())
OLS Regression Results
=======================================================================================
Dep. Variable: Ladder score R-squared (uncentered): 0.990
Model: OLS Adj. R-squared (uncentered): 0.990
Method: Least Squares F-statistic: 2082.
Date: Sat, 14 Nov 2020 Prob (F-statistic): 7.81e-143
Time: 22:51:37 Log-Likelihood: -127.33
No. Observations: 153 AIC: 268.7
Df Residuals: 146 BIC: 289.9
Df Model: 7
Covariance Type: nonrobust
=============================================================================================
coef std err t P>|t| [0.025 0.975]
---------------------------------------------------------------------------------------------
GDP per capita 0.2291 0.082 2.791 0.006 0.067 0.391
Social support 2.7233 0.661 4.119 0.000 1.417 4.030
Healthy life expectancy -0.0103 0.015 -0.689 0.492 -0.040 0.019
Freedom 1.7768 0.498 3.571 0.000 0.794 2.760
Generosity 0.4106 0.337 1.218 0.225 -0.256 1.077
Perceptions of corruption -0.6282 0.315 -1.995 0.048 -1.250 -0.006
Health 1.2655 0.393 3.219 0.002 0.488 2.043
==============================================================================
Omnibus: 7.971 Durbin-Watson: 1.482
Prob(Omnibus): 0.019 Jarque-Bera (JB): 7.701
Skew: -0.503 Prob(JB): 0.0213
Kurtosis: 3.441 Cond. No. 1.02e+03
==============================================================================
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.02e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
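The large condition number echoes the multicollinearity already visible in the sklearn coefficients. One common way to probe it (a sketch) is the variance inflation factor, where values above about 10 are conventionally taken as problematic:
from statsmodels.stats.outliers_influence import variance_inflation_factor
# Compute a VIF for each predictor column in x
vif = pd.DataFrame({'feature': x.columns,
                    'VIF': [variance_inflation_factor(x.values, i) for i in range(x.shape[1])]})
print(vif)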
Testing the model
Note that this refits a separate OLS on the 46 test observations rather than scoring the training fit on them, so the two summaries describe two different fits.
tmodel = sm.OLS(y_test, x_test)
resultt = tmodel.fit()
print(resultt.summary())
OLS Regression Results
=======================================================================================
Dep. Variable: Ladder score R-squared (uncentered): 0.994
Model: OLS Adj. R-squared (uncentered): 0.993
Method: Least Squares F-statistic: 938.9
Date: Sat, 14 Nov 2020 Prob (F-statistic): 2.09e-41
Time: 22:51:41 Log-Likelihood: -26.099
No. Observations: 46 AIC: 66.20
Df Residuals: 39 BIC: 79.00
Df Model: 7
Covariance Type: nonrobust
=============================================================================================
coef std err t P>|t| [0.025 0.975]
---------------------------------------------------------------------------------------------
GDP per capita 0.3581 0.114 3.150 0.003 0.128 0.588
Social support 3.6510 0.952 3.834 0.000 1.725 5.577
Healthy life expectancy -0.0550 0.022 -2.555 0.015 -0.099 -0.011
Freedom 1.9445 0.778 2.499 0.017 0.370 3.519
Generosity 0.2116 0.614 0.345 0.732 -1.030 1.453
Perceptions of corruption -0.2617 0.480 -0.546 0.588 -1.232 0.709
Health 2.0042 0.687 2.915 0.006 0.614 3.395
==============================================================================
Omnibus: 3.039 Durbin-Watson: 1.771
Prob(Omnibus): 0.219 Jarque-Bera (JB): 2.374
Skew: 0.117 Prob(JB): 0.305
Kurtosis: 4.088 Cond. No. 1.02e+03
==============================================================================
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.02e+03. This might indicate that there are
strong multicollinearity or other numerical problems.