Christina Campbell

Final Tutorial: What is the Impact of Land Use and Forestry on GHG Emissions in the US?

Import the datasets

For my tutorial, I am interested in exploring Climate Change data. With the recent Climate Change UN Summit, the severity and pressing need for change has never been more apparent. I want to explore some of the datasets that are available and see how they coincide with or challenge the information that is being displayed to the public.

The first data set that I am exploring displays the total GHG emissions produced by state, and also breaks down these emissions by industry. The data set also provides information on each state, such as their total population, GDP, and total energy use. I found this dataset on the World Resources Institute(http://datasets.wri.org), developed by CAIT (Climate Analysis Indicators Tool). With this data set, I want to examine what industry is having the largest impact on the GHG emissions of each region of the US and see if these results are expected or surprising.

In [523]:
import pandas as pd
from pandas import ExcelWriter
from pandas import ExcelFile
%matplotlib inline
import numpy as np
import pandas as pd
data=pd.read_excel("/Users/christinacampbell/Downloads/cait2.0u.s.statesghgemissions-alldata.xlsx",sheet_name="State GHG Emissions")
data.columns=['State',"Year","Total GHG Emissions Excluding LUCF (MtCO2e)","Total GHG Emissions Including LUCF (MtCO2e","Total CO2 (excluding LUCF) (MtCO2e)","Total CH4 (MtCO2e)","Total N2O (MtCO2e)","Total F-Gas (MtCO2e)","Energy (MtCO2e)","Industrial Processes (MtCO2e)","Agriculture (MtCO2e)","Waste (MtCO2e)","Land Use and Forestry (MtCO2e)","Bunker Fuels (MtCO2e)","Electric Power (MtCO2e)","Commercial (MtCO2e)","Residential (MtCO2e","Industrial (MtCO2e","Transportation (MtCO2e)","Fugitive Emissions (MtCO2e)","State GDP (Million US$ (chained 1997/2005))","Population (People)","Total Energy Use (Thous. tonnes oil eq. (ktoe))"]
data=pd.DataFrame(data,columns=['State',"Year","Total GHG Emissions Excluding LUCF (MtCO2e)","Total GHG Emissions Including LUCF (MtCO2e","Total CO2 (excluding LUCF) (MtCO2e)","Total CH4 (MtCO2e)","Total N2O (MtCO2e)","Total F-Gas (MtCO2e)","Energy (MtCO2e)","Industrial Processes (MtCO2e)","Agriculture (MtCO2e)","Waste (MtCO2e)","Land Use and Forestry (MtCO2e)","Bunker Fuels (MtCO2e)","Electric Power (MtCO2e)","Commercial (MtCO2e)","Residential (MtCO2e","Industrial (MtCO2e","Transportation (MtCO2e)","Fugitive Emissions (MtCO2e)","State GDP (Million US$ (chained 1997/2005))","Population (People)","Total Energy Use (Thous. tonnes oil eq. (ktoe))"])
data=data.drop(data.index[0])
data=data.drop(data.index[1])
data= data.drop(data.index[0]).reset_index()
data.head()
Out[523]:
index State Year Total GHG Emissions Excluding LUCF (MtCO2e) Total GHG Emissions Including LUCF (MtCO2e Total CO2 (excluding LUCF) (MtCO2e) Total CH4 (MtCO2e) Total N2O (MtCO2e) Total F-Gas (MtCO2e) Energy (MtCO2e) ... Bunker Fuels (MtCO2e) Electric Power (MtCO2e) Commercial (MtCO2e) Residential (MtCO2e Industrial (MtCO2e Transportation (MtCO2e) Fugitive Emissions (MtCO2e) State GDP (Million US$ (chained 1997/2005)) Population (People) Total Energy Use (Thous. tonnes oil eq. (ktoe))
0 3 Alabama 1990 137.889 81.3057 112.129 20.3764 4.78913 0.594547 125.397 ... 0 50.5856 2.45053 3.21186 25.4329 29.575 14.1415 83766 4050055 41971
1 4 Alaska 1990 45.4933 14.1738 35.8776 9.30383 0.269331 0.0425567 44.024 ... 0 2.61091 2.20804 1.60736 15.8435 12.2986 9.45562 28772 553290 14718.6
2 5 Arizona 1990 70.3445 77.2725 63.2755 3.45763 2.81589 0.795469 64.5135 ... 0 32.6838 1.91184 1.89047 3.88477 24.0605 0.0821269 81606 3684097 23661.8
3 6 Arkansas 1990 66.9749 21.0251 52.1941 7.73925 6.7694 0.272173 52.9544 ... 0 21.699 1.628 2.55221 9.47997 16.8803 0.714961 44496 2356586 21569.3
4 7 California 1990 420.741 391.51 368.596 30.7299 18.7054 2.70968 378.354 ... 0 40.3034 19.0688 30.0707 71.9105 212.485 4.5154 906103 29959515 187616

5 rows × 24 columns

In [524]:
data.shape
Out[524]:
(1144, 24)

The second data set that I am considering is also from the World Resources Institute, and was developed by CAIT. This data set shows GHG emissions (with and without land-use change and forestry), as well as fossil fuel emissions for almost every country. This dataset also includes data on each countries GDP and population. The unique aspect of this data set is that it includes predictive data, that extends all the way into 2050. The dataset predicts not only where GHG emissions will be if each country continues down their current path, but also predicts where the country would be if they followed a policy scenario to change their behavior. The policies that are chosen are described in the dataset, and are based off of proposed policies within the countries.

I will use this dataset to compare the GHG emissions of each region to the total GHG emissions of the US. I will first import the data, then I will drop the unessary columns and change the column names approriately.

In [525]:
data2=pd.read_excel("/Users/christinacampbell/Downloads/caitprojectionsalldata4-9-150 (1).xlsx",sheet_name="GHG Emissions Data")
data2=data2.drop(['Unnamed: 1','Unnamed: 2','Unnamed: 6','Unnamed: 7','Unnamed: 8','Unnamed: 9'],axis=1)
data2.columns=['region','Year','Total GHG Emissions Excluding LUCF (MtCO2e)','Total GHG Emissions Including LUCF (MtCO2e']
data2=pd.DataFrame(data2,columns=['region','Year','Total GHG Emissions Including LUCF (MtCO2e','Total GHG Emissions Excluding LUCF (MtCO2e)'])
data2=data2.drop([0])
In [526]:
data2.head()
Out[526]:
region Year Total GHG Emissions Including LUCF (MtCO2e Total GHG Emissions Excluding LUCF (MtCO2e)
1 Australia 1990 416.1 516.74
2 Australia 1995 439.48 470.59
3 Australia 2000 494.27 529.64
4 Australia 2005 527.76 572.43
5 Australia 2010 542.69 558.49

Now that I have the data in the format that I want, I wil drop all data that is not for the US, and all values of NA.

In [527]:
data2=data2[data2['region']=='United States']
In [528]:
data2=data2.dropna()
In [529]:
data2
Out[529]:
region Year Total GHG Emissions Including LUCF (MtCO2e Total GHG Emissions Excluding LUCF (MtCO2e)
104 United States 2000 7076 6395
105 United States 2005 7195 6197
106 United States 2010 6812 5923
107 United States 2011 6702 5797
108 United States 2015 6643 5759
109 United States 2020 6815 5918
110 United States 2025 6967 6050
111 United States 2030 7041 6104

Note that in both of these datasets, there are recorded GHG Emissions Including and Excluding LUCF, and that the values including LUCF are less than the values of Excluding LUCF in both datasets. This is because LUCF involves more plants and trees, which naturally process CO2. Thus, LUCF decreases the GHG Emissions. I want to explore more on just how much of an impact LUCF has on each region of the US, and the US as a whole.

Process and Tidy Dataset 1

For my tutorial, I will be focusing primarily on comparing the emmissions from different industries in different regions, but I want to keep this information more generalized in order to see the bigger picture. Thus, I can drop the columns that show data about sub-sectors.

In [530]:
data=data.drop(['Electric Power (MtCO2e)','Commercial (MtCO2e)','Residential (MtCO2e','Industrial (MtCO2e','Transportation (MtCO2e)','Fugitive Emissions (MtCO2e)'],axis=1)
data.head()
Out[530]:
index State Year Total GHG Emissions Excluding LUCF (MtCO2e) Total GHG Emissions Including LUCF (MtCO2e Total CO2 (excluding LUCF) (MtCO2e) Total CH4 (MtCO2e) Total N2O (MtCO2e) Total F-Gas (MtCO2e) Energy (MtCO2e) Industrial Processes (MtCO2e) Agriculture (MtCO2e) Waste (MtCO2e) Land Use and Forestry (MtCO2e) Bunker Fuels (MtCO2e) State GDP (Million US$ (chained 1997/2005)) Population (People) Total Energy Use (Thous. tonnes oil eq. (ktoe))
0 3 Alabama 1990 137.889 81.3057 112.129 20.3764 4.78913 0.594547 125.397 3.51822 5.6312 3.34175 -56.5829 0 83766 4050055 41971
1 4 Alaska 1990 45.4933 14.1738 35.8776 9.30383 0.269331 0.0425567 44.024 1.09924 0.0462487 0.323831 -31.3195 0 28772 553290 14718.6
2 5 Arizona 1990 70.3445 77.2725 63.2755 3.45763 2.81589 0.795469 64.5135 1.12535 3.38128 1.32434 6.92798 0 81606 3684097 23661.8
3 6 Arkansas 1990 66.9749 21.0251 52.1941 7.73925 6.7694 0.272173 52.9544 1.18757 11.2791 1.5538 -45.9498 0 44496 2356586 21569.3
4 7 California 1990 420.741 391.51 368.596 30.7299 18.7054 2.70968 378.354 7.88305 21.4835 13.0199 -29.2304 0 906103 29959515 187616

I will now make the data more tidy, by combining the data that expresses the specific sector emmissions, and the specific gas type emmissions into one column. I will do this using the melt method.

In [531]:
id_vars=['index','State','Year','Total GHG Emissions Excluding LUCF (MtCO2e)','Total GHG Emissions Including LUCF (MtCO2e','Total CO2 (excluding LUCF) (MtCO2e)','Total CH4 (MtCO2e)','Total N2O (MtCO2e)','Total F-Gas (MtCO2e)',
         'State GDP (Million US$ (chained 1997/2005))','Population (People)','Total Energy Use (Thous. tonnes oil eq. (ktoe))'
]
data=pd.melt(frame=data,
             id_vars=id_vars,
             var_name="sector",
            value_name='MtCO2e emmisions by sector')
data.head()
Out[531]:
index State Year Total GHG Emissions Excluding LUCF (MtCO2e) Total GHG Emissions Including LUCF (MtCO2e Total CO2 (excluding LUCF) (MtCO2e) Total CH4 (MtCO2e) Total N2O (MtCO2e) Total F-Gas (MtCO2e) State GDP (Million US$ (chained 1997/2005)) Population (People) Total Energy Use (Thous. tonnes oil eq. (ktoe)) sector MtCO2e emmisions by sector
0 3 Alabama 1990 137.889 81.3057 112.129 20.3764 4.78913 0.594547 83766 4050055 41971 Energy (MtCO2e) 125.397
1 4 Alaska 1990 45.4933 14.1738 35.8776 9.30383 0.269331 0.0425567 28772 553290 14718.6 Energy (MtCO2e) 44.024
2 5 Arizona 1990 70.3445 77.2725 63.2755 3.45763 2.81589 0.795469 81606 3684097 23661.8 Energy (MtCO2e) 64.5135
3 6 Arkansas 1990 66.9749 21.0251 52.1941 7.73925 6.7694 0.272173 44496 2356586 21569.3 Energy (MtCO2e) 52.9544
4 7 California 1990 420.741 391.51 368.596 30.7299 18.7054 2.70968 906103 29959515 187616 Energy (MtCO2e) 378.354
In [532]:
id_vars2=['index','State','Year','Total GHG Emissions Excluding LUCF (MtCO2e)','Total GHG Emissions Including LUCF (MtCO2e','sector','MtCO2e emmisions by sector',
         'State GDP (Million US$ (chained 1997/2005))','Population (People)','Total Energy Use (Thous. tonnes oil eq. (ktoe))']
data=pd.melt(frame=data,
             id_vars=id_vars2,
             var_name="gas",
            value_name='total emmisions by gas')
In [533]:
data.head()
Out[533]:
index State Year Total GHG Emissions Excluding LUCF (MtCO2e) Total GHG Emissions Including LUCF (MtCO2e sector MtCO2e emmisions by sector State GDP (Million US$ (chained 1997/2005)) Population (People) Total Energy Use (Thous. tonnes oil eq. (ktoe)) gas total emmisions by gas
0 3 Alabama 1990 137.889 81.3057 Energy (MtCO2e) 125.397 83766 4050055 41971 Total CO2 (excluding LUCF) (MtCO2e) 112.129
1 4 Alaska 1990 45.4933 14.1738 Energy (MtCO2e) 44.024 28772 553290 14718.6 Total CO2 (excluding LUCF) (MtCO2e) 35.8776
2 5 Arizona 1990 70.3445 77.2725 Energy (MtCO2e) 64.5135 81606 3684097 23661.8 Total CO2 (excluding LUCF) (MtCO2e) 63.2755
3 6 Arkansas 1990 66.9749 21.0251 Energy (MtCO2e) 52.9544 44496 2356586 21569.3 Total CO2 (excluding LUCF) (MtCO2e) 52.1941
4 7 California 1990 420.741 391.51 Energy (MtCO2e) 378.354 906103 29959515 187616 Total CO2 (excluding LUCF) (MtCO2e) 368.596

Next, I will convert the data types of the columns appropriatly.

In [534]:
data['Total GHG Emissions Excluding LUCF (MtCO2e)']=pd.to_numeric(data['Total GHG Emissions Excluding LUCF (MtCO2e)'])
data['Total GHG Emissions Including LUCF (MtCO2e']=pd.to_numeric(data['Total GHG Emissions Including LUCF (MtCO2e'])
data['MtCO2e emmisions by sector']=pd.to_numeric(data['MtCO2e emmisions by sector'])
data['total emmisions by gas']=pd.to_numeric(data['total emmisions by gas'])
data['State GDP (Million US$ (chained 1997/2005))']=pd.to_numeric(data['State GDP (Million US$ (chained 1997/2005))'])
data['Population (People)']=pd.to_numeric(data['Population (People)'])
data['Total Energy Use (Thous. tonnes oil eq. (ktoe))']=pd.to_numeric(data['Total Energy Use (Thous. tonnes oil eq. (ktoe))'])
data['Year']=pd.to_numeric(data['Year'])
data=data.drop('index',axis=1)

Turn the State Data into Region Data

Instead of analyzing the data by state, we want to analyze it by regions. in order to do this, I will separate and group the data into regions based off of their states. I created a separate dataframe for each region, and then recombined them once I create and assign a column called "region".

In [535]:
Northeast=data[(data['State']=="Main") |(data['State']=="New York" )| (data['State']== "New Jersey") | (data['State']=="Vermont" )| (data['State']=="Massachusets") | (data['State']=="Rhode Island") | (data['State']=="Conneticut" )| (data['State']=="New Hampshire") | (data['State']=="Pennsylvania")|(data['State']=="Maryland")] .reset_index()
Northeast['region']='northeast'
Northeast=Northeast.drop(['State'],axis=1)
Northeast.head()
Out[535]:
index Year Total GHG Emissions Excluding LUCF (MtCO2e) Total GHG Emissions Including LUCF (MtCO2e sector MtCO2e emmisions by sector State GDP (Million US$ (chained 1997/2005)) Population (People) Total Energy Use (Thous. tonnes oil eq. (ktoe)) gas total emmisions by gas region
0 20 1990 79.905796 77.408280 Energy (MtCO2e) 72.485691 136165 4799770 31977.45708 Total CO2 (excluding LUCF) (MtCO2e) 71.931167 northeast
1 29 1990 15.737613 10.902850 Energy (MtCO2e) 15.021075 27259 1112384 6739.75292 Total CO2 (excluding LUCF) (MtCO2e) 14.681697 northeast
2 30 1990 120.199795 109.795573 Energy (MtCO2e) 113.397511 253360 7762963 56593.73967 Total CO2 (excluding LUCF) (MtCO2e) 111.462115 northeast
3 32 1990 233.803453 220.332967 Energy (MtCO2e) 212.715571 595908 18020784 94132.16384 Total CO2 (excluding LUCF) (MtCO2e) 209.793924 northeast
4 38 1990 298.480702 275.367345 Energy (MtCO2e) 281.150013 288955 11903299 91627.90546 Total CO2 (excluding LUCF) (MtCO2e) 268.946929 northeast
In [536]:
Southeast=data[(data['State']=="Alabama") |(data['State']=="Florida" )| (data['State']== "Georgia") | (data['State']=="Kentucky" )| (data['State']=="Mississippi") | (data['State']=="North Carolina") | (data['State']=="South Carolina" )| (data['State']=="Tennessee") |(data['State']=="Virginia")|(data['State']=="West Virginia")|(data['State']=="Arkansas")|(data['State']=="Louisiana")|(data['State']=="Delaware")] .reset_index()
Southeast['region']='southeast'
Southeast=Southeast.drop(['State'],axis=1)
Southeast.head()
Out[536]:
index Year Total GHG Emissions Excluding LUCF (MtCO2e) Total GHG Emissions Including LUCF (MtCO2e sector MtCO2e emmisions by sector State GDP (Million US$ (chained 1997/2005)) Population (People) Total Energy Use (Thous. tonnes oil eq. (ktoe)) gas total emmisions by gas region
0 0 1990 137.888617 81.305733 Energy (MtCO2e) 125.397445 83766 4050055 41970.979410 Total CO2 (excluding LUCF) (MtCO2e) 112.128509 southeast
1 3 1990 66.974876 21.025058 Energy (MtCO2e) 52.954443 44496 2356586 21569.300910 Total CO2 (excluding LUCF) (MtCO2e) 52.194051 southeast
2 7 1990 19.042026 19.320112 Energy (MtCO2e) 17.986728 25887 669567 6437.055599 Total CO2 (excluding LUCF) (MtCO2e) 17.844128 southeast
3 9 1990 210.033368 219.555860 Energy (MtCO2e) 191.339682 304324 13033307 82688.053050 Total CO2 (excluding LUCF) (MtCO2e) 189.418217 southeast
4 10 1990 151.334129 108.539070 Energy (MtCO2e) 141.517374 165062 6512602 56187.598080 Total CO2 (excluding LUCF) (MtCO2e) 139.392909 southeast
In [537]:
Northwest=data[(data['State']=="Oregon") |(data['State']=="Washington" )| (data['State']== "Idaho") | (data['State']=="Wyoming" )| (data['State']=="Montana")] .reset_index()
Northwest['region']='northwest'
Northwest=Northwest.drop(['State'],axis=1)
Northwest.head()
Out[537]:
index Year Total GHG Emissions Excluding LUCF (MtCO2e) Total GHG Emissions Including LUCF (MtCO2e sector MtCO2e emmisions by sector State GDP (Million US$ (chained 1997/2005)) Population (People) Total Energy Use (Thous. tonnes oil eq. (ktoe)) gas total emmisions by gas region
0 12 1990 18.915797 -16.207316 Energy (MtCO2e) 11.816223 20161 1012384 10216.261390 Total CO2 (excluding LUCF) (MtCO2e) 11.772727 northwest
1 26 1990 37.961805 -31.020367 Energy (MtCO2e) 30.340779 15355 800204 8758.138255 Total CO2 (excluding LUCF) (MtCO2e) 28.695920 northwest
2 37 1990 39.449523 -48.112014 Energy (MtCO2e) 32.207869 64881 2860375 24650.226420 Total CO2 (excluding LUCF) (MtCO2e) 31.282144 northwest
3 47 1990 85.396670 43.970095 Energy (MtCO2e) 74.228496 141946 4903043 51571.867120 Total CO2 (excluding LUCF) (MtCO2e) 72.540246 northwest
4 50 1990 72.035105 57.527740 Energy (MtCO2e) 69.728641 13673 453690 10065.492320 Total CO2 (excluding LUCF) (MtCO2e) 61.405931 northwest
In [538]:
Midwest=data[(data['State']=="Illinois") |(data['State']=="Indiana" )| (data['State']== "Iowa") | (data['State']=="Kansas" )| (data['State']=="Michigan")|(data['State']=="Minnesota")|(data['State']=="Missouri")|(data['State']=="North Dakota")|(data['State']=="South Dekota")|(data['State']=="Ohio")|(data['State']=="Wisconsin")] .reset_index()
Midwest['region']='midwest'
Midwest=Midwest.drop(['State'],axis=1)
Midwest.head()
Out[538]:
index Year Total GHG Emissions Excluding LUCF (MtCO2e) Total GHG Emissions Including LUCF (MtCO2e sector MtCO2e emmisions by sector State GDP (Million US$ (chained 1997/2005)) Population (People) Total Energy Use (Thous. tonnes oil eq. (ktoe)) gas total emmisions by gas region
0 13 1990 234.264978 224.283614 Energy (MtCO2e) 201.291462 324012 11453316 90244.07068 Total CO2 (excluding LUCF) (MtCO2e) 195.496541 midwest
1 14 1990 227.977405 214.146802 Energy (MtCO2e) 208.930041 128197 5557798 62807.60261 Total CO2 (excluding LUCF) (MtCO2e) 207.404311 midwest
2 15 1990 96.491244 88.053097 Energy (MtCO2e) 65.009762 63460 2781018 23835.54929 Total CO2 (excluding LUCF) (MtCO2e) 65.607835 midwest
3 16 1990 96.087576 85.876650 Energy (MtCO2e) 74.657781 60672 2481349 26874.24030 Total CO2 (excluding LUCF) (MtCO2e) 72.081673 midwest
4 22 1990 202.554763 148.267445 Energy (MtCO2e) 183.520225 229534 9311319 72426.38203 Total CO2 (excluding LUCF) (MtCO2e) 181.947154 midwest
In [539]:
Southwest=data[(data['State']=="New Mexico")|(data['State']=="Arizona")|(data['State']=="Texas")|(data['State']=="Oklahoma")] .reset_index()
Southwest['region']='southwest'
Southwest=Southwest.drop(['State'],axis=1)
Southwest.head()
Out[539]:
index Year Total GHG Emissions Excluding LUCF (MtCO2e) Total GHG Emissions Including LUCF (MtCO2e sector MtCO2e emmisions by sector State GDP (Million US$ (chained 1997/2005)) Population (People) Total Energy Use (Thous. tonnes oil eq. (ktoe)) gas total emmisions by gas region
0 2 1990 70.344474 77.272454 Energy (MtCO2e) 64.513508 81606 3684097 23661.77300 Total CO2 (excluding LUCF) (MtCO2e) 63.275484 southwest
1 31 1990 62.767553 57.307337 Energy (MtCO2e) 57.713747 28714 1521574 15030.26262 Total CO2 (excluding LUCF) (MtCO2e) 53.021372 southwest
2 36 1990 115.379625 106.404779 Energy (MtCO2e) 96.481862 67083 3148825 34847.15973 Total CO2 (excluding LUCF) (MtCO2e) 91.800397 southwest
3 43 1990 691.978385 676.913922 Energy (MtCO2e) 633.980408 437292 17056755 259690.58890 Total CO2 (excluding LUCF) (MtCO2e) 612.285944 southwest
4 54 1991 71.754660 78.681823 Energy (MtCO2e) 65.373325 81946 3788576 23807.09896 Total CO2 (excluding LUCF) (MtCO2e) 64.642897 southwest
In [540]:
region_data=Northeast.merge(Southeast, how='outer')
region_data=region_data.merge(Northwest, how='outer')
region_data=region_data.merge(Midwest, how='outer')
region_data=region_data.merge(Southwest, how='outer')
region_data=region_data.drop(['index'],axis=1)
In [541]:
region_data.shape
Out[541]:
(20592, 11)

Now, I want to aggregate the data of the numerical columns, so that there is one row for each combination of region, year, sector, and gas emmission type. I will use the sum aggregation function in order to get the total emissions for each region.

In [542]:
region_data=region_data.groupby(['region','Year','sector','gas']).agg({'Total GHG Emissions Excluding LUCF (MtCO2e)':'sum','Total GHG Emissions Including LUCF (MtCO2e':'sum','MtCO2e emmisions by sector':'sum','State GDP (Million US$ (chained 1997/2005))':'sum','Population (People)':'sum','Total Energy Use (Thous. tonnes oil eq. (ktoe))':'sum','total emmisions by gas':'sum'})
region_data=region_data.reset_index()
region_data.head()
Out[542]:
region Year sector gas Total GHG Emissions Excluding LUCF (MtCO2e) Total GHG Emissions Including LUCF (MtCO2e MtCO2e emmisions by sector State GDP (Million US$ (chained 1997/2005)) Population (People) Total Energy Use (Thous. tonnes oil eq. (ktoe)) total emmisions by gas
0 midwest 1990 Agriculture (MtCO2e) Total CH4 (MtCO2e) 1521.534389 1377.340495 146.50876 1441985 57509946 489191.907914 115.472189
1 midwest 1990 Agriculture (MtCO2e) Total CO2 (excluding LUCF) (MtCO2e) 1521.534389 1377.340495 146.50876 1441985 57509946 489191.907914 1283.189128
2 midwest 1990 Agriculture (MtCO2e) Total F-Gas (MtCO2e) 1521.534389 1377.340495 146.50876 1441985 57509946 489191.907914 7.603820
3 midwest 1990 Agriculture (MtCO2e) Total N2O (MtCO2e) 1521.534389 1377.340495 146.50876 1441985 57509946 489191.907914 115.269251
4 midwest 1990 Bunker Fuels (MtCO2e) Total CH4 (MtCO2e) 1521.534389 1377.340495 0.00000 1441985 57509946 489191.907914 115.472189

Analysis

Question 1: What is the impact of LUCF on each region?

First, I want to compare the emissions with and without LUCF in each region.

In [543]:
fig, ax = plt.subplots(1, 5, figsize=(30,10))
groups=region_data.groupby('region')
groups.get_group('midwest').plot.line(x='Year',y='Total GHG Emissions Excluding LUCF (MtCO2e)',ax=ax[0], label='Excuding',title='midwest')
groups.get_group('midwest').plot.line(x='Year',y='Total GHG Emissions Including LUCF (MtCO2e',ax=ax[0], label='Including')
groups.get_group('northeast').plot.line(x='Year',y='Total GHG Emissions Excluding LUCF (MtCO2e)',ax=ax[1], label='Excuding',title='northeast')
groups.get_group('northeast').plot.line(x='Year',y='Total GHG Emissions Including LUCF (MtCO2e',ax=ax[1], label='Including')
groups.get_group('southeast').plot.line(x='Year',y='Total GHG Emissions Excluding LUCF (MtCO2e)',ax=ax[2], label='Excuding',title='southeast')
groups.get_group('southeast').plot.line(x='Year',y='Total GHG Emissions Including LUCF (MtCO2e',ax=ax[2], label='Including')
groups.get_group('northwest').plot.line(x='Year',y='Total GHG Emissions Excluding LUCF (MtCO2e)',ax=ax[3], label='Excuding',title='northwest')
groups.get_group('northwest').plot.line(x='Year',y='Total GHG Emissions Including LUCF (MtCO2e',ax=ax[3], label='Including')
groups.get_group('southwest').plot.line(x='Year',y='Total GHG Emissions Excluding LUCF (MtCO2e)',ax=ax[4], label='Excuding',title='southwest')
groups.get_group('southwest').plot.line(x='Year',y='Total GHG Emissions Including LUCF (MtCO2e',ax=ax[4], label='Including')
Out[543]:
<matplotlib.axes._subplots.AxesSubplot at 0x12c287160>

As you can see, in each region the GHG emissions recordings including LUCF are significantly less than the recordings of GHG emissions without LUCF. This is as expected, since we saw this trend in the data and due to the consumption of CO2 by plants.

Next, I want to compare the GHG emissions of the regions to each other.

The line graph below shows a comparison of the Total GHG Emissions Including LUCF of the 5 different regions.

In [544]:
import matplotlib.pyplot as plt
ax=plt.gca()
groups=region_data.groupby('region')
groups.get_group('midwest').plot.line(x='Year',y='Total GHG Emissions Excluding LUCF (MtCO2e)',ax=ax, label='midwest')
groups.get_group('northeast').plot.line(x='Year',y='Total GHG Emissions Excluding LUCF (MtCO2e)',ax=ax,label='northeast')
groups.get_group('southeast').plot.line(x='Year',y='Total GHG Emissions Excluding LUCF (MtCO2e)',ax=ax,label='southeast')
groups.get_group('northwest').plot.line(x='Year',y='Total GHG Emissions Excluding LUCF (MtCO2e)',ax=ax, label='northwest')
groups.get_group('southwest').plot.line(x='Year',y='Total GHG Emissions Excluding LUCF (MtCO2e)',ax=ax,label='southwest',title='Total GHG Emissions Excluding LUCF of the 5 different regions')
Out[544]:
<matplotlib.axes._subplots.AxesSubplot at 0x1351cb6d8>

The line graph below shows a comparison of the Total GHG Emissions Including LUCF of the 5 different regions.

In [545]:
ax=plt.gca()
groups=region_data.groupby('region')
groups.get_group('midwest').plot.line(x='Year',y='Total GHG Emissions Including LUCF (MtCO2e',ax=ax, label='midwest')
groups.get_group('northeast').plot.line(x='Year',y='Total GHG Emissions Including LUCF (MtCO2e',ax=ax,label='northeast')
groups.get_group('southeast').plot.line(x='Year',y='Total GHG Emissions Including LUCF (MtCO2e',ax=ax,label='southeast')
groups.get_group('northwest').plot.line(x='Year',y='Total GHG Emissions Including LUCF (MtCO2e',ax=ax, label='northwest')
groups.get_group('southwest').plot.line(x='Year',y='Total GHG Emissions Including LUCF (MtCO2e',ax=ax,label='southwest',title='Total GHG Emissions Including LUCF of the 5 different regions')



    
Out[545]:
<matplotlib.axes._subplots.AxesSubplot at 0x12c2b8da0>

As you can see from these two graphs, although LUCF does help reduce the GHG emissions in each region, LUCF does not seem to change the order of regions from most GHG emissions to least. It appears that the impact of LUCF is not large enough in any specific region to completely change the ordering.

Question 2: What industries are contributing the most to the GHG emissions in each region?

Now, that we have done some analysis on the total GHG emissions in each region, I want to evaluate what industry sectors are having the largest effect on each region.

In [546]:
region_Year_sector=region_data.drop(["gas","Total GHG Emissions Excluding LUCF (MtCO2e)","Total GHG Emissions Including LUCF (MtCO2e","State GDP (Million US$ (chained 1997/2005))","Population (People)","Total Energy Use (Thous. tonnes oil eq. (ktoe))","total emmisions by gas"],axis=1)
region_Year_sector=region_Year_sector.groupby(['region','sector']).agg({'MtCO2e emmisions by sector':'mean'}).reset_index()
region_Year_sector.head()
Out[546]:
region sector MtCO2e emmisions by sector
0 midwest Agriculture (MtCO2e) 159.680978
1 midwest Bunker Fuels (MtCO2e) 0.000982
2 midwest Energy (MtCO2e) 1431.448447
3 midwest Industrial Processes (MtCO2e) 65.004013
4 midwest Land Use and Forestry (MtCO2e) -161.851542
In [547]:
region_Year_sector.pivot_table(index='region',columns='sector').plot.bar(stacked=True,figsize=(20,10), title='Sector Emissions by Region')
Out[547]:
<matplotlib.axes._subplots.AxesSubplot at 0x12c255d68>

As you can see, Land Use and Forestry is contributing negatively to the GHG Emissions. As we discussed above this is because where there is land and forests,there are trees which naturally consume CO2 out of the air (the main gas that contributes to Greenhouse Gases). Lets look at this same plot without Land Use and Forestry contributing, in order to accurately analyze what industry sector is contributing the most to the GHG emissions. By removing the LUCF data, we will be able to compare all of the regions from a starting point of 0.

It is also clear that Energy is substantially the largest source of the emissions in each region. Since we are trying to analyze the differences in each region, lets look at this same plot without Energy contributing as well. This will let us see what industries are uniquely having impacts in the different regions.

In [548]:
region_Year_sector[(region_Year_sector['sector']!= 'Land Use and Forestry (MtCO2e)') &(region_Year_sector['sector']!= 'Energy (MtCO2e)')].pivot_table(index='region',columns='sector').plot.bar(stacked=True,figsize=(20,10), title='Sector Emissions by Region without Energy and LUCF')
Out[548]:
<matplotlib.axes._subplots.AxesSubplot at 0x12f3dc240>

We can now more clearly see what industries are impacting each region. For the midwest, northwest, and southwest, the largest contributor is Agriculture while in the northeast Industrial processes is the main contributors. The southeast appears to have an almost even amount of emissions coming from Industrial processes and Agriculture. Something important to note is that the regions with a large contribution of emissions from Agriculture (midwest and southeast) have the higher amount of emissions in general. This is an idea we are going to explore further below.

Below is a different form of a graph in order to visualize the comparisons of emissions by industry sector in the different regions. This will allows us to see what industry is consistently producing the most emissions across the different regions.

In [549]:
region_data.groupby(['sector','region']).median().plot.bar(y='MtCO2e emmisions by sector',figsize=(20,10),fontsize=20,stacked=True)
Out[549]:
<matplotlib.axes._subplots.AxesSubplot at 0x128d54be0>

From the graphs above, you can tell that certain regions have a lot higher sector emmissions than others. You can also tell that the sector that is effecting each region the most varies. More interestingly, it appears that regions that are thought to have a larger population, such as the Northeast, appear to be producing substantially less GHG emissions than more sparsely populated regions such as the midwest. This is not a result that I would have expected.

Question 3: Is there a correlation between population and Total GHG emissions in each region?

I now want to see if there is a correlation between the Total GHG Emissions Excluding LUCF and total population, in order to determine if the population size has any relation to the GHG emissions in each region. I decided to use the data excluding LUCF in order to analyze all of the GHG emissions being produced, without some being processed by plants. My hypothesis is that there will be a strong positive correlation.

In [550]:
from scipy import stats 

fig, ax = plt.subplots(1, 5, figsize=(20,5))
regions= region_data.region.unique()
i=0
for r in regions:
    data=region_data[region_data.region==r]    
    data.plot.scatter(x='Population (People)', y='Total GHG Emissions Excluding LUCF (MtCO2e)', ax=ax[i],title=r)
    slope, intercept, r_value, p_value, std_err = stats.linregress(data['Population (People)'],data['Total GHG Emissions Excluding LUCF (MtCO2e)'])
    line = slope *data['Population (People)']  + intercept
    ax[i].plot(data['Population (People)'], line, lw=1, ls='--', color='red')
    i+=1

As you can see, there does appear to be a positive correlation between population and Total GHG Emissions Excluding LUCF, as I have expected. Interestingly, however, this correlation is higher in regions such as the midwest that have a smaller population than areas such as the northeast. Since there is a correlation, I want to normalize the GHG emissions data by population, in order to better compare the emissions in each region.

I will first make a new dataframe with the region data normalized, and then will provide some of the same graphs as above for interpretation and comparison.

In [551]:
 norm=region_data[['Total GHG Emissions Excluding LUCF (MtCO2e)','Total GHG Emissions Including LUCF (MtCO2e','MtCO2e emmisions by sector']].div(region_data['Population (People)'],axis=0)
region_data_norm=region_data[['region','Year','sector','gas','Population (People)']]
region_data_norm['Total GHG Emissions Excluding LUCF (MtCO2e)']=norm['Total GHG Emissions Excluding LUCF (MtCO2e)']
region_data_norm['Total GHG Emissions Including LUCF (MtCO2e']=norm['Total GHG Emissions Including LUCF (MtCO2e']
region_data_norm['MtCO2e emmisions by sector']=norm['MtCO2e emmisions by sector']
region_data_norm.head()
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/ipykernel_launcher.py:3: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  This is separate from the ipykernel package so we can avoid doing imports until
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/ipykernel_launcher.py:4: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  after removing the cwd from sys.path.
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/ipykernel_launcher.py:5: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  """
Out[551]:
region Year sector gas Population (People) Total GHG Emissions Excluding LUCF (MtCO2e) Total GHG Emissions Including LUCF (MtCO2e MtCO2e emmisions by sector
0 midwest 1990 Agriculture (MtCO2e) Total CH4 (MtCO2e) 57509946 0.000026 0.000024 0.000003
1 midwest 1990 Agriculture (MtCO2e) Total CO2 (excluding LUCF) (MtCO2e) 57509946 0.000026 0.000024 0.000003
2 midwest 1990 Agriculture (MtCO2e) Total F-Gas (MtCO2e) 57509946 0.000026 0.000024 0.000003
3 midwest 1990 Agriculture (MtCO2e) Total N2O (MtCO2e) 57509946 0.000026 0.000024 0.000003
4 midwest 1990 Bunker Fuels (MtCO2e) Total CH4 (MtCO2e) 57509946 0.000026 0.000024 0.000000
In [552]:
ax=plt.gca()
groups=region_data_norm.groupby('region')
groups.get_group('midwest').plot.line(x='Year',y='Total GHG Emissions Excluding LUCF (MtCO2e)',ax=ax, label='midwest')
groups.get_group('northeast').plot.line(x='Year',y='Total GHG Emissions Excluding LUCF (MtCO2e)',ax=ax,label='northeast')
groups.get_group('southeast').plot.line(x='Year',y='Total GHG Emissions Excluding LUCF (MtCO2e)',ax=ax,label='southeast')
groups.get_group('northwest').plot.line(x='Year',y='Total GHG Emissions Excluding LUCF (MtCO2e)',ax=ax, label='northwest')
groups.get_group('southwest').plot.line(x='Year',y='Total GHG Emissions Excluding LUCF (MtCO2e)',ax=ax,label='southwest',title='Total GHG Emissions Excluding LUCF of the 5 different regions')
Out[552]:
<matplotlib.axes._subplots.AxesSubplot at 0x12f477ef0>
In [553]:
ax=plt.gca()
groups=region_data_norm.groupby('region')
groups.get_group('midwest').plot.line(x='Year',y='Total GHG Emissions Including LUCF (MtCO2e',ax=ax, label='midwest')
groups.get_group('northeast').plot.line(x='Year',y='Total GHG Emissions Including LUCF (MtCO2e',ax=ax,label='northeast')
groups.get_group('southeast').plot.line(x='Year',y='Total GHG Emissions Including LUCF (MtCO2e',ax=ax,label='southeast')
groups.get_group('northwest').plot.line(x='Year',y='Total GHG Emissions Including LUCF (MtCO2e',ax=ax, label='northwest')
groups.get_group('southwest').plot.line(x='Year',y='Total GHG Emissions Including LUCF (MtCO2e',ax=ax,label='southwest',title='Total GHG Emissions Including LUCF of the 5 different regions')
Out[553]:
<matplotlib.axes._subplots.AxesSubplot at 0x12f4cc668>

As you can see, this did provide some rearrangement of the ranking of the regions, however did not change the fact that the North East and the North West have the lowest GHG emissions. Again, this is interesting considering the northeast is considered the most populated region.

In [554]:
region_Year_sector_norm=region_data_norm.drop(["gas","Total GHG Emissions Excluding LUCF (MtCO2e)","Total GHG Emissions Including LUCF (MtCO2e","Population (People)"],axis=1)
region_Year_sector_norm=region_Year_sector_norm.groupby(['region','sector']).agg({'MtCO2e emmisions by sector':'mean'}).reset_index()
region_Year_sector.head()
Out[554]:
region sector MtCO2e emmisions by sector
0 midwest Agriculture (MtCO2e) 159.680978
1 midwest Bunker Fuels (MtCO2e) 0.000982
2 midwest Energy (MtCO2e) 1431.448447
3 midwest Industrial Processes (MtCO2e) 65.004013
4 midwest Land Use and Forestry (MtCO2e) -161.851542
In [555]:
region_Year_sector_norm[(region_Year_sector_norm['sector']!= 'Land Use and Forestry (MtCO2e)') &(region_Year_sector_norm['sector']!= 'Energy (MtCO2e)')].pivot_table(index='region',columns='sector').plot.bar(stacked=True,figsize=(20,10), title='Sector Emissions by Region without Energy and LUCF')
Out[555]:
<matplotlib.axes._subplots.AxesSubplot at 0x131c1c048>

As you can tell, the normalization by population helped make it easier to compare the contribution of each industry sector to each region. It also helps depict the fact that throughout the US, Agriculture is producing the most Greenhouse gasses after energy. The impact of agriculture is extremely substantial in almost every region. This goes to show further the impact of how we use land.

Question 4: How does each regions total emissions compare/contribute to the total emissions of the US from dataset 2 (with and without LUCF)

Although this gives us good insight on the emissions in each region, I am curious how each region's emissions would compare to the emissions of the US as a whole. I will use dataset 2 in order to make these comparisons.

I am going to explore how much of an impact Land Use and Forestry has on lessening the Emissions both across the US and in each region. I would expect for the emissions to be drastically lessened in regions such as the Midwest, but not as impacted by LUCF in regions such as the Northeast.I expect for the US emissions overall to not be drastically impacted.

I am also going to look at what proportion of the US's total emissions (found in dataset 2) each region makes up. I am predicting that the Northeast and the southeast will make up majority of the total US emissions.

As we saw above, the region data normalized by population was much more informative. However, I want to analyze how each regions total emissions contribute to the US's total emissions. Thus, I will not be using the normalized data.

In [556]:
region_data['Year'].unique()
Out[556]:
array([1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000,
       2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011])
In [557]:
data2['Year'].unique()
Out[557]:
array([2000, 2005, 2010, 2011, 2015, 2020, 2025, 2030], dtype=object)

As you can tell, the US data is much more limited in years than the regions data. In order to have consistency, we must restrict both data to the same years: 2000, 2005, 2010, 2011.

In [558]:
data2=data2[data2['Year'].isin([2000,2005,2010,2011])]
region_data=region_data[region_data['Year'].isin([2000,2005,2010,2011])]
region_data
Out[558]:
region Year sector gas Total GHG Emissions Excluding LUCF (MtCO2e) Total GHG Emissions Including LUCF (MtCO2e MtCO2e emmisions by sector State GDP (Million US$ (chained 1997/2005)) Population (People) Total Energy Use (Thous. tonnes oil eq. (ktoe)) total emmisions by gas
240 midwest 2000 Agriculture (MtCO2e) Total CH4 (MtCO2e) 1776.276788 1775.776322 154.792176 2400674 62021767 564074.643871 106.246298
241 midwest 2000 Agriculture (MtCO2e) Total CO2 (excluding LUCF) (MtCO2e) 1776.276788 1775.776322 154.792176 2400674 62021767 564074.643871 1524.668613
242 midwest 2000 Agriculture (MtCO2e) Total F-Gas (MtCO2e) 1776.276788 1775.776322 154.792176 2400674 62021767 564074.643871 21.407679
243 midwest 2000 Agriculture (MtCO2e) Total N2O (MtCO2e) 1776.276788 1775.776322 154.792176 2400674 62021767 564074.643871 123.953615
244 midwest 2000 Bunker Fuels (MtCO2e) Total CH4 (MtCO2e) 1776.276788 1775.776322 0.000583 2400674 62021767 564074.643871 106.246298
245 midwest 2000 Bunker Fuels (MtCO2e) Total CO2 (excluding LUCF) (MtCO2e) 1776.276788 1775.776322 0.000583 2400674 62021767 564074.643871 1524.668613
246 midwest 2000 Bunker Fuels (MtCO2e) Total F-Gas (MtCO2e) 1776.276788 1775.776322 0.000583 2400674 62021767 564074.643871 21.407679
247 midwest 2000 Bunker Fuels (MtCO2e) Total N2O (MtCO2e) 1776.276788 1775.776322 0.000583 2400674 62021767 564074.643871 123.953615
248 midwest 2000 Energy (MtCO2e) Total CH4 (MtCO2e) 1776.276788 1775.776322 1493.056766 2400674 62021767 564074.643871 106.246298
249 midwest 2000 Energy (MtCO2e) Total CO2 (excluding LUCF) (MtCO2e) 1776.276788 1775.776322 1493.056766 2400674 62021767 564074.643871 1524.668613
250 midwest 2000 Energy (MtCO2e) Total F-Gas (MtCO2e) 1776.276788 1775.776322 1493.056766 2400674 62021767 564074.643871 21.407679
251 midwest 2000 Energy (MtCO2e) Total N2O (MtCO2e) 1776.276788 1775.776322 1493.056766 2400674 62021767 564074.643871 123.953615
252 midwest 2000 Industrial Processes (MtCO2e) Total CH4 (MtCO2e) 1776.276788 1775.776322 90.732903 2400674 62021767 564074.643871 106.246298
253 midwest 2000 Industrial Processes (MtCO2e) Total CO2 (excluding LUCF) (MtCO2e) 1776.276788 1775.776322 90.732903 2400674 62021767 564074.643871 1524.668613
254 midwest 2000 Industrial Processes (MtCO2e) Total F-Gas (MtCO2e) 1776.276788 1775.776322 90.732903 2400674 62021767 564074.643871 21.407679
255 midwest 2000 Industrial Processes (MtCO2e) Total N2O (MtCO2e) 1776.276788 1775.776322 90.732903 2400674 62021767 564074.643871 123.953615
256 midwest 2000 Land Use and Forestry (MtCO2e) Total CH4 (MtCO2e) 1776.276788 1775.776322 -0.500467 2400674 62021767 564074.643871 106.246298
257 midwest 2000 Land Use and Forestry (MtCO2e) Total CO2 (excluding LUCF) (MtCO2e) 1776.276788 1775.776322 -0.500467 2400674 62021767 564074.643871 1524.668613
258 midwest 2000 Land Use and Forestry (MtCO2e) Total F-Gas (MtCO2e) 1776.276788 1775.776322 -0.500467 2400674 62021767 564074.643871 21.407679
259 midwest 2000 Land Use and Forestry (MtCO2e) Total N2O (MtCO2e) 1776.276788 1775.776322 -0.500467 2400674 62021767 564074.643871 123.953615
260 midwest 2000 Waste (MtCO2e) Total CH4 (MtCO2e) 1776.276788 1775.776322 37.694360 2400674 62021767 564074.643871 106.246298
261 midwest 2000 Waste (MtCO2e) Total CO2 (excluding LUCF) (MtCO2e) 1776.276788 1775.776322 37.694360 2400674 62021767 564074.643871 1524.668613
262 midwest 2000 Waste (MtCO2e) Total F-Gas (MtCO2e) 1776.276788 1775.776322 37.694360 2400674 62021767 564074.643871 21.407679
263 midwest 2000 Waste (MtCO2e) Total N2O (MtCO2e) 1776.276788 1775.776322 37.694360 2400674 62021767 564074.643871 123.953615
360 midwest 2005 Agriculture (MtCO2e) Total CH4 (MtCO2e) 1805.553503 1480.094927 163.705830 2549646 63214882 575383.962180 109.594592
361 midwest 2005 Agriculture (MtCO2e) Total CO2 (excluding LUCF) (MtCO2e) 1805.553503 1480.094927 163.705830 2549646 63214882 575383.962180 1549.002713
362 midwest 2005 Agriculture (MtCO2e) Total F-Gas (MtCO2e) 1805.553503 1480.094927 163.705830 2549646 63214882 575383.962180 24.585936
363 midwest 2005 Agriculture (MtCO2e) Total N2O (MtCO2e) 1805.553503 1480.094927 163.705830 2549646 63214882 575383.962180 122.364291
364 midwest 2005 Bunker Fuels (MtCO2e) Total CH4 (MtCO2e) 1805.553503 1480.094927 0.005970 2549646 63214882 575383.962180 109.594592
365 midwest 2005 Bunker Fuels (MtCO2e) Total CO2 (excluding LUCF) (MtCO2e) 1805.553503 1480.094927 0.005970 2549646 63214882 575383.962180 1549.002713
... ... ... ... ... ... ... ... ... ... ... ...
2610 southwest 2010 Land Use and Forestry (MtCO2e) Total F-Gas (MtCO2e) 1091.976718 1065.550266 -26.426452 1540986 37477742 393399.016310 16.775904
2611 southwest 2010 Land Use and Forestry (MtCO2e) Total N2O (MtCO2e) 1091.976718 1065.550266 -26.426452 1540986 37477742 393399.016310 30.408323
2612 southwest 2010 Waste (MtCO2e) Total CH4 (MtCO2e) 1091.976718 1065.550266 17.618051 1540986 37477742 393399.016310 113.357431
2613 southwest 2010 Waste (MtCO2e) Total CO2 (excluding LUCF) (MtCO2e) 1091.976718 1065.550266 17.618051 1540986 37477742 393399.016310 931.313836
2614 southwest 2010 Waste (MtCO2e) Total F-Gas (MtCO2e) 1091.976718 1065.550266 17.618051 1540986 37477742 393399.016310 16.775904
2615 southwest 2010 Waste (MtCO2e) Total N2O (MtCO2e) 1091.976718 1065.550266 17.618051 1540986 37477742 393399.016310 30.408323
2616 southwest 2011 Agriculture (MtCO2e) Total CH4 (MtCO2e) 1109.992795 1083.057384 62.185055 1586783 37961930 401194.102400 112.371048
2617 southwest 2011 Agriculture (MtCO2e) Total CO2 (excluding LUCF) (MtCO2e) 1109.992795 1083.057384 62.185055 1586783 37961930 401194.102400 953.870525
2618 southwest 2011 Agriculture (MtCO2e) Total F-Gas (MtCO2e) 1109.992795 1083.057384 62.185055 1586783 37961930 401194.102400 17.505875
2619 southwest 2011 Agriculture (MtCO2e) Total N2O (MtCO2e) 1109.992795 1083.057384 62.185055 1586783 37961930 401194.102400 26.135466
2620 southwest 2011 Bunker Fuels (MtCO2e) Total CH4 (MtCO2e) 1109.992795 1083.057384 0.109881 1586783 37961930 401194.102400 112.371048
2621 southwest 2011 Bunker Fuels (MtCO2e) Total CO2 (excluding LUCF) (MtCO2e) 1109.992795 1083.057384 0.109881 1586783 37961930 401194.102400 953.870525
2622 southwest 2011 Bunker Fuels (MtCO2e) Total F-Gas (MtCO2e) 1109.992795 1083.057384 0.109881 1586783 37961930 401194.102400 17.505875
2623 southwest 2011 Bunker Fuels (MtCO2e) Total N2O (MtCO2e) 1109.992795 1083.057384 0.109881 1586783 37961930 401194.102400 26.135466
2624 southwest 2011 Energy (MtCO2e) Total CH4 (MtCO2e) 1109.992795 1083.057384 998.358392 1586783 37961930 401194.102400 112.371048
2625 southwest 2011 Energy (MtCO2e) Total CO2 (excluding LUCF) (MtCO2e) 1109.992795 1083.057384 998.358392 1586783 37961930 401194.102400 953.870525
2626 southwest 2011 Energy (MtCO2e) Total F-Gas (MtCO2e) 1109.992795 1083.057384 998.358392 1586783 37961930 401194.102400 17.505875
2627 southwest 2011 Energy (MtCO2e) Total N2O (MtCO2e) 1109.992795 1083.057384 998.358392 1586783 37961930 401194.102400 26.135466
2628 southwest 2011 Industrial Processes (MtCO2e) Total CH4 (MtCO2e) 1109.992795 1083.057384 31.682648 1586783 37961930 401194.102400 112.371048
2629 southwest 2011 Industrial Processes (MtCO2e) Total CO2 (excluding LUCF) (MtCO2e) 1109.992795 1083.057384 31.682648 1586783 37961930 401194.102400 953.870525
2630 southwest 2011 Industrial Processes (MtCO2e) Total F-Gas (MtCO2e) 1109.992795 1083.057384 31.682648 1586783 37961930 401194.102400 17.505875
2631 southwest 2011 Industrial Processes (MtCO2e) Total N2O (MtCO2e) 1109.992795 1083.057384 31.682648 1586783 37961930 401194.102400 26.135466
2632 southwest 2011 Land Use and Forestry (MtCO2e) Total CH4 (MtCO2e) 1109.992795 1083.057384 -26.935410 1586783 37961930 401194.102400 112.371048
2633 southwest 2011 Land Use and Forestry (MtCO2e) Total CO2 (excluding LUCF) (MtCO2e) 1109.992795 1083.057384 -26.935410 1586783 37961930 401194.102400 953.870525
2634 southwest 2011 Land Use and Forestry (MtCO2e) Total F-Gas (MtCO2e) 1109.992795 1083.057384 -26.935410 1586783 37961930 401194.102400 17.505875
2635 southwest 2011 Land Use and Forestry (MtCO2e) Total N2O (MtCO2e) 1109.992795 1083.057384 -26.935410 1586783 37961930 401194.102400 26.135466
2636 southwest 2011 Waste (MtCO2e) Total CH4 (MtCO2e) 1109.992795 1083.057384 17.656818 1586783 37961930 401194.102400 112.371048
2637 southwest 2011 Waste (MtCO2e) Total CO2 (excluding LUCF) (MtCO2e) 1109.992795 1083.057384 17.656818 1586783 37961930 401194.102400 953.870525
2638 southwest 2011 Waste (MtCO2e) Total F-Gas (MtCO2e) 1109.992795 1083.057384 17.656818 1586783 37961930 401194.102400 17.505875
2639 southwest 2011 Waste (MtCO2e) Total N2O (MtCO2e) 1109.992795 1083.057384 17.656818 1586783 37961930 401194.102400 26.135466

480 rows × 11 columns

I now will drop unnecessary columns and rows from the region data, and then combine the datasets.

In [559]:
compare_data=region_data[['region','Year','Total GHG Emissions Including LUCF (MtCO2e','Total GHG Emissions Excluding LUCF (MtCO2e)']]
In [560]:
compare_data=compare_data.groupby(['Year','region']).aggregate({'region':'first','Year':'first','Total GHG Emissions Including LUCF (MtCO2e':'first','Total GHG Emissions Excluding LUCF (MtCO2e)':'first'})
compare_data=compare_data.drop(['region','Year'],axis=1).reset_index()
In [561]:
compare_data
Out[561]:
Year region Total GHG Emissions Including LUCF (MtCO2e Total GHG Emissions Excluding LUCF (MtCO2e)
0 2000 midwest 1775.776322 1776.276788
1 2000 northeast 724.801619 820.135892
2 2000 northwest 221.764854 297.531731
3 2000 southeast 1572.632365 1914.664776
4 2000 southwest 1068.345498 1107.075545
5 2005 midwest 1480.094927 1805.553503
6 2005 northeast 750.983038 835.572392
7 2005 northwest 227.935083 303.758915
8 2005 southeast 1537.480821 1955.788158
9 2005 southwest 1073.022533 1109.142535
10 2010 midwest 1446.726251 1717.992532
11 2010 northeast 651.983313 735.310914
12 2010 northwest 233.561254 306.511131
13 2010 southeast 1436.314322 1856.056807
14 2010 southwest 1065.550266 1091.976718
15 2011 midwest 1398.968607 1674.527143
16 2011 northeast 629.700639 713.365860
17 2011 northwest 216.579098 290.518675
18 2011 southeast 1356.455694 1781.478633
19 2011 southwest 1083.057384 1109.992795
In [562]:
data2.dtypes
Out[562]:
region                                         object
Year                                           object
Total GHG Emissions Including LUCF (MtCO2e     object
Total GHG Emissions Excluding LUCF (MtCO2e)    object
dtype: object

Before we can merge, we have to convert the Year and the Total GHG emissions columns into a data types int

In [563]:
data2['Total GHG Emissions Including LUCF (MtCO2e']=pd.to_numeric(data2['Total GHG Emissions Including LUCF (MtCO2e'])
data2['Year']=pd.to_numeric(data2['Year'])
data2['Total GHG Emissions Excluding LUCF (MtCO2e)']=pd.to_numeric(data2['Total GHG Emissions Including LUCF (MtCO2e'])
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/ipykernel_launcher.py:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  """Entry point for launching an IPython kernel.
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/ipykernel_launcher.py:2: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/ipykernel_launcher.py:3: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  This is separate from the ipykernel package so we can avoid doing imports until
In [564]:
regions_US=compare_data.merge(data2, how="outer")
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/reshape/merge.py:962: UserWarning: You are merging on int and float columns where the float values are not equal to their int representation
  'representation', UserWarning)
In [565]:
regions_US
Out[565]:
Year region Total GHG Emissions Including LUCF (MtCO2e Total GHG Emissions Excluding LUCF (MtCO2e)
0 2000 midwest 1775.776322 1776.276788
1 2000 northeast 724.801619 820.135892
2 2000 northwest 221.764854 297.531731
3 2000 southeast 1572.632365 1914.664776
4 2000 southwest 1068.345498 1107.075545
5 2005 midwest 1480.094927 1805.553503
6 2005 northeast 750.983038 835.572392
7 2005 northwest 227.935083 303.758915
8 2005 southeast 1537.480821 1955.788158
9 2005 southwest 1073.022533 1109.142535
10 2010 midwest 1446.726251 1717.992532
11 2010 northeast 651.983313 735.310914
12 2010 northwest 233.561254 306.511131
13 2010 southeast 1436.314322 1856.056807
14 2010 southwest 1065.550266 1091.976718
15 2011 midwest 1398.968607 1674.527143
16 2011 northeast 629.700639 713.365860
17 2011 northwest 216.579098 290.518675
18 2011 southeast 1356.455694 1781.478633
19 2011 southwest 1083.057384 1109.992795
20 2000 United States 7076.000000 7076.000000
21 2005 United States 7195.000000 7195.000000
22 2010 United States 6812.000000 6812.000000
23 2011 United States 6702.000000 6702.000000

I first want to assess what kind of impact LUCF is having on each region, including the US.

In [566]:
regions_US.groupby(['region']).median().plot.bar(y=['Total GHG Emissions Including LUCF (MtCO2e','Total GHG Emissions Excluding LUCF (MtCO2e)'],figsize=(20,10),fontsize=20, title="US emissions vs. the Regions")
Out[566]:
<matplotlib.axes._subplots.AxesSubplot at 0x131cc8d30>

As you can see,in the regions where Land Use and Forestry is prominent, there is an apparent decrease in the GHG emissions. There is not much of an impact of LUCF in regions such as the southwest and the northwest, where LUCF is not a prominent industry. This is physical evidence of the success of planting trees on removing CO2 emissions!

As I predicted, there was no significant change in the emissions of the US. The midwest and the southeast have the most drastic change, and the southwest has the least drastic change. These results match our previous results regarding the impacts of industries on the GHG Emissions of each region.

I now want to create a pie chart to represent what proportion of US GHG emissions each region made up for each of the available years.

In [567]:
groups=regions_US.groupby('Year')
In [568]:
G1=groups.get_group(2000)
G2=groups.get_group(2005)
G3=groups.get_group(2010)
G4=groups.get_group(2011)
In [569]:
G1
Out[569]:
Year region Total GHG Emissions Including LUCF (MtCO2e Total GHG Emissions Excluding LUCF (MtCO2e)
0 2000 midwest 1775.776322 1776.276788
1 2000 northeast 724.801619 820.135892
2 2000 northwest 221.764854 297.531731
3 2000 southeast 1572.632365 1914.664776
4 2000 southwest 1068.345498 1107.075545
20 2000 United States 7076.000000 7076.000000
In [570]:
US=7076.000000
G1['Total GHG Emissions Including LUCF (MtCO2e']=G1['Total GHG Emissions Including LUCF (MtCO2e']/US
G1['Total GHG Emissions Excluding LUCF (MtCO2e)']=G1['Total GHG Emissions Excluding LUCF (MtCO2e)']/US
                                                                                
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/ipykernel_launcher.py:2: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/ipykernel_launcher.py:3: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  This is separate from the ipykernel package so we can avoid doing imports until
In [571]:
G1
Out[571]:
Year region Total GHG Emissions Including LUCF (MtCO2e Total GHG Emissions Excluding LUCF (MtCO2e)
0 2000 midwest 0.250958 0.251028
1 2000 northeast 0.102431 0.115904
2 2000 northwest 0.031340 0.042048
3 2000 southeast 0.222249 0.270586
4 2000 southwest 0.150982 0.156455
20 2000 United States 1.000000 1.000000
In [572]:
G2
Out[572]:
Year region Total GHG Emissions Including LUCF (MtCO2e Total GHG Emissions Excluding LUCF (MtCO2e)
5 2005 midwest 1480.094927 1805.553503
6 2005 northeast 750.983038 835.572392
7 2005 northwest 227.935083 303.758915
8 2005 southeast 1537.480821 1955.788158
9 2005 southwest 1073.022533 1109.142535
21 2005 United States 7195.000000 7195.000000
In [573]:
US=7195.000000
G2['Total GHG Emissions Including LUCF (MtCO2e']=G2['Total GHG Emissions Including LUCF (MtCO2e']/US
G2['Total GHG Emissions Excluding LUCF (MtCO2e)']=G2['Total GHG Emissions Excluding LUCF (MtCO2e)']/US
     
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/ipykernel_launcher.py:2: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/ipykernel_launcher.py:3: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  This is separate from the ipykernel package so we can avoid doing imports until
In [574]:
G3
Out[574]:
Year region Total GHG Emissions Including LUCF (MtCO2e Total GHG Emissions Excluding LUCF (MtCO2e)
10 2010 midwest 1446.726251 1717.992532
11 2010 northeast 651.983313 735.310914
12 2010 northwest 233.561254 306.511131
13 2010 southeast 1436.314322 1856.056807
14 2010 southwest 1065.550266 1091.976718
22 2010 United States 6812.000000 6812.000000
In [575]:
US=6812.000000
G3['Total GHG Emissions Including LUCF (MtCO2e']=G3['Total GHG Emissions Including LUCF (MtCO2e']/US
G3['Total GHG Emissions Excluding LUCF (MtCO2e)']=G3['Total GHG Emissions Excluding LUCF (MtCO2e)']/US
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/ipykernel_launcher.py:2: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/ipykernel_launcher.py:3: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  This is separate from the ipykernel package so we can avoid doing imports until
In [576]:
G4
Out[576]:
Year region Total GHG Emissions Including LUCF (MtCO2e Total GHG Emissions Excluding LUCF (MtCO2e)
15 2011 midwest 1398.968607 1674.527143
16 2011 northeast 629.700639 713.365860
17 2011 northwest 216.579098 290.518675
18 2011 southeast 1356.455694 1781.478633
19 2011 southwest 1083.057384 1109.992795
23 2011 United States 6702.000000 6702.000000
In [577]:
US=6702.000000
G4['Total GHG Emissions Including LUCF (MtCO2e']=G4['Total GHG Emissions Including LUCF (MtCO2e']/US
G4['Total GHG Emissions Excluding LUCF (MtCO2e)']=G4['Total GHG Emissions Excluding LUCF (MtCO2e)']/US
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/ipykernel_launcher.py:2: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/ipykernel_launcher.py:3: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  This is separate from the ipykernel package so we can avoid doing imports until
In [578]:
fig, ax = plt.subplots(1, 4, figsize=(20,40))
G1.set_index('region').drop('United States').plot.pie(y='Total GHG Emissions Including LUCF (MtCO2e',title = '2000',ax=ax[0])
G2.set_index('region').drop('United States').plot.pie(y='Total GHG Emissions Including LUCF (MtCO2e',title = '2005',ax=ax[1])
G3.set_index('region').drop('United States').plot.pie(y='Total GHG Emissions Including LUCF (MtCO2e',title = '2010',ax=ax[2])
G4.set_index('region').drop('United States').plot.pie(y='Total GHG Emissions Including LUCF (MtCO2e',title = '2011',ax=ax[3])
Out[578]:
<matplotlib.axes._subplots.AxesSubplot at 0x131ef4208>

As you can see each year, the southeast and the midwest are the largest contributers to US GHG emissions. Note that for some of the years, there are some GHG emissions that are unaccounted for by the region data, this is represented by the blank portions of the pie charts. Lets now take a look at the porportion without LUCF.

In [579]:
fig, ax = plt.subplots(1, 4, figsize=(20,40))
G1.set_index('region').drop('United States').plot.pie(y='Total GHG Emissions Excluding LUCF (MtCO2e)',title = '2000',ax=ax[0])
G2.set_index('region').drop('United States').plot.pie(y='Total GHG Emissions Excluding LUCF (MtCO2e)',title = '2005',ax=ax[1])
G3.set_index('region').drop('United States').plot.pie(y='Total GHG Emissions Excluding LUCF (MtCO2e)',title = '2010',ax=ax[2])
G4.set_index('region').drop('United States').plot.pie(y='Total GHG Emissions Excluding LUCF (MtCO2e)',title = '2011',ax=ax[3])
Out[579]:
<matplotlib.axes._subplots.AxesSubplot at 0x134051b70>

As you can see, the proportions are fairly similar for emissions with and without LUCF. Thus LUCF does not have much of an impact on the emissions of the US as a whole, although it does have an impact on the specific regions.

Conclusion

In conclusion, it would appear that the way land is used, whether that be agriculture or LUCF, has one of the largest impacts on regional GHG emissions. You can see this due to the large contribution of the midwest and southwest to the total GHG emissions of the US, and their largest industry sector contributors were both agriculture. We can also conclude that population does have a correlation to GHG emissions, however in a manner that needs further exploration considering the most populated region (northeast) appears to have the lowest GHG emissions.

Since LUCF is able to have a positive impact on the individual regions,it is clear that it could have a positive effect on the US as a whole as well. However, it is not contributing enough in order to have an effect. This is evidence that planting more trees is a easy, cheap, and natural solution to the Climate Change crisis. This is not a new idea, but is one that people don't seem to be taking seriously enough yet.

In conclusion, EVERYONE GO PLANT TREES!