Github Page: (https://christinacampbell98.github.io)
For my tutorial, I am interested in exploring Climate Change data. With the recent Climate Change UN Summit, the severity and pressing need for change has never been more apparent. I want to explore some of the datasets that are available and see how they coincide with or challenge the information that is being displayed to the public.
The first data set that I am exploring displays the total GHG emissions produced by each state, and also breaks down these emissions by industry. The data set also provides information on each state, such as its total population, GDP, and total energy use. I found this dataset on the World Resources Institute site (http://datasets.wri.org); it was developed by CAIT (Climate Analysis Indicators Tool). With this data set, I want to examine which industry has the largest impact on the GHG emissions of each region of the US and see if these results are expected or surprising.
import numpy as np
import pandas as pd
# pyplot is needed by the plt.subplots calls further down
import matplotlib.pyplot as plt
%matplotlib inline
data=pd.read_excel("/Users/christinacampbell/Downloads/cait2.0u.s.statesghgemissions-alldata.xlsx",sheet_name="State GHG Emissions")
data.columns=['State',"Year","Total GHG Emissions Excluding LUCF (MtCO2e)","Total GHG Emissions Including LUCF (MtCO2e","Total CO2 (excluding LUCF) (MtCO2e)","Total CH4 (MtCO2e)","Total N2O (MtCO2e)","Total F-Gas (MtCO2e)","Energy (MtCO2e)","Industrial Processes (MtCO2e)","Agriculture (MtCO2e)","Waste (MtCO2e)","Land Use and Forestry (MtCO2e)","Bunker Fuels (MtCO2e)","Electric Power (MtCO2e)","Commercial (MtCO2e)","Residential (MtCO2e","Industrial (MtCO2e","Transportation (MtCO2e)","Fugitive Emissions (MtCO2e)","State GDP (Million US$ (chained 1997/2005))","Population (People)","Total Energy Use (Thous. tonnes oil eq. (ktoe))"]
data=pd.DataFrame(data,columns=['State',"Year","Total GHG Emissions Excluding LUCF (MtCO2e)","Total GHG Emissions Including LUCF (MtCO2e","Total CO2 (excluding LUCF) (MtCO2e)","Total CH4 (MtCO2e)","Total N2O (MtCO2e)","Total F-Gas (MtCO2e)","Energy (MtCO2e)","Industrial Processes (MtCO2e)","Agriculture (MtCO2e)","Waste (MtCO2e)","Land Use and Forestry (MtCO2e)","Bunker Fuels (MtCO2e)","Electric Power (MtCO2e)","Commercial (MtCO2e)","Residential (MtCO2e","Industrial (MtCO2e","Transportation (MtCO2e)","Fugitive Emissions (MtCO2e)","State GDP (Million US$ (chained 1997/2005))","Population (People)","Total Energy Use (Thous. tonnes oil eq. (ktoe))"])
data=data.drop(data.index[0])
data=data.drop(data.index[1])
data= data.drop(data.index[0]).reset_index()
data.head()
data.shape
The second data set that I am considering is also from the World Resources Institute, and was also developed by CAIT. This data set shows GHG emissions (with and without land-use change and forestry, or LUCF), as well as fossil fuel emissions, for almost every country. It also includes data on each country's GDP and population. The unique aspect of this data set is that it includes projections that extend all the way to 2050. The dataset projects not only where GHG emissions will be if each country continues down its current path, but also where the country would be if it followed a policy scenario to change its behavior. The policies that are chosen are described in the dataset, and are based on proposed policies within the countries.
I will use this dataset to compare the GHG emissions of each region to the total GHG emissions of the US. I will first import the data, then I will drop the unnecessary columns and change the column names appropriately.
data2=pd.read_excel("/Users/christinacampbell/Downloads/caitprojectionsalldata4-9-150 (1).xlsx",sheet_name="GHG Emissions Data")
data2=data2.drop(['Unnamed: 1','Unnamed: 2','Unnamed: 6','Unnamed: 7','Unnamed: 8','Unnamed: 9'],axis=1)
data2.columns=['region','Year','Total GHG Emissions Excluding LUCF (MtCO2e)','Total GHG Emissions Including LUCF (MtCO2e']
data2=pd.DataFrame(data2,columns=['region','Year','Total GHG Emissions Including LUCF (MtCO2e','Total GHG Emissions Excluding LUCF (MtCO2e)'])
data2=data2.drop([0])
data2.head()
Now that I have the data in the format that I want, I will drop all data that is not for the US, and all NA values.
data2=data2[data2['region']=='United States']
data2=data2.dropna()
data2
Note that both of these datasets record GHG emissions including and excluding LUCF, and that the values including LUCF are lower than the values excluding LUCF in both datasets. This is because land use and forestry (LUCF) covers plants and trees, which naturally absorb CO2. Thus, LUCF decreases the net GHG emissions. I want to explore just how much of an impact LUCF has on each region of the US, and on the US as a whole.
For my tutorial, I will be focusing primarily on comparing the emissions from different industries in different regions, but I want to keep this information more generalized in order to see the bigger picture. Thus, I can drop the columns that show data about sub-sectors.
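As a minimal sketch of that relationship, the snippet below assumes the "including LUCF" total is simply the "excluding LUCF" total plus a LUCF term that is negative when the land sector is a net sink. The numbers are invented for illustration and are not taken from either dataset.

```python
# Toy values in MtCO2e, invented for illustration only.
excluding_lucf = 100.0
lucf = -12.5  # negative: the land sector removes more CO2 than it emits

# Assumption: including = excluding + LUCF
including_lucf = excluding_lucf + lucf
print(including_lucf)
```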
data=data.drop(['Electric Power (MtCO2e)','Commercial (MtCO2e)','Residential (MtCO2e','Industrial (MtCO2e','Transportation (MtCO2e)','Fugitive Emissions (MtCO2e)'],axis=1)
data.head()
I will now make the data tidier by combining the columns that hold the sector-specific emissions into one column, and the columns that hold the gas-specific emissions into another. I will do this using the melt method.
id_vars=['index','State','Year','Total GHG Emissions Excluding LUCF (MtCO2e)','Total GHG Emissions Including LUCF (MtCO2e','Total CO2 (excluding LUCF) (MtCO2e)','Total CH4 (MtCO2e)','Total N2O (MtCO2e)','Total F-Gas (MtCO2e)',
'State GDP (Million US$ (chained 1997/2005))','Population (People)','Total Energy Use (Thous. tonnes oil eq. (ktoe))'
]
data=pd.melt(frame=data,
id_vars=id_vars,
var_name="sector",
value_name='MtCO2e emmisions by sector')
data.head()
id_vars2=['index','State','Year','Total GHG Emissions Excluding LUCF (MtCO2e)','Total GHG Emissions Including LUCF (MtCO2e','sector','MtCO2e emmisions by sector',
'State GDP (Million US$ (chained 1997/2005))','Population (People)','Total Energy Use (Thous. tonnes oil eq. (ktoe))']
data=pd.melt(frame=data,
id_vars=id_vars2,
var_name="gas",
value_name='total emmisions by gas')
data.head()
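To make the reshaping above concrete, here is a small sketch of `pd.melt` on a toy frame. The column names `Energy`, `Agriculture`, and `emissions`, as well as all values, are made up for illustration; only the call pattern mirrors the cells above.

```python
import pandas as pd

# Toy wide-format frame; the values are invented for illustration.
wide = pd.DataFrame({
    "State": ["A", "B"],
    "Year": [2000, 2000],
    "Energy": [10.0, 20.0],
    "Agriculture": [1.0, 2.0],
})

# Columns listed in id_vars stay as identifiers; every other column is
# stacked into (sector, emissions) pairs, one row per state and sector.
long = pd.melt(wide, id_vars=["State", "Year"],
               var_name="sector", value_name="emissions")

print(long)
```

Each melt multiplies the row count by the number of stacked columns, which is why the identifier columns (including the totals) repeat on every row afterwards.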
Next, I will convert the data types of the columns appropriately.
data['Total GHG Emissions Excluding LUCF (MtCO2e)']=pd.to_numeric(data['Total GHG Emissions Excluding LUCF (MtCO2e)'])
data['Total GHG Emissions Including LUCF (MtCO2e']=pd.to_numeric(data['Total GHG Emissions Including LUCF (MtCO2e'])
data['MtCO2e emmisions by sector']=pd.to_numeric(data['MtCO2e emmisions by sector'])
data['total emmisions by gas']=pd.to_numeric(data['total emmisions by gas'])
data['State GDP (Million US$ (chained 1997/2005))']=pd.to_numeric(data['State GDP (Million US$ (chained 1997/2005))'])
data['Population (People)']=pd.to_numeric(data['Population (People)'])
data['Total Energy Use (Thous. tonnes oil eq. (ktoe))']=pd.to_numeric(data['Total Energy Use (Thous. tonnes oil eq. (ktoe))'])
data['Year']=pd.to_numeric(data['Year'])
data=data.drop('index',axis=1)
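The repeated conversions above could also be written as a loop over a list of column names. The sketch below shows the idea on a toy frame; `numeric_cols` is a hypothetical name, and the values are invented.

```python
import pandas as pd

# Toy frame whose numeric columns were read in as strings.
toy = pd.DataFrame({
    "State": ["A", "B"],
    "Year": ["2000", "2001"],
    "Population (People)": ["100", "250"],
})

# Hypothetical list of the columns that should be numeric.
numeric_cols = ["Year", "Population (People)"]

# Convert one column at a time, exactly as the individual statements do.
for col in numeric_cols:
    toy[col] = pd.to_numeric(toy[col])

print(toy.dtypes)
```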
Instead of analyzing the data by state, we want to analyze it by region. In order to do this, I will group the states into regions. I create a separate dataframe for each region, assign each one a column called "region", and then recombine them.
# State names must match the dataset's spelling exactly, or their rows are silently dropped
Northeast=data[data['State'].isin(["Maine","New York","New Jersey","Vermont","Massachusetts","Rhode Island","Connecticut","New Hampshire","Pennsylvania","Maryland"])].reset_index()
Northeast['region']='northeast'
Northeast=Northeast.drop(['State'],axis=1)
Northeast.head()
Southeast=data[(data['State']=="Alabama") |(data['State']=="Florida" )| (data['State']== "Georgia") | (data['State']=="Kentucky" )| (data['State']=="Mississippi") | (data['State']=="North Carolina") | (data['State']=="South Carolina" )| (data['State']=="Tennessee") |(data['State']=="Virginia")|(data['State']=="West Virginia")|(data['State']=="Arkansas")|(data['State']=="Louisiana")|(data['State']=="Delaware")] .reset_index()
Southeast['region']='southeast'
Southeast=Southeast.drop(['State'],axis=1)
Southeast.head()
Northwest=data[(data['State']=="Oregon") |(data['State']=="Washington" )| (data['State']== "Idaho") | (data['State']=="Wyoming" )| (data['State']=="Montana")] .reset_index()
Northwest['region']='northwest'
Northwest=Northwest.drop(['State'],axis=1)
Northwest.head()
Midwest=data[data['State'].isin(["Illinois","Indiana","Iowa","Kansas","Michigan","Minnesota","Missouri","North Dakota","South Dakota","Ohio","Wisconsin"])].reset_index()
Midwest['region']='midwest'
Midwest=Midwest.drop(['State'],axis=1)
Midwest.head()
Southwest=data[(data['State']=="New Mexico")|(data['State']=="Arizona")|(data['State']=="Texas")|(data['State']=="Oklahoma")] .reset_index()
Southwest['region']='southwest'
Southwest=Southwest.drop(['State'],axis=1)
Southwest.head()
region_data=Northeast.merge(Southeast, how='outer')
region_data=region_data.merge(Northwest, how='outer')
region_data=region_data.merge(Midwest, how='outer')
region_data=region_data.merge(Southwest, how='outer')
region_data=region_data.drop(['index'],axis=1)
region_data.shape
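An alternative to building one dataframe per region is a single state-to-region lookup. The sketch below is only an illustration: `state_to_region` is a hypothetical, heavily abbreviated mapping, and the toy values are invented. One benefit of this approach is that any state missing from the mapping (or misspelled) shows up as NaN instead of disappearing silently.

```python
import pandas as pd

# Hypothetical, abbreviated mapping for illustration only.
state_to_region = {
    "New York": "northeast",
    "Vermont": "northeast",
    "Florida": "southeast",
    "Texas": "southwest",
}

toy = pd.DataFrame({
    "State": ["New York", "Florida", "Texas", "California"],
    "emissions": [1.0, 2.0, 3.0, 4.0],
})

# States that are not in the mapping receive NaN as their region.
toy["region"] = toy["State"].map(state_to_region)
unassigned = toy.loc[toy["region"].isna(), "State"].tolist()
print(unassigned)
```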
Now, I want to aggregate the numerical columns so that there is one row for each combination of region, year, sector, and gas type. I will use the sum aggregation function in order to get the total emissions for each region.
region_data=region_data.groupby(['region','Year','sector','gas']).agg({'Total GHG Emissions Excluding LUCF (MtCO2e)':'sum','Total GHG Emissions Including LUCF (MtCO2e':'sum','MtCO2e emmisions by sector':'sum','State GDP (Million US$ (chained 1997/2005))':'sum','Population (People)':'sum','Total Energy Use (Thous. tonnes oil eq. (ktoe))':'sum','total emmisions by gas':'sum'})
region_data=region_data.reset_index()
region_data.head()
First, I want to compare the emissions with and without LUCF in each region.
fig, ax = plt.subplots(1, 5, figsize=(30,10))
groups=region_data.groupby('region')
# One subplot per region, each with an Excluding and an Including LUCF line
for i, r in enumerate(['midwest','northeast','southeast','northwest','southwest']):
    groups.get_group(r).plot.line(x='Year',y='Total GHG Emissions Excluding LUCF (MtCO2e)',ax=ax[i], label='Excluding',title=r)
    groups.get_group(r).plot.line(x='Year',y='Total GHG Emissions Including LUCF (MtCO2e',ax=ax[i], label='Including')
As you can see, in each region the GHG emissions including LUCF are lower than the GHG emissions excluding LUCF. This is as expected, both because we saw this trend in the raw data and because plants consume CO2.
Next, I want to compare the GHG emissions of the regions to each other.
The line graph below shows a comparison of the Total GHG Emissions Excluding LUCF of the 5 different regions.
import matplotlib.pyplot as plt
ax=plt.gca()
groups=region_data.groupby('region')
groups.get_group('midwest').plot.line(x='Year',y='Total GHG Emissions Excluding LUCF (MtCO2e)',ax=ax, label='midwest')
groups.get_group('northeast').plot.line(x='Year',y='Total GHG Emissions Excluding LUCF (MtCO2e)',ax=ax,label='northeast')
groups.get_group('southeast').plot.line(x='Year',y='Total GHG Emissions Excluding LUCF (MtCO2e)',ax=ax,label='southeast')
groups.get_group('northwest').plot.line(x='Year',y='Total GHG Emissions Excluding LUCF (MtCO2e)',ax=ax, label='northwest')
groups.get_group('southwest').plot.line(x='Year',y='Total GHG Emissions Excluding LUCF (MtCO2e)',ax=ax,label='southwest',title='Total GHG Emissions Excluding LUCF of the 5 different regions')
The line graph below shows a comparison of the Total GHG Emissions Including LUCF of the 5 different regions.
ax=plt.gca()
groups=region_data.groupby('region')
groups.get_group('midwest').plot.line(x='Year',y='Total GHG Emissions Including LUCF (MtCO2e',ax=ax, label='midwest')
groups.get_group('northeast').plot.line(x='Year',y='Total GHG Emissions Including LUCF (MtCO2e',ax=ax,label='northeast')
groups.get_group('southeast').plot.line(x='Year',y='Total GHG Emissions Including LUCF (MtCO2e',ax=ax,label='southeast')
groups.get_group('northwest').plot.line(x='Year',y='Total GHG Emissions Including LUCF (MtCO2e',ax=ax, label='northwest')
groups.get_group('southwest').plot.line(x='Year',y='Total GHG Emissions Including LUCF (MtCO2e',ax=ax,label='southwest',title='Total GHG Emissions Including LUCF of the 5 different regions')
As you can see from these two graphs, although LUCF does help reduce the GHG emissions in each region, LUCF does not seem to change the order of regions from most GHG emissions to least. It appears that the impact of LUCF is not large enough in any specific region to completely change the ordering.
Now that we have done some analysis of the total GHG emissions in each region, I want to evaluate which industry sectors are having the largest effect on each region.
region_Year_sector=region_data.drop(["gas","Total GHG Emissions Excluding LUCF (MtCO2e)","Total GHG Emissions Including LUCF (MtCO2e","State GDP (Million US$ (chained 1997/2005))","Population (People)","Total Energy Use (Thous. tonnes oil eq. (ktoe))","total emmisions by gas"],axis=1)
region_Year_sector=region_Year_sector.groupby(['region','sector']).agg({'MtCO2e emmisions by sector':'mean'}).reset_index()
region_Year_sector.head()
region_Year_sector.pivot_table(index='region',columns='sector').plot.bar(stacked=True,figsize=(20,10), title='Sector Emissions by Region')
As you can see, Land Use and Forestry contributes negatively to GHG emissions. As we discussed above, this is because where there is land and forest, there are trees, which naturally remove CO2 (the main greenhouse gas) from the air. Let's look at this same plot without Land Use and Forestry, in order to accurately analyze which industry sector contributes the most to GHG emissions. By removing the LUCF data, we will be able to compare all of the regions from a starting point of 0.
It is also clear that Energy is by far the largest source of emissions in each region. Since we are trying to analyze the differences between regions, let's look at this same plot without Energy as well. This will let us see which industries have a distinctive impact in the different regions.
region_Year_sector[(region_Year_sector['sector']!= 'Land Use and Forestry (MtCO2e)') &(region_Year_sector['sector']!= 'Energy (MtCO2e)')].pivot_table(index='region',columns='sector').plot.bar(stacked=True,figsize=(20,10), title='Sector Emissions by Region without Energy and LUCF')
We can now more clearly see which industries are impacting each region. For the midwest, northwest, and southwest, the largest contributor is Agriculture, while in the northeast Industrial Processes is the main contributor. The southeast appears to have an almost even amount of emissions coming from Industrial Processes and Agriculture. Something important to note is that the regions with a large contribution of emissions from Agriculture (midwest and southeast) have the higher emissions in general. This is an idea we are going to explore further below.
Below is a different form of graph for visualizing the comparison of emissions by industry sector in the different regions. This will allow us to see which industry is consistently producing the most emissions across the different regions.
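# numeric_only=True skips the non-numeric 'gas' column, which recent pandas versions refuse to take a median of
region_data.groupby(['sector','region']).median(numeric_only=True).plot.bar(y='MtCO2e emmisions by sector',figsize=(20,10),fontsize=20,stacked=True)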
From the graphs above, you can tell that certain regions have much higher sector emissions than others. You can also tell that the sector that is affecting each region the most varies. More interestingly, it appears that regions that are thought to have a larger population, such as the northeast, are producing substantially less GHG emissions than more sparsely populated regions such as the midwest. This is not a result that I would have expected.
I now want to see if there is a correlation between the Total GHG Emissions Excluding LUCF and total population, in order to determine if the population size has any relation to the GHG emissions in each region. I decided to use the data excluding LUCF in order to analyze all of the GHG emissions being produced, without some being processed by plants. My hypothesis is that there will be a strong positive correlation.
from scipy import stats
fig, ax = plt.subplots(1, 5, figsize=(20,5))
regions= region_data.region.unique()
i=0
for r in regions:
    data=region_data[region_data.region==r]
    data.plot.scatter(x='Population (People)', y='Total GHG Emissions Excluding LUCF (MtCO2e)', ax=ax[i],title=r)
    slope, intercept, r_value, p_value, std_err = stats.linregress(data['Population (People)'],data['Total GHG Emissions Excluding LUCF (MtCO2e)'])
    line = slope *data['Population (People)'] + intercept
    ax[i].plot(data['Population (People)'], line, lw=1, ls='--', color='red')
    i+=1
As you can see, there does appear to be a positive correlation between population and Total GHG Emissions Excluding LUCF, as I expected. Interestingly, however, this correlation is stronger in regions such as the midwest, which have a smaller population than areas such as the northeast. Since there is a correlation, I want to normalize the GHG emissions data by population, in order to better compare the emissions in each region.
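The regression cell computes `r_value` for each region but never displays it, so the comparison of correlation strength rests on the plots alone. As a rough sketch of how the coefficients could be put side by side, the snippet below computes Pearson's r per group with `np.corrcoef` on a toy frame. The column names and values are invented and are not taken from the CAIT data.

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration only.
toy = pd.DataFrame({
    "region": ["a", "a", "a", "b", "b", "b"],
    "population": [1.0, 2.0, 3.0, 1.0, 2.0, 3.0],
    "emissions": [2.0, 4.0, 6.0, 3.0, 1.0, 2.0],
})

# Pearson correlation coefficient between population and emissions, per region.
r_by_region = {
    name: float(np.corrcoef(g["population"], g["emissions"])[0, 1])
    for name, g in toy.groupby("region")
}
print(r_by_region)
```

In this toy case region "a" is perfectly linear while region "b" is negatively correlated, which shows how the coefficient separates cases that can look similar at a glance.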
I will first make a new dataframe with the region data normalized, and then will provide some of the same graphs as above for interpretation and comparison.
norm=region_data[['Total GHG Emissions Excluding LUCF (MtCO2e)','Total GHG Emissions Including LUCF (MtCO2e','MtCO2e emmisions by sector']].div(region_data['Population (People)'],axis=0)
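# .copy() gives an independent frame, so the column assignments below do not trigger SettingWithCopyWarning
region_data_norm=region_data[['region','Year','sector','gas','Population (People)']].copy()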
region_data_norm['Total GHG Emissions Excluding LUCF (MtCO2e)']=norm['Total GHG Emissions Excluding LUCF (MtCO2e)']
region_data_norm['Total GHG Emissions Including LUCF (MtCO2e']=norm['Total GHG Emissions Including LUCF (MtCO2e']
region_data_norm['MtCO2e emmisions by sector']=norm['MtCO2e emmisions by sector']
region_data_norm.head()
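Note that after this division the normalized columns keep their old names but are now in MtCO2e per person, which gives very small numbers. A quick sketch of the arithmetic, with invented values, shows how such a figure could be converted to the more familiar tonnes per person (1 MtCO2e is one million tonnes of CO2e). The constant name `MT_TO_TONNES` is just a label chosen for this example.

```python
# Toy numbers, chosen only to make the arithmetic easy to follow.
MT_TO_TONNES = 1_000_000  # 1 MtCO2e = one million tonnes of CO2e

emissions_mt = 500.0     # hypothetical regional total in MtCO2e
population = 25_000_000  # hypothetical number of people

per_capita_mt = emissions_mt / population         # MtCO2e per person
per_capita_tonnes = per_capita_mt * MT_TO_TONNES  # tonnes of CO2e per person

print(per_capita_mt, per_capita_tonnes)
```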
ax=plt.gca()
groups=region_data_norm.groupby('region')
groups.get_group('midwest').plot.line(x='Year',y='Total GHG Emissions Excluding LUCF (MtCO2e)',ax=ax, label='midwest')
groups.get_group('northeast').plot.line(x='Year',y='Total GHG Emissions Excluding LUCF (MtCO2e)',ax=ax,label='northeast')
groups.get_group('southeast').plot.line(x='Year',y='Total GHG Emissions Excluding LUCF (MtCO2e)',ax=ax,label='southeast')
groups.get_group('northwest').plot.line(x='Year',y='Total GHG Emissions Excluding LUCF (MtCO2e)',ax=ax, label='northwest')
groups.get_group('southwest').plot.line(x='Year',y='Total GHG Emissions Excluding LUCF (MtCO2e)',ax=ax,label='southwest',title='Total GHG Emissions Excluding LUCF of the 5 different regions')
ax=plt.gca()
groups=region_data_norm.groupby('region')
groups.get_group('midwest').plot.line(x='Year',y='Total GHG Emissions Including LUCF (MtCO2e',ax=ax, label='midwest')
groups.get_group('northeast').plot.line(x='Year',y='Total GHG Emissions Including LUCF (MtCO2e',ax=ax,label='northeast')
groups.get_group('southeast').plot.line(x='Year',y='Total GHG Emissions Including LUCF (MtCO2e',ax=ax,label='southeast')
groups.get_group('northwest').plot.line(x='Year',y='Total GHG Emissions Including LUCF (MtCO2e',ax=ax, label='northwest')
groups.get_group('southwest').plot.line(x='Year',y='Total GHG Emissions Including LUCF (MtCO2e',ax=ax,label='southwest',title='Total GHG Emissions Including LUCF of the 5 different regions')
As you can see, normalizing did rearrange the ranking of the regions somewhat, but it did not change the fact that the northeast and the northwest have the lowest GHG emissions. Again, this is interesting considering the northeast is considered the most populated region.
region_Year_sector_norm=region_data_norm.drop(["gas","Total GHG Emissions Excluding LUCF (MtCO2e)","Total GHG Emissions Including LUCF (MtCO2e","Population (People)"],axis=1)
region_Year_sector_norm=region_Year_sector_norm.groupby(['region','sector']).agg({'MtCO2e emmisions by sector':'mean'}).reset_index()
region_Year_sector_norm.head()
region_Year_sector_norm[(region_Year_sector_norm['sector']!= 'Land Use and Forestry (MtCO2e)') &(region_Year_sector_norm['sector']!= 'Energy (MtCO2e)')].pivot_table(index='region',columns='sector').plot.bar(stacked=True,figsize=(20,10), title='Sector Emissions by Region without Energy and LUCF')
As you can tell, normalizing by population made it easier to compare the contribution of each industry sector in each region. It also helps show that, throughout the US, Agriculture produces the most greenhouse gases after Energy. The impact of agriculture is substantial in almost every region. This further shows the impact of how we use land.
Although this gives us good insight on the emissions in each region, I am curious how each region's emissions would compare to the emissions of the US as a whole. I will use dataset 2 in order to make these comparisons.
I am going to explore how much of an impact Land Use and Forestry has on lessening emissions, both across the US and in each region. I would expect the emissions to be drastically lessened in regions such as the midwest, but less affected by LUCF in regions such as the northeast. I expect the US emissions overall not to be drastically affected.
I am also going to look at what proportion of the US's total emissions (found in dataset 2) each region makes up. I am predicting that the northeast and the southeast will make up the majority of the total US emissions.
As we saw above, the region data normalized by population was much more informative. However, I want to analyze how each region's total emissions contribute to the US's total emissions. Thus, I will not be using the normalized data.
region_data['Year'].unique()
data2['Year'].unique()
As you can tell, the US data covers far fewer years than the region data. For consistency, we must restrict both datasets to the same years: 2000, 2005, 2010, and 2011.
data2=data2[data2['Year'].isin([2000,2005,2010,2011])]
region_data=region_data[region_data['Year'].isin([2000,2005,2010,2011])]
region_data
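Rather than typing the shared years by hand, they could be derived from the two frames with a set intersection. The sketch below uses two toy frames, `regions` and `national`, as stand-ins for the real datasets; the years and values are illustrative only.

```python
import pandas as pd

# Toy stand-ins for the two datasets; years and values are invented.
regions = pd.DataFrame({"Year": [1990, 2000, 2005, 2010], "value": [1, 2, 3, 4]})
national = pd.DataFrame({"Year": [2000, 2010, 2020], "value": [10, 20, 30]})

# Years that appear in both frames, sorted for readability.
common_years = sorted(set(regions["Year"]) & set(national["Year"]))

# Keep only the rows whose year is shared by both frames.
regions_common = regions[regions["Year"].isin(common_years)]
national_common = national[national["Year"].isin(common_years)]
print(common_years)
```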
I now will drop unnecessary columns and rows from the region data, and then combine the datasets.
compare_data=region_data[['region','Year','Total GHG Emissions Including LUCF (MtCO2e','Total GHG Emissions Excluding LUCF (MtCO2e)']]
compare_data=compare_data.groupby(['Year','region']).aggregate({'region':'first','Year':'first','Total GHG Emissions Including LUCF (MtCO2e':'first','Total GHG Emissions Excluding LUCF (MtCO2e)':'first'})
compare_data=compare_data.drop(['region','Year'],axis=1).reset_index()
compare_data
data2.dtypes
Before we can merge, we have to convert the Year and the Total GHG emissions columns to numeric data types.
data2['Total GHG Emissions Including LUCF (MtCO2e']=pd.to_numeric(data2['Total GHG Emissions Including LUCF (MtCO2e'])
data2['Year']=pd.to_numeric(data2['Year'])
data2['Total GHG Emissions Excluding LUCF (MtCO2e)']=pd.to_numeric(data2['Total GHG Emissions Excluding LUCF (MtCO2e)'])
regions_US=compare_data.merge(data2, how="outer")
regions_US
I first want to assess what kind of impact LUCF is having on each region, including the US.
regions_US.groupby(['region']).median().plot.bar(y=['Total GHG Emissions Including LUCF (MtCO2e','Total GHG Emissions Excluding LUCF (MtCO2e)'],figsize=(20,10),fontsize=20, title="US emissions vs. the Regions")
As you can see, in the regions where Land Use and Forestry is prominent, there is an apparent decrease in the GHG emissions. LUCF has little impact in regions such as the southwest and the northwest, where it is not a prominent sector. This is concrete evidence that trees are effective at removing CO2!
As I predicted, there was no significant change in the emissions of the US. The midwest and the southeast have the most drastic change, and the southwest has the least drastic change. These results match our previous results regarding the impacts of industries on the GHG Emissions of each region.
I now want to create a pie chart to represent what proportion of US GHG emissions each region made up for each of the available years.
groups=regions_US.groupby('Year')
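# .copy() so the in-place divisions below modify independent frames rather than views of regions_US
G1=groups.get_group(2000).copy()
G2=groups.get_group(2005).copy()
G3=groups.get_group(2010).copy()
G4=groups.get_group(2011).copy()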
G1
US=7076.000000
G1['Total GHG Emissions Including LUCF (MtCO2e']=G1['Total GHG Emissions Including LUCF (MtCO2e']/US
G1['Total GHG Emissions Excluding LUCF (MtCO2e)']=G1['Total GHG Emissions Excluding LUCF (MtCO2e)']/US
G1
G2
US=7195.000000
G2['Total GHG Emissions Including LUCF (MtCO2e']=G2['Total GHG Emissions Including LUCF (MtCO2e']/US
G2['Total GHG Emissions Excluding LUCF (MtCO2e)']=G2['Total GHG Emissions Excluding LUCF (MtCO2e)']/US
G3
US=6812.000000
G3['Total GHG Emissions Including LUCF (MtCO2e']=G3['Total GHG Emissions Including LUCF (MtCO2e']/US
G3['Total GHG Emissions Excluding LUCF (MtCO2e)']=G3['Total GHG Emissions Excluding LUCF (MtCO2e)']/US
G4
US=6702.000000
G4['Total GHG Emissions Including LUCF (MtCO2e']=G4['Total GHG Emissions Including LUCF (MtCO2e']/US
G4['Total GHG Emissions Excluding LUCF (MtCO2e)']=G4['Total GHG Emissions Excluding LUCF (MtCO2e)']/US
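The four cells above hard-code each year's US total. As a sketch of an alternative, the national row could be looked up per year and used as the divisor, so the same code works for any set of years. The toy frame, the column name `emissions`, and all values below are invented for illustration.

```python
import pandas as pd

# Toy long-format frame; the numbers are invented for illustration.
toy = pd.DataFrame({
    "region": ["midwest", "northeast", "United States"] * 2,
    "Year": [2000, 2000, 2000, 2005, 2005, 2005],
    "emissions": [30.0, 20.0, 100.0, 45.0, 15.0, 150.0],
})

# National total for each year, as a Series indexed by Year.
us_total = toy[toy["region"] == "United States"].set_index("Year")["emissions"]

# Divide every regional row by the national total of the same year.
shares = toy[toy["region"] != "United States"].copy()
shares["share"] = shares["emissions"] / shares["Year"].map(us_total)
print(shares)
```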
fig, ax = plt.subplots(1, 4, figsize=(20,40))
G1.set_index('region').drop('United States').plot.pie(y='Total GHG Emissions Including LUCF (MtCO2e',title = '2000',ax=ax[0])
G2.set_index('region').drop('United States').plot.pie(y='Total GHG Emissions Including LUCF (MtCO2e',title = '2005',ax=ax[1])
G3.set_index('region').drop('United States').plot.pie(y='Total GHG Emissions Including LUCF (MtCO2e',title = '2010',ax=ax[2])
G4.set_index('region').drop('United States').plot.pie(y='Total GHG Emissions Including LUCF (MtCO2e',title = '2011',ax=ax[3])
As you can see, in each year the southeast and the midwest are the largest contributors to US GHG emissions. Note that for some of the years, some US GHG emissions are not accounted for by the region data; this is represented by the blank portions of the pie charts. Part of this gap comes from the region definitions above, which leave out several states, including California, Colorado, Nebraska, Nevada, Utah, Alaska, and Hawaii. Let's now take a look at the proportions without LUCF.
fig, ax = plt.subplots(1, 4, figsize=(20,40))
G1.set_index('region').drop('United States').plot.pie(y='Total GHG Emissions Excluding LUCF (MtCO2e)',title = '2000',ax=ax[0])
G2.set_index('region').drop('United States').plot.pie(y='Total GHG Emissions Excluding LUCF (MtCO2e)',title = '2005',ax=ax[1])
G3.set_index('region').drop('United States').plot.pie(y='Total GHG Emissions Excluding LUCF (MtCO2e)',title = '2010',ax=ax[2])
G4.set_index('region').drop('United States').plot.pie(y='Total GHG Emissions Excluding LUCF (MtCO2e)',title = '2011',ax=ax[3])
As you can see, the proportions are fairly similar for emissions with and without LUCF. Thus LUCF does not have much of an impact on the emissions of the US as a whole, although it does have an impact on the specific regions.
In conclusion, it would appear that the way land is used, whether for agriculture or for LUCF, has one of the largest impacts on regional GHG emissions. You can see this in the large contribution of the midwest and southwest to the total GHG emissions of the US, and in the fact that the largest industry sector contributor in both was agriculture. We can also conclude that population is correlated with GHG emissions, although in a manner that needs further exploration, considering that the most populated region (northeast) appears to have the lowest GHG emissions.
Since LUCF is able to have a positive impact on the individual regions, it is clear that it could have a positive effect on the US as a whole as well. However, it is not yet contributing enough to have a noticeable effect at the national level. This is evidence that planting more trees is an easy, cheap, and natural way to help address the Climate Change crisis. This is not a new idea, but it is one that people don't seem to be taking seriously enough yet.
In conclusion, EVERYONE GO PLANT TREES!