# Combined Cycle Power Plant Analysis

In this Combined Cycle Power Plant analysis, the dataset contains sensor readings for Ambient Temperature (AT), Ambient Pressure (AP), Relative Humidity (RH) and Exhaust Vacuum (V), together with the net hourly electrical energy output (PE) of the plant. The goal is to use data science techniques to find patterns in the available data and check which features affect the label most.

A combined cycle power plant (CCPP) is composed of gas turbines (GT), steam turbines (ST) and heat recovery steam generators. In a CCPP, electricity is generated by gas and steam turbines that are combined in one cycle, and energy is transferred from one turbine to another. Exhaust Vacuum is collected from, and affects, the steam turbine, while the other three ambient variables affect GT performance.

The dataset is taken from the UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets/combined+cycle+power+plant

The label for this dataset is PE, which is a continuous variable.

Data visualization is one of the most powerful parts of data science for drawing inferences from data and finding patterns.

The factors affecting PE are also continuous. So, to analyze the dataset graphically, we use plots such as pairplot, heatmap and lmplot.

**Analysis 1:**

**import seaborn as sns**

**import matplotlib.pyplot as plt**

**# sns.pairplot creates its own figure, so a separate plt.figure() call is not needed**

**sns.pairplot(data)**

**plt.show()**

IMAGE

Executing the above code produces a pairplot of all the parameters in the dataset.

From this plot, AT and V appear to have a negative correlation with PE, i.e. as their values increase, the energy output PE of the combined cycle power plant tends to decrease.

The RH vs PE panel resembles the AP vs PE panel but is more uniformly scattered, which implies that RH has a weaker effect on PE; even so, PE may increase slightly as RH increases.

**Analysis 2:**

**import matplotlib.pyplot as plt**

**import seaborn as sns**

**cor=data.corr()**

**sns.heatmap(cor,annot=True,cmap='coolwarm')**

**plt.show()**

IMAGE

A heatmap plots the pairwise correlation values between the variables in the dataset.

AT has a correlation of -0.95 with PE, which is very close to -1. PE may therefore decrease roughly linearly as AT increases, which supports the previous analysis.

V has a correlation of -0.87 with PE, which is also close to -1, so PE may likewise decrease roughly linearly as V increases.

AP has a correlation of 0.52 with PE, which indicates that PE may increase as AP increases.

RH has a correlation of 0.39 with PE, which indicates that PE may increase slightly as RH increases.
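The reading of the heatmap above can also be done numerically by pulling out the PE column of the correlation matrix and sorting it by absolute value. The sketch below is only an illustration: the `data` frame is generated from made-up formulas (it is not the UCI dataset), and the names `corr_with_pe` and `ranking` are introduced here for the example.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the CCPP table (made-up numbers, NOT the UCI data).
rng = np.random.default_rng(0)
n = 1000
AT = rng.uniform(2, 37, n)                   # ambient temperature
V = 25 + 1.2 * AT + rng.normal(0, 5, n)      # exhaust vacuum, tied to AT
AP = rng.normal(1013, 6, n)                  # ambient pressure
RH = rng.uniform(25, 100, n)                 # relative humidity
PE = (495 - 1.7 * AT - 0.3 * V + 0.8 * (AP - 1013)
      + 0.25 * RH + rng.normal(0, 4, n))     # energy output (assumed formula)
data = pd.DataFrame({"AT": AT, "V": V, "AP": AP, "RH": RH, "PE": PE})

# Pearson correlation of every feature with the label PE.
corr_with_pe = data.corr()["PE"].drop("PE")

# Rank features by the strength (absolute value) of that correlation.
ranking = corr_with_pe.abs().sort_values(ascending=False)

print(corr_with_pe.round(2))
print("Strongest to weakest:", list(ranking.index))
```

With the real dataset loaded into `data`, only the last few lines would be needed; the sign of each value gives the direction of the relationship and the ranking gives its relative strength.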

**Analysis 3:**

**AT vs PE**

**sns.lmplot(x="AT",y="PE",data=data)**

**plt.show()**

IMAGE

**V vs PE**

**sns.lmplot(x="V",y="PE",data=data)**

**plt.show()**

IMAGE

From the above graphs, AT and V both show a negative linear relationship with PE, i.e. as their values increase, the energy output PE of the combined cycle power plant tends to decrease.

**AP vs PE**

**sns.lmplot(x="AP",y="PE",data=data)**

**plt.show()**

IMAGE

The AP vs PE plot shows a loosely linear trend, which suggests that PE may increase slightly as AP increases.

**RH vs PE**

**sns.lmplot(x="RH",y="PE",data=data)**

**plt.show()**

IMAGE

The RH vs PE plot is similar to AP vs PE but more uniformly scattered, which implies that RH has a weaker effect on PE; even so, PE may increase slightly as RH increases.
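The straight line that lmplot draws through each scatter is, with default settings, a least-squares linear fit. To put numbers on such a line, the slope, intercept and R² can be computed directly. The following is a minimal sketch using made-up AT and PE values (the coefficients in the generating formula are assumptions, not results from the dataset).

```python
import numpy as np

rng = np.random.default_rng(1)
at = rng.uniform(2, 37, 500)                   # made-up AT values
pe = 497 - 2.2 * at + rng.normal(0, 5, 500)    # made-up PE values

# A degree-1 polyfit is an ordinary least-squares straight line.
slope, intercept = np.polyfit(at, pe, 1)

# R^2: share of the variance in PE explained by the fitted line.
predicted = slope * at + intercept
ss_res = np.sum((pe - predicted) ** 2)
ss_tot = np.sum((pe - pe.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

print(f"PE ~ {intercept:.1f} + ({slope:.2f}) * AT, R^2 = {r_squared:.2f}")
```

A negative slope corresponds to the downward trend seen for AT and V, and a low R² would correspond to the more uniform scatter seen for RH.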

**Code:**

**import pandas**

**import numpy**

**import matplotlib.pyplot as plt**

**import seaborn as sns**

**data=pandas.read_excel(r"C:\Users\user1\Downloads\Power_plant_energy_condition.xlsx")**

**data.PE.count()**

**data.isnull().sum()**

**#statistical analysis**

**data.AT.mean()**

**data.AT.median()**

**data.AT.mode()**

**data.AT.var()**

**data.V.mean()**

**data.V.median()**

**data.V.mode()**

**data.V.var()**

**data.PE.median()**

**data.RH.var()**

**cor=data.corr()**

**#plt.figure(figsize=(12,10))**

**sns.heatmap(cor,annot=True,cmap='coolwarm')**

**plt.show()**

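**#distplot is deprecated in newer seaborn releases; histplot or kdeplot are its replacements**

**sns.distplot(data.AT[data.PE>450])**

**sns.distplot(data.AT[data.PE<450])**

**plt.legend(['PE > 450','PE < 450'])**

**plt.show()**

**sns.distplot(data.V[data.PE>450])**

**sns.distplot(data.V[data.PE<450])**

**plt.legend(['PE > 450','PE < 450'])**

**plt.show()**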

**sns.pairplot(data)**

**plt.show()**

**sns.lmplot(x="AT",y="PE",data=data)**

**plt.show()**

**sns.lmplot(x="AP",y="PE",data=data)**

**plt.show()**

**sns.lmplot(x="V",y="PE",data=data)**

**plt.show()**

**sns.lmplot(x="RH",y="PE",data=data)**

**plt.show()**
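The per-column statistics in the listing above (count, missing values, mean, median, mode, variance) can also be gathered for all columns at once. The sketch below assumes a tiny hand-written table with the same column names; the values are invented for illustration and are not real measurements.

```python
import pandas as pd

# Tiny made-up table with the dataset's column names (NOT real measurements).
data = pd.DataFrame({
    "AT": [10.0, 20.0, 30.0, 20.0],
    "V":  [40.0, 50.0, 70.0, 50.0],
    "AP": [1020.0, 1012.0, 1008.0, 1012.0],
    "RH": [80.0, 70.0, 60.0, 70.0],
    "PE": [480.0, 455.0, 435.0, 455.0],
})

# Row count and missing values per column.
n_rows = len(data)
missing = data.isnull().sum()

# Mean, median and (sample) variance of every column in one call.
summary = data.agg(["mean", "median", "var"])

# mode() can return several rows when values tie; the first row holds
# the smallest modal value of each column.
modes = data.mode().iloc[0]

print(n_rows, "rows;", int(missing.sum()), "missing values")
print(summary)
print(modes)
```

The same calls should work unchanged on the frame returned by `pandas.read_excel`, provided all of its columns are numeric.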

**Conclusion:**

By analyzing the given data, we can say that PE decreases as AT and V increase, while PE increases with AP and, more weakly, with RH.

So, for the highest energy output (PE), the combined cycle power plant should operate at low AT, low V, high RH, and high AP.
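One simple way to check a statement like this against data is to split the rows at a feature's median and compare the mean PE of the two halves. The sketch below does this for AT and AP on synthetic data; the generating formula and the helper `mean_pe_by_half` are assumptions made for the example, not part of the original analysis.

```python
import numpy as np
import pandas as pd

# Made-up data with the dataset's column names (NOT the UCI data).
rng = np.random.default_rng(2)
n = 1000
data = pd.DataFrame({
    "AT": rng.uniform(2, 37, n),
    "AP": rng.normal(1013, 6, n),
})
data["PE"] = (480 - 2.0 * data["AT"] + 0.8 * (data["AP"] - 1013)
              + rng.normal(0, 4, n))


def mean_pe_by_half(frame, feature):
    """Return mean PE for rows below, and at or above, the feature's median."""
    is_high = frame[feature] >= frame[feature].median()
    low_mean = frame.loc[~is_high, "PE"].mean()
    high_mean = frame.loc[is_high, "PE"].mean()
    return low_mean, high_mean


low_at, high_at = mean_pe_by_half(data, "AT")
low_ap, high_ap = mean_pe_by_half(data, "AP")
print(f"AT: low half {low_at:.1f}, high half {high_at:.1f}")
print(f"AP: low half {low_ap:.1f}, high half {high_ap:.1f}")
```

A gap between the two means shows an association in the recorded data; it does not by itself show that changing the condition would change the output.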

Other data science techniques could be applied to find more patterns from the given dataset.

This work was performed by a team of enthusiastic interns and coders during the Summer Internship 2018 at TechTrunk Ventures.