Pump It Up

Artificial Intelligence and Machine Learning

 

Problem Statement:

Using data from Taarifa and the Tanzanian Ministry of Water, our goal is to predict the operating condition of a hand pump as ‘Functional’, ‘Non-functional’ or ‘Functional needs repair’.

After analyzing the data set and its features, we narrowed them down to the 6 features we felt would affect the condition of a pump the most. These are:

  1. Payment type
  2. GPS Height – Altitude of the well
  3. Construction year
  4. Region Code
  5. Quantity
  6. Extraction Type Class
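As a minimal sketch of the selection step, the six features above can be pulled out of a DataFrame together with the target column. The column names match those used in the plotting code later in this post; the helper `select_features` and the toy rows are made up for illustration and are not part of the original analysis.

```python
import pandas as pd

# The six features listed above, plus the target column.
FEATURES = [
    "payment_type",
    "gps_height",
    "construction_year",
    "region_code",
    "quantity",
    "extraction_type_class",
]
TARGET = "status_group"


def select_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return only the six chosen features and the target, in a fixed order."""
    return df[FEATURES + [TARGET]].copy()


# Made-up rows for illustration only; not taken from the real data set.
toy = pd.DataFrame({
    "id": [1, 2, 3],
    "payment_type": ["never pay", "monthly", "annually"],
    "gps_height": [0, 1390, 686],
    "construction_year": [0, 1999, 2009],
    "region_code": [11, 20, 21],
    "quantity": ["dry", "enough", "insufficient"],
    "extraction_type_class": ["motorpump", "gravity", "handpump"],
    "status_group": ["non functional", "functional", "functional needs repair"],
})

subset = select_features(toy)
print(subset.dtypes)
```

With the real training file, the same helper would be called on the DataFrame returned by `read_csv`.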

Solution:

 

  1. Payment type

 

IMAGE

Code Used:

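plt.figure(figsize=(12,6))
# Column names are passed by keyword; recent seaborn versions no longer accept x positionally
sns.countplot(x='payment_type',hue='status_group',data=data)
plt.show()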

Conclusion: The graph shows that where payment is never made, the number of non-functional pumps is very high. Where payment is made annually, monthly or per bucket, functional pumps far outnumber non-functional ones.
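Raw counts can hide differences between large and small categories, so it may help to check the same claim as proportions. The sketch below uses `pandas.crosstab` with `normalize="index"`, which makes each payment type's row sum to 1. The rows are made up for illustration; the numbers say nothing about the real data.

```python
import pandas as pd

# Made-up rows for illustration only.
toy = pd.DataFrame({
    "payment_type": [
        "never pay", "never pay", "never pay",
        "monthly", "monthly", "annually",
    ],
    "status_group": [
        "non functional", "non functional", "functional",
        "functional", "functional", "functional",
    ],
})

# Each row sums to 1.0: the share of each status within one payment type.
shares = pd.crosstab(
    toy["payment_type"], toy["status_group"], normalize="index"
)
print(shares.round(2))
```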

  2. GPS Height

IMAGE

Code Used:

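plt.figure(figsize=(15,5))
# histplot(..., kde=True, stat='density') stands in for the deprecated sns.distplot
sns.histplot(data.gps_height[data.status_group=='non functional'],color='red',label='Non Functional',kde=True,stat='density')
sns.histplot(data.gps_height[data.status_group=='functional needs repair'],color='green',label='Needs Repair',kde=True,stat='density')
sns.histplot(data.gps_height[data.status_group=='functional'],color='blue',label='Functional',kde=True,stat='density')
sns.histplot(data.gps_height,label='Total',kde=True,stat='density')
plt.legend()
plt.show()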

Conclusion: The graph shows that pumps at height ‘0’, i.e. surface wells, function worse than wells located at some height above or depth below that level. We can conclude that a pump at the surface is more likely to be non-functional than one at a certain height or depth.
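One way to put a number on this comparison is to split the pumps into "height equal to 0" and "height not equal to 0" and compute the non-functional rate in each group. This is only a sketch on made-up rows; the group labels are arbitrary names chosen here.

```python
import pandas as pd

# Made-up rows for illustration only.
toy = pd.DataFrame({
    "gps_height": [0, 0, 0, 0, 1200, 850, -20, 1500],
    "status_group": [
        "non functional", "non functional", "functional", "non functional",
        "functional", "functional", "non functional", "functional needs repair",
    ],
})

# Label each pump by whether its recorded height is exactly 0.
at_zero = toy["gps_height"].eq(0).map(
    {True: "height == 0", False: "height != 0"}
)

# The mean of a boolean column is the fraction of True values.
is_non_functional = toy["status_group"].eq("non functional")
rate = is_non_functional.groupby(at_zero).mean()
print(rate)
```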

 

  3. Construction Year

IMAGE

Code Used:

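# A construction year of 0 marks an unknown year; move those rows to 2020.
# Assigning back avoids the chained inplace call, which may not update data.
data['construction_year']=data['construction_year'].replace(0,2020)
plt.figure(figsize=(25,6))
sns.countplot(x='construction_year',hue='status_group',data=data)
plt.show()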

 

We replaced all unknown construction years (recorded as 0) with 2020 so that they sit at the end of the axis and the pattern is clear.

Conclusion: The plot shows that the share of functional wells increases with the construction year, i.e., the older the pump or well, the more likely it is to be ‘non functional’. Construction year is therefore a useful feature for the prediction.
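An alternative to recoding unknown years as 2020 is to treat them as missing and look at the functional share per decade. This is a swapped-in technique, not what was done above, and the rows are made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration only.
toy = pd.DataFrame({
    "construction_year": [0, 0, 1978, 1985, 1999, 2004, 2010, 2012],
    "status_group": [
        "functional", "non functional", "non functional", "non functional",
        "functional", "non functional", "functional", "functional",
    ],
})

# Treat 0 as "unknown" rather than as a real year.
year = toy["construction_year"].replace(0, np.nan)

# Floor each known year to its decade, e.g. 1985 -> 1980.
decade = year // 10 * 10

# groupby drops the missing decades, so unknown years are left out.
is_functional = toy["status_group"].eq("functional")
share_by_decade = is_functional.groupby(decade).mean()
print(share_by_decade)
```

Keeping the unknown years out of the grouping avoids mixing them with genuinely recent pumps.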

 

  4. Region Code

IMAGE

Code Used:

plt.figure(figsize=(10,5))
sns.countplot(x='region_code',hue='status_group',data=data)
plt.show()

 

Conclusion: The graph shows that regions with codes from 25 upward have few pumps, and some of them have not even a single functional pump. Providing proper facilities in those regions could increase the number of pumps that function well.
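Regions without a functional pump are hard to read off a crowded bar chart, so a count table can help. The sketch below builds one with `pandas.crosstab` and lists the region codes whose functional count is zero. The rows and the name `STATUSES` are made up for illustration.

```python
import pandas as pd

STATUSES = ["functional", "functional needs repair", "non functional"]

# Made-up rows for illustration only.
toy = pd.DataFrame({
    "region_code": [3, 3, 3, 11, 11, 40, 40, 99],
    "status_group": [
        "functional", "functional", "non functional",
        "functional", "functional needs repair",
        "non functional", "non functional",
        "non functional",
    ],
})

# Count pumps per region and status; reindex so every status has a column.
counts = (
    pd.crosstab(toy["region_code"], toy["status_group"])
    .reindex(columns=STATUSES, fill_value=0)
)
counts["total"] = counts.sum(axis=1)

# Region codes that have no functional pump at all.
no_functional = counts.index[counts["functional"] == 0].tolist()
print(counts)
print("Region codes without a functional pump:", no_functional)
```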

  5. Quantity

IMAGE

Code Used:

plt.figure(figsize=(10,5))
sns.countplot(x='quantity',hue='status_group',data=data)
plt.show()

Conclusion: The graph shows that where the water quantity is dry or insufficient, the number of non-functional pumps is very high. Where the quantity is enough, functional pumps are in the majority.

  6. Extraction Type Class

IMAGE

Code Used:

plt.figure(figsize=(12,6))
sns.countplot(x='extraction_type_class',hue='status_group',data=data)
plt.show()

 

Conclusion: The graph shows that motor pumps and pumps of other types are heavily non-functional, with the submersible type next in line. The extraction type class therefore also helps in predicting whether a well or pump is ‘functional’, ‘non functional’ or ‘functional needs repair’.

 

The graphs above show the impact of various input parameters on the output parameter status_group. Count plots and distribution plots were used to analyze the data.

A heatmap is usually suggested for understanding the correlation between features, but here the majority of features are strings, so a heatmap does not help us in analyzing the data.
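For string-valued features, one commonly used stand-in for correlation is Cramér's V, which is derived from the chi-squared statistic of a contingency table and ranges from 0 (no association) to 1 (perfect association). This was not part of the analysis above; the sketch below is an uncorrected version written with NumPy only, and the helper name `cramers_v` and the toy rows are made up for illustration.

```python
import numpy as np
import pandas as pd


def cramers_v(a: pd.Series, b: pd.Series) -> float:
    """Uncorrected Cramér's V between two categorical series."""
    observed = pd.crosstab(a, b).to_numpy(dtype=float)
    n = observed.sum()
    # Expected counts if the two variables were independent.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / n
    chi2 = ((observed - expected) ** 2 / expected).sum()
    min_dim = min(observed.shape) - 1
    if min_dim == 0:
        return 0.0  # a constant column carries no association
    return float(np.sqrt(chi2 / (n * min_dim)))


# Made-up rows for illustration only.
toy = pd.DataFrame({
    "quantity": ["dry", "dry", "enough", "enough", "insufficient", "enough"],
    "status_group": [
        "non functional", "non functional", "functional",
        "functional", "non functional", "functional",
    ],
})
print(cramers_v(toy["quantity"], toy["status_group"]))
```

Computing this value for every pair of categorical columns gives a matrix that could be passed to a heatmap in the same way as a correlation matrix.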

Code:

import pandas
import numpy
import matplotlib.pyplot as plt
import seaborn as sns

data=pandas.read_csv(r"E:\Intern\Pump It Up Project\pump_it_up_training.csv")
data.shape
cols=pandas.DataFrame(data.columns.values)
data['num_private'].max()
desc=data.describe()
desc
data.isnull().sum()
# Fill missing values; assigning back avoids chained inplace calls
for col in ['funder','permit','public_meeting','scheme_management']:
    data[col]=data[col].fillna('Not Known')
# Drop columns that are redundant or not used in the analysis
data.drop(['date_recorded','funder','installer',
'longitude','latitude','wpt_name','num_private',
'basin','subvillage','region','lga','ward','recorded_by','scheme_name',
'extraction_type','extraction_type_group','management','payment','water_quality','quantity_group','source',
'source_type','waterpoint_type'],axis=1,inplace=True)

data.drop(['id'],axis=1,inplace=True)

# Data holds the cleaned copy; data is reloaded as the raw file
Data=data.copy()
data=pandas.read_csv(r"E:\Intern\Pump It Up Project\pump_it_up_training.csv")

# histplot(..., kde=True, stat='density') stands in for the deprecated sns.distplot
plt.figure(figsize=(10,5))
sns.histplot(Data.amount_tsh[Data.status_group=='non functional'],color='red',label='non',kde=True,stat='density')
sns.histplot(Data.amount_tsh[Data.status_group=='functional needs repair'],color='green',label='repair',kde=True,stat='density')
sns.histplot(Data.amount_tsh[Data.status_group=='functional'],color='blue',label='functional',kde=True,stat='density')
sns.histplot(Data.amount_tsh,label='total',kde=True,stat='density')
plt.legend()
plt.show()

plt.figure(figsize=(10,5))
sns.histplot(Data.gps_height[Data.status_group=='non functional'],color='red',label='non',kde=True,stat='density')
sns.histplot(Data.gps_height[Data.status_group=='functional needs repair'],color='green',label='repair',kde=True,stat='density')
sns.histplot(Data.gps_height[Data.status_group=='functional'],color='blue',label='functional',kde=True,stat='density')
sns.histplot(Data.gps_height,label='total',kde=True,stat='density')
plt.legend()
plt.show()

plt.figure(figsize=(10,5))
sns.histplot(Data.region_code[Data.status_group=='non functional'],color='red',label='non',kde=True,stat='density')
sns.histplot(Data.region_code[Data.status_group=='functional needs repair'],color='green',label='repair',kde=True,stat='density')
sns.histplot(Data.region_code[Data.status_group=='functional'],color='blue',label='functional',kde=True,stat='density')
sns.histplot(Data.region_code,label='total',kde=True,stat='density')
plt.legend()
plt.show()

plt.figure(figsize=(10,5))
sns.histplot(Data.district_code[Data.status_group=='non functional'],color='red',label='non',kde=True,stat='density')
sns.histplot(Data.district_code[Data.status_group=='functional needs repair'],color='green',label='repair',kde=True,stat='density')
sns.histplot(Data.district_code[Data.status_group=='functional'],color='blue',label='functional',kde=True,stat='density')
sns.histplot(Data.district_code,label='total',kde=True,stat='density')
plt.legend()
plt.show()

# Population: one figure per status group, then all groups on one wide figure
plt.figure(figsize=(20,5))
sns.histplot(Data.population[Data.status_group=='non functional'],color='red',label='non',kde=True,stat='density')
plt.show()
plt.figure(figsize=(20,5))
sns.histplot(Data.population[Data.status_group=='functional needs repair'],color='green',label='repair',kde=True,stat='density')
plt.show()
plt.figure(figsize=(20,5))
sns.histplot(Data.population[Data.status_group=='functional'],color='blue',label='functional',kde=True,stat='density')
plt.show()
plt.figure(figsize=(20,5))
sns.histplot(Data.population,label='total',kde=True,stat='density')
plt.show()
plt.figure(figsize=(50,5))
sns.histplot(Data.population[Data.status_group=='non functional'],color='red',label='non',kde=True,stat='density')
sns.histplot(Data.population[Data.status_group=='functional needs repair'],color='green',label='repair',kde=True,stat='density')
sns.histplot(Data.population[Data.status_group=='functional'],color='blue',label='functional',kde=True,stat='density')
sns.histplot(Data.population,label='total',kde=True,stat='density')
plt.legend()
plt.show()

# hue draws the three status groups side by side; overlaying separate
# countplot calls on one axes hides bars behind one another.
# astype(str) keeps the mixed True/False/'Not Known' values as plain labels.
plt.figure(figsize=(10,5))
sns.countplot(x=Data.public_meeting.astype(str),hue=Data.status_group)
plt.show()

# Scheme management: one figure per status group, then the total
plt.figure(figsize=(15,15))
sns.countplot(x=Data.scheme_management[Data.status_group=='non functional'],color='red')
plt.show()
plt.figure(figsize=(15,15))
sns.countplot(x=Data.scheme_management[Data.status_group=='functional needs repair'],color='green')
plt.show()
plt.figure(figsize=(15,15))
sns.countplot(x=Data.scheme_management[Data.status_group=='functional'],color='blue')
plt.show()
plt.figure(figsize=(15,15))
sns.countplot(x=Data.scheme_management)
plt.show()

plt.figure(figsize=(15,15))
sns.countplot(x=Data.permit.astype(str),hue=Data.status_group)
plt.show()

# Extraction type class: one figure per status group, then the total
plt.figure(figsize=(15,5))
sns.countplot(x=Data.extraction_type_class[Data.status_group=='non functional'],color='red')
plt.show()
plt.figure(figsize=(15,5))
sns.countplot(x=Data.extraction_type_class[Data.status_group=='functional needs repair'],color='green')
plt.show()
plt.figure(figsize=(15,5))
sns.countplot(x=Data.extraction_type_class[Data.status_group=='functional'],color='blue')
plt.show()
plt.figure(figsize=(15,5))
sns.countplot(x=Data.extraction_type_class)
plt.show()

plt.figure(figsize=(10,5))
# histplot(..., kde=True, stat='density') stands in for the deprecated sns.distplot
sns.histplot(Data.construction_year[Data.status_group=='non functional'],color='red',label='non',kde=True,stat='density')
sns.histplot(Data.construction_year[Data.status_group=='functional needs repair'],color='green',label='repair',kde=True,stat='density')
sns.histplot(Data.construction_year[Data.status_group=='functional'],color='blue',label='functional',kde=True,stat='density')
sns.histplot(Data.construction_year,label='total',kde=True,stat='density')
plt.legend()
plt.show()

# For the remaining categorical features, hue shows the three status groups
# side by side instead of overlaying separate countplot calls on one axes.
plt.figure(figsize=(10,5))
sns.countplot(x='management_group',hue='status_group',data=Data)
plt.show()

plt.figure(figsize=(10,5))
sns.countplot(x='payment_type',hue='status_group',data=Data)
plt.show()

plt.figure(figsize=(10,5))
sns.countplot(x='quality_group',hue='status_group',data=Data)
plt.show()

plt.figure(figsize=(10,5))
sns.countplot(x='quantity',hue='status_group',data=Data)
plt.show()

plt.figure(figsize=(10,5))
sns.countplot(x='source_class',hue='status_group',data=Data)
plt.show()

plt.figure(figsize=(15,5))
sns.countplot(x='waterpoint_type_group',hue='status_group',data=Data)
plt.show()

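# Correlation is only defined for numeric columns; numeric_only=True skips the string features
cor=data.corr(numeric_only=True)
plt.figure(figsize=(12,5))
sns.heatmap(cor,annot=True,cmap='coolwarm')
plt.show()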
