b7990885d8b26b9404fd9ce952b0b2f005019594,california_housing/feature_engineering.py,,,#,23

Before Change


plt.legend() 
plt.show()

attributes = ["median_house_value", "median_income", "total_rooms", "housing_median_age"]

scatter_matrix(housing[attributes], figsize=(12,8))

After Change


// passing data in for imputation and one hot encoding
//////////////////

city_lat_long = pd.read_csv("cal_cities_lat_long.csv")
city_pop_data = pd.read_csv("cal_populations_city.csv")
county_pop_data = pd.read_csv("cal_populations_county.csv")



original, had to change because we only want to deal with cities we have
both location and population data on.

city_coords = {}
for dat in city_lat_long.iterrows():
    row = dat[1]
    city_coords[row["Name"]] = (float(row["Latitude"]), float(row["Longitude"]))

//how we deiscovered the need for the change
present = []
absent = []
for city in city_coords.keys():
    if city in city_pop_data["City"].values:
        present.append(city)
    else:
        absent.append(city)
len(present)
len(absent)
absent


city_coords = {}

for dat in city_lat_long.iterrows():
    row = dat[1]
    if row["Name"] not in city_pop_data["City"].values:   
        continue           
    else: 
        city_coords[row["Name"]] = (float(row["Latitude"]), float(row["Longitude"]))


//clean pop
//fill in the missing 1980s values with avg rate of change
//make a dictonary of cities lat/long pass in a tuple of lat/longs
//for a given point and do the comparison

//two functions
/Ǘ. take two lat long tuples as input
	//return the distance between the two
    //vincenty(tuple1, tuple2)


//example below
newport_ri = (41.49008, -71.312796)
cleveland_oh = (41.499498, -81.695391)
x = vincenty(newport_ri, cleveland_oh)
x //distance stored in km, see units on printing

In pattern: SUPERPATTERN

Frequency: 3

Non-data size: 6

Instances

Link

Project Name: CNuge/kaggle-code

Commit Name: b7990885d8b26b9404fd9ce952b0b2f005019594

Time: 2018-01-12

Author: nugentc@uoguelph.ca

File Name: california_housing/feature_engineering.py

Class Name:

Method Name:

Link

Project Name: oddt/oddt

Commit Name: e626254b74ecb6dc71396c1b35237b53a5e35163

Time: 2017-08-23

Author: maciek@wojcikowski.pl

File Name: oddt/datasets.py

Class Name: pdbbind

Method Name: __init__

Link

Project Name: EpistasisLab/penn-ml-benchmarks

Commit Name: 862f99942ce1eefe93f0cfd1bcf3ade031679cd4

Time: 2020-09-03

Author: grixor@gmail.com

File Name: pmlb/dataset_lists.py

Class Name:

Method Name: