What is Folium in Python?

Datasets with geographical data such as latitudes, longitudes, and FIPS codes lend themselves really well to visualization through mapping packages like Folium. While state codes and FIPS county codes are widely used in mapping packages, I wanted to map out ZIP code level data while working with GeoJSON.

We look at the LA County restaurant and market inspection dataset for this purpose. Two separate CSV files are available: one for inspection records and one for violation records.

What’s this process like from a high level?

  1. Clean the data
  2. Transform the violation records and merge with inspection records
  3. Find appropriate GeoJSON
  4. Visualize some data

1. Clean the data

Let’s start by loading the CSV files into data frames and seeing what variables we have.

Data frame for inspections

Data frame for violations
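The loading step can be sketched roughly as follows. This is a minimal illustration, not the original notebook code: the two in-memory CSV snippets stand in for the real files, the rows are made up, and every column name other than ‘activity_date’ and ‘pe_description’ is an assumption.

```python
import io

import pandas as pd

# Tiny in-memory stand-ins for the two CSV files (made-up rows).
# Column names other than activity_date and pe_description are assumptions.
inspections_csv = io.StringIO(
    "serial_number,activity_date,facility_zip,pe_description\n"
    "DA0001,2017-05-09,90012,RESTAURANT (0-30) SEATS MODERATE RISK\n"
    "DA0002,2017-06-14,90013-1234,RESTAURANT (31-60) SEATS HIGH RISK\n"
)
violations_csv = io.StringIO(
    "serial_number,violation_code,violation_description,points\n"
    "DA0001,F044,# 44. Example description,1\n"
)

# With files on disk, pass the file path to pd.read_csv instead.
# dtype=str keeps ZIP codes and violation codes as text rather than numbers.
inspections = pd.read_csv(inspections_csv, dtype=str)
violations = pd.read_csv(violations_csv, dtype=str)

# List the available variables and peek at the first rows.
print(inspections.columns.tolist())
print(violations.columns.tolist())
print(inspections.head())
```

Reading everything as strings first is a deliberate choice here: it avoids ZIP codes being parsed as integers before they are cleaned.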

Datetime object

The ‘activity_date’ column is stored as strings, so we run an apply with a conversion function to turn the values into datetime objects in both tables.
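A conversion function of this kind might look like the sketch below. The date format is an assumption (the raw files may use a different one), and the name to_datetime is just a hypothetical helper for illustration.

```python
from datetime import datetime

# Assumed format of the raw strings; adjust DATE_FORMAT to match the actual files.
DATE_FORMAT = "%Y-%m-%d"


def to_datetime(value: str) -> datetime:
    """Convert one activity_date string into a datetime object."""
    return datetime.strptime(value.strip(), DATE_FORMAT)


# On a data frame, the function would be applied to the column in each table, e.g.
#   inspections["activity_date"] = inspections["activity_date"].apply(to_datetime)
parsed = to_datetime("2017-05-09")
print(parsed.year, parsed.month, parsed.day)
```

For whole columns, pandas also provides pd.to_datetime, which is usually faster than a row-by-row apply.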

ZIP codes

A look at the unique ZIP codes shows that many have an additional 4 digits appended to the usual 5. These extra digits are mainly used for USPS mail sorting. For the purposes of this analysis, we keep only the first 5 digits.
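Trimming the codes can be as simple as slicing the string. The sketch below assumes the ZIP codes are stored as text; the helper name first_five and the column name facility_zip are hypothetical.

```python
def first_five(zip_code: str) -> str:
    """Keep the 5-digit ZIP code and drop any ZIP+4 suffix."""
    return str(zip_code).strip()[:5]


# On a data frame (column name facility_zip is an assumption):
#   inspections["facility_zip"] = inspections["facility_zip"].apply(first_five)
samples = ["90012", "90013-1234", "900141234"]
cleaned = [first_five(z) for z in samples]
print(cleaned)  # ['90012', '90013', '90014']
```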

Outliers in Violations

A look at the violation codes shows that most begin with an ‘F’. A few start with ‘W’, and these appear only once or twice. When matched with the violation descriptions, they were the only ones without a violation number in front of them. Furthermore, some didn’t even result in point deductions. As they make up only 17 of the 272,801 violations (roughly 0.006%), we can safely drop them.
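The check and the filter can be sketched like this. The codes below are made up for illustration, and the column name violation_code in the comment is an assumption.

```python
from collections import Counter

# Made-up violation codes standing in for the real column.
codes = ["F044", "F033", "W052", "F044", "F035"]

# Count codes by first letter to see how rare the non-'F' codes are.
prefix_counts = Counter(code[0] for code in codes)
print(prefix_counts)

# Keep only the codes that start with 'F'.
kept = [code for code in codes if code.startswith("F")]
print(f"dropped {len(codes) - len(kept)} of {len(codes)} records")

# Pandas equivalent on the violations data frame (column name assumed):
#   violations = violations[violations["violation_code"].str.startswith("F")]
```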

Creating new features with regex


When looking at the column ‘pe_description’, records looked something like this: ‘RESTAURANT (0–30) SEATS MODERATE RISK’.

It seems to describe 3 different things: what type of establishment it is, how many people it can host, and the risk level.

To better represent the data, we write three helper functions with regex and string split to create new feature variables.

A quick description of the two regex statements used here:

For extracting the type of establishment, we want to get everything before the first opening parenthesis. The regex was thus in the form .+(?= \()

Let’s break this down:
.+ → This matches any character, one or more times. The use of ‘+’ means it has to match at least once.
(?= \() → This is a lookahead which requires the match to be followed by ‘ (’ (a space and an opening parenthesis), without including either in the returned text.
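Here is a small sketch of the lookahead in action for the establishment type. It assumes the seat range in the raw data is wrapped in parentheses and preceded by a space; the helper name establishment_type and the fallback of returning the whole string when nothing matches are my own choices for illustration.

```python
import re

# Lookahead pattern: capture everything that comes before " (".
TYPE_PATTERN = re.compile(r".+(?= \()")


def establishment_type(pe_description: str) -> str:
    """Return the text before the seat range, or the whole string if no ' (' is present."""
    match = TYPE_PATTERN.search(pe_description)
    return match.group(0) if match else pe_description


print(establishment_type("RESTAURANT (0-30) SEATS MODERATE RISK"))  # RESTAURANT
```

Note that .+ is greedy, so if a description contained more than one ‘ (’, the match would run up to the last one rather than the first.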

For extracting the size of the establishment, I used the regex [?
