Open
Description
How could the content be improved?
The following section introduces how data can be processed using loops:
Automating data processing using For Loops
I believe it would also be advantageous to have a similar section in the following
Here we can briefly introduce Python generators as well. For example, consider a CSV file where the entries are `name`, `age`, and `location`. We can parse this data into a DataFrame using a generator. Imagine `location` is a comma-separated string field and we want to read latitude and longitude separately.
name | age | location |
---|---|---|
John | 50 | 123341,123321 |
Emily | 25 | 321321,123321 |
Wick | 35 | 123341,654789 |
Raj | 40 | 987789,123321 |
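import csv
import pandas as pd

def transform_lines(csv_path):
    """Yield one parsed row at a time instead of reading the whole file at once."""
    # The location field must be quoted in the file (e.g. "123341,123321")
    # so that csv.reader keeps it as a single field.
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row
        for name, age, location in reader:
            lat, lng = location.split(",")
            yield [name, int(age), float(lat), float(lng)]

lines = transform_lines("./data.csv")
# Pass the column names explicitly so the header does not end up as a data row.
df = pd.DataFrame(lines, columns=["Name", "Age", "Latitude", "Longitude"])
print(df.head())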
This is especially useful for large datasets, where first loading all of the data in text form would consume a lot of memory.
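To make the memory argument concrete, here is a minimal sketch of how a generator is consumed lazily. It uses only the standard library; `SAMPLE_CSV` and `parse_rows` are hypothetical names introduced for illustration, and the sample assumes the `location` field is quoted in the file.

```python
import csv
import io
from itertools import islice

# Hypothetical in-memory sample standing in for data.csv; the location
# field is quoted so that its comma is not treated as a delimiter.
SAMPLE_CSV = '''name,age,location
John,50,"123341,123321"
Emily,25,"321321,123321"
Wick,35,"123341,654789"
Raj,40,"987789,123321"
'''

def parse_rows(handle):
    """Yield one parsed row at a time from an open CSV file handle."""
    reader = csv.reader(handle)
    next(reader, None)  # skip the header row
    for name, age, location in reader:
        lat, lng = location.split(",")
        yield name, int(age), float(lat), float(lng)

rows = parse_rows(io.StringIO(SAMPLE_CSV))

# Nothing has been read yet; islice pulls only the first two rows.
first_two = list(islice(rows, 2))
print(first_two)

# The generator resumes where it stopped, so the remaining rows can be
# aggregated one at a time without holding them all in memory.
oldest_remaining = max(rows, key=lambda row: row[1])
print(oldest_remaining)
```

Because each row is produced only when requested, the same pattern should work on a file far larger than this sample, as long as the consumer does not collect every row into a list.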