I recently had the unpleasant experience of discovering someone had hit my shiny new red car. It happened on the parking lot of my partners job. You would imagine that if you hit a colleague you have the decency to leave your details, but they didn’t. So there we were, no details and significant damage.
Luckily we have a modern car, a car that records a video after detecting an impact.
The video contained the entire series of events, but it was impossible to read the other car’s license plate. At best we could make out 2 letters with some imagination, which is not enough for anything. So what to do, what to do, ..
After a bit of searching I discovered that the Netherlands has een open dataset for all the cars on the road. Including their license plate, make and model, color and other attributes.
So we got to work. While my partner used her machine learning skills to try and get more details out of the video. I went to work filtering down the dataset from 16 million cars, to hopefully one.
Below are the commands I used to filter the dataset down to about 1.500 cars. Lots of filtering was based on the video, but also some common sense was applied.
- Download the Open Dataset from https://opendata.rdw.nl/Voertuigen/Open-Data-RDW-Gekentekende_voertuigen/m9d7-ebf2 and rename it to
data.csv
- Filter out all company cars
1
awk -F';' '$2 == "Personenauto" { print $0 }' data.csv > filtered-1.csv
- Filter out all white cars
1
awk -F';' '$10 == "WIT" { print $0 }' filtered-1.csv > filtered-colour.csv
- Filter out cars with 4 or 5 seats
1
awk -F';' '$9 == "4" || $4 == "5" { print $0 }' filtered-colour.csv > filtered-seats.csv
- Filter out cars heavier than 2.5T
1
awk -F';' '$14 < 2500 { print $0 }' filtered-seats.csv > filtered-mass.csv
- Filter out expensive cars
1
awk -F';' '$22 < 60000 { print $0 }' filtered-doors.csv > filtered-price.csv
- Filter out brand
1
awk -F';' '$3 == "AUDI" { print $0 }' filtered-price.csv > filtered-brand.csv
- Filter out model
1
awk -F';' '$4 == "A4" || $4 == "AUDI A4" { print $0 }' filtered-brand.csv > filtered-type.csv
- Run a regex on the license plate
1 | import csv |
After all the filtering and the final regex (based on the letters we could identify in the license plate) we had one car left. Success!
I’m honestly not sure if my insurance will accept this, but it was fun to do anyway.
(I did notice an odd behavior with sed on MacOS. For some reason it performs slower compared to Linux, but that’s the topic of another blogpost.)