What is a person, really?
This was my first week as a data engineer (same company, just promoted from Data Visualization Analyst), and it was challenging in a way I did not expect.
We have one big project that is the priority right now, so it’s all hands on deck, and we’re looking at some of the data we’ve been collecting for many years. This data lives in a secure database somewhere, and we get extracts from it based on the questions we need to answer. The challenge here was the questions we were asking!
Let’s say you have a school with a special program for kids who love science. To get into the program, kids have to apply, and each application is either accepted or rejected. After that, they have to apply again every year to re-enroll. Also, if anything changes in their household (say, they move), they have to let the school know, and that modification means starting a new application.
Not the best system, in my opinion, but it works. Now, you can easily identify the kids in your program: they each have student IDs and other unique identifiers. But how do you measure the impact your program has? How exclusive it is? How diverse?
A quick measure could be the number of applications accepted / the number of applications submitted, right? The share of successful applications? But that’s not real. A kid can submit many times: even in the best-case scenario, they have to submit at least twice if they are returning to the program. What if a kid moved, was rejected at first because of a typo in the application, and then is re-enrolling for the second year? That’s at least 4 applications. Count your one enrolled student against those 4 applications and your program shows a 25% success rate; count accepted applications instead (3 of 4, assuming only the typo one was rejected) and it shows 75%. Neither number describes the one student you actually have.
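Here is a quick sketch of that one-student scenario, just to show how much the answer depends on what you divide by. The records, field names, and function names are all made up for illustration; the real extracts look nothing like this.

```python
# Hypothetical application records for ONE student (all values are made up).
applications = [
    {"student_id": "S001", "reason": "initial", "status": "rejected"},    # the typo
    {"student_id": "S001", "reason": "initial", "status": "accepted"},    # resubmitted
    {"student_id": "S001", "reason": "moved", "status": "accepted"},      # household change
    {"student_id": "S001", "reason": "re-enroll", "status": "accepted"},  # second year
]


def application_rate(apps):
    """Accepted applications divided by submitted applications."""
    accepted = sum(1 for a in apps if a["status"] == "accepted")
    return accepted / len(apps)


def students_per_application(apps):
    """Distinct accepted students divided by submitted applications."""
    accepted_students = {a["student_id"] for a in apps if a["status"] == "accepted"}
    return len(accepted_students) / len(apps)


def person_rate(apps):
    """Distinct accepted students divided by distinct applicants."""
    applicants = {a["student_id"] for a in apps}
    accepted_students = {a["student_id"] for a in apps if a["status"] == "accepted"}
    return len(accepted_students) / len(applicants)


print(application_rate(applications))          # 0.75
print(students_per_application(applications))  # 0.25
print(person_rate(applications))               # 1.0
```

Same kid, same four rows, three different "success rates". Only the last one counts people instead of paperwork.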
What if each time you start an application you are asked to include your demographics? I’m Mexican. Growing up, I think I had to put White or Other as my race and then go to the next line to check Hispanic. What if I put White on my first application and then forget and put Other on my second? Now the school’s database is able to uniquely identify me (using my student ID and SSN), but when it’s time to report demographics, how will they classify me? This one is easy because, regardless of my race, my Hispanic-ism wins me the label Hispanic/Latino, end of story. What about the non-latinxs? People who sometimes put Two or More Races and sometimes check Other?
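To make the problem concrete, here is a minimal sketch of one possible reporting rule. The "Hispanic wins" part comes from the situation described above; everything else is my assumption: treating a Hispanic check on any application as enough, the "Needs review" fallback for conflicting answers, and the function name and input shape are all hypothetical, not how any real system does it.

```python
def reporting_label(answers):
    """Pick one reporting label for a person from all their applications.

    answers: list of (race, is_hispanic) tuples, one per application.
    """
    # Assumption: checking Hispanic on ANY application is enough for the label.
    if any(is_hispanic for _, is_hispanic in answers):
        return "Hispanic/Latino"

    # If every application gives the same race, there is nothing to resolve.
    races = {race for race, _ in answers}
    if len(races) == 1:
        return next(iter(races))

    # Assumption: conflicting answers get flagged instead of guessed.
    return "Needs review"


print(reporting_label([("White", True), ("Other", True)]))               # Hispanic/Latino
print(reporting_label([("White", False), ("White", False)]))             # White
print(reporting_label([("Two or More Races", False), ("Other", False)])) # Needs review
```

The last case is the hard one: the code can flag it, but somebody still has to decide what that person counts as in the report.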
Also, if you already applied once, why am I (the school) not pre-populating these fields for you, so that you, as a person in my database, look more consistent across applications?
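Pre-populating could be as simple as copying the demographic answers from the person's most recent application into the new form. A rough sketch, assuming hypothetical field names (`student_id`, `submitted`, `race`, `is_hispanic`) and ISO-formatted date strings so that plain string comparison sorts them correctly:

```python
DEMOGRAPHIC_FIELDS = ("race", "is_hispanic")  # hypothetical field names


def prefill(previous_applications, student_id):
    """Return default demographic answers taken from the latest prior application."""
    mine = [a for a in previous_applications if a["student_id"] == student_id]
    if not mine:
        return {}  # first-time applicant: nothing to pre-populate

    # ISO dates (YYYY-MM-DD) sort chronologically as strings.
    latest = max(mine, key=lambda a: a["submitted"])
    return {field: latest[field] for field in DEMOGRAPHIC_FIELDS}


history = [
    {"student_id": "S001", "submitted": "2020-08-01", "race": "White", "is_hispanic": True},
    {"student_id": "S001", "submitted": "2021-08-01", "race": "Other", "is_hispanic": True},
]

print(prefill(history, "S001"))  # {'race': 'Other', 'is_hispanic': True}
print(prefill(history, "S999"))  # {}
```

The applicant can still change the answers, but now the default is consistency instead of a blank form.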
Anygüeys, that’s one thing I’ve been thinking about a lot this week. What is a person, data-wise? Also, is good database design an equity issue???