I gave a talk in November to a local high school about computer science as a career field. Aha, I think – I’ve given this talk before – I’ll just brush up my well-prepared slide deck.
My slide deck has a graphic in it that looks something like the below. All credit to Daniel van der Ende and his work on the GitHub Data Challenge in 2014. It’s an interesting way to show the various combinations of languages used together in projects today. It’s common nowadays for a project to contain code in multiple languages: there’s often a front-end (typically JavaScript + HTML + CSS) paired with some sort of back-end. The point I wanted to convey in the original presentation was that software engineers often need to know more than one language. I would then riff lightly on which of the languages in my slide I’d worked with in some form or fashion. (In the snippet of the image you can see, those are Perl, Scala, Go, JavaScript, Ruby, and Lua. I did just enough CoffeeScript to not want to do it anymore…)
Well, now it’s 2021, and the slide needs updating. Mr. van der Ende hasn’t updated his image, but he was kind enough to make his source code available, along with a handy README file that walks (loosely) through how to get the data.
Challenges solved so far:
- getting access to BigQuery
- finding new sources of the data, since the dataset van der Ende references doesn’t seem to exist anymore
- convincing BigQuery that I have permission to run queries
- updating the query to match the new data source, including figuring out how to flatten arrays, which really wasn’t part of his original flow
- downloading MySQL to my developer machine and setting up a database and a username/password combo
- updating van der Ende’s code to read directly from a CSV, rather than assuming I’m using a JSON file
- getting PHP to work on my developer workstation; this particular box has done lots of things for me lately, but PHP hasn’t been one of them
- figuring out how to populate the languages list the code asked for from the languages actually represented in the dataset I downloaded. (For the record, awk, sort, and uniq were the happy combo.)
- uh, figuring out a better way to ingest the CSV, since pulling in the full file at once took up too much memory for my computer
- (undoubtedly more to come before it all works…)
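For anyone hitting the same CSV problems, here is roughly what the last few items boil down to. This is not van der Ende’s code (his is PHP); it’s a minimal Python sketch of the same ideas, assuming a CSV export with one row per repository/language pair and hypothetical column names repo_name and language. It reads rows lazily instead of loading the whole file, groups them into batches that could each become one MySQL insert, and counts distinct languages, similar in spirit to an awk | sort | uniq -c pipeline.

```python
import csv
import io
from collections import Counter
from typing import Iterable, Iterator, List, TextIO

# Hypothetical column name; adjust to match the header of your actual export.
LANGUAGE_COLUMN = "language"


def iter_rows(handle: TextIO) -> Iterator[dict]:
    """Yield CSV rows one at a time so the full file is never held in memory."""
    # csv.DictReader pulls lines from the handle lazily as it is iterated.
    yield from csv.DictReader(handle)


def batched(rows: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Group rows into lists of at most `size` (e.g. one database insert per batch)."""
    batch: List[dict] = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly shorter, batch
        yield batch


def language_counts(rows: Iterable[dict], column: str = LANGUAGE_COLUMN) -> Counter:
    """Count rows per language, skipping rows where the language is blank."""
    return Counter(row[column] for row in rows if row.get(column))


if __name__ == "__main__":
    # Tiny inline sample standing in for the real (much larger) CSV file.
    sample = (
        "repo_name,language\n"
        "alice/site,JavaScript\n"
        "alice/site,CSS\n"
        "bob/api,PHP\n"
        "bob/api,JavaScript\n"
    )

    for batch in batched(iter_rows(io.StringIO(sample)), size=2):
        # A real loader might hand each batch to a DB-API cursor.executemany(...) call.
        print(len(batch), "rows in batch")

    # Distinct languages, most common first: the list the code asked for.
    for language, count in language_counts(iter_rows(io.StringIO(sample))).most_common():
        print(language, count)
```

With a real file, the same functions work on a handle from open("languages.csv", newline="") (the filename is hypothetical). Because every step is a generator, memory use depends on the batch size rather than on the size of the file.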
Note: I ultimately ran into enough problems that I kept the original image. Bringing this to resolution is still on my to-do list…