(This is an experimental crossover post for both my blog and Substack, since the topic applies to both.)
As I write my new book, Peak Higher Education, I research a good amount of statistics. Enrollment data, publication numbers, number of colleges and universities over time – all of these feed into the hopper for analysis, reflection, and representation. I thought it would be interesting to see what generative AI made of it.
So I came up with a simple experiment and applied it to a range of AI tools. I used the same basic prompt on each one: “Find American higher education enrollment from 2000-2024 and visualize it as a graph.” I didn’t upload or link to data because I wanted to see how generative AI applications are doing with search and retrieval.
TL;DR – responses were generally disappointing, but ChatGPT came out on top.
ChatGPT gave me this graph, followed by a text explanation:
Here is a graph showing the American higher education enrollment from 2000 to 2024. The data shows a general increase in enrollment up to 2010, peaking at around 21 million students. After 2010, there is a gradual decline, with a sharper drop in 2020, likely due to the impact of the COVID-19 pandemic. Enrollment appears to begin recovering from 2021 onwards.
That looks pretty good, both the graphic and the text.
ChatGPT also gave me some code to use:
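The script itself isn’t reproduced here, but for a request like this ChatGPT typically hands back a short Python/matplotlib snippet along these lines (a sketch, not its exact output, assuming the yearly totals live in a hypothetical enrollment.csv rather than being hard-coded):

```python
# Sketch of the kind of plotting script ChatGPT produced (not its exact output).
# Assumes a hypothetical enrollment.csv with "year" and "enrollment" (millions) columns.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("enrollment.csv")  # hypothetical file holding the yearly totals

plt.figure(figsize=(10, 5))
plt.plot(df["year"], df["enrollment"], marker="o")
plt.title("American Higher Education Enrollment, 2000-2024")
plt.xlabel("Year")
plt.ylabel("Enrollment (millions)")
plt.grid(True)
plt.tight_layout()
plt.show()
```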
Note it doesn’t cite sources.
Next, I asked SearchGPT and received mixed results. That application offered more text and actual sources:
The bad news is that the graph link didn’t work, even when I asked SearchGPT to try again. It also wouldn’t display as an inline image.
Next, I put the same exact prompt to Google’s Gemini. It thought about it for a while, roughly half a minute, then responded with “Sure, here is the graph:” and:
Gemini added “Be sure to check the data in the chart to make sure I got it right,” then supplied some Python code.
Next, I headed over to Microsoft’s Copilot, which responded thusly:
Unfortunately, Copilot didn’t show the image, despite repeated entreaties.
Next, I tried Perplexity, which is rapidly becoming one of my favorites. It replied with text:
So it wouldn’t offer a graph, but it did advise me on how to make my own.
It also added a substantial set of related questions:
Perplexity then shared a bunch of graphs from other sites, only one of which really helped with my question:
Then I tried Claude, and it was in some ways the strangest experience. It interrupted its first answer to my question, deleting the text, claiming it was overloaded. So I immediately (without waiting, like it asked me to) reran the query, and this time it replied with text:
A bit embarrassed, it asked me if I wanted what I had asked for. I did, and the results weren’t great:
OK, I’m old enough to be charmed by ASCII art, having created some when I was a teenager. And I’m glad Claude also lets us copy the text. But man, this is so wrong. It puts the peak in the wrong year, 2005 instead of 2012. Visually, it takes up less than half of the space it allocated for the thing. And it falls short of 2024.
But Claude adds text, which contradicts the image and is actually right:
Next, over to Meta’s AI, to see what it could do. Meta offered not a graph but a table:
Followed by more text, including an apology and an offer to help me do it on my own:
Then I tried Llama, hosted by Poe. Like Meta, appropriately, it offered a table, followed by code – well, HTML. (And at this point WordPress stopped letting me upload images successfully, which is frustrating.)
To sum up: none of these responses satisfied me. ChatGPT did the best, actually providing a graph that was accurate, but it didn’t cite sources. All the others failed on the graphic side, and some had other problems as well. Some did a good job with sources, namely Perplexity, Copilot, and Gemini. Perplexity didn’t create an image, but it did harvest a bunch from the web, citing them.
As a research assistant for my book, only ChatGPT gave me something useful. I will still make my own graph, probably through Google Apps, because I want to tinker with data and layout some more (might overlay National Clearinghouse and federal data together).
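For anyone who would rather script that overlay than build it in a spreadsheet, here is a rough sketch of the idea (hypothetical file names; the NCES and Clearinghouse figures would need to be downloaded separately, since the two sources count enrollment somewhat differently):

```python
# Sketch of overlaying two enrollment series on one chart.
# nces.csv and clearinghouse.csv are hypothetical files, each with "year" and "enrollment" columns.
import pandas as pd
import matplotlib.pyplot as plt

nces = pd.read_csv("nces.csv")                    # e.g., NCES/IPEDS fall enrollment totals
clearinghouse = pd.read_csv("clearinghouse.csv")  # e.g., National Student Clearinghouse figures

plt.figure(figsize=(10, 5))
plt.plot(nces["year"], nces["enrollment"], marker="o", label="NCES")
plt.plot(clearinghouse["year"], clearinghouse["enrollment"], marker="s",
         label="National Student Clearinghouse")
plt.title("American Higher Education Enrollment, 2000-2024: Two Sources")
plt.xlabel("Year")
plt.ylabel("Enrollment (millions)")
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()
```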
Yet this gives me an idea. How would ChatGPT do with my dual data source idea?
So I adjusted my prompt to read: “Find American higher education enrollment from 2000-2024 and visualize it as a graph. Use data from NCES and National Student Clearinghouse Research Center.”
The results were more extensive than in the first run-through. ChatGPT started off by discussing its plans and process:
(Since WordPress is still being annoying, I’ll copy and paste the text)
Next, ChatGPT offered four (4) separate Python script attempts, each of which we could export. Between each script effort the bot added an “Analysis errored.”
Finally, ChatGPT satisfied itself and offered this graph (WordPress permitting):
With this explanation:
Here is the graph showing American higher education enrollment from 2000 to 2024. The data reflects the growth in enrollment up to around 2010, followed by a gradual decline, with a slight recovery expected in the near future.
If you need further analysis or additional details, feel free to ask!
Honestly, much as I enjoyed watching the process unfold this time, I’m not happy with the results. It synthesized the two data streams and seems to have nudged the peak ahead three years, to 2015, without noting the slight enrollment bump in 2023-2024.
To sum up: based on these experiments, using generative AI to create graphs based on web-based data is a work in progress. ChatGPT appears to be in the lead thus far.
I admire your patience with these AIs. I would have taken pains to ensure that “American higher education enrollment” was well-defined, because there are enormous variations in what this means and to whom. Local adult vo-tech programs run by school districts are very much included in Title IV postsecondary education, but I don’t know if you want to include them in your count.
I forget the terminology that AI people use, but the first step is always getting the AI to access the right databases; posing the question in that context is the second step.
The same goes for part-time versus full-time, certificate versus degree, two-year versus four-year versus graduate, and so on.
The other variable I’d be interested in is online program offerings versus face-to-face or hybrid delivery.
The essential characteristics of higher ed are changing before our eyes, so reducing all this dynamism to “enrollment” seems beside the point. Capturing the immense variability here requires data visualization of another order, because it extends to applicants and candidates, and to the changes in the secondary pipeline that we are all hearing about. For example, see this interactive map from the US Census: https://www.census.gov/library/visualizations/interactive/young-adults-higher-education.html
https://www.census.gov/library/visualizations/interactive/from-college-to-jobs-stem.html