In my current work on forecasting the intersection of AI and higher ed, I’ve been running into an interesting problem. Well, several, but today I’d like to share a structural one, caught between futures thinking and where AI is right now.
We’re on the verge of several major decision points about how we use and respond to artificial intelligence, driven largely by the explosion of large language models (LLMs). These decisions operate at large scale and could have radical consequences. We could easily head in different directions at each one, which makes me think of branching paths or decision gates. As a result, the possibilities have multiplied.
To better think through this emerging garden of forking paths, I identified the biggest gates, then traced different routes through them in a flow chart, which yielded some surprising results.
Dataset size. Right now LLMs train on enormous datasets. This is a problem for multiple reasons, including a large carbon footprint and restricted access: only the handful of people who own or work at capital-intensive enterprises (OpenAI, Google, etc.) can build these systems. For some LLM applications, the bigger the training set, the better. On the other hand, some recent AI software gets good results from smaller sets, and OpenAI’s leader has suggested that the era of ever-bigger models is ending. So in which direction will we take AI: toward building and using bigger source collections, or smaller ones?
Copyright and intellectual property (IP). Those big datasets include some problematic material in terms of ownership and rights. Usually the corpus curators did not seek approval from all content owners. Some (probably most) of the content is under copyright, and these for-profit firms cannot reasonably expect to defend their use as fair use (especially after this week’s Prince/Warhol decision). (See this helpful Washington Post article, which lets you search the contents of one major dataset. My blog is in there, among many others.)
Moreover, some creators want to shield their work from being used by LLMs, and some are already suing. Will such lawsuits or related regulations shut down these giant digital scraping projects and hence stall AI growth or use, or will LLM projects evade sanctions and continue?
Cultural attitudes. Right now AI is a hotly debated topic, and the debate goes far beyond hype and kneejerk reactions. The controversy is not a simple pro/con binary, but consists of arguments across numerous divides, axes, and topics. For example, in the previous point I mentioned creators’ objections, which turn on copyright and autonomy. Consider also debates over job automation, which focus on labor markets and self-worth. Think of disagreements over whether machine creativity is creepy or interesting. And recall the deep anxiety about machines making decisions without humans in the loop.
Taken together, these could produce an emergent cultural revulsion against AI, much like the one that built up over nuclear and biological weapons in the 20th century. This could lead to a hardened anti-AI attitude. Alternatively, many people enjoy new AI tools or simply find them useful, for a range of reasons: convenience, boosted creativity, etc. Will we crystallize as a civilization into one stance or the other? (This is where I keep reminding people of Frank Herbert’s Butlerian Jihad idea.)
Nonprofit/edu/cultural heritage projects. Right now most AI projects have surfaced from giant companies (Google, Microsoft, Meta, Baidu) or from a nonprofit closely tied to one of them (OpenAI, backed by Microsoft). So far they have succeeded in assembling the necessary constellation of human talent, massive computing power, huge datasets, and the right software. Accordingly, we could imagine a short- to medium-term future where LLMs are solely the property of giants. Yet the history of digital technology has reliably shown that tools tend to become more accessible and cheaper over time. On a recent Future Trends Forum we discussed the possibility of small, low-resourced groups making their own LLM applications. Indeed, this is an opportunity for nonprofits, educators, and cultural heritage organizations to build their own, with their own spin. I’ve thought of libraries, or teams of faculty and staff, creating AI based on non-problematic datasets, such as Internet Archive or HathiTrust content.
The gate here, then: does AI remain the province of giant companies, or does it democratize and enter the nonprofit realm?
OK, wrapping these together, I found that each gate had one option which maintained the status quo in key ways. In contrast, the alternatives pointed to new ways of structuring AI. In the flow chart I traced these paths:
On reflection, the “New AI Pathways Emerge” branch looks fascinating, and worth teasing out in a separate post.
I’ve used the gate metaphor here and spoken as if humanity makes a decision as a whole at each point, but that’s not historically sound. We normally contradict ourselves, especially across geography. Perhaps we should expect multiple branching points. Imagine a polity which decides that AI should be carefully controlled by the state, taking the form only of giant, well-regulated projects that access only licensed content, while a different nation prefers a wild west approach, with many types of AI projects using all kinds of content. How would such divergences affect geopolitics?
Additionally, these are just four gates. Several others are on my mind, like technical quality (perpetually flawed, or artificial general intelligence?) and ownership (nationalized or private?). Which ones do you see?
Back to higher education: should academics advocate for one path or another through these gates? How is enterprise IT preparing for the alternatives? What are the implications of each branch for research and teaching?
PS: I asked Midjourney to imagine some futures for AI as a flowchart:
Beautiful, but the text makes no sense.