einarwh

The banana test for AI-generated artifacts

October 3, 2024

Do you like bananas? I do - sort of. It’s more precise to say I like banana. That is, one banana, not multiple.

I suspect many people are like me. We enjoy eating a banana. Very few of us eat two bananas in a row. No-one in the history of mankind has ever eaten three.

Human enjoyment as a function of bananas consumed.

To really drive the point home, I have created a pseudo-scientific graph showing the typical banana enjoyment rate for humans as a function of the number of bananas consumed. As you can see, eating bananas starts out as highly enjoyable, but then plunges into the abyss somewhere halfway into banana number two. If we actually finish the second banana that we enthusiastically started eating based on the success of the first one, we rarely enjoy it. It’s a chore.

What does this have to do with anything?

You’ll be aware that Google’s NotebookLM has released a feature that allows you to generate a synthetic podcast based on documents that you upload to it. The generated podcasts are remarkable because they sound very realistic. If you didn’t know that they were AI-generated, you probably couldn’t tell. At least not until we’ve all learned to recognize their mannerisms, but we haven’t yet.

This has unleashed a new wave of AI hype, a new round of oohs and aahs and wild predictions about implications. “This is a game-changer! This will revolutionize how we learn! Now we will finally get all the inaccessible information out of its hard-to-digest forms and into our brains, courtesy of a couple of enthusiastically chattering Americans, while we exercise and cook and fold our laundry!”

Predictably, it has also unleashed a new swarm of people hastening to show others the podcasts they’ve made from whatever sad documents they had at hand, not unlike children who rush to show their drawings to their parents for praise. Of course, in this case the effort is rather less than that of the child: all we did was upload a document and push a button. It’s more like ripping a page out of a book than actually drawing anything. But that’s an aside, and hey, we all need praise sometimes.

What are we to make of this though? And what does it have to do with bananas?

Well, to me, people who are enthusiastic about NotebookLM (or any new generative AI tool) sound like they’ve just taken their first bite of a banana. “This is great! I will be consuming these by the truck-load!” But we’ll see how it plays out. Let me know when you’ve had three of them.

I must admit that in this case, I personally don’t like bananas at all, not even one. I’d rather stab myself with a fork than listen to one of those podcasts. I have tried several times, but the format doesn’t work for me at all. I get annoyed. It’s noise. I hate it. But I imagine this is not unique to these AI-generated podcasts; I am sure there are many human-made podcasts I also wouldn’t enjoy for the same reasons. It is, after all, a matter of form, and I have every reason to believe that NotebookLM has captured a certain kind of glib, smart-alec podcast dialogue nicely.

I think we tend to confuse impressive with good. It is indeed very impressive that it is possible to generate a plausible podcast from written sources in the way NotebookLM does. But that’s not the same as saying that the podcasts are in fact so good that we will keep choosing to listen to them over time. After all, they will have to compete against all other kinds of resources for our time and attention.

This is where the banana test comes in. I suggest that we subject all new AI-generated artifacts to the banana test. That is, before we hail the revolution brought on by any new AI-generated artifacts, we first try to consume more than a couple of them, over a period of time. The proof of the pudding is not just in the eating, but in the continued eating. This will help us distinguish impressive but ultimately hollow gimmicks from real value.

My experience with AI-generated artifacts is that they’ve largely failed this test so far, despite all the brouhaha. My brain immediately rejects reading AI-generated text. Reading 500 AI-generated abstracts for a software conference made me allergic to it. Similarly, I cringe when I see that people use obviously AI-generated images for their social media posts. It’s evident that they just took the first response they got from their tool of choice without putting any effort into it. I feel embarrassed for them, like they have their pants down. Should I say something? But I can’t, because it’s too late, and they’d just feel bad. I would encourage people who use AI-generated images to try to view them as a consumer sees them, though, and not just as someone who’s happy they got a custom-made illustration instantly and for free, without having to worry about copyright. Is it really any good? Or is it just convenient?

These experiences are worth remembering now that people say that on-the-fly AI-generated entertainment is just around the corner. Let’s assume that the hyperbolic projections are right, and that in the near future, we will be able to prompt some AI system to generate a full-length movie that we will actually be able to watch and enjoy. That is, we will enjoy it as if it were a regular movie, without too many distracting, illusion-breaking problems like limbs disappearing, people morphing into other people, and so forth. Let’s say those problems can be fixed. This is already a tremendous assumption to make, since we are extremely sensitive to visual nonsense. But let’s pretend. And at least story-wise, it is perhaps not entirely unwarranted to assume that such a system might be able to generate yet another movie set in the Star Wars universe. Presumably it’s possible to synthesize a story like that, and then wrap it up in the appropriate form. But then what will the second movie be like? And the third? Will we enjoy watching those too?

Do we assume that the capability to generate one coherent, acceptable, enjoyable movie means that we will also be able to generate an infinite number of such movies - all with reasonably unique yet coherent stories? This is a property of human-made movies, but we can’t automatically assume that it will hold for AI-generated ones. We’re starting to make some hefty assumptions. And how would we go about doing it in practice? Presumably I’ll have to vary my prompt? Provide a seed number? How will I know that I’ve provided a good prompt for my next movie? If the prompt wasn’t any good I won’t enjoy watching the movie, but I can’t know that until I’ve watched the movie! Or should I ask NotebookLM to create a 7-minute podcast episode about my movie, and make my decision based on that?

When we make predictions about how generative AI is going to change our habits, I think we vastly underestimate ourselves. I think we assume that form is everything. If it looks sufficiently like a human-made movie, we’re going to enjoy it like we enjoy a human-made movie, and it will have all the properties of human-made movies. I don’t think this is true. And I think it is a symptom of something very sad and worrisome. As Hayao Miyazaki said: We humans are losing faith in ourselves. When we claim that the output of AI tools is on par with what humans can make, we are devaluing our own work, selling ourselves short, reducing our ambitions, and ultimately debasing our humanity.