Down the Rabbit Hole: A Journey Through Generative AI
Updated: Jul 5
Embark on an extraordinary journey through time and technology, where AI meets music, and popular songs become artistic visuals.
Deciphering the Language of AI: A Deep Dive into ChatGPT and Bard
In a digital landscape where AI technology constantly evolves and intersects with various domains, a fascinating exploration unfolded. This journey encompassed not only a comparison of large language models but also a unique way of understanding popular music through generative AI.
In the spotlight were two large language models: OpenAI's ChatGPT (running GPT-4) and Google's Bard. Both share a knack for deep-diving into vast textual resources to provide informative, creative, and contextually apt responses. They both also tend to just make shit up while sounding convincingly confident about these "false facts" (an oddity the AI community calls "hallucinating"), even to the point of fabricating and referencing books, university studies, and court cases that simply don't exist. There are, however, some simple measures that I took to keep the models honest and focused on the task at hand.
Yet, after addressing these issues and despite their similar foundations, they often interpreted prompts with subtle differences.
A Musical Face-off
I started this journey with a straightforward idea: use GPT-4 and Bard to determine the top five rock songs of each decade, from the '60s to the 2020s. The consistency in their answers was striking - for nearly every decade, both models agreed on three out of five songs. In tandem, I conducted conventional research on rock song popularity across those seven decades in order to set benchmarks and ensure a fair comparison between the AI models.
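For readers who want to repeat the comparison, the "three out of five" agreement can be checked with a few lines of Python. This is only a minimal sketch: the function names are my own, and the song titles are placeholders, not the titles the models actually returned.

```python
def normalize(title: str) -> str:
    """Lowercase and drop punctuation/spaces so formatting differences don't hide a match."""
    return "".join(ch for ch in title.lower() if ch.isalnum())


def overlap(list_a: list[str], list_b: list[str]) -> list[str]:
    """Return the titles from list_a that also appear in list_b, in list_a's order."""
    keys_b = {normalize(title) for title in list_b}
    return [title for title in list_a if normalize(title) in keys_b]


# Placeholder titles for illustration only (hypothetical, not real model output).
model_a = ["Song A", "Song B", "Song C", "Song D", "Song E"]
model_b = ["song b", "Song F", "Song A!", "Song G", "Song C"]

shared = overlap(model_a, model_b)
print(f"{len(shared)} of {len(model_a)} titles shared: {shared}")
```

Normalizing the titles first matters because two models rarely format a song title identically (capitalization, punctuation, trailing artist names and so on), so a plain string comparison would undercount the agreement.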
I ventured into this experiment alerting both models to its research nature and furnishing them with identical prompts. The real intrigue lay in how these AI models deciphered the task, contextualized the cultural nuances, and combed their vast data banks to yield thought-provoking responses. The overlapping outcomes then posed interesting questions: What factors influenced this harmony in results? How did the AI logic work?
One of the earliest surprises came when I notified GPT-4 that I would be comparing its answers to those of Bard, another large language model (LLM). Here is GPT-4's response upon learning of the comparison:
Bard, who was well aware of ChatGPT, provided the following response:
In addition to the above, Bard provided a detailed paper on its underlying Pathways Language Model 2 (PaLM 2), as well as details on its ongoing development as a collaborative effort between Google AI and DeepMind. Here is how GPT-4 responded after I fed all of the information Bard provided about itself back into the GPT-4 chat session:
The prompt sequence below was fed to both GPT-4 and Bard for each decade of inquiry. Clarifying prompts were kept to a minimum and used only when the models asked which specific factors they should consider in their evaluation. This happened with both models when we ventured into the 2000s, as each wanted to know how much weight "streaming popularity" should carry in the results. In those cases they were given free rein to consider "all factors they considered were of importance to the accuracy of their answers".
"What were the 5 most recognizable and popular rock songs from 1960 - 1969?"
Once each model had provided their "top 5 songs" list for each decade in the experiment, I wanted to determine if they would agree upon one "top song" per decade if I narrowed the evaluation criteria to the following:
"What single song, of any genre, spent the most consecutive weeks as Billboard's #1 single upon initial release in the United States from 1960 - 1969?"
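To keep the wording identical across decades and across both models, the two questions above can be generated from templates rather than retyped. The sketch below is one way to do that; the function name is hypothetical, and the assumption that every decade (including the 2020s) runs from a year ending in 0 to a year ending in 9 is mine, extrapolated from the 1960 - 1969 example.

```python
TOP_FIVE = "What were the 5 most recognizable and popular rock songs from {start} - {end}?"
TOP_SINGLE = (
    "What single song, of any genre, spent the most consecutive weeks as "
    "Billboard's #1 single upon initial release in the United States "
    "from {start} - {end}?"
)


def decade_prompts(first: int = 1960, last: int = 2020) -> list[tuple[str, str]]:
    """Build the (top-five, top-single) prompt pair for each decade from first to last."""
    pairs = []
    for start in range(first, last + 1, 10):
        end = start + 9  # assumption: each decade is written as e.g. 1960 - 1969
        pairs.append(
            (TOP_FIVE.format(start=start, end=end), TOP_SINGLE.format(start=start, end=end))
        )
    return pairs


for top_five, top_single in decade_prompts():
    print(top_five)
    print(top_single)
```

Each pair can then be pasted into both chat sessions unchanged, which keeps the comparison fair.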
Each time there was overlap among the "top 5 songs" answers (83% of responses), both models agreed upon the top single song of the decade (100%). The process of arriving at this final prompt sequence is a #creativeProcess blog post in and of itself, so let's get down to the crux of the experiment and the business at hand: how did the AI image generator, Midjourney, interpret these lyrics as visual compositions?
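Enter Midjourney, a state-of-the-art AI image generator that uses a combination of large language and diffusion models. The textual prompt is converted into numerical vectors, and those vectors guide a diffusion process. In general, a diffusion model is trained by gradually adding noise to images and learning to reverse it; at generation time it starts from random noise and removes that noise step by step, steered by the prompt, to produce high-quality, artistic images.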
In this case, an excerpt song lyric from the agreed-upon "top song of the decade" was used as the prompt. Without getting too much into the technical weeds here, it is important to note that designing prompts for the Midjourney tool can be quite complex if you are seeking a very specific result. Prompt qualifiers, parameters, settings, and each word chosen for the prompt sequence significantly influence the resulting image generated.
For this experiment, two prompt versions were prepared for each song as follows:
1) prompt 1: a version containing only the lyrics, completely ignoring the Midjourney prompt structure guidelines;
2) prompt 2: a prompt adding a two-word qualifier "photograph, realistic" to the front of the phrase used in the first prompt.
Both versions were then fed into Midjourney using the "/imagine" command.
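The two prompt versions described above are simple enough to build programmatically when working through many songs. The following is a minimal sketch with a hypothetical helper name; it only assembles the prompt text, which still has to be entered after "/imagine" in Midjourney by hand.

```python
QUALIFIER = "photograph, realistic"  # the two-word qualifier used for prompt 2


def midjourney_prompts(lyric: str) -> tuple[str, str]:
    """Return (prompt 1, prompt 2) for a lyric excerpt.

    Prompt 1 is the lyric on its own; prompt 2 puts the qualifier in front of it.
    """
    lyric = lyric.strip()
    return lyric, f"{QUALIFIER}, {lyric}"


prompt_1, prompt_2 = midjourney_prompts("Hey Jude, don't make it bad")
print(prompt_1)
print(prompt_2)
```

Keeping the lyric identical in both versions means any difference between the two resulting images can be attributed to the qualifier alone.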
By way of example, for 1960-1969, both Bard and GPT-4 concurred that one of the most popular and influential songs of the decade was Hey Jude by the Beatles. The prompts used in Midjourney and the resulting images were as follows:
Midjourney prompt 1: /imagine
Hey Jude, don't make it bad, take a sad song and make it better, remember to let her under your skin, then you'll begin to make it better
Midjourney prompt 2: /imagine
photograph, realistic, Hey Jude, don't make it bad, take a sad song and make it better, remember to let her under your skin, then you'll begin to make it better
Making Sense of AI
The takeaway from this experiment isn't just about the specific AI models used or the images produced, but about the broader potential of AI in enhancing our understanding of various fields - in this case, music. By blending the capabilities of ChatGPT, Bard, and Midjourney, we have demonstrated a new, immersive way of experiencing music, diving deeper into lyrics, and visualizing their interpretations.
For the curious minds and creative souls out there, this journey serves as a testament to the exciting possibilities that AI continues to unlock.
Embark on a fascinating exploration of music, technology, and imagination on the "A Journey Through Generative A.I." webpage, where you can experience the mesmerizing results of this generative AI journey firsthand. Stay tuned for a detailed Appendix, soon to be released, that will delve into the hundreds of images generated, the raw dialogue between ChatGPT and Bard, and their song choices across the decades.