Revisiting the Lovecraft test

This entry is part 5 of 19 in the series Artificial Intelligence

One of the first tests I ran of AI’s (or, more accurately, LLMs’) ability to tell a story creatively was prompting it to write a story in the style of H.P. Lovecraft. At the time, I was impressed by ChatGPT’s (the only model I tested) ability to pick out common elements from Lovecraft and develop an interesting plot twist at the end while maintaining the requested (albeit forced) “romance elements” of the prompt. I then ran the resulting short story through a readability calculator, which put its consensus grade level at 7th grade. That seemed appropriate for the story.

I decided to come back to the Lovecraft test to try out Claude.ai and ChatGPT’s GPT-4 model and see how they compare. I also ran actual Lovecraft through the readability calculator to see how close the robots were able to get to H.P. Lovecraft’s actual writing style (at least in terms of complexity).

tell me a story with romance elements in the style of H.P. Lovecraft
the prompt used for AI-assisted H.P. Lovecraft.
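
For a sense of what the readability calculator is measuring, here is a minimal Python sketch of two of the seven formulas it reports, Flesch Reading Ease and Flesch-Kincaid Grade Level. The two formulas are the standard published ones, but the syllable counter is a crude vowel-group heuristic and the function names are made up for illustration, so expect the numbers to drift from what ReadabilityFormulas.com reports.

```python
import re


def count_syllables(word):
    """Rough syllable estimate (an assumption, not the site's method):
    count groups of consecutive vowels, with a silent-e adjustment."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing "e" as silent ("cake"), but keep "-le" endings ("table").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def text_stats(text):
    """Return (sentence count, word count, syllable count) for a text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z'’]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return len(sentences), len(words), syllables


def flesch_reading_ease(text):
    """Higher scores mean easier text (roughly a 0 to 100 scale)."""
    sentences, words, syllables = text_stats(text)
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(text):
    """Approximate U.S. school grade level needed to read the text."""
    sentences, words, syllables = text_stats(text)
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


if __name__ == "__main__":
    sample = "The somber twilight cast an eerie glow over the decrepit old mansion."
    print(round(flesch_reading_ease(sample), 1))
    print(round(flesch_kincaid_grade(sample), 1))
```

Both formulas only look at sentence length and syllables per word, which is why a readability score can say something about the complexity of the prose but nothing about whether a story is any good.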

The results

Claude.ai

Here’s the story Claude came up with:

The somber twilight cast an eerie glow over the decrepit old mansion on the cliff’s edge. Jonathan had come to this place to find his lost love, the bewitching Lavinia. She had disappeared months ago, lured here by mysterious forces.

As Jonathan approached the iron gates, a raven cawed ominously from a skeletal tree. The mansion seemed alive with shadows that writhed and twisted in the corners of his vision. Finding the front door unlocked, he slowly entered the dust-covered foyer.

“Lavinia?” he called tremulously. Only echoes replied. But he could feel her presence surrounding him. There was a disturbance in the dust at the top of the grand staircase. Heart pounding, Jonathan ascended the steps.

He came to a hallway lined with doors. One was slightly ajar. Holding his breath, Jonathan pushed it open. By the wan light of the moon he saw Lavinia lying upon a four-poster bed, still and pale. As he rushed forward, her eyes suddenly opened! But they were blank and staring.

As Jonathan took Lavinia’s cold hand, hideous realization crept upon him. This perversion was not his love. It was some eldritch creature that had stolen her form. For a moment he saw its true appearance – an amorphous blur of swirling shadows.

With a shriek that was not human, the thing attacked! But Jonathan’s love overcame his terror. “Foul creature, release her!” he cried. Summoning all his strength, he grappled with the horror. Lavinia’s form twisted and shifted under his grasp. Shadowy claws rent his flesh but he did not relent.

At last Jonathan wrested Lavinia’s essence from the creature’s clutches. With an unearthly howl it shriveled away, dissolving into mist. Lavinia’s eyes cleared and her cheeks bloomed rose again. “My love, you saved me,” she whispered. Together they fled that accursed mansion where cosmic forces beyond human comprehension had sought to sunder their bond. But their love had prevailed.

From a storytelling standpoint, there’s good and bad here. On the one hand, there’s a lot of tone setting that feels Lovecraftian: the names are certainly on the nose, as is the general feeling of dread. On the other hand, the story starts to fall apart when the words “eldritch creature” are literally used and Jonathan “grapples” with an “amorphous blur of swirling shadows”. Still, that “amorphous blur” is reminiscent of The Colour Out of Space, an actual Lovecraft story, so it’s mostly on point.

On the same readability calculator, this story got the following results:

TEXT READABILITY CONSENSUS CALCULATOR
www.ReadabilityFormulas.com

Timestamp: 09/07/2023 — 10:08:45am
Purpose: Our Text Readability Consensus Calculator uses 7 popular readability formulas to calculate the average grade level, reading age, and text difficulty of your sample text.

Your Results:
Your text: The somber twilight cast an eerie glow over the de ... (317 words total)

1.
Flesch Reading Ease score: 73 (text scale)
Flesch Reading Ease scored your text: fairly easy to read.

2.
Gunning Fog: 7 (text scale)
Gunning Fog scored your text: fairly easy to read.

3.
Flesch-Kincaid Grade Level: 5
Grade level: Fifth Grade.

4.
The Coleman-Liau Index: 10
Grade level: Tenth Grade

5.
The SMOG Index: 6
Grade level: Sixth Grade

6.
Automated Readability Index: 6
Grade level: 10-11 yrs. olds (Fifth and Sixth graders)

7.
Linsear Write Formula: 5
Grade level: Fifth Grade.

----------------------------------------------
READABILITY CONSENSUS
----------------------------------------------
Based on (7) readability formulas, we have scored your text:

Grade Level: 6
Reading Level: fairly easy to read.
Age of Reader: 10-11 yrs. olds (Fifth and Sixth graders)
----------------------------------------------

Interestingly, this got a lower grade level than ChatGPT in December last year (presumably on GPT-3). This gives us an interesting baseline. Let’s try the more modern GPT-3.5.

GPT-3.5

In the shadowed alleys of Arkham, amidst the eerie glow of eldritch moonlight, there unfolded a tale both forbidden and enthralling. Our protagonist, a reclusive librarian by the name of Arthur, found himself drawn into the enigmatic world of the occult. It was during his relentless pursuit of ancient tomes and esoteric knowledge that he met the mysterious and alluring Evelyn.

Evelyn was unlike any other woman in Arkham. Her dark hair cascaded like the night itself, and her eyes held secrets that beckoned Arthur to explore the depths of his curiosity. She frequented the rare book section of the Miskatonic University library, searching for the most obscure and arcane manuscripts.

Their fateful encounter transpired one gloomy evening when Arthur stumbled upon Evelyn, engrossed in a tome bound in leather as ancient as the knowledge it contained. With a voice as smooth as the wind whispering through forgotten crypts, she introduced herself and initiated a conversation that would forever change their lives.

As they delved deeper into the mysteries of the Necronomicon and the forbidden rituals within its pages, an undeniable connection formed between them. Their shared passion for the occult grew into a consuming obsession, one that transcended the boundaries of mere academia. They became inseparable, their love as forbidden as the ancient incantations they studied.

Yet, as their love flourished, so did the sinister forces they meddled with. Unbeknownst to Arthur and Evelyn, their unholy union had caught the attention of entities beyond mortal comprehension. Unfathomable horrors began to manifest in Arkham, casting a shadow over their burgeoning romance.

In a final, desperate bid to protect their love, Arthur and Evelyn sought out the darkest of rituals, hoping to shield themselves from the malevolent forces that closed in around them. With trembling hands, they invoked eldritch powers, sealing their fate and intertwining their souls for eternity.

Their love, tainted by the sinister knowledge they pursued, became a curse and a blessing. Together, they wandered the nightmarish realms of the Lovecraftian cosmos, forever bound by a love that transcended the boundaries of human understanding.

And so, in the eerie and otherworldly style of H.P. Lovecraft, this tale of romance and forbidden knowledge concludes, with Arthur and Evelyn eternally entwined in a cosmic dance of love and madness, lost to the eldritch horrors that lurk just beyond the veil of reality.

So, again, looking at this as a story, it does pretty well. ChatGPT namedrops Miskatonic University and Arkham, two fictional places that recur throughout Lovecraft’s mythos, and there’s a Poe-like obsession with Evelyn. The Necronomicon is Lovecraft’s own invention, too, so that’s a good pull that’s on topic. Again, it falls apart when it tries to create conflict, and it destroys the illusion of a Lovecraftian story by referring to “the Lovecraftian cosmos” and then literally naming H.P. Lovecraft in the last two paragraphs.

Let’s check the readability scores…



TEXT READABILITY CONSENSUS CALCULATOR
www.ReadabilityFormulas.com

Timestamp: 09/07/2023 — 10:41:37am
Purpose: Our Text Readability Consensus Calculator uses 7 popular readability formulas to calculate the average grade level, reading age, and text difficulty of your sample text.

Your Results:
Your text: In the shadowed alleys of Arkham, amidst the eerie ... (392 words total)

1.
Flesch Reading Ease score: 46 (text scale)
Flesch Reading Ease scored your text: difficult to read.

2.
Gunning Fog: 14 (text scale)
Gunning Fog scored your text: hard to read.

3.
Flesch-Kincaid Grade Level: 12
Grade level: Twelfth Grade.

4.
The Coleman-Liau Index: 13
Grade level: college

5.
The SMOG Index: 11
Grade level: Eleventh Grade

6.
Automated Readability Index: 13
Grade level: 18-19 yrs. old (college level entry)

7.
Linsear Write Formula: 13
Grade level: College.

----------------------------------------------
READABILITY CONSENSUS
----------------------------------------------
Based on (7) readability formulas, we have scored your text:

Grade Level: 12
Reading Level: difficult to read.
Age of Reader: 17-18 yrs. old (Twelfth graders)
----------------------------------------------

So this is considerably higher than either GPT-3 or Claude.ai, leading me to believe that Claude.ai is using some variation of GPT-3 or has a similar level of complexity. But it’s also possible that Claude has just been trained so heavily to be “helpful” that it defaults to simpler language even when prompted to do something that might otherwise call for complexity. With that in mind, let’s take a minute to look at how actual H.P. Lovecraft writing scores…

H.P. Lovecraft

I took the first section from The Dunwich Horror, a fairly well-known Lovecraft story, and fed it into the same readability calculator to see what it got. I considered using the last section or the second-to-last instead, the idea being to feed in the part where the weird, macabre stuff is happening and see how that scores, but discarded the idea. I figured that Lovecraft’s prose is probably going to be pretty uniformly readable/unreadable no matter what part of the story we’re in, so the beginning would be as good as the end. Here’s how Lovecraft himself scored:



TEXT READABILITY CONSENSUS CALCULATOR
www.ReadabilityFormulas.com

Timestamp: 09/07/2023 — 10:10:51am
Purpose: Our Text Readability Consensus Calculator uses 7 popular readability formulas to calculate the average grade level, reading age, and text difficulty of your sample text.

Your Results:
Your text: When a traveler in north central Massachusetts tak ... (1330 words total)

1.
Flesch Reading Ease score: 50 (text scale)
Flesch Reading Ease scored your text: fairly difficult to read.

2.
Gunning Fog: 16 (text scale)
Gunning Fog scored your text: difficult to read.

3.
Flesch-Kincaid Grade Level: 14
Grade level: College.

4.
The Coleman-Liau Index: 10
Grade level: Tenth Grade

5.
The SMOG Index: 11
Grade level: Eleventh Grade

6.
Automated Readability Index: 16
Grade level: College graduate

7.
Linsear Write Formula: 19
Grade level: College Graduate and above.

----------------------------------------------
READABILITY CONSENSUS
----------------------------------------------
Based on (7) readability formulas, we have scored your text:

Grade Level: 14
Reading Level: fairly difficult to read.
Age of Reader: 21-22 yrs. old  (college level)
----------------------------------------------

Lovecraft himself is “fairly difficult to read” (accurate) and scores a grade level of 14. This bumps GPT-3.5, at grade level 12, into the lead in terms of how closely it was able to emulate Lovecraft, despite the problems of the story it generated (which are more issues with the way the prompt was constructed and how ChatGPT interpreted it than with the prose).

Now that we have Lovecraft as a baseline, let’s see how GPT-4, the currently most advanced LLM available through ChatGPT, fares in its attempt.

GPT-4

In the coastal town of Arkham, Massachusetts, lived Elara, a young woman with an insatiable thirst for knowledge of the arcane. She’d frequent the dimly lit corridors of Miskatonic University’s library, losing herself in manuscripts of forbidden lore. She was especially engrossed in translating ancient languages that most deemed nonsensical or even malevolent.

During these solitary quests for eldritch wisdom, Elara frequently crossed paths with Nathaniel, a scholar who held a fascination for cosmic horrors that lay beyond human comprehension. Though equally reserved, they began a correspondence, first as cryptic notes in the margins of shared books, then as longer letters discussing their findings. A bond formed, delicate as an old parchment yet profound as the secrets they pursued.

As months passed, the stars aligned in mysterious constellations, announcing an event of cosmic significance. Elara came across an unsettling prophecy that spoke of the awakening of an elder god, capable of tearing the fabric of reality itself. Disturbingly, the event was to coincide with the upcoming winter solstice.

Fearful yet determined, Elara and Nathaniel decided they must delve into rituals known to only a few in order to prevent the cataclysm. By now, their relationship had deepened, yet both were acutely aware that the path they tread upon could either strengthen their bond or tear it irreversibly.

They traveled to an isolated lighthouse rumored to be a focal point for strange energies. The midnight hour approached; the atmosphere thickened, saturated with tension and unseen forces. With every incantation, Elara felt her essence intertwining with Nathaniel’s. She experienced a euphoria that was both intellectual and emotional—a connection far surpassing mundane romance.

But, as the final words of the ritual were uttered, a cacophony erupted from the depths of the ocean. Tentacles, each massive and covered in luminous eyes, surged forth, reaching for the fabric of reality as if to tear it asunder.

Summoning their collective will, Elara and Nathaniel recited a counter-spell, merging their energies into a barrier of pure intention. The monstrous appendages recoiled as if scorched, and, with an earsplitting shriek, retreated back into the abyss.

They had succeeded, but the cost was evident. The ritual had drained them both, leaving them in a state of ethereal detachment. They knew that a part of them would forever belong to the arcane world they had interacted with. Yet, the bond between them was solidified, akin to the alchemical fusion of elements.

Returning to their scholarly life, Elara and Nathaniel found solace only in each other’s company, understanding that their romance was far from ordinary—it was a symbiotic nexus fueled by love and a perpetual quest for the unknown. However, they also understood that their love, like the forbidden tomes they studied, was a thing of sublime beauty and inherent danger, forever teetering on the edge of cosmic horror and human emotion.

And so, in the echoing chambers of Miskatonic University and the silent abyss below, their love story became another layer of the complex tapestry of arcane mysteries—an enigma that would never be fully understood but would always be deeply felt.

Again, we’re namedropping Arkham and Miskatonic. The names are appropriately old-timey. Interestingly, the story of Elara and Nathaniel is similar to the story of Arthur and Evelyn from GPT-3.5: in both, the main characters meet and are brought together by research at Miskatonic University, and in both they end up doing some magic thing together that binds them somehow. GPT-4 gets with the program, though, and understands implicitly that I don’t want it to say “hey everyone, listen to my Lovecraftian story about eldritch horrors!” but to show, rather than tell. Specifics are, as usual, pretty vague, but this is the first time we get an actual peek at an actual creature: “Tentacles, each massive and covered in luminous eyes, surged forth.” As with the early experiment (and each subsequent attempt), AI doesn’t really get human emotion, so the actual relationship is pretty glossed over. Of all the attempts, though, this one gets closest to the mark in terms of tone and style, and it does the best job of not showing its hand as something written by a robot.

Let’s see how it compares as far as readability:

TEXT READABILITY CONSENSUS CALCULATOR
www.ReadabilityFormulas.com

Timestamp: 09/07/2023 — 10:09:19am
Purpose: Our Text Readability Consensus Calculator uses 7 popular readability formulas to calculate the average grade level, reading age, and text difficulty of your sample text.

Your Results:
Your text: In the coastal town of Arkham, Massachusetts, live ... (515 words total)

1.
Flesch Reading Ease score: 41 (text scale)
Flesch Reading Ease scored your text: difficult to read.

2.
Gunning Fog: 15 (text scale)
Gunning Fog scored your text: hard to read.

3.
Flesch-Kincaid Grade Level: 12
Grade level: Twelfth Grade.

4.
The Coleman-Liau Index: 13
Grade level: college

5.
The SMOG Index: 12
Grade level: Twelfth Grade

6.
Automated Readability Index: 13
Grade level: 18-19 yrs. old (college level entry)

7.
Linsear Write Formula: 14
Grade level: College.

----------------------------------------------
READABILITY CONSENSUS
----------------------------------------------
Based on (7) readability formulas, we have scored your text:

Grade Level: 13
Reading Level: difficult to read.
Age of Reader: 18-19 yrs. old (college level entry)
----------------------------------------------

This time, it scores a grade level 13 overall. Remember, Lovecraft was a 14 and GPT-3.5 got a 12, whereas GPT-3 (in December 2022) only got a 7. I’m not sure how well GPT-4 would be able to weave together a longer-form story, but based on pure numbers, it gives the impression that it does a fairly decent job of imitating Lovecraft.
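
To make that comparison concrete, the short Python sketch below takes the scores from the four reports in this post and measures each model’s gap to Lovecraft. The floored-average rule is only a guess at how the consensus is formed: it happens to reproduce the four consensus grades shown here, but it is not the site’s documented method. Flesch Reading Ease is left out because it sits on a different scale from the six grade-level scores. All function and variable names are made up for illustration.

```python
from math import floor
from statistics import mean

# Grade-type scores copied from the four reports in this post, in the order:
# Gunning Fog, Flesch-Kincaid, Coleman-Liau, SMOG, ARI, Linsear Write.
SCORES = {
    "Claude.ai": [7, 5, 10, 6, 6, 5],
    "GPT-3.5": [14, 12, 13, 11, 13, 13],
    "GPT-4": [15, 12, 13, 12, 13, 14],
    "H.P. Lovecraft": [16, 14, 10, 11, 16, 19],
}

# Consensus grade levels exactly as the calculator reported them.
REPORTED = {"Claude.ai": 6, "GPT-3.5": 12, "GPT-4": 13, "H.P. Lovecraft": 14}


def floored_mean(scores):
    """Average the grade scores and round down (an assumed combining rule)."""
    return floor(mean(scores))


def gap_to_lovecraft(name):
    """Grade levels between a text's reported consensus and Lovecraft's."""
    return REPORTED["H.P. Lovecraft"] - REPORTED[name]


def ranked_models():
    """Model names ordered from closest to Lovecraft to furthest away."""
    models = [name for name in REPORTED if name != "H.P. Lovecraft"]
    return sorted(models, key=gap_to_lovecraft)


if __name__ == "__main__":
    for name in ranked_models():
        print(name, "reported:", REPORTED[name],
              "floored mean:", floored_mean(SCORES[name]),
              "gap:", gap_to_lovecraft(name))
```

A one-grade gap between two single samples is thin evidence either way, so the ranking is best read as a rough ordering rather than a measurement.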

Conclusion

One of the reasons I used Lovecraft is that his style is pretty unmistakable, and there are context clues you can pick up if you know the vernacular from having read at least a couple of his stories. Also, lots of his stuff is available on Gutenberg.org in the public domain, and has been for years, so LLMs trained on the internet might have had direct access to the source material to understand how to emulate it, making it a good (as good as anything) thing to test against. If I wanted, I could try similar tests to get LLMs to write a poem in the style of Edgar Allan Poe and see how it compares to The Raven.

I am still waiting to be fully convinced about GPT-4. But it’s fascinating to me that, in terms of actually copying the style and tone and complexity of Lovecraft, GPT-4 gets pretty damn close, and observably closer than GPT-3.5 did, or than GPT-3 did almost a year ago. Perhaps telling an AI to write a book isn’t actually too far off.


Published on

September 7, 2023
