Unexpected side-effects of automatic writing
Your computer might be doing a bit more than you want
Sir Arthur Conan Doyle, the author of the Sherlock Holmes mysteries, was a complicated man. Not only was he a medical doctor and a rational thinker, he was also a bit of a mystic. He was a true believer in spirits and ethereal manifestations and even wrote (although he would say “transcribed”) an entire book, Pheneas Speaks: Direct Spirit Communications in the Family Circle (1927), that was created without conscious thought by automatic writing through the hand of his wife, Jean Elizabeth Leckie. These “written communications” were from “relations and friends who had passed beyond the border.” (That is, they were dead. Pheneas himself was “…a very, very high soul, sent especially to work through you on the earth plane. He died thousands of years ago in the East, near Arabia.” Of course.)
The writings were made without any particular mental effort—the “higher spirit” moved the hand leaving text in its wake, usually while the family was trying to do something else, like have dinner. [Doyle]
In a similar way, we now live in a world that’s full of computer-powered automatic writing in the form of spell checking, automatic error correction, and AI-based writing tools like ChatGPT. While incredibly useful, they can at times cause immense headaches.
Automatic spell correction can be a lifesaver, but when used without some careful attention, it can make a huge number of errors at an unparalleled speed. Mostly it’s embarrassing, but when spell-correction is an automatic function of your data analysis tool, it can cause gigantic problems.
In a recent survey of over 166,000 genetic research papers published between 2014 and 2020, researchers looked at the papers that use Microsoft’s Excel spreadsheet program as their basic data manipulation tool. [Abeysooriya] That’s fine, except that the study found that autocorrect had silently changed some of the genomic data!
Amazingly, roughly every third table containing genetic data ended up with incorrect information because gene names were automatically converted to calendar dates. For example, the gene Membrane Associated Ring-CH-type finger 1, commonly known as March-1, gets rewritten as a date when the data is read into Excel. Similarly, names of genes in the Septin family (such as Sept1) are also parsed as dates rather than gene names. Likewise, Basic Helix-Loop-Helix family member E40 (known in the literature as DEC1) somehow gets transformed to the date Dec-1-1900. And, as you might expect, this new string is a date, not the name of a gene. Except in this case, when it is.
In August 2020, the international committee that standardizes gene names stopped trying to fight the tide and just changed gene names to avoid this problem. Thus, those gene names beginning with MARC, MARCH, and SEPT will now begin with MTARC, MARCHF, and SEPTIN, respectively. That is, the bioinformatics community decided it was easier to change gene symbols than to change the habits of researchers. Researchers just want to analyze freshly sequenced genetic data and not worry about the potential for errors creeping into their analysis.
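The renaming itself is mechanical enough to sketch in a few lines of Python. This is only an illustration of the prefix changes described above, not the committee's official mapping table: the function name modernize_symbol is hypothetical, and the rule that a legacy symbol is one of the old prefixes followed only by digits is an assumption made for the sketch.

```python
import re

# Prefix changes described in the text. The "prefix followed only by digits"
# rule is an assumption of this sketch, not an official mapping.
LEGACY_PREFIXES = {"MARCH": "MARCHF", "MARC": "MTARC", "SEPT": "SEPTIN"}
LEGACY_PATTERN = re.compile(r"^(MARCH|MARC|SEPT)(\d+)$", re.IGNORECASE)


def modernize_symbol(symbol: str) -> str:
    """Return the date-proof form of a legacy symbol, or the input unchanged."""
    match = LEGACY_PATTERN.match(symbol.strip())
    if match is None:
        return symbol  # not a legacy symbol this sketch knows about
    prefix, number = match.groups()
    # The result is always uppercased, following the human gene symbol style.
    return LEGACY_PREFIXES[prefix.upper()] + number


if __name__ == "__main__":
    for old in ["MARCH1", "MARC1", "Sept1", "TP53"]:
        print(old, "->", modernize_symbol(old))
```

Symbols that already use the new prefixes, and anything else the pattern does not recognize, pass through untouched.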
This is suggestive of a deeper problem: spreadsheets and autocorrect allow for silent errors, ones that go unnoticed, often for years. The even deeper problem is that this lesson may (or may not) be learned by everyone. The pleasure of naming a gene is given to the scientist who first identifies it. Do all genetic researchers know to avoid names that could possibly be confused with dates by particular pieces of software?
The problem is bad enough that other researchers have made apps that undo the damage caused by importing gene data into Excel spreadsheets. [Koh]
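To get a feel for what such a repair tool is up against, here is a minimal Python sketch that flags cells that look like converted gene names. It is not how the tool cited above works. It assumes the damaged cell appears in a day-month form such as "1-Mar" (other date renderings exist and are ignored here), and the names candidate_symbols, flag_suspect_cells, and MONTH_TO_PREFIXES are hypothetical.

```python
import re

# A day-month form such as "1-Mar" or "2-Sep". Handling only this rendering
# is an assumption of the sketch; real files contain other date formats too.
DATE_LIKE = re.compile(r"^(\d{1,2})-(Mar|Sep|Dec)$", re.IGNORECASE)

# Hypothetical lookup: legacy symbol prefixes that could have produced each month.
MONTH_TO_PREFIXES = {
    "mar": ["MARCH", "MARC"],
    "sep": ["SEPT"],
    "dec": ["DEC"],
}


def candidate_symbols(cell: str) -> list[str]:
    """Return the legacy gene symbols a date-like cell might once have been."""
    match = DATE_LIKE.match(cell.strip())
    if match is None:
        return []  # not date-like, so nothing to flag
    number, month = match.groups()
    # int() drops any leading zero, so "01-Mar" and "1-Mar" give the same result.
    return [prefix + str(int(number)) for prefix in MONTH_TO_PREFIXES[month.lower()]]


def flag_suspect_cells(cells: list[str]) -> dict[str, list[str]]:
    """Map each suspicious cell to its candidate original symbols."""
    flagged = {}
    for cell in cells:
        candidates = candidate_symbols(cell)
        if candidates:
            flagged[cell] = candidates
    return flagged


if __name__ == "__main__":
    print(flag_suspect_cells(["TP53", "1-Mar", "2-Sep", "1-Dec"]))
```

Note that a cell reading "1-Mar" yields two candidates. Once the original text is gone, the date alone cannot say which gene it was, which is why a silent conversion is so much worse than a loud error.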
Similarly, the internet world is full of stories about breakups between people who have had autocorrect intervene in an unfortunate way. It’s hard to tell which of these are apocryphal vs. actual, but given the number of people who communicate largely by texts, it’s easy to believe that it’s happened at least a few (thousand?) times. It’s a small misclick from “I want to see you dead” (instead of “I want to see you dear”) or “I’m happier when you’re gone” (instead of “I’m happier when you’re home”). You can imagine the fallout from such autocorrected errors. [Marder]
I leave it to the reader to find more astounding examples (and there are many… and they are astounding).
But now we have AI to help us write vast amounts of text automatically.
Using AI Large Language Models (LLMs) in court cases is just asking for Unanticipated Consequences to strike. Given their well-known tendency to “hallucinate” (that is, just make up text), using an LLM to write your legal brief is a terrible idea.
And yet lawyers persist in using such systems. In one lawsuit, ChatGPT made up a number of non-existent court decisions. The lawyer who filed the brief claimed that in using an AI system to do his legal research, he found it to be “a source that has revealed itself to be unreliable.” No kidding. Thing is, people have been writing about LLM hallucinations ever since they first appeared on the technology landscape. [Weiser] They should have known better.
Ironically, while the brief was being prepared, one of the lawyers asked ChatGPT if the citations were real or fictional. “Yes,” the LLM replied, “it is a real case.” Which is, of course, a hallucination.
Rule of Thumb: Take caution when creating automatic correction or text generation tools: it’s easy for them to generate well-intentioned gibberish or severely mangle the best of texts (or data). In cases when the text is too much to verify, reconsider what you’re doing, and whether the reward is worth the risk of major errors. Do you want Pheneas updating your spreadsheet? I’m asking for a spiritual (and quite possibly dead) friend.
========
[Abeysooriya] Abeysooriya, M., Soria, M., Kasu, M. S., & Ziemann, M. (2021). Gene name errors: Lessons not learned. PLoS Computational Biology, 17(7), e1008984. https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1008984&type=printable
[Doyle] Doyle, Arthur Conan. Pheneas Speaks. The Psychic Press and Bookshop (UK) (1927) https://archive.org/details/in.ernet.dli.2015.242449/page/n3/mode/2up also: https://www.arthur-conan-doyle.com/index.php/Pheneas_Speaks
[Koh] Koh, C. W., Ooi, J. S., Joly, G. L., & Chan, K. R. (2022). Gene Updater: a web tool that autocorrects and updates for Excel misidentified gene names. Scientific Reports, 12(1), 12743. https://www.nature.com/articles/s41598-022-17104-3.pdf
[Marder] Marder, Ariane. “7 Accidental Breakup Texts (Thanks, Smartphones!)” Glamour (July 10, 2012)
[Weiser] Weiser, Benjamin. “Here’s What Happens When Your Lawyer Uses ChatGPT” New York Times (May 27, 2023) https://www.nytimes.com/2023/05/27/nyregion/avianca-airline-lawsuit-chatgpt.html
I'm sure you've seen this before, but perhaps not everybody has. It shows how spell checkers can get things wrong. (Grammar checkers do too, or can't tell the difference without being told. There are several examples in "Eats, Shoots & Leaves" by Lynne Truss: a panda walks into a bar, orders food, then fires his gun before going out. When questioned on his motives, he points to a wildlife guide whose entry on pandas has an unnecessary comma.)
ODE TO A SPELL CHECKER
by Jerrold H Zar
Eye halve a spelling check her,
It came with my pea sea.
It plane lee marks four my revue
Miss steaks aye kin knot sea.
Eye ran this poem threw it,
Your sure reel glad two no.
Its vary polished in it’s weigh,
My checker tolled me sew.
A check her is a bless sing;
It freeze yew lodes of thyme.
It helps me right awl stiles two reed,
And aides me when aye rime.
Each frays come posed up on my screen,
Eye trussed too bee a joule;
The checker pours o’er every word
To cheque sum spelling rule.
Bee fore wee rote with checkers
Hour spelling was inn deck line,
Butt now when wee dew have a laps,
Wee are knot maid too wine.
Butt now bee cause my spelling
Is checked with such grate flare,
There are know faults with in my cite,
Of nun eye am a wear.
Now spelling does knot phase me,
It does knot bring a tier;
My pay purrs awl due glad den
With wrapped words fare as hear.
To rite with care is quite a feet
Of witch won should be proud;
And we mussed dew the best wee can
Sew flaws are knot aloud.
That’s why eye brake in two averse
Cuz eye dew want too please.
Sow glad eye yam that aye did bye
This soft wear four pea seas.