I recently wrote to ScotRail’s press office to express some thoughts about the new automatic PA system being trialled on some of their trains, called Iona.
I’ve put the body of the email below, but I wanted to expand on some thought processes while drafting it, and maybe to jump off into other topics.
I won’t reveal the actual route, but I regularly travel on a line in Glasgow which uses ScotRail’s class 380 trains (the modern-looking ones with the slanted flat faces) and, like many I imagine, I was slightly taken aback when I heard the new PA system.
Something I openly acknowledge about myself is that I can sometimes get set in my ways. I take time to get used to changes: I get familiar with things, and I can get uncomfortable when they change and need time to adjust. So when I first heard the system and thought “what is this nonsense?”, I dismissed it as my own intransigence.
The more I heard it, though, the less I could escape the conclusion that, no, this is actually just bad. I’m very conscious not to let my own change-bias cloud my vision, but no matter how I try to square it, this system is just garbage.
Tone of Critique
In addition to avoiding my own personal preference bias in thinking about Iona, I was keen to make sure the body of my critique was written in good faith and conveyed a collaborative tone. It occurred to me that many people might feel the same way I do but lack the specific language to articulate a critique, and I also wanted to separate myself from a more general “I don’t like it” complaint.
As someone trained in industrial design and software engineering, I can draw on both an ergonomic / system design perspective and an understanding of how a system like this is technically pulled together. This is by no means unique, but I suppose it’s an interesting case study in the importance of design and, as the Americans tend to group them, ‘the humanities’ / social sciences in technical work.
I am a big believer in the Railways. In fact, if it were not for the fact that my country has not yet had enough “oh shit” moments to act as it will need to in response to climate change, I would love to be working in them. Someday soon perhaps, enough “oh shit” moments willing.
Railways are one of maybe three key pillars in how we will build a truly sustainable future. If we (as the UK) were acting proportionally, we would be aiming to double network capacity by 2040 and to be on track for the majority of inter-urban transport to happen by rail by the 2050s.
I want this to be the backdrop to critiques such as this one: it’s important to make sure the railway is as robust and accessible as possible, in every aspect of its operation.
Quality
I’ve got two sections in the email where I talk about the specific quality issues, but I wanted to expand on them here, since I kept those references brief to limit the length of the email.
There are really a couple of things going on here: an unnatural speed variation mid-sentence, and actual audio jitters when pronouncing certain words such as “Williamwood”, as though the recording playback were experiencing a signal interruption or someone were bumping an audio cable, except it happens every time.
On the mid-sentence speed variation, the closest analogy I can draw is someone on stage, reading directly from a speech, who is only partially managing their obvious nerves. I’ve been there: it’s like the part of your brain that’s coordinating speech gets impatient and wants to jump to the end of the sentence, causing cadence changes. It’s bizarre that we’ve got a computer exhibiting the same “anxiety” by accident.
The concerned tone may also be affected by the accent, but you also have sentences such as “please mind the gap when alighting from this train” ending in a dip in tone with a sharp rise towards the end of “train”, as though it’s a question.
Lastly, the tone is just flat. I mentioned this in point 5 of my ‘specification’, recommending that synthesised speech does not need to sound like a real person. A bit of robotic voice is OK; in fact, if it helps create greater fluctuation in tone and consistency in place names, it’s better!
The premise behind using a generative AI system to “sound more authentic” is flawed. You don’t want a PA system to sound like a real person, because real people can talk in a flat, mumbling tone. Think of the last time you heard someone on a tannoy asking someone to come to reception: there probably wasn’t any intonation at all.
Recordings
I took a trip to Whitecraigs recently to visit the Garden Centre there and had the idea to record some segments between there and Mount Florida where I had to change. I think there may be better and worse examples but I’m interested in how many quality issues there are here after only four stations.
In this clip, listen for the jitter in the word “Williamwood”, the tonal fluctuation in “Muirend”, and the slightly rushed “Glasgow Central”.
In this next clip, I actually think it does quite a good “Mount Florida”, but sounds stern and monotone with the note about “stations to Newton”.
Here is the full audio clip from departing Whitecraigs to approaching Mount Florida. I’d also note the rushed “Cathcart”.
The National Article
It was at this point of drafting that I saw the article in The National, ‘ScotRail issues statement on new AI voice announcements’ by Lucy Jackson, which confirmed that the system being used was a generative AI based system by ReadSpeaker.
I decided against criticising the system solely because it is based on generative AI, as I did not want to be dismissed as having a staunch bias against it. In fact, I don’t have a hard position on the technology one way or the other.
The article also mentioned regional voices being used in separate parts of the network. So far, all the comments I have seen are about the one voice, Iona; perhaps the intention is to create new voices after a wider roll-out?
Or perhaps, more frustratingly, this is a product selling point given by ReadSpeaker which is being uncritically reproduced by ScotRail’s communication teams. Either way I can’t say I’m pleased.
On the points made about integrated networks, I am specifically referencing various pieces of work by Gareth Dennis, who champions principles of integrated transport systems. This episode of RailNatter is a good primer on these ideas, specifically referencing the “Dimensions of infrastructure” from Star and Ruhleder 1996, and the book Transport for Suburbia: Beyond the Automobile Age, first edition by Paul Mees, 16 Dec. 2009.
The principles from the latter book are summarised like this:
- Visibility: The visible touch points of the physical infrastructure and of the various interfaces of the system.
- Simplicity: How complex the system is to interface with and understand.
- Integration: How well a particular part of the network is connected to other lines, routes, and between other systems.
- Affordability: The system has to be perceived as having an acceptable level of cost.
- Safety: The comfort of the system, its feeling of safety, its accessibility*.
- Ownership: The perception that we all have a buy-in to the system, a sense of ownership.
*Accessibility spans all of these points. It’s mentioned under safety as a reference to the psychological safety of knowing how accessible you’ll find the system, where potential problem areas might be, and what provisions and pre-planning are required.
Could a Generative System Work?
I’ve been quiet publicly on the topic of so-called generative AI.
I say ‘so called’ because I’ve always disliked this grouping of quite diverse technologies under one generalised label, mostly for the sake of marketing. More on this in a bit.
The reason I’ve kept my mouth shut is pretty much the same reason a lot of people have: I don’t want to make the wrong bet and end up looking foolish.
That said, it is worth stating here that I stand by artists, workers, content producers etc. in defending their various trades and intellectual property in their grievances (for lack of a better word) with generative AI.
This topic had me wondering, though: if these issues were satisfied for a hypothetical system of this type, how might it be made to work? Or to put it another way, how might this system have turned out less badly?
The recordings are all the same, so Iona is clearly not being prompted at speaking time; doing so would be incredibly expensive and risk unexpected outputs. As described by ScotRail, the system is prompted, produces one or more outputs, and then one is selected. It’s not entirely clear whether ScotRail or ReadSpeaker performs this selection, though I’d wager ScotRail chooses from a certain number of attempts offered by ReadSpeaker.
I would suggest that a better synthesiser could still use pre-recorded segments like the previous system, but fall back on the generative component only when phrases are not found in its language base, using that language base as its fine-tuning data, with the original voice actor’s permission and compensation.
This way the error space is far smaller: more permutations can be generated at once to choose from, and errors can be refined more easily.
On a single-word basis it would be easier to get a specific intonation and inflection. All words of a type, like time codes or station names that get inserted into a preformed section of sentence, are already fairly consistent within themselves. Fine-tuning a synthesiser to produce just these types of words is easier than producing longer compound sentences.
Again, though, this raises the question of why not just bring the original voice actor back in, or someone else who sounds similar?
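To make the idea in the last few paragraphs a little more concrete, here is a minimal Python sketch. It reflects my own assumptions, not anything ScotRail or ReadSpeaker have described. Recorded clips are looked up first; only a phrase with no recording is sent to a synthesiser, which returns several candidate takes, and a reviewer picks one. The phrase table, the file paths, and the functions `assemble`, `fake_synthesise` and `first_take` are all hypothetical stand-ins: no audio is produced, just a list of clip names.

```python
from typing import Callable

# Hypothetical library of clips recorded by the original voice actor.
# Keys are normalised phrases; values are placeholder file paths.
RECORDED: dict[str, str] = {
    "we are now approaching": "rec/we_are_now_approaching.wav",
    "this train is for": "rec/this_train_is_for.wav",
    "williamwood": "rec/williamwood.wav",
    "muirend": "rec/muirend.wav",
    "glasgow central": "rec/glasgow_central.wav",
}

# A synthesiser takes a phrase and a candidate count and returns candidate clips.
Synthesiser = Callable[[str, int], list[str]]
# A reviewer chooses one candidate (in practice, a person listening to each take).
Reviewer = Callable[[list[str]], str]


def assemble(segments: list[str], synthesise: Synthesiser, review: Reviewer,
             candidates: int = 4) -> list[str]:
    """Build a playlist, preferring recorded clips and only synthesising gaps."""
    playlist = []
    for segment in segments:
        key = segment.strip().lower()
        if key in RECORDED:
            # Known phrase: reuse the human recording as-is.
            playlist.append(RECORDED[key])
        else:
            # Unknown phrase: generate several takes of just this phrase,
            # so the error space is one name, not a whole sentence.
            options = synthesise(key, candidates)
            playlist.append(review(options))
    return playlist


def fake_synthesise(phrase: str, n: int) -> list[str]:
    # Stand-in for a hypothetical fine-tuned model: returns labelled takes.
    slug = phrase.replace(" ", "_")
    return [f"gen/{slug}_take{i}.wav" for i in range(1, n + 1)]


def first_take(options: list[str]) -> str:
    # Stand-in for a human reviewer; here we simply keep the first take.
    return options[0]


if __name__ == "__main__":
    print(assemble(["We are now approaching", "Cathcart"], fake_synthesise, first_take))
```

The design point is that the generative part only ever has to get one name right at a time, and a person still chooses among the takes, which keeps the error space as small as described above.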
The BBC Article, ReadSpeaker and Gayanne Potter
I had the drafted email ready to go, with a nice collaborative “maybe you should rethink this” vibe to it, and then I saw the BBC News article ‘Stop using my voice’ – New train announcer is my AI clone by Jamie Russell.
I’ve more or less covered what I think of it, of ReadSpeaker’s pathetic, aggravatingly curt response, and of what needs to happen next in the body of the email.
But to summarise, the story presents what should be a scandal under “normal” circumstances.
Even if you take the position that the general training data used for LLM systems is acceptable (on which I will, for the sake of argument, take a neutral stance), the idea of taking recordings made before systems like ChatGPT were even publicly available, under a different stated purpose, and fitting an AI system to them is egregiously, cartoonishly wrong.
Right now the UK is on a knife edge concerning AI and IP law. The current Labour government have not broken from the austerity-based fiscal rules and managed decline that underpin our politics across parties, and what few ideas they have tried for creating economic growth have largely fallen flat.
In response, they have pinned their hopes on appeasing big US tech companies, being a tax haven for datacentres, and the public buying into generative AI in a big way.
I won’t pull any punches: this will end in failure. The real concern lies in what will be conceded in the meantime in the attempt to appease the aforementioned tech companies, namely intellectual property protections and privacy.
Our arts and culture sectors have been decimated after decades of harsh austerity measures and foolish cost-cutting. I don’t think it’s an exaggeration to view instances like this, instigated by Scotland’s nationalised railway, as an early warning of the direction we are heading in.
While I will not dismiss generative AI systems in a broad sweep, I think that we must not fall at the very first hurdle and fail to challenge issues like this because we want to stand bullishly behind an AI system. Instances such as this could be the ‘easy mode’ test case for how we will respond to such breaches of the principles of labour and data ownership. How we think of art, of labour, and of intellectual property is being tested. It cannot stand that Potter’s recordings can be used in this way, and that her direct requests to have her voice removed are so quickly dismissed.
Cheapness and Network Prestige
Lastly, I want to pick up on the comment about this system feeling cheap.
It is an insult that such a low-quality product is being used on our railway network, on what should be the default way to travel.
We continue to spend billions on road projects in Scotland, despite how evident it is that this is environmentally unconscionable, and yet we allow such a poorly conceived system onto trains because “it’s better value in the long run”. Better value than what? Keeping one person on a retainer to record the odd new station name?
If a cheaper or better alternative to a system is available as a replacement, then it would make sense to use it. If, however, a worse system is forced into place, resulting in a downgrade for passengers, all for the sake of saving money, that is unacceptable.
Even the promotional image of “Iona” is half-assed: a generic, typically soulless production of a conventionally attractive white woman with red hair against a vaguely Highland-looking, mountain-shaped background, as though a generic tourist-trap pop-up on the High Street in Edinburgh had been combined in a vat with a doodle by someone whose only interaction with Scotland was half-remembering the Disney movie Brave, on a 10-minute deadline.

I have worked quite a bit over the past two years on LLM systems at work, namely on our ‘Aiden’ pan-bank general assistant system. This includes features like retrieval augmented generation, where a knowledge base is queried and the results are passed into a prompt as part of its context in order to steer the output in a particular direction, and fine-tuning foundation models for a particular use case, similar to how Iona would have been built.
I think a system like Iona could be produced with relative ease.
In my opinion this adds to the insult of this system. No matter how cheap it is to purchase, it’s likely far cheaper to produce and run (for ReadSpeaker).
ScotRail have bought a lemon.
The Email
To whom it may concern,
I am writing to comment on the new automatic train announcement system recently rolled out on ScotRail trains and to express my belief that this new system is a negative change for the network.
I’d first like to briefly comment on the quality of the announcements. The new ReadSpeaker system, “Iona”, is clunky and stilted-sounding, with a tone of voice tinged at times with an almost concerned breathlessness. Compared with the previous pre-recorded composer system, this feels like a downgrade, comparable to a much older voice synthesis system from, say, 15 years ago. It sounds robotic, stilted, inhuman, and frankly, cheap.
I’d call into question the quality of the product offered by this new system. I’d strongly urge ScotRail management to prioritise the quality and clarity of the announcements over whatever supposed benefits this new software offers (I would guess that it’s billed as future-proof, adaptable to custom announcements, etc.).
I note from the recent National article on the topic that ScotRail’s response mentioned the new system could generate bespoke custom voices per ‘region’.
If this is the case, and the system does change based on region, I believe that this is a significant mistake.
In terms of accessibility, users of an integrated system get used to a particular set of common phrases in the announcements they hear, even after only brief exposure during a trip. This familiarity helps with recognising key phrases and pulling out important information, even if a hearing impairment is present. Familiarity with the voice (or signage system, ticketing system, graphic design etc.) lessens the effort needed to understand what is being communicated.
If a separate voice is used in separate regions or routes, this diminishes the effect of this existing association and would have the effect of making the system feel more fragmented, not more local, as seems to be intended.
All writing on the topic of what goes into making an integrated transport system highlights the importance of consistency of wayfinding and other interfaces. I believe that having distinct voices diminishes the network level accessibility and could reduce how connected the experience of using the network feels.
This same idea holds true for the stilted nature of the announcements that I have referenced above. The variation, inflections, little glitches and quirks, speed changes mid-sentence, sharp gaps between sentence segments and names, and – as remarked in the National article – bizarre pronunciation choices, are simply not an acceptable wayfinding interface for those listening for route announcements.
As someone trained in industrial design and working as a software engineer, I would strongly recommend that further consideration is given to the best way to replace the previous announcement system (assuming that this is a goal in itself), to see if this generative system can be better fitted and its quality improved, or if another option entirely would be more suitable.
If I were specifying a system of this type, I would require it to satisfy, without compromise, the following requirements, whether generated artificially or recorded by a real person:
1) The pronunciation should be clear and consistent across the entire language base.
2) The cadence should be similarly consistent with clear delineation between key phrases and connecting sentences.
3) The tone should be varied in order to avoid any flat sections within the sentence and to avoid phonetic collisions. The most common example of the current system violating this rule is with the sentence “We are now approaching” which sounds almost as though it is mumbled, even in a quiet carriage with minimal ambient noise. Compare this with the way the previous system featured tonal fluctuations, particularly on the emphasised “ow” sounds in its accent, which made the above phrase distinguishable even over significant noise.
4) The pronunciation of common phrases and proper nouns should conform to one set of accepted standards, acknowledging that regional variances exist. Rather than trying to adjust to each region, a single phonetic set should be chosen.
5) It is not a requirement that it sound like “regular” human speech, as though being read out by a conductor. In fact, I’d argue that an algorithmic tone is better, so long as the above 4 points are satisfied.
I hope this criticism is useful and constructive in some way. I do think that we should expect better for a modern transport system and for Scotland’s national railway.
Edit 30/05/2025
Since drafting this email, I have read the account of Gayanne Potter claiming that her voice has been used as a target for this system without her consent, and that her request to have the voice removed was redirected towards the vendor company, who dismissed it.
This is entirely morally indefensible.
Aside from the core issue – that she did not give explicit consent for her voice to be used in this system and is now explicitly asking for the voice to be removed – she did not even know that this was the intended purpose of the recordings taken of her voice, having been told they were for something else entirely.
In fact, given that the recordings were taken before the large-scale proliferation of large language (transformer model) based ‘generative AI’ systems, there is no way she could possibly have consented to this.
I anticipate, based on my own personal experience with this type of company at work and on ReadSpeaker’s response quoted in BBC News articles, that some vague justification will be given along the lines of the murkiness of IP and copyright law, and of current moves to weaken these laws, which remain a point of uncertainty in the UK and EU.
I reject these hypothetical responses on principle. It is simply not right, regardless of the excuse, to use this person’s voice and work for this system in any capacity. I appeal to ScotRail’s values of openness and honesty, and similar values espoused by the Scottish government of fairness and respect, that you do the same.
Thank you very much for taking the time,
Sincerely,
Robyn F H Veitch
Robynfhveitch@gmail.com