While working on my last blog post and video – where I broke down the differences between Simplified and Traditional Chinese and how they relate to Mandarin and Cantonese from a translator’s perspective – I decided to ask ChatGPT for examples of linguistic conventions that differ between Traditional Chinese as used in Taiwan and Hong Kong.
After reading ChatGPT’s answer, I felt compelled to share my insights with the wider language industry and community because my experience underscores the crucial need for fact-checking and human expertise when using AI tools like ChatGPT.
As a native Cantonese speaker, I was able to spot several mistakes right away. But if you’re not familiar with Cantonese, or don’t speak Chinese at all, you might easily take this information at face value. And the implications could be serious, especially if key linguistic decisions were made based on flawed AI-generated insights.
Now, let me be clear – I’m not a technophobe who’s opposed to the use of AI. Far from it! In fact, in a recent blog post, I explored the challenges and opportunities AI presents for the language industry. Personally, I believe the opportunities are vast, and language professionals who are ready to adapt and embrace these changes are not just positioned to survive but to truly thrive. I'm extremely curious about AI and am currently working my way through Google's brilliant AI Essentials Course on Coursera, which I highly recommend to anyone who wants to know more about AI. Even prior to taking this course, I was already using AI tools like ChatGPT as a productivity booster.
However, as many industry experts have highlighted, and as I learned in a highly informative webinar hosted by the Chartered Institute of Linguists (of which I am a member), it’s crucial for language professionals, service providers, translation agencies, and clients to remain vigilant about the risks and limitations of AI.
The CIOL also urges translators to flag these issues and help educate the public about the potential pitfalls of AI. That’s why I’ve decided to write this blog post – to shed light on some of the glaring mistakes I’ve encountered firsthand and thereby highlight the importance of human oversight in the use of AI.
To make this as clear and transparent as possible, I’ve pasted ChatGPT’s answer verbatim below and added my own comments to highlight where it went wrong.
---
CHATGPT’S RESPONSE REPRODUCED HERE VERBATIM (MY COMMENTS ARE MARKED WITH →)
Here are some examples of differences in linguistic conventions between Traditional Chinese used in Taiwan and Hong Kong. These include variations in vocabulary, terminology, and even punctuation usage, despite both regions using the Traditional script:
1. Vocabulary and Terminology
Taiwan and Hong Kong often use different terms for the same concepts, similar to British vs. American English:
• “Cheese”
• Taiwan: 起司 (qǐ sī)
• Hong Kong: 芝士 (zhī shì)
• “Potato”
• Taiwan: 馬鈴薯 (mǎ líng shǔ)
• Hong Kong: 薯仔 (shǔ zǎi)
• “Bus”
• Taiwan: 公車 (gōng chē)
• Hong Kong: 巴士 (bā shì)
• “Computer”
• Taiwan: 電腦 (diàn nǎo)
• Hong Kong: 計算機 (gài syùn gēi) — though this can also mean “calculator.”
→ This is NOT true! 計算機 (also referred to as 計數機 colloquially) only ever refers to a “calculator” in Cantonese. A computer is always referred to as 電腦. A Hong Kong-based client would most likely not be very impressed if a translator translated “computer” as 計算機 – they might assume (not unreasonably) that the translator is a non-Hong Kong native who simply “converted” the mainland Chinese term 计算机 (which does mean “computer” in mainland China) from Simplified Chinese into Traditional Chinese.
2. Grammar and Sentence Structure
While largely similar, some minor differences in grammatical preferences reflect regional speech patterns:
• “Do you have time?”
• Taiwan: 你有空嗎? (Nǐ yǒu kòng ma?)
• Hong Kong: 你得閒嗎? (Néi dāk hàahn ma?)
• “I’m hungry.”
• Taiwan: 我餓了 (Wǒ è le)
• Hong Kong: 我肚餓 (Ngóh tóuh ngoh)
3. Formality and Stylistic Preferences
Taiwan tends to use more formal or literary expressions, while Hong Kong often adopts a more colloquial tone, reflecting the strong influence of spoken Cantonese on written conventions.
• “Welcome!”
• Taiwan: 歡迎光臨 (Huān yíng guāng lín) — more formal.
• Hong Kong: 歡迎 (Fūn yìhng) — often shorter and more casual.
→ If you walk into a half-decent shop or restaurant in Hong Kong, you’re just as likely to be greeted with 歡迎光臨 as anything else! Having said that, ChatGPT is not entirely wrong in stating that “Taiwan tends to use more formal or literary expressions” – I have certainly noticed this when translating Taiwanese academic documents, which often contain wording with an almost quaint literary feel. So, all in all, while this statement is not a “mistake” per se, it certainly requires further examination and human input to untangle some of the nuances involved.
4. Punctuation
Punctuation conventions also differ slightly between the two regions:
• Quotation Marks
• Taiwan: 「 … 」
• Hong Kong: 「…」 (used similarly but spacing and placement can vary).
• Ellipses
• Taiwan: Six dots (……).
• Hong Kong: Three dots (…).
→ OK, this is straight up WRONG. In Hong Kong, the ellipsis is always written as SIX dots, not three. According to the official style guide of the Chinese University of Hong Kong:
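• 標點全用全型,中文破折號──,不宜用英文的短線代替。省略號用六點:……,不用英文的三點:… (In English: “Use full-width punctuation throughout. The Chinese dash is ── and should not be replaced with the short English dash. The ellipsis takes six dots: ……, not the three dots used in English: …”)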
5. Loanwords
Taiwan often borrows words directly from Japanese or creates its own transliterations, while Hong Kong tends to adopt English loanwords directly due to its colonial history.
• “Parking lot”
• Taiwan: 停車場 (tíng chē chǎng)
• Hong Kong: 泊車場 (baak chē chèuhng) or more commonly 停車場 (tìhng chē chèuhng)
→ This is not true. In Cantonese, “泊車” is used as a verb; a “car park” is always referred to as 停車場 in Hong Kong, both in written and spoken forms. I have certainly never heard of the term 泊車場 in any context!
6. Technical and Administrative Terms
Some technical or governmental terms differ due to political and administrative separation:
• “Post office”
• Taiwan: 郵局 (yóu jú)
• Hong Kong: 郵政局 (yàuh jing guhk).
→ This is incorrect. While the term “郵政局” does exist, it refers to the administrative body that oversees the city’s postal service (similar to “Royal Mail” in the UK) and is not the equivalent of “post office,” which is the place you visit to post a letter. The correct term for a post office, both in spoken and written form, is “郵局”.
These differences highlight how cultural, historical, and linguistic influences have shaped the conventions of Traditional Chinese in Taiwan and Hong Kong, even though the script itself is largely the same.
AI Hallucinations and Biases
In my opinion, the errors we’ve discussed highlight two critical issues with AI that deserve attention: hallucinations and biases.
Let's first take a look at "hallucinations".
In the context of AI, a “hallucination” occurs when the model generates information that is plausible-sounding but completely fabricated or incorrect. Essentially, the AI “guesses” rather than admitting it doesn’t know the answer. This can happen because the model is designed to predict the most statistically likely sequence of words based on its training data, not to verify the factual accuracy of its output.
For example, when ChatGPT claimed that 計算機 means “computer” in Hong Kong, it wasn’t "lying" intentionally – it was making an educated guess based on patterns in its training data. However, since this claim wasn’t grounded in linguistic or cultural reality, it turned out to be wrong. For users unfamiliar with the language, such hallucinations can be dangerously misleading, especially when they are presented with the same confidence as correct information.
This issue isn’t unique to linguistic contexts; hallucinations have been observed in many fields, from legal summaries to medical advice generated by AI tools. It’s a critical limitation that reinforces the need for fact-checking and human oversight whenever AI is used for important tasks.
Now, let's take a look at how the errors I've discussed in this blog post reveal biases that may exist in AI models.
AI models like the one powering ChatGPT rely heavily on the data they’re trained on. This means they can inherit and reflect biases inherent in that data. Take the examples above: one possible explanation for these inaccuracies is that Mandarin Chinese might have been “prioritised” during training because of its global dominance and the sheer volume of texts available in that language. In contrast, languages like Cantonese or Hong Kong Chinese likely received far less focus due to their comparatively smaller data sets. As a result, responses related to these languages are often less accurate – revealing an unintentional bias against minority languages.
Consider the error involving 計算機 and 電腦. It’s likely that ChatGPT’s incorrect response stems from the overwhelming presence of Mandarin-based texts in its training data, where 計算機 is more commonly used for “computer.” This imbalance reflects a broader challenge: AI models currently struggle to reliably differentiate between regional variants, such as Cantonese versus Mandarin, or even Simplified versus Traditional Chinese within the broader “Chinese” language umbrella.
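To make the imbalance argument concrete, here is a deliberately simplified sketch. Real language models are far more complex than a frequency count, and the corpus below is invented: the 9:1 split, the region labels, and the function name `most_common_term` are all assumptions for illustration only. The point is simply that when data from different regions is pooled under one “Chinese” label, the majority usage wins.

```python
from collections import Counter

# Hypothetical toy data: each entry is (region, word used for "computer").
# The mainland term is shown already converted to Traditional script.
# The 9:1 split is an illustrative assumption, not a real corpus statistic.
TOY_CORPUS = [("mainland", "計算機")] * 9 + [("hong_kong", "電腦")]


def most_common_term(corpus, region=None):
    """Return the most frequent term, optionally restricted to one region."""
    terms = [term for r, term in corpus if region is None or r == region]
    return Counter(terms).most_common(1)[0][0]


# Pooling everything under one "Chinese" label lets the majority usage win.
pooled = most_common_term(TOY_CORPUS)

# Keeping the regional label recovers the Hong Kong usage.
hk_only = most_common_term(TOY_CORPUS, region="hong_kong")

print("Pooled:", pooled)
print("Hong Kong only:", hk_only)
```

With this toy data, the pooled answer is the mainland-derived term, while filtering by region returns 電腦. Whether and how a real model keeps track of regional variants is not something this sketch can show.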
These issues underscore why a human-in-the-loop approach is essential when using AI in translation. Whether it’s during the design and optimisation of AI models or the critical step of reviewing AI-generated outputs, human expertise remains crucial to ensuring accuracy and fairness.
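As one small illustration of what a human-in-the-loop check might look like in practice, the sketch below scans a draft for the terms I flagged in my comments above and surfaces them for a human reviewer. The glossary name `HK_REVIEW_NOTES`, the function `flag_for_review`, and the sample sentences are hypothetical. The script only flags terms and never replaces them automatically, because 計算機 is perfectly correct when the source means “calculator”, and 郵政局 is correct when the text refers to the postal authority.

```python
# Terms flagged in my comments above for Hong Kong Traditional Chinese.
# Keys are the questionable forms; values are notes for the human reviewer.
HK_REVIEW_NOTES = {
    "計算機": "means 'calculator' in Hong Kong; use 電腦 for 'computer'",
    "泊車場": "not used in Hong Kong; use 停車場 for 'car park'",
    "郵政局": "the postal authority; use 郵局 for 'post office'",
}


def flag_for_review(text):
    """Return (term, note) pairs for every flagged term found in the text."""
    return [(term, note) for term, note in HK_REVIEW_NOTES.items() if term in text]


# Hypothetical draft sentence containing two of the flagged terms.
draft = "請到郵政局旁邊的泊車場。"
for term, note in flag_for_review(draft):
    print(f"{term}: {note}")
```

A simple substring match like this is only a starting point. The final decision on each flagged term still rests with a translator who knows the target region.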
I hope this blog post has offered some insight into the potential pitfalls of relying on AI tools such as ChatGPT. While such tools can be immensely valuable, they must be used with caution and are not substitutes for human expertise (at least not yet!).
Need help navigating the intricate linguistic nuances of the Chinese language? Feel free to get in touch – I am a qualified UK-based Chinese-to-English translator with over 13 years of translation experience, and I’d love to assist you with any translation or localisation needs.