
Domestic AI models tackle unique demands of Cantonese

Cultural elements, colloquialisms require local solutions, experts say

By OASIS HU in Hong Kong | China Daily | Updated: 2024-10-21 07:42

"Is there a wife in a wife cake?" Heyson He Lixi, an influencer promoting Cantonese, asked.

"There is no actual wife in a wife cake, as it is just a traditional Hong Kong pastry," answered Sensechat, the world's first Cantonese large language model, a ChatGPT-like product, in fluent Cantonese.

He raised a more challenging question, "What is sorrowful rice?"

"Sorrowful rice, a well-known delicacy in Hong Kong, derives its name from the martial arts masterpiece The Return of the Condor Heroes by Hong Kong writer Jin Yong. In the novel, the character Yang Guo experiences deep sorrow after losing his beloved, Xiaolongnyu," Sensechat responded.

He was impressed — it was the first artificial intelligence tool he had encountered that exhibited a profound understanding of Cantonese culture.

Born in 1998, He is fond of Cantonese and has been creating videos promoting the language since college. Over the years, he has amassed a fan base that surpasses 700,000 across various online platforms, including YouTube and popular Chinese mainland social media platforms like Bilibili, Xiaohongshu, and Douyin. Due to the recent AI craze, He started producing videos evaluating the tools' ability to understand Cantonese.

The outcomes were underwhelming. Many AI large language models (LLMs) and AI software developed on the mainland fail to recognize Cantonese phonetically. Some Western-developed AI software can listen to Cantonese, but cannot speak it accurately. ChatGPT, for instance, often blends Cantonese with Mandarin. Suno, an AI tool that specializes in generating songs, can pronounce Cantonese to a degree, but its primary focus remains music creation.

In July, the Sensetime Group, an AI developer based in Hong Kong, introduced Sensechat, a Cantonese version of its proprietary LLM, and announced that it would be available for free to Hong Kong users indefinitely.

Upon a friend's recommendation, He downloaded Sensechat.

"I felt 85 percent satisfied with Sensechat," he said. "The application still requires to be further refined, but it is one of the few that can truly understand Cantonese."

The application emphasizes one of the unique traits of Cantonese — its colloquial nature.

Pronunciation of Cantonese involves extensive use of modal particles, which are often used at the end of sentences to indicate mood. These particles usually go unnoticed by most AI tools, but Sensechat captures them effectively.

In terms of written text, Sensechat can understand and reflect the nuances between the two forms of written Cantonese. Written Cantonese has a standardized form used in formal situations, similar to Mandarin, and a phonetic style for everyday use. This characteristic, He said, is often overlooked by other large language models.

He recorded his interactions with Sensechat and shared them online, garnering over 150,000 views. "Cantonese speakers truly need such a tool," He said.

Data size matters

Training an LLM typically involves three stages, said Cao Jiannong, the chair professor in the Department of Computing at Hong Kong Polytechnic University.

The first stage requires pre-training using extensive data, followed by fine-tuning with high-quality data. In the third stage, humans are needed to align the output of the LLM with local culture, ethics, morals, laws, and other rules to restrict the risk of generating inaccurate, biased, or unlawful content.

Developing a Cantonese LLM faces difficulties in all three stages, Cao said.

While Hong Kong's internet infrastructure is relatively well-developed, there is a scarcity of Cantonese content available online. A major factor contributing to this scarcity is that while Cantonese is widely spoken in daily life, it is usually written as standard Chinese.

Moreover, English has long served as the official language in Hong Kong. Consequently, a significant portion of the city's online information, including official archived documents in areas such as law, finance, politics, and medicine, is predominantly available in English, Cao said.

LLMs rely heavily on abundant data for their training, said Francis Fong Po-kiu, honorary president of the Hong Kong Information Technology Federation, a local IT-related business association. Without data, there is simply no way to develop a language model, he said.

Literature scarcity

Cantonese web resources suffer not only from a shortage in quantity, but also a lack of quality, said Cao.

When it comes to written material, Hong Kong has not prioritized literature, resulting in a scarcity of quality Cantonese literary works, said Keith Li King-wah, chairman of Hong Kong Wireless Technology Industry Association.

Most available Cantonese texts come from online forums and social media, and often contain low-quality and even offensive language, potentially leading AI models to produce crude content, Li said.

Collecting speech data presents another problem.

Despite access to Cantonese videos online, such as movies and TV dramas, they cannot be used due to background noise, said Albert Lam Yun-sang, the chief technology officer and chief scientist at Fano Labs, a Hong Kong-based startup focusing on speech and language technologies.

Besides insufficient data, Cantonese's intricate linguistic characteristics are another obstacle in training an AI model.

The Economist magazine analyzed language learning time, and found that mastering Cantonese requires 88 weeks of study, placing it alongside Mandarin, Arabic, Japanese, and Korean in the top five most difficult languages to learn.

Lu Lewei, director of the Sensetime Research Institute, said that Cantonese is highly colloquial with numerous inflections. It has nine tones and even a slight variation in pronunciation can alter a word's meaning.

The language also features a blend of Chinese and English and a mix of old and modern terms.

In language modeling, the simplicity of a language offers advantages. The more complex the language is, the harder it is for an AI model to learn it, Lam said.

Furthermore, underlying Cantonese is the local culture, which can be challenging for those tasked with aligning the output of large language models, Cao said.

Urgent need

Despite the difficulties involved in creating Cantonese AI models, demand for them is undeniable, said Fong from the Hong Kong Information Technology Federation.

The global Cantonese-speaking population is nearly 120 million, and 85.2 million of those are native Cantonese speakers.

In Hong Kong, 6.3 million residents, or 88.2 percent of the city's population, use Cantonese as their spoken language. In other cities within the Guangdong-Hong Kong-Macao Greater Bay Area, Cantonese is the predominant dialect, with 67 million residents in Guangdong province conversing in it.

In the future, AI will be akin to today's computers and fundamentally a tool for the general public. Without Cantonese AI tools, Cantonese-only speakers may encounter significant inconvenience and marginalization in both the offline and online world, Cao said.

For a city, lack of AI expertise could result in decreased productivity in sectors such as education, healthcare, finance, and law. These limitations could impede the whole city's development, Cao added.

Fong said AI models from other countries or regions may struggle to grasp Cantonese culture accurately. This could lead to cultural or political misinterpretations, resulting in the spreading of incorrect messages.

Dependence on outside AI models could make privacy and security vulnerable, Fong said.

Government officials, for instance, might face national security risks and local companies might leak data if they inadvertently disclose sensitive information to the models developed in foreign jurisdictions, he added.

Fong urged the Hong Kong Special Administrative Region government and local organizations to develop Cantonese LLMs.

In July, Sun Dong, Hong Kong's Secretary for Innovation, Technology, and Industry, announced that the SAR government is cooperating with local universities to develop a Hong Kong-based large language model.

A document co-pilot application for civil servants is now being used on a trial basis.

The model has already been implemented in Sun's department and the system will eventually become available to all Hong Kong residents, the secretary said.

The bureau said plans are underway to expand the pilot application to three other government bureaus, but it gave no indication when Hong Kong residents would gain access to it.

Fong said if it could be launched successfully, the government LLM would have many benefits.

It would be a positive step in resolving the issue of some Western AI models limiting their usage in Hong Kong. Also, implementing a localized AI model could safeguard privacy and provide more convenience to residents, Fong said.

Cao said it's unclear what specific features the government's AI model could offer and how it would distinguish itself from other similar products.

"I don't think the government has done enough research on what they want to do," Cao said.

Local startups

Local technology companies, meanwhile, are actively meeting the needs of the Cantonese-speaking market.

One startup, Votee AI, developed an open-source Cantonese LLM this year.

After years of operating in the local market, Votee AI has gathered substantial amounts of open-source Cantonese data along with primary data.

Taking a community-centered approach, they have also collaborated with local Cantonese linguists and AI researchers, including the team behind the online Cantonese dictionary "words.hk", to capture the nuances of Hong Kong speech.

Sensetime has also accumulated a vast reservoir of internal open-source data.

To collect data, the company has synthesized data using advanced technologies and bought supplementary information from external channels.

To combat the shortage of high-quality Cantonese data, Sensetime also collected Cantonese audio data from hundreds of its local employees.

Sensechat's clients include customer service providers, financial institutions, legal firms, healthcare companies, and others.

For Hong Kong residents, the company promises to provide the service for free indefinitely on both the web version and mobile application.

A local tech industry insider, who chose to stay anonymous, said Sensechat should open-source its technology to allow more residents and organizations to access it freely, to benefit the city.

After trying the Sensechat platform, he said its understanding of some Hong Kong slang could be more precise. Nonetheless, "it should be recognized that Sensechat filled a void in the local market," he said.

Cultural roots

In addition to developing local AI models, existing mainstream language models should be encouraged to improve their Cantonese functions, said Li from the Hong Kong Wireless Technology Industry Association.

However, mainstream AI language models are primarily developed by commercial entities in the West. Without market demand, they may not be willing to enhance their products' Cantonese capabilities.

Li believes the Hong Kong SAR government and local organizations should take the lead in collecting Cantonese data, digitizing cultural content, and sharing these resources openly to enrich the Cantonese body of information.

Cantonese speakers can also actively use the language to engage with mainstream AI language models.

These actions can demonstrate to AI model developers that there is a market demand for Cantonese, while interaction with these models can also enhance their understanding of Cantonese culture.

The key to encouraging more people to use Cantonese lies in making Cantonese culture appealing, Li said.

Language is not just a communication tool; it encapsulates the cultural essence and identity of its speakers, he said.

The marginalized status of Cantonese in the digital sphere is a reflection of the decline of the cultural significance of the region.

In the 1970s and 1980s, Hong Kong, although just a city, was so culturally influential that Cantonese was a popular language around the world, Li said.

"At that time, the whole world watched Hong Kong movies and TVB (television shows), knew Jackie Chan and Bruce Lee, and sang Cantonese songs. However, in the present day, even many students in Hong Kong cannot speak Cantonese," he said.

"The focus of government policies should not only be on technology, but also on culture."

He, the influencer, said he learned Cantonese from his grandparents when he was a child, which later made him more proficient in the language than other school students. The confidence this gave him motivated him to become a Cantonese blogger.

However, as He grew older, Cantonese became so marginalized that even voice-operated devices and software in his home failed to understand Cantonese commands.

While He could communicate with these devices in Mandarin and English, his grandparents, who only speak Cantonese, struggled to keep pace.

He hopes that Cantonese LLMs will one day help his elderly grandparents manage their daily lives through voice-controlled apps capable of understanding Cantonese.
