Science Focus (issue 25)

SCIENCE FOCUS, Issue 025, 2023
The Procrastinator’s Guide to ChatGPT 遲來的ChatGPT懶人包
Unleashing Nature’s Plastic-Eating Marvels! 出來吧!自然界蠶吃塑膠的生物
Thinking Out of the Box: Removing Medical Devices with a Liquid Metal 有何不可:利用液態金屬移除醫療裝置
Alpacas and Nanobodies 羊駝與奈米抗體
Fingerprints: The Key to Our Individuality 指紋:判別個人身分的特徵

Message from the Editor-in-Chief 主編的話

Dear Readers,

Welcome to a new issue of Science Focus. In the age of artificial intelligence, it was tempting to write this letter with one of the now-pervasive online AI tools, especially after reading the article on ChatGPT in this issue. Nevertheless, there is still much to learn about how to ask the right questions to get ChatGPT to do exactly what we want. It is also important to remember that we are still the main drivers of knowledge generation, through novel observations and discoveries.

In this post-pandemic era, it has become common knowledge how vaccines trigger the production of antibodies to combat deadly viral invaders. It turns out that the antibodies of alpacas and their relatives are structured rather differently from ours. Read on to find out how scientists have made use of alpaca antibodies for research and therapeutics. Staying with advances in biomedical science, you can learn how medical devices can be removed by infiltrating them with a liquid metal, and how our fingerprints are determined by a combination of genetic and environmental factors. Much effort has gone into reducing the use of plastics, but how can we get rid of plastic waste more efficiently? Some scientists have found unexpected inspiration in insects and bacteria. Finally, for those of you who play Wordle, we provide a quick guide on how to maximize your chances of getting the elusive green squares.

As we enter a new school year, I hope Science Focus continues to be a bridge between recent scientific advances and what you learn in textbooks. As always, we welcome your suggestions and comments via our social media pages.

Yours faithfully,
Prof. Ho Yi Mak
Editor-in-Chief

親愛的讀者:

歡迎閱讀最新一期《科言》!在這個人工智能盛行的年代,相關應用程式有如雨後春筍般湧現,在讀過今期關於 ChatGPT 的文章後,讓我以此作為今期〈主編的話〉的引子。對於如何問 ChatGPT正確的問題才能使其做出我們所想,仍有待慢慢學習,但此外更重要的是,我們應該記住人類仍是知識的主要發現者,我們能透過細心觀察作出不一樣的發現。

在後疫情年代,相信大家都熟識疫苗如何激發身體產生抗體以抵禦致命病毒,但原來羊駝和近親朋友們製造出來的抗體結構與我們的都不太一樣,閱讀後頁文章看看科學家如何把羊駝抗體應用於研究和醫藥上。同樣是生物醫學上的突破,今期我們會介紹液態金屬如何能透過滲入醫療裝置而把其移除,還會探討遺傳和環境因素如何共同決定指紋樣式。社會一直致力減少使用塑膠,但我們如何能更有效地消除塑膠廢料呢?科學家從昆蟲和細菌中找到意想不到的靈感。最後,我們為有玩 Wordle 的讀者們送上一份簡易攻略,看看如何增加綠色框框出現的機會吧!

在新學年裡,我希望《科言》繼續成為一道橋樑,連結科學界最新發現和課本上的知識。一如以往,我們歡迎大家透過社交媒體把建議和意見告訴我們。

主編
麥晧怡教授 敬上

Copyright © 2023 HKUST
Scientific Advisors 科學顧問: Prof. Yukinori Hirano 平野恭敬教授, Prof. Ivan Ip 葉智皓教授, Prof. Tim Leung 梁承裕教授, Prof. Kenward Vong 黃敬皓教授
Editor-in-Chief 主編輯: Prof. Ho Yi Mak 麥晧怡教授
Managing Editor 總編輯: Daniel Lau 劉劭行
Student Editorial Board 學生編委
Editors 編輯: Sonia Choy 蔡蒨珩, Peace Foo 胡適之, Roshni Printer, Charlton Sullivan 蘇柏安, Helen Wong 王思齊
Social Media Editor 社交媒體編輯: Zoey Tsang 曾鈺榆
Graphic Designers 設計師: Ligeia Fu 付一乙, Charley Lam 林曉薏, Evangeline Lei 雷雨晴, Coby Ngai 魏敏儀

Contents
Science Focus Issue 025, 2023
What’s Happening in Hong Kong? 香港科技活動
  InnoCarnival 2023 創新科技嘉年華 2023
  The Shaw Prize 2023 Exhibition 2023 邵逸夫獎展覽
  Mars 1001 火星千日行
Science Today 今日科學
  The Procrastinator’s Guide to ChatGPT 遲來的 ChatGPT 懶人包
  Unleashing Nature’s Plastic-Eating Marvels! 出來吧!自然界蠶吃塑膠的生物
  Thinking Out of the Box: Removing Medical Devices with a Liquid Metal 有何不可:利用液態金屬移除醫療裝置
  Alpacas and Nanobodies 羊駝與奈米抗體
Amusing World of Science 趣味科學
  Fingerprints: The Key to Our Individuality 指紋:判別個人身分的特徵
  Demystifying Wordle: A Crash Course in Information Theory Wordle 大揭秘:資訊理論 101

What’s Happening in Hong Kong? 香港科技活動
Fun in Fall Science Activities 秋日科學好節目
Any plans for this Fall? Check out these activities! 計劃好這個秋天的好去處了嗎?不妨考慮以下活動!

InnoCarnival 2023 創新科技嘉年華 2023
Under this year’s theme of “Go Smart! Go Tech! Go Green!”, this event, organized by the Innovation and Technology Commission, aims to promote a culture of innovation and technology in the community. The public can take part in a wide range of activities, including workshops, interactive games and online seminars, designed to boost creativity. Thirty-seven program partners, including universities, research and development centers, government departments and other organizations, will showcase their research achievements at the Science Park.
Period: October 28, 2023 (Sat) – November 5, 2023 (Sun)
Time: 10:00 AM – 6:00 PM (Sat & Sun); 10:00 AM – 5:00 PM (Mon to Fri)
Venue: Hong Kong Science Park
Admission fee: Free of charge
Remark: Quotas for pre-registered activities are limited and allocated on a first-come, first-served basis.

The Shaw Prize 2023 Exhibition 2023邵逸夫獎展覽
Established in 2002, the Shaw Prize recognizes currently active scientists who have recently made significant breakthroughs in scientific research. Three prizes are awarded annually: in Mathematical Sciences, Life Science and Medicine, and Astronomy. In the exhibition, you can learn more about this year’s Shaw Laureates and their scientific achievements.
Period: November 10, 2023 – January 10, 2024
Venue: Main Lobby, Hong Kong Science Museum
Admission fee: Free of charge

Mars 1001 火星千日行
This sci-fi movie depicts the journey of astronauts from different countries as they embark on the first crewed mission to Mars. It highlights the challenges of the 1001-day mission to uncover the mysteries of the Red Planet. While it has been over half a century since Neil Armstrong became the first person to set foot on the Moon, landing on Mars is a far more complex goal that requires more advanced technology and poses greater risks. The movie portrays the exhilaration of finally achieving the long-held dream of visiting another planet in person, and the international cooperation that would be needed to turn fiction into reality.
Period: Now – January 31, 2024
Time: 3:30 PM and 8:00 PM (Mon, Wed to Fri); 2:00 PM and 6:30 PM (Sat, Sun and public holidays)
Venue: Space Theatre, Hong Kong Space Museum
Admission fee: Standard admission: $32 (stalls), $24 (front stalls); Concession admission: $16 (stalls), $12 (front stalls)
Remark: Please refer to the museum’s website for more details.

創新科技嘉年華 2023
這個由創新科技署舉辦的嘉年華今年以「智慧生活綠色科技」為題,旨在推廣社區內的創科文化。公眾可以從一系列的工作坊、互動遊戲,以及網上講座體驗創新科技的樂趣,激發創意。三十七個活動夥伴,包括大學、研發中心、政府部門和多個機構都會在科學園展示其研究成果。
展期:2023年10月28日(六)至2023年11月5日(日)
時間:上午十時至下午六時(六、日);上午十時至下午五時(一至五)
地點:香港科學園
入場費:免費
備註:需預約的活動名額有限,先到先得。

2023邵逸夫獎展覽
邵逸夫獎於2002年設立,設有三個獎項:數學科學獎、生命科學與醫學獎和天文學獎,每年頒發予現時活躍於研究工作並在近期取得突破性成果的科學家。透過今次展覽,你可以更深入認識今年的得獎者以及他們的科研成就。
時間:2023年11月10日至2024年1月10日
地點:香港科學館大堂
入場費:免費

火星千日行
這套科幻電影敍述來自不同國家的太空人攜手完成人類首次登陸火星的旅程,為了揭示這顆赤紅行星的神秘面紗,團隊在任務的 1001 天裡遇上重重挑戰。雖然太空人岩士唐早已在超過半個世紀前成功登月,但登陸火星是一個需要更先進科技和更具風險的艱鉅任務。在希望這會在現實世界成真的同時,這套電影預示了人類達成到訪其他行星,以及實現國際間緊密合作夙願的喜悅。
展期:現在至2024年1月31日
時間:下午三時半及八時(一、三至五);下午二時及六時半(六、日及公眾假期)
地點:香港太空館天象廳
入場費:標準票:32 元(後座);24 元(前座);優惠票:16 元(後座);12 元(前座)
備註:更多詳情請參閱太空館網頁。

The Procrastinator's Guide to ChatGPT
By Sonia Choy 蔡蒨珩

True story: one of my friends wrote his thesis with the help of Clyde, Discord’s AI server bot, which is powered by technology from OpenAI, the company that created ChatGPT (Chat Generative Pre-Trained Transformer). He had trouble writing parts of the paper, so he asked Clyde to rewrite his clunky sections, and boom, it was done. My friends and I also often turned to ChatGPT when drawing graphs and diagrams in an unfamiliar programming language. ChatGPT would churn out code that was about 90% correct in ten seconds, saving us a huge amount of time, as we only needed to make slight modifications. ChatGPT has truly transformed our lives and the world of education.

But ChatGPT isn’t foolproof yet. That same friend once asked an early version of ChatGPT a simple question: What is 20 – 16? After a few seconds, it gave us the answer “3.” We laughed about it for a few minutes. People have also posted ChatGPT’s responses to various questions that look legit but turn out to be a pile of nonsense. ChatGPT can write complicated code, yet it seems unable to do simple things like subtraction or figuring out that the sun rises in the east. Why is that the case?

Machine Learning 101

First we need to answer the question: how does ChatGPT learn things? Artificial intelligence (AI) systems are often modeled on the neural networks of the human brain [1, 2]. A neural network is typically divided into three main kinds of layers: the input, hidden and output layers. The input and output layers do what their names suggest, but the hidden layers are the key to the model, and there can be many of them. Each layer contains nodes (neurons), which are linked to nodes in other layers, and sometimes to other nodes in the same layer (Figure 1). Each neuron evaluates a numerical function, and its output feeds into the neurons it is connected to. Together, these functions act as the network’s thinking process, working towards its goal by evaluating certain criteria.
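To make the layer picture above concrete, here is a minimal sketch of a “forward pass” through a tiny network with an input layer, one hidden layer and an output layer. It is an illustration only: the network size, the made-up weights, the sigmoid function used by each neuron and the helper names (`sigmoid`, `layer`, `forward`, `toy_network`) are all hypothetical choices, not how GPT itself is built (GPT uses a much larger transformer architecture with billions of learned weights).

```python
import math


def sigmoid(z: float) -> float:
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def layer(inputs, weights, biases):
    """One fully connected layer.

    Each neuron multiplies every input by its own weight, adds a bias,
    and passes the total through the sigmoid function.
    """
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]


def forward(inputs, network):
    """Feed the inputs through each layer in turn: input -> hidden -> output."""
    activations = inputs
    for weights, biases in network:
        activations = layer(activations, weights, biases)
    return activations


# Hypothetical toy network: 2 inputs -> 3 hidden neurons -> 1 output neuron.
# These weights are made up; in a real model they are learned from examples.
toy_network = [
    ([[0.5, -0.2], [0.1, 0.8], [-0.7, 0.3]], [0.0, 0.1, -0.1]),  # hidden layer
    ([[1.2, -0.4, 0.6]], [0.05]),                                # output layer
]

print(forward([1.0, 0.5], toy_network))  # a one-element list, value between 0 and 1
```

In this picture, “training” means repeatedly nudging the numbers in `toy_network` so that the output moves closer to the desired answer for each example.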
For example, if the goal for the AI is to identify pictures of cats, then each layer will evaluate some sort of similarity to existing pictures of cats. By learning bit by bit from examples, the AI works out what output each layer should produce and adjusts itself accordingly, until it is finally able to identify pictures of cats.

Figure 1 The main layers of a neural net, with circles as nodes. 圖一 神經網絡的主要分層。圓圈代表節點。

AI models are typically trained by either deep learning or machine learning. While these terms are sometimes used interchangeably (strictly speaking, deep learning is a branch of machine learning), they are often distinguished as follows: in deep learning, the AI is programmed to learn from unfiltered, unstructured information largely on its own, while in other forms of machine learning more human input is required for the model to learn and absorb information, e.g. telling the AI what it is learning, as well as other fine-tuning of the model.

According to OpenAI, ChatGPT was trained using reinforcement learning from human feedback [3]. It relies on human-supervised fine-tuning and does not adjust itself independently when learning new material, perhaps because of the complicated nature of human language. The details of how the model was trained and how it works are kept under wraps, perhaps for fear that other companies could use them to surpass GPT’s capabilities. OpenAI has only revealed that GPT-3 was trained on a filtered web crawl (Footnote 1), English-language Wikipedia, and three collections of written and online published texts, not fully disclosed, which they referred to as WebText2, Books1 and Books2 [4]. It is speculated that the undisclosed components include online book collections like LibGen, as well as internet forums and other informal sources.

Generating the Probabilities (Large Language Model)

If you have used autocorrect or word prediction on your phone, you might have some idea of the chaos that can ensue. The current autocorrect chain on my phone, starting with the word “I”, goes like
this: “I have to go to the university of the Pacific ocean and I will be there in about to go to bed now.” It sounds legit at first, but it quickly descends into gibberish (there is no University of the Pacific Ocean, for example). That is because autocorrect only picks up on patterns in language without comprehending their actual meaning: it won’t know that “Colorless green ideas sleep furiously” is complete nonsense (Footnote 2) [5].

ChatGPT is more intelligent than autocorrect. First of all, it generates a list of probabilities for possible next words. Let’s use the simpler GPT-2 system for demonstration: after the clause “The best thing about AI is its ability to…”, GPT-2 generates the list of words in Table 2 [1].

Table 2 A list of probabilities of possible next words generated by GPT-2 after the clause “The best thing about AI is its ability to…” [1] 表二 由GPT-2預測「人工智能最棒的地方在於它能夠……」之後下一個可能詞語出現的概率列表 [1]。
  learn (學習)          4.5%
  predict (預測)        3.5%
  make (製作)           3.2%
  understand (理解)     3.1%
  do (做)               2.9%

How are these probabilities generated? First of all, it is not possible to simply infer them from existing writing, as there is not nearly enough text (let alone text that can be used under copyright) for models to train on. Instead, we use a little bit of mathematics to help us out.

GPT is part of the large language model (LLM) family. The main working principle behind LLMs will sound familiar to mathematicians: approximation (or, more accurately, mathematical modeling). Given the series of points in Figure 3 [1], how would you draw a graph through them? The easiest option seems to be a straight line, but we can do better with a quadratic function, y = ax² + bx + c. So we can say that ax² + bx + c is a good model of these points, and start making predictions with it.

Figure 3 Fitting a straight line (top) and a quadratic curve (bottom) to a series of given points [1]. 圖三 嘗試用直線方程(上)及二次方程(下)解釋橙色點的分佈 [1]。

As mentioned before, the amount of text we have is far from adequate for us to empirically calculate the probability of each next word, because 40,000 common English words can already form about 1.6 billion ordered two-word combinations (P(40,000, 2) = 40,000 × 39,999 ≈ 1.6 × 10⁹) [1]. GPT works essentially because it can make an informed guess: it chooses a good-enough “curve” to fit the data, covering the missing parts with theoretical values. To date, we don’t really understand how the computer does this, just as we don’t really know how our brain performs “intuitive” tasks. What we do know is that the developers of GPT can adjust the “weight” of each neuron’s output in the network after each learning step to achieve the best results. In other words, we train the neural net to find the “best-fit curve” via machine learning. Eventually, the goal is for GPT to predict what comes after a phrase in an
unfinished sentence, so that it acquires the ability to write independently.

Creativity in AI

“GPT, surprisingly, writes like a human. It can generate text that reads as if it was written by a person, with a similar style, syntax, and vocabulary. The way it does this is by learning from a huge amount of text, such as books, articles, and websites, which helps it understand how language is used in different contexts…”

The previous paragraph was written by Sage, a chatbot powered by GPT-3.5. It reads just like human writing; you might not have noticed it was written by an AI if I hadn’t told you. How does it do that? Well, as GPT describes itself, it is trained on a vast amount of text, from which it builds a large language model; after writing each phrase, it evaluates which words are statistically most likely to come next.

You might think that GPT always picks the most likely word, but this is not the case. Creativity is found in the unexpected. If you choose a higher “creativity index” (technically called the “temperature”), GPT will more often pick less likely options to continue its writing. This makes the overall piece more interesting and less robotic. For example, if GPT picks the statistically most likely word every time (zero temperature), we get the following paragraph from the older GPT-2 system [1]:

“The best thing about AI is its ability to learn from experience. It’s not just a matter of learning from experience, it’s learning from the world around you. The AI is a very good example of this. It’s a very good example of how to use AI to improve your life. It’s a very good example of how to use AI to improve your life. The AI is a very good example of how to use AI to improve your life. It’s a very good example of…”

It eventually falls into a loop. Even if this doesn’t happen in GPT-3, the resulting paragraph isn’t that interesting. However, if we increase the temperature to 0.8 in GPT-3, we get this [1]:

“The best thing about AI is its ability to learn and develop over time, allowing it to continually improve its performance and be more efficient at tasks. AI can also be used to automate mundane tasks, allowing humans to focus on more important tasks. AI can also be used to make decisions and provide insights that would otherwise be impossible for humans to find out.”

Now this reads more like human writing. The temperature of 0.8 is somewhat arbitrary, but it seems to work best at the moment (although this also depends on how much creativity your task requires). Much of the machine learning process is still not well understood by humans, just as the true workings of the human brain remain a mystery. How do humans learn their native language? What do the hidden layers in our brains do to produce human-like text? We don’t know the answers to either of these questions yet.

Falsehoods, Biases and Accountability

One problem with GPT is that it sometimes comes up with blatantly false statements and shows inherent biases towards certain social groups. We’ve already seen how it can confidently announce that 20 – 16 = 3. A previous version of GPT-3 claimed that coughing can stop a heart attack and that the U.S. government caused 9/11, and GPT has even made up references that don’t exist [6, 7]. Why does this happen? Once again, GPT is only an LLM: it knows how language works, but doesn’t necessarily understand its meaning. Early LLMs had only syntactic knowledge and very limited comprehension skills.

However, this is starting to change. At the time of writing, ChatGPT had recently been connected to WolframAlpha [8], a mathematical software and database, as well as to other online databases, so that it can draw on them for more accurate information rather than giving responses generated entirely by probability.

In some sense, training GPT or any other model is like teaching a toddler: children come into the world not knowing what is right and wrong, and it is up to their parents, teachers and society to teach them. Here, the programmers are GPT’s parents: they feed huge amounts of learning material into the system and supervise its learning by providing reference answers and feedback.

It is possible to give GPT enough information to force it to tell the truth when you ask factual questions. However, when it comes to opinions, there is a human-imposed block. If you ask ChatGPT how it feels about large birds, for example, it replies with an automatic message: “As an AI language model, I don't have personal opinions or feelings. However, I can provide you with some information about large birds.”

But we could theoretically ask GPT to write an opinion piece, and we can predict what it would write by studying the words it associates with certain topics. Researchers analyzed the top ten descriptive words that co-occurred with words related to gender and religion in raw outputs generated by GPT-3. They observed that “naughty” and “sucked” were correlated with female pronouns, and that Islam was commonly placed near “terrorism”, while atheism was placed near “cool” and “mad” [4]. Why does GPT hold such biases, then? Remember that GPT is trained on a selected sample of text. Most of it comes from published texts and web crawls, but to help it grasp informal language, GPT is also thought to have been trained on internet forums such as Reddit. As such, it may end up internalizing biases held by many users of these forums. Just as a person may hold prejudiced views, GPT cannot be expected to be completely neutral on all topics.

GPT-4 is already far more capable than humans at certain jobs; however, it cannot be trusted to be a completely neutral source, nor can it be trusted to give 100% accurate information. It must still be used with discretion. The best approach is probably to treat it like a person: take everything with a grain of salt.

1 Web crawl: A snapshot of the content of millions of web pages, captured regularly by web crawlers. The downloaded content can serve as a dataset for web indexing by search engines and for AI training.
2 Editor’s note: This is a famous example suggested by linguist Noam Chomsky to illustrate that a sentence can be grammatically well-formed but semantically nonsensical.
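The “temperature” setting described in Creativity in AI can be illustrated with a few lines of Python. This is a minimal sketch, assuming the common formulation in which each probability p is raised to the power 1/T and the results are rescaled to sum to 1 (equivalent to dividing the model’s log-probabilities by T). The function names are hypothetical, and only the five words from Table 2 are used, whereas a real model samples from a vocabulary of tens of thousands of tokens.

```python
import random

# Next-word probabilities after "The best thing about AI is its ability to…"
# (GPT-2, Table 2). Only the top five candidates are kept in this toy example.
CANDIDATES = {"learn": 0.045, "predict": 0.035, "make": 0.032,
              "understand": 0.031, "do": 0.029}


def apply_temperature(probs, temperature):
    """Reshape a probability distribution with temperature T.

    Each probability p becomes p ** (1 / T), and the results are rescaled to
    sum to 1. T < 1 sharpens the distribution (the top word dominates);
    T > 1 flattens it (less likely words get more of a chance).
    T == 0 is treated as "always pick the single most likely word".
    """
    if temperature == 0:
        best = max(probs, key=probs.get)
        return {word: (1.0 if word == best else 0.0) for word in probs}
    scaled = {word: p ** (1.0 / temperature) for word, p in probs.items()}
    total = sum(scaled.values())
    return {word: s / total for word, s in scaled.items()}


def sample_next_word(probs, temperature, rng=random):
    """Pick one word at random, weighted by the temperature-adjusted probabilities."""
    adjusted = apply_temperature(probs, temperature)
    words = list(adjusted)
    return rng.choices(words, weights=[adjusted[w] for w in words], k=1)[0]


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the run is repeatable
    for t in (0, 0.8, 2.0):
        picks = [sample_next_word(CANDIDATES, t, rng) for _ in range(10)]
        print(f"T = {t}: {picks}")
```

At T = 0 every pick is “learn”, which mirrors the repetitive zero-temperature GPT-2 paragraph quoted earlier; as T rises, the other words turn up more often.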
一個真實故事:筆者有位朋友在Discord的聊天機器人Clyde幫助之下完成了自己的畢業論文。他在寫某些部分時遇到阻滯,好像怎麼寫也是不太通順,於是他請 Clyde 重寫笨拙的部分,結果在一下子就完成了。Clyde 是 Discord的人工智能(artificial intelligence / AI)伺服器機器人,由發明ChatGPT(Chat Generative Pre-Trained Transformer;聊天生成預訓練轉換器)的公司 OpenAI 提供技術支援。ChatGPT 確實改變了我們生活,以至教育界的景象:朋友和我在使用不熟悉的電腦語言繪製圖表時也經常使用 ChatGPT,因為它在十秒內就能編寫出 90% 正確的程式碼,我們只需稍作修改即可,為我們節省了大量時間。

但我們操作ChatGPT 時還是不能把腦袋扔掉。那位朋友曾經問過舊版ChatGPT一條簡單問題:20 – 16是多少?數秒後,ChatGPT 回答:「3」,這使我們捧腹大笑了好幾分鐘。網民尤其喜歡抓出ChatGPT的痛腳,在網上分享它種種似是而非的荒謬論述;到底為甚麼它能寫出複雜的電腦程式,但似乎不能回答簡單的減法題目或者指出太陽從東邊升起的事實呢?

機器學習101

首先我們要解答一條問題:ChatGPT怎樣學習知識?人工智能多數都是模擬人腦的神經網絡 [1, 2]。神經網絡主要可以分為三層:輸入層、隱藏層和輸出層。輸入層及輸出層的意思不用多說,但隱藏層才是精髓所在,而一個網絡可以包含多個隱藏層。此外,以上每一層都均有節點(nodes)連接不同層或是同一類別的其他層(圖一)。

每層神經元都會計算一個函數,輸出值將影響相連的神經元。這些函數正如思考過程,會透過考慮一系列相關因素來達成目標,譬如說:如果 AI 的任務是辨認貓的照片,那麼每一層就會比對相片與現有貓照片在某個方面的相似度。透過一步步從現有例子中學習,AI會知道每一層應該要做到怎樣的輸出而作出自我調整,使它最終能辨認貓的照片。

AI模型通常透過深度學習或機器學習進行訓練,雖然許多人會交替使用這兩個詞語,但其實它們有著細微分別:在深度學習中,AI 的程式設定使它自行學習未經過濾、缺乏結構的資訊;而在機器學習中,模型需要更多人類指示來學習和吸收資訊,例如告訴AI它正在學習甚麼,以及對模型作出其他微調。

根據 OpenAI 的說法,ChatGPT 是一種機器(或強化)學習模型 [3]。可能是基於人類語言的複雜性,ChatGPT在人類監督下才會作出微調,而不會在學習新材料的過程中自我調整。也許是擔心其他公司會製造出超越 GPT能力的模型,OpenAI 對訓練方法和原理的細節三緘其口,只透露GPT-3在訓練過程中使用了經過過濾的網絡抓取(web crawl)(註一)、英語版維基百科,以及三組他們稱之為「網路文本二」(WebText2)、「書籍一」(Books1)和「書籍二」(Books2)的線上文庫 [4]。據推測,這些未公開的部分包括 LibGen 等線上圖書館,以及互聯網論壇和其他非正式來源。

概率之學:大型語言模型

如果你有在手機使用自動修正、預測字詞等功能的經驗,你應該會對隨之而來的混亂有所了解。筆者此時手機以「我(I)」開首的自動修正字串是這樣的:「我得去太平洋大學,而我會到那裡馬上就要睡覺了。」(“I have to go to the university of the Pacific ocean and I will be there in about to go to bed now.”)句子乍聽之下尚算正常,但很快就會發現那只是胡言亂語(例如世界上根本就沒有太平洋大學),因為自動修正功能只懂得選擇語言中常見的組合,但不能理解其實際含義:它不會知道「沒有顏色的綠色想法激烈地睡覺」(“Colorless green ideas sleep furiously”;註二)是完全沒有意義的廢話 [5]。

當然,ChatGPT比自動修正聰明得多。首先,ChatGPT 會列出下一個可能詞語出現的機率。讓我們拿比較簡單的 GPT-2 作為示範:對於「人工智能最棒的地方在於它能夠……」(The best thing about AI is its ability to…),GPT列出的候選字詞可見於表二 [1]。

我們是如何得出這些概率?首先,我們不可能僅僅從現有的文本推斷出這些概率,因為在考慮版權問題後我們遠遠沒有足夠的文本訓練模型。相反,我們需要運用少許數學來幫助我們。

GPT是大型語言模型(Large Language Model,簡稱 LLM)家族的一分子。LLM 背後的主要原理對讀過數學的大家並不陌生:近似法(approximation)(更準確地說是建立數學模型)。對於圖三裡一系列的點 [1],你會畫一條怎樣的線?最簡單的選擇似乎是直線,但其實二次方程 ax² + bx + c 會更為適合。因此我們可以說 ax² + bx + c 對於橙色點的分佈來說是一個足夠好的模型。有了模型,我們就可以作出估計及預測。

如前面所述,人類撰寫的書籍數量遠遠不足以讓我們統計出下一個單詞出現的實質概率,因為 40,000 個常用英語單詞已經可以提供約 16 億(40,000 × 39,999)個組合 [1]。GPT模型的成功之道在於它能作出一個合理的猜測來選一條足夠好的「線」來總結「點」的分佈,用理論值覆蓋現實文本鞭長莫及的部分。迄今為止,我們不太了解電腦如何做到這一點,就像我們並不真正了解大腦是如何憑直覺完成簡單事情一樣,但我們只知道GPT 的開發者可以在每次學習過程中調整網絡裡每個神經元輸出值的比重,以得出最佳結果。換句話說,我們藉由機器學習訓練神經網絡,以找出最合適總結資料點的「曲線」。

最終,我們的目標是讓GPT預測緊隨在未完成句子後的單詞,從而使它得到自主寫作的能力。

無窮創意

「令人驚訝的是,GPT可以像人一樣寫作。它能生成讀起來像人寫的文本,具有相似的風格、語法和詞彙。它做到這一點的方法是從大量文本(如書籍、文章和網站)中學習,這有助於它理解語言在不同語境中是如何使用的……」

以上一段文字是由 GPT-3.5 聊天機器人 Sage 寫的,但讀起來就像人寫的一樣,如果我不告訴你,你大抵也不會注意到。但它是怎麽做到的?正如 GPT自述的那樣,它是以大量文本訓練出來的一個 LLM,在寫完每個短語後,它會評估從統計學角度看來最有可能出現的單詞是甚麽。

你可能會認為GPT每次都會選擇表上最有可能出現的單詞,但事實並非如此,創意往往在於出其不意之處。如果你選擇較高的「創意指數」(技術上叫「溫度」(temperature)),GPT 就會挑選其他可能性較低的選項來續寫句子,這可使成品更為有趣而不那麼生硬。

又舉另一個例子,如果 GPT 每次都選擇統計學上最有可能出現的單詞(即設溫度為零),那麼在舊版 GPT-2 系統中,我們將會得到以下這段文字 [1]:

「人工智能的最大優點就是從經驗中學習的能力。這不僅僅是從經驗中學習,而是從周圍的世界中學習。人工智能就是一個很好的例子。它是如何利用人工智能改善生活的一個很好的例子。這是一個如何利用人工智能改善生活的很好的例子。人工智能是如何利用人工智能改善生活的一個很好的例子。這是一個非常好的例子……」

它最終陷入無限循環。即使在GPT-3中沒有發生這種情況,但得出的段落也並不見得有趣。然而,如果我們在GPT-3將溫度提高到 0.8,就會得到以下一段 [1]:

「人工智能的最大優勢在於它能夠隨著時間的推移不斷學習和發展,從而不斷提高性能和工作效率。人工智能還可用於將瑣碎的任務自動化,讓人類專注於更重要的任務。人工智能還可用於決策,並提供人類無法發現的洞察力。」

這段看來更像是人類寫的文章。溫度 0.8 其實是一個任意值,只是目前看來效果最好(這也取決於你指派的寫作任務需要多少創意)。人類對機器學習的過程並不十分了解,就像人類大腦的思考過程仍然是個謎一樣:人類是怎樣學習母語的呢?我們大腦中的隱藏層又如何想出有人性的文句呢?我們還未知道這些問題的答案。

真確性、偏見與可靠程度

GPT的一大問題是它有時會提出一些明顯是錯誤的說法,而且對某些社會群體帶有偏見。我們已經看過它怎樣自信地宣稱20 – 16 = 3,而在GPT-3其中一個舊版本中,它曾聲稱咳嗽能阻止心臟病發作,美國政府是911事件的始作俑者,甚至編造一些不存在的參考書目 [6, 7]。為甚麼會出現這種情況呢?要記住的是,GPT 只是一個LLM,也就是說它知道語言的文法,但不一定理解語義。早期的 LLM甚至只有句法知識,而理解能力極度有限。

不過,這種劣勢正開始被扭轉。在撰寫本文時,ChatGPT 已經可以連接數學軟件及數據庫WolframAlpha [8]以及其他線上數據庫,讓 GPT取得更準確的資訊,從而透過即時存取數據庫的資訊來提高其答案的準確性,而不再是給出完全由概率斷定的答案。

某程度上訓練GPT 或任何AI模型就像教導蹣跚學步的嬰孩,他們來到這個世界並不知道甚麼是善惡對錯,因此需要父母、老師和社會來教他們正確的行為。編程員扮演著父母的角色,向系統輸入大量學習材料,並透過提供參考答案和反饋監督系統學習。

我們可以透過告訴 GPT足夠多的資訊,迫使它在被問及與事實相關的問題時說出真相,但觀點往往涉及人類的主觀看法。如果你問 ChatGPT 它對大型鳥類有甚麼感覺,它會自動回覆一條系統訊息:「作為一個人工智能語言模型,我沒有個人觀點或感受。不過,我可以為你提供一些關於大型鳥類的資訊。」(“As an AI language model, I don't have personal opinions or feelings. However, I can provide you with some information about large birds.”)

理論上我們可以叫GPT 撰寫一篇評論文章:透過研究 GPT 就某些主題提出的相關詞彙,我們就可以預測它將會寫出一篇怎樣的文章。研究人員分析了在GPT-3 輸出的原始答案中與性別和宗教相關詞彙同時出現的頭十個描述性詞彙,他們觀察到「調皮(naughty)」或「糟糕(sucked)」與女性代名詞有關聯,「伊斯蘭教(Islam)」通常被置於「恐怖主義(terrorism)」附近,而「無神論(atheism)」則會與「酷(cool)」和「瘋狂(mad)」一起出現 [4]。GPT為甚麼會有這樣的偏見呢?請記住GPT是在選定的文本上進行訓練,儘管大部分文本來自公開發表的文章和網路抓取,但為了讓它掌握非正式用語,人們推測GPT 的訓練文本也包括 Reddit 等等的互聯網論壇,因此它可能內化了這些論壇裡許多用戶所持的偏見。就像一個人很難做到不偏不倚一樣,我們不能指望 GPT 對所有話題都保持絕對中立。

GPT-4 在某些工作上的能力已經遠遠超越人類,但我們仍然不能把它視為一個完全中立的消息來源,也不應相信它能提供 100% 準確的資訊,使用它時仍須保持謹慎,最好就是像對待人一樣對待它,要記住:耳聽三分假,眼看未為真。

(中文版由筆者及 AI 翻譯器 DeepL合著寫成,有些部分全由DeepL翻譯。讀者們,你們能分清誰寫了哪一段嗎?)

1 網路抓取:由一些網路爬蟲(web crawler;網路機器人的一種)定期抓取上百萬個網站所得的網路快照,記錄了大量網站當刻的內容。這些下載內容可用於製作搜索引擎的互聯網索引和訓練 AI。
2 編按:這句是由語言學家Noam Chomsky 提出的著名例子,指出一句句子可以是文法上正確,但語義上完全沒有意義。

(答案:「無窮創意」部分的三段長引文和關於大型鳥類的自動回覆全由 DeepL 翻譯。)

References 參考資料:
[1] Wolfram, S. (2023, February 14). What is ChatGPT doing...and why does it work? Stephen Wolfram Writings.
[2] IBM. (2023). What is Artificial Intelligence (AI)? https://
[3] OpenAI. (2022, November 30). Introducing ChatGPT.
[4] Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D. M., Wu, J., Winter, C., . . . Amodei, D. (2020). Language models are few-shot learners. arXiv.
[5] Ruby, M. (2023, January 31). How ChatGPT works: The models behind the bot. Medium. https://
[6] University of Waterloo. (2023, June 19). ChatGPT and Generative Artificial Intelligence (AI): False and outdated information. generative_ai/falseoutdatedinfo
[7] Lin, S., Hilton, J., & Evans, O. (2022). TruthfulQA: Measuring How Models Mimic Human Falsehoods. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, 1, 3214–3252. https://doi.org/10.18653/v1/2022.acl-long.229
[8] Wolfram, S. (2023, March 23). ChatGPT Gets Its “Wolfram Superpowers”! Stephen Wolfram Writings. https://writings.

Unleashing Nature’s Plastic-Eating Marvels!
By Roshni Printer

Plastic waste management has been an ongoing global challenge, demanding both urgent attention and innovative solutions. Imagine, for a moment, that nature itself had already provided us with such solutions! In the fight against plastic pollution, scientists have discovered the remarkable ability of wax worms and bacteria to degrade plastic waste naturally [1]. Let’s delve into how these creatures digest plastic, the biology behind it, and the future of these discoveries.

Digestion of Polyethylene (PE) by Wax Worms

In 2017, Federica Bertocchini, a researcher in Spain, and her colleagues published a ground-breaking paper describing the degradation of polyethylene (PE) plastic by wax worms, the caterpillars of the greater wax moth Galleria mellonella [2]. As an amateur beekeeper, Bertocchini noticed that wax worms, common pests that live in beehives and feed on beeswax, seemed to be able to chew through the plastic bags she used to collect and dispose of them. Intrigued by this observation, she decided to conduct a more systematic study to explore the potential of wax worms as a solution for plastic waste degradation.

To make sure that the observation was not just due to the physical chewing of the worms, further experiments were conducted in which saliva extracted from the worms was spread on a PE film. The results showed a significant loss of mass of the PE within a few hours, comparable to the weathering effect of exposing the plastic to the environment for months or years [1]. This raised one major question: how could wax worm saliva break the strong carbon-carbon bonds in plastic? The mechanism, which is still under investigation, appears to involve enzymatic reactions [1]. PE is a plastic polymer, essentially a long-chain hydrocarbon (Figure 1). To initiate the degradation of PE, oxygen needs to be introduced into the polymer chain to form carbonyl groups (C=O).
Typically, abiotic factors like light or temperature are responsible for this crucial first step, which is regarded as the bottleneck of the whole process. However, this step can be accelerated by the two
oxidases identified in wax worm saliva. In addition, the gut microbiome of wax worms also appears to be involved in the digestion of PE, with bacteria of the genus Acinetobacter suggested to be the major contributors [3].

Degradation of Polyethylene Terephthalate (PET) by Bacteria

Around the time of Bertocchini’s discovery, a group of scientists in Japan discovered that bacteria could degrade a different type of plastic [4]. The bacterium, named Ideonella sakaiensis, was able to degrade polyethylene terephthalate (PET), the main component of plastic bottles. It produces two digestive enzymes, known as PETase (or PET hydrolase) and MHETase, to dismantle the polymer (Figure 2). The former acts on the ester bonds in PET, breaking the polymer down mainly into mono(2-hydroxyethyl) terephthalic acid (MHET), a unit made of one terephthalic acid joined to one ethylene glycol; the latter further breaks down MHET into terephthalic acid (TPA) and ethylene glycol (EG). Further metabolic steps allow the bacterium to use these compounds as sources of energy and carbon.

To have any real impact on plastic waste, the stability and efficiency of these enzymes need to be greatly enhanced, and this is precisely what scientist Hal Alper has been working on [5]. Using artificial intelligence, his team searched enzyme data for an optimal combination of mutations that would speed up the degradation of PET. When five mutations were introduced into the wild-type PETase, the resulting enzyme, FAST-PETase, could almost completely degrade untreated, post-consumer PET within one week, and worked between 30 °C and 50 °C and across a range of pH levels. Furthermore, scientists have successfully combined PETase and MHETase by physically connecting the two enzymes with a linker peptide, creating a “superenzyme” that degrades PET six times faster than PETase alone [6, 7].
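The two-step breakdown of PET can be turned into a simple mass balance. The sketch below is an idealised calculation that does not come from the article: it assumes every PET repeat unit (C10H8O4) is fully hydrolysed, first to MHET (C10H10O5) by PETase and then to TPA (C8H6O4) and EG (C2H6O2) by MHETase, and it uses standard atomic weights. Real enzymes do not reach 100% conversion, and chain ends are ignored. The function and constant names are hypothetical.

```python
# Standard atomic weights (g/mol), rounded to three decimals.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999}

# Molecular formulas as element counts.
PET_UNIT = {"C": 10, "H": 8, "O": 4}   # one repeat unit of the PET chain
WATER = {"H": 2, "O": 1}
MHET = {"C": 10, "H": 10, "O": 5}      # mono(2-hydroxyethyl) terephthalic acid
TPA = {"C": 8, "H": 6, "O": 4}         # terephthalic acid
EG = {"C": 2, "H": 6, "O": 2}          # ethylene glycol


def molar_mass(formula):
    """Molar mass (g/mol) of a formula given as {element: count}."""
    return sum(ATOMIC_MASS[element] * count for element, count in formula.items())


def hydrolyse_pet(grams_pet):
    """Idealised two-step enzymatic hydrolysis of PET (100% conversion assumed).

    Step 1 (PETase):  PET repeat unit + H2O -> MHET
    Step 2 (MHETase): MHET + H2O -> TPA + EG
    Returns grams of MHET formed in step 1, grams of TPA and EG after step 2,
    and the grams of water consumed over both steps.
    """
    moles = grams_pet / molar_mass(PET_UNIT)   # moles of repeat units
    return {
        "MHET": moles * molar_mass(MHET),
        "TPA": moles * molar_mass(TPA),
        "EG": moles * molar_mass(EG),
        "water consumed": 2 * moles * molar_mass(WATER),
    }


for name, grams in hydrolyse_pet(1000).items():
    print(f"{name}: {grams:.0f} g")
```

The TPA and EG together weigh more than the starting plastic because each repeat unit takes up two water molecules along the way, which is a handy sanity check on the chemistry.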
Engineered enzymes such as FAST-PETase and the PETase-MHETase superenzyme hold immense potential for accelerating the decomposition of PET, taking us one step closer to solving the real-world problem of plastic waste management.

More interestingly, a team of scientists at the University of Edinburgh made another “delicious” breakthrough [8]. They found an enzymatic pathway to convert post-consumer PET waste into vanillin, the main component of vanilla flavoring! Once the PET plastic was broken down into TPA and EG, genetically engineered Escherichia coli bacteria expressing five different enzymes were added to the degradation products, converting TPA into vanillin step by step at a conversion rate of up to 79%. This biosynthetic pathway offers a way to upcycle plastic waste into a product of higher value.

The race to harness enzymes for plastic degradation is well underway, and could open up the possibility of a cleaner future. With the help of nature, we are one step closer to turning the tables on plastic pollution.

Figure 1 Chemical structure of PE.
Figure 2 Degradation pathway of PET.

處理塑膠廢料一直都是全球面對的挑戰,亟需各方關注和創新的解決方案。然而試想想,如果大自然早已給予我們解決方案呢?在與塑膠污染的搏鬥中,人類已經發現蠕蟲和細菌驚人的塑膠分解能力 [1]。讓我們看看這些生物如何分解塑膠,過程背後的原理,以及未來發展的方向。

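Wax worms degrading polyethylene (PE)
In 2017, Spanish scientist Federica Bertocchini and colleagues published a groundbreaking paper describing the ability of the wax worm Galleria mellonella to degrade polyethylene (PE) [2]. An amateur beekeeper, Bertocchini had noticed that this pest, which often infests beehives and feeds on beeswax, seemed able to chew through the plastic bags she used to collect and dispose of the worms. Fascinated by this, she decided to study systematically whether wax worms could be used to degrade plastic waste.

To show that the worms were not simply chewing the bags into pieces, the researchers smeared saliva extracted from wax worms onto PE films. Most of the PE was broken down within just a few hours, a degree of degradation comparable to months or years of environmental weathering [1]. This raises the question: how does wax worm saliva break the relatively strong carbon-carbon bonds in plastic? The mechanism, still under investigation, can be classified as enzymatic catalysis [1]. PE is a plastic polymer that is essentially a long hydrocarbon chain (Figure 1). To start degrading PE, oxygen must be introduced into the polymer chain to form carbonyl groups (C=O). This first step, generally regarded as the bottleneck, is normally driven by abiotic factors such as light and heat, but two oxidases found in wax worm saliva can also catalyze it. The wax worm's gut microbiota also appears to take part in PE degradation, with the genus Acinetobacter thought to be the main bacterial driver of the reaction [3].

Bacteria degrading polyethylene terephthalate (PET)
Around the same time as Bertocchini's breakthrough, scientists in Japan discovered a bacterium that can degrade another type of plastic [4]. Called Ideonella sakaiensis, it breaks down polyethylene terephthalate (PET), the main component of plastic bottles. The bacterium produces two digestive enzymes, PETase and MHETase, to break down the polymer (Figure 2). PETase cleaves the ester bonds in PET, breaking the polymer into mono(2-hydroxyethyl) terephthalic acid (MHET) monomers; MHETase then splits MHET into terephthalic acid (TPA) and ethylene glycol (EG). Further metabolism lets the bacterium use these compounds as sources of energy and carbon.

Figure 1 Chemical structure of PE

However, to break down real-world plastic waste effectively, the stability and efficiency of these enzymes must be greatly improved, and this is exactly the problem scientist Hal Alper set out to solve [5]. His team used artificial intelligence to search enzyme databases for the best combination of mutations to speed up PET degradation. By introducing five mutations into wild-type PETase, they created FAST-PETase, which almost completely broke down untreated post-consumer PET products within a week and could operate between 30 °C and 50 °C and across a range of pH values. In addition, scientists have used a linker peptide to join PETase and MHETase into a single "super enzyme" that degrades PET six times faster than PETase alone [6, 7]. These strategies hold promise for accelerating the degradation of PET waste, bringing us a step closer to solving the real-world problem of plastic waste.

More interestingly, scientists in Edinburgh made another "delicious" breakthrough [8]. They discovered an enzymatic pathway that converts post-consumer PET into vanillin, the main component of vanilla flavoring. After the PET plastic was broken down into TPA and EG, they added genetically engineered Escherichia coli to the degradation products; the bacteria express five different enzymes that convert TPA step by step into vanillin, with a conversion rate of up to 79%. This biosynthetic pathway gives us a way to upcycle plastic waste into a product of higher value.

Figure 2 Degradation pathway of PET

The race to harness enzymes for plastic degradation is heating up and promises a cleaner future. With the help of nature, we are one step closer to turning the tables on plastic pollution.

References
[1] Sanluis-Verdes, A., Colomer-Vidal, P., Rodriguez-Ventura, F., Bello-Villarino, M., Spinola-Amilibia, M., Ruiz-Lopez, E., Illanes-Vicioso, R., Castroviejo, P., Aiese Cigliano, R., Montoya, M., Falabella, P., Pesquera, C., Gonzalez-Legarreta, L., Arias-Palomo, E., Solà, M., Torroba, T., Arias, C. F., & Bertocchini, F. (2022). Wax worm saliva and the enzymes therein are the key to polyethylene degradation by Galleria mellonella. Nature Communications, 13(1), 5568.
[2] Bombelli, P., Howe, C. J., & Bertocchini, F. (2017). Polyethylene bio-degradation by caterpillars of the wax moth Galleria mellonella. Current Biology, 27(8), R292–R293.
[3] Cassone, B. J., Grove, H. C., Elebute, O., Villanueva, S. M. P., & LeMoine, C. M. R. (2020). Role of the intestinal microbiome in low-density polyethylene degradation by caterpillar larvae of the greater wax moth, Galleria mellonella. Proceedings of the Royal Society B: Biological Sciences, 287(1922), 20200112. rspb.2020.0112
[4] Yoshida, S., Hiraga, K., Takehana, T., Taniguchi, I., Yamaji, H., Maeda, Y., Toyohara, K., Miyamoto, K., Kimura, Y., & Oda, K. (2016). A bacterium that degrades and assimilates poly(ethylene terephthalate). Science, 351(6278), 1196–1199. aad6359
[5] Lu, H., Diaz, D. J., Czarnecki, N. J., Zhu, C., Kim, W., Shroff, R., Acosta, D. J., Alexander, B. R., Cole, H. O., Zhang, Y., Lynd, N. A., Ellington, A. D., & Alper, H. S. (2022). Machine learning-aided engineering of hydrolases for PET depolymerization. Nature, 604(7907), 662–667. https://doi.org/10.1038/s41586-022-04599-z
[6] Knott, B. C., Erickson, E., Allen, M. D., Gado, J. E., Graham, R., Kearns, F. L., Pardo, I., Topuzlu, E., Anderson, J. J., Austin, H. P., Dominick, G., Johnson, C. W., Rorrer, N. A., Szostkiewicz, C. J., Copié, V., Payne, C. M., Woodcock, H. L., Donohoe, B. S., Beckham, G. T., & McGeehan, J. E. (2020). Characterization and engineering of a two-enzyme system for plastics depolymerization. Proceedings of the National Academy of Sciences of the United States of America, 117(41), 25476–25485. https://doi.org/10.1073/pnas.2006753117
[7] University of Portsmouth. (2020, September 28). New enzyme cocktail digests plastic waste six times faster. new-enzyme-cocktail-digests-plastic-waste-six-timesfaster
[8] Sadler, J. C., & Wallace, S. (2021). Microbial synthesis of vanillin from waste poly(ethylene terephthalate). Green Chemistry, 23(13), 4665–4672. d1gc00931a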

Thinking Out of the Box: Removing Medical Devices with a Liquid Metal 有何不可:利用液態金屬移除醫療裝置
By Helen Wong 王思齊

What happens when a few drops of the liquid metal gallium are added to an aluminum can? Our high school chemistry classes taught us that nothing would happen. But if you wait for a while, you will be surprised to see the can shatter into pieces at a single touch.

Are our chemistry teachers wrong? No. The shattering of aluminum cans exposed to gallium is not caused by a chemical reaction. Instead, it results from a physical phenomenon known as liquid metal embrittlement (LME).

Although a metal object (say, an aluminum can) may appear to be a single piece, it actually consists of many small crystals called grains. As shown in Figure 1, when the can comes into contact with certain liquid metals such as gallium, the liquid metal can penetrate the boundaries, or spaces, between the grains [1]. This significantly weakens the cohesion between the grains and hence the strength of the can, making it prone to fracture.

Figure 1 The process of liquid metal embrittlement (LME) [2].

While LME has been a common cause of metal structure failure in industries such as aerospace and construction, a group of researchers at the Massachusetts Institute of Technology recently "harnessed this failure mechanism in a productive way [2, 3]."

Metals have properties ideal for making biomedical devices: they are strong, durable, and have excellent electrical and thermal conductivity. However, a major problem with metal devices is how to remove them once they are no longer needed. This can be done by surgery or endoscopy (footnote 1), yet these invasive procedures may cause additional tissue damage. The researchers therefore started exploring devices that can disintegrate inside the patient's body after use.

Drawing inspiration from LME, the research team experimented with a gallium alloy called eutectic gallium-indium (EGaIn) to dissolve different aluminum devices. Gallium stands out from other LME-inducing liquid metals for two reasons. First, when applied, it prevents a surface oxide layer from forming on the aluminum device. This allows the aluminum to react with water, accelerating its degradation through dissolution. More importantly, gallium is biocompatible: acute toxicity studies showed that EGaIn is non-toxic to rodents even at high doses.

The next step is to deliver EGaIn to aluminum devices, either directly or indirectly. The direct route involves smearing EGaIn paint onto devices such as the staples used to hold skin together (Figure 2). This may sound trivial, but it is not easy. Like water, EGaIn has a high surface tension that hinders its ability to attach to and spread over metal surfaces. Knowing that gallium oxide has a much lower surface tension, the researchers applied a simple trick: stirring the EGaIn beforehand to increase its exposure to air and hence the proportion of gallium oxide in the paint. For indirect delivery, nano- and microparticles of EGaIn were produced that can be delivered into patients' bodies to trigger the dissolution remotely. The team treated different aluminum biomedical devices, such as staples on skin and stents implanted in the esophagus, with EGaIn suspensions and found that these metal structures broke down shortly afterward.

Figure 2 Smearing EGaIn paint onto a staple to remove the device.

Although gallium-induced embrittlement works well for aluminum devices, what about devices made of other metals? For instance, esophageal stents are often made of nitinol, a nickel-titanium alloy, rather than aluminum. To widen the applicability of LME to the removal of biomedical devices, the researchers have also been exploring the possibility of creating dissolvable devices made of nitinol and other metals commonly used in medical settings.

It may take some time before these dissolvable metal devices are ready for clinical use, but the creativity demonstrated in this study is immediately apparent. While most people perceive LME as a failure mechanism, the researchers thought out of the box and turned it into a productive one. Good research does not always require highly sophisticated methods; sometimes a touch of creativity makes all the difference.

1 Editor's note: In addition to the camera used to look inside the body, various tools can be attached to the tip of an endoscope, such as grasping forceps (for retrieving foreign objects) and biopsy forceps (for performing biopsies).

What happens if you add a few drops of the liquid metal gallium to an aluminum can? High school chemistry tells us that nothing will happen, but after a while you will be surprised to find that the can crumbles into pieces at the lightest touch.

Are our chemistry teachers wrong? Not at all. The can does break apart after contact with gallium, but not because of a chemical reaction; the cause is a physical phenomenon called liquid metal embrittlement (LME).

Although a metal object (such as an aluminum can) looks like a single whole, it is actually made up of many tiny crystals called grains. As shown in Figure 1, when the can comes into contact with certain liquid metals such as gallium, the liquid metal can penetrate the gaps between the grains, that is, the grain boundaries [1]. The cohesion between the crystals is therefore significantly weakened, so the can loses strength and breaks easily.

Although LME has long been a common cause of metal structural failure in the aviation, aerospace and construction industries, researchers at the Massachusetts Institute of Technology recently "harnessed this failure mechanism in a productive way [2, 3]."
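The article notes that once gallium keeps aluminum free of its protective oxide layer, the metal can react with water. As a rough illustration only, the worked equation below assumes the overall reaction commonly written for oxide-free aluminum and water. The actual products inside the body are not specified in the study described here and may differ. It shows the atom balance and how much hydrogen one gram of aluminum would release under that assumption.

```latex
% Assumed overall reaction of oxide-free aluminum with water (illustrative)
\[
2\,\mathrm{Al} + 6\,\mathrm{H_2O} \longrightarrow 2\,\mathrm{Al(OH)_3} + 3\,\mathrm{H_2}
\]
% Atom check: Al: 2 = 2;  H: 12 = 6 + 6;  O: 6 = 6
\[
n_{\mathrm{Al}} = \frac{1\ \mathrm{g}}{26.98\ \mathrm{g\,mol^{-1}}} \approx 0.0371\ \mathrm{mol},
\qquad
n_{\mathrm{H_2}} = \tfrac{3}{2}\, n_{\mathrm{Al}} \approx 0.0556\ \mathrm{mol}
\]
```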