this: “I have to go to the university of the Pacific ocean and I will be there in about to go to bed now.” It sounds legitimate at first, but it quickly descends into gibberish (there is no University of the Pacific Ocean, for example). That is because autocorrect only picks up patterns in language without comprehending the actual meaning; it cannot tell that “Colorless green ideas sleep furiously” is complete nonsense (Footnote 2) [5].

ChatGPT is more sophisticated than autocorrect. First of all, it generates a list of probabilities for the possible next words. Let’s use the simpler GPT-2 model for demonstration: after the clause “The best thing about AI is its ability to…”, GPT-2 generates the list of words in Table 2 [1].

GPT is part of the large language model (LLM) family. The main working principle behind LLMs sounds familiar to mathematicians: approximation (or, more accurately, mathematical modeling). Given the series of points in Figure 3 [1], how would you draw a graph through them? The easiest option seems to be a straight line, but we can do better with a quadratic equation, $ax^2 + bx + c$. So we can say that $ax^2 + bx + c$ is a good model of these points and start making predictions with it.

A model is needed because the amount of text we have is far from adequate for empirically calculating the probability of each next word: the 40,000 common English words alone already give $^{40000}P_2 = 40{,}000 \times 39{,}999 \approx 1.6$ billion ordered two-word combinations [1]. GPT works essentially because an informed guess can be made to choose a good enough “curve” to fit the graph, filling in the missing parts with theoretical values. To this day we do not really understand how the computer does it, just as we do not really know how our brain performs “intuitive” tasks, but the developers of GPT can adjust the “weight” of each neuron’s output in the network after each learning step to achieve optimal results.
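The curve-fitting idea and the idea of adjusting “weights” after each learning step can be seen together in a toy example. This is a minimal sketch under stated assumptions, not how GPT itself is trained: it uses five made-up points (hypothetical, not the data behind Figure 3), treats the coefficients of the model as three “weights”, and adjusts them with plain gradient descent on the mean squared error. The names `fit`, `mse`, `model`, `XS` and `YS`, the learning rate 0.05 and the 2,000 steps are illustrative choices. Real neural networks have vastly more weights and use more elaborate variants of the same gradient-based idea.

```python
# Hypothetical data points (NOT the points in Figure 3), roughly y = x^2 + 1
XS = [-2.0, -1.0, 0.0, 1.0, 2.0]
YS = [5.2, 2.1, 0.9, 1.8, 5.0]


def model(features, w):
    """Turn a feature map and a weight list into a prediction function."""
    return lambda x: sum(wi * fi for wi, fi in zip(w, features(x)))


def mse(predict, xs, ys):
    """Mean squared error of a prediction function over the data points."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit(features, xs, ys, lr=0.05, steps=2000):
    """Fit y ~ w . features(x) by plain gradient descent on the MSE.

    `features` maps x to a list of inputs, e.g. [x, 1] for a straight line
    or [x*x, x, 1] for a quadratic; the returned list holds the weights.
    """
    n_w = len(features(0.0))
    w = [0.0] * n_w  # start from all-zero weights
    for _ in range(steps):
        grad = [0.0] * n_w
        for x, y in zip(xs, ys):
            phi = features(x)
            err = sum(wi * fi for wi, fi in zip(w, phi)) - y
            for i in range(n_w):
                grad[i] += 2 * err * phi[i] / len(xs)
        # one "learning step": nudge every weight against its gradient
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


line = lambda x: [x, 1.0]              # y = b*x + c
quadratic = lambda x: [x * x, x, 1.0]  # y = a*x^2 + b*x + c

w_line = fit(line, XS, YS)
w_quad = fit(quadratic, XS, YS)
line_err = mse(model(line, w_line), XS, YS)
quad_err = mse(model(quadratic, w_quad), XS, YS)

print("line      weights:", [round(v, 3) for v in w_line], "MSE:", round(line_err, 4))
print("quadratic weights:", [round(v, 3) for v in w_quad], "MSE:", round(quad_err, 4))
```

With these made-up points the quadratic settles near a = 1.05, b = -0.07, c = 0.9 and its error is far smaller than that of the best straight line, which is the same comparison Figure 3 makes visually.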
In other words, we train the neural network to find the “best-fit curve” through machine learning. Eventually, the goal is for GPT to predict what comes after a phrase in an […]

How are these probabilities generated? First of all, it is not possible simply to infer them from existing writing, as there is nowhere near enough text that models can legally access for training. Instead, we use a little mathematics to help us out.

Table 2 A list of probabilities of possible next words generated by GPT-2 after the clause “The best thing about AI is its ability to…” [1]

Next word     Probability
learn         4.5%
predict       3.5%
make          3.2%
understand    3.1%
do            2.9%

Figure 3 Fitting a straight-line equation (top) and a quadratic equation (bottom) to a series of given points (orange) [1].
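Table 2 only lists probabilities; a program still has to choose one word from such a list. How GPT-2 makes that choice is not covered here, so the following is a hypothetical sketch of two simple strategies applied to the five entries of Table 2: always take the most likely word, or draw a word at random in proportion to its probability. The function names `greedy_next` and `sample_next` are illustrative. Because the five listed words cover only 17.2% of the probability mass, the sampler effectively renormalizes over just these five.

```python
import random
from collections import Counter

# The five next-word probabilities listed in Table 2 (GPT-2, after
# "The best thing about AI is its ability to"). The real list is much
# longer; these five entries sum to only 17.2%.
NEXT_WORD_PROBS = {
    "learn": 0.045,
    "predict": 0.035,
    "make": 0.032,
    "understand": 0.031,
    "do": 0.029,
}


def greedy_next(probs):
    """Hypothetical strategy 1: always pick the most likely word."""
    return max(probs, key=probs.get)


def sample_next(probs, rng):
    """Hypothetical strategy 2: pick a word at random, weighted by probability.

    random.choices treats the weights as relative, so this effectively
    renormalizes over the listed words only.
    """
    words = list(probs)
    return rng.choices(words, weights=[probs[w] for w in words], k=1)[0]


rng = random.Random(42)  # fixed seed so runs are repeatable
print(greedy_next(NEXT_WORD_PROBS))  # always "learn"
counts = Counter(sample_next(NEXT_WORD_PROBS, rng) for _ in range(10_000))
print(counts.most_common())
```

The greedy choice produces the same continuation every time, while sampling varies it; over many draws, “learn” should appear roughly 26% of the time (4.5 / 17.2).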
