[Global DS] How ChatGPT Works and the Prospects for Artificial General Intelligence

ChatGPT grasps the context between words with the transformer algorithm; it is not a sentient intelligence, merely a system shaped by human intervention into a question-and-answer format; hardware advances have only made the computation faster, and talk of an era of artificial general intelligence is still premature

pabii research

[Global DS] brings you the views of industry experts from leading overseas data-science publications. Our Data Science Management Research Institute (MDSA R&D) runs the content partnership on the condition that the English originals are made public.


Image: Scientific American

The keyword that set the AI industry ablaze in the first half of 2023 was, without question, ChatGPT. ChatGPT can write a paper on physics in a matter of seconds, and it handles everyday requests just as easily, from planning a trip to writing code on a user's behalf.

Yet ChatGPT also embarrasses its users with occasional displays of incompetence. The best-known example is 'hallucination', a chronic problem of large language models (LLMs). Because its training data includes internet text of uncertain provenance, the model sometimes presents false information so plausibly that it fools its users.

Few dispute ChatGPT's usefulness, however. Having been trained on a vast corpus of natural language, it can produce flexible answers tailored to the wide range of situations users ask about.

The Heart of ChatGPT: the 'T' (Transformer)

ChatGPT is short for 'Chat Generative Pre-trained Transformer', a conversational AI that OpenAI built on top of a large language model. Unlike earlier chatbots, which could only return canned answers from stored data, ChatGPT finishes its training in advance and then, conditioned on the prompt (the user's question), generates coherent, context-appropriate text on its own.

What matters most in ChatGPT is neither the G nor the P but the T: the transformer. The transformer is a neural network (NN) model that learns the context and meaning of 'sequence data', such as the words arranged in order within a sentence. At its core is the 'attention mechanism', a name taken from the title of the Google paper that introduced the architecture, 'Attention Is All You Need'.

What won the transformer such attention in both academia and industry is its ability, using mathematical and statistical techniques, to pick up on subtle shifts in meaning that depend on the relationships between words (data points) far apart in a sentence. Consider two sentences: "Younghee poured the water from the kettle into the cup until it was full" and "Younghee poured the water from the kettle into the cup until it was empty." The word "it" is identical in both, yet it refers to the "cup" in the first sentence and the "kettle" in the second. Earlier chatbot models could not tell these two senses of "it" apart; with the arrival of ChatGPT, AI began to distinguish them.
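The mechanism behind this sensitivity can be reduced to a few lines of linear algebra. Below is a minimal NumPy sketch of scaled dot-product self-attention, the building block of the transformer: every word scores its affinity with every other word, and those scores decide how much each word's representation borrows from the rest of the sentence. This is a toy illustration with random embeddings, not the full multi-head production architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max before exponentiating for numerical stability.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Scaled dot-product attention: each row of Q scores every row of K,
    # and the softmaxed scores mix the rows of V. This is how a word's
    # representation comes to depend on the words around it.
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # pairwise word-to-word affinities
    weights = softmax(scores, axis=-1)   # one probability row per word
    return weights @ V, weights

# Toy example: a "sentence" of 3 words with 4-dimensional embeddings.
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))
out, w = attention(X, X, X)              # self-attention: Q = K = V
print(out.shape, w.sum(axis=-1))         # (3, 4); each weight row sums to 1
```

In a trained model, the attention weight linking "it" to "cup" or to "kettle" is exactly what shifts depending on whether the sentence ends with "full" or "empty".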

How the Transformer Learns: 'Fill in the Blank' and 'Fine-Tuning'

The transformer model is trained in two stages. First, it acquires knowledge by endlessly repeating a 'fill in the blank' exercise. Given a sentence such as "I drank a glass of ___", the model repeatedly guesses which word belongs in the blank. Researchers gather a vast number of sentences beginning "I drank a glass of", have the model answer, and train it to find the completion that appears most often. In this example, rather than nouns like "pencil case" or "cell phone", the model comes to recognize "water" as by far the most common answer. Put differently, it learns that the words "drank" and "water" are strongly related, and in the same way it learns that "water" is related to "cup", "refrigerator", and so on. Going further, ChatGPT can chain words together to write new sentences such as "I drank some water. Then I ate a meal, too", and eventually texts as long as an entire paper.
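The 'fill in the blank' idea can be illustrated with a toy frequency count. The sketch below uses a hypothetical four-sentence corpus standing in for web-scale training data; a real model learns smooth probabilities over tokens with a neural network rather than counting literal sentences, but the principle of "pick the completion seen most often" is the same.

```python
from collections import Counter

# Hypothetical miniature corpus standing in for web-scale training data.
corpus = [
    "I drank a glass of water",
    "I drank a glass of water",
    "I drank a glass of juice",
    "I dropped a glass of water",
]

def predict_next(prefix, sentences):
    """Return the word that most often follows `prefix` in the corpus."""
    counts = Counter()
    for s in sentences:
        if s.startswith(prefix + " "):
            rest = s[len(prefix):].split()
            if rest:
                counts[rest[0]] += 1
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("I drank a glass of", corpus))  # "water" (2 of 3 matches)
```

Repeating this guessing game over billions of sentences is what embeds the "drank"-"water" and "water"-"cup" associations into the model's weights.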

This 'fill in the blank' training in fact resembles the RNN (recurrent neural network) algorithms long built into the query-suggestion features of major search portals such as Naver and Google, and into smartphone sentence autocomplete. The difference is that the transformer model layers onto this the peculiarities of natural language, such as grammar and subtle shifts of context, making it a highly advanced 'sentence autocompletion machine'.

Second, the transformer model becomes a true 'advanced chatbot' only after it is trained to produce an 'answer' whenever the user poses a 'question'. That is, the model is repeatedly trained on text in question-answer form, so that when a user asks a question, the text GPT generates takes the form of an answer. For example, once GPT has been trained on the question "Where are the side dishes?" and the answer "In the refrigerator", it will try to reproduce that question-answer pattern and, applying the "water"-"water purifier" relationship it learned earlier, can answer "In the water purifier" when asked "Where is the water?". Because this process makes fine adjustments, carrying the "side dishes"-"refrigerator" pattern over to "water"-"water purifier", it is called 'fine-tuning'.
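As a rough sketch, the question-answer training data described above can be thought of as pairs serialized into single texts with role markers. The `Q:`/`A:` tags here are an illustrative assumption, not OpenAI's actual format; real systems use special tokens and then update the model's weights on the serialized text.

```python
# Question-answer pairs like those described above (illustrative data).
qa_pairs = [
    ("Where are the side dishes?", "In the refrigerator."),
    ("Where is the water?", "In the water purifier."),
]

def to_training_text(question, answer, q_tag="Q:", a_tag="A:"):
    # Join each pair with role markers so the model learns that text
    # following a question should take the form of an answer.
    return f"{q_tag} {question}\n{a_tag} {answer}"

examples = [to_training_text(q, a) for q, a in qa_pairs]
print(examples[0])
```

After enough examples in this shape, any prompt that looks like a question pushes the model toward generating text that looks like an answer.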

The Era of Artificial General Intelligence Is Still Far Off

Using ChatGPT, there are certainly moments when you marvel, "How can it answer even a question like this?" This has led some to expect the imminent arrival of 'artificial general intelligence' (AGI), an AI with intelligence comparable to a human's.

In particular, hopes are rising that reinforcement learning algorithms combining AI with insights from neuroscience will open a new chapter for the technology. Humans can learn efficiently from limited experience (data) and adapt to changes in their environment, and the idea is to bring that ability into reinforcement learning. Recent research, for example, draws on the theory of 'prefrontal meta-control', based on the fact that the human brain already solves engineering problems that algorithms such as reinforcement learning cannot, both to design a single AI that responds robustly to changing external conditions and to let multiple AI agents exploit one another's strategies to hold a balance between 'cooperation' and 'competition'.

On a more practical level, there are also moves to apply 'module theory', one account of how the human brain works, to ChatGPT. According to module theory, the brain operates through separate 'modules' for the various activities we perform every day (speech, memory, social relationships, and so on). Drawing on this idea, OpenAI has recently been releasing a range of plugins, such as math and physics engines, that run Python.
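A plugin system of this kind boils down to routing a request to a specialized engine instead of generating free text for everything. The sketch below uses hypothetical names and is not OpenAI's actual plugin API; it only illustrates the modular division of labour, with a dedicated engine handling the arithmetic the language model is bad at.

```python
import math

def math_engine(expression: str) -> str:
    # Stand-in for a dedicated calculation plugin: evaluate a restricted
    # arithmetic expression with no access to Python builtins.
    allowed = {"sqrt": math.sqrt, "pi": math.pi}
    return str(eval(expression, {"__builtins__": {}}, allowed))

def chat_engine(prompt: str) -> str:
    # Stand-in for the language model's ordinary free-text generation.
    return f"(generated text for: {prompt})"

# Each "module" handles the kind of task it is specialized for.
PLUGINS = {"math": math_engine, "chat": chat_engine}

def route(task: str, payload: str) -> str:
    """Dispatch a request to the module best suited to handle it."""
    return PLUGINS[task](payload)

print(route("math", "sqrt(2) * 2"))   # delegates arithmetic to the math module
print(route("chat", "say hello"))     # everything else stays with the chat module
```

The open question the article raises is whether bolting such modules together ever adds up to general intelligence, or merely widens the range of tasks the system can imitate.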

Most AI experts, however, caution against letting this 'expectation' of AGI turn into 'getting ahead of ourselves'. Their point is that recent hardware advances have simply made non-linear pattern matching, 1980s mathematics dressed up as artificial intelligence, run faster; human knowledge has not suddenly leapt forward. Likewise, at the root of ChatGPT lies the astronomical time and money OpenAI staff spent writing scripts 'by hand' so the model could answer a wide range of 'questions'; no new intelligence surpassing humans has appeared.


Sophie Bushwick:Today we’re talking Large Language Models. What they are. How they do what they do, and what ghosts may lie within the machine. I’m Sophie Bushwick, tech editor at Scientific American.

George Musser:I’m George Musser, contributing editor.

Sophie Bushwick:And you’re listening to Tech Quickly, the AI-obsessed sister of Scientific American’s Science Quickly podcast.

[Intro music]

Bushwick:My thoughts about large language models, which are those artificial intelligence programs that analyze and generate text, are mixed. After all, ChatGPT can perform incredible feats, like writing sonnets about physics in mere seconds, but it also displays embarrassing incompetence. It failed to solve multiple math brain teasers even after lots of help from the human quizzing it. So when you play around with these programs, you’re often amazed and frustrated in equal measure. But there’s one thing that LLMs have that consistently impresses me, and that’s these emergent abilities. George, can you talk to us a little bit about these emergent abilities?

Musser:So the word emergence has different meanings in this context. Sometimes, these language models develop some kind of new ability just because they’re so ginormous, but I’m using the word emergent abilities here to mean that they’re doing something they weren’t really trained to do, they’re going beyond their explicit instructions that they’ve been given.

Bushwick:So let’s back up a little and talk about how these models actually work and what they are trained to do.

Musser:So these large language models work sort of like an auto correct on your phone keyboard. They’re trained on what are likely completions of what you’re typing. Now, they’re obviously a lot more sophisticated than that keyboard example. And they use different computational architectural techniques. The leading one is called a transformer. It’s designed to transform cues that we developed from context. So we know what a word is because of the words that are around it.

Bushwick:And transformer, that’s the ‘T’ of GPT. Right? It’s a generative pre-trained transformer.

Musser:Exactly. So that’s one component, the so-called transformer architecture. It goes beyond the old, or older, it’s not that old, neural network architecture that’s modeled on our brains. So another component that they’ve added is the training regimen. They’re basically trained on like a peekaboo system where they’re shown part of a scene, well, if they’re trained on visual data, but part of texts, if they’re trained on text, and then they’re trained to try to fill in the blanks on that. And that’s a very, very stringent training procedure. If you had to go through that, if you were given half a sentence and had to fill in the rest of the sentence, you would have to learn grammar, if you hadn’t known grammar, you’d have to learn knowledge of the world, if you hadn’t known that knowledge of the world. It’s almost like Mad Libs, or fill-in-the-blank training. So that is a hugely demanding training procedure that gives it these emergent capabilities. And then, on top of all that, it has a so-called fine-tuning procedure where not only will it autocomplete what you’ve typed in, but it’ll actually try to construct a dialogue with you, and it will come back and speak to you as if it were another human. And you know, it’s acting, it’s responding to your queries in a dialogue format. And that’s pretty amazing, as well, that it can do that. And these are features that people didn’t really expect AI systems to have for another decade or so.

Bushwick:And what’s an example of something that it does that goes beyond just filling in part of a sentence, or even engaging in dialogue with people? One of these abilities that are being called emergent abilities.

Musser:This is really cool because every AI researcher you talk to on this has his or her or their own example of the aha moment of something it was not meant to do, and yet it did. So one researcher told me about how it drew a unicorn, he asked it, draw me a unicorn. Now it doesn’t have a drawing capacity doesn’t have like an easel and brushes. So it had to create the unicorn out of graphical programming language. So you have to consider the number of steps that are required, it had to extract a notion of a unicorn from internet text. It had to abstract out from that notion, kind of the essential features of a unicorn, sort of like a horse it has a horn, etc. And then it had to learn separately, a graphical programming language. So its ability to synthesize across vastly different domains of knowledge is just astounding, really.

Bushwick:So that sounds really impressive to me. But I’ve also read some critics saying that some of these abilities that seem so impressive happened because all this information was just in the training data for the large language model so it could have picked it up from that, and they’ve sort of criticized the idea of calling these emergent abilities in the first place. Are there any examples of LLMs doing something that you’re like, wow, I have no idea how they got it from that training data.

Musser:There’s always a line you can draw between its response and what was in its training data. It doesn’t have any magical ability to understand the world, it is getting it from its training data. It’s really the ability to synthesize to pull things together in unusual ways. And I think a kind of middle ground is emerging among the scientists who discover this, who they’re not dismissive and saying, oh, it’s just AutoCorrect. It’s just parroting what it knew. And to the other extreme, oh, my God, these are Terminators in the making. So there’s kind of a middle ground, you can take and say, well, they really are doing something new and novel that’s unexpected. It’s not magical. It’s not like achieving sentience, or anything like that. But it’s going beyond what was expected. And, you know, as I said, every researcher has his or her their own example of, whoa, how the freak did it do that. And skeptics will say, I bet that it can’t do this. Next day, it did that. So it’s going way beyond what people thought.

Bushwick:And when scientists say how does it do that? Can they look into the sort of black box of the AI to figure out how it’s doing these things?

Musser:I mean, that’s really the main question here. It’s very, very hard. These are extremely complicated systems, the number of neurons in them is on par with the neurons in a human, or certainly a mammal brain. But they’re using, in fact, techniques that are inspired by the techniques of neuroscience. So the same kinds of ways that neuroscientists try to access what’s in our heads, the AI researchers are doing to these systems as well. So in one case, they create basically artificial strokes, artificial lesions in the system, they zap out, or they temporarily disable some of the neurons in the network, and see how that affects the function, does it lose some kind of functionality, and then you could say, ah, then I can understand where that functionality is coming from, it’s coming from this area of the network. Another thing they can do, which is analogous to inserting an electrical probe into the brain, which has been done in many cases for humans and other animals, is to insert a probe network, a tiny little network that’s much smaller than the main one, into the big network, and see what it finds. And in one case I was very struck by, they trained a system on Othello, the board game, and inserted one of these probe networks into the main network. And they found that the network had a little representation of the game board built within it. So it wasn’t just parroting back game moves, ‘I think you should put the black marker on, you know, this square,’ it was actually understanding the game of Othello and playing according to the rules.

Bushwick:So when you tell me things like that, like the machine learning the rules of Othello, building a model of the game board, or a representation of the game board within its system, that makes me think that, you know, as these models keep developing, as more advanced ones come out, these abilities could get more and more impressive. And so this brings us back to something you mentioned, which is AGI, or artificial general intelligence, this idea of an AI with the flexibility and capability of a human. So do you think there’s any way that that kind of technology could emerge from these?

Musser:I think absolutely. Some kind of AGI is definitely in the foreseeable future. I mean, I hesitate to put a number of years on it, one researcher said within five years, we’ll see something that’s like an AGI — maybe not a human level, but a dog level or a rat level, which would still be pretty impressive. The large language models themselves alone don’t really qualify as AGI. They’re general in the sense that they can discourse about almost any piece of information or human knowledge that’s on the internet in text form. But they don’t really have a stable identity, a sense of self that we associate with most certainly animal brains, they still hallucinate, confabulate, they could have a limited learning ability, but you can’t put them through college. They don’t have this ongoing learning capacity. That really is what’s so remarkable about mammals and humans, absolutely. So I think the large language models have basically solved, as far as the AI researchers are concerned, the problem of language, they got the language part. So now they have to bolt on the other components of intelligence such as symbolic reasoning, our ability to intuit physics, that things should fall down or break, etc. And those can be kind of put on in a modular way. So you’re seeing a modular approach that’s now emerging to artificial intelligence.

Bushwick:We talk about modular AI that sounds like what I’ve heard about plugins, these programs that work with an LLM to give it extra abilities like a program that can help an LLM do math.

Musser:Yes. So the plugins that OpenAI has introduced with GPT, and that the other tech companies are introducing with their own versions of that, are modular, in a sense that’s thought to be roughly similar to what happens in animal brains. I think probably you’d have to go even further than that to get something that’s truly an artificial general intelligence system. Still, plugins are invoked by the human user. If you give a query to ChatGPT, it’s capable of looking at the answer on an internet search. It can run a Python script, for example, it could call up a math engine. So it’s getting at the modular nature of the human brain, which has multiple components also that we call on in different circumstances. And whether or not that particular architecture will be the way to AGI, it’s certainly showing the way forward.

Bushwick:So are AI researchers really excited about the idea that AGI could be so close?

Musser:Yeah, they’re tremendously excited. But they’re also worried, they’re worried that they’re the dog that’s about to catch the fire hydrant, because it’s just like, the AGI has been something they’ve wanted for so long. But as you begin to approach it, and begin to see what it’s capable of, you also get very worried — and a lot of these researchers are saying, well, you know, maybe we need to slow down a little bit, or at least, slow down is maybe not the right word. Some actually do want to slow down, some do want a pause or moratorium, but there’s definitely a need to enter a phase of understanding, of understanding what these systems can do. They have a number of latent abilities, in other words, abilities that are not explicitly programmed into them, which they exhibit when they’re being used and which haven’t been fully catalogued. No one really still knows what ChatGPT, even in its current incarnation, can do. How it does it is still an open scientific question. So I think before we, you know, have the Skynet scenarios, we’ve got more immediate, a) intellectual questions about how these systems work and b) societal questions about what these things might do in terms of algorithmic bias or misinformation.

Bushwick:Tech Quickly, the most technologically advanced member of the Science Quickly podcast family, is produced by Jeff DelViscio, Tulika Bose, Kelso Harper and Carin Leong. Our show is edited by Elah Feder and Alexa Lim. Our theme music was composed by Dominic Smith.

Musser:Don’t forget to subscribe to Science Quickly wherever you get your podcasts. For more in-depth science news and features, go to ScientificAmerican.com. And if you like the show, give us a rating or review!

Bushwick:For Scientific American’s Science Quickly, I’m Sophie Bushwick.

Musser:I’m George Musser. See you next time!

[The above is a transcript of this podcast.]
