[해외 DS] The "Billboard HOT 100": Now Predicted by AI?

An AI emerges that claims to predict hit songs with 97% accuracy amid the daily flood of new releases
An innovation born from the meeting of neuroscience and artificial intelligence
Some critics point to the small sample size, the limits of wearable devices, and ethical concerns


The [해외DS] series curates commentary from industry experts at leading international data science publications. Our Data Science Management Research Institute (MDSA R&D) runs the content partnership on the condition that the original English text is published alongside.


With countless new songs pouring out every day, an AI algorithm that predicts which tracks will land on the Billboard charts has recently drawn public attention.

According to "Accurate Prediction of Hit Songs Using Neurophysiology and Machine Learning," published last June by Professor Paul Zak of Claremont Graduate University's neuroeconomics program, a non-linear pattern-matching algorithm built on listeners' brain-activity data could predict whether a song would become a hit with 97% accuracy.
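The paper's exact model and feature set are not described here, so the sketch below is only a rough illustration of the general recipe: a non-linear classifier trained on per-listener physiological summary features, with a random forest standing in for the unspecified pattern-matching algorithm and randomly generated placeholder data.

```python
# Illustrative sketch only: a random forest stands in for the paper's
# unspecified non-linear pattern-matching model; all data are random.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Hypothetical dataset: one row per (listener, song) pair, with summary
# statistics of the listener's physiological response as features and a
# hit / non-hit label for the song.
n_samples, n_features = 990, 12                # e.g., 33 listeners x 30 songs
X = rng.normal(size=(n_samples, n_features))   # placeholder response features
y = rng.integers(0, 2, size=n_samples)         # placeholder hit labels

# A non-linear classifier can capture interactions between physiological
# features that a linear model would miss.
clf = RandomForestClassifier(n_estimators=300, random_state=0)
scores = cross_val_score(clf, X, y, cv=5)
print(f"cross-validated accuracy: {scores.mean():.2f}")
```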

Academics, however, are questioning the study's validity: possible measurement error in wearable-device data and the small sample size, critics argue, make the conclusions impossible to generalize.

'Neuroforecasting' Made Highly Accurate Prediction Possible

This is not the music industry's first brush with AI. Major streaming platforms such as Spotify and Apple Music have long used listening data and algorithms to pick hit songs out of the daily flood of new releases, but their accuracy never cleared 50%, prompting jokes that "you would be better off flipping a coin."

Professor Zak's study, by contrast, reports predicting hit songs with a striking 97% accuracy, drawing industry attention. It differs from the streaming platforms' earlier attempts in that the AI was trained on listeners' brain data rather than on a song's intrinsic features such as tempo or genre.

The word the researchers emphasize in the paper is "neuroforecasting," a neuroscience term for predicting future behavior from a person's current brain activity. Professor Zak, who led the study, said, "We measured listeners' neurophysiological responses to a variety of songs," adding, "Based on that brain-activity data, we could accurately predict worldwide trends in music preference."

What Sets It Apart from Conventional Neuroscience: Efficient Brain-Activity Data Collection via Wearable Devices

The most striking aspect of the study is that participants' neurophysiological responses to music were measured as heart-rate data from wearable devices. Neuroscience research normally relies on fMRI (functional magnetic resonance imaging) or EEG (electroencephalography) to examine brain mechanisms in detail, but those techniques are costly and register music-evoked signal changes only after a delay of several seconds; wearables, Professor Zak explains, were chosen to offset both drawbacks.
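How the cardiac signal is converted into a brain-activity proxy is proprietary to the platform used in the study, so the following is purely a toy illustration under stated assumptions: it summarizes short-term heart-rate variability (RMSSD, a standard statistic) over windows of wearable beat-to-beat interval data and maps it to a single attention-style score. The windowing, the inversion, and the scaling constant are all assumptions, not the platform's actual formula.

```python
import numpy as np

def immersion_like_score(rr_intervals_ms: np.ndarray, window: int = 30) -> float:
    """Toy attention-style score from wearable heartbeat data.

    NOT the study platform's actual formula; this is a hypothetical proxy
    built from RMSSD, a standard short-term heart-rate-variability statistic.
    `rr_intervals_ms` holds successive beat-to-beat intervals in milliseconds.
    """
    diffs = np.diff(rr_intervals_ms)
    # RMSSD per window: root mean square of successive RR differences.
    rmssd = np.array([
        np.sqrt(np.mean(diffs[i:i + window] ** 2))
        for i in range(0, len(diffs) - window + 1, window)
    ])
    # Assumed mapping: lower variability -> higher score; the 50 ms scale
    # is an arbitrary normalization choice for this sketch.
    return float(1.0 / (1.0 + rmssd.mean() / 50.0))

# Example: simulated RR intervals around 800 ms (~75 bpm) with mild jitter.
rng = np.random.default_rng(1)
rr = 800.0 + rng.normal(0.0, 20.0, size=600)
print(f"immersion-like score: {immersion_like_score(rr):.3f}")
```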

Among neuroscientists, however, the reaction to this unconventional move has been largely skeptical. The field has overwhelmingly collected its data via fMRI and EEG precisely because cardiac activity cannot account for all of brain activity and because the measurement accuracy of wearables remains in doubt; put simply, their argument is that there were good reasons to bear the high cost of brain imaging. And since a neuroscience expert like Professor Zak could hardly have been unaware of this, some suspect a financial conflict of interest between him and Immersion Neuroscience, the neurophysiology platform company that partnered on the study.

Some in academia, meanwhile, view the use of wearables positively. An fMRI machine takes 45 minutes to an hour for a single scan of the brain, making it hard for a participant to stay fully focused on the music for that long. A neuroscientist who requested anonymity said, "Being confined for a long time in the cold chamber of an fMRI machine would make the experience quite different from how you ordinarily listen to music," adding, "The value of this study is that participants use wearable devices that are easily accessible and cheap."

The Ethical Problems and Limits of 'Neuroscientific' Prediction

If Professor Zak's research wins academic acceptance and is adopted across the streaming industry, we can imagine intriguing possibilities: algorithms that automatically find the music you like for your mood and situation, or even try composing songs on their own. On the other hand, the thought that a platform is tracking your every move, heart rate and breathing rate included, is unnerving. Such "mind-reading" algorithms inevitably raise ethical questions, not least concerns over privacy.

Streaming platforms that already use AI counter that they collect data only selectively, through opt-in consent in their terms of service. In practice, though, most people hit "Accept" without ever reading the small terms-and-conditions window that pops up at sign-up or on a site visit, and the reality is that many companies already hold tacit rights over consumers' behavioral data and analyze it extensively.

The study was also based on a relatively small sample of 33 people. As noted above, the researchers invoke "neuroforecasting" to argue that even a small sample is enough to generalize the results, but that is hardly enough to deflect the criticism. In particular, all of the participants were university students, skewing the pool toward younger listeners and prompting complaints that the sample lacked racial and generational diversity.


Sophie Bushwick: Last month, AI researchers claimed an impressive breakthrough. They published a paper showing that AI can predict, with 97 percent accuracy, if any song will be a hit. And it does this by measuring how the listener’s body responds to the music.

Lucy Tu: But it might be too soon to anoint AI as the next big talent scout for the music industry. I’m Lucy Tu, the 2023 AAAS Mass Media fellow for Scientific American.

Sophie Bushwick: I’m Sophie Bushwick, tech editor at Scientific American. You’re listening to Tech Quickly, the all-things-tech part of Scientific American’s Science Quickly podcast.

[Intro music]

Bushwick: I thought the music industry had been using AI to create songs and analyze them for a while now. So what’s so special about this new approach?

Tu: Great question. Streaming services and music industry companies have already been relying heavily on algorithms to try and predict hit songs. But they’ve focused primarily on characteristics like a song’s artist and genre, as well as the music itself. So, aspects like the lyrics or the tempo. But even with all of that data, the existing AI algorithms have only been able to correctly predict whether a song will be a hit less than 50% of the time. So you’re honestly better off flipping a coin.

Bushwick: Yeah, basically random-chance odds.

Tu: And so this new approach is different for a few reasons. One being its near-perfect accuracy: a 97% success rate is much, much higher than any approach we’ve seen before. And it’s also unique because the study claims to train its AI on the brain data of listeners rather than a song’s intrinsic features, like its danceability or its explicitness.

Bushwick: That sounds like science fiction, like it’s AI reading your mind to predict if you’ll like a song. But I can’t help but notice that it claims to use brain data. So what do you mean by that?

Tu: Yeah, great catch! So at face value, the researchers in this recent study say they measured listeners’ neurophysiological response to different songs. And whether intentionally or not, a lot of popular news outlets sort of picked up on the “neuro” part of “neurophysiological response” and assumed that meant the researchers directly tracked brain activity through an fMRI scan or EEG recording, which they didn’t.

Bushwick: What did they use?

Tu: So what they did was they had these listeners, while they were listening to songs, wear a wearable device, sort of like an Apple Watch or a Fitbit, something that can track your cardiac activity. So your heart rate, for instance. And they collected this cardiac data and used it as a proxy for brain activity by putting it through this commercial platform, Immersion Neuroscience, which claims to be able to measure emotional resonance and attention by using cardiac data.

Bushwick: So essentially they’re taking your heart rate and your blood flow, and then they’re translating it into a measure that they say indicates what’s going on in your brain.

Tu: Exactly. And this measure of what’s going on in your brain is called immersion. I talked to some researchers who were a little bit skeptical about the use of cardiac data as a proxy for neural response, especially because this measure of immersion that the researchers talk about hasn’t really been discussed by any other researchers in peer-reviewed publications.

Bushwick: So it’s been studied by the people who work at the company that uses it, but not really anyone outside it.

Tu: Exactly

Bushwick: Gotcha.

Tu: And I will say also that the lead author of this most recent study has some financial ties to the commercial platform that was used, Immersion Neuroscience. He’s the co-founder of the company and also its chief immersion officer, which is another concern that some of the researchers I talked to raised.

Bushwick: So if immersion is such a controversial measure, then why don’t the scientists just stick someone into an MRI machine that would actually scan their brain? Because this has been done before: in 2011, researchers from Emory University put teenagers through an MRI machine to see how their brains reacted to music, and they did make somewhat accurate predictions of songs’ sales based on those brain scans. So why are the researchers in this study choosing to do it with this other measure that hasn’t been proved in the same way?

Tu: I think the key here is the wearable device component that I talked about earlier. So that study you mentioned, like you said, they put teenagers through an MRI machine. Well, fMRI machines take a long time, 45 minutes to an hour, just to get one scan of the brain. And also, people can be claustrophobic. It’s not comfortable to sit in an fMRI for an hour and listen to music.

Bushwick: It’s a long time.

Tu: Yeah, a long time to be confined in this cold chamber. I mean, you’d think that it would influence the way people listen to music if they’re stuck in this cold space for that extended period. It’s also just impractical to put a bunch of people through an fMRI just to get a few brain scans and then use that to train an AI algorithm to predict hit songs. So this study, what its value is, is that participants use a wearable device, something easily accessible, something that can be super cheap. A lot of people already own wearable devices like the ones used in this study.

Bushwick: I’m wearing one, yup

Tu: Me too!

Tu: Um, so the idea is that if we can actually predict hit songs just with the data that’s given to us by a wearable device, like the heart rate, like the blood flow, we might be able to collect data widely, so people could have personalized music, movie, et cetera, recommendations. It’s just a lot more accessible than the traditional brain-scan approaches that have been done before.

Bushwick: But see, that actually does freak me out a little bit. Because music platforms like Spotify are already collecting a lot of personal information about their users. So what would it mean for them to also be eavesdropping on your heart rate and your breathing rate? I mean, it’s almost as if they’re trying to read your mind.

Tu: It’s kind of discomforting, honestly. Don’t get me wrong, I would love it in some ways if my streaming services just automatically knew somehow what I wanted to listen to in that moment. You know, when I’m sad, they give me a playlist of heartbreak songs; and when I’m really happy, or, you know, in the car with friends, they give me that carpool karaoke playlist. I love that on one hand. But the idea that they’re giving me these recommendations based on literally reading my mind raises a lot of ethical questions, which is something that also came up in quite a few of the conversations I had with researchers and experts in data privacy. One big question that I actually raised with the lead author of this study was: well, how do you actually envision this service being used? And he said, of course, we would go through the necessary data privacy channels; this would be an opt-in service. So only people who explicitly say “I accept Spotify reading my mind” would have their minds read. And then I talked to another data privacy expert who countered and said, well, how many of us actually read the terms and conditions before we accept them? I don’t know. Absolutely not.

Bushwick: Am I going to scroll through hundreds of pages of permissions? No, I usually just click OK.

Tu: And that’s what I’m saying. I think that these terms and conditions could tell me I’m signing away the rights to my firstborn child.

[laughter]

Tu: So the data privacy expert I spoke to said that that’s a huge consideration we have to think of, not just when we’re implementing this technology but when we’re developing it. And so we have to think about what this would mean in terms of educating consumers if we were to actually make these AI algorithms more accessible.

Bushwick: So before we even start worrying about reading the terms and conditions and having our Fitbits spy on us and predict what songs we want to listen to, is this even ready? Is the technology even ready for that yet? Are there other steps that we would have to go through before it’s ready to roll out at larger than just a study sample size?

Tu: Absolutely. So one big limitation of this study is that it used a pretty small sample of, I think, around 30 people. The study does claim that even that small sample size is enough for them to do this process they call neuroforecasting, which is taking a small sample of data, a small pool of people, and using the data from that small pool to make predictions about a much wider audience, a much wider market. Not everyone’s fully convinced. Researchers said they would love to see the findings from this study replicated, not only to confirm the validity of that measure we talked about earlier, immersion, and the validity of using cardiac data as a proxy for brain activity. This pool was recruited through a university, so they had a lot of younger listeners; my music preferences and my mother’s music preferences are very, very different. And I’m sure the authors themselves note that they didn’t have a lot of racial and ethnic diversity, so they might not have captured the cultural nuances, for instance, that might go into music preferences. So some other researchers I spoke to said they would love to see the findings from this study replicated with larger samples, perhaps more diverse samples, so they can verify that the preferences used in this study to predict hit songs are actually replicable with other groups that might have entirely different preferences when it comes to music and song listening.

Bushwick: Science Quickly is produced by Jeff DelViscio, Tulika Bose, Kelso Harper and Carin Leong. Our show is edited by Elah Feder and Alexa Lim. Our theme music was composed by Dominic Smith.

Tu: Don’t forget to subscribe to Science Quickly wherever you get your podcasts. For more in-depth science news and features, go to ScientificAmerican.com. And if you like the show, give us a rating or review!

Bushwick: For Scientific American’s Science Quickly, I’m Sophie Bushwick.
