Collaborative filtering algorithm combining movie popularity and viewing time
Cyber Security and Data Governance
Qian Zejun, Liu Runran
(Alibaba Business School, Hangzhou Normal University, Hangzhou 311121, China)
Abstract: Similarity evaluation is the core of collaborative filtering recommendation algorithms, yet despite continual improvement by researchers it still struggles to make full use of rating data along every dimension. To address this challenge, this paper first takes the mutual influence between users and movies as its starting point, explores the self-consistent logic that may exist between the two, and proposes a Movie Popularity Weight (MPW) formula for weighting movies. It then takes users' viewing time as the object of study, explores the relationship between shifts in viewing preference and the chronological order of viewing, and, drawing on the Kendall correlation coefficient, proposes a Consistency in Viewing Sequence (CVS) formula. Finally, these results are fused with traditional similarity algorithms, and the improved similarity algorithm is validated on two datasets: the Netflix Prize dataset and the Douban Movie K5 dataset, which is built from publicly available data on Douban.com. The experimental results show that the improved similarity algorithm achieves higher recommendation accuracy.
Key words: recommendation algorithm; collaborative filtering; similarity algorithm; movie popularity; viewing time
CLC number: TP3913; Document code: A; DOI: 10.19358/j.issn.2097-1788.2024.02.009
Citation format: Qian Zejun, Liu Runran. Collaborative filtering algorithm combining movie popularity and viewing time[J]. Cyber Security and Data Governance, 2024, 43(2): 54-63.

Introduction

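A recommender system [1] is an auxiliary system that draws on the computing power of computers to address the low efficiency with which users obtain useful information under information overload; its accuracy depends heavily on the recommendation strategy adopted. Among the many strategies available, collaborative filtering is one of the most widely used [2]. It takes users' interests and preferences as the basis for recommendation and assumes that each user's future behavior is likely to resemble that user's past behavior. A recommender system based on collaborative filtering therefore recommends items to a target user according to the ratings that other, similar users have given to some items [3], which makes its recommendations readily interpretable. The key step in collaborative filtering is computing the similarity between users. Because traditional similarity algorithms are easily affected by cold start, data sparsity, time decay and similar problems [4], many researchers have refined them and proposed new similarity measures. On the question of item weighting, Leskovec et al. [5], in their improvement to the Pearson correlation coefficient algorithm, took into account that the distribution of ratings is long-tailed: as time passes, a few popular items attract ratings from more and more users, while the number of ratings received by unpopular items stays very limited.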
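To make the similarity step concrete, the following is a minimal sketch of traditional user-based collaborative filtering with an unweighted Pearson correlation, i.e. the kind of baseline the introduction refers to. It is not the MPW or CVS method of the paper, whose formulas are not given in this excerpt. The toy `ratings` table and the function names `pearson` and `predict` are assumptions made only for illustration.

```python
from math import sqrt

# Hypothetical toy data: user -> {movie: rating on a 1-5 scale}.
ratings = {
    "alice": {"m1": 5, "m2": 3, "m3": 4},
    "bob":   {"m1": 4, "m2": 2, "m3": 5, "m4": 4},
    "carol": {"m1": 1, "m2": 5, "m4": 2},
}

def pearson(u, v):
    """Pearson correlation of two users over the movies both have rated."""
    common = set(ratings[u]) & set(ratings[v])
    if len(common) < 2:
        return 0.0  # too little overlap to estimate a correlation
    mean_u = sum(ratings[u][m] for m in common) / len(common)
    mean_v = sum(ratings[v][m] for m in common) / len(common)
    num = sum((ratings[u][m] - mean_u) * (ratings[v][m] - mean_v) for m in common)
    den = sqrt(sum((ratings[u][m] - mean_u) ** 2 for m in common)) * \
          sqrt(sum((ratings[v][m] - mean_v) ** 2 for m in common))
    return num / den if den else 0.0

def predict(user, movie):
    """Mean-centred, similarity-weighted average of other users' ratings."""
    mean_user = sum(ratings[user].values()) / len(ratings[user])
    num = den = 0.0
    for other in ratings:
        if other == user or movie not in ratings[other]:
            continue
        sim = pearson(user, other)
        mean_other = sum(ratings[other].values()) / len(ratings[other])
        num += sim * (ratings[other][movie] - mean_other)
        den += abs(sim)
    # Fall back to the user's own mean when no neighbour has rated the movie.
    return mean_user + num / den if den else mean_user

print(pearson("alice", "bob"), pearson("alice", "carol"), predict("alice", "m4"))
```

On this toy data alice correlates positively with bob (about 0.65) and perfectly negatively with carol (-1.0), and the predicted rating of `m4` for alice is about 4.5. Every co-rated movie contributes equally to the similarity here, which is the limitation that popularity-based item weighting is meant to address.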




Article download: http://ihrv.cn/resource/share/2000005903

