Prediction function: the probability (or score) $\hat{x}_{u,t,i}$ that user $u$ purchases item $i$
$\hat{x}_{u,t,i} = \langle U_u, V_i \rangle + \langle M_{s_{u,t-1}}, N_i \rangle$
$f(i \mid u,j) = f(i \mid u) + f(i\mid j) + f(u,j) = \gamma_u^{U,I} \cdot \gamma_i^{I,U} + \gamma_i^{I,J}\cdot \gamma_j^{J,I} + \gamma_u^{U,J}\cdot \gamma_j^{J,U}$
Objective function: BPR
$\sum_{(u, i, j) \in D_S} -\ln \sigma(\hat{x}_{u,t,i} - \hat{x}_{u,t,j}) + \lambda \lVert\Theta\rVert^2$
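
As a rough illustration of the score and the BPR term above, the following is a minimal sketch in plain Python. The function names (`dot`, `score`, `bpr_loss`) and the list-based vectors are assumptions made for this example, and the regularization term $\lambda \lVert\Theta\rVert^2$ is left out.

```python
import math

def dot(a, b):
    """Inner product of two equal-length vectors given as lists."""
    return sum(x * y for x, y in zip(a, b))

def score(U_u, V_i, M_prev, N_i):
    """x_hat = <U_u, V_i> + <M_prev, N_i>.

    M_prev stands for the embedding of the previous step s_{u,t-1}
    (hypothetical argument name).
    """
    return dot(U_u, V_i) + dot(M_prev, N_i)

def bpr_loss(x_pos, x_neg):
    """-ln sigma(x_pos - x_neg), computed in a numerically stable way."""
    d = x_pos - x_neg
    if d >= 0:
        return math.log1p(math.exp(-d))
    return -d + math.log1p(math.exp(d))
```

The loss shrinks as the observed item is scored further above the negative item, which is the pairwise ranking behaviour BPR asks for.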

Personalized Ranking Metric Embedding (PRME)

Distance-based similarity measure

Personalized Ranking Metric Embedding for Next New POI Recommendation

PRME

Prediction

$\sigma(f(i \mid u,j) - f(i^- \mid u,j))$

$f(i \mid u,j) = -d(\gamma_u,\gamma_i)^2 - d(\gamma_i,\gamma_j)^2 = -\lVert \gamma_u - \gamma_i\rVert^2_2-\lVert \gamma_i - \gamma_j\rVert^2_2$
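
The distance-based score can be sketched as follows. This is a minimal sketch that follows the formula as written here (one embedding per user and per item); `sq_dist`, `prme_score`, and `rank_prob` are names chosen for this example.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors given as lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def prme_score(gamma_u, gamma_i, gamma_j):
    """f(i | u, j) = -||gamma_u - gamma_i||^2 - ||gamma_i - gamma_j||^2."""
    return -sq_dist(gamma_u, gamma_i) - sq_dist(gamma_i, gamma_j)

def rank_prob(f_pos, f_neg):
    """sigma(f(i | u, j) - f(i^- | u, j)): probability that i ranks above i^-."""
    return 1.0 / (1.0 + math.exp(-(f_pos - f_neg)))
```

A candidate that sits close to both the user and the previous item $j$ gets a score near zero, while distant candidates get large negative scores.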

GRU4Rec (2015)

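A model that predicts the next item to be clicked by taking the order of events in a session into account.
It addresses the limitation that earlier recommenders based on Matrix Factorization or Factorization Machines do not account for the sequential nature of sessions.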

Session-based Recommendations with Recurrent Neural Networks

Session

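A record of a user's actions over a short period of using the service
Usually consists of click logs, the order of page visits, and so on

Key idea

Feed the session sequence into a GRU-type RNN
Use the last hidden state to predict the probability of the next item to be clicked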

Model architecture

Convert each item into a one-hot or embedding vector
Process the sequence with GRU layers
Compute a softmax distribution from the last hidden state

Equations

GRU transition:

$\mathbf{h}_t = \text{GRU}(\mathbf{x}_t, \mathbf{h}_{t-1})$

Softmax scoring:

$\hat{y}_t = \text{softmax}(W_o \cdot \mathbf{h}_t + b_o)$
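
The scoring step can be illustrated with a small sketch. It assumes the hidden state $\mathbf{h}_t$ has already been produced by the GRU and only shows the output layer; `softmax` and `next_item_probs` are hypothetical helper names.

```python
import math

def softmax(logits):
    """Softmax with the maximum subtracted first for numerical stability."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def next_item_probs(W_o, b_o, h_t):
    """y_hat = softmax(W_o h_t + b_o); W_o has one row per item."""
    logits = [sum(w * h for w, h in zip(row, h_t)) + b
              for row, b in zip(W_o, b_o)]
    return softmax(logits)
```

For example, `next_item_probs([[1, 0], [0, 1], [1, 1]], [0, 0, 0], [1, 2])` produces logits 1, 2, and 3, so the third item receives the highest probability.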

Loss (ranking loss, e.g. TOP1 loss):

$L = \sum_{(s, i^+, i^-)} \left[ \sigma(\hat{y}_{i^-} - \hat{y}_{i^+}) + \sigma(\hat{y}_{i^-}^2) \right]$
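
To make the two terms of the TOP1 loss concrete, here is a minimal sketch for one positive score and a list of sampled negative scores. It sums over the negatives exactly as in the formula above; `sigmoid` and `top1_loss` are names assumed for this example.

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def top1_loss(y_pos, y_negs):
    """Sum over negatives of sigma(y_neg - y_pos) + sigma(y_neg ** 2)."""
    return sum(sigmoid(y - y_pos) + sigmoid(y * y) for y in y_negs)
```

The first term penalizes negatives that outscore the positive item; the second term acts as a regularizer that pushes the scores of negative items toward zero.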

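Training techniques

Parallel mini-batch training
- Sessions tend to be short, so naive batching can leave a lot of idle time
- Proposes a scheme that trains on several sessions arranged in parallel
Negative sampling strategy
- There are too many items to train on all of them → negative sampling
- Sampling biased toward popular items: a popular item the user did not interact with is assumed to be an item the user is not interested in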
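
The parallel mini-batch idea listed above can be sketched as a generator. This is a simplified illustration under assumptions, not the authors' implementation: each of the `batch_size` slots walks through one session, and when a session runs out the slot is handed to the next unused session and flagged so that its hidden state can be reset. The function name and the `(inputs, targets, reset)` output format are hypothetical.

```python
def session_parallel_batches(sessions, batch_size):
    """Yield (inputs, targets, reset) triples, one per time step.

    reset[k] is True when slot k has just started a new session, meaning
    the hidden state for that slot should be zeroed before this step.
    """
    # Sessions with fewer than two events give no (input, target) pair.
    sessions = [s for s in sessions if len(s) >= 2]
    if len(sessions) < batch_size:
        return
    slot_session = list(range(batch_size))  # session held by each slot
    slot_pos = [0] * batch_size             # position inside that session
    reset = [True] * batch_size
    next_session = batch_size
    while True:
        inputs = [sessions[s][p] for s, p in zip(slot_session, slot_pos)]
        targets = [sessions[s][p + 1] for s, p in zip(slot_session, slot_pos)]
        yield inputs, targets, reset
        reset = [False] * batch_size
        for k in range(batch_size):
            slot_pos[k] += 1
            # Session exhausted: give the slot to the next unused session.
            if slot_pos[k] + 1 >= len(sessions[slot_session[k]]):
                if next_session >= len(sessions):
                    # No session left to fill the slot; stop here, dropping
                    # whatever remains in the other slots.
                    return
                slot_session[k] = next_session
                slot_pos[k] = 0
                reset[k] = True
                next_session += 1
```

With `sessions = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]` and `batch_size=2`, the first step pairs inputs `[1, 4]` with targets `[2, 5]`; the second session then ends, so the second step uses `[2, 6]` and `[3, 7]` with the reset flag set only for the second slot.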

Neural Attentive Recommendation Machine (NARM)

Splits the user's preference into a global level (GRU) and a local level (GRU + attention)

Neural Attentive Session-based Recommendation

Model architecture

Global Encoder: Sequential Behavior Modeling

[Figure: NARM global encoder]

Compute each hidden state

$\mathbf{h}_t = \text{GRU}(\mathbf{x}_t, \mathbf{h}_{t-1})$

The representation of the whole session is defined as the last state

$\mathbf{c}_g = \mathbf{h}_T$

Local Encoder: Attention-based Purpose Modeling

[Figure: NARM local encoder]

Compute the attention score at each time step

$e_t = \mathbf{q}^\top \cdot \sigma(\mathbf{W}_1 \mathbf{h}_T + \mathbf{W}_2 \mathbf{h}_t + \mathbf{b})$

Normalization

$\alpha_t = \frac{\exp(e_t)}{\sum_{k=1}^{T} \exp(e_k)}$

Final local representation

$\mathbf{c}_l = \sum_{t=1}^{T} \alpha_t \mathbf{h}_t$
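
The three steps of the local encoder (score, normalize, weighted sum) can be traced in a small sketch. It takes the GRU hidden states as given and uses plain lists for vectors and matrices; `matvec`, `sigmoid`, and `local_encoder` are names assumed for this example.

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def matvec(W, v):
    """Matrix-vector product with W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def local_encoder(hidden_states, W1, W2, b, q):
    """Return (alphas, c_l) for hidden states h_1..h_T."""
    h_T = hidden_states[-1]
    a = matvec(W1, h_T)  # W1 h_T is shared by every time step
    scores = []
    for h_t in hidden_states:
        z = [sigmoid(x + y + c) for x, y, c in zip(a, matvec(W2, h_t), b)]
        scores.append(sum(qk * zk for qk, zk in zip(q, z)))  # e_t
    m = max(scores)
    exps = [math.exp(e - m) for e in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]  # softmax over time steps
    dim = len(h_T)
    c_l = [sum(al * h[d] for al, h in zip(alphas, hidden_states))
           for d in range(dim)]
    return alphas, c_l
```

When `q` is all zeros every score $e_t$ is zero, so the weights are uniform and $\mathbf{c}_l$ reduces to the mean of the hidden states, which is a convenient sanity check.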

Overall architecture

[Figure: NARM overall architecture]

SASRec

Self-Attentive Sequential Recommendation

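A sequential recommendation model built on the Transformer architecture.
Predicts the next item from the user's past click sequence.

Architecture
- Based on the Transformer encoder
- item embedding + position embedding → self-attention
Features
- Dropout applied (embedding layer)
- Shared input/output embeddings
- Negative sampling + binary cross-entropy loss
Advantages
- Unlike RNN-based models, can be processed in parallel
- Trains effectively even on long sequences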

Equations

$\text{Loss} = - \sum_{s^u \in \mathcal{S}} \sum_{t \in \{1, \dots, n\}} \left[ \log(\sigma(r_{o_t, t})) + \sum_{j \notin s^u} \log(1 - \sigma(r_{j, t})) \right]$

$r_{o_t, t}$ : score of the ground-truth item $o_t$ at position $t$
$\sigma$ : sigmoid function
$j \notin s^u$ : items used as negative samples
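
The loss for a single sequence can be sketched as below. This is a minimal sketch that assumes the scores have already been computed by the model; `log_sigmoid` and `sasrec_loss` are hypothetical names, and the identity $\log(1 - \sigma(r)) = \log \sigma(-r)$ is used to keep the computation stable.

```python
import math

def log_sigmoid(z):
    """ln sigma(z), computed without overflow for large |z|."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))

def sasrec_loss(pos_scores, neg_scores):
    """Binary cross-entropy over the positions of one sequence.

    pos_scores[t] is r_{o_t, t}; neg_scores[t] is a list of r_{j, t}
    for the negatives sampled at position t.
    """
    loss = 0.0
    for r_pos, r_negs in zip(pos_scores, neg_scores):
        loss -= log_sigmoid(r_pos)
        # log(1 - sigma(r)) equals log sigma(-r)
        loss -= sum(log_sigmoid(-r) for r in r_negs)
    return loss
```

Summing this quantity over all sequences $s^u$ gives the total loss in the formula above.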

→ Better than earlier MC-, RNN-, and CNN-based models in both parallelism and accuracy

BERT4Rec

Bidirectional Encoder Representations from Transformers for RecSys

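A model that applies the BERT architecture to recommender systems.
Uses both past and future context to improve sequence prediction accuracy.

Architecture
- Based on the Transformer encoder
- Trained with masked item prediction
Features
- Bidirectional self-attention
- Shared input/output embeddings
- Cross-entropy loss
Advantages
- Exploits bidirectional context
- Well suited to learning from information at varied positions in the sequence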
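
Masked item prediction can be illustrated with a short data-preparation sketch. It assumes item id `0` is reserved as the mask token and that each position is masked independently with probability `mask_prob`; both choices, along with the function names, are assumptions made for this example.

```python
import random

MASK = 0  # hypothetical id reserved for the [mask] token

def mask_sequence(items, mask_prob, rng):
    """Return (inputs, labels) for masked item prediction.

    Masked positions carry MASK in inputs and the original item in labels;
    unmasked positions carry None in labels and add nothing to the loss.
    """
    inputs, labels = [], []
    for item in items:
        if rng.random() < mask_prob:
            inputs.append(MASK)
            labels.append(item)
        else:
            inputs.append(item)
            labels.append(None)
    return inputs, labels

def append_mask_for_inference(items):
    """At prediction time, place a mask after the history and predict it."""
    return list(items) + [MASK]
```

For example, `mask_sequence([3, 7, 9, 4], 0.2, random.Random(0))` returns the sequence with some positions replaced by `MASK` and the hidden items kept in `labels` as prediction targets.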

S3-Rec

Self-Supervised Learning for Sequential Recommendation with Mutual Information Maximization

A model that combines the BERT4Rec architecture with self-supervised learning.
Uses side information and mutual information to learn stronger representations.

Architecture
- BERT-like encoder + 4 auxiliary losses
4 self-supervised tasks
- Associated Attribute Prediction
- Masked Item Prediction
- Masked Attribute Prediction
- Segment Prediction
Advantages
- Strengthens representation learning even without labels
- Improves data efficiency

CL4SRec

Contrastive Learning for Sequential Recommendation

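Applies contrastive learning to sequential recommendation.
Learns more general sequence representations through several augmentation techniques.

Architecture
- SASRec backbone + an added contrastive loss
Augmentation techniques
- item masking
- cropping
- reordering
Loss
- contrastive loss (pulls positive pairs together, pushes negatives apart)
Advantages
- Learns robust representations without labels
- Improves performance even with little data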
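
The three augmentations can be sketched on a plain list of item ids. This is a minimal sketch under assumptions: the `ratio` parameters, the mask id `0`, and the function names are chosen for illustration, and sequences are assumed to be non-empty.

```python
import random

def crop(seq, ratio, rng):
    """Keep a random contiguous window covering about `ratio` of the sequence."""
    n = max(1, int(len(seq) * ratio))
    start = rng.randint(0, len(seq) - n)
    return seq[start:start + n]

def mask(seq, ratio, rng, mask_id=0):
    """Replace about `ratio` of the items, chosen at random, with mask_id."""
    k = int(len(seq) * ratio)
    chosen = set(rng.sample(range(len(seq)), k))
    return [mask_id if i in chosen else x for i, x in enumerate(seq)]

def reorder(seq, ratio, rng):
    """Shuffle a random contiguous window covering about `ratio` of the sequence."""
    n = int(len(seq) * ratio)
    start = rng.randint(0, len(seq) - n)
    window = seq[start:start + n]
    rng.shuffle(window)
    return seq[:start] + window + seq[start + n:]
```

Applying two randomly chosen augmentations to the same sequence yields the two views that form a positive pair for the contrastive loss.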

Equations

Total loss:

$L_{total} = L_{main} + \lambda L_{cl}$

Main loss: a cross-entropy based loss for next-item prediction

$L_{main}(s_u, t) = -\log \frac{\exp(s_{u,t}^\top v_{t+1}^+)}{\exp(s_{u,t}^\top v_{t+1}^+) + \sum_{v_{t+1}^- \in V^-} \exp(s_{u,t}^\top v_{t+1}^-)}$

Contrastive loss: trained so that augmentations of the same example are similar to each other and distinguishable from other examples

$L_{cl}(s_u^{a_i}, s_u^{a_j}) = -\log \frac{\exp(\text{sim}(s_u^{a_i}, s_u^{a_j}))}{\exp(\text{sim}(s_u^{a_i}, s_u^{a_j})) + \sum_{s^- \in \mathcal{S}^-} \exp(\text{sim}(s_u^{a_i}, s^-))}$

$s_u^{a_i}, s_u^{a_j}$ : augmentations derived from the same example
$\text{sim}(\cdot)$ : a similarity based on cosine similarity or the inner product
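
The contrastive term can be computed as in the sketch below, which assumes cosine similarity for $\text{sim}(\cdot)$ and takes the sequence representations as plain lists. `cosine` and `contrastive_loss` are names chosen for this example, and the log-sum-exp form is used only for numerical stability.

```python
import math

def cosine(a, b):
    """Cosine similarity of two non-zero vectors given as lists."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def contrastive_loss(anchor, positive, negatives):
    """-log( exp(sim(a, p)) / (exp(sim(a, p)) + sum_n exp(sim(a, n))) )."""
    pos = cosine(anchor, positive)
    sims = [pos] + [cosine(anchor, n) for n in negatives]
    m = max(sims)
    log_denominator = m + math.log(sum(math.exp(s - m) for s in sims))
    return log_denominator - pos

def total_loss(l_main, l_cl, lam):
    """L_total = L_main + lambda * L_cl."""
    return l_main + lam * l_cl
```

The loss is smallest when the two augmented views point in the same direction and every negative points away from the anchor.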