Reverse Maximum Inner Product Search

Summary of the paper “Reverse Maximum Inner Product Search: How to efficiently find users who would like to buy my item?” of Recsys 2021

TL;DL;

I modified proofs and procedures for more clear self-understanding.

Paper Link

https://arxiv.org/abs/2110.07131

Notations

$u_i$ in $Q$ (User vectors)
$p_j$ in $P$ (Item vectors)
$<a, b>$: dot product between $a$ and $b$.

Maximum Inner Product Search (MIPS)

Given a user $u_i \in Q$, Find $\text{Topk}(u_i)$ = $\text{argtopk}_{p_j \in P} <u_i, p_j>$

Reverse Inner Product Search (R-MIPS)

Given an item $q \in P$, Find a set of users $u_i$ such that $q \in \text{TopK}(u_i)$

Main claim

with simple preprocessing, those three questions can be answered in constant time:

Given query item $q$ is included in $\text{Topk}(u_i)$ of the user $u_i$
Given query item $q$ is not included in $\text{Topk}(u_i)$ of the user $u_i$
Given query item $q$ is not included in $\text{Topk}(u_i)$ of all users $u_i$ of some (not any) set of users $u_i$, or block $B$.

Constructing Block $B$

Perform Descending Sort $Q, P$ according to their L2 norm. i.e., $||u_i|| \geq ||u_j||$ if $i < j$.
appropriately partition user vectors $Q$. e.g., $Q = [u_1, u_2, …,u_6]$, then $B_1 = [u_1,u_2, u_3]$, and $B_2 = [u_4, u_5, u_6]$
define $L_i$ (it is easier to write in python here)

L_i = np.array(sorted(dot(u_i, P[:50, :]), ascending=False))

L(B) = np.min([L_i for u_i in B], axis=0)

$L_i$ is sorted values of dot products between user $u_i$ and item vectors with top-$k$ norms.

Claim 1. Given $u_i, q$, if $<u_i, q> \leq L_i[k]$, then $q \notin \text{TopK}(u_i)$

proof: $L_i[k]$ is dot product between $u_i$ and item $p$ such that $p$ is in top-$K$ ranking in norm. Thus $p$ cannot be higher than rank $k$ and dot product with $q$ is lower than dot product with $p$. Thus $q$ cannot be in $\text{TopK}(u_i)$

Claim 2. Given $u_i, q$, if $<u_i, q> \geq ||u_i|| || p_k||$, then $q \in \text{TopK}(u_i)$

proof: Let $p$ to be a true top-$k$ item. $||u_i|| || p|| \geq ||u_i|| || p_k|| \geq <u_i, q>$ holds. Thus $q$ must be in top-$k$ ranking

Claim 3. Given $q$, a block $B$ and $u_i$ is a first vector in $B$, $||u_i|| ||q|| \leq L(B)[k]$, then $q \notin \text{TopK}(u_i)$ for all $u_i \in B$

proof: $\max_{u_i \in B} <u_i, q> \leq ||u_i|| ||q|| \leq L(B)[k]$. Then by Claim 1, it holds

Procedure

	given item query vector q
	ret = {}
	for B in Blocks:
		if we can skip block B using Claim 3:
			continue;
		for u in B:
			if we can skip u using Claim 1:
				continue
			if u, q satisfy Claim 2:
				ret.add(u)
			else:
				let TopK(u) using exhaustie search;
				if q in TopK(u)
				ret.add(u)
	return ret

Note:

We can parallelize easily along with Blocks.
Worst Case bound is equal to Exhaustive Search
No theoretical bound is given

느낀점:

생각해보면 쓰이는 수학/프로그래밍 테크 기술이 난이도가 고등학교때 기하와 벡터 배웠을 때 딱 그 정도만 쓰는 것 같은데 아이디어가 진짜 좋은 것 같다.
개선 여지가 많은 것 같다. 특히, Block Construction 부분에서, 블록 내부의 벡터들의 순서나, 블록 사이의 관계 측면에서 뭔가 개선할 여지가 있을 것 같은데 하는 생각. 아이디어를 일부로 약간만 풀고 세부적인 테크닉들은 공개 안 한 것 같다. 저자한테 메일을 보내봤는데 관심 가져줘서 고맙고 말 할 수 없는 것들은 말 할 수 없다는 상투적인 얘기를 들었음.