Break the Lock-In: Carry Your Preferences Anywhere


When you start using a new video or music service, the first recommendations often miss the mark, and it can be surprisingly hard to settle in. This is the cold start problem. Traditionally, we treat it as something the provider should solve, while the user can only “keep using it and wait until the system learns.”

My paper “Solving the Cold Start Problem on One’s Own as an End User via Preference Transfer” flips that perspective: even if the service does nothing, the user can solve the problem on their own. I do this without asking the target service to add any special import feature, relying only on an action any user can take anyway: giving a small number of items a positive or negative rating. From that tiny set of ratings, I show how to break through the cold start proactively.

The proposed method, Pretender (PREference Transfer by END usERs), is the algorithm that makes this possible. Intuitively, it “transplants” your preferences from a source service you have used for a long time (for example, Netflix) to a target service you just joined (for example, Hulu). Crucially, this is not about finding exactly the same items and rating them again. Items rarely match across services. Instead, I transfer preferences by matching distributions in a feature space.

Intuitive derivation of the algorithm

Suppose that on a movie site A (the source), your long-term history looks like this: you watched action movies 60% of the time, science fiction movies 30% of the time, and romance movies 10% of the time. Now you start using site B (the target), and you want B to recommend as well as A does.

You do not know what B’s recommendation model looks like, but in most systems, the model learns from the pattern of items you rate. So if the set of items you rate on B has the same overall “shape” as your history on A, then B should be able to recognize your taste quickly. In other words, if your ratings on B induce a distribution similar to your labeled history on A, the target model has a much better chance to learn what you like.

The obstacles are obvious. Items do not necessarily overlap between A and B, so you cannot simply copy your history. Also, clicking hundreds of items is tedious. Even if a web agent automates the clicks, too many interactions take time and create unnecessary load. Pretender addresses this by selecting only a small number of items (say 20) from the target service so that the average behavior of those few items matches the average behavior of your much larger source history as closely as possible.

Inputs and outputs

Let me restate what the user knows and what the user wants to achieve.

As input, the user has a set of items they rated on the source service, the feature vectors of those items, and binary feedback for each item (like or dislike). The target service also has its own set of items with feature vectors. The key constraint is that the user can only rate a small number of target items, denoted by \(K\). This is exactly what makes the problem meaningful. If \(K\) were unlimited, preference transfer would be trivial.
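
To make this concrete, here is a minimal sketch of the problem data as I would represent it in code; the container and field names are mine, not from the paper.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class TransferInstance:
    """Hypothetical container for the inputs to preference transfer."""
    source_features: np.ndarray  # shape (n, d): features of items rated on the source
    source_labels: np.ndarray    # shape (n,): 1 = like, 0 = dislike
    target_features: np.ndarray  # shape (m, d): features of all target-service items
    budget: int                  # K: how many target items the user is willing to rate

# The desired output is an instruction of length K, e.g.
# [(target_item_index, 1), (target_item_index, 0), ...],
# telling the user (or a web agent) which items to rate and how.
```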

As output, Pretender produces a concrete instruction: which \(K\) target items you should rate, and whether you should give each one a positive or negative rating. A human can follow the instruction manually, or a web agent can execute it automatically.

What makes it hard

This problem has three unpleasant aspects.

First, the source and target services do not share items, so “copying the history” does not work.

Second, \(K\) must be small. This is not only about convenience. Even with automation, large \(K\) costs time and can stress the service.

Third, the target service is a black box. You do not know its model, its loss function, or its training procedure, so it is hard to predict how your injected ratings will affect recommendations.

Pretender is designed to sidestep the third issue entirely: it works without any knowledge of the target model.

The core idea: match distributions

Pretender treats your labeled source history as an empirical distribution over labeled feature points, and it tries to construct a similar empirical distribution on the target side.

I denote the source distribution by \(\mu_S\) and the target distribution that I will create by \(\mu_T\). Here “distribution” means a discrete distribution over pairs \((x, y)\), where \(x\) is the item feature vector and \(y\) is the label (like or dislike). The goal becomes simple to state: make \(\mu_T\) close to \(\mu_S\) under a suitable distance.
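
Concretely, if the source history consists of \(n\) labeled items \((x_i^S, y_i^S)\) and I place weights \(w_j\) on labeled target candidates \((x_j^T, y_j^T)\), the two distributions read (my notation for what the text describes):

\[
\mu_S = \frac{1}{n} \sum_{i=1}^{n} \delta_{(x_i^S, y_i^S)}, \qquad \mu_T = \sum_{j} w_j \, \delta_{(x_j^T, y_j^T)},
\]

where \(\delta\) denotes a point mass and, in the final discrete plan, every \(w_j\) is either \(0\) or \(1/K\).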

I measure this discrepancy using an integral probability metric, and I focus on two concrete instantiations: maximum mean discrepancy (MMD) and the 1-Wasserstein distance.
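
For reference, an integral probability metric measures the worst-case disagreement in expectation over a class of test functions \(\mathcal{F}\):

\[
d_{\mathcal{F}}(\mu_S, \mu_T) = \sup_{f \in \mathcal{F}} \left| \mathbb{E}_{\mu_S}[f] - \mathbb{E}_{\mu_T}[f] \right|.
\]

Taking \(\mathcal{F}\) to be the unit ball of a reproducing kernel Hilbert space yields MMD; taking the set of 1-Lipschitz functions yields the 1-Wasserstein distance.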

Why does this formulation matter? Imagine that the target service trains some unknown model using some unknown loss, but the loss is not wildly sensitive to small changes in \((x, y)\) (for example, it satisfies a Lipschitz condition). Then, if a model achieves low training loss on \(\mu_T\), it cannot be arbitrarily bad on \(\mu_S\) as long as \(\mu_T\) is close to \(\mu_S\) in the corresponding integral probability metric. This is the key move: I do not need to know what the target model is. I only need a smoothness property of the loss, and I can build an end-user guarantee around distribution matching.
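
A one-line version of the argument: if, for every model \(f\) the target service might train, the map \((x, y) \mapsto \ell(f; x, y)\) lies in \(C \cdot \mathcal{F}\) (for instance, the loss is \(C\)-Lipschitz in the Wasserstein case), then

\[
\left| \mathbb{E}_{\mu_S}[\ell(f; x, y)] - \mathbb{E}_{\mu_T}[\ell(f; x, y)] \right| \le C \, d_{\mathcal{F}}(\mu_S, \mu_T),
\]

so a model that fits the injected ratings \(\mu_T\) well is also reasonably good on your true preferences \(\mu_S\), up to \(C\) times the distribution distance.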

This is also why this paper is intentionally “agnostic” to the provider. The only thing the user can control is which target items they rate and how they label them. Pretender pushes the entire problem into that control knob.

Algorithm description

Constructing \(\mu_T\) is a combinatorial optimization problem. The user cannot assign “0.2 of a like” to an item, and cannot edit item features. The user only chooses whether to rate an item, and whether that rating is positive or negative. Pretender bridges the gap by solving a continuous relaxation first, then rounding back to a feasible discrete action plan.

The flow is as follows.

First, for every target item, I create two labeled candidates: the candidate “rate this item positively” and the candidate “rate this item negatively.” In other words, from the same feature vector \(x\), I build both \((x, 1)\) and \((x, 0)\). This gives me a pool of labeled candidate points.
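
In code, this doubling step is tiny; here is a sketch with numpy (variable names are mine):

```python
import numpy as np

def build_candidates(target_features: np.ndarray):
    """From m target items, build 2m labeled candidates: (x, 1) and (x, 0) per item."""
    m = target_features.shape[0]
    item_of_candidate = np.tile(np.arange(m), 2)  # candidate j refers to this item
    label_of_candidate = np.repeat([1, 0], m)     # first m are "like", last m "dislike"
    return item_of_candidate, label_of_candidate
```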

Second, I solve a continuous optimization problem that assigns a weight \(w\) to each labeled candidate. The weights form a probability distribution, so \(\sum_j w_j = 1\). I also enforce an upper bound \(w_j \le 1/K\). This constraint prevents the continuous solution from concentrating too much mass on a few candidates, which would later break the rounding step.
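
In symbols, the relaxation is the following problem (my rendering of the constraints just described):

\[
\min_{w} \; d\!\left(\mu_S, \sum_j w_j \, \delta_{(x_j^T, y_j^T)}\right) \quad \text{subject to} \quad \sum_j w_j = 1, \quad 0 \le w_j \le \frac{1}{K}.
\]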

Third, I perform randomized rounding. For each candidate \(j\), I include it independently with probability \(K w_j\). If the candidate is included, it gets weight \(1/K\); if it is excluded, it gets weight 0. The key property is that the expected rounded weight equals the original weight. In expectation, this preserves the continuous distribution I optimized for.
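
A minimal sketch of this rounding step, assuming \(w\) is the optimized weight vector with \(w_j \le 1/K\):

```python
import numpy as np

def randomized_round(w: np.ndarray, K: int, rng=np.random.default_rng()):
    """Include candidate j independently with probability K * w[j].

    Included candidates get weight 1/K, so the expected rounded weight
    equals w[j], preserving the continuous solution in expectation.
    """
    selected = rng.random(w.shape[0]) < K * w
    return np.flatnonzero(selected)  # indices of the candidates to actually rate
```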

Finally, the number of selected items might not be exactly \(K\), so I postprocess: if I selected fewer than \(K\), I add a few items greedily; if I selected more than \(K\), I remove a few items greedily. This introduces extra error, but the design keeps that error controlled.
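
One way to picture the postprocessing, with `distance(indices)` standing in for the chosen distribution distance; this is a sketch of the idea, not the paper's exact procedure:

```python
def fix_size(selected: list, pool: list, K: int, distance) -> list:
    """Greedily add or remove candidates until exactly K are selected."""
    selected = list(selected)
    while len(selected) < K:   # add the candidate whose inclusion helps the most
        best = min((c for c in pool if c not in selected),
                   key=lambda c: distance(selected + [c]))
        selected.append(best)
    while len(selected) > K:   # drop the candidate whose removal hurts the least
        worst = min(selected,
                    key=lambda c: distance([s for s in selected if s != c]))
        selected.remove(worst)
    return selected
```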

A good way to view Pretender is as a bridge between two worlds: I use continuous optimization to shape a target distribution, and I use probabilistic rounding to convert that solution into the discrete clicks a user can actually perform.

Why the maximum mean discrepancy version is easy to optimize

When I use maximum mean discrepancy as the distance, the objective can be written as a quadratic form involving kernel matrices. As a result, the continuous relaxation becomes a convex quadratic program. Because the problem is convex, I do not worry about local minima, and a standard method such as the Frank-Wolfe algorithm works cleanly.
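
Here is a minimal Frank-Wolfe sketch for the MMD case. I assume an RBF kernel on the concatenation of features and label, which is my choice for illustration, not necessarily the paper's:

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """RBF kernel matrix between the rows of A and the rows of B."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq_dists)

def frank_wolfe_mmd(src_feats, src_labels, cand_feats, cand_labels, K, steps=200):
    """Minimize MMD^2 between the source history and the weighted candidate pool
    over {w : sum(w) = 1, 0 <= w <= 1/K} with Frank-Wolfe."""
    S = np.hstack([src_feats, src_labels[:, None]])    # labeled source points
    T = np.hstack([cand_feats, cand_labels[:, None]])  # labeled candidates
    K_TT = rbf_kernel(T, T)
    c = rbf_kernel(T, S).mean(axis=1)                  # cross term vs. uniform source
    # MMD^2(w) = w^T K_TT w - 2 c^T w + const, a convex quadratic in w.
    w = np.full(len(T), 1.0 / len(T))                  # feasible start (needs len(T) >= K)
    for t in range(steps):
        grad = 2.0 * (K_TT @ w - c)
        s = np.zeros_like(w)                           # linear minimization oracle:
        s[np.argsort(grad)[:K]] = 1.0 / K              # mass 1/K on K smallest-gradient coords
        w += (2.0 / (t + 2)) * (s - w)                 # standard Frank-Wolfe step size
    return w
```

A nice side effect of the capped-simplex constraint is that the linear minimization oracle is trivial: it spreads mass \(1/K\) over the \(K\) coordinates with the smallest gradient, and every iterate stays feasible as a convex combination of feasible points.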

In the analysis, I separate three sources of error: optimization error in the continuous solver, error from randomized rounding, and error from forcing the final set size to be exactly \(K\). I then combine them to show that the final distribution distance is only slightly worse than the best possible \(K\)-item choice, and that this gap shrinks as \(K\) grows.
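
Schematically, the resulting bound has the shape (my paraphrase, not the paper's exact statement):

\[
d(\mu_S, \hat{\mu}_T) \le d(\mu_S, \mu_T^\star) + \varepsilon_{\mathrm{opt}} + \varepsilon_{\mathrm{round}} + \varepsilon_{\mathrm{size}},
\]

where \(\mu_T^\star\) is the best exactly-\(K\)-item choice, \(\varepsilon_{\mathrm{opt}}\) shrinks as the solver runs longer, and the rounding and size-correction terms shrink as \(K\) grows.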

What the experiments show

In experiments, I create two “virtual services” from real datasets such as MovieLens, Last.fm, and Amazon reviews. This simulates a user who has ratings on the source but not on the target. I evaluate success by measuring the distance between the source distribution and the target distribution that Pretender constructs. Smaller distance means better transfer.

Two empirical points are especially important.

First, Pretender outperforms a naive nearest-neighbor greedy baseline. Greedy selection can look appealing because it chooses items close to the source history, but it often collapses diversity and fails to match the full distribution. Distribution matching requires balancing similarity and coverage, not just picking the closest points.

Second, increasing \(K\) does not necessarily improve the best achievable distance. In a combinatorial setting where you must pick exactly \(K\) items, forcing \(K\) to be larger can make the optimum worse, because you are compelled to include extra items that do not fit the source distribution. In practice, this suggests a natural operational trick: try several values \(K' \in \{1, 2, 4, 8, \ldots\}\) in parallel and keep the one that yields the smallest distance.

Discussion

To me, the most important aspect of this work is that it gives users an algorithmic way to counter platform lock-in.

Services like Netflix or Hulu have an incentive to keep users inside their own ecosystem. Once you build up a preference history, switching becomes painful because the new service starts cold. The cold start problem acts as a friction that discourages multi-homing and migration. Pretender can neutralize this lock-in effect by letting the user carry over preference information without any official import function.

If such user-side transfer becomes common, users can switch services more casually, and competition between platforms can become healthier. This is a rare case where a user-side algorithm has the potential to reshape incentives.

At the same time, Pretender does not have to be antagonistic. A target service loses the most when new users churn because they cannot find anything they like. Pretender users effectively pay their own compute and effort to teach the service what they like. From the provider’s perspective, that is almost too convenient. Pretender users might even be high-retention users by construction. In the future, it could be rational for a service to offer a Pretender-friendly interface, for example, a page that makes it easy to batch-like or batch-dislike selected items. If one service blocks such behavior while a competitor welcomes it, users may drift toward the welcoming service. Over time, allowing external preference injection could become a stable competitive outcome.

Of course, the ideal world is that platforms collaborate on a clean preference-transfer standard. In reality, cooperation between platforms is slow and politically difficult. Pretender gives a pragmatic alternative: the user can act unilaterally. If that option spreads, it can also pressure platforms by making “refusal to cooperate” less effective as a lock-in strategy.

There is another angle that connects to recent web agents. You can imagine a large language model based agent that navigates the interface and clicks buttons for you, but simply telling an agent “click things I will like” invites hallucination and bias. Pretender provides a principled strategy: it explicitly minimizes a distribution distance and comes with guarantees. A useful division of labor emerges. A vision-language model agent can act as the “body” that understands the user interface and executes clicks, while Pretender acts as the “brain” that decides what to click. This separation is a path toward agents that feel both effective and trustworthy.

Summary

Pretender reframes the cold start problem from “a provider-side problem” into “a user-side problem,” and it provides a concrete optimization method and theoretical guarantees for solving it with only \(K\) binary ratings on the target service.

By enabling users to transfer preferences effectively, Pretender can expand user choice, reduce lock-in, and potentially encourage healthier competition between platforms. I see it as one example of user empowerment in modern digital ecosystems, and as a step toward principled, agent-assisted interaction with opaque online services.

Author Profile

If you found this article useful or interesting, I would be delighted if you could share your thoughts on social media.

New posts are announced on @joisino_en (Twitter), so please be sure to follow!


Ryoma Sato

Currently an Assistant Professor at the National Institute of Informatics, Japan.

Research Interest: Machine Learning and Data Mining.

Ph.D. (Kyoto University).


