We use this at https://languageroadmap.com, but to rank titles by difficulty for language learning through media. It seems to work pretty well, and we caveat each ranking with its confidence score.

As for the grandparent comment: recency bias, as another commenter pointed out, is a thing, as is the tedium of making a bunch of pairwise decisions. I think a happier medium is to have everyone fill in tier charts (with a variable number of tiers) and build the pairwise rankings from those.
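
To make the tier chart idea concrete, here is a rough sketch of how one could expand tier charts into pairwise results. It is not how any particular site does it. The function names, the input shape (a list of tiers, top tier first) and the rule that items sharing a tier produce no comparison are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def tier_chart_to_pairs(tiers):
    """Expand one tier chart into (higher, lower) pairs.

    `tiers` is a list of tiers ordered top tier first; each tier is a
    list of items. The number of tiers can differ per person.
    Assumption: items in the same tier yield no comparison.
    """
    pairs = []
    # combinations() keeps input order, so `upper` always precedes `lower`.
    for upper, lower in combinations(tiers, 2):
        for higher in upper:
            for lower_item in lower:
                pairs.append((higher, lower_item))
    return pairs


def aggregate_pairs(charts):
    """Count, across all charts, how often each item was placed above another."""
    counts = Counter()
    for tiers in charts:
        counts.update(tier_chart_to_pairs(tiers))
    return counts


# Two hypothetical users with different numbers of tiers.
charts = [
    [["A"], ["B", "C"]],
    [["A", "B"], ["C"], ["D"]],
]
counts = aggregate_pairs(charts)
```

The resulting counts can then be fed into whatever pairwise ranking model is already in use. One chart with n items can yield up to n(n-1)/2 comparisons from a single sitting, which is where the savings in tedium come from.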