If you're going to consider this possibility, you can't look at the top 10k. The top 100 would be considerably more relevant. Tier lists revolve around a character's theoretical limits at the highest level of play. They're not relevant in the Platinum range or even Diamond level.
Edit: Oh wait, I didn't read above and missed this post:
http://www.neogaf.com/forum/showpost.php?p=232030742&postcount=8402
So, basically, we can't really derive much from these about actual character strength, as measured by competitive performance at tournaments, right now.
The top 100 would probably be a better estimate, except that there is much less data. Here are the same results, but using the top 100 matchups, with the margin of error at the 95% confidence level added:
Code:
Rank Char Wins Matches Win% (+/- margin of error, 95% confidence)
1 balrog 1700 2976 57.1 +/- 1.8
2 mbison 1634 2982 54.8 +/- 1.8
3 necalli 830 1529 54.3 +/- 2.5
4 zangief 931 1736 53.6 +/- 2.3
5 laura 1503 2863 52.5 +/- 1.8
6 ibuki 1681 3273 51.4 +/- 1.7
7 cammy 1157 2262 51.1 +/- 2.1
8 urien 1011 1988 50.9 +/- 2.2
9 nash 453 891 50.8 +/- 3.3
10 rashid 1328 2620 50.7 +/- 1.9
11 guile 735 1466 50.1 +/- 2.6
12 karin 876 1751 50.0 +/- 2.3
13 birdie 1002 2015 49.7 +/- 2.2
14 dhalsim 884 1830 48.3 +/- 2.3
15 akuma 1260 2617 48.1 +/- 1.9
16 chunli 321 691 46.5 +/- 3.7
17 ken 576 1255 45.9 +/- 2.8
18 rmika 531 1162 45.7 +/- 2.9
19 fang 546 1195 45.7 +/- 2.8
20 juri 336 737 45.6 +/- 3.6
21 ryu 739 1678 44.0 +/- 2.4
22 alex 501 1184 42.3 +/- 2.8
23 vega 667 1607 41.5 +/- 2.4
24 kolin 55 206 26.7 +/- 6.0
For comparison, the margin of error was about 0.2-0.4 percentage points for the previous results. The ranking mostly agrees with the previous results, though with some notable changes that cannot be explained by the uncertainty alone (e.g. Dhalsim).
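For anyone who wants to check the margins in the table, here is a minimal sketch, assuming the usual normal approximation for a proportion (z = 1.96 for 95% confidence). The function name `margin_of_error` is just something I made up for illustration; the wins and match counts are taken from the table.

```python
import math

def margin_of_error(wins, matches, z=1.96):
    """Margin of error in percentage points for an observed win rate.

    Assumes the normal (Wald) approximation for a binomial proportion;
    z=1.96 corresponds to a 95% confidence level.
    """
    p = wins / matches                      # observed win rate
    standard_error = math.sqrt(p * (1 - p) / matches)
    return 100 * z * standard_error

# Rows taken from the table: (character, wins, matches)
for char, wins, matches in [("balrog", 1700, 2976), ("nash", 453, 891), ("kolin", 55, 206)]:
    win_pct = 100 * wins / matches
    print(f"{char}: {win_pct:.1f} +/- {margin_of_error(wins, matches):.1f}")
```

The margin shrinks with the square root of the number of matches, which is why Kolin (206 matches) ends up with a much wider interval than Balrog (2976 matches).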
However, on top of the increased uncertainty, I think that these results are much more likely to be skewed by individual performance. If only a few people play any given character, then their individual play-style and their individual performance will have a much bigger effect on the overall observed results.
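To illustrate the skew, here is a rough sketch with completely made-up numbers (not from the data above): one hypothetical specialist who plays most of a character's matches pulls the pooled win rate well away from what the other players achieve. The helper `pooled_win_rate` is a name I invented for this example.

```python
def pooled_win_rate(players):
    """Match-weighted win rate over a list of (matches, win_rate) tuples."""
    total_matches = sum(matches for matches, _ in players)
    total_wins = sum(matches * rate for matches, rate in players)
    return total_wins / total_matches

# Hypothetical numbers for illustration only:
# four players with 100 matches each at a 45% win rate...
others = [(100, 0.45)] * 4
# ...plus one specialist with 600 matches at a 60% win rate.
specialist = [(600, 0.60)]

print(f"without specialist: {100 * pooled_win_rate(others):.1f}%")
print(f"with specialist:    {100 * pooled_win_rate(others + specialist):.1f}%")
```

With these assumed numbers, a single player accounts for most of the matches, so the character's pooled win rate mostly reflects that player rather than the character.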
I mean, numbers don't lie. These are the actual stats for SFV online. But again, we usually talk about high level play when discussing tier lists.
The problem is, I think, that we base our estimation of (high-level) tiers on the results of relatively few players. Therefore their individual strengths and weaknesses can skew our perception of the strengths of a given character, and thereby their tier. When somebody is considered the "best" player of a character, then that influences how people perceive that character.
I'm sorta interested in approximating the "true" tier of these characters (though that is of course just a hypothetical), which is why I think that we need more data to wash out individual performance. But of course, as you allude to, including more data also widens the range of skill levels considered.
Your point about match-ups being three-dimensional is well taken, and I am personally happy that I don't have to balance fighting games.