It really is. If they go the way of Hearthstone in terms of in-game viewing options (i.e. effectively none) it's going to be an uphill climb for Overwatch to become a big spectator esport.
The problem goes past spectating options; it's in the nature of the game. I will +1 everything in Milk's post.
Past that, consider the genres the game is based on: MOBA and FPS. To appreciate what's going on in a MOBA you need an encompassing view of the action. A spectator needs to be able to understand what's going on in the scheme of the game, but also be able to appreciate how individuals are moving within it. Luckily this is the same view the player sees, so you get to view all the mechanical skill from the same place.
Watching FPS games is harder in general: you need that first-person view to really see the skill of the player, and that more than anything is what makes watching FPS compelling. It's great to see team play and strategy come together, but seeing those crazy shots and fast aim is something normal players don't get to experience on their own.
Putting both of these things together makes Overwatch an interesting concept, but how can you begin to watch it effectively? An overhead view allows you to see where the teams are and what they're doing, but you lose the chance to see the technical prowess of the players. Watching this game in first person while cutting between players, rather than staying on a single camera, is far too hectic because of the pacing combined with an immense amount of visual noise.
Resolving this in an effective manner will be very challenging, and considering this will be coming from the same team that has denied the game a minimap as a "design decision", I have no hope whatsoever.
EDIT: responding
What you just said I've been saying about Counter-Strike since it blew up. The difference between CS and OW to me as a spectator is that OW still has a sense of "big plays" happening the way they do in a MOBA. CS is a lot of positioning and "person X killed person Y", and hopefully the camera was spectating it, but maybe it wasn't. In Overwatch when shit goes down shit actually goes down.
Responding point by point.
1) The visuals of CS and OW are night and day when you're trying to see what's going on.
2) No rest periods in CS? In between every round? Couple minute break between halves? Each team tactical pausing once per game between rounds?
3) The pace of Counter-Strike allows casters and camera to anticipate high-impact moments much more easily than in Overwatch. Some will of course be missed, and maps are much less railroaded, which makes things a little harder, but the pace really makes things easier to follow.
4) I'm assuming that by "shit going down" you mean a bunch of ultimates going off. That's exciting and all, but watching it from a first-person view gets really messy, and without a top-down view or a minimap you don't get a real appreciation of the team's position or tactical approach. Shit goes down in CS as well: teams tend to use utility grenades, flash/smoke, and then explode onto the site. You can make a similar argument about the appreciation of positioning, but things don't move as fast and you get a full-screen map and minimap which can be watched to see how people are going in. The setup normally takes much longer as well.