1) I thought he was using LOESS because he wanted to be able to modify the smoothing factor over time to match empirical data taken from earlier elections.
Perhaps, but it results in an overly aggressive trend, especially with so few data points. It might work for a daily tracking poll like USC's, but not for a once-a-month poll.
2) I think Nate is worried that if he combines polls, given the differing methodology of each poll and the potential for an unaccounted-for systemic bias, he would be mixing data points in the trend line that are fundamentally different. Though I would like to see a version of the model where he used the other factors to try to weight the various pollsters and just went with "if the polls are systemically biased, the model blows up, and it happens". I mean, he's used LOESS before, and IIRC, original PECOTA basically does what I suggested, which is try to normalize the values as closely as it can and then use all of the points. So I guess I find it weird that he'd know to do that in 2003 but then not do it in this model, unless there was some legitimate reasoning behind it.
Then if that's the case, don't do it. You can't build a trend line from a pollster's two or three polls. You can't say that because something went from C+6 to C+3 there's a trend to build. That shows a complete misunderstanding of polling. You can do that with other events, maybe a sports event, but most certainly not with polling, because the state of the race could be unchanged (Hillary is up 4.5 during both polls) and you're just as likely to get C+6 as you are C+3. So you're building a trend out of nothing.
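A quick way to see this is to simulate it. The sketch below assumes a pure two-way race with no undecideds, simple random samples of 800 respondents, and a true margin fixed at C+4.5 for both polls. Those numbers and the function names are my own assumptions for illustration, not anything taken from the 538 model.

```python
import random
import statistics


def simulate_margin(true_margin_pts, n_respondents, rng):
    """Return one simulated poll margin in points (positive = Clinton lead)."""
    # In a two-way race, a lead of m points means a share of 50 + m/2 percent.
    p_clinton = 0.5 + true_margin_pts / 200.0
    clinton = sum(1 for _ in range(n_respondents) if rng.random() < p_clinton)
    share = clinton / n_respondents
    return (2.0 * share - 1.0) * 100.0


def spurious_shifts(true_margin_pts=4.5, n_respondents=800,
                    trials=2000, threshold_pts=3.0, seed=0):
    """Poll the same unchanged race twice, many times over.

    Returns (fraction of pairs that differ by at least threshold_pts,
             standard deviation of the poll-to-poll change).
    """
    rng = random.Random(seed)
    changes = []
    for _ in range(trials):
        first = simulate_margin(true_margin_pts, n_respondents, rng)
        second = simulate_margin(true_margin_pts, n_respondents, rng)
        changes.append(second - first)
    big = sum(1 for c in changes if abs(c) >= threshold_pts) / trials
    return big, statistics.pstdev(changes)


if __name__ == "__main__":
    frac, sd = spurious_shifts()
    print(f"pairs moving 3+ points with no real change: {frac:.0%}")
    print(f"std dev of poll-to-poll change: {sd:.1f} points")
```

At this sample size the change between two polls of an unchanged race has a theoretical standard deviation of about 5 points, so a 3-point move sits well inside the noise.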
Besides, he already attempts to take out the bias in the polls. So you can build the trend line after taking out the bias. Basically, adjust the polls based on pollster ratings and house effects, then build a trend line off those adjustments.
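Roughly what I mean, as a minimal sketch: subtract each pollster's house effect, pool everything, then fit one trend through the adjusted points. I've swapped in a plain least-squares line instead of LOESS to keep it short, and the pollster names, house-effect values, and poll numbers are made up purely for illustration.

```python
def pooled_trend(polls, house_effects):
    """Fit one straight line through house-effect-adjusted polls.

    polls: list of (day, pollster, margin_pts) tuples.
    house_effects: dict mapping pollster -> lean in points (positive means
        the pollster tends to read more pro-Clinton than average).
    Returns (slope in points per day, intercept in points).
    """
    days = [day for day, _, _ in polls]
    # Step 1: remove each pollster's known lean before doing anything else.
    adjusted = [margin - house_effects.get(pollster, 0.0)
                for _, pollster, margin in polls]

    # Step 2: ordinary least squares on the pooled, adjusted points.
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(adjusted) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    if sxx == 0:
        raise ValueError("need polls from at least two different days")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, adjusted))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


if __name__ == "__main__":
    # Hypothetical data: two pollsters with opposite leans, flat race at C+4.
    house = {"Pollster A": 2.0, "Pollster B": -2.0}
    polls = [(0, "Pollster A", 6.0), (10, "Pollster B", 2.0),
             (20, "Pollster A", 6.0), (30, "Pollster B", 2.0)]
    print(pooled_trend(polls, house))
```

In the made-up data the raw numbers bounce between C+6 and C+2, but once the leans are removed every poll says C+4 and the fitted slope is zero.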
Just think of the huge amount of error he's introducing into his model with the trend line. Even if pollster X had C+10 and then gets C+2 a month later, that does not necessarily represent a change. What if the C+10 was an outright outlier and the truth was always near C+2? Now you've just built a massive trend line indicating a massive swing toward Trump that does not exist, and you're going to look foolish. Of course, this is exactly what did happen a couple of months ago; I pointed it out and literally predicted the move back in the polls as a result (and was right).
But if Silver combined all the polls, and last month all the polls were around C+10 and now they are around C+2, then yeah...there's a fucking trend. But building one from the 2-4 total data points most pollsters have is an absolutely INSANE thing to do, mathematically speaking.
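To put rough numbers on that, here is a back-of-the-envelope sketch of how many standard deviations an 8-point move is for one pollster versus an average of ten. It assumes independent simple random samples of 800 each, a race near 50/50, and no house effects; the function names are mine.

```python
import math


def margin_sd(n_respondents, share=0.5):
    """Sampling standard deviation of a two-way margin, in points."""
    # margin = (2 * share - 1) * 100, so its sd is 200 * sd(share).
    return 200.0 * math.sqrt(share * (1.0 - share) / n_respondents)


def shift_z_score(observed_shift_pts, n_respondents, n_pollsters=1):
    """How many standard deviations a before/after shift is from zero.

    The change between two independent polls has sd sqrt(2) * margin_sd;
    averaging that change over n_pollsters divides it by sqrt(n_pollsters).
    """
    sd_change = margin_sd(n_respondents) * math.sqrt(2.0 / n_pollsters)
    return observed_shift_pts / sd_change


if __name__ == "__main__":
    print(f"one pollster:  z = {shift_z_score(8.0, 800):.1f}")
    print(f"ten pollsters: z = {shift_z_score(8.0, 800, n_pollsters=10):.1f}")
```

Under those assumptions a single pollster going from C+10 to C+2 is only about 1.6 standard deviations, which is not that strange, while the same move in a ten-pollster average is about 5, which basically can't be noise.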
The other issue with the trend lines is how aggressive they are, which makes no sense in itself. They are almost always going to overstate what is going on. The trend line is trying to predict future polling, but that also makes no sense, because politics is polarized and voters don't move that much; only voter response rates do (this is not a universal statement, merely a statement about current politics).
Oddly enough, his model basically nailed the GOP primary perfectly - it was his qualitative commenting (aka pundit-ing) that messed it up. So I get why he's just letting the model speak for itself this time instead of trying to do the same kind of qualitative punditry he did during the GOP primary.
I don't understand this argument. For one, that model is different from the GE model (different inputs). For another, what was there to nail? Trump led the polling, for the most part, since about December. Every model that uses polling nailed it...because all the polls said so.