There are some improvements can still be made, e.g., better estimation of the value of remaining board, and faster stopping criteria for word searching, but the overall structure is pretty much set since you want the best solution. Switching to C++ and using some rudimentary data types (strings and arrays) should speed things up quite a bit (10x?). Another thing is to try make it parallel. That could be fun.
I think a better estimation of the value of the remaining board would be a wise approach. Basically scan the letters in the board and determine the longest possible word/words that can be built and compare that to the current best score.
Parallellizing the algorithm should improve speed and be fairly straightforward.
I'm not that interested in switching to C++. I've been coding in C++ professionally for a decade now, this exercise was primarily to force myself to use a new language by tackling a somewhat tricky problem. I'll probably try implementing it in D though.