Bucket features using text analysis (or by hand, if the engineers didn't care much about producing digestible analytics logs), then fit a logistic regression of each customer's outcome against their feature usage to get a very basic model. At this stage, I've found that models are better at predicting failure than success.
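A minimal, dependency-free sketch of this first step. It assumes events arrive as plain string names and uses a hypothetical keyword table (`BUCKET_KEYWORDS`) as a stand-in for the text analysis; all function names and the sample logs are illustrative. Logistic regression is fit by plain batch gradient descent here only to stay self-contained. A library implementation such as scikit-learn's `LogisticRegression` would be the usual choice in practice.

```python
import math
from collections import Counter

# Hypothetical keyword -> bucket mapping. In practice this would come from
# text analysis of event names, or be written by hand.
BUCKET_KEYWORDS = {
    "export": "export",
    "share": "collaboration",
    "comment": "collaboration",
    "filter": "query",
    "search": "query",
}
BUCKETS = sorted(set(BUCKET_KEYWORDS.values()) | {"other"})


def bucket_event(event_name):
    """Map a raw analytics event name to a feature bucket by substring match."""
    name = event_name.lower()
    for keyword, bucket in BUCKET_KEYWORDS.items():
        if keyword in name:
            return bucket
    return "other"


def featurize(events):
    """Count how often each bucket appears in one customer's event log."""
    counts = Counter(bucket_event(e) for e in events)
    return [float(counts[b]) for b in BUCKETS]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on cross-entropy loss."""
    n = len(X)
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this customer: p(success) - actual outcome.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def rank_buckets(w):
    """Buckets sorted from most positive to most negative coefficient."""
    return sorted(zip(BUCKETS, w), key=lambda pair: pair[1], reverse=True)


# Hypothetical event logs, one per customer, with 1 = reached success.
logs = [
    ["export_csv", "search_bar"],
    ["export_pdf", "filter_apply"],
    ["search_bar", "filter_apply"],
    ["comment_add", "search_bar"],
]
succeeded = [1, 1, 0, 0]
w, b = fit_logistic([featurize(log) for log in logs], succeeded)
print(rank_buckets(w))
```

On this toy data `export` ends up with a positive coefficient and `collaboration` with a negative one; those signed coefficients are what you would rank to pick the most significant buckets for the next step. Raw counts are unscaled, so magnitudes are only a rough guide.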
Take the most significant negative and positive buckets, along with the most heavily used ones, and run time series analysis on them (for example, measure the time between feature uses within a workflow and find the averages for people who reach "success"). Look for actions such as undo, which indicate failure, and for save actions, which indicate a likelihood of intentional exploration. These signals can then be used as weighting factors for a given user session as you build a more refined model.
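One way the session weighting could look, as a rough sketch. The session format (a list of `(seconds, event_name)` tuples), the literal `"undo"` and `"save"` event names, and the penalty and bonus coefficients are all assumptions for illustration, not a prescribed formula.

```python
from statistics import mean

# A session is a list of (timestamp_seconds, event_name) tuples.
# The event names "undo" and "save" are assumptions about the log schema.


def mean_gap(session):
    """Average time between consecutive events in one session."""
    times = sorted(t for t, _ in session)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return mean(gaps) if gaps else 0.0


def success_baseline(sessions, outcomes):
    """Average per-session gap across sessions that reached "success"."""
    return mean(mean_gap(s) for s, ok in zip(sessions, outcomes) if ok)


def session_weight(session, baseline_gap, undo_penalty=0.2, save_bonus=0.1,
                   pace_penalty=0.5):
    """Hypothetical weighting factor for one session.

    Undo actions lower the weight (signal of failure), save actions raise it
    (signal of intentional exploration), and pacing far from the successful
    baseline lowers it. The coefficients are arbitrary placeholders.
    """
    events = [name for _, name in session]
    undos = events.count("undo")
    saves = events.count("save")
    deviation = 0.0
    if baseline_gap:
        # Relative distance from the successful pacing, capped at 1.
        deviation = min(abs(mean_gap(session) - baseline_gap) / baseline_gap, 1.0)
    weight = 1.0 - undo_penalty * undos + save_bonus * saves - pace_penalty * deviation
    return max(weight, 0.0)


sessions = [
    [(0, "open"), (10, "filter"), (20, "save")],
    [(0, "open"), (40, "filter"), (80, "export")],
    [(0, "open"), (10, "undo"), (20, "undo")],
]
reached_success = [True, False, False]
baseline = success_baseline(sessions, reached_success)
weights = [session_weight(s, baseline) for s in sessions]
```

With the successful session's 10-second pacing as the baseline, the session that saves scores 1.1, the slow session is penalized for pacing down to 0.5, and the session with two undos drops to 0.6.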
Other approaches, such as k-means clustering, can be used to try to find specific user jobs; in some cases these should be modeled separately because they reflect different intent.
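A small, dependency-free k-means sketch for grouping users by their bucket usage, assuming each user is represented as a list of per-bucket counts (the data below is hypothetical). Each resulting cluster is a candidate "job" that might get its own model. Initializing from the first `k` points keeps the example deterministic but is naive; scikit-learn's `KMeans` (k-means++ initialization by default) is the more robust option for real data, and counts usually benefit from scaling first.

```python
import math


def kmeans(points, k, iterations=100):
    """Plain Lloyd's k-means. Initial centroids are the first k points, which is
    naive; k-means++ initialization is the usual improvement."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iterations):
        # Assignment step: each user goes to the nearest centroid (Euclidean).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # converged: assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Hypothetical per-user usage counts for [export, collaboration, query] buckets.
usage = [
    [5, 0, 1],
    [6, 0, 0],
    [0, 4, 1],
    [0, 5, 0],
]
labels, centroids = kmeans(usage, k=2)
print(labels)
```

The two export-heavy users land in one cluster and the two collaboration-heavy users in the other, which is the kind of split that might justify separate models.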