I'm working on a new way for ModBot to detect Steam game names, which should make the game art and HOT tag detection a little better. Since this seems like a fairly generalized problem, I thought I'd share my process here.
Currently, ModBot uses the Steam search function to search for the exact name you've entered. If it finds no results, you're SOL. If it finds more than one result, it attempts to figure out which is the best result by removing punctuation, converting everything to lower case, and trying to find the closest match among the possible results. This is not bad, but it sends a lot of unnecessary searches out to Steam, it's not super fast, and there are a few issues.
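To make the current approach concrete, here is a rough Python sketch of picking the best result out of several. This is not ModBot's actual code: the function names are made up, and using `difflib.SequenceMatcher` as the "closest match" measure is purely an assumption, since the real similarity measure isn't described here.

```python
import difflib
import string


def simplify(name):
    """Remove ASCII punctuation and convert to lower case."""
    return name.translate(str.maketrans("", "", string.punctuation)).lower()


def best_match(query, results):
    """Pick the search result whose simplified name is closest to the query.

    Returns None when there are no results at all.
    The similarity ratio from difflib is an assumed stand-in for whatever
    "closest match" means in the real bot.
    """
    if not results:
        return None
    target = simplify(query)
    return max(
        results,
        key=lambda r: difflib.SequenceMatcher(None, target, simplify(r)).ratio(),
    )


results = ["Hitman: Blood Money", "Hitman: Absolution", "Hitman 2: Silent Assassin"]
print(best_match("hitman absolution", results))
```

Every call like this needs a round trip to Steam first to get `results`, which is where the slowness comes from.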
The new system will do this:
- First, I store the name, appid, and price of every single game on Steam by screen scraping Steam's search results one by one.
- I store the name in the following format:
  - If there is a / in the name, take only what's before the /
  - Replace all Roman numerals (XIV) with Arabic numerals (14)
  - Convert game name to all lower case
  - Replace the text string "two" with "2"
  - Remove any of the following text from game names:
    - "a " at the start of the name, or "a" as a standalone word anywhere in it
    - "a post nuclear role playing game"
    - "the "
    - "of "
    - "and "
    - "deluxe edition", "complete", "collector's edition", "ultimate edition", "gold edition", "game of the year edition", "extended edition"
    - "tom clancy's", "sid meier's"
  - If there's a ": " in the game name, store the full name and also store the portion behind the ":" as an "alternate name"
  - Strip all remaining non-alphanumeric characters, including spaces, so the result looks like "splintercellblacklist"
- Store all of the full names
- Store alternate names only when they do not conflict with full names or overlap with other alternate names. For example, in a series with a bunch of subtitled games, like Hitman: Blood Money and Hitman: Absolution, there's clearly no way to guess which one you mean if you just type Hitman.
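The steps above could look roughly like this in Python. Treat it as a sketch under a few assumptions rather than the real thing: all function names are made up, the appids and prices are placeholders, and the long phrases are removed before "the ", "of ", and "and " so that "game of the year edition" is still intact when it gets checked. It also assumes the alternate name is the part before the ": ", since that is the reading under which the Hitman example makes sense.

```python
import re

# Text removed from names. Long phrases come first (my ordering, see above).
REMOVE_PHRASES = [
    "a post nuclear role playing game", "game of the year edition",
    "collector's edition", "ultimate edition", "extended edition",
    "deluxe edition", "gold edition", "complete",
    "tom clancy's", "sid meier's", "the ", "of ", "and ",
]
ROMAN_TOKEN = re.compile(r"\b[IVXLCDM]+\b")
ROMAN_VALID = re.compile(r"M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})")
ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_arabic(match):
    """Convert one upper-case word to digits if it is a valid Roman numeral."""
    token = match.group(0)
    if not ROMAN_VALID.fullmatch(token):
        return token
    values = [ROMAN_VALUES[ch] for ch in token]
    total = 0
    for i, value in enumerate(values):
        # A smaller value before a larger one is subtracted (the I in IV).
        if i + 1 < len(values) and values[i + 1] > value:
            total -= value
        else:
            total += value
    return str(total)


def squash(text):
    """Strip everything except a-z and 0-9, including spaces."""
    return re.sub(r"[^a-z0-9]", "", text)


def normalize(name):
    """Return (full_key, alternate_key); alternate_key is None without ': '."""
    name = name.split("/")[0]
    name = ROMAN_TOKEN.sub(roman_to_arabic, name)  # before lower-casing
    name = name.lower().replace("two", "2")
    for phrase in REMOVE_PHRASES:
        name = name.replace(phrase, "")
    name = re.sub(r"\ba\b", "", name)  # "a" as a standalone word
    # Assumption: the alternate name is the series part before the colon.
    alternate = name.split(": ", 1)[0] if ": " in name else None
    return squash(name), (squash(alternate) if alternate is not None else None)


def build_index(games):
    """Map keys to (appid, price) from (name, appid, price) tuples."""
    full, alternates = {}, {}
    for name, appid, price in games:
        full_key, alt_key = normalize(name)
        full[full_key] = (appid, price)
        if alt_key:
            alternates.setdefault(alt_key, []).append((appid, price))
    index = dict(full)
    for key, records in alternates.items():
        # Keep an alternate only if it is unique and no full name uses it.
        if key not in full and len(records) == 1:
            index[key] = records[0]
    return index


# Placeholder appids and prices, not real Steam data.
games = [
    ("Hitman: Blood Money", 101, 9.99),
    ("Hitman: Absolution", 102, 19.99),
    ("Tom Clancy's Splinter Cell: Blacklist", 103, 29.99),
]
index = build_index(games)
print(sorted(index))
```

Plain substring removal is aggressive (it will happily chop the "and " out of "island "), but since the submitted name goes through the exact same conversion, the two sides still line up. The single letter "I" is also treated as the numeral 1 here, which may or may not be what you want.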
When you submit a game, it will do the same conversion on what you've submitted and check against the local database to get the information. Only if it doesn't find a hit will it send the search out to Steam, and then it'll use the old system.
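In code, that lookup order is roughly the following. Again a sketch only: `find_game`, `simple_key`, and `fake_search_steam` are hypothetical names, `simple_key` is a deliberately tiny stand-in for the full conversion described above, and the appid and price are placeholders.

```python
import re


def simple_key(name):
    """Stand-in for the full conversion: lower-case, keep only a-z and 0-9."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def find_game(submitted, index, to_key, search_steam):
    """Check the local database first; only search Steam on a miss."""
    key = to_key(submitted)
    if key in index:
        return index[key]
    return search_steam(submitted)


# Record every fallback so it is visible how often Steam would get hit.
steam_calls = []


def fake_search_steam(name):
    """Pretend Steam search (the old system); always finds nothing here."""
    steam_calls.append(name)
    return None


index = {"splintercellblacklist": {"appid": 103, "price": 29.99}}

hit = find_game("Splinter Cell: Blacklist", index, simple_key, fake_search_steam)
miss = find_game("Some Unknown Game", index, simple_key, fake_search_steam)
print(hit, miss, steam_calls)
```

The point is that a local hit never touches the network, so only the misses pay for a Steam search.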
I'll be rolling it out over the next day or two.