Web Data Mining Using Artificial Intelligence

Perhaps the most difficult challenge in finding relevant information on the web is sorting through the vast amount of material to extract only what you are looking for. A web data mining artificial intelligence technique called regular expression scoring can distinguish the relevant information so that it can be processed.

Data Mining

Using PHP, a server can acquire large amounts of web data through a web crawler or a screen scraper aimed at specific web pages. The retrieved text strings can then be evaluated with multiple regular expression (“regex”) scoring, which calculates a total score based on how many times each regex matches and the score assigned to that regex, for example by multiplying each match count by its score and summing the results.
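
The scoring step can be sketched in a few lines of PHP. This is a minimal illustration, not a definitive implementation: the function name regex_score, the patterns, and the score values are all hypothetical, and the total is assumed to be the sum of each match count multiplied by its score.

```php
<?php
// Hypothetical helper: sum of (match count x assigned score) over all rules.
// $rules maps a PCRE pattern to the score awarded for each match.
function regex_score(string $text, array $rules): int
{
    $total = 0;
    foreach ($rules as $pattern => $weight) {
        // preg_match_all() returns the number of full matches, or false on error.
        $count = preg_match_all($pattern, $text);
        if ($count === false) {
            continue; // skip a malformed pattern instead of aborting the run
        }
        $total += $count * $weight;
    }
    return $total;
}

// Illustrative rules only; real rules depend on what you are searching for.
$rules = [
    '/\bdata mining\b/i' => 5,
    '/\bcrawler\b/i'     => 2,
    '/\bcasino\b/i'      => -10,
];
$text = 'A crawler feeds the data mining step. Data mining then scores each page.';

echo regex_score($text, $rules), "\n"; // two "data mining" matches and one "crawler" match: 12
```

Keeping the rules in a plain array means patterns and scores can be tuned without touching the scoring loop.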

Each regex can be assigned a positive or negative score. A regex can match patterns such as “candidate” or “candidates” followed, after any number of intervening letters or words, by “strategy”, “strategies”, or “strategic”. A regex can also look for characters such as a dollar sign immediately followed by digits, which helps in finding price information. A PHP script can locate a price, compare it to an earlier price, and then store the new information or notify you only if the price has changed.
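
The two kinds of pattern described above might look like the following in PHP. The constant and function names are hypothetical, the patterns are one possible reading of the description, and the earlier price is assumed to be passed in from wherever it was stored. Prices are compared as integer cents to avoid floating-point rounding surprises.

```php
<?php
// Illustrative patterns; real pages may need looser or stricter variants.
// "candidate" or "candidates", then any letters or words, then
// "strategy", "strategies", or "strategic".
const CANDIDATE_STRATEGY_REGEX = '/\bcandidates?\b[\w\s]*?\bstrateg(?:y|ies|ic)\b/i';
// A dollar sign immediately followed by digits, with optional thousands
// separators and cents, e.g. $5, $1299.99, $1,299.99.
const PRICE_REGEX = '/\$(\d+(?:,\d{3})*(?:\.\d{2})?)/';

// Return the first price found in $text as integer cents, or null if none.
function first_price_cents(string $text): ?int
{
    if (preg_match(PRICE_REGEX, $text, $m) !== 1) {
        return null;
    }
    return (int) round((float) str_replace(',', '', $m[1]) * 100);
}

// True only when a price was found and it differs from the earlier one.
function price_changed(?int $earlierCents, string $text): bool
{
    $current = first_price_cents($text);
    return $current !== null && $current !== $earlierCents;
}

var_dump(preg_match(CANDIDATE_STRATEGY_REGEX, 'The candidates discussed a new strategy') === 1);
var_dump(first_price_cents('Now only $1,299.99 while supplies last'));
var_dump(price_changed(129999, 'Now only $1,249.00'));
```

A script would call price_changed() with the previously stored value and store the new price or send a notification only when it returns true.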

If the total regular expression score is above a threshold that you set, a PHP script can act on the relevant information, for example by storing it or sending an email notification.
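
The threshold check itself is small. In the sketch below, the function name process_if_relevant and the callback are hypothetical, and the score is assumed to have been computed already. The callback only appends to an array as a stand-in for a real action such as a database insert or a call to PHP's mail() function.

```php
<?php
// Run $onRelevant only when $score is strictly above $threshold.
// Returns true if the callback ran, false otherwise.
function process_if_relevant(string $text, int $score, int $threshold, callable $onRelevant): bool
{
    if ($score <= $threshold) {
        return false;
    }
    $onRelevant($text, $score);
    return true;
}

$stored = [];
$store = function (string $text, int $score) use (&$stored): void {
    // Stand-in for storing the text or sending an email notification.
    $stored[] = ['text' => $text, 'score' => $score];
};

process_if_relevant('Page about data mining', 12, 10, $store); // above threshold: stored
process_if_relevant('Unrelated page', 3, 10, $store);          // below threshold: skipped
echo count($stored), "\n";
```

Passing the action as a callback keeps the decision (is this relevant?) separate from the response (store, email, or both).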

One example application of regular expression scoring is screening real estate classified ads to find properties listed as “for sale by owner”. Real estate agents will often contact a “for sale by owner” seller if they have a potential buyer interested in a similar property in the desired neighborhood. Such an application can run automatically every day or on demand.
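
As a rough sketch of such a screener, the PHP below scores each ad and keeps those above a threshold. Everything specific here is an assumption made for illustration: the patterns, the scores, the sample ads, and the function names fsbo_score and screen_ads. A daily run could be arranged with a scheduler such as cron.

```php
<?php
// Hypothetical patterns and scores for spotting owner-listed properties.
const FSBO_RULES = [
    '/\bfor sale by owner\b/i'           => 10,
    '/\bFSBO\b/i'                        => 10,
    '/\bno agents?\b/i'                  => 3,
    '/\b(?:realtor|listing agent)\b/i'   => -8, // signs of an agent-listed ad
];

// Total score for one ad: match count times score, summed over all rules.
function fsbo_score(string $ad): int
{
    $score = 0;
    foreach (FSBO_RULES as $pattern => $weight) {
        $score += (int) preg_match_all($pattern, $ad) * $weight;
    }
    return $score;
}

// Keep only the ads whose score is strictly above $threshold.
function screen_ads(array $ads, int $threshold): array
{
    return array_values(array_filter($ads, function (string $ad) use ($threshold): bool {
        return fsbo_score($ad) > $threshold;
    }));
}

// Made-up sample ads.
$ads = [
    '3BR ranch, for sale by owner, no agents please. $245,000.',
    'Charming bungalow, contact your realtor today.',
    'FSBO: 2BR condo near downtown.',
];

print_r(screen_ads($ads, 5)); // keeps the first and third ads
```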

OTTStreamingVideo.net can provide customized web data mining services. Please contact us for a quote for your specific needs.
