Agile Web Mining at Bing

The web search industry is making great progress in transitioning from building tools for finding pages and sites to building tools that leverage and surface facts and knowledge. The local search space is founded on structured knowledge: the entity data that represents businesses and other things that necessarily have a location. This data is a core piece of the knowledge space required for this future.

The data found by mining the web for information about local entities now helps to power a significant percentage of local search interactions in a number of countries around the world.

While working on this system, computer scientists have come to think deeply not only about how to build systems for web mining, but also about how to construct efficient developer workflows and how to add data management components that take advantage of human input when appropriate.


The core principles of Agile Web Mining are: optimize for developer productivity, optimize for data management, and invest in low-latency systems. Much of what we hear about in the industry currently revolves around very large data sets (big data), which often entail long processing times and high-latency interactions. In contrast, a few scientists tend to think of their data in a different way: each data set is relatively small (on the order of the size of a web site), but there are many examples of these small data sets.

Name: Md. Amirul Islam

Department of Computer Science and Engineering