Web Mining Framework

Ontotext’s Web Mining Framework (WMF) is a comprehensive, efficient web intelligence and web search platform. 

We can build a vertical market intelligence database for a G7 country within 6 months. We have already done so for jobs in the UK, cars in the USA and hotel rates in Europe. We have also collected and integrated 5 types of data about food from 20+ sources.

A Single Platform with All the Tools You Need to Get the Data:
  • Web crawling of full web pages
  • Collecting and processing HTML, XML, RDF and other formats
  • Focused crawling of specific sections or selected information from web pages
  • Screen scraping of structured online data with high precision (e.g. job boards)
  • Text mining and normalization
  • Data extraction, transformation, merging and de-duplication
  • Extensions to Ontotext’s KIM platform for semantic annotation and search
  • Extensions to Ontotext’s OWLIM triple store for storage and inference of structured data
  • Configurable and extendable workflow management
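
To make the normalization, merging and de-duplication steps listed above more concrete, here is a minimal Python sketch. It is not WMF code and does not use any WMF API: the function names `normalize` and `merge_records`, the choice of key fields and the sample job records are all assumptions made up for illustration. Real deployments would typically need fuzzier matching than the exact normalized keys used here.

```python
import re
import unicodedata


def normalize(text):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def merge_records(records, key_fields=("title", "company", "city")):
    """Merge records that share a normalized key (hypothetical helper).

    For each field, the first non-empty value seen is kept.
    """
    merged = {}
    for record in records:
        key = tuple(normalize(record.get(f, "")) for f in key_fields)
        target = merged.setdefault(key, {})
        for field, value in record.items():
            if value and not target.get(field):
                target[field] = value
    return list(merged.values())


# Illustrative records, as if scraped from two different job boards.
records = [
    {"title": "Java Developer", "company": "Acme Ltd.", "city": "London", "salary": ""},
    {"title": "java developer", "company": "ACME Ltd", "city": "London", "salary": "50000"},
    {"title": "Data Analyst", "company": "Acme Ltd.", "city": "Leeds", "salary": "40000"},
]
unique = merge_records(records)
print(len(unique), "unique records")
```

In this sketch the first two records collapse into one because their normalized title, company and city match, and the merged record picks up the salary that only the second source provided.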

A Serious Platform for Information Professionals:

  • A platform optimized to support large volumes of data
  • Performs independent, continuous data collection 24/7
  • Includes on-board NLP techniques based on GATE’s comprehensive framework
  • Provides configurable post-processing options for data normalization and integration
  • Offers a good balance of options for domain-specific and broad topic tasks

Solid Technology for Full Lifecycle Industrial Applications

Ontotext’s Web Mining Framework is a data collection workhorse built on a solid workflow engine that supports advanced data handling features:

  • Distributed processing with advanced task schedule management
  • Detailed monitoring and reporting of performance statistics
  • Extendable task type definition and post-processing configuration options
  • Coverage of the whole life cycle of building, executing, monitoring and maintaining web mining components
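
As a rough illustration of extendable task types combined with per-task performance statistics, the following Python sketch registers task handlers by name and records runs, failures and elapsed time for each. It runs in a single process and is only a toy: the registry `TASK_TYPES`, the `task_type` decorator, `run_workflow` and the two sample task types are hypothetical names invented here, and nothing in it reflects how the WMF workflow engine is actually implemented or distributed.

```python
import re
import time
from collections import defaultdict

TASK_TYPES = {}  # hypothetical registry: task type name -> handler function


def task_type(name):
    """Decorator that registers a handler under a task type name."""
    def register(func):
        TASK_TYPES[name] = func
        return func
    return register


@task_type("extract_title")
def extract_title(html):
    match = re.search(r"<title>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    return match.group(1).strip() if match else None


@task_type("word_count")
def word_count(text):
    return len(text.split())


def run_workflow(tasks, post_process=None):
    """Run (task_type, payload) pairs in order and collect simple statistics."""
    stats = defaultdict(lambda: {"runs": 0, "failures": 0, "seconds": 0.0})
    results = []
    for name, payload in tasks:
        started = time.perf_counter()
        try:
            result = TASK_TYPES[name](payload)
            if post_process is not None:
                result = post_process(result)
            results.append(result)
        except Exception:
            # Unknown task types and handler errors are counted, not raised.
            stats[name]["failures"] += 1
            results.append(None)
        finally:
            stats[name]["runs"] += 1
            stats[name]["seconds"] += time.perf_counter() - started
    return results, dict(stats)


results, stats = run_workflow([
    ("extract_title", "<html><title> Jobs in London </title></html>"),
    ("word_count", "used cars for sale"),
    ("unknown_task", "payload"),
])
print(results)
```

New task types are added by decorating another function, and the optional `post_process` callable stands in for a configurable post-processing step applied to every result.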

Show-Case Projects

WMF-based solutions are at the core of several important large-scale industrial deployments in domains such as recruitment, vehicle trading, recipe collection and hotel bookings.

Our SemTech 2011 presentation, “Baking with Data: from Jobs to Cars to Food”, shows how the Web Mining Framework has been applied across diverse application areas, bringing semantic technologies into the business world.