Web-Based Named Entity Recognition



Web-NER aims to extract entities of interest from web pages. The scale, lack of structure, and diversity of the web make NER on web pages challenging. Traditionally, rule-based techniques such as wrapper induction systems have been used for this task, but these techniques are site-specific and not robust. We intend to use statistical learning approaches instead. The rich HTML structure that encloses web content provides strong visual and spatial cues in addition to textual information. Further, entities on web pages often stand in spatial relationships: on pages describing products, for instance, the product title is almost always found above the product image. A web page is a 2D layout of irregularly placed blocks of varying sizes, and capturing contextual interactions (spatial dependencies) between blocks on such a layout is a challenging task.
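
One way to turn an observation like "the title sits above the image" into something a model can use is to reduce each block to its bounding box and derive pairwise spatial relations from the boxes. The sketch below is illustrative only: the `Block` type, the `is_above` rule, and its 30% horizontal-overlap threshold are assumptions made for this example, not part of the project described here.

```python
from dataclasses import dataclass

@dataclass
class Block:
    """A rendered block, reduced to its bounding box in page pixels.
    Browser convention: the origin is the top-left corner and y grows downward."""
    label: str
    x: float
    y: float
    width: float
    height: float

    @property
    def right(self) -> float:
        return self.x + self.width

    @property
    def bottom(self) -> float:
        return self.y + self.height

def horizontal_overlap(a: Block, b: Block) -> float:
    """Width shared by the horizontal extents of a and b (0 if they do not overlap)."""
    return max(0.0, min(a.right, b.right) - max(a.x, b.x))

def is_above(a: Block, b: Block, min_overlap: float = 0.3) -> bool:
    """True if a ends before b starts vertically and the two blocks share at least
    min_overlap of the narrower block's width (i.e. they sit in roughly the same column)."""
    narrower = min(a.width, b.width)
    if narrower <= 0:
        return False
    return a.bottom <= b.y and horizontal_overlap(a, b) / narrower >= min_overlap

def spatial_relations(blocks: list[Block]) -> list[tuple[str, str, str]]:
    """All pairwise 'above' edges between distinct blocks."""
    return [(a.label, "above", b.label)
            for a in blocks for b in blocks
            if a is not b and is_above(a, b)]

# A toy product page: the title sits directly over the image; the price is off to the right.
title = Block("title", x=100, y=50, width=400, height=40)
image = Block("image", x=120, y=100, width=300, height=300)
price = Block("price", x=520, y=100, width=150, height=30)
print(spatial_relations([title, image, price]))  # [('title', 'above', 'image')]
```

Edges like these are one possible encoding of the spatial dependencies mentioned above; a structured model could condition on them, while a per-block classifier would use them only as extra features.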

In this project, our aim is to build a framework that assists entity extraction from web pages by exploiting textual, visual, and spatial properties. We concentrate mostly on entities composed of several sub-entities that are dispersed across a web page. In our initial attempts, we used CRFs and SVMs with simple textual, spatial, and visual features, and in our experiments SVMs performed better than CRFs.
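
A minimal sketch of a per-block SVM setup with simple textual, visual, and spatial features, assuming each block has already been rendered so that its text, font size, weight, and vertical position are known. The `PageBlock` type, the six features, and the toy labels are hypothetical. The classifier is a linear SVM trained with Pegasos-style stochastic subgradient descent on the L2-regularised hinge loss, chosen here only to keep the example dependency-free; it is not necessarily what the project used.

```python
import re
from dataclasses import dataclass

@dataclass
class PageBlock:
    """Hypothetical per-block input: the text plus a few rendering properties."""
    text: str
    font_size: float  # visual cue, in px
    is_bold: bool     # visual cue
    rel_y: float      # spatial cue: block top / page height, in [0, 1]

CURRENCY = re.compile(r"[$£€]\s*\d")

def featurize(block: PageBlock) -> list[float]:
    words = block.text.split()
    n = max(len(words), 1)
    return [
        sum(word[:1].isupper() for word in words) / n,  # textual: share of capitalised words
        1.0 if CURRENCY.search(block.text) else 0.0,    # textual: looks like a price
        block.font_size / 32.0,                         # visual: font size, roughly normalised
        1.0 if block.is_bold else 0.0,                  # visual: bold text
        block.rel_y,                                    # spatial: vertical position on the page
        1.0,                                            # constant bias feature (also regularised)
    ]

def train_linear_svm(X, y, lam=0.01, epochs=200):
    """Pegasos-style subgradient descent; labels must be +1/-1.
    Examples are visited in a fixed order (the original Pegasos samples them at random)."""
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            w = [(1.0 - eta * lam) * wj for wj in w]  # step on the regulariser
            if margin < 1.0:                          # hinge loss active: move toward yi * xi
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w

def predict(w, x) -> int:
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1

# Toy task: is this block a product title (+1) or not (-1)?
train = [
    (PageBlock("Acme Wireless Headphones", 24, True, 0.1), 1),
    (PageBlock("$49.99", 16, False, 0.3), -1),
    (PageBlock("Deluxe Coffee Grinder", 28, True, 0.2), 1),
    (PageBlock("free shipping on all orders", 12, False, 0.9), -1),
]
X = [featurize(b) for b, _ in train]
y = [label for _, label in train]
w = train_linear_svm(X, y)
print(predict(w, featurize(PageBlock("Ultra Slim Laptop Stand", 26, True, 0.15))))  # 1
```

Unlike a CRF, this classifier labels each block independently, so spatial context enters only through the position feature; richer relational cues would have to be added as further features.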


Copyright © Dissertationideas.co.uk 2012 through 2014