About

The third NLP&DBpedia workshop is going to take place in conjunction with the 14th International Semantic Web Conference 2015.

Workshop date: Sunday, October 11th, 9am-5:30pm in room RBC 91 (downstairs).

Motivation

The central role of Wikipedia (and therefore DBpedia) for the creation of a Translingual Web has been recognized by the Strategic Research Agenda (cf. section 3.4, page 23). Most of the contributions of the Dagstuhl seminar on the Multilingual Semantic Web also stress the role of Wikipedia for multilingualism, and the previous editions of the NLP&DBpedia workshop have contributed to this understanding.

As more and more language-specific chapters of DBpedia are created (currently 14 language editions), DBpedia is becoming a driving factor for a Linguistic Linked Open Data cloud as well as for localized LOD clouds with specialized domains (e.g. the Dutch windmill domain ontology created from http://nl.dbpedia.org or the Japanese domain ontology of screws created from http://ja.dbpedia.org/).

The data contained in Wikipedia and DBpedia have ideal properties for making them a controlled testbed for NLP. Wikipedia and DBpedia are multilingual and multi-domain, the communities maintaining these resources are very open, and it is easy to join and contribute. The open licence allows data consumers to benefit from the content, and many parts are collaboratively editable. In particular, the data in DBpedia is widely used and disseminated throughout the Semantic Web.

With the foundation of the DBpedia Association and the frequent releases of the DBpedia+ Data Stack, this workshop hopes to channel contributions of the NLP research community into the data ecosystem of DBpedia and LOD, thus easing the use of interlinked language resources as well as increasing the performance of knowledge-based NLP approaches.

We envision that the workshop will produce the following items:

  • an open call to the DBpedia data consumer community that will generate a wish list of data to be extracted from Wikipedia using NLP methods (for certain domains and application scenarios). This wish list will be broken down into tasks and benchmarks, and as a result a gold standard will be created
  • the benchmarks and test data created will be collected and published under an open licence for future evaluation (inspired by http://oaei.ontologymatching.org/ and http://archive.ics.uci.edu/ml/datasets.html)
  • a stronger link between the DBpedia and NLP communities that currently meet twice a year at DBpedia developers workshops.
    We also offer all authors the chance to contribute their data to the regular DBpedia releases in April and October

NLP4DBpedia

DBpedia has been around for quite a while, infusing the Web of Data with multi-domain data of decent quality. The data in DBpedia is, however, mostly extracted from Wikipedia infoboxes, while the remaining parts of Wikipedia are to a large extent not exploited for DBpedia. Here, NLP techniques may help improve DBpedia.

Extracting additional triples from the plain text information in Wikipedia, either unsupervised or using the existing triples as training information, could multiply the information in DBpedia, or help tell correct from incorrect information by finding supporting text passages. Furthermore, analyzing the semantics of other structures in Wikipedia, such as tables, lists, or categories, would help make DBpedia richer. Finally, since Wikipedia exists in more than 200 languages, we are particularly interested in NLP approaches that work not only for English but also for other languages, in order to leverage the huge amount of knowledge captured in the different language editions.

NLP approaches can also improve the quality of DBpedia, especially by extracting content from sources other than Wikipedia that may validate the data in DBpedia.

DBpedia4NLP

On the other hand, NLP and information extraction techniques often involve various resources while processing texts from different domains. As high-quality annotated data is often too expensive and time-consuming to obtain, NLP researchers are increasingly looking to the Semantic Web for external structured sources to complement their datasets. Such resources can be gazetteers to aid a named entity recognition system or examples of relations between entities to bootstrap a relation finder. DBpedia can easily be utilised to assist NLP modules in a variety of tasks.
