Entity Extraction
Entity extraction, a core task within Natural Language Processing (NLP), is the automatic analysis of large volumes of unstructured or semi-structured data to detect entities, their descriptions, relationships, and events. The result is a semi-structured relational index of triples, which populates the RDF database that drives visualization, relationship analysis tools, and semantic applications.
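As a rough illustration of what such triples might look like, the sketch below models each one as a subject, predicate, and object. The sentence, the entity names, and the predicate names (type, worksFor, hasTitle) are hypothetical and are not taken from the product's actual schema.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    """One subject-predicate-object statement."""
    subject: str
    predicate: str
    obj: str


# Hypothetical extraction output for the sentence
# "Jane Doe is the CEO of Acme Corp."
triples: List[Triple] = [
    Triple("Jane Doe", "type", "Person"),
    Triple("Acme Corp", "type", "Company"),
    Triple("Jane Doe", "worksFor", "Acme Corp"),
    Triple("Jane Doe", "hasTitle", "CEO"),
]


def about(subject: str, store: List[Triple]) -> List[Triple]:
    """Return every triple whose subject matches the given entity."""
    return [t for t in store if t.subject == subject]
```

A real deployment would load such statements into an RDF store rather than a Python list; the list only shows the shape of the data.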
Innovative Query accepts input from a variety of source types, including all popular document types, web sites, RSS feeds, blogs, e-mail, translated text (FBIS), technical documents, and transcribed audio. Our system will provide crawlers and connectors to make ingesting data stores as easy as possible.
Events & Facts
IQeXplore makes extensive use of relationships between entities. These facts and events make it possible to run queries such as “All quotations from people who work for Google” against a news stream. They also enable unique methods of data navigation based on inferred relationships.
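The quotation query can be sketched as a two-step join over extracted facts: first find the people linked to the company, then collect what they said. The fact tuples, the predicate names (worksFor, said), and the people below are invented for illustration and do not reflect the IQeXplore query interface.

```python
from typing import List, Tuple

Fact = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical facts extracted from a news stream.
facts: List[Fact] = [
    ("Alice Example", "worksFor", "Google"),
    ("Bob Example", "worksFor", "Acme Corp"),
    ("Alice Example", "said", "We are expanding the team."),
    ("Bob Example", "said", "Sales were flat."),
]


def quotations_from_employees(company: str, store: List[Fact]) -> List[Tuple[str, str]]:
    """Return (speaker, quotation) pairs for people who work for the company."""
    # Step 1: everyone with a worksFor relationship to the company.
    employees = {s for s, p, o in store if p == "worksFor" and o == company}
    # Step 2: every quotation whose speaker is one of those people.
    return [(s, o) for s, p, o in store if p == "said" and s in employees]
```

Calling quotations_from_employees("Google", facts) returns only the quotation attributed to Alice Example, because she is the only speaker with a worksFor link to Google.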
Parsing Entities with Meaning and Relationships
Entity extraction goes beyond shallow parsing and tagging of text to work out the logical relations between sentence components. Regardless of its initial composition, each semantically valuable phrase is translated into a grammar structure. When performing event or relationship searches, this arrangement is vital to finding the correct links between elements.
NLP Independence
Innovative Query is NLP independent, meaning our platform works with a variety of entity extraction vendors. We do this deliberately so we can focus on semantic applications, visualization tools, and search analytics. In addition, it allows us to “plug and play” with the best statistical, lexical, and grammatical methods available, or a combination of them, optimized for a particular type of data. This ensures you can always leverage the best and most appropriate technology for your needs.
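One way to picture this plug-and-play arrangement is a shared interface that every extraction engine implements, so engines can be swapped or combined. The sketch below is an assumption about how such an interface could look; EntityExtractor, GazetteerExtractor, and CombinedExtractor are hypothetical names, not part of the Innovative Query platform.

```python
from typing import Dict, List, Protocol, Tuple

Entity = Tuple[str, str]  # (surface text, category)


class EntityExtractor(Protocol):
    """Anything with an extract() method can be plugged in."""

    def extract(self, text: str) -> List[Entity]: ...


class GazetteerExtractor:
    """Toy lexical engine: reports lexicon entries that appear in the text."""

    def __init__(self, lexicon: Dict[str, str]) -> None:
        self.lexicon = lexicon

    def extract(self, text: str) -> List[Entity]:
        return [(name, cat) for name, cat in self.lexicon.items() if name in text]


class CombinedExtractor:
    """Runs several engines and merges their results, dropping duplicates."""

    def __init__(self, extractors: List[EntityExtractor]) -> None:
        self.extractors = extractors

    def extract(self, text: str) -> List[Entity]:
        merged: List[Entity] = []
        for extractor in self.extractors:
            for entity in extractor.extract(text):
                if entity not in merged:
                    merged.append(entity)
        return merged
```

Because CombinedExtractor itself satisfies the same interface, a combination of engines can be used anywhere a single engine is expected.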
Disambiguation
Extracting the meaning from text is especially difficult when one word has many senses or definitions. For example, “tank” can mean a storage tank, a water tank, a gas tank, or an M-1 tank; “bank” can mean a river bank or a financial bank; and one can run a mile or run a company. Through domain-specific machine learning and the application of a domain-specific ontology, we give you the ability to decide which sense of a word is most likely in context.
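To make the idea concrete, here is a minimal sketch that picks a sense by counting how many domain cue words appear in the surrounding context. This overlap heuristic is a simplified stand-in, not the machine-learning method described above, and the SENSES table with its cue words is invented for illustration.

```python
import re
from typing import Dict, Set

# Hypothetical domain lexicon: word -> sense -> cue words for that sense.
SENSES: Dict[str, Dict[str, Set[str]]] = {
    "tank": {
        "storage container": {"water", "gas", "fuel", "storage", "gallons"},
        "armored vehicle": {"army", "m-1", "armor", "battalion", "crew"},
    },
    "bank": {
        "financial institution": {"loan", "deposit", "account", "money"},
        "river edge": {"river", "water", "shore", "fishing"},
    },
}


def disambiguate(word: str, context: str) -> str:
    """Return the sense whose cue words overlap most with the context.

    Ties go to the sense listed first in SENSES.
    """
    tokens = set(re.findall(r"[a-z0-9-]+", context.lower()))
    senses = SENSES[word]
    return max(senses, key=lambda sense: len(senses[sense] & tokens))
```

For instance, a sentence mentioning a battalion and a crew pushes "tank" toward the armored vehicle sense, while a sentence mentioning a deposit and an account pushes "bank" toward the financial sense.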
Automatic or Custom Ontology (Lexicon or Taxonomy)
Out of the box, IQI provides the ability to extract 30 categories of entities (people, companies, technologies, etc.) and over 50 categories of events and facts (Person’s Title, Person’s Educations, Person’s Company, Company Product, Business Relationship, etc.). This provides the information analysis needed for most applications and many industries. However, if domain-specific analysis is needed, we can provide the tools and expertise you need.
The performance of a system can vary widely according to the information type processed. New knowledge acquisition relies heavily on pattern-based rules and lexical forms. In recognition of this fact, we provide a breadth of customization and tuning options to optimize your results. With our partners, there are a few approaches for building a custom ontology to optimize your entity extraction process.
For automatic ontology development, we start with a set of sample documents specific to the domain. Grammatical processing of these documents yields a new set of terms that form a lexicon, which can then be fed back into the entity extraction process for the new domain.
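The loop from sample documents to a lexicon can be sketched as follows. Note that this sketch swaps in a simple frequency heuristic (two-word phrases without stopwords that recur across documents) in place of the grammatical processing described above; the stopword list, the min_docs threshold, and the function names are all assumptions.

```python
import re
from collections import Counter
from typing import List, Set

STOPWORDS = {"the", "a", "an", "of", "in", "and", "to", "is", "was", "for", "on", "with"}


def candidate_terms(doc: str) -> Set[str]:
    """Return the unique two-word phrases in a document that contain no stopwords."""
    words = re.findall(r"[a-z]+", doc.lower())
    return {
        f"{first} {second}"
        for first, second in zip(words, words[1:])
        if first not in STOPWORDS and second not in STOPWORDS
    }


def build_lexicon(docs: List[str], min_docs: int = 2) -> List[str]:
    """Keep candidate terms that appear in at least min_docs sample documents."""
    doc_counts: Counter = Counter()
    for doc in docs:
        doc_counts.update(candidate_terms(doc))
    return sorted(term for term, n in doc_counts.items() if n >= min_docs)
```

The resulting term list is what would be fed back into the extraction process as a domain lexicon; a production system would rely on real grammatical analysis rather than word pairs.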
Alternatively, users can customize the ontology grammars directly to define new entity types, specify new events and relationships, or perform advanced entity tagging for a domain. Grammar writing is aided by a development environment designed specifically for this purpose.
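As a loose analogy for what a custom grammar rule does, the sketch below defines a new event type with a regular expression and named groups. Real ontology grammars are far richer than a regex, and the Acquisition rule, its pattern, and the field names (buyer, target) are hypothetical.

```python
import re
from typing import Dict, List

# A capitalized name of one or more words, e.g. "Acme Corp".
NAME = r"[A-Z]\w+(?: [A-Z]\w+)*"

# Hypothetical rule set: event type -> pattern with named participant slots.
RULES: Dict[str, "re.Pattern[str]"] = {
    "Acquisition": re.compile(rf"(?P<buyer>{NAME}) acquired (?P<target>{NAME})"),
}


def apply_rules(text: str) -> List[Dict[str, str]]:
    """Return one record per rule match, tagged with the event type."""
    events: List[Dict[str, str]] = []
    for event_type, pattern in RULES.items():
        for match in pattern.finditer(text):
            events.append({"type": event_type, **match.groupdict()})
    return events
```

Adding a new event type then amounts to adding one more entry to RULES. The capitalization-based name pattern is deliberately naive and would absorb a capitalized word that directly precedes the buyer's name.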
Either method will produce better results for domain-specific applications and will involve some professional services and/or professional linguist support.