
 

NSA Information Stream Analysis

Purpose

The purpose of this use case of the IQeXplore platform is to identify threats (or, more generally, information patterns) across multiple information sources. It is relatively easy to tag a single document as potentially identifying a threat. It is much harder to sift through many sources of information, identify fact patterns, connect the dots, and identify specific threats.

The specific case, for the NSA/UCDMO, is to identify inter-agency data network threats. IQeXplore is the glue of a system that monitors message traffic, pulling together groups of messages that may be linked to known threats.

Groups of messages from interagency message traffic are collected into weblogs (collections of documents), which are then processed on the system. If the facts, events and entity concepts extracted from these weblogs can be matched against known patterns, a potential threat is tagged and the specific sub-elements of the weblog are collected and forwarded for further analysis.
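As a rough illustration of the matching step described above, the following Python sketch treats each extracted fact as a (subject, relation, object) triple and a known pattern as a set of relation types that must all appear in the weblog. The triple representation, the relation and pattern names, and the `match_patterns` function are assumptions made for illustration only; they are not the IQeXplore API, and real pattern matching would be considerably richer.

```python
from typing import NamedTuple

class Fact(NamedTuple):
    doc_id: str    # document the fact was extracted from
    subject: str
    relation: str
    obj: str

def match_patterns(facts, patterns):
    """Return {pattern_name: [facts]} for every pattern whose required
    relations all occur somewhere in the weblog's extracted facts."""
    present = {f.relation for f in facts}
    hits = {}
    for name, required in patterns.items():
        if required <= present:  # every required relation was found
            hits[name] = [f for f in facts if f.relation in required]
    return hits

# Hypothetical weblog: facts extracted from three messages.
weblog = [
    Fact("msg-1", "Person A", "works_for", "Company A"),
    Fact("msg-2", "Person A", "requested_access_to", "Network X"),
    Fact("msg-3", "Person B", "met_with", "Person A"),
]

# Hypothetical known pattern, expressed as a set of required relations.
patterns = {"insider_access": {"works_for", "requested_access_to"}}

hits = match_patterns(weblog, patterns)
for name, facts in hits.items():
    print(name, [f.doc_id for f in facts])
```

The matched facts keep their `doc_id`, so the sub-elements of the weblog that triggered the tag can be collected and forwarded.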

Overview

Weblog document streams are semantically analyzed using natural language processing (NLP) for entities, facts and events and their relationships. All document attributes are captured and as much metadata as possible is extracted. Entities, events and facts are extracted based on contextual analysis, not just keyword searching; they are categorized, and relevancy values are assigned to them based on context. More importantly, relationships between the entities are identified where possible.

For threat assessments, ontology-driven NLP engines are used to extract facts, relationships and entities in domains of interest. An ontology is a formal representation of knowledge as a set of concepts within a domain, and the relationships between those concepts. The IQeXplore platform is highly modular and can be configured for many different NLP engines, driven by different ontologies. While the diagram below shows unstructured textual data, integrating structured data is straightforward; see the technology section below.
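To make the definition of an ontology concrete, here is a minimal Python sketch of a toy ontology: a concept hierarchy plus relations that each constrain the types of their subject and object. The concept names, relation names and helper functions are hypothetical and chosen only for illustration; a production system would more likely use a standard ontology language than plain dictionaries.

```python
# Hypothetical toy ontology: each concept maps to its parent concept.
CONCEPTS = {
    "Entity": None,
    "Person": "Entity",
    "Organization": "Entity",
    "Company": "Organization",
}

# Each relation maps to (required subject concept, required object concept).
RELATIONS = {
    "owns": ("Person", "Organization"),
    "employed_by": ("Person", "Organization"),
}

def is_a(concept, ancestor):
    """True if `concept` equals `ancestor` or descends from it."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = CONCEPTS.get(concept)
    return False

def is_valid(subject_type, relation, object_type):
    """True if the ontology allows this relation between these types."""
    if relation not in RELATIONS:
        return False
    domain, range_ = RELATIONS[relation]
    return is_a(subject_type, domain) and is_a(object_type, range_)

print(is_valid("Person", "owns", "Company"))   # a Company is an Organization
print(is_valid("Company", "owns", "Person"))   # wrong direction for "owns"
```

Swapping in a different pair of dictionaries changes what the extractor accepts, which is the sense in which an engine is driven by its ontology.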

image002.gif

Figure 1

Semantic relationships simply relate one entity to another. The example below includes a person known to be a threat. The problem in this example is finding other people and organizations associated with this person who may also be of interest or a threat.

image004.gif

Figure 2

One message, email, web page, etc. will generate a number of entities and a handful of discovered facts and events. With a weblog, the collection of documents can generate a large volume of discovered entities, facts and events. The task then becomes to see if these concepts and facts are part of a threat.

image006.gif   image006.gif   image006.gif   … n documents in web log


Figure 2, above, is a very simplified case with a collection of concepts relating to a known person who is a threat. The only piece of information previously known is that Company A is owned by Person C, the known threat. The other entities and relationships are pulled out of the set of documents, the weblog. With a domain ontology and semantic web reasoner technologies, the collection of extracted entities and relationships can be put together to create the remainder of the graph in the figure. This in effect connects facts across documents by linking the entities through their discovered relationships. The next step is to analyze these collections of concepts and relationships to see if they match patterns in an ontology designed to model threat patterns.
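The idea of connecting facts across documents can be sketched as a graph traversal. The Python example below merges the one known fact (Person C owns Company A) with triples extracted from other documents, then walks outward from the known threat to find linked entities. Everything other than Person C and Company A (the additional entities, the relation names, and the `build_graph` and `connected_entities` helpers) is hypothetical, and a simple breadth-first search stands in here for the semantic web reasoner the text mentions.

```python
from collections import defaultdict, deque

def build_graph(triples):
    """Build an undirected adjacency map from (subject, relation, object) triples."""
    graph = defaultdict(set)
    for subject, _relation, obj in triples:
        graph[subject].add(obj)
        graph[obj].add(subject)
    return graph

def connected_entities(graph, start, max_hops=2):
    """Breadth-first search: map each entity within `max_hops` of `start`
    to its distance in hops."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen[neighbor] = seen[node] + 1
                queue.append(neighbor)
    del seen[start]
    return seen

# The one previously known fact.
known = [("Person C", "owns", "Company A")]

# Hypothetical triples extracted from different documents in the weblog.
extracted = [
    ("Person D", "employed_by", "Company A"),
    ("Person D", "member_of", "Organization B"),
    ("Person E", "wrote_to", "Person F"),  # unrelated to the known threat
]

graph = build_graph(known + extracted)
linked = connected_entities(graph, "Person C", max_hops=3)
print(linked)
```

Entities that never connect back to the known threat, such as Person E and Person F in this toy data, are left out of the result.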

The IQeXplore platform allows this type of content processing and reasoning in an end-to-end system. It keeps track of all the extracted entities and facts, and once a threat is identified it can generate a list of all the documents, identify the pertinent facts and their occurrences, bundle them, and send them for further review or processing. The output of this system can also be used as the input for another, filtering and distilling tremendous amounts of data into actionable intelligence.
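A minimal sketch of the bundling step, assuming each extracted entity occurrence is recorded with the document it came from: once a set of entities is flagged, the occurrences are grouped by document and serialized so another system can consume them. The record layout, the sample snippets, and the `bundle_for_review` function are hypothetical; the source does not describe the platform's actual output format.

```python
import json
from collections import defaultdict

def bundle_for_review(occurrences, threat_entities):
    """Group occurrences of flagged entities by source document.

    `occurrences` is an iterable of (doc_id, entity, snippet) tuples.
    """
    bundle = defaultdict(list)
    for doc_id, entity, snippet in occurrences:
        if entity in threat_entities:
            bundle[doc_id].append({"entity": entity, "snippet": snippet})
    return dict(bundle)

# Hypothetical occurrence records kept during extraction.
occurrences = [
    ("doc-1", "Person C", "Person C owns Company A."),
    ("doc-2", "Person D", "Person D joined Company A last year."),
    ("doc-2", "Person G", "Person G attended the same conference."),
    ("doc-3", "Person G", "Person G published a report."),
]

bundle = bundle_for_review(occurrences, {"Person C", "Person D"})

# Serialize the bundle so a downstream system can take it as input.
payload = json.dumps(bundle, indent=2)
print(payload)
```

Documents with no flagged entities, such as doc-3 here, drop out of the bundle entirely, which is where the filtering and distilling effect comes from.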

 

 

Technology Data Flow

The following figure diagrams the process described above. The ontologies for the NLP and semantic reasoning are provided by the NSA. The IQeXplore platform provides an end-to-end data processing platform. Software is then written to connect the discovered threats to known threat databases.

image010.gif 

Also notice that the IQeXplore platform has a defined API with which applications can easily be written to search and otherwise analyze the collections of processed documents in a wide variety of ways, through the semantic search, text analytics and APIs shown to the right.

 


The Problem

Keyword search is very effective for finding specific information and facts. However, for exploratory searches in which users need to research, learn, discover, and understand novel or complex topics, there is substantial room for improvement. This type of search is used to generate information and knowledge about topics that are not specifically defined and often evolve over time, such as learning about a company, trend, industry, technology or market. These searches often start without specific knowledge or even enough contextual information to form a useful query. Exploratory searches require research, browsing, connecting information and discovery of new and diverse information, which is time-consuming.

Solution

Innovative Query’s IQeXplore product is an integrated web service to support and enhance complex exploratory research. This service can be deployed as an enterprise server license or as SaaS. The application is entirely browser-based and is both powerful and easy to use. It integrates into users’ searches across space and time and allows them to capture relevant content from their existing search tools (Google, SharePoint, enterprise search, paid information services, Documentum, knowledge management systems, text files, etc.). IQeXplore automatically analyzes the context of the captured information and links it, storing and correlating it in its knowledge base. From these discovered correlations, visualization tools generate knowledge maps and trees that map relationships among the researched information. Tools for research and task management and for team collaboration accelerate knowledge generation.

Business Need

Businesses are increasingly information intensive and innovation driven, putting increasing demands on knowledge workers and technologies to stay competitive. Information overload hinders the discovery of business intelligence, as traditional tools often swamp users with lists of largely irrelevant information. Knowledge workers increasingly work as teams, often spread all over the world, and need to collaborate to perform their jobs and generate knowledge. Organizations and knowledge workers routinely struggle with two interrelated problems in research and knowledge generation tasks: managing dispersed multidisciplinary teams of workers, and improving researcher productivity in the face of increasing information overload.

As knowledge workers we all struggle to keep up with the ever-increasing volumes of information and information sources at our disposal. Executives, consultants and marketers in many industries routinely track and evaluate industry trends and needs. They do primary market research by researching companies, industry trends and customer needs and by reading research reports, among other efforts. In product R&D it is a constant effort to keep up with all the technical journals, web news, competitors' product announcements, customer feedback reports, product feature requests, new technologies, and industry companies. Researchers in biotechnology, information technology, materials technology and other fields constantly struggle to keep up with the volumes of information available.

But the biggest problem we face as knowledge workers is figuring out what information is valuable to our tasks and our businesses, now and in the future. Other than brute effort, there are few tools for keeping up with this information and helping knowledge workers relate and apply it to their tasks and understanding.

As the amount of information grows, productivity for knowledge workers tends to flatten out, and we are in an era of exponential information growth. The ability of humans to process and use information is still the same: humans can only effectively hold and connect four pieces of information at a time. To help solve this problem we need tools, not just to organize, structure and search information, but to help people connect information in a usable and meaningful way. As about 80% of information is in unstructured sources, such as documents and web pages, information tools beyond document management and search are needed to help knowledge workers collect information from documents, do exploratory search, and find relationships and connections between information sources to create new knowledge and insight.

In the modern world globalization is increasing, and innovation and rapid continuous change are major driving forces in many organizations. This is producing dispersed, even global, workforces where teamwork, workflow and collaboration are necessary for innovation and competitiveness. Discovering and creating new and relevant knowledge is the key to innovation and competitiveness.

 
