NSA Information Stream Analysis
Purpose
The purpose of this use case of the IQeXplore platform is to identify threats (or, more generally, information patterns) across multiple information sources. It is relatively easy to tag a single document as potentially identifying a threat. It is much harder to sift through many sources of information, identify fact patterns, connect the dots and identify specific threats.
In this specific case, the goal for the NSA/UCDMO is to identify inter-agency data network threats. IQeXplore is the glue of a system that monitors message traffic and pulls together groups of messages that may be linked to known threats.
From interagency message traffic, groups of messages are collected into weblogs (collections of documents), which are then processed by the system. If the facts, events and entity concepts extracted from these weblogs match known patterns, a potential threat is tagged and the specific sub-elements of the weblog are collected and forwarded for further analysis.
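The document does not say how messages are grouped into weblogs. As a minimal sketch, assuming (hypothetically) that two messages belong together whenever they share a sender or recipient address, directly or through a chain of other messages, a union-find structure can collect connected messages into weblogs. The `Message` record and the linking rule are illustrative assumptions, not the system's actual criteria.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    # Hypothetical message record; the real traffic schema is not specified.
    msg_id: str
    sender: str
    recipients: tuple


def group_into_weblogs(messages):
    """Group messages that share any address (transitively) into weblogs.

    Assumption: a shared sender/recipient address is the linking rule.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Link every recipient of a message to that message's sender.
    for m in messages:
        find(m.sender)
        for r in m.recipients:
            union(m.sender, r)

    # Each root address identifies one weblog (a list of messages).
    weblogs = {}
    for m in messages:
        weblogs.setdefault(find(m.sender), []).append(m)
    return list(weblogs.values())
```

Union-find is used here because linkage is transitive: a message from A to B and another from B to C end up in the same weblog even though A and C never exchanged mail.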
Overview
Weblog document streams are semantically analyzed using natural language processing (NLP) to find entities, facts and events and their relationships. All the document attributes are captured and as much metadata as possible is extracted. Entities, events and facts are extracted based on contextual analysis, not just keyword searching, and they are categorized and assigned relevancy values based on context. More importantly, relationships between the entities are identified where possible.
For threat assessments, ontology-driven NLP engines are used to extract facts, relationships and entities in domains of interest. An ontology is a formal representation of knowledge as a set of concepts within a domain and the relationships between those concepts. The IQeXplore platform is highly modular and can be configured for many different NLP engines, driven by different ontologies. While the diagram below shows unstructured textual data, integrating structured data is straightforward; see the technology section below.
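The internal data model is not described here. As a hypothetical sketch, the per-document output of the analysis (captured metadata, categorized entities with context-based relevancy values, and relationships between entities) could be held in simple records like the ones below. The NLP step itself is not shown; the class names, category labels and the 0.5 threshold are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    text: str         # surface form found in the document
    category: str     # e.g. "Person", "Organization" (illustrative labels)
    relevance: float  # context-based score, assumed to lie in [0, 1]


@dataclass(frozen=True)
class Relation:
    subject: str
    predicate: str
    obj: str
    doc_id: str       # provenance: the document the relation came from


@dataclass
class DocumentAnalysis:
    doc_id: str
    metadata: dict
    entities: list
    relations: list

    def salient(self, threshold=0.5):
        """Keep entities at or above the threshold, plus the relations
        whose subject and object are both among the kept entities."""
        keep = [e for e in self.entities if e.relevance >= threshold]
        names = {e.text for e in keep}
        rels = [r for r in self.relations
                if r.subject in names and r.obj in names]
        return keep, rels
```

Keeping `doc_id` on each relation means facts can later be traced back to their source documents after they are merged across a weblog.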

Figure 1
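To make the ontology definition concrete, here is a deliberately small, hypothetical ontology: concepts arranged in an is-a hierarchy, and relations constrained by the concept types they may connect (domain and range). The concept and relation names are invented for illustration, and this is not how IQeXplore or any particular NLP engine represents its ontologies.

```python
class Ontology:
    """A toy ontology: concepts in an is-a hierarchy plus typed relations.

    Illustrative only; production ontologies are typically written in a
    standard language such as OWL and processed by a dedicated reasoner.
    """

    def __init__(self):
        self.parents = {}    # concept -> set of direct parent concepts
        self.relations = {}  # relation name -> (domain concept, range concept)

    def add_concept(self, name, *parents):
        self.parents.setdefault(name, set()).update(parents)
        for p in parents:
            self.parents.setdefault(p, set())

    def add_relation(self, name, domain, range_):
        self.relations[name] = (domain, range_)

    def is_a(self, concept, ancestor):
        """True if `concept` equals `ancestor` or inherits from it."""
        stack, seen = [concept], set()
        while stack:
            c = stack.pop()
            if c == ancestor:
                return True
            if c not in seen:
                seen.add(c)
                stack.extend(self.parents.get(c, ()))
        return False

    def allows(self, relation, subj_type, obj_type):
        """Check a typed fact against the relation's domain and range."""
        if relation not in self.relations:
            return False
        domain, range_ = self.relations[relation]
        return self.is_a(subj_type, domain) and self.is_a(obj_type, range_)
```

A check like `allows` is one way an ontology can steer extraction: a candidate fact whose types violate the relation's domain or range can be discarded or down-weighted.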
Semantic relationships simply relate one entity to another. The example below includes a person known to be a threat. The problem in this example is finding people and other organizations associated with this person who may also be of interest or a threat.

Figure 2
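At its simplest, finding people and organizations associated with a known threat is a graph traversal over the extracted relationships. The following is a minimal breadth-first sketch. It assumes relationships are available as (subject, predicate, object) triples and that "associated" means "within a few hops, in either direction"; the hop limit and the entity names other than Company A and Person C are illustrative.

```python
from collections import deque


def associates(triples, start, max_hops=2):
    """Return {entity: hop distance} for entities reachable from `start`
    within `max_hops` relationships, ignoring edge direction."""
    graph = {}
    for s, _pred, o in triples:
        graph.setdefault(s, set()).add(o)
        graph.setdefault(o, set()).add(s)

    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for nbr in graph.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    del dist[start]
    return dist
```

Breadth-first order guarantees each entity is reported with its shortest distance to the known threat, which is a natural (if crude) proxy for how closely it is associated.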
One message, email, web page, etc. will generate a number of entities and a handful of discovered facts and events. With a weblog, the collection of documents can generate a large volume of discovered entities, facts and events. The task then becomes determining whether these concepts and facts are part of a threat.
[Figure: … n documents in the weblog]
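Moving from one document to a weblog is largely a matter of volume. As a small sketch of pooling per-document results, assuming each document's extraction output has been reduced to a list of entity strings (a hypothetical input format), the counts below separate entities that recur across many documents from those merely repeated within one.

```python
from collections import Counter


def aggregate_weblog(doc_entities):
    """Summarize entity mentions across a weblog.

    `doc_entities` maps a document id to the list of entity strings
    extracted from it (hypothetical input format).
    Returns (total mentions per entity, number of documents per entity).
    """
    mentions = Counter()
    doc_freq = Counter()
    for entities in doc_entities.values():
        mentions.update(entities)
        doc_freq.update(set(entities))  # count each document once per entity
    return mentions, doc_freq
```

An entity named in many documents (high document frequency) is usually a better candidate for cross-document linking than one mentioned many times in a single message.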
Figure 2, above, is a very simplified case with a collection of concepts relating to a known person who is a threat. The only piece of information previously known is that Company A is owned by Person C, the known threat. The other entities and relationships are pulled out of the set of documents, the weblog. With a domain ontology and semantic web reasoner technologies, the extracted entities and relationships can be put together to create the remainder of the graph in the figure. This in effect connects facts across documents, linking the entities by their discovered relationships. The next step is to analyze these collections of concepts and relationships to see whether they match patterns in an ontology designed to model threat patterns.
The IQeXplore platform supports this type of content processing and reasoning in an end-to-end system. It keeps track of all the extracted entities and facts, and once a threat is identified it can generate a list of all the relevant documents, identify the pertinent facts and their occurrences, and bundle them for further review or processing. The output of this system can also serve as the input for another, so that tremendous amounts of data are filtered and distilled into actionable intelligence.
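The cross-document step can be illustrated with a naive rule engine. This is a stand-in for, not an example of, an ontology-based semantic web reasoner: it merges triples extracted from separate documents, applies hypothetical chaining rules until no new facts appear (a fixpoint), and then flags anything linked to a known threat. The predicates, the `RULES` list and every entity except Company A and Person C are assumptions for illustration.

```python
# Hypothetical rules: (p1, p2, new_pred) means that whenever
# (x, p1, y) and (y, p2, z) both hold, (x, new_pred, z) is inferred.
RULES = [
    ("works_for", "owned_by", "associate_of"),
    ("funds", "associate_of", "associate_of"),
]


def infer(triples, rules):
    """Naive forward chaining over triples until no new facts appear."""
    facts = set(triples)
    changed = True
    while changed:
        changed = False
        for p1, p2, new_pred in rules:
            # Index the second predicate by subject for the join on y.
            by_subject = {}
            for s, p, o in facts:
                if p == p2:
                    by_subject.setdefault(s, []).append(o)
            new = {
                (x, new_pred, z)
                for x, p, y in facts if p == p1
                for z in by_subject.get(y, [])
            }
            if not new <= facts:
                facts |= new
                changed = True
    return facts


def flag_associates(facts, known_threats, pred="associate_of"):
    """Return subjects linked to a known threat by the inferred predicate."""
    return sorted({s for s, p, o in facts if p == pred and o in known_threats})
```

Each rule only ever adds triples built from entities already present, so the loop is guaranteed to terminate; in the usage below, Person D is flagged only because facts from three different documents are chained together.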
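Bundling depends on recording where every fact was found. A minimal sketch of such a provenance index follows, assuming each occurrence is identified by a document id and a character offset; the `FactIndex` class and its methods are hypothetical, not part of the IQeXplore API.

```python
from collections import defaultdict


class FactIndex:
    """Track where each extracted fact occurs so that, once a threat is
    identified, the supporting documents can be bundled for review."""

    def __init__(self):
        # fact triple -> list of (doc_id, character offset) occurrences
        self.occurrences = defaultdict(list)

    def record(self, fact, doc_id, offset):
        self.occurrences[fact].append((doc_id, offset))

    def bundle(self, threat_facts):
        """Group the occurrences of the given facts by document.

        Returns {doc_id: [(offset, fact), ...]} with documents in id order
        and each list sorted by offset, so a reviewer sees the pertinent
        facts in reading order.
        """
        result = defaultdict(list)
        for fact in threat_facts:
            for doc_id, offset in self.occurrences.get(fact, []):
                result[doc_id].append((offset, fact))
        return {doc: sorted(items) for doc, items in sorted(result.items())}
```

Because the bundle is plain data keyed by document, it could equally be forwarded to a human reviewer or fed as input to another processing stage.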
Technology Data Flow
The following figure diagrams the process described above. The ontologies for the NLP and semantic reasoning are provided by the NSA. The IQeXplore platform provides the end-to-end data processing, and additional software is written to link the discovered threats to known threat databases.
Note also that the IQeXplore platform has a defined API, so applications can easily be written to search and otherwise analyze the processed document collections in a wide variety of ways, through the semantic search, text analytics and APIs shown on the right.