
Computational Intelligence Solutions for Homeland Security
Enrico Appiani and Giuseppe Buslacchi
Elsag Datamat spa, via Puccini 2,
16154 Genova, Italy
{Enrico.Appiani,Giuseppe.Buslacchi}@elsagdatamat.com
Abstract. On the basis of consolidated requirements from international police forces, Elsag Datamat
has developed an integrated tool suite supporting all the main Homeland Security activities, such as
operations, emergency response, investigation and intelligence analysis. Intelligence analysis support covers
the whole “Intelligence Cycle” along its main phases and integrates a wide variety of automatic
and semi-automatic tools, coming both from original company developments and from the
market (COTS), in a homogeneous framework. Support for the Analysis phase, the most challenging
and computing-intensive one, makes use of Classification techniques, Clustering techniques, Novelty Detection and other sophisticated algorithms. An innovative and promising use of Clustering and Novelty Detection, supporting the analysis of “information behavior”, can be very
useful to analysts in identifying relevant subjects, monitoring their evolution and detecting
new information events that may deserve attention in the monitored scenario.
1 Introduction
Modern Law Enforcement Agencies experience challenging and conflicting needs: on
one side, the Homeland Security mission has become more and more complex, due to
many national and international factors such as stronger crime organization, asymmetric threats, the complexity of civil and economic life, the criticality of infrastructures,
and the rapidly growing information environment; on the other side, the absolute value
of public security does not translate into large resource availability for such Agencies,
which must cope with problems similar to those of business organizations, namely reconciling
results with the pursuit of internal efficiency, clarity of roles, careful programming
and strategic resource allocation.
Strategic programming for security, however, is not a function of business, but
rather of the evolution of security threats. The capability to predict and prevent them
has a double target: externally to an Agency, improving coordination and the
public image presented to citizens, so as to better enlist everyone’s cooperation in civil security; internally, improving communication between corps and departments,
individual motivation through better assignment of roles and missions, and ultimately
the efficiency of Law Enforcement operations.
Combining good management of resources, operations, prevention and decisions translates into the need to master internal and external information with an integrated
approach, in which different tools cooperate in a common, efficient and accurate
information flow, from internal management to external intelligence, from resource
allocation to strategic security decisions.
E. Corchado et al. (Eds.): CISIS 2008, ASC 53, pp. 43–52, 2009.
springerlink.com
© Springer-Verlag Berlin Heidelberg 2009
The rest of the paper describes an integrated framework designed to apply the above
principles, called Law Enforcement Agency Framework (LEAF) by Elsag Datamat,
whose tools are in use by the National Police and Carabinieri, and which is still under development for both product improvement and new deliveries to national and foreign Agencies. Rather than technical details and individual technologies, this work
emphasizes their flexible integration, in particular for intelligence and decision support. The next sections focus on the following topics: LEAF architecture and functions,
matching the needs of Law Enforcement Agencies; the support to Intelligence and
Investigations, employing a suite of commercial and leading-edge technologies; the
role of combined Clustering and Semantic technology in looking for elements of
known or novel scenarios in large, unstructured and noisy document bases; and some
conclusions with future perspectives.
2 Needs and Solutions for Integrated Homeland Security Support
Police forces of advanced countries have developed or purchased their own IT support to
manage their operations and administration. US police forces, given their multiplicity
(more than one Agency for each State), have focused on common standards for Record Management Systems (RMS) [1], in order to exchange and analyze data of Federal importance, such as criminal records. European police forces generally have their own
IT support and are improving their international exchange and integration, also thanks
to the Commission effort for a common European Security and Defense Policy [3],
including common developments in the FP7 Research Program and Europol [2], which offers data and services for criminal intelligence.
At the opposite end, other Law Enforcement Agencies are building or completely
replacing their IT support, motivated by rising security challenges and various internal needs, such as improving their organizations, achieving more accurate border
control and fighting international criminal trafficking more effectively.
Elsag Datamat’s LEAF aims at providing an answer both to police forces that are just improving
their IT support and to those requiring complete solutions, from base IT infrastructure
to top-level decision support. The LEAF architecture includes the following main functions, from low-level to high-level information, as illustrated in Fig. 1:
• Infrastructure – IT Equipment, Sensors, Network, Applications and Management;
• Administration – Enterprise Resource Planning (ERP) for Agency personnel and other resources;
• Operations – support to Law Enforcement daily activities through recording all relevant events, decisions and documents;
• Emergency – facing and resolving security-compromising facts, with real-time and efficient resource scheduling, with possible escalation to crises;
• Intelligence – support to crime prevention, investigation and security strategy, through a suite of tools to acquire, process, analyze and disseminate useful information.
Fig. 1. LEAF Functional and Layered Architecture
We can now take a closer look at each main function, just to recall which concrete
functionalities lie behind our common idea of Law Enforcement.
2.1 Infrastructure
The IT infrastructure of a Police can host applications of similar complexity to business ones, but with more critical requirements of security, reliability, geographical
articulation and communication with many fixed and mobile users. This requires the
capability to perform most activities for managing complex distributed IT systems:
• Infrastructure management
• Service management
• Geographic Information System – common geo-processing service for the other applications in the system
2.2 Administration
This has the same requirements as the personnel, resource and budget administration of
multi-site companies, with stronger needs for supporting mission continuity.
• Enterprise Resource Planning (human resources, materials, vehicles, infrastructures, budget, procurement, etc.)
2.3 Operations
The Operations Support System is based on an RMS implementation, partly inspired by
the American standard, recording all actors (such as people, vehicles and other objects), events (such as incidents, accidents, field interviews), activities (such as arrest,
booking, wants and warrants) and documents (such as passports and weapon licenses),
providing a common information ground for everyday tasks and for the operations of
the upper level (emergency and intelligence). In other words, this is the fundamental
database of LEAF, whose data and services can be split as follows:
• Operational activity (events and actors)
• Judicial activity (support to justice)
• Administrative activity (support to security administration)
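The record types listed above (actors, events, activities, documents) and the relations among them can be pictured with a small data-structure sketch. This is only an illustration under stated assumptions: the class names `Record` and `RecordStore`, their fields and the sample identifiers are hypothetical and do not reproduce the LEAF schema or the RMS standard specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Record:
    record_id: str
    kind: str      # "actor", "event", "activity" or "document"
    subtype: str   # e.g. "person", "vehicle", "incident", "arrest", "passport"
    note: str = ""


@dataclass
class RecordStore:
    """Hypothetical in-memory stand-in for the shared operations database."""
    records: Dict[str, Record] = field(default_factory=dict)
    links: Dict[str, Set[str]] = field(default_factory=dict)

    def add(self, record: Record) -> None:
        self.records[record.record_id] = record
        self.links.setdefault(record.record_id, set())

    def link(self, id_a: str, id_b: str) -> None:
        # Relations are stored symmetrically so either side can be queried.
        self.links[id_a].add(id_b)
        self.links[id_b].add(id_a)

    def related(self, record_id: str, kind: Optional[str] = None) -> List[str]:
        # Return ids of linked records, optionally restricted to one kind.
        ids = sorted(self.links[record_id])
        return [i for i in ids if kind is None or self.records[i].kind == kind]


# Illustrative data only.
store = RecordStore()
store.add(Record("p1", "actor", "person"))
store.add(Record("v1", "actor", "vehicle"))
store.add(Record("e1", "event", "incident", "roadside check"))
store.add(Record("d1", "document", "weapon license"))
store.link("e1", "p1")
store.link("e1", "v1")
store.link("p1", "d1")
print(store.related("e1", kind="actor"))
```

Keeping the links separate from the records is one possible way to let the upper levels (emergency and intelligence) query the same ground data from any starting point, such as an event, a person or a document.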
2.4 Emergencies
This is the core Police activity for reaction to security-related emergencies, whose
severity and implications can vary widely (e.g. from small robberies to
large terrorist attacks). An emergency alarm can be triggered in several ways:
by the Police itself during surveillance and patrolling activity; by sensor-triggered
automatic alarms; by citizens directly signaling events to Agents; and by citizens calling
a security emergency telephone number, such as 112 or 113 in Italy. IT support to
organize a proper reaction is crucial for saving time and choosing the most appropriate means.
• Call center and Communications
• Emergency and Resource Management
2.5 Intelligence
This is the core support to prevention (detecting threats before they are put into action) and investigation (detecting the authors and the precise modalities of committed
crimes); besides, it provides statistical and analytical data for understanding the evolution of crimes and for taking strategic decisions on the next Law Enforcement missions.
A more detailed description is deferred to the next section.
3 Intelligence and Investigations
Present-day threats are asymmetric and international, aim to strike rather than to win,
and are often driven by individuals or small groups. Maintaining Homeland Security
requires, much more than before, careful monitoring of every information source
through IT support, with a cooperative and distributed approach, in order to perform
the classical Intelligence cycle on two basic tasks:
• Pursuing Intelligence targets – performing research on specific military or civil
targets, in order to achieve timely and accurate answers and to prevent specific threats before they are realized;
• Monitoring threats – listening to Open, Specialized and Private Sources, to capture and isolate security-sensitive information possibly revealing new threats, in
order to generate alarms and react with more detailed investigation and careful
prevention.
In addition to Intelligence tasks,
• Investigation relies on the capability to collect relevant information on past events
and analyze it in order to discover links and details leading to a complete picture of the situation;
• Crisis management comes from emergency escalation, but requires further capabilities to simulate the evolution of the situation, as in intelligence, and to understand what
has just happened, as in investigations.
The main Intelligence support functions are definable as follows:
• Information source Analysis and Monitoring – the core information collection,
processing and analysis
• Investigation and Intelligence – the core processes
• Crisis, Main events and Emergencies Management – the critical reaction to large
events
• Strategies, Management, Direction and Decisions – understand and forecast the
overall picture
Some supporting technologies are the same across the supporting functions above. In
fact, every function involves a data processing flow, from sources to the final report,
which can have steps similar to those of other flows. This shows that understanding the
requirements, modeling the operational scenario and achieving proper integration of
the right tools are much more important steps than just acquiring a set of technologies.
Another useful viewpoint to focus on the data processing flow is the classical Intelligence cycle, modeled with similar approach by different military (e.g. NATO rules
and practices for Open Source Analysis [4] and Allied Joint Intelligence [5]) and civil
institutions, whose main phases are:
• Management - Coordination and planning, including resource and task management, mission planning (strategy and actions to take for getting the desired information) and analysis strategy (approach to distil and analyze a situation from the
collected information). Employs the activity reports to evaluate results and possibly reschedule the plan.
• Collection - Gathering signals and raw data from any relevant source, acquiring
them in digital format suitable for next processing.
• Exploiting - Processing signals and raw data so that they become useful “information pieces” (people, objects, events, locations, text documents, etc.) which can be
labeled, used as indexes and put into relation.
• Processing - Processing information pieces in order to get their relations and aggregate meaning, transformed and filtered in the light of the situation to be analyzed or
discovered. This is the most relevant support to human analysis, although in many
cases the analysis still relies strongly on analyst experience and intuition.
• Dissemination - This does not mean diffusion to a large public, but rather aggregating the analysis outcomes in suitable reports which can be read and exploited by
decision makers, with precise, useful and timely information.
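As a rough illustration of how these phases chain into a single data flow, the sketch below models each phase as a plain function and lets a management function coordinate them. All function names, the keyword-based filtering and the sample sources are assumptions made for illustration; real tools at each phase are far richer.

```python
from typing import Dict, Iterable, List, Set


def collect(sources: Iterable[Iterable[str]]) -> List[str]:
    # Collection: gather raw items from every source into one digital stream.
    return [raw for source in sources for raw in source]


def exploit(raw_items: List[str]) -> List[Dict]:
    # Exploiting: turn raw data into labeled "information pieces".
    pieces = []
    for raw in raw_items:
        text = raw.strip()
        if text:
            pieces.append({"text": text, "words": set(text.lower().split())})
    return pieces


def process(pieces: List[Dict], watch_terms: Set[str]) -> List[Dict]:
    # Processing: keep the pieces related to the situation under analysis
    # (here a naive keyword match stands in for real analysis tools).
    return [p for p in pieces if p["words"] & watch_terms]


def disseminate(findings: List[Dict]) -> str:
    # Dissemination: aggregate the outcome into a short report.
    lines = [f"{len(findings)} item(s) flagged"]
    lines += [f"- {p['text']}" for p in findings]
    return "\n".join(lines)


def run_cycle(sources: Iterable[Iterable[str]], watch_terms: Set[str]) -> str:
    # Management: plan and coordinate the other phases.
    return disseminate(process(exploit(collect(sources)), watch_terms))


# Illustrative sources only.
sources = [["Port strike announced", "   "],
           ["Weather is mild", "New strike threat at port"]]
print(run_cycle(sources, {"strike"}))
```

Because each phase only consumes the output of the previous one, an individual tool can be swapped without touching the rest of the flow, which is the integration point the text stresses.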
The Intelligence process across phases and input data types can be represented by a
pyramidal architecture of contributing tools and technologies, represented as boxes in
Fig. 2, not exhaustive and not necessarily related to the corresponding data types (below) and Intelligence steps (on the left). The diagram provides a closer look at the
functions supporting the Intelligence phases, some of which are, or are becoming,
commercial products supporting in particular Business Intelligence, Open Source
analysis and the Semantic Web.
Fig. 2. The LEAF Intelligence architecture for integration of supporting technologies
An example of an industrial subsystem taking part in this vision is IVAS,
namely Integrated Video Archiving System, capable of receiving a large number of
radio/TV channels (up to 60 in current configurations), digitizing them at both Web
stream and high quality (DVD) data rates, storing them in a disk buffer, allowing the operators to browse the recorded channels and perform both manual and automatic indexing, synthesizing commented emissions or clips, and archiving or publishing such selected
video streams for later retrieval. IVAS manages the Collection and Exploiting phases
with Audio/Video data, and indirectly supports the later processing. Unattended indexing modules include Face Recognition, Audio Transcription, Taxonomic and
Semantic Recognition. IVAS implementations are currently working for the Italian
National Command Room of Carabinieri and for the Presidency of the Republic.
In summary, this section has shown the LEAF component architecture and technologies to support Intelligence and Investigations. The Intelligence tasks look at
future scenarios, already known or partially unknown. Support to their analysis thus
requires a combination of explicit knowledge-based techniques and inductive, implicit
information extraction, as is being studied with the so-called Hybrid Artificial Intelligence Systems (HAIS). An example of such a combination is shown in the next
section.
Investigation tasks, instead, aim at reconstructing past scenarios, thus requiring the
capability to model them and look for their related information items through text and
multimedia mining on selected sources, also involving explicit knowledge processing,
ontology and conceptual networks.
4 Inductive Classification for Non-structured Information
Open sources are heterogeneous, unstructured, multilingual and often noisy, in the
sense of being possibly written with improper syntax, various mistakes, automatic
translation, OCR and other conversion techniques. Open sources to be monitored
include: press selections, broadcast channels, Web pages (often from variable and
short-life sites), blogs, forums, and emails. All of them may be acquired in the form of
documents of different formats and types, either organized in renewing streams (e.g.
forums, emails) or as specific static information, at most updated over time.
In such a huge and heterogeneous document base, classical indexing and text mining techniques may fail in looking for and isolating relevant content, especially with
unknown target scenarios. Inductive technologies can be usefully exploited to characterize and classify a mix of information content and behavior, so as to classify sources
without explicit knowledge indexing, recognize them based on their style, discover recurring subjects and detect novelties, which may reveal new hidden messages
and possible new threats.
Inductive clustering of text documents is achieved with source filtering, feature extraction and clustering tools based on Support Vector Machines (SVM) [7], through a
list of features depending on both content (most used words, syntax, semantics) and
style (such as number of words, average sentence length, first and last words, appearance time or refresh frequency). Documents are clustered according to their vector
positions and distances, trying to optimize the number of clusters by minimizing a distortion cost function, so as to achieve a good compromise between the compactness (avoiding a
large number of clusters with only a few documents each) and representativeness (common
meaning and similarity among the clustered documents) of the obtained clusters.
Some clustering parameters can be tuned manually through document subsets. The
clustered documents are then partitioned into different folders whose names include the
most recurring words, excluding the “stop-words”, namely frequent words with low
semantic content, such as prepositions, articles and common verbs.
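The procedure described above can be sketched in a few lines. This is a minimal illustration under explicit assumptions, not the tool used in the experiments: plain k-means stands in for the actual clustering tools, the cost function (distortion plus a per-cluster penalty) is a guess at one possible form, since the paper does not state it, and the stop-word list, feature set and function names are hypothetical.

```python
from collections import Counter

STOP_WORDS = {"the", "a", "an", "of", "in", "and", "to", "is", "on"}  # tiny sample list


def tokenize(text):
    words = [w.strip(".,;:!?\"'()") for w in text.lower().split()]
    return [w for w in words if w]


def features(text, vocabulary):
    # Content features (term counts) followed by one style feature (word count).
    # Features are left unscaled here; a real system would normalize them.
    words = tokenize(text)
    counts = Counter(words)
    return [float(counts[w]) for w in vocabulary] + [float(len(words))]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    # Farthest-point initialization keeps this sketch deterministic.
    centers = [vectors[0]]
    while len(centers) < k:
        centers.append(max(vectors, key=lambda v: min(sq_dist(v, c) for c in centers)))
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda j: sq_dist(v, centers[j])) for v in vectors]
        for j in range(k):
            members = [v for v, l in zip(vectors, labels) if l == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    distortion = sum(sq_dist(v, centers[l]) for v, l in zip(vectors, labels))
    return labels, distortion


def choose_k(vectors, max_k, penalty):
    # Assumed cost: distortion (representativeness) + penalty * k (compactness).
    best = None
    for k in range(1, max_k + 1):
        labels, distortion = kmeans(vectors, k)
        cost = distortion + penalty * k
        if best is None or cost < best[0]:
            best = (cost, k, labels)
    return best[1], best[2]


def cluster_keywords(texts, labels, top_n=3):
    # Name each cluster by its most recurring words, excluding stop-words.
    names = {}
    for label in sorted(set(labels)):
        counts = Counter(w for t, l in zip(texts, labels) if l == label
                         for w in tokenize(t) if w not in STOP_WORDS)
        names[label] = [w for w, _ in counts.most_common(top_n)]
    return names
```

In this sketch the `penalty` argument plays the role of the manually tunable clustering parameter: raising it favors fewer, larger clusters, while lowering it favors more specific ones.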
This way we can obtain a pseudo-classification of documents, expressed by the
common concepts associated with the resulting keywords of each cluster. The largest
experiment was conducted on about 13,000 documents from a press selection taken from
Italian newspapers in 2006, made digital through scanning and OCR. The document
base was very noisy, with many words changed, abbreviated, concatenated with
others, or missing; analyzing this sample with classical text processing, let alone semantic
analysis, would have been problematic indeed, since language technology is very
sensitive to syntactical correctness. Instead, with this clustering technique, most of the
about 30 clusters obtained had a true common meaning among the composing documents (for instance, criminality of specific types, economy, terrorist attacks, industry
news, culture, fiction, etc.), with more specific situations expressed by the resulting
keywords. Further, such keywords were almost always correct, except for a few clusters
grouping documents so noisy that it would have been impossible to find any common sense. In practice, the document noise was filtered out when asserting the
prevailing sense of the most representative clusters.
Content Analysis (CA) and Behavior Analysis (BA) can support each other in different ways. CA applied before BA can add content features to the clustering space.
Conversely, BA applied before CA can reduce the number of documents to be processed for content, by isolating relevant groups expressing a certain type of style,
source and/or conceptual keyword set. CA further contributes to the scenario interpretation by applying reasoning tools to inspect clustering results in the light of domain-specific knowledge. Analyzing cluster contents may help application-specific
ontologies discover unusual patterns in the observed domain. Conversely, novel information highlighted by BA might help dynamic ontologies to update their knowledge in a semi-automated way. Novelty Detection is obtained through the “outliers”,
namely documents lying at a certain relative distance from their cluster centers, thus
expressing a loose commonality with more central documents. Outliers from a certain
source, for instance an Internet forum, can reveal some odd style or content with respect to the other documents.
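A minimal sketch of outlier-based novelty detection follows. The paper does not define the "relative distance" precisely, so the rule used here is an assumption: a document is flagged when its distance from its cluster center exceeds `ratio` times the mean distance within that cluster. The function names and the default ratio are hypothetical.

```python
import math


def centroid(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def find_outliers(vectors, labels, ratio=2.0):
    # Return indices of documents lying far from their own cluster center,
    # relative to the average spread of that cluster (assumed criterion).
    outliers = []
    for label in sorted(set(labels)):
        idx = [i for i, l in enumerate(labels) if l == label]
        center = centroid([vectors[i] for i in idx])
        dists = {i: math.dist(vectors[i], center) for i in idx}
        mean = sum(dists.values()) / len(dists)
        outliers += [i for i in idx if mean > 0 and dists[i] > ratio * mean]
    return outliers


# Illustrative feature vectors: the fifth document sits apart from its cluster.
vectors = [[0, 0], [0, 1], [1, 0], [1, 1], [6, 6],
           [20, 20], [20, 21], [21, 20], [21, 21]]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1]
print(find_outliers(vectors, labels))
```

Using a threshold relative to each cluster's own spread, rather than one absolute distance, keeps tight and loose clusters comparable; other choices (percentiles, per-source thresholds) would fit the same description.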
Dealing with Intelligence for Security, we can have two different operational solutions combining CA and BA, respectively supporting Prevention and Investigation
[6]. This is still the subject of experiments, the major difficulty being to collect relevant
documents and some real, or at least realistic, scenario.
Prevention-mode operation is illustrated in Fig. 3, and proceeds as follows.
1) Every input document is searched for basic terms, context, and possibly key
concepts and relations among these (by using semantic networks and/or ontology), in
order to develop an understanding of the overall content of each document and its
relevance to the reference scenario.
2a) In the knowledge-base set-up phase, the group of semantically analyzed documents forming a training set undergoes a clustering process, whose similarity metric
is determined by both linguistic features (such as lexicon, syntax, style, etc.) and semantic information (concept similarity derived from the symbolic information tools).
2b) At run-time operation, each new document is matched with existing clusters;
outlier detection, together with a history of the categorization process, highlights
possibly interesting elements and subsets of documents bearing novel contents.
3) Since clustering tools are not able to extract mission-specific knowledge from
input information, ontology processing interprets the detected trends in the light of
possible criminal scenarios. If the available knowledge base cannot explain the extracted information adequately, the component may decide to bring an alert to the
analyst’s attention and possibly tag the related information for future use.
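Steps 2a and 2b can be sketched as follows, with step 3 (ontology-based interpretation) left out. The sketch rests on assumptions not found in the paper: reference clusters are summarized by a center and a radius equal to the largest training distance, a new document is novel when it falls outside the radius of its nearest cluster, and the names `ReferenceCluster`, `Monitor` and `build_reference` are hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ReferenceCluster:
    name: str
    center: List[float]
    radius: float  # assumed: largest distance of a training document from the center


def build_reference(training: Dict[str, List[List[float]]]) -> List[ReferenceCluster]:
    # Step 2a (set-up): summarize each training cluster by center and radius.
    clusters = []
    for name, vectors in training.items():
        center = [sum(col) / len(vectors) for col in zip(*vectors)]
        radius = max(math.dist(v, center) for v in vectors)
        clusters.append(ReferenceCluster(name, center, radius))
    return clusters


@dataclass
class Monitor:
    clusters: List[ReferenceCluster]
    history: List[Tuple[str, str, bool]] = field(default_factory=list)

    def match(self, doc_id: str, vector: List[float]) -> Tuple[str, bool]:
        # Step 2b (run-time): match a new document with the existing clusters
        # and record the outcome in the categorization history.
        nearest = min(self.clusters, key=lambda c: math.dist(vector, c.center))
        novel = math.dist(vector, nearest.center) > nearest.radius
        self.history.append((doc_id, nearest.name, novel))
        return nearest.name, novel

    def alerts(self) -> List[str]:
        # Documents that would be passed on for interpretation (step 3).
        return [doc_id for doc_id, _, novel in self.history if novel]


# Illustrative training data only.
training = {"economy": [[0, 0], [0, 2], [2, 0], [2, 2]],
            "crime": [[10, 10], [10, 12], [12, 10], [12, 12]]}
monitor = Monitor(build_reference(training))
print(monitor.match("d1", [1, 1.5]))
print(monitor.match("d2", [11, 15]))
```

Keeping the history alongside the clusters makes it possible to look at trends, for example a source whose documents are increasingly flagged as novel, rather than at isolated outliers only.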
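[Figure: block diagram. Document(s) enter Content Analysis (step 1), producing an Annotated Document Corpus; Behaviour Analysis comprises Set-up (step 2a, reference clusters) and Run-time (step 2b, novelties); Reference Dynamic Knowledge (step 3) leads to the Analyzed novel scenario.]
Fig. 3. Functional CA-BA combined dataflow for prevention mode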
Novelty detection is the core activity of this operation mode, and relies on the interaction between BA and CA to define a ‘normal-state scenario’, to be used for identifying interesting deviations. The combination of inductive clustering and
explicit knowledge extraction is promising in helping analysts to perform a gross
classification of large, unknown and noisy document bases, find promising content in
some clusters and hence refine the analysis through explicit knowledge processing.
This combination lies between the information exploitation and processing phases of the
intelligence cycle, as recalled in the previous section; in fact, it contributes both to
isolating meaningful information items and to analyzing the overall results.
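[Figure: block diagram. A Reference Investigation Scenario (step 1) and a Selected corpus feed Content Analysis (step 2), producing an Annotated Document Corpus; Behaviour Analysis (step 3) yields Relevant Groups; Content Analysis (step 4) outputs a Structured hypothesis and a Search strategy for missing information.]
Fig. 4. Functional CA-BA combined dataflow for Investigation mode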
Investigation-mode operation is illustrated in Fig. 4 and proceeds as follows.
1) A reference criminal scenario is assumed as a basic hypothesis.
2) As in prevention-mode operation, input documents are searched to develop
an understanding of the overall content of each document and its relevance to the
reference scenario.
3) BA (re)groups the existing base of documents by embedding the scenario-specific conceptual similarity in the document- and cluster-distance criterion. The
result is a grouping of documents that indirectly takes into account the relevance of
the documents to the reference criminal scenario.
4) CA uses high-level knowledge describing the assumed scenario to verify the
consistency of BA. The output is a confirmation of the sought-for hypothesis or, more
likely, a structural description of the possibly partial match, which provides useful
directives for actively searching for missing information, which ultimately serves to validate or disprove the investigative assumption.
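Step 3 can be pictured with the following sketch, which blends a style distance with a scenario-specific concept distance before grouping. Everything here is an assumption made for illustration: the paper does not specify the distance criterion, so a Jaccard distance over matched scenario terms, a linear blend controlled by `weight`, and a simple greedy grouping are used as stand-ins, and all names are hypothetical.

```python
import math


def scenario_concepts(text, scenario_terms):
    # Scenario concepts found in a document (naive word match as a stand-in
    # for the semantic annotation produced by CA).
    return set(text.lower().split()) & scenario_terms


def jaccard_distance(a, b):
    union = a | b
    return 0.0 if not union else 1.0 - len(a & b) / len(union)


def combined_distance(vec_a, vec_b, concepts_a, concepts_b, weight):
    # Blend of style distance (squashed into [0, 1)) and concept distance.
    style = math.dist(vec_a, vec_b)
    style_norm = style / (1.0 + style)
    return (1 - weight) * style_norm + weight * jaccard_distance(concepts_a, concepts_b)


def group_documents(vectors, concept_sets, weight, threshold):
    # Greedy grouping: join the first group whose representative is close enough.
    groups = []
    for i in range(len(vectors)):
        for g in groups:
            rep = g[0]
            d = combined_distance(vectors[i], vectors[rep],
                                  concept_sets[i], concept_sets[rep], weight)
            if d <= threshold:
                g.append(i)
                break
        else:
            groups.append([i])
    return groups


# Illustrative documents and scenario vocabulary only.
scenario = {"offshore", "account", "transfer", "payment"}
texts = ["payment transfer offshore account",
         "offshore account opened",
         "football match result"]
vectors = [[4.0], [3.0], [3.0]]  # a single style feature: word count
concepts = [scenario_concepts(t, scenario) for t in texts]
print(group_documents(vectors, concepts, weight=0.8, threshold=0.6))
print(group_documents(vectors, concepts, weight=0.0, threshold=0.3))
```

With the scenario weight switched on, the two documents sharing scenario concepts end up together even though their style differs; with style alone, the second document groups with the unrelated one. This is the sense in which the grouping "indirectly takes into account" relevance to the scenario.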
Elsag Datamat already uses CA with different tools within the LEAF component
for Investigation, Intelligence and Decision Support. The combination with BA is
being tested on Italian document sets in order to set up a prototype for
Carabinieri (one of the Italian security forces), to be tested in the field
between 2009 and 2010. In parallel, multi-language CA-BA solutions are being studied for the needs of international police forces.
5 Conclusions
In this position paper we have described an industrial solution for integrated IT support to Law Enforcement Agencies, called LEAF, in line with the state of the art of
this domain, including some advanced computational intelligence functions to support
Intelligence and Investigations. The need for an integrated Law Enforcement support,
realized by the LEAF architecture, has been explained.
The key approach for LEAF computational intelligence is modularity and openness
to different tools, in order to realize the most suitable processing workflow for any
analysis need. The LEAF architecture organizes such a workflow to support, totally or
partially, the intelligence cycle phases: acquisition, exploitation, processing and dissemination, all coordinated by management. Acquisition and exploitation involve
multimedia data processing, while processing and dissemination work at the conceptual
object level.
An innovative and promising approach to the analysis of Open and Private Sources
combines Content and Behavior Analysis, the latter exploring the application of
clustering techniques to textual documents, usually subject to text and language processing. BA shows the potential to tolerate the high number, heterogeneity and information noise of large Open Sources, creating clusters whose most representative
keywords can express an underlying scenario, brought directly to the analysts’ attention or
with the help of knowledge models.
References
1. Law Enforcement Record Management Systems (RMS) – Standard functional specifications
by Law Enforcement Information Technology Standards Council (LEITSC) (updated 2006),
http://www.leitsc.org
2. Europol – mission, mandates, security objectives, http://www.europol.europa.eu
3. European Security and Defense Policy – Wikipedia article,
http://en.wikipedia.org/wiki/European_Security_and_Defence_Policy
4. NATO Open Source Intelligence Handbook (November 2001), http://www.oss.net
5. NATO Allied Joint Intelligence, Counter Intelligence and Security Doctrine. Allied Joint
Publication (July 2003)
6. COBASIM Proposal for FP7 – Security Call 1 – Proposal no. 218012 (2007)
7. Jing, L., Ng, M.K., Zhexue Huang, J.: An Entropy Weighting k-Means Algorithm for Subspace Clustering of High-Dimensional Sparse Data. IEEE Transactions on Knowledge and
Data Engineering 19(8) (August 2007)