Computational Intelligence Solutions for Homeland Security

Enrico Appiani and Giuseppe Buslacchi
Elsag Datamat spa, via Puccini 2, 16154 Genova, Italy
{Enrico.Appiani,Giuseppe.Buslacchi}@elsagdatamat.com

Abstract. On the basis of consolidated requirements from international police forces, Elsag Datamat has developed an integrated tool suite supporting all the main Homeland Security activities, such as operations, emergency response, investigation and intelligence analysis. Intelligence analysis support covers the whole “Intelligence Cycle” along its main phases and integrates a wide variety of automatic and semi-automatic tools, both original company developments and commercial off-the-shelf (COTS) products, in a homogeneous framework. Support to the Analysis phase, the most challenging and computing-intensive one, makes use of Classification techniques, Clustering techniques, Novelty Detection and other sophisticated algorithms. An innovative and promising use of Clustering and Novelty Detection, supporting the analysis of “information behavior”, can help analysts identify relevant subjects, monitor their evolution and detect new information events that may deserve attention in the monitored scenario.

1 Introduction

Modern Law Enforcement Agencies face challenging and conflicting needs. On one side, the Homeland Security mission has become more and more complex, due to many national and international factors such as stronger crime organization, asymmetric threats, the complexity of civil and economic life, the criticality of infrastructures, and the rapidly growing information environment. On the other side, the absolute value of public security does not translate into large resource availability for such Agencies, which must cope with problems similar to those of business organizations, namely reconciling results with the pursuit of internal efficiency, clarity of roles, careful programming and strategic resource allocation.
Strategic programming for security, however, is not a function of business, but rather of the evolution of security threats. The capability to forecast and prevent them has a double target: externally to an Agency, improving coordination and the public image presented to citizens, so as to encourage everyone’s cooperation in civil security; internally, improving communication between corps and departments, individual motivation through better assignment of roles and missions, and ultimately the efficiency of Law Enforcement operations.

Combining good management of resources, operations, prevention and decisions translates into the need to master internal and external information with an integrated approach, in which different tools cooperate in a common, efficient and accurate information flow, from internal management to external intelligence, from resource allocation to strategic security decisions.

E. Corchado et al. (Eds.): CISIS 2008, ASC 53, pp. 43–52, 2009. springerlink.com © Springer-Verlag Berlin Heidelberg 2009

The rest of the paper describes an integrated framework that tries to apply the above principles, called Law Enforcement Agency Framework (LEAF) by Elsag Datamat. Its tools are in use by the National Police and Carabinieri, and are still under development for both product improvement and new deliveries to national and foreign Agencies. Rather than technical details and individual technologies, this work emphasizes their flexible integration, in particular for intelligence and decision support.
The next sections focus on the following topics: LEAF architecture and functions, matching the needs of Law Enforcement Agencies; the support to Intelligence and Investigations, employing a suite of commercial and leading-edge technologies; the role of combined Clustering and Semantic technologies in looking for elements of known or novel scenarios in large, unstructured and noisy document bases; and some conclusions with future perspectives.

2 Needs and Solutions for Integrated Homeland Security Support

Police forces of advanced countries have developed or purchased their own IT support to manage their operations and administration. US police forces, given their multiplicity (more than one Agency for each State), have focused on common standards for Record Management Systems (RMS) [1], in order to exchange and analyze data of Federal importance, such as criminal records. European police forces generally have their own IT support and are improving their international exchange and integration, also thanks to the Commission effort for a common European Security and Defense Policy [3], including common developments in the FP7 Research Program and Europol [2], which offers data and services for criminal intelligence. At the opposite end, other Law Enforcement Agencies are building or completely replacing their IT support, motivated by rising security challenges and various internal needs, such as improving their organizations, achieving more accurate border control and fighting international criminal trafficking more effectively.

Elsag Datamat’s LEAF aims to serve both police forces that are just improving their IT support and those requiring complete solutions, from base IT infrastructure to top-level decision support. The LEAF architecture includes the following main functions, from low-level to high-level information, as illustrated in Fig. 1:

• Infrastructure – IT Information Equipment, Sensors, Network, Applications and Management;
• Administration – Enterprise Resource Planning (ERP) for Agency personnel and other resources;
• Operations – support to Law Enforcement daily activities through recording all relevant events, decisions and documents;
• Emergency – facing and resolving events that compromise security, with real-time and efficient resource scheduling and possible escalation to crises;
• Intelligence – support to crime prevention, investigation and security strategy, through a suite of tools to acquire, process, analyze and disseminate useful information.

Fig. 1. LEAF Functional and Layered Architecture

We can now take a closer look at each main function, to recall which concrete functionalities lie behind our common idea of Law Enforcement.

2.1 Infrastructure

The IT infrastructure of a police force can host applications of similar complexity to business ones, but with more critical requirements for security, reliability, geographical distribution and communication with many fixed and mobile users. This requires the capability to perform most activities for managing complex distributed IT systems:

• Infrastructure management
• Service management
• Geographic Information System – common geo-processing service for the other applications in the system

2.2 Administration

This has the same requirements for personnel, resource and budget administration as multi-site companies, with stronger needs for supporting mission continuity.

• Enterprise Resource Planning (human resources, materials, vehicles, infrastructures, budget, procurement, etc.)
2.3 Operations

The Operations Support System is based on an RMS implementation, partly inspired by the American standard, recording all actors (such as people, vehicles and other objects), events (such as incidents, accidents, field interviews), activities (such as arrests, bookings, wants and warrants) and documents (such as passports and weapon licenses), and providing a common information ground for everyday tasks and for the operations of the upper levels (emergency and intelligence). In other words, this is the fundamental database of LEAF, whose data and services can be split as follows:

• Operational activity (events and actors)
• Judicial activity (support to justice)
• Administrative activity (support to security administration)

2.4 Emergencies

This is the core police activity for reacting to security-related emergencies, whose severity and implications can vary widely (e.g. from small robberies to large terrorist attacks). An emergency alarm can be triggered in several ways: by the police itself during surveillance and patrolling; by sensor-triggered automatic alarms; by citizens directly reporting events to agents; and by citizens calling a security emergency telephone number, such as 112 or 113 in Italy. IT support to organize a proper reaction is crucial for saving time and choosing the most appropriate means.

• Call center and Communications
• Emergency and Resource Management

2.5 Intelligence

This is the core support to prevention (detecting threats before they are put into action) and investigation (identifying the perpetrators and the precise modalities of committed crimes); in addition, it provides statistical and analytical data for understanding the evolution of crime and taking strategic decisions for future Law Enforcement missions. A more detailed description is given in the next section.
3 Intelligence and Investigations

Today's threats are asymmetric, international, aimed at striking more than at winning, and often driven by individuals or small groups. Maintaining Homeland Security requires, much more than before, careful monitoring of every information source through IT support, with a cooperative and distributed approach, in order to perform the classical Intelligence cycle on two basic tasks:

• Pursuing Intelligence targets – performing research on specific military or civil targets, in order to achieve timely and accurate answers and, moreover, to prevent specific threats before they are realized;
• Monitoring threats – listening to Open, Specialized and Private Sources, to capture and isolate security-sensitive information possibly revealing new threats, in order to generate alarms and react with more detailed investigation and careful prevention.

In addition to Intelligence tasks,

• Investigation relies on the capability to collect relevant information on past events and analyze it in order to discover links and details leading to a complete picture of the situation;
• Crisis management comes from emergency escalation, but requires further capabilities to simulate the evolution of the situation, as in intelligence, and to understand what has just happened, as in investigations.

The main Intelligence support functions can be defined as follows:

• Information source Analysis and Monitoring – the core information collection, processing and analysis
• Investigation and Intelligence – the core processes
• Crisis, Main events and Emergencies Management – the critical reaction to large events
• Strategies, Management, Direction and Decisions – understanding and forecasting the overall picture

Some supporting technologies are the same across the support functions above. In fact, every function involves a data processing flow, from sources to the final report, which can share similar steps with other flows.
This fact shows that understanding the requirements, modeling the operational scenario and achieving proper integration of the right tools are much more important steps than just acquiring a set of technologies.

Another useful viewpoint on the data processing flow is the classical Intelligence cycle, modeled with a similar approach by different military (e.g. NATO rules and practices for Open Source Analysis [4] and Allied Joint Intelligence [5]) and civil institutions, whose main phases are:

• Management - Coordination and planning, including resource and task management, mission planning (strategy and actions to take for getting the desired information) and analysis strategy (approach to distil and analyze a situation from the collected information). It uses the activity reports to evaluate results and possibly reschedule the plan.
• Collection - Gathering signals and raw data from any relevant source, acquiring them in a digital format suitable for further processing.
• Exploiting - Processing signals and raw data so that they become useful “information pieces” (people, objects, events, locations, text documents, etc.) which can be labeled, used as indexes and put into relation.
• Processing - Processing information pieces in order to get their relations and aggregate meaning, transformed and filtered in the light of the situation to be analyzed or discovered. This is the most relevant support to human analysis, although in many cases the analysis is strongly based on analyst experience and intuition.
• Dissemination - This does not mean diffusion to a large public, but rather aggregating the analysis outcomes in suitable reports which can be read and exploited by decision makers, with precise, useful and timely information.

The Intelligence process across phases and input data types can be represented by a pyramidal architecture of contributing tools and technologies, shown as boxes in Fig. 2; the boxes are not exhaustive and not necessarily related to the corresponding data types (below) and Intelligence steps (on the left). The diagram provides a closer look at the functions supporting the Intelligence phases, some of which are, or are becoming, commercial products supporting in particular Business Intelligence, Open Source analysis and the Semantic Web.

Fig. 2. The LEAF Intelligence architecture for integration of supporting technologies

An example of an industrial subsystem taking part in this vision is IVAS (Integrated Video Archiving System), capable of receiving a large number of radio/TV channels (up to 60 in current configurations), digitizing them at both Web stream and high quality (DVD) data rates, storing them in a disk buffer, allowing the operators to browse the recorded channels and perform both manual and automatic indexing, synthesizing commented broadcasts or clips, and archiving or publishing the selected video streams for later retrieval. IVAS manages the Collection and Exploiting phases with Audio/Video data, and indirectly supports the later processing. Unattended indexing modules include Face Recognition, Audio Transcription, Taxonomic and Semantic Recognition. IVAS implementations are currently in operation for the Italian National Command Room of Carabinieri and for the Presidency of the Republic.

In summary, this section has shown the LEAF component architecture and technologies to support Intelligence and Investigations. The Intelligence tasks look at future scenarios, already known or partially unknown. Support to their analysis thus requires a combination of explicit knowledge-based techniques and inductive, implicit information extraction, as is being studied with the so-called Hybrid Artificial Intelligence Systems (HAIS). An example of such a combination is shown in the next section.
Investigation tasks, instead, aim at reconstructing past scenarios, thus requiring the capability to model them and look for their related information items through text and multimedia mining on selected sources, also involving explicit knowledge processing, ontologies and conceptual networks.

4 Inductive Classification for Non-structured Information

Open sources are heterogeneous, unstructured, multilingual and often noisy, in the sense that they may be written with improper syntax and various mistakes, or be produced by automatic translation, OCR and other conversion techniques. Open sources to be monitored include press selections, broadcast channels, Web pages (often from variable and short-lived sites), blogs, forums and emails. All of them may be acquired in the form of documents of different formats and types, either organized in continuously renewed streams (e.g. forums, emails) or as specific static information, at most updated over time.

In such a huge and heterogeneous document base, classical indexing and text mining techniques may fail to find and isolate relevant content, especially with unknown target scenarios. Inductive technologies can be usefully exploited to characterize and classify a mix of information content and behavior, so as to classify sources without explicit knowledge indexing, recognize them based on their style, discover recurring subjects and detect novelties, which may reveal new hidden messages and possible new threats.

Inductive clustering of text documents is achieved with source filtering, feature extraction and clustering tools based on Support Vector Machines (SVM) [7], through a list of features depending on both content (most used words, syntax, semantics) and style (such as number of words, average sentence length, first and last words, appearance time or refresh frequency).
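As an illustration of the content and style features listed above, the following minimal Python sketch turns a document into a numeric feature vector. It is not the LEAF implementation: the function name `document_features`, the particular features chosen (word count, average sentence length, relative frequency of a few vocabulary words) and the naive sentence splitting are assumptions made only for this example.

```python
import re
from collections import Counter


def document_features(text, vocabulary):
    """Map one document to a feature vector (hypothetical feature choice)."""
    words = re.findall(r"\w+", text.lower())
    # Naive sentence split on '.', '!' and '?'; noisy OCR text would need more care.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = len(words)
    avg_sentence_len = n_words / len(sentences) if sentences else 0.0
    counts = Counter(words)
    # Content features: relative frequency of each vocabulary word.
    content = [counts[w] / n_words if n_words else 0.0 for w in vocabulary]
    # Style features come first, content features after.
    return [float(n_words), avg_sentence_len] + content


doc = "Police arrested two suspects. The suspects were known to police."
print(document_features(doc, ["police", "suspects", "bank"]))
```

Features such as appearance time or refresh frequency would come from source metadata rather than from the text itself, so they are left out of this sketch.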
Documents are clustered according to their vector positions and distances. The number of clusters is optimized by minimizing a distortion cost function, so as to achieve a good compromise between the compactness (avoiding a high number of clusters with a few documents each) and the representativeness (common meaning and similarity among the clustered documents) of the obtained clusters. Some clustering parameters can be tuned manually on document subsets. The clustered documents are then partitioned into different folders whose names include the most recurring words, excluding the “stop-words”, namely frequent words with low semantic content, such as prepositions, articles and common verbs. In this way we obtain a pseudo-classification of documents, expressed by the common concepts associated with the resulting keywords of each cluster.

The largest experiment was conducted on about 13,000 documents from a press selection taken from Italian newspapers in 2006, digitized through scanning and OCR. The document base was very noisy, with many words changed, abbreviated, concatenated with others, or missing; analyzing this sample with classical text processing, let alone semantic analysis, would have been problematic indeed, since language technology is very sensitive to syntactical correctness. Instead, with this clustering technique, most of the approximately 30 clusters obtained had a true common meaning among their documents (for instance, specific types of crime, economy, terrorist attacks, industry news, culture, fiction, etc.), with more specific situations expressed by the resulting keywords. Further, such keywords were almost always correct, except for a few clusters grouping documents so noisy that it would have been impossible to find any common sense. In practice, the document noise was filtered out when establishing the prevailing sense of the most representative clusters.

Content Analysis (CA) and Behavior Analysis (BA) can support each other in different ways.
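Before turning to those combinations, the clustering and folder-naming steps described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: plain k-means stands in for the clustering tool actually used, the cost function is assumed to be the mean squared distance to the cluster center plus a penalty proportional to the number of clusters, and the stop-word list, the function names and the penalty value are all hypothetical.

```python
from collections import Counter

# Illustrative stop-word list (assumption); a real list would be language-specific.
STOP_WORDS = {"the", "a", "of", "in", "and", "to", "is", "was"}


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points]


def kmeans(points, k, iters=50):
    """Plain k-means with deterministic farthest-point initialisation."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    for _ in range(iters):
        labels = assign(points, centers)
        new_centers = []
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            new_centers.append([sum(col) / len(members) for col in zip(*members)]
                               if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return assign(points, centers), centers


def choose_k(points, k_max, penalty):
    """Pick k minimising mean distortion + penalty * k (assumed cost function)."""
    best = None
    for k in range(1, k_max + 1):
        labels, centers = kmeans(points, k)
        distortion = sum(sq_dist(p, centers[l])
                         for p, l in zip(points, labels)) / len(points)
        cost = distortion + penalty * k
        if best is None or cost < best[0]:
            best = (cost, k, labels)
    return best[1], best[2]


def cluster_keywords(docs, labels, top_n=3):
    """Name each cluster by its most frequent words, stop-words excluded."""
    names = {}
    for j in set(labels):
        counts = Counter(w for d, l in zip(docs, labels) if l == j
                         for w in d.lower().split() if w not in STOP_WORDS)
        names[j] = [w for w, _ in counts.most_common(top_n)]
    return names


# Toy feature vectors and the documents they are assumed to come from.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
docs = ["the bank robbery in genova", "robbery of a bank", "bank robbery suspects",
        "stock market and economy", "economy and market news", "market economy growth"]
k, labels = choose_k(points, k_max=4, penalty=1.0)
print(k, cluster_keywords(docs, labels, top_n=2))
```

The penalty term plays the role of the compactness requirement: without it, distortion keeps decreasing as clusters are added, and the minimum would sit at one document per cluster.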
CA applied before BA can add content features to the clustering space. Conversely, BA applied before CA can reduce the number of documents to be processed for content, by isolating relevant groups expressing a certain type of style, source and/or conceptual keyword set. CA further contributes to scenario interpretation by applying reasoning tools to inspect clustering results in the light of domain-specific knowledge. Analyzing cluster contents may help application-specific ontologies discover unusual patterns in the observed domain. Conversely, novel information highlighted by BA might help dynamic ontologies update their knowledge in a semi-automated way.

Novelty Detection is obtained through the “outliers”, namely documents lying at a certain relative distance from their cluster centers, thus expressing a loose commonality with more central documents. Outliers from a certain source, for instance an Internet forum, can reveal some odd style or content with respect to the other documents.

Dealing with Intelligence for Security, we can have two different operational solutions combining CA and BA, respectively supporting Prevention and Investigation [6]. This is still the subject of experiments, the major difficulty being to collect relevant documents and some real, or at least realistic, scenarios.

Prevention-mode operation is illustrated in Fig. 3, and proceeds as follows.

1) Every input document is searched for basic terms, context, and possibly key concepts and relations among these (by using semantic networks and/or ontologies), in order to develop an understanding of the overall content of each document and its relevance to the reference scenario.
2a) In the knowledge-base set-up phase, the group of semantically analyzed documents forming a training set undergoes a clustering process, whose similarity metric is determined by both linguistic features (such as lexicon, syntax, style, etc.) and semantic information (concept similarity derived from the symbolic information tools).
2b) At run-time operation, each new document is matched with existing clusters; outlier detection, together with a history of the categorization process, highlights possibly interesting elements and subsets of documents bearing novel contents.
3) Since clustering tools are not able to extract mission-specific knowledge from input information, ontology processing interprets the detected trends in the light of possible criminal scenarios. If the available knowledge base cannot explain the extracted information adequately, the component may decide to bring an alert to the analyst’s attention and possibly tag the related information for future use.

[Figure 3 diagram. Block labels: Document(s), Content Analysis, Annotated Docum. Corpus, Behaviour Analysis, Set-up, Run-time, Ref. clusters, Ref. Dynamic Knowledge, Novelties, Analyzed novel scenario; step markers 1, 2a, 2b, 3.]
Fig. 3. Functional CA-BA combined dataflow for prevention mode

Novelty detection is the core activity of this operation mode, and relies on the interaction between BA and CA to define a ‘normal-state scenario’, to be used for identifying interesting deviations. The combination of inductive clustering and explicit knowledge extraction is promising in helping analysts to perform a gross classification of large, unknown and noisy document bases, find promising content in some clusters and hence refine the analysis through explicit knowledge processing. This combination lies between the information exploitation and processing phases of the intelligence cycle, as recalled in the previous section; in fact, it contributes both to isolating meaningful information items and to analyzing the overall results.

[Figure 4 diagram. Block labels: Ref. Investigation Scenario, Selected corpus, Content Analysis, Annotated Docum. Corpus, Behaviour Analysis, Relevant Groups, Structured hypothesis, Search strategy for missing information; step markers 1, 2, 3, 4.]
Fig. 4. Functional CA-BA combined dataflow for Investigation mode

Investigation-mode operation is illustrated in Fig. 4 and proceeds as follows.

1) A reference criminal scenario is assumed as a basic hypothesis.
2) As in prevention-mode operation, input documents are searched to develop an understanding of the overall content of each document and its relevance to the reference scenario.
3) BA (re)groups the existing base of documents by embedding the scenario-specific conceptual similarity in the document- and cluster-distance criterion. The result is a grouping of documents that indirectly takes into account the relevance of the documents to the reference criminal scenario.
4) CA uses high-level knowledge describing the assumed scenario to verify the consistency of BA. The output is a confirmation of the sought-for hypothesis or, more likely, a structural description of the possibly partial match, which provides useful directives for actively searching for missing information, which ultimately serves to validate or reject the investigative assumption.

Elsag Datamat already uses CA with different tools within the LEAF component for Investigation, Intelligence and Decision Support. The combination with BA is being tested on Italian document sets in order to set up a prototype for Carabinieri (one of the Italian security forces), to be tested in the field between 2009 and 2010. In parallel, multi-language CA-BA solutions are being studied for the needs of international police forces.

5 Conclusions

In this position paper we have described an industrial solution for integrated IT support to Law Enforcement Agencies, called LEAF, in line with the state of the art in this domain, including some advanced computational intelligence functions to support Intelligence and Investigations. The need for integrated Law Enforcement support, realized by the LEAF architecture, has been explained.
The key approach for LEAF computational intelligence is modularity and openness to different tools, in order to realize the most suitable processing workflow for any analysis need. The LEAF architecture organizes such a workflow to support, totally or partially, the intelligence cycle phases: acquisition, exploitation, processing and dissemination, all coordinated by management. Acquisition and exploitation involve multimedia data processing, while processing and dissemination work at the conceptual object level.

An innovative and promising approach to the analysis of Open and Private Sources combines Content and Behavior Analysis, the latter exploring the application of clustering techniques to textual documents, which are usually subject to text and language processing. BA shows the potential to tolerate the large volume, heterogeneity and information noise of large Open Sources, creating clusters whose most representative keywords can express an underlying scenario, either brought directly to the analysts’ attention or interpreted with the help of knowledge models.

References

1. Law Enforcement Record Management Systems (RMS) – Standard functional specifications by Law Enforcement Information Technology Standards Council (LEITSC) (updated 2006), http://www.leitsc.org
2. Europol – mission, mandates, security objectives, http://www.europol.europa.eu
3. European Security and Defense Policy – Wikipedia article, http://en.wikipedia.org/wiki/European_Security_and_Defence_Policy
4. NATO Open Source Intelligence Handbook (November 2001), http://www.oss.net
5. NATO Allied Joint Intelligence, Counter Intelligence and Security Doctrine. Allied Joint Publication (July 2003)
6. COBASIM Proposal for FP7 – Security Call 1 – Proposal no. 218012 (2007)
7. Jing, L., Ng, M.K., Zhexue Huang, J.: An Entropy Weighting k-Means Algorithm for Subspace Clustering of High-Dimensional Sparse Data. IEEE Transactions on Knowledge and Data Engineering 19(8) (August 2007)