In the name of Allah

Massive Data Algorithmics
An Introduction

Overview
• MADALGO
• SCALGO
• Basic Concepts
• The TerraFlow Project
• STREAM
• The TerraStream Project
• TPIE

MADALGO - Introduction
• Center for MAssive Data ALGOrithmics
• A major basic research center funded by the Danish National Research Foundation
• Covers all areas of the design, analysis and implementation of algorithms and data structures for processing massive data

MADALGO - Four core research areas
• I/O-efficient algorithms
  ◦ Algorithms designed in a two-level external memory (or I/O) model
  ◦ The memory hierarchy consists of a main memory of limited size M and an external memory (disk) of unlimited size
  ◦ The goal is to minimize the number of times a block of B consecutive elements is read from (or written to) disk (an I/O operation, or simply an I/O)

MADALGO - Four core research areas
• Cache-oblivious algorithms
  ◦ Algorithms designed in the I/O model, but without knowledge of M and B, and then analyzed as I/O-model algorithms
  ◦ The analysis holds simultaneously on all levels of any multi-level memory hierarchy
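The cost measure of the I/O model above can be made concrete with a small sketch. It uses only the parameters M and B from the slides; the function names and the external merge sort accounting (one run-formation pass followed by (M/B - 1)-way merge passes, each pass reading and writing every block once) are illustrative assumptions, not something taken from the slides or from MADALGO software.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b for non-negative a and positive b."""
    return -(-a // b)


def scan_ios(n: int, b: int) -> int:
    """I/Os needed to read n consecutive elements in blocks of b elements."""
    return ceil_div(n, b)


def merge_sort_ios(n: int, m: int, b: int) -> int:
    """Estimated I/Os of a textbook external merge sort (assumed accounting).

    Pass 1 forms sorted runs of m elements; each later pass merges
    (m // b - 1) runs at a time, keeping one block free for output.
    Every pass reads and writes each block once.
    """
    if m < 3 * b:
        raise ValueError("need room for at least two input blocks and one output block")
    blocks = ceil_div(n, b)
    runs = ceil_div(n, m)
    fan_in = m // b - 1
    passes = 1  # run formation
    while runs > 1:
        runs = ceil_div(runs, fan_in)
        passes += 1
    return 2 * blocks * passes
```

With N = 1,000,000, M = 10,000 and B = 100, this gives 10,000 I/Os for a scan and 60,000 I/Os for the modelled sort, compared with up to 1,000,000 I/Os if every element were fetched with its own I/O.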
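A cache-oblivious algorithm never mentions M or B in its code. As a hedged illustration, the sketch below uses recursive matrix transposition, a standard example of the technique that the slides do not name. The function name and the `base` cutoff are assumptions; `base` only stops the recursion and is not tuned to any memory parameter. Python lists are not laid out contiguously, so this shows the recursion structure rather than real cache behaviour.

```python
def transpose(a, out, r0, r1, c0, c1, base=2):
    """Write the transpose of a[r0:r1][c0:c1] into out.

    The longer side of the sub-matrix is halved recursively, so at some
    recursion depth the sub-problem fits in any given level of the memory
    hierarchy, without the code knowing that level's size.
    """
    rows, cols = r1 - r0, c1 - c0
    if rows <= base and cols <= base:
        # Small sub-matrix: transpose it directly.
        for r in range(r0, r1):
            for c in range(c0, c1):
                out[c][r] = a[r][c]
    elif rows >= cols:
        mid = r0 + rows // 2
        transpose(a, out, r0, mid, c0, c1, base)
        transpose(a, out, mid, r1, c0, c1, base)
    else:
        mid = c0 + cols // 2
        transpose(a, out, r0, r1, c0, mid, base)
        transpose(a, out, r0, r1, mid, c1, base)
```

For a 2 x 3 input, `out` must be preallocated as a 3 x 2 list of lists before calling `transpose(a, out, 0, 2, 0, 3)`.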

MADALGO - Four core research areas
• Streaming algorithms
  ◦ Only one (or a small constant number of) sequential pass(es) over the data is allowed
  ◦ Solve a given problem using significantly less space than the input data size
  ◦ Process each data element as fast as possible

MADALGO - Four core research areas
• Algorithm engineering
  ◦ The design and analysis of practical algorithms
  ◦ Efficient implementation of these algorithms
  ◦ Experimentation that provides insight into their applicability and further improvements

SCALGO
• SCALGO: SCALable alGOrithmics
• Founded in 2009 in Aarhus, Denmark
• Mission: to bring cutting-edge massive terrain data-processing technology to market

Terrain
• Terrain: the vertical and horizontal dimension of land surface

LIDAR
• LIDAR: Light Detection And Ranging
• An optical remote sensing technology
• Measures the distance to, or other properties of, a target by illuminating the target with light
• Often uses pulses from a laser

Point cloud
• A set of vertices in a three-dimensional coordinate system
• Usually defined by X, Y, and Z coordinates
• Typically intended to be representative of the external surface of an object

DEM
• DEM: Digital elevation model
• A digital model or 3D representation of a terrain's surface
  ◦ The two most used types of DEM are the regular grid and the triangulated irregular network (TIN)

Regular grid DEM
• A matrix of equally spaced points, each point having x, y and z coordinate values

Regular grid DEM - Quadtree
• A tree data structure in which each internal node has exactly four children
• Most often used to partition a two-dimensional space by recursively subdividing it into four quadrants or regions

Triangulated Irregular Network (TIN)
• Irregularly distributed nodes and lines with three-dimensional coordinates, arranged in a network of non-overlapping triangles

TIN - Delaunay triangulation
• A triangulation for a set of points such that no point is inside the circumcircle of any triangle
• Maximizes the minimum angle of all the angles of the triangles
  in the triangulation
• Tends to avoid skinny triangles

The TerraFlow Project
• Has emerged from the experiences with terrain analysis applications which do not scale up to large datasets
• A software package for computing flow routing and flow accumulation on massive grid-based terrains
• Based on theoretically optimal algorithms designed using external memory paradigms

Flow direction, flow routing and flow accumulation
• The flow directions of a cell correspond to the directions in which water would flow if poured at that cell onto the terrain
  ◦ Water cannot go uphill
• The flow routing problem: the problem of assigning flow directions to all cells in the DEM such that
  1. flow directions do not induce any cycles;
  2. every cell has a flow path off the edge of the terrain
• The flow accumulation of a terrain is an index which estimates the surface runoff for each cell in the terrain

STREAM - Introduction
• STREAM: Scalable Techniques for hiResolution Elevation data Analysis and Modeling
• Located in the CS department at Duke University
• Funded by the U.S.
  Army Research Office

STREAM - Projects
• Constructing DEM
  ◦ Developed two methods for efficiently converting LIDAR point sets to more conventional formats:
    - Grid Construction: uses a quad-tree segmentation
    - TIN Construction: uses a Delaunay triangulation algorithm
• Terrain Flow Modeling
  ◦ Improvements to existing work done as part of the TerraFlow project

STREAM - Projects
• Noise Removal
  ◦ There is some level of noise in DEMs derived from LIDAR
  ◦ Computes a persistence score for topological features
  ◦ Uses this persistence score to remove small topological features likely the result of noise

STREAM - Projects
• Hierarchical Watershed Decomposition
  ◦ Partitions a terrain into a hierarchy of nested watersheds

STREAM - Projects
• Topographic Change
  ◦ Detecting topographic change can quickly identify beach dunes damaged by hurricanes, monitor urban development or measure change in forest growth

TerraSTREAM - Introduction
• A series of libraries and front-ends for these libraries
• Allows the user to perform a series of computational tasks on very large digital elevation models
• The data is represented either as a TIN or a GRID
• A collaboration between Duke University CS researchers and researchers at MADALGO

TerraStream - Features
• DEM Construction
  ◦ Computes a digital elevation model (DEM) from a point cloud
  ◦ The input data is typically gathered using LIDAR
  ◦ Constructs both TINs and grids

TerraStream - Features
• DEM Topological Conditioning
  ◦ Simplifies digital elevation models by first identifying and then removing insignificant geographical features
  ◦ Significance is the feature's height, area and volume or any combination of these
  ◦ A feature is insignificant if its significance is smaller than some threshold specified by the user

TerraStream - Features
• Flow Routing
  ◦ Compute flow directions for each data point in a DEM
  ◦ The routing models supported are
    - steepest-flow-descent
    - multiple-flow-directions
    - flux decomposition
• Flow Accumulation
  ◦ Accumulate amounts of, e.g., water on a DEM along
    flow paths as computed by the flow routing module

TerraStream - Features
• Flood Simulation
  ◦ Flood Mask
    - Computes a mask of the cells that are flooded if the water level were raised 'x' units
  ◦ General
    - Transforms a DEM to a new DEM
    - The height of each cell in the produced DEM is the minimum height that the water level needs to be raised to in order for that particular cell to flood

TerraStream - Features
• Contour Map Computation
  ◦ Computes the contour map of a terrain

TerraStream - Features
• Raster Quality Assessment
  ◦ Takes a raster and a point cloud
  ◦ Computes how far the center of each raster cell is from the closest point in the point cloud
  ◦ Makes it easy to spot areas of the grid where there are no points close by
  ◦ If the point cloud used is the same one used for generating the input raster, this can be used for quality control of the point cloud, the classification algorithm used and the produced raster

TerraStream - Features
• Watershed Hierarchy Construction
  ◦ Construct a Pfafstetter labeling of the watersheds of a DEM
• LS-Factor Computation
  ◦ LS-factor: an aggregate of the slope length factor (L) and the slope steepness factor (S)
  ◦ Estimates the effects of slope length and steepness on erosion
• Format Flexibility
  ◦ Reading and writing mosaic grids in many common formats

TPIE - Introduction
• TPIE: The Templated Portable I/O Environment
• A tool-box providing efficient and convenient tools to ease the implementation of algorithms and data structures on very large sets of data
• The algorithms and data structures that form the core of TPIE all provide efficient worst-case space, time and disk usage guarantees
• On Windows, TPIE is known to work with the Microsoft Visual Studio 2008 and 2010 compilers

TPIE - Example
• Internal sorting

TPIE - Example
• Reading and writing file streams

TPIE - Example
• External sorting

TPIE - Example
• Priority queue

TPIE - I/O parameters
• M and B
• get_block_size() implementation

TPIE - I/O parameters
• Elements’ block size
  ◦ Pass the block factor to the constructor

The End
Thank you for your time
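As a supplement to the flow routing and flow accumulation slides, here is a minimal in-memory sketch of the steepest-descent model on a tiny grid DEM. It is not TerraFlow or TerraStream code. The function names, the 8-neighbour rule, the one-unit-per-cell starting amount and the handling of cells without a lower neighbour (they simply keep their water) are all assumptions made for illustration.

```python
import math


def flow_directions(dem):
    """Map each cell (r, c) to its steepest strictly lower neighbour, or None."""
    rows, cols = len(dem), len(dem[0])
    direction = {}
    for r in range(rows):
        for c in range(cols):
            best, best_slope = None, 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    nr, nc = r + dr, c + dc
                    if (dr or dc) and 0 <= nr < rows and 0 <= nc < cols:
                        # Slope = height drop divided by distance to the neighbour.
                        slope = (dem[r][c] - dem[nr][nc]) / math.hypot(dr, dc)
                        if slope > best_slope:  # water cannot go uphill
                            best, best_slope = (nr, nc), slope
            direction[(r, c)] = best
    return direction


def flow_accumulation(dem, direction):
    """Each cell starts with one unit and passes its total to its downstream cell."""
    acc = {cell: 1 for cell in direction}
    # Visiting cells from highest to lowest guarantees that a cell has received
    # all upstream water before it passes its own total on.
    by_height = sorted(direction, key=lambda rc: dem[rc[0]][rc[1]], reverse=True)
    for cell in by_height:
        target = direction[cell]
        if target is not None:
            acc[target] += acc[cell]
    return acc
```

Because every flow direction points strictly downhill, the directions cannot form a cycle, which matches condition 1 of the flow routing problem. On the DEM `[[9, 8, 7], [6, 5, 4], [3, 2, 1]]`, all nine units end up in the lowest corner cell.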