Building Internet-Scale Distributed Systems for Fun and Profit

Peter R. Pietzuch <prp@doc.ic.ac.uk>
Large-Scale Distributed Systems Group
http://platypus.doc.ic.ac.uk
Distributed Software Engineering (DSE) Section
Department of Computing, Imperial College London

Oxford University Computer Laboratory – Oxford – June 2009

Internet-Scale Distributed Systems

• Search engines (e.g. Google, Yahoo, ...)
  – Global crawling, indexing and search
  • Google: over 450,000 servers in at least 30 data centres world-wide (?)
• Content delivery networks (CDNs) (e.g. Akamai, Limelight, ...)
  – Scalable web hosting, file distribution, media streaming, ...
  • Akamai: hosting for Microsoft.com, CNN.com, BBC iPlayer, ...
• Social networking sites (e.g. Facebook, Twitter, LinkedIn, ...)
  • Facebook: serves 200 million users and stores 40 billion photos
• Cloud computing applications (e.g. Amazon, Microsoft, Google, ...)
  – Pay-as-you-use storage and computation for applications
  • Amazon: bought servers worth $86 million in 2008 alone

Internet-Scale Distributed Systems

• Peer-to-peer computing (e.g. BitTorrent, BOINC, ...)
  – Contribute users' resources for file sharing, scientific computing
  • BitTorrent: "1/3 of all Internet traffic" (?) [CacheLogic'04]
  • @home computing: Quake-Catcher@home, SETI@home
• Large-scale test-beds (e.g. PlanetLab, Emulab, ...)
  – Possible to deploy research systems in the real world
  • PlanetLab: 1041 nodes at 500 sites (May'09)
  – Great for student projects!

Properties of Internet-Scale Systems

• Large number of users, requests, resources, ...
  – Single/multiple data centres, hosts and/or mobile clients
  ⇒ Requirement: Scalability
• Wide-area Internet communication
  – Cannot ignore network effects
  ⇒ Requirement: Network-awareness
• Long-running, 24/7 service
  – Must adapt to changing conditions and failure
  ⇒ Requirement: Fault-tolerance

Why is Building Internet-Scale Systems Hard?

• Scalability is hard to achieve
  – How to organise 1000s of processing hosts?
  – What is the programming model?
• Applications must be intelligent about network use
  – How can we achieve application requirements?
• Continuous network and node failures
  – Lead to data loss, resource shortages, inconsistency
  • PlanetLab: 630 healthy machines out of 1041 total (May'09)
  • Google: 1 failure per hour in 10,000-node clusters [source: Google]

High-level Abstractions Help

• Google uses several layers of abstraction
  – Runs applications (search, mail, ...) on top of the highest layer
  – Each layer is scalable, network-aware and fault-tolerant
  [Figure: Google Apps on top of MapReduce computation, BigTable storage system, Chubby lock service and the Google File System]

Large-Scale Distributed Systems Group

• Research goal: "Support the design and engineering of scalable and robust Internet-wide applications"
• Need to provide higher-level abstractions at different layers
  – Many success stories from research exist
  • e.g. overlay networks, distributed hash tables, network coordinates, storage and replication mechanisms, ...
  – Combination of networks, distributed systems & database research
  [Figure: application layer, data management layer and network layer]

Talk Structure

III. Data management layer: Supporting imperfect data processing
    DISSP Project: Dependable Internet-scale stream processing
II. Application layer: Building adaptive overlay networks
    LANC CDN Project: Network/load-aware content delivery
I. Network layer: Improving Internet routing
    Ukairo Project: Detour routing for applications
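The MapReduce layer in the Google stack above gives applications a simple programming model over thousands of hosts. A toy, single-process sketch of the idea (the function `map_reduce` and its arguments are illustrative names, not Google's API):

```python
from collections import defaultdict

def map_reduce(records, map_fn, reduce_fn):
    """Toy single-process MapReduce: map each record to (key, value)
    pairs, group the values by key, then reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)
    return {key: reduce_fn(key, values) for key, values in groups.items()}

# Classic word count: map emits (word, 1), reduce sums the counts.
counts = map_reduce(
    ["to be or not to be"],
    map_fn=lambda line: [(w, 1) for w in line.split()],
    reduce_fn=lambda key, values: sum(values),
)
```

In the real system, the grouping step is a distributed shuffle and each phase runs on many machines; the application only supplies the two functions.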
I. Improving Internet Routing

• Internet-scale applications want custom communication paths
  – Skype wants a path with low packet loss
  – iTunes wants a path with a high download rate
• The Internet uses a two-level hierarchical routing scheme
  [Figure: hosts a and b connected across AS1–AS6]
  – Internet hosts are part of autonomous systems (ASs)
  • Inter-AS routing (BGP) and intra-AS routing (OSPF)
• Internet routing optimises for ISPs' concerns!
  – One path for all applications and no control over the returned path

Taking Detours on the Internet

• Idea: Take multiple Internet paths and stitch them together
  [Figure: direct path from a to b across AS1–AS6 vs. a detour path via node d]
  – The resulting detour path may have better properties
• What causes Internet detour paths?
  – Inter-AS routing not optimal + limited expressiveness

Finding Detours in the AS Graph [IPTPS'09]

• Idea: Analyse detours in the Internet AS graph
  – Assume that similar AS-level paths benefit from similar detours
  [Figure: paths a–b and c–e sharing an AS link; a known good detour suggests a potential good detour]
  – Perform clustering on a similarity metric: shared link count

Ukairo Project: Detour Routing for Applications

• Deploying a general-purpose detour routing plane on PlanetLab
  – Continuously searches for Internet detour paths
  – Nodes exchange found detours using gossiping
  – Applications can use it transparently, e.g. web browser download
• Open research questions
  – What is the overhead of finding detour paths?
  – What happens if everybody uses detour routing?
  – What do ISPs think about this?
  – What are the lessons for future Internet designs?

Talk Structure

III. Data management layer: Supporting imperfect data processing
    DISSP Project: Dependable Internet-scale stream processing
II. Application layer: Building adaptive overlay networks
    LANC CDN Project: Network/load-aware delivery of content
I. Network layer: Improving Internet routing
    Ukairo Project: Detour routing for applications
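The shared-link-count similarity metric from the AS-graph slides can be sketched as follows; the helper names are illustrative, and the ranking step stands in for the clustering described on the slide:

```python
def as_links(path):
    """Return the set of directed AS-level links along an AS path."""
    return {(path[i], path[i + 1]) for i in range(len(path) - 1)}

def shared_link_count(path_a, path_b):
    """Similarity metric from the slide: number of AS links two paths share."""
    return len(as_links(path_a) & as_links(path_b))

def rank_detour_candidates(new_path, known_good_paths):
    """Rank paths with known good detours by how many AS links they share
    with the new path; similar paths are assumed to benefit from similar detours."""
    return sorted(known_good_paths,
                  key=lambda p: shared_link_count(new_path, p),
                  reverse=True)

# A path sharing the links AS3->AS5 and AS5->AS7 ranks first:
known = [["AS1", "AS3", "AS5", "AS7"], ["AS2", "AS4", "AS6"]]
best = rank_detour_candidates(["AS2", "AS3", "AS5", "AS7"], known)[0]
```

A detour that worked for the best-ranked path is then a candidate detour for the new path.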
II. Building Adaptive Overlay Networks

• Imagine your start-up idea of "mugbook" becomes an overnight success...
• How do you support such a website?
  – Single web server?
  – Multiple web servers in a single data centre?

Content Delivery Networks

• Content delivery networks (CDNs) serve content to many clients world-wide
  – Overlay network consists of:
  • Distributed set of servers that maintain content replicas
  • Clients (web browsers) that request content

Mapping Clients to Content Servers

• How do we assign clients to content servers?
  – Load awareness
  • Don't direct clients to overloaded content servers
  – Network awareness
  • Don't send traffic on congested network paths
• Many heuristics proposed in the past
  – Geographic location
  – Clustering of address prefixes
  – Proprietary solutions

Cost Graph

• Associate each client/server pair with a cost
  – Use download times from servers as the cost metric
  • Incorporates load and network congestion
• But: measurement overhead remains high
  – Can't measure all costs – need to estimate missing ones

Network Coordinates

• Idea: Assume the cost graph is embeddable in a metric space
  – Approximate missing measurements using Euclidean distances
• Assign each client/server a network coordinate C
  – Distances between coordinates estimate download costs
  | C(Client1) – C(Server1) | = download_time

Computing Network Coordinates

• Scalable, decentralised computation, e.g. using the Vivaldi algorithm [Dabek'04]
  – 2-5 dimensions sufficient in practice
  – Low measurement overhead
  – Continuous process
  [Figure: embedding of ~1500 web servers with network delay as cost]
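A minimal sketch of a Vivaldi-style coordinate update: each node nudges its coordinate so that Euclidean distance to a peer better matches the measured cost. This is a simplification (fixed damping factor, no adaptive error weighting as in the full algorithm):

```python
import math

def vivaldi_update(coord, peer_coord, measured_cost, delta=0.25):
    """One simplified Vivaldi step: move our coordinate along the line
    to the peer, proportionally to the prediction error."""
    diff = [c - p for c, p in zip(coord, peer_coord)]
    dist = math.sqrt(sum(d * d for d in diff)) or 1e-9  # avoid division by zero
    error = measured_cost - dist          # positive: we sit too close in the space
    unit = [d / dist for d in diff]       # unit vector pointing away from the peer
    return [c + delta * error * u for c, u in zip(coord, unit)]

# Repeated updates shrink the prediction error for a single pair:
a, b = [0.0, 0.0], [10.0, 0.0]
for _ in range(50):
    a = vivaldi_update(a, b, measured_cost=25.0)
predicted = math.dist(a, b)   # converges towards the measured cost of 25.0
```

In the deployed system every node runs this continuously against a few gossiped peers, so the whole embedding tracks changing network conditions.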
LANC Content-Delivery Network [ROADS'08]

• Use network coordinates to organise content servers and clients
  – Clients keep track of content servers in their "neighbourhood"
  – Map clients to the "nearest" content servers in the space
  • Overloaded content servers "move away"

Does it really work? (Yes!)

• Deployed the LANC CDN on PlanetLab
  – 119 content servers and 16 clients
  – Downloaded a Linux distribution from 100 web servers world-wide
  – Tried several different assignment strategies
  [Figure: CDF of transfer data rate per request (KB/s) for the LANC CDN, Nearest, Random and Direct strategies]

Talk Structure

III. Data management layer: Supporting imperfect data processing
    DISSP Project: Dependable Internet-scale stream processing
II. Application layer: Building adaptive overlay networks
    LANC CDN Project: Network/load-aware delivery of content
I. Network layer: Improving Internet routing
    Ukairo Project: Detour routing for applications

III. Supporting Imperfect Data Processing

• Global sensing infrastructures
  [Figure: users and applications on top of a data collection, fusion, aggregation & dissemination layer, fed by mobile sensing devices, traffic monitors, scientific instruments, RFID tags, cameras, body sensor networks, web feeds, embedded sensors, wireless sensor networks and web content]
  – Run continuous queries over sensor streams
  – Failure takes out resources

Stream Data Model

• Data sources emit streams of data tuples
  – Tuples follow a schema with fields, e.g. (ts, coord, image)
• Users submit declarative queries
  – Range of operators (filter, join, transform, ...) process data tuples
  [Figure: query graph with an image merging operator fed by two coordinate transform operators]
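The operator pipeline on the Stream Data Model slide can be sketched with Python generators; the operator names are illustrative, not the system's API:

```python
def source(tuples):
    """A data source emitting a stream of schema'd tuples (dicts)."""
    yield from tuples

def filter_op(stream, predicate):
    """Filter operator: pass through only tuples matching the predicate."""
    return (t for t in stream if predicate(t))

def transform_op(stream, fn):
    """Transform operator: rewrite each tuple, e.g. a coordinate transform."""
    return (fn(t) for t in stream)

# A query over tuples with fields (ts, coord, image), as on the slide:
readings = source([
    {"ts": 1, "coord": (0, 0), "image": "a.png"},
    {"ts": 2, "coord": (5, 5), "image": "b.png"},
])
query = transform_op(
    filter_op(readings, lambda t: t["ts"] > 1),
    lambda t: {**t, "coord": (t["coord"][0] * 2, t["coord"][1] * 2)},
)
results = list(query)
```

Because generators are lazy, tuples flow through the operator graph one at a time, mirroring how a continuous query processes an unbounded stream.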
Failure Recovery in Stream Processing

• Use redundant resources to achieve dependability
  [Figure: two copies of the query graph, each with an image merging operator and coordinate transform operators]
  – Run multiple copies of the same query operator
• But: an Internet-scale system may not have enough spare resources
  – Instead accept degradation in processing quality
• Idea: Enhance the stream data model to include quality information

Quality-Centric Stream Data Model

• Enhance data tuples with:
  – Weight: number of data sources in the tuple
  – Recall: fraction of received tuples
  [Figure: operator tree in which source tuples D1–D6, each with weight 1 and recall 1, are merged into tuple D7 (weight 2, recall 0.75) and tuple D8 (weight 3, recall 0.83)]

What is it Good for?

• Provide feedback about result quality to users
  – Measure of how much data made it into the result tuple
• Allow the system to handle node and network failures
  1. Proactive operator replication
  • Invest resources where the failure impact is highest
  2. Reactive failure recovery
  • Decide based on lost recall whether recovery is worthwhile
• Support for smart load-shedding under resource shortage
  – Discard tuples with the lowest impact on overloaded processing nodes

DISSP Project: Dependable Internet-Scale Stream Processing

• Currently building a prototype system
  – Anybody will be able to connect sensor sources and run queries
  – System provides a best-effort service given available resources
• Open questions
  – What's the right data model for processing sensor data?
  – How to discover data sources in a scalable fashion?
  – How to perform query optimisation at a global scale?
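The weight/recall bookkeeping from the quality-centric data model can be sketched as follows. The merge rule here is an assumption for illustration (weights add up; recall is the weight-averaged recall of what arrived relative to what was expected), not necessarily DISSP's exact definition:

```python
from dataclasses import dataclass

@dataclass
class QTuple:
    """A stream tuple carrying quality metadata: `weight` counts the data
    sources behind it, `recall` the fraction of expected input it reflects."""
    data: object
    weight: int = 1
    recall: float = 1.0

def merge(received, expected_weight):
    """Merge the tuples that actually arrived at an operator (assumed rule:
    output weight sums the input weights, and recall reflects how much of
    the expected source weight made it into the result)."""
    weight = sum(t.weight for t in received)
    contributed = sum(t.weight * t.recall for t in received)
    return QTuple(data=[t.data for t in received],
                  weight=weight,
                  recall=contributed / expected_weight)

# Two of three expected unit-weight source tuples arrive:
out = merge([QTuple("a"), QTuple("b")], expected_weight=3)
```

A downstream consumer can then inspect `out.recall` to judge how much data made it into the result, and a load-shedder can preferentially drop low-weight tuples.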
Research Outlook

• Programming model
  – What are the right abstractions for building Internet-scale systems?
  • Need a richer Internet interface – not just send(packet, dest_IP)
  – How do we build robust cloud applications?
  • Currently too much focus on low-level services
• System management
  – How do we provision Internet-scale systems?
  • Scale up/down for a sudden rise in popularity – "flash crowds"
• Testing and evaluation
  – How do we test, debug and evaluate Internet-scale systems?
  • Hard to obtain reproducible results from PlanetLab experiments

Conclusions

⇒ Internet-scale apps have new network requirements
  – "One size doesn't fit all" – but it's hard to change the Internet
  Ukairo: Overlay networks can provide custom routing
⇒ Internet-scale systems need new overlay abstractions
  – Apply geometric algorithms to solve distributed systems problems
  LANC CDN: Metric space for node organisation in CDNs
⇒ Internet-scale systems require new data models
  – Unrealistic to expect perfect processing
  – Instead accept failure and overload as a fact of life
  DISSP: Make the impact of failure on processing explicit

Thank You! Any Questions?

Peter Pietzuch <prp@doc.ic.ac.uk>
http://platypus.doc.ic.ac.uk