My flight has landed 30 minutes early. We’re sitting on the tarmac looking at a row of vacant gates at the terminal. Then the pilot announces “our gate isn’t ready” and we sit, pointlessly, until our scheduled arrival time. What a waste of time and fuel.
How can this happen? Isn’t it simple? There’s an empty gate; aren’t they like parking spots?
It turns out that it isn’t that simple. In fact, it’s really hard.
The air transport system consists of hundreds of airports, thousands of planes, tens of thousands of routes, millions of staff, many more millions of passengers and millions of items of luggage. Plus fuel trucks, catering, maintenance facilities, air traffic control, and radar installations. And each of those components breaks down further. Take a pilot, for example. Where are they staying? How long will it take them to reach the airport? How many more hours are they allowed to fly? Are they physically well? Every item breaks down, fractally, ad infinitum. To make things more complex still, there are myriad dependencies between them. Just suppose that a weather delay means a pilot reaches their flight-hours limit. Now the plane can’t fly, meaning the cabin crew won’t make their next flight in time, and so forth.
With so much to go astray, it’s amazing any journey is ever completed. Of course, the thousands of interlocked companies that make up the system understand this. They know from experience how much redundancy they need to factor into their operations so that they can achieve the appropriate service levels. To make the system more flexible, more able to respond to, say, an early arrival, the system could add redundancy. But that’s expensive. Intolerably so in such a cost-driven industry.
Our hypothetical early arrival is pretty trivial. How about a more serious disturbance? A volcanic eruption in Iceland that grounds half the airliners in Europe for hours or days. Now a massive number of items – planes, people, luggage etc. – are in the wrong place. What’s the fastest, cheapest way to recover the system? Graph analytics gives a set of tools to work on problems like this.
What’s a graph?
This data set can be represented as a graph. A graph represents objects as points, or nodes, and the relationships between pairs of objects as lines, or edges, connecting them. In the graphic, the red nodes might represent aircraft, with the blue nodes representing passengers. The edges indicate which passengers are on which plane on a given day. Note that some passengers, presumably with multi-leg itineraries, are connected to more than one aircraft.
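As a minimal sketch of the aircraft-and-passenger picture, a graph can be held as a mapping from each node to the set of nodes it shares an edge with. The passenger names, aircraft identifiers and the `build_graph` helper are all invented for illustration.

```python
from collections import defaultdict

# Hypothetical bookings for one day: each pair is one edge (passenger, aircraft).
bookings = [
    ("alice", "plane_1"),
    ("bob", "plane_1"),
    ("bob", "plane_2"),  # bob has a multi-leg itinerary
    ("carol", "plane_2"),
]


def build_graph(edges):
    """Return an undirected graph as a dict: node -> set of neighbouring nodes."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


graph = build_graph(bookings)

# Passengers connected to more than one aircraft.
multi_leg = sorted({p for p, _ in bookings if len(graph[p]) > 1})
print(graph["plane_1"])
print(multi_leg)
```

Asking "who is on plane_1?" or "who is flying more than one leg?" becomes a lookup on the edges rather than a search through separate tables.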
There are many other real-world examples that can be represented by graphs, such as:
- Customer relationship management – who knows whom, what products have they bought, how were they configured, etc.
- Shipping – similar to the airline case, but modelling the movement of packages
- Security – analyzing users, machines, DNS logs etc. to identify unusual activity
- Social networks – understanding social influence, the propagation of marketing campaigns and so on
- Physical networks – roads, telecommunications networks, power grids
What can graph analytics do?
The powerful thing about graph analytics, the set of tools for manipulating graph data, is that it lets you retrieve information about the relationships between objects (the connections between nodes), not just the properties of the objects themselves (the data in the nodes). For instance, you can uncover communities, or clusters, of nodes with similar characteristics.
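The simplest version of finding clusters is to split a graph into its connected components, the groups of nodes that can reach one another. The sketch below does that for an invented who-knows-whom graph. Real community detection usually relies on more sophisticated measures, so treat this as an illustration of the idea rather than a production method.

```python
def connected_components(graph):
    """Group nodes into sets whose members are reachable from one another.

    graph: dict mapping node -> set of neighbouring nodes (undirected).
    """
    seen = set()
    components = []
    for start in graph:
        if start in seen:
            continue
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(graph[node] - seen)
        components.append(component)
    return components


# Hypothetical "who knows whom" data for a handful of customers.
contacts = {
    "ann": {"ben"},
    "ben": {"ann", "cy"},
    "cy": {"ben"},
    "dee": {"eli"},
    "eli": {"dee"},
}

clusters = connected_components(contacts)
print(clusters)
```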
You can detect what’s normal, what’s changing and what’s abnormal. A pattern of new activity might indicate an unfolding security breach in a company’s network, allowing action to be taken before any sensitive data can be compromised.
Another powerful capability is to find the most important, or “critical” nodes. To use the security case again, it might be possible to find a computer that is the source of a security breach based solely on criticality, without having to actually understand in detail what that particular machine is doing.
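One of the simplest measures of how "critical" a node is, and only one of many, is degree centrality: the share of the other nodes it is directly connected to. The sketch below applies it to an invented office network. The machine names are hypothetical, and nothing here implies this is the measure a real security tool would use.

```python
def degree_centrality(graph):
    """Return node -> fraction of the other nodes it is directly connected to."""
    others = len(graph) - 1
    return {node: len(neighbours) / others for node, neighbours in graph.items()}


# Hypothetical network: which machines talk to which.
network = {
    "server": {"pc1", "pc2", "pc3"},
    "pc1": {"server"},
    "pc2": {"server", "pc3"},
    "pc3": {"server", "pc2"},
}

scores = degree_centrality(network)
most_critical = max(scores, key=scores.get)
print(most_critical, scores[most_critical])
```

The ranking uses only the shape of the connections. It needs no knowledge of what each machine is actually doing.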
Lastly, graph analytics can find the optimal path to achieve a stated goal. The best sequence of plane, personnel and passenger movements to recover from that Icelandic volcano disruption, for example. Airlines report that it can take 30 on-time flights to recover from one disrupted one. Just imagine how much money (and customer satisfaction) is on the line in case of a major incident!
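A toy version of the optimal-path problem is finding the cheapest route between two nodes when every edge carries a cost. The sketch below uses Dijkstra's algorithm on an invented set of routes with made-up flight times in hours. Recovering a whole airline network involves far more constraints than this, so it only illustrates the basic building block.

```python
import heapq


def shortest_path(graph, source, target):
    """Dijkstra's algorithm. graph maps node -> {neighbour: non-negative cost}."""
    best = {source: 0}
    previous = {}
    queue = [(0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == target:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry; a cheaper route was already found
        for neighbour, weight in graph.get(node, {}).items():
            new_cost = cost + weight
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                previous[neighbour] = node
                heapq.heappush(queue, (new_cost, neighbour))
    if target not in best:
        return None, float("inf")
    # Walk backwards from the target to rebuild the route.
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    return path[::-1], best[target]


# Hypothetical routes and flight times in hours.
routes = {
    "LHR": {"CDG": 1.5, "FRA": 2.0},
    "CDG": {"FRA": 1.0, "MAD": 2.0},
    "FRA": {"MAD": 2.5},
    "MAD": {},
}

path, hours = shortest_path(routes, "LHR", "MAD")
print(path, hours)
```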
Why isn’t graph analytics better known?
Graphs seem like such a natural way for us to represent so many of the complicated biological, economic, social and technological systems we find in the real world, so why aren’t they a more familiar part of our IT landscape? Unfortunately, many graph algorithms are “computationally intractable” at scale, meaning that massive computational power and intricate parallel processing algorithms are required to run them on large graphs.
The problem with manipulating graphs is that conventional computers don’t think this way. They think in terms of orderly tables where the next piece of information needed is predictable. This predictability allows a system to guess with a surprisingly high degree of precision what data will be needed next and be working on the glacially slow task of retrieving that data from Flash or disk storage so the vastly faster processor won’t have to wait.
Graph traversal – analyzing a graph by hopping from node to node – isn’t like that. Traversal can take you to any node in the graph very quickly. If the node you want isn’t already cached in memory, the whole system has to wait and performance plummets. (See the sidebar for an illustration of just how long the wait can be.)
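To see why traversal defeats prediction, here is a small sketch of a breadth-first traversal over an invented graph whose nodes are stored in numbered slots. The slot numbers stand in for storage locations, and the graph itself is made up. The point is only that the order in which slots are visited depends on the data, not on where things are stored.

```python
from collections import deque


def bfs_order(adjacency, start):
    """Return node ids in the order a breadth-first traversal visits them."""
    order = [start]
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                order.append(neighbour)
                queue.append(neighbour)
    return order


# Hypothetical graph: list index = node id = storage slot.
adjacency = [[5, 2], [3], [7], [1], [6], [4], [3], [0]]

order = bfs_order(adjacency, 0)
# Distance between consecutively visited slots; a table scan would be all 1s.
jumps = [abs(b - a) for a, b in zip(order, order[1:])]
print(order)
print(jumps)
```

A sequential scan would touch slots 0, 1, 2 and so on, which is easy to prefetch. The traversal instead hops around, and on a graph too large for memory each hop risks a wait on storage.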
If a graph is small enough to fit in main memory this isn’t a problem. But the really interesting real-world graph problems can be enormous. Graph theory is one of the drivers towards “in-memory computing” – computers with really large main memory capacity and the software algorithms to make use of all that memory.
A leading example today is the HP HANA Hawk appliance, which hosts SAP’s excellent HANA in-memory database software with up to 12TB of memory – a thousand times more than most personal computers.
Towards exascale graphs and The Machine
12TB may seem like a lot, but we foresee petabyte-scale datasets becoming commonplace in the near future, moving toward the exascale by the end of the decade. When Facebook crossed the one billion user threshold in 2012, the community was adding half a petabyte every 24 hours, or roughly 180 petabytes per year.[i] Here the graph consists of the users, their posts and all the rich media they are constantly adding. It is forecast that there will be 26 billion connected devices by 2020.[ii] When you consider what (and who) those devices connect to, and the communications between them, you can start to get a feel for how large their graph will be.
Can we build computers with multiple orders of magnitude more memory? And just as importantly, can we do it without consuming orders of magnitude more energy, at a price that's within reach of the sort of researchers and analysts that need it?
You may be aware of the ambitious Hewlett Packard Labs research project that we’re calling The Machine. The Machine aims to reinvent computing from the ground up using novel hardware, a new, open source operating system and groundbreaking analytics algorithms.
The Machine is designed to be almost infinitely scalable – small enough to fit in a sensor, large enough to replace a datacenter. One of the most exciting properties of The Machine architecture is the ability to accommodate vast amounts of memory. Our calculations indicate we can access any bit in 160 petabytes of memory with a latency of just a few hundred nanoseconds. Our new architecture is also extremely energy efficient, of the order of 1/100th of the energy per calculation achievable today.
We’re expecting to have The Machine project finished by the end of this decade, but you’ll be hearing a lot more about it as various component technologies come to market. We’re also going to be enlisting collaborators to help us develop the open source operating system and prepare developers to be ready to take advantage of the whole new world of insight and value to be derived from massive graph analytics. Stay tuned!