In this lesson we'll examine data center applications and network traffic, and see what they imply for the network's design. Pretty much every popular web application we use today runs on data center networks. Not only do data centers run these applications, but they also run the support tasks that make information available to these applications. For example, if you're thinking of web search, you have to create search indices by collecting data from the web and processing it. So this is something the servers in these data centers work together on. Such infrastructure also supports big-science applications, such as climate modeling. All of this creates massive amounts of data being moved around inside data centers.

Let's take a slightly closer look at how something like a web search query might work. I make a search query, and it hits a server in a data center. This server might query several other servers, which might in turn communicate with further servers, and so on. The responses are then collated, and the final search response is sent to me. This kind of traffic pattern is referred to as scatter-gather, or partition-aggregate: you scatter the request and then gather the results. For one query, there might be a large number of server-to-server interactions within the data center.

If you think this picture is complicated, take a look at this. This is what Bing's query workflow for producing the first page of results from a search query looks like. The query comes in here at the top. It's split into multiple requests, which might be further split into sub-requests, and so on. So you can see the scatter pattern in play here: a server makes several different requests to fulfill the search query. And you can also see the gather pattern closer to the end of this graph, where the results from multiple different subqueries are gathered and processed together. In the end, a response is produced and sent to the user. The processing of a query can be a multi-stage process, and each stage can involve as many as 40 different sub-requests.

This kind of scatter-gather pattern is not exclusive to search. Facebook, for example, also sees several internal requests in processing a user request for a page. Loading one of Facebook's popular pages causes an average of 521 distinct requests on the backend, and for the same page at the 91st percentile, 1,740 items need to be fetched. So really, inside these web applications, a single user request can involve a large number of internal requests.
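To make the scatter-gather pattern concrete, here is a minimal Python sketch, assuming a hypothetical front end that fans a query out to a few index shards in parallel and then collates the partial results. The shard names and the query_worker function are placeholders, not any real search API.

```python
# Minimal scatter-gather (partition-aggregate) sketch.
# The shard list and query_worker() are hypothetical placeholders.
from concurrent.futures import ThreadPoolExecutor

SHARDS = ["index-shard-1", "index-shard-2", "index-shard-3", "index-shard-4"]

def query_worker(shard, query):
    # In a real deployment this would be an RPC to another server, which may
    # itself scatter further sub-requests before answering.
    return f"partial result for '{query}' from {shard}"

def handle_search(query):
    # Scatter: send the query to all shards in parallel.
    with ThreadPoolExecutor(max_workers=len(SHARDS)) as pool:
        partials = list(pool.map(lambda s: query_worker(s, query), SHARDS))
    # Gather: collate the partial results into one response for the user.
    return "\n".join(partials)

if __name__ == "__main__":
    print(handle_search("data center networks"))
```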
Further, these data centers also run big data processing tools, such as Hadoop, Spark, Dryad, and databases, which all work to process information and make it available to these web applications. These data processing frameworks can move massive amounts of data around.

So what does the actual measured traffic in data centers look like when they're running these applications? The short answer, unfortunately, even though it seems like a cop-out, is: it depends. It depends on the applications you're running, the scale you're running them at, and the network's design, as well as the design of the applications themselves. But nevertheless, let's take a look at some of the published data. One thing that's unambiguously true is that traffic volume inside data centers is growing rapidly, and it makes up the majority of the traffic these servers see. The majority of the traffic is not to and from the Internet, but rather inside the data center.

Here what you're looking at is data from Google showing the traffic generated by the servers in their data centers over a period of a bit more than six years, which is on the X-axis. On the Y-axis is the aggregate traffic volume. The absolute numbers are not available, but over this six-year period the traffic volume grows by 50 times; Google has noted that this is traffic roughly doubling every year. Google's paper is not entirely clear about whether this is data-center-internal traffic only, but Facebook has mentioned that machine-to-machine traffic is several orders of magnitude larger than what goes out to the Internet. So really, most of the traffic in these facilities stays within the facility, as opposed to going to and from the Internet, and it's growing quite rapidly.

So what does this traffic look like? One question we might want answered is about locality: do machines communicate mostly with neighboring machines, or is traffic spread uniformly throughout the data center? Here we have some data from Facebook that goes some way towards addressing this question. Let's focus on this part of the table, where all of Facebook's data center traffic is partitioned by locality: within a rack, within a cluster, within a data center, or across data centers. As you can see, roughly 13% of the traffic is within a rack, so this is just machines within the same rack talking to each other. A rack might host some tens of machines; 40 machines, for example, seems to be quite common. Then we see that 58% of the traffic stays within a cluster but not within a rack, so this traffic is across racks within a cluster. Further, 12% of the traffic stays within the data center but is not cluster-local, so this is traffic between multiple clusters in the data center. Also interesting is that 18% of the traffic is between data centers, which is actually larger than the rack-local traffic. So locality in this workload is not really high. Also worth noting is that Hadoop is the single largest driver of traffic in Facebook's data centers.

More data on rack locality comes from Google. What you're looking at here is data from 12 blocks of servers. Blocks are groups of racks, so a block might have a few hundred servers; this is a smaller granularity than a cluster, but a larger granularity than a rack. In the figure on the right, what you're looking at is the traffic from a block that leaves for other blocks, that is, the non-local traffic. For each of the 12 blocks in the figure, you see that most of the traffic is non-local: most of it goes to other blocks. Now, there are 12 blocks here. If traffic were uniformly distributed, you would see 1/12 of the traffic being local and 11/12, that is roughly 90%, being non-local, which is just about what this graph shows. Part of this surely stems from how Google organizes storage. The paper notes that, for greater availability, they spread data across different fault domains. For example, a block might draw all its power from one source; if that power supply fails, you lose everything in that block. So if data is spread well over multiple blocks, the service might still remain available. This kind of organization is good for availability, but bad for locality.
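As a quick sanity check on the uniform-traffic reasoning above, here is the arithmetic for 12 blocks, assuming only that each block spreads its traffic evenly across all blocks.

```python
# Expected locality if traffic were spread uniformly across blocks.
num_blocks = 12
local = 1 / num_blocks          # fraction of a block's traffic that stays local
non_local = 1 - local           # fraction that leaves for other blocks
print(f"local: {local:.1%}, non-local: {non_local:.1%}")  # ~8.3% local, ~91.7% non-local
```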
Another set of measurements comes from Benson et al., who evaluated three university clusters, two private enterprise networks, and five commercial cloud networks. The paper obscures who the cloud provider is, but each of these data centers hosts 10,000 or more servers running applications including web search, and one of the authors works at Microsoft. The university and private data centers have a few hundred to 2,000 servers each, while the cloud data centers have 10,000 to 15,000 servers. Cloud data centers one through three run many applications, including web, mail, et cetera, but clouds four and five run more MapReduce-style workloads. One thing worth noticing is that the amount of rack locality here is much larger, 70% or so for the cloud data centers, which is very different from what we saw earlier in the Google and Facebook measurements. There are many possible reasons for these differences. For one, the workloads might be different; not even all MapReduce jobs are the same. It's entirely possible that Google and Facebook run large MapReduce tasks that do not fit in a rack, while this mystery data center runs smaller tasks that do fit in a rack, and hence its traffic is mostly rack-local. There might also just be different ways of organizing storage and compute. There's also a five-year gap between the publication of these measurements; perhaps things are just different now, job sizes might have grown substantially, or people have changed how they do these things.

Having looked at locality, let's turn our attention to flow-level characteristics. How many flows does a server see concurrently? Quoting Facebook's measurements, Hadoop nodes have approximately 25 concurrent connections on average. In contrast, for a 1,500-server cluster running Hadoop-style workloads, the numbers are much smaller: the median number of correspondents for a server is two servers within its rack and four servers outside it, so a server talks to six other servers in the median. Contrast that with the 25 in Facebook's workload. The only things we can conclude from this, perhaps, are that there are differences across applications, so web servers, caches, and Hadoop all have different numbers of concurrent flows, and also that even all Hadoop workloads are not created the same: Facebook's Hadoop workload seems to be different from this mystery data center's workload.

Another thing we're interested in finding out is the arrival rate for flows: how quickly do new flows arrive in the system? Facebook's measurements put the median flow inter-arrival time at a server at approximately 2 milliseconds. The mystery cluster has inter-arrival times in the tens of milliseconds, so less than a tenth of Facebook's rate. Regardless of the difference between these two measurements, one higher-order bit worth mentioning here is that at the cluster scale, the flow inter-arrival times would be in the microseconds, if you had a 1,000-server cluster, for example. So really, new flows enter the system as a whole at a very high frequency.
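Here is that back-of-the-envelope calculation as a small sketch, assuming a hypothetical 1,000-server cluster in which every server sees the 2-millisecond median inter-arrival time quoted above; the numbers are illustrative, not a measurement.

```python
# Rough cluster-wide flow arrival rate, assuming each server independently
# sees a new flow about every 2 ms (the per-server median quoted above).
per_server_interarrival = 2e-3              # seconds between new flows at one server
servers = 1000                              # hypothetical cluster size

cluster_rate = servers / per_server_interarrival      # ~500,000 new flows per second
cluster_interarrival_us = 1e6 / cluster_rate          # ~2 microseconds between new flows
print(f"{cluster_rate:,.0f} new flows/s, one every ~{cluster_interarrival_us:.0f} us")
```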
What about flow sizes? Most Hadoop flows, it seems, are quite small in Facebook's measurements: the median flow is smaller than a kilobyte, and fewer than 5% of flows exceed a megabyte or last longer than 100 seconds. For caching workloads, most flows are long-lived but internally bursty. You can think of these as, perhaps, long-lived TCP connections, where you do not want to incur the cost of a TCP handshake, and the slow ramp-up in rate due to slow start, every time, so you keep the TCP connection alive, but data is only exchanged sporadically.

Also of interest are heavy hitters. Is there a small fraction of flows that moves most of the bytes? For this set of measurements, the answer seems to be no. Heavy hitters are roughly the same size as the median flow, and they are not persistent: flows that move a large fraction of the bytes now do not continue to be heavy hitters in the next time interval. Compare this with the measurements from the 1,500-server cluster also running Hadoop: there, more than 80% of the flows last less than 10 seconds, but more than 50% of the bytes are in flows that are also short, lasting less than 25 seconds. Some part of this, in particular the heavy hitters being roughly similar in size to the median flow, surely stems from application-level load balancing. For example, for caching, if you're spreading the load across the different caches, you will see roughly similar flow sizes.

So where do we go from here? What does data center traffic look like? We don't have a definitive answer, and right now not a whole lot of data is available. There are, however, some conclusions we can draw from the nature of data center applications, as well as the points of agreement among these different data sets. We'll look at these in detail, but briefly: one, data-center-internal traffic is large and increasing. Two, we have very tight deadlines for network I/O; that is, we are targeting small latencies, for applications like web search in particular. Three, because we have a large number of flows, we'll see congestion in the network and problems like TCP incast. Four, we need isolation across the multiple applications running in these data centers. Five, centralized control at the flow level might be difficult because of the large flow arrival rates we are seeing. So let's talk about each of these in more detail.

One thing that's amply clear from these measurements, and from the growth of these applications, is that the need for bandwidth inside data centers is growing rapidly. As the applications scale, we need to provide higher and higher capacity networks, and in a cheap, scalable, and fault-tolerant manner. We want efficient network topologies and efficient routing to achieve high capacity. Apart from moving large volumes of data around the data center, we also want to do this at very small latencies, particularly for applications like search, where the deadlines for responses can be very small, tens of milliseconds.

Just looking at this Bing query workflow tires me out. Let's take a small candy break. Here I have a bunch of candy I've collected. [Rattling] And I'll drop this washed and cleaned stone into it. It's roughly the same size and weight as the individual pieces of candy. Now, if I were to pick a piece from this box randomly, I'm highly likely to end up with candy. No teeth broken. So that worked. But if I grab a handful of candy now, let's see what happens. Oh, here it is, I have the stone now. [Rattling] This makes an important point. It's a simple point about probability: when you make a handful of selections, you have a much higher chance of coming up with an undesirable object, even if there's only one undesirable item in the lot you are choosing from. This is a lesson we'll see applies to how web requests work.
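The candy-box point can be written down as a small probability sketch; the box size and handful size below are made-up numbers, and the only assumption is that exactly one item in the box is undesirable.

```python
from math import comb

# Probability that a handful of k items, drawn from n items containing exactly
# one undesirable item, includes that item. The counts here are made up.
def prob_pick_bad(n_items, handful_size):
    # handfuls that contain the bad item / all possible handfuls (equals k/n)
    return comb(n_items - 1, handful_size - 1) / comb(n_items, handful_size)

n = 50                                  # say, 49 pieces of candy plus the one stone
print(prob_pick_bad(n, 1))              # single pick:   1/50 = 2%
print(prob_pick_bad(n, 10))             # handful of 10: 10/50 = 20%
```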
So, as we discussed, the deadlines for network I/O in the data center can be quite small. Now suppose that all of your web servers are really nice, and their response times are quite small: for most requests, in fact for 99% of them, they issue a response in 10 milliseconds or less, but for 1% of the requests they take an entire second to respond. This is an example I'm stealing from Google's Jeff Dean. Now, if you're making just one request, the odds that your response will take a second or longer are just 1%. But if you were to make a hundred requests and wait for them all to finish, the odds are 63% that you'll have to wait a second or more. Just like choosing a handful of candy from that box, you're likely to end up with one of those servers that responds slowly. Given what you've seen about the nature of these applications, which make lots of small requests, this kind of thing can happen quite often, and 63% is very poor odds; you do not want your service times to be that large. And as these services expand, the problem only gets worse. Thus, what we need is to reduce variability in latency, and also to tolerate some latency, perhaps at higher layers. For example, the application might make redundant requests and just take the one that returns fastest, because there will always be some variation in latency inherent to anything that's networked.

With the large number of requests, we'll also see a large number of flows sharing the network's capacity, leading perhaps to congestion and a problem called TCP incast. TCP incast is something we'll look at in more detail later in the course, but essentially, when a large number of flows try to share the same network buffer, they overwhelm that buffer, network throughput drops, and latency increases. There are various application-layer fixes for this, but ultimately all of them complicate the application's logic. This is a problem we should attempt to solve in the network, and we'll look at solutions that do that.

Further, with the variety of applications that might be running together in these data centers, particularly if you're a cloud provider like Amazon, you also need to look at isolating these applications from each other. If applications that are latency-sensitive are stuck in buffers behind applications that are just moving bulk data, we will see very poor quality of service. So we need some way of isolating these applications from each other; applications with different objectives are, after all, sharing the network.

Lastly, given that data centers see very high flow arrival rates, and most flows are short, that is, not persistent, centralized control at the flow level could be very difficult to scale, particularly to large deployments of tens of thousands of servers. So perhaps we'll need distributed control, possibly with some centralized tinkering. The implications we have discussed here will factor into our discussion of all the topics we see in this course, from routing and congestion control to software-defined networking.
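To close, here is the arithmetic behind the 63% figure from the Jeff Dean example, along with a rough look at the redundant-request idea mentioned above. The fan-out of 100 and the 1% slow-response probability are the assumptions from that example; the two-way redundancy is just an illustration, not a prescription.

```python
# Each backend is slow (takes ~1 s) with probability 1%, and the front end
# waits for all of its fan-out requests before responding to the user.
p_slow = 0.01
fanout = 100

p_user_waits = 1 - (1 - p_slow) ** fanout
print(f"P(at least one of {fanout} responses is slow): {p_user_waits:.1%}")  # ~63.4%

# Illustrative mitigation: send each request to two servers and take whichever
# answers first, so both copies must be slow for the user to notice.
p_pair_slow = p_slow ** 2
p_user_waits_redundant = 1 - (1 - p_pair_slow) ** fanout
print(f"With two-way redundant requests: {p_user_waits_redundant:.1%}")      # ~1.0%
```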