Note: long, super-dorky and technical, but you should probably read the first paragraph at least.
I’m repeating the content of Footnote 9 here, because it’s the important takeaway: Stateful packet inspection can be used to do a lot of nefarious things. Like, I once used it to redirect all unauthorized traffic on my wifi network to the web site of the World Beard and Mustache Championships (google it); and another person used it to edit all web pages loaded by unauthorized browsers so that all images appeared upside-down (http://www.ex-parrot.com/pete/upside-down-ternet.html). But you can also imagine it being used to do things like, “replace content from CNN.com with a random page from breitbart.com”, for example. That would be evil, but really obvious and you’d probably immediately suspect that something weird is going on. But stateful packet inspection can do things that would be just as evil but harder to notice. Like, “if a page is coming from CNN.com, delay all messages by 100 milliseconds”. If added to your router, or (much worse!) to the router on the ISP’s side of your Internet connection, then such a rule would make your web browser basically unusable for browsing CNN.com, but breitbart would still load just fine. And if you weren’t super-motivated to figure out why that was happening, you might just ignore the problem and not bother reading CNN, which is sometimes considered to be a left-leaning news source. And this is the kind of thing that “net neutrality” is really needed to prevent. Internet service providers should be allowed to slow down your connection if you’re genuinely using too much bandwidth and interfering with others’ ability to use their connections. But they should not be allowed to slow down your connection just because they don’t like the politics of the web sites you’re connecting to; or just because you’re connecting to Netflix.com which competes with their in-house ShittyISPTv.com. And those are things they totally could do with stateful packet inspection.
Back to the main content…
(I’m kind of writing this for the guys at the Reply All podcast, who seem a little naive about how the Internet really works for people who produce a podcast about… “the Internet”.)
My answer to the question above, based on a generic “browser” like, eg, curl (footnote 3):
- First, preliminary notes:
- A computer (a PC, or a laptop, or a router, whatever) can have one or more network interfaces. A network interface is a hardware thing that connects the machine to a local network where it can talk to other machines on that local network – usually between one and, say, a hundred other machines. (One machine if it’s your laptop and your workstation in your den, a hundred machines if it’s your company’s office.) A local network is a physical thing that enables computers to communicate with other computers connected to the same local network. For example, an Ethernet card in a desktop computer is a network interface that connects to other computers via Ethernet cables; similarly, a WiFi USB dongle is a network interface that connects computers via short-range radio signals. Most machines have only one network interface, but many machines have more than one, and this fact is critical to understanding what “the Internet” is.
- A machine with two network interfaces can be connected to two different local networks. And therefore, in principle, it could move data from one of those networks to another, by reading the data from one network interface and then sending that data on the other. In the case of your wifi router, one of those local networks is your wifi network that your laptop connects to, and the other is the shitty Comcast DSL connection your ISP provides. And in fact, this is what the “Internet” is: it is a network of networks, connected to each other via computers with multiple network interfaces. And the really magical thing about the Internet is that it lets a machine on one local network send messages to a machine on a different local network, even if there are a thousand local networks (and consequently 999 machines with multiple network interfaces) between them over which that message must pass. Considering how complex this process is, it is unbelievably reliable.
- People who know a little bit about the internet usually have some factoid in their mind like “Every machine on the internet has a unique IP address, which is a number that identifies that machine among all others.” That’s not quite true. In reality, every network interface has a unique IP address. “IP” stands for “internet protocol”, which is the message protocol that actually allows data to make its way from network to network. (Footnote 1).
- For the purposes of this exercise, let’s say that both the client machine and the server machine have only one network interface each. The client’s interface IP (that is, on the machine running the browser program) is 9.8.7.6, and the server’s IP is 10.20.30.40. Furthermore, let’s say the human-understandable name of the server machine is “server.name”. Or maybe “ietf.org”. No, let’s stick with “server.name” for now.
- Let’s also say the URL (Universal Resource Locator) of the requested page looks like: “http://server.name/desired/url”. This is a string of characters in memory owned by the browser program. If the browser in use is curl, it would be memory reserved by the OS for the program’s command line, since curl is a command-line application that takes a URL as an argument.
- When I mention “OS” in this document, I’m referring to the “operating system” of the machine on which a program is running. Like, Windows or Linux or MacOS. And when I say “OS kernel” I mean the actual code that implements the OS. One of the really important jobs of the OS kernel is to provide services that allow programs to perform the tasks I’m talking about. Like, instead of every individual program having to understand how to set up a TCP connection and track all the data associated with it, the OS knows how to do that. Programs running “on” the OS have a way to say “Hey OS, please make a TCP connection for me”, and the OS hands over a special “token” (usually just a unique magic number, made up on the spot) that the program can use to manipulate the connection. Those ways of interacting with the OS are called “system calls” or “entry points”. An OS provides hundreds or thousands of system calls to do various tasks that are best provided by a central entity rather than by each individual program. Kind of like: delivering Social Security checks is a service best provided by a central entity (the government), but making a sandwich out of the food you bought with your SS check is something you’d probably rather handle yourself. The program, in this metaphor, is you making a sandwich, and the OS is the government, and the system call is someone at the SSA mailing you your SS check. Or maybe the system call is you calling up the government to find out why your SS check is late, and then them mailing it to you. It’s not a perfect metaphor!
- Finally, the Internet is a “packet-switched” network, which means that all data travels over the Internet in small packages that are routed and delivered independently. And those little packages are not even guaranteed to reach their destination. Remarkably, some of the communication protocols developed to work within those restrictions are able to guarantee that very long messages are delivered intact, in order, and with no missing pieces. I’ll explain how in more detail below.
- Now to the actual meat of the process:
- The browser builds an HTTP request in memory. That request is basically a string that looks like “GET /desired/url HTTP/1.0”. That’s what an HTTP GET request looks like, according to the specification of the HyperText Transfer Protocol (footnote 2). This request will ultimately be delivered to the server that owns the page and will explain to that server exactly which web page the client wants it to deliver. But to deliver its HTTP request to the web server, the browser will need to make a network connection to the web server. This will be done using the TCP/IP protocol.
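For the concretely-minded, here's a minimal sketch in Python of building that request in memory. The helper name build_get_request is made up for illustration, and it assumes the standard request-line layout (method, then path, then protocol version). Real browsers send a pile of extra header lines too.

```python
def build_get_request(path, host):
    """Build a bare-bones HTTP/1.0 GET request (hypothetical helper)."""
    lines = [
        f"GET {path} HTTP/1.0",  # request line: method, path, version
        f"Host: {host}",         # optional in HTTP/1.0, but commonly sent
        "",                      # blank line marks the end of the headers
        "",
    ]
    # HTTP uses CRLF line endings, and sockets carry bytes, not strings.
    return "\r\n".join(lines).encode("ascii")


request = build_get_request("/desired/url", "server.name")
print(request)
```

Nothing has touched the network yet at this point. It's just bytes sitting in the browser's memory, waiting for a connection to be written to.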
- TCP is the Transmission Control Protocol, and TCP/IP is what we call it when TCP messages travel over an underlying IP network. TCP is a reliable stream protocol, which means you can use it to send arbitrary streams of data – such as, for example, the contents of a file – and that data is guaranteed to end up at the destination in the correct order and with no missing pieces. If that cannot be done for some reason, the connection will fail, which is what we want – TCP connections are supposed to be reliable, and if our connection can’t operate in a reliable way, it needs to die immediately so we can know there’s a problem. And note that even though from, for instance, the browser’s perspective it can use a TCP connection to send or receive arbitrary amounts of ordered data, this doesn’t change the fact that the Internet is a packet-switched network: in reality, all that data gets chopped up into little messages, sent and received in some order that is not necessarily the order the sending program sends the original data in, and re-assembled and put back in order at the destination. It’s kind of like if you were writing a 50-page letter to send via the Postal Service, but instead of mailing the whole thing in one giant manila envelope, you mail each individual page in a single envelope, and each envelope is marked “page 1”, “page 2”, etc. When they have all the pieces, the recipient can re-assemble the complete document, and if they find out that, say, page 5 is missing, they can send you a letter saying, “Hey I didn’t get page 5, can you send it again?” In the case of TCP, the OS kernel handles all the pesky details of ordering, assembly, and making sure all the pieces are present.
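The numbered-envelopes idea can be sketched in a few lines of Python. This is a toy, not what a kernel really does: real TCP numbers individual bytes rather than whole pieces, and the function name reassemble and the "total" argument are assumptions made up for this example.

```python
def reassemble(segments, total):
    """Put numbered segments back in order; report any that are missing.

    segments: list of (sequence_number, data) pairs, in arrival order.
    total: how many segments the sender said to expect.
    """
    received = {}
    for seq, data in segments:
        received.setdefault(seq, data)  # ignore duplicate deliveries
    missing = [seq for seq in range(total) if seq not in received]
    if missing:
        return None, missing  # caller should ask for a re-send
    return b"".join(received[seq] for seq in range(total)), []


# Pieces arrive out of order, and one of them arrives twice.
arrived = [(2, b"lo, "), (0, b"He"), (3, b"world"), (0, b"He"), (1, b"l")]
message, missing = reassemble(arrived, total=4)
print(message, missing)
```

If a piece never shows up, the function returns the list of missing sequence numbers instead of a message, which is the "Hey, I didn't get page 5" letter in the analogy.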
- The browser establishes a TCP/IP connection to server.name on TCP port 80. There’s a lot (a lot) going on in that statement.
- First, the browser has to convert the name server.name to a numeric Internet Protocol address (footnote 9). In a typical case, this means calling an OS system call that (ultimately) sends a Domain Name System (DNS) request message to whatever IP address is configured as the DNS server for the local network interface. This is a complex task in itself and I don’t know all the details of how DNS works, but in essence, a DNS server is a machine that either knows the numeric IP associated with the name server.name, or knows how to ask another machine that knows how to get that information. So it finds that IP address and sends it back to 9.8.7.6. The result is a simple 4-byte integer value representing the IP address (footnote 4), which is usually written as a dotted quad like 10.20.30.40. From now on, just remember that the machine name server.name corresponds to the IP address 10.20.30.40.
- Important side note: this idea that “either I know how to do this thing myself, or I know how to ask another machine that knows how to do the thing” is a pervasive pattern in the infrastructure of the Internet. And kind of also in human organizations, I guess, but it is very formal and explicit in the case of the internet, in a number of different contexts.
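Here's a toy Python sketch of that "I know it, or I know who to ask" pattern, applied to name lookup. To be clear, this is not how real DNS servers are implemented. The class name ToyResolver, the chain of three resolvers, and the printer address are all invented for illustration. (A real program would more likely just call something like Python's socket.getaddrinfo and let the OS do the asking.)

```python
class ToyResolver:
    """A pretend DNS server: answers from its own table, or asks upstream."""

    def __init__(self, known, upstream=None):
        self.known = known        # names this server can answer directly
        self.upstream = upstream  # another resolver to ask, if any

    def resolve(self, name):
        if name in self.known:
            return self.known[name]             # "I know this myself"
        if self.upstream is not None:
            return self.upstream.resolve(name)  # "I know who to ask"
        raise LookupError(f"cannot resolve {name}")


authoritative = ToyResolver({"server.name": "10.20.30.40"})
isp_dns = ToyResolver({}, upstream=authoritative)
home_router = ToyResolver({"printer.local": "192.168.1.50"}, upstream=isp_dns)

print(home_router.resolve("server.name"))
```

The home router has no idea where server.name lives, and neither does the ISP's resolver, but the question still gets answered because each one knows who to pass it to.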
- The browser creates a socket, which means asking the OS kernel (via a system call) to create a data structure associated with the browser process that will be used for TCP communication. From the browser’s perspective, the socket is just a number that it can use in network operations to send and receive data. The OS kernel knows the association between the socket number and its internal data structures that track the connection.
- The browser tells the OS kernel to connect the socket to 10.20.30.40, TCP port 80. (A “port” is just a number that represents a particular process running on the server machine, and it’s also part of every TCP protocol message. Conventionally, web servers use port number 80.) The OS has a lot of work to do to fulfill that request:
- It has to figure out which physical network interface the connection will be made through. That is, which cable or wifi connection is the appropriate one for this connection’s data to travel through. This is a “routing” question, and it’s another instance of the pervasive pattern mentioned above, as we will see. Routing is performed at the Internet Protocol level, and works in terms of numeric IP addresses.
The kernel consults its routing table, which, in essence, associates any numeric IP address with the appropriate network interface and next hop IP address. The routing table is configured (typically) when the machine boots and brings up its network interface(s), and can either be statically (manually) configured, or automatically configured via some mechanism like Dynamic Host Configuration Protocol (DHCP) or IPv6 auto-configuration. I’m not going to go into any further detail about that; just be aware that DHCP is one way that a machine can acquire routing information (it’s probably the way your laptop gets its routes), and that there are other ways this can happen.
For any given target IP address, there are exactly three possibilities that the routing table can know about:
1. The target address is directly reachable via a local network interface – that is, the target address is that of a machine directly attached to the local wifi or wired network, in which case the next hop is simply the target address; or
2. The target address is not directly reachable via a local network interface, but the routing table specifies the address of another machine that (a) knows how to deliver data to the target address, and (b) is reachable via a local network interface. That machine is called the “router” for traffic passing through the network interface, and the router’s IP is the next hop; or
3. The routing table doesn’t know how to reach the target address, in which case the whole network operation fails and the OS tells the browser process “Sorry, I couldn’t set up that connection, I don’t know the route”. (This rarely happens.)
To be more precise, the routing table maps prefixes (footnote 5) of IP addresses to network interfaces – eg, it might know that any IP starting with “192.168” should use interface 2. It doesn’t have to have a comprehensive catalog of every IP address on the whole internet; all it needs to know is (a) the correct network interface for each prefix corresponding to a local, directly-connected network, and (b) the correct (local) router IP and network interface for any non-local network prefix.
Most importantly, a routing table can have a “default” route, which means, “If none of the other route table entries matches this address (prefix), use the interface and router for the default route.” In other words, the routing table either knows which interface to deliver data on, or it knows which machine (router) to send data to in order to get it delivered. This is the “pervasive pattern” mentioned above.
- In this case, let’s assume (as is typical) that the routing table doesn’t know anything about the address 10.20.30.40, and so the connection needs to go via the default route, via the router machine specified in the routing table. Let’s say the default router machine’s address is 9.8.0.1. That address is the next hop address for the target address 10.20.30.40: the local machine’s OS kernel must send traffic to the router at 9.8.0.1, and assume the router knows how to deliver it to its destination. The router could be a consumer router box, like an Airport or a Netgear box you’d buy at Office Depot; or it could be a regular PC that’s set up to perform routing; or a number of other possibilities. (It literally could be a guy with a cage full of homing pigeons and a notebook full of pigeon routing data. Like, this has actually been done: https://en.wikipedia.org/wiki/IP_over_Avian_Carriers)
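The routing-table lookup described above can be sketched in Python using the standard ipaddress module. The table contents here are assumptions made up for the example: the interface names eth0 and wlan0 are invented, and I'm assuming the local network is 9.8.0.0/16 so that 9.8.7.6 and 9.8.0.1 are neighbors. A real kernel's routing code is far more elaborate, but the most-specific-prefix-wins idea is the same.

```python
import ipaddress

# (prefix, interface, router) entries; router None means "directly attached".
ROUTES = [
    (ipaddress.ip_network("9.8.0.0/16"), "eth0", None),
    (ipaddress.ip_network("192.168.0.0/16"), "wlan0", None),
    (ipaddress.ip_network("0.0.0.0/0"), "eth0", "9.8.0.1"),  # default route
]


def next_hop(target):
    """Return (interface, next-hop IP) for target; longest prefix wins."""
    addr = ipaddress.ip_address(target)
    matches = [route for route in ROUTES if addr in route[0]]
    if not matches:
        raise LookupError(f"no route to {target}")
    _, interface, router = max(matches, key=lambda route: route[0].prefixlen)
    # Directly attached network: the next hop is the target itself.
    return (interface, router if router is not None else target)


print(next_hop("10.20.30.40"))  # falls through to the default route
print(next_hop("9.8.3.3"))      # a neighbor on the local network
```

The default route is just the prefix 0.0.0.0/0, which matches every address but is the least specific entry, so it only wins when nothing else matches.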
- The OS for 9.8.7.6 must now send a message to the next-hop address saying, “Hey, I’m trying to talk to 10.20.30.40 port 80, please pass this request along to them and hook me up.”
- At this point the OS kernel knows it needs to send this request to a machine on a local network (9.8.0.1), and it knows which of its own physical network interfaces to talk to that machine on. But how does it talk to the target network interface on the 9.8.0.1 router? That is, how does it physically transmit data from its own network interface to another machine’s interface on a local network?
- Every network interface has a physical address, which is essentially a number unique to that specific hardware interface and which that interface can recognize when a data packet with that physical address appears on the network. (Luckily, there are a lot of numbers to pick from, so we don’t really need to worry about duplicate physical addresses. Very large ranges of physical address numbers are assigned to manufacturers of network equipment by a central authority, the IEEE Registration Authority, so that each interface device manufactured can be assigned a unique physical address.) When a message goes across the wire (in the case of an Ethernet network, for example), this wiggles the electrical voltage in a way that corresponds to the contents of the message. The first few wiggles will correspond to the physical address of the target interface, and that specific pattern of wiggles will cause that interface to pay attention to the message. This electrical response is built into the network hardware itself.
- The local OS kernel needs to send a message to the physical address of 9.8.0.1’s network interface, but how does it know that address? This is the local address resolution problem, and it is solved by the Address Resolution Protocol (ARP).
- The local machine (9.8.7.6) sends a broadcast ARP request on the local physical network. This is a message that all connected network interfaces on the local net will receive and process, using the physical network’s “broadcast address”. The broadcast address is a special sequence of voltage wiggles that all network interfaces are programmed to pay attention to, in addition to their own specific physical address. So to send a broadcast message, you don’t need to know all the physical addresses on your local network. You can just send a message saying “EVERYBODY LISTEN! I need the physical address corresponding to IP address 9.8.0.1! Send that to my physical address, which is xxxYYYzzz!”
- 9.8.0.1’s OS kernel (along with everyone else on the local net) sees that message. Since 9.8.0.1 knows its own physical address, it responds by sending its own physical address back to the source of the broadcast ARP message (xxxYYYzzz in this case). None of the other local machines know the answer to the ARP question, so they just ignore it. (In reality, these ARP requests aren’t needed for every message that any machine sends – usually each machine remembers the answers to ARP requests for a while so that it doesn’t have to ask over and over again.)
- Side note: ARP is not an example of the “pervasive pattern”. All ARP requests happen entirely over some local network, among machines that are, so to speak, mutually acquainted with one another. There’s never a need to defer an ARP request to some more-distant machine.
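A toy Python model of that ARP exchange, including the "remember the answer for a while" cache. Everything here is simulated in memory: the Host class, the arp_lookup function, and the made-up physical addresses are assumptions for illustration, and no real packets are sent.

```python
class Host:
    def __init__(self, ip, mac):
        self.ip, self.mac = ip, mac
        self.arp_cache = {}  # remembered IP -> physical address answers

    def on_arp_request(self, wanted_ip):
        # Only the owner of the IP answers; everyone else stays silent.
        return self.mac if wanted_ip == self.ip else None


def arp_lookup(asker, wanted_ip, local_network):
    """Resolve wanted_ip to a physical address, broadcasting if needed."""
    if wanted_ip in asker.arp_cache:
        return asker.arp_cache[wanted_ip]
    for host in local_network:  # the "EVERYBODY LISTEN!" broadcast
        answer = host.on_arp_request(wanted_ip)
        if answer is not None:
            asker.arp_cache[wanted_ip] = answer
            return answer
    return None  # nobody on the local network owns that IP


laptop = Host("9.8.7.6", "aa:aa:aa:aa:aa:01")
router = Host("9.8.0.1", "aa:aa:aa:aa:aa:02")
printer = Host("9.8.7.9", "aa:aa:aa:aa:aa:03")
print(arp_lookup(laptop, "9.8.0.1", [laptop, router, printer]))
```

Note that asking for an address that isn't on the local network (like 10.20.30.40) just returns nothing. There is no "ask somebody further away" step, which is why ARP isn't an instance of the pervasive pattern.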
- Now that the browser machine (9.8.7.6) has the physical address of 9.8.0.1, it can send its “Connect me to 10.20.30.40 TCP port 80” request to that machine.
- The router machine at 9.8.0.1 receives that request and then the whole routing process happens again, from 9.8.0.1’s perspective. It has to figure out the next hop for the “Connect to 10.20.30.40” request it just received, which again is either going to be an address on a local directly-connected network, or a router that knows how to send the request on to its destination.
- Eventually, after however many routing decisions and hops and ARP messages are required, that request arrives at the proper network interface at the server machine, 10.20.30.40. The complete content of that request is, essentially, “Hi 10.20.30.40, I’m 9.8.7.6 and I want to establish a TCP connection to you at port 80”. Technically this is expressed in the form of a “SYN” message, which is short for “synchronize”: the two machines are starting to synchronize the sequence numbers they will use to keep the connection’s data in order.
- The OS kernel on 10.20.30.40 verifies that there is a server process “listening” on TCP port 80 (if there isn’t, the network connection will fail and that failure will be reported back to the browser on 9.8.7.6, which is another whole process that I’m not going to go into right now). It builds the appropriate internal data structures to track the new connection.
- Now 10.20.30.40 knows the identity of the machine trying to connect (9.8.7.6, which was included in the connection request), so it can route its reply back to that machine. Routing happens again, starting at the 10.20.30.40 machine, and the reply, a “SYN/ACK” (acknowledgement of SYN) message, travels back to 9.8.7.6, over however many intermediate “hops” are required. (Usually, but not necessarily, this would involve the same sequence of local networks and routers as the original request, but in the opposite order. But if, for example, someone unplugged one of the routers involved, a different route might be found to get data back to 9.8.7.6. Or, the return route might just fail and the TCP connection would then fail.)
- When it sees the SYN/ACK reply, the 9.8.7.6 OS says “Yay, my connection is accepted!”, and replies with an “ACK” message. When 10.20.30.40 receives that message, it fills out any remaining information necessary to track the interaction with 9.8.7.6.
- 10.20.30.40 creates a new socket which it gives to the web server application to represent this new connection, and this completes the connection process. Note that there is no ongoing electrical connection between 9.8.7.6 and 10.20.30.40; the connection consists entirely of data structures in each OS’s kernel that allow them to build and interpret messages associated with the network connection and associate those messages with the sockets on each side of the connection.
- The web server process takes the new socket, which it acquired via an accept() system call into the OS kernel – the server process calls accept() very frequently, asking the kernel “are there any new connections for me?” The web server saves the socket number in a place where it can conveniently read data coming from the socket and write data to it.
- Every message that gets sent between these two machines on this connection will go through the routing process described above, but all the browser cares about is that when it sends data on its socket, that data ends up at the web server process at 10.20.30.40; and all the web server process cares about is that when it sends data on its own socket, that data will end up at the browser process on 9.8.7.6. Those programs don’t have to be concerned with routing or network transport; that is the OS’s problem. So a socket is just a number, but it represents a rather complex underlying process for moving data between two machines, which is always at work behind the scenes whenever a TCP connection is active. (This is also kind of analogous to the Marxist concept of “commodity fetishism”, where a manufactured object reflects, in a hidden way, the astronomical amount of complexity and human effort that went into making it.)
- The OS kernel on 9.8.7.6 also associated a TCP port number with the socket it created at the browser’s request; most likely it chose that port number at random (more or less). Let’s say that port number is 12345. Then the “name” of this TCP connection, unique across the entire internet, is “9.8.7.6:12345::10.20.30.40:80”. No other TCP connection corresponds to those specific IP addresses and their respective port numbers. Every TCP message passed along this connection will contain this name, and thus all the machines involved (the end points and the routers) know exactly where each message came from and where it’s going, and can always figure out the necessary “next hop”.
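You can watch the whole socket / connect / accept dance happen on a single machine with a short Python script. This is a sketch under some assumptions: it uses the loopback address 127.0.0.1 instead of two real machines, and it lets the OS pick the server port instead of using port 80 (which usually requires administrator privileges).

```python
import socket

# Server side: listen on the loopback interface; port 0 means "OS, pick one".
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
server_addr = listener.getsockname()

# Client side: create a socket and connect; the OS picks the client's port.
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(server_addr)

# accept() hands the server a brand-new socket for this one connection.
conn, peer_addr = listener.accept()

# Each end sees the same pair of (IP, port) endpoints, from its own side.
client_end = (client.getsockname(), client.getpeername())
server_end = (conn.getsockname(), conn.getpeername())
print("client sees:", client_end)
print("server sees:", server_end)
print("socket numbers:", client.fileno(), conn.fileno())

for s in (client, conn, listener):
    s.close()
```

The two printed pairs are mirror images of each other, which is the connection "name" described above. The last line shows that, from the program's point of view, each socket really is just a small number.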
- Now that a connection is established, the browser can send its HTTP request (which, recall, is the string “GET /desired/url HTTP/1.0”). It makes a “write” request into its OS kernel on its TCP socket, with that message as the content. That request gets packaged as one or more TCP messages associated with the connection. Those messages are routed to their destination as described above. They may be delivered in a different order than they were sent, or they may not be delivered at all (in which case 9.8.7.6 might need to re-send them after a period of time – the details are handled by the TCP protocol logic in the OS kernels of the machines involved, and those details are what makes TCP a “reliable” protocol. The receiving machine can always tell whether a message is complete and in the correct order, and can ask the sender to re-send any missing pieces). In any case, the OS kernel at 10.20.30.40 eventually receives all the messages associated with the HTTP request, assembles them in the proper order, and feeds them to the socket that it created for the web server process. Importantly, the OS does not care at all what the contents of the message are – its only job is to deliver it on to the web server process (footnotes 7 and 8 – these are the incisive ones).
- The web server notices that its socket is ready to read data, and it makes a “read” request to its OS kernel, and reads the message “GET /desired/url HTTP/1.0”. (How does the web server notice that the socket is readable? Typically this involves an OS system call called “select()”, which lets a program ask the OS, “Is there anything interesting happening with this socket?” But really a program is usually managing a bunch of requests from different network connections, so it can actually ask, “Is anything interesting happening with any of the sockets I care about, and if so, which ones?”)
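Here's roughly what that select() question looks like from Python, which exposes the system call through its standard select module. To keep the sketch self-contained it uses socket.socketpair() to make two already-connected pairs of sockets on one machine, standing in for two browser connections. That stand-in is an assumption of the example, not how a web server gets its sockets.

```python
import select
import socket

# Two connected socket pairs standing in for two browser connections.
a_client, a_server = socket.socketpair()
b_client, b_server = socket.socketpair()

# Only the first "browser" has sent anything so far.
a_client.sendall(b"GET /desired/url HTTP/1.0\r\n\r\n")

# Ask the OS: which of these sockets have data waiting? (1 second timeout)
readable, _, _ = select.select([a_server, b_server], [], [], 1.0)

received = [sock.recv(1024) for sock in readable]
quiet = b_server not in readable  # nothing interesting on the second one
print(received, quiet)

for s in (a_client, a_server, b_client, b_server):
    s.close()
```

Busy servers often use newer, more scalable variations on the same idea, but the question being asked of the OS is the same one.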
- At this point, depending on the web server in use, there are a lot of different things that could happen. But let’s assume that “/desired/url” actually refers to a file called “hello.html” on the web server machine. The web server process figures this out somehow – the details are internal to the web server, but it might be as simple as a table in memory that says,
URL /x = file y
URL /w = file z
URL /desired/url = file hello.html
[etc].
(Note: usually it isn’t that simple.)
- The web server asks its OS kernel to open file hello.html. That is also a rather complex process, although perhaps not quite as complex as routing.
- A file is one or more chunks of data sitting on a disk. All those chunks are linked together in a particular order, and are somehow associated (in this case) with the name “hello.html”. The details of the arrangement of data on the disk are specific to the “file system” in use (or more idiomatically, filesystem). A filesystem is just a set of rules for organizing and naming data in blocks on a storage device (like a hard disk). The OS knows about the file system and how to interpret that data on the disk properly. It also knows all the details for actually writing and reading data on the disk, which I’m not going to get into here.
- When the OS opens the file on behalf of the web server process, it gives the web server a number (called a “file descriptor”) representing the open file, and builds the necessary internal data structures to track the association between the file descriptor and the on-disk data. The web server can now use the file descriptor to read the data from the file, using the OS’s “read” system call. A file descriptor is basically the same kind of thing as a socket, only it represents a disk file rather than a network connection.
- The web server makes “read” system calls to the OS to read the data out of the file. Usually each of those requests asks for a fixed amount of data, for example, “Please give me the next 1024 bytes of data from this file descriptor, after the last piece I read”.
- Each time it successfully reads data from the file, it writes that data to the socket representing the TCP connection to 9.8.7.6.
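The open / read / write loop above, sketched in Python using the low-level os.open and os.read calls, which are thin wrappers over the system calls. The assumptions: a temporary file stands in for hello.html, and a socket.socketpair() stands in for the TCP connection between the web server and the browser, so the whole thing runs on one machine.

```python
import os
import socket
import tempfile

# Stand-in for hello.html on the web server's disk (3000 bytes).
page = b"<title>I am the title</title>\n" * 100
with tempfile.NamedTemporaryFile(suffix=".html", delete=False) as tmp:
    tmp.write(page)
    path = tmp.name

server_sock, browser_sock = socket.socketpair()

# O_BINARY only exists (and only matters) on Windows.
fd = os.open(path, os.O_RDONLY | getattr(os, "O_BINARY", 0))
chunks_sent = 0
while True:
    chunk = os.read(fd, 1024)   # "give me the next 1024 bytes"
    if not chunk:               # an empty read means end of file
        break
    server_sock.sendall(chunk)  # pass it along to the browser
    chunks_sent += 1
os.close(fd)
server_sock.close()             # tells the browser no more data is coming

# Browser side: keep reading until the other end closes the connection.
received = b""
while True:
    data = browser_sock.recv(1024)
    if not data:
        break
    received += data
browser_sock.close()
os.remove(path)

print(chunks_sent, received == page)
```

Notice that fd is a plain integer, the same way a socket is. The server never needs to know how the bytes are laid out on disk or how they travel over the wire.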
- The browser at 9.8.7.6 makes a series of “read” requests on its socket and reads the messages coming across the connection, which (amazingly!) correspond to the contents (in order) of the file “hello.html” on the web server.
- Once the browser has received all the data, it displays the resulting web page to the user. In the case of the curl program, it just dumps the page text out to the command terminal. In the case of a browser like Firefox, there is a very complicated set of rules that the browser uses to convert the HTML text in the file to the pretty page you see in your browser window. For example, if there’s a part of the page text that looks like “<title>I am the title</title>” (pronounced “left-angle-bracket title right-angle-bracket I am the title left-angle-bracket slash title right-angle bracket”, or slightly more conveniently, “title-tag I am the title close-title-tag”), that will cause “I am the title” to appear in the title area of the browser window for that page. I could write more about how all that works, but it’s getting a little beyond the main point of this post, which is more along the lines of “How does the internet actually work.”
- Normally, once the browser has read the page contents, it will close the socket connection. This causes another set of special TCP messages to be exchanged between 9.8.7.6 and 10.20.30.40 to say, “Hey, I’m done with this connection, please shut it down”, and both OS’s will clean up the data structures tracking the connection. If either the browser or the web server program tries to use its socket after that, it will get an error that basically means “this socket isn’t connected”.
- While this story has concentrated on the specific case of a web browser communicating with a web server via TCP/IP, it’s important to realize that all interaction between programs over the internet uses the same routing and transport facilities provided by the IP protocol. There are other protocols besides TCP that work over IP; the other most common one is the User Datagram Protocol, or UDP, which allows programs to send short (usually < 2000 characters) messages with no persistent connection and no guarantee of ordering or delivery. That sounds pretty useless compared to TCP, but there are a lot of cases where it’s super-useful – for example, when you want to send short events to alert another program of some condition, and you’re willing to manage any necessary redundancy and ordering issues yourself (within your program).
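For comparison with the TCP case, here's a minimal Python sketch of one UDP message being sent and received. As an assumption for the example, both ends live on the same machine (loopback address 127.0.0.1) and the OS picks the port. The message text is made up.

```python
import socket

receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
receiver.settimeout(2.0)         # don't wait forever if the message is lost

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# No connect, no handshake: just address the message and send it.
sender.sendto(b"disk almost full", receiver.getsockname())

message, source = receiver.recvfrom(2048)
print(message, "from", source)

sender.close()
receiver.close()
```

The timeout is there because, unlike TCP, nothing will ever tell the receiver that a message went missing. Over loopback it practically always arrives, but across the real Internet your program has to be prepared for silence.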
- So, that’s basically it. Except the footnotes. Read on if interested.
Footnotes:
- The IP addresses of network interfaces can change, and sometimes do, but there’s additional magic at work to be sure that none of the machines involved get confused about this. It’s pretty safe to assume that during any particular interaction that happens over the internet, the IP addresses of the machines involved won’t change. But for example, I mentioned above that when a machine is first booted, it figures out its routes using the DHCP protocol. This is a protocol that works between a computer and a router on a local network that lets a new-born OS kernel figure out what IP address it should use, which DNS server it should consult for name resolution, and so forth. DHCP stands for Dynamic Host Configuration Protocol, and that “dynamic” part is important: it means that usually, the router just picks some local IP address that it knows isn’t in use and assigns it to the interface being configured. And probably every consumer router uses DHCP, to save users from having to understand any of the stuff in this post. So one way that your machine’s IP address can change is that probably, every time you reboot your machine it ends up with a new IP. You can see this in the “advanced network settings” in your computer’s control panel.
- The HTTP protocol spec is at https://www.ietf.org/rfc/rfc2068.txt if you’re interested in all the details. Most standards that govern internet operation are freely published as “RFC” documents on the ietf.org site. “IETF” stands for Internet Engineering Task Force, and they are the people that manage most of the technical standards on which the Internet is based. (There’s also the World Wide Web Consortium, which is responsible for standards relevant to the web, but they don’t deal with low-level networking stuff. They deal with things like, “How exactly should a browser convert HTML text into nice page images”.) “RFC” stands for “Request for Comments”, which is ordinarily the first step of developing a standard that will be published as an OFFICIAL STANDARD DOCUMENT. But in IETF-world even confirmed standards are called “RFC whatever”. It’s just a tradition.
- curl is a super-handy utility you can run from a terminal window (for example, a DOS prompt in Windows) to fetch the contents of a particular web page and print them out as plain text. You can download it from https://curl.haxx.se/
- IPv4 vs IPv6 addresses: today there are, broadly, two different “versions” of the Internet running kind of in parallel. One is based on the old Internet Protocol standard, IPv4. The other is based on a newer standard, IPv6. (We don’t talk about IPv5.) The primary difference between them is that IPv4 addresses are 32 bits long (32 digits in base 2, using only 1 and 0 digits), which allows for about 4 billion unique IP addresses. It turns out that isn’t enough and we’re running out pretty rapidly. Like, there will come a time very soon when it isn’t possible to assign a machine a new, unique IPv4 address. IPv6 addresses are 128 bits long, which allows for about 3.4 × 10^38 unique addresses (roughly 340 followed by 36 zeros), so we probably don’t need to worry about running out of those. Anyway, it’s pretty likely that your machine’s network interface has both an IPv4 and an IPv6 address assigned, and routing will use the IPv6 address when that’s possible. The routing process is essentially the same in both cases; it’s just the format of the addresses that is different.
- Also, we’ve figured out “ways” to deal with the shortage of IPv4 addresses. Mainly this involves re-using a lot of those addresses inside “walled private gardens” in ways that we know are generally safe. The fact that an IP address inside one garden might be the same as one in another garden is unimportant, because the routers in the middle can use clever techniques to prevent those two machines from needing to know each other’s real IP addresses, even when they have to talk to each other. Almost all consumer home routers treat the machines on the “home” network as one of these walled gardens, which is why five million people can buy iPhones every day and hook them up to their wifi networks and that actually works. It’s really smart and cool but would take another ten pages to explain well. Google “IP masquerading” and “Network address translation” if you’re curious. Also, “proxies” are an older and less smart way to do basically the same thing that network address translation does.
- How routing tables really work. Hmm, never mind, it’s too complicated and doesn’t really add much to the overall story. (And I don’t want to re-number all the footnotes because WordPress doesn’t make that especially easy to do. Or maybe I just haven’t figured out the easy way.)
- I said the OS doesn’t care about the contents of TCP messages, it only passes them along to the programs on either end of the connection. That isn’t strictly true, because of something called “stateful packet inspection”. Most OSs have a “firewall” component that allows users or administrators to supply rules that prevent bad hombres from sending yucky messages to their machines. Often those rules are just based on the sender: for example, you could have a rule that says “don’t accept any traffic from any of the IP addresses on this list” (a “blacklist”), or one that says “only accept traffic from this list of IPs” (a “whitelist”, which is somewhat less convenient to administer because there are like ten billion machines connected to the internet now). But you might want a rule that says, “don’t load any images from web pages if they’re going to these specific local IP addresses” (like, maybe you are super-paranoid about your kids accidentally downloading porn, or, perhaps, you want them to just hate the internet completely). And in that case, you need stateful packet inspection, because the OS’s firewall needs to be able to notice when a “please download this image” message goes by and kill that sucker dead. It can’t do that just by looking at the message addresses; it has to look inside the message and see the content.
- Stateful packet inspection can be used to do a lot of nefarious things. Like, I once used it to redirect all unauthorized traffic on my wifi network to the web site of the World Beard and Mustache Championships (google it). But you can also imagine it being used to do things like, “replace content from CNN.com with a random page from brietbart.com”, for example. Or things that would be just as evil but harder to notice. Like, “if a page is coming from CNN.com, delay all messages by 100 milliseconds”. If added to your router, or (much worse!) to the router on the ISP’s side of your Internet connection, then such a rule would make your web browser basically unusable for browsing CNN.com, but brietbart would still load just fine. And if you weren’t super-motivated to figure out why that was happening, you might just ignore the problem and not bother reading CNN, which is sometimes considered to be a left-leaning news source. And this is the kind of thing that “net neutrality” is really needed to prevent. Internet service providers should be allowed to slow down your connection if you’re genuinely using too much bandwidth and interfering with others’ ability to use their connections. But they should not be allowed to slow down your connection just because they don’t like the politics of the web sites you’re connecting to; or just because you’re connecting to Netflix.com which competes with their in-house ShittyISPTv.com. And those are things they totally could do with stateful packet inspection.
- Why can’t we just use names directly, rather than converting them to IP addresses first? Well… it’s complicated, and I may not be aware of the really really true reason, but here are what I consider some good reasons: First, the address parts of IP messages are very convenient to process if they have a fixed size, but domain names can be nearly any length, so having site names embedded in every Internet message would be a pain for OSs to deal with. It would add a lot of complexity for very little gain (this is probably the really really true reason). Second, not all machines even have names: your laptop probably doesn’t have a name that could be used unambiguously in network messages, for example, but it definitely has an IP address if it’s connected to a network. Third, people care about names in ways that they don’t care about numbers: you probably would love it if you, PJ Vogt of Reply All, could host your personal web site on a server named pjvogt.net, but you probably don’t give a shit what the IP address associated with that name is. So we want people to be able to care about their Internet site names, but we don’t want the underlying technical infrastructure to have to care about them very much. (Except that stateful packet inspection does care about names, potentially.) The Domain Name System is basically a way of “factoring names out” of the internet equation so that everything can carry on in a name-agnostic way.
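To make the "fixed size" point from that last footnote a bit more concrete, here is a small Python sketch (not anything official, just an illustration). It shows that every IPv4 address packs into exactly 4 bytes and every IPv6 address into exactly 16, while names can be any length, and it asks the OS to turn a name into addresses. The helper names `packed_size` and `resolve` are made up for this example, and the addresses are from ranges reserved for documentation.

```python
import ipaddress
import socket


def packed_size(address: str) -> int:
    """Return how many bytes the address occupies inside a packet header."""
    return len(ipaddress.ip_address(address).packed)


def resolve(name: str) -> list[str]:
    """Ask the OS resolver for the IP addresses behind a name.

    Returns an empty list if the name cannot be resolved.
    """
    try:
        results = socket.getaddrinfo(name, None)
    except OSError:
        return []
    # Each result is a tuple whose fifth element is the socket address;
    # the first item of that is the IP address as text.
    return sorted({info[4][0] for info in results})


if __name__ == "__main__":
    print(packed_size("192.0.2.1"))    # an IPv4 address
    print(packed_size("2001:db8::1"))  # an IPv6 address
    print(len("pjvogt.net"), len("a-much-longer-site-name.example.org"))
    print(resolve("localhost"))
```

What `resolve` prints depends on your machine and network, which is kind of the point: the name stays the same while the numbers behind it are free to differ.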