This is a tutorial on a set of core concepts related to Internet communication for the course COM 251 Information, Technology, Society, which I teach at Purdue University.
The Internet is not one net. It is more versatile than that.
The Internet is one of the most successful communication systems in history. In a brief period, it made its way to the most isolated communities on the planet. What made it so successful? Simply put, its versatility. This is a big word. Let us translate it. The Internet is what you want it to be. It is infinitely malleable. This is possible because the Internet was never meant to be one network, but a federation of networks. The Internet is the United Networks of the World, if you want.
Common misunderstandings about the Internet
One of the most common misunderstandings about the Internet is that it is a single, unified network, with its own dedicated infrastructure. Furthermore, many believe that the Internet and the web are the same thing. Finally, we tend to believe that the Internet is a new technology, not older than a couple of decades. The Internet is, in fact, a collection of networks brought together by a number of common computer standards called protocols. These are standard ways for computers to greet each other and to send each other messages. The most famous is TCP/IP or, more concisely, the Internet protocol. Its job, as I will explain below, is to address all Internet messages and content and keep track of them until they reach their destination. In fact, if there is something unique to the Internet, it is the protocols that allow millions of computers to talk with each other. In addition, there are a number (not too many, under 100) of dedicated computers, called root servers, that serve as traffic cops and telephone operators for the Internet.
Packet switching and protocols: brief explanations
Of the technologies that make the Internet what it is, the protocols are the most important. Only by understanding them can we understand how the Internet really works, what it can and cannot do, and where it is going. Any communication professional needs to understand these issues. A generation ago, no communication professional could succeed in their profession without some passing knowledge of how the print or broadcasting industries work, both technically and organizationally. The current generation of college students needs to have some basic understanding of how the Internet works and how this matters to the stability and progress of our communication world.
The Internet that we are using today is the product of many minds and many projects, spanning almost 50 years. Its basic architecture and philosophy, called packet switching, was invented during the Cold War by Paul Baran, who was interested in creating a communication network that would withstand a nuclear attack (look up the early history of the Internet in Wheen’s book, From Dot-dash to Dot-com). At the time, communication systems used a system called “circuit switching.” This meant that if you wanted to talk with someone on the phone, you first had to reach a switchboard, which would make a connection to your intended conversation partner. Once the connection was made, you and your conversation partner had a slice of the network (a circuit) dedicated only to you for the duration of the call. No one else could use that circuit, even if all you did was wait in silence for your boyfriend to answer some awkward question. Switchboards were massive electronic devices, similar to train stations. Many lines would go into them and many would leave them. They were gigantic hubs of communication. While there were many of them, some were much more important than others, serving as central mega-hubs. This made telecommunication very hierarchical and very vulnerable. If you took out a number of these central hubs, communication would become very difficult, if not impossible. Moreover, the hubs worked on the all-or-nothing principle. If even a part of a hub came down, most of its functionality was crippled.
Why did Paul Baran create packet switching
Paul Baran came up with a brilliant idea. He imagined a network in which communication would be much more decentralized. In such a network, each node can communicate with any other node not only through central hubs – which would still exist – but also through their immediate neighbors. Moreover, Baran’s network connections would change from minute to minute, according to the most available path at the time. If this were the telephone network, which it never was, each word might be sent through a different neighbor. Because of this, there is no specific circuit allocated to any conversation or data exchange. That is why the network is not “circuit switched” but something else, namely, “packet switched,” a concept that I will explain in a minute.
Baran imagined a world of communication that was more similar to that of the modern highways and road networks, where you can get from the smallest, most primitive country road to any metropolitan area through any mix of local, regional, or interstate roads, or highways. The old world of telecommunication was like the railroad system. A few, central and expensive railways dominated the landscape and a few central hubs (stations or railyards) connected them to each other. To get from anywhere to anywhere, you needed first to get to a railway hub (station) and then take one of the regularly scheduled trains with hundreds of other people. If the train station, the rail line, or the engine broke, all traffic would be blocked. In contrast, in a road and highway system, if a highway goes down, you can always take a side road or even simple country lane. All you need is a good car.
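To make the road analogy concrete, here is a minimal sketch in Python of a message finding its way around a broken hub. The five-node network, the node names, and the find_path function are all made up for illustration; real routers use far more elaborate methods to choose a path.

```python
from collections import deque

# Hypothetical toy network: each node lists its immediate neighbors.
# "HUB" plays the role of a central switching station.
NETWORK = {
    "A": ["HUB", "B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C", "HUB"],
    "HUB": ["A", "D"],
}


def find_path(network, start, goal, failed=()):
    """Breadth-first search for a shortest path that avoids failed nodes."""
    failed = set(failed)
    if start in failed or goal in failed:
        return None
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in network[node]:
            if neighbor not in visited and neighbor not in failed:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None  # no route survives


# With the hub working, the shortest route goes through it.
via_hub = find_path(NETWORK, "A", "D")
# With the hub destroyed, the message takes the "country roads" instead.
detour = find_path(NETWORK, "A", "D", failed={"HUB"})
print(via_hub, detour)
```

In this toy network, losing the hub makes the trip longer but not impossible. Only when the neighbors are knocked out as well does node A become unreachable.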
Why is the Internet like the highway network and the old telephone network like the railroad network?
Baran’s idea would not have been as revolutionary as it is if he had not proposed something even more daring than using any connections available at a given time and de-emphasizing hubs. He proposed that the message to be communicated also be broken down into packets (words, sounds, bits of information) and that, instead of being sent out all at once, each packet (a sentence, a word, or a bunch of pixels) be sent through the most efficient and freest path available at the moment it was about to be communicated. He proposed, in effect, that messages should take advantage of every single scrap of communication capacity available at any given moment, so that the entire communication infrastructure could be used to the maximum. This is the gist of his idea of “packet switching.”
Going back to our railroad versus highway and country road example, Baran proposed that, just as when we ship merchandise from factories to stores, we should prefer a “car” system to a “train” system. Shipping by train means that we need to send large batches of merchandise at the same time. This creates long delays between production and delivery, due to the time required to load the train cars, wait for the train to form, and then unload and do the whole process in reverse. Using many trucks, each serving one destination and taking the least crowded road at any given time, is far more efficient. This allows us to ship small batches of merchandise, just in time, each to an individual store. Similarly, in the world of communication, breaking down the message into smaller units (packets) is more efficient. It is also very resilient to attack. As if taking a page from the German military strategy of World War II, which sent troops by road instead of by train and kept sending new troops when some were destroyed, Baran imagined in the early 1960s that breaking down communication shipments would ensure that the enemy could never destroy your entire communication system or your messages.
Packet switching was a defense against a surprise Soviet nuclear attack
Packet switching is not only more flexible but also very resilient to attack. Baran worked at the height of the Cold War, when a surprise attack by the Soviet Union, with massive nuclear strikes on major urban and communication areas, was a very real possibility. His work was in part motivated by the goal of reinventing telecommunication networks so that, if they were attacked, they would still keep running, at least partially. Indeed, in a packet switched network, even if you destroy several nodes in the system, the other ones will step in to fill in the gaps. In the world of packet switched communication this means that if some parts of the message are lost, they can be retrieved by asking the sender to resend the missing parts.
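The idea of breaking a message into packets, letting them arrive in any order, and asking for the lost ones again can be sketched in a few lines of Python. This is a toy simulation, not how real Internet software is written: the function names (to_packets, network, deliver), the packet size, and the way the receiver learns which packets are missing are all simplifying assumptions of mine.

```python
import random


def to_packets(message, size=8):
    """Split a message into numbered packets of at most `size` characters."""
    return [(seq, message[i:i + size])
            for seq, i in enumerate(range(0, len(message), size))]


def network(packets, lost=(), seed=0):
    """Simulate the trip: drop the packets listed in `lost`, shuffle the rest."""
    arrived = [p for p in packets if p[0] not in lost]
    random.Random(seed).shuffle(arrived)  # packets take different roads
    return arrived


def deliver(message, size=8, lost=()):
    """Send a message, detect the gaps, and ask only for the missing packets."""
    packets = to_packets(message, size)       # sender side
    inbox = dict(network(packets, lost))      # receiver collects what arrives
    # Simplification: the receiver is told how many packets to expect.
    missing = [seq for seq in range(len(packets)) if seq not in inbox]
    inbox.update(network([packets[seq] for seq in missing]))  # resend
    text = "".join(inbox[seq] for seq in sorted(inbox))       # reassemble in order
    return text, missing


original = "The Internet is a network of networks."
text, missing = deliver(original, lost={1, 3})
print(missing, text)
```

Even though two of the five packets never arrive on the first try and the rest arrive out of order, the sequence numbers let the receiver put the message back together exactly.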
Of course, the story is more complex, and as shown in Wheen’s book, packet switching is not a “slam dunk” solution, especially for situations where you want a large amount of information to be sent instantaneously. However, packet switching is what makes the Internet a very resilient communication infrastructure and VERY DIFFERENT.
Why is the Internet loosely hierarchical
Getting a bit more technical, when we talk about the structure of the Internet, several concepts are very important to understand and remember. First, the Internet is structured in a loosely hierarchical manner. Listening to some descriptions of the Internet, which extol its “decentralized,” “anarchic” nature, one is tempted to believe that the Internet is a simple jumble of wires and computers. This is not at all true. While, as already discussed above, it is true that you can connect from any computer to any other computer on the net, and that you can do it in many ways (check out this tracert trick), if the Internet did not have a number of central, backbone exchange points (in effect, very large capacity connections), it would look more like a medieval city, all warrens and alleys, than like a modern technology. To make a long story short (for details, see the slides), the Internet is structured in local (Tier 3), regional (Tier 2), and global/trunk (Tier 1) networks. These are, in keeping with our train-railroad/car-road comparison, similar to the county/state roads, state highways, and interstate highways. Tier 1 networks (interstate highways) are maintained by large telecommunications companies (Sprint, AT&T, Level 3, NTT-Japan, Tata-India, etc.). Tier 1 networks use gigantic pipelines, some of them capable of delivering terabits of information per second. In addition, the Internet needs a number of routers at each point where a consumer or a tier network connects with the rest of the Internet. These routers, although not very sophisticated, are an essential part of the system. They are responsible for the process of packet switching and Internet addressing. Routers are the ones that decide where a web address is and how information should be brought back to us.
The Internet is a way of sending things back and forth. At heart, it is a protocol: TCP/IP
Second, always remember that packet switching is a general principle for organizing and sending information. What makes the Internet work the way it does are the protocols that implement this principle. These are the UDP and TCP/IP protocols. The User Datagram Protocol (UDP) is a computer algorithm (procedure), built into computer programs, that sends messages as individual packets without checking whether they arrive. The Transmission Control Protocol (TCP) is a more careful procedure: it breaks down the messages (files, emails, images, etc.) into packets, keeps track of the packets, makes sure that the ones that are lost are replaced with copies, and reconstructs the messages at the end of the transmission. Finally, the Internet Protocol (IP) is the method by which computers are assigned addresses and are found when users want to send a message. We call them all protocols because they play a role similar to that of human protocols: they tell computers and messages what to do in specific situations. Protocols are also important because they allow computers that otherwise use different languages to exchange information (think of Apple computers and PCs). Of course, the files themselves need to be readable by the receiving computers, but this is not what a protocol does. It simply plays the role of a universal mailman.
Protocols operate on all computers connected to the Internet (your own included), but there are some computers on the net that are completely dedicated to handling protocols. These are the computers needed for directing traffic. They are called “routers” and are as important to the Internet as traffic lights in a busy city. Some scholars consider the routers and the protocols part of the Internet “core,” while the clients and the servers are designated as its “ends” or “edges.” This might strike you as paradoxical, since up to this point it was emphasized that the Internet is a decentralized network and thus has no “core.” This is still true. The distinction between “core” and “end” is conceptual, not physical. It is an important one, especially for policy reasons.
Who owns Internet protocols? No one. Or everyone. Does the Internet allow the creation of new protocols? You bet.
Another great innovation that came with the Internet is the fact that it is an open and extensible communication environment. The basic Internet protocols are not defended by traditional patents and copyright; anyone can use them freely, without paying for them. Furthermore, because they are open and free, we can write new programs/languages to extend the protocols and create new tools. In fact, when the basic TCP/IP and UDP protocols were written, their utility was rather limited. They allowed supercomputers to exchange files and to accept commands from remote locations. There was no email, no webpages, no video on demand. All these became possible with the invention of new protocols, which sit on top of the basic protocols invented 50 years ago. The openness of the Internet to new protocols makes advanced applications possible. For example, the web, understood as a collection of graphically rich pages that can be called with a mouse click, only became possible when a specific protocol was written for it, namely HTTP. This protocol allows content producers to generate aesthetically pleasing, multimedia pages (by using an ancillary markup language named HTML) and computers to request and display them using a dedicated client program, called a browser. Web protocols and technologies were invented in 1991, over 20 years after the first Internet connection was established between the end of the 1960s and the beginning of the 1970s. The name of the inventor was Tim Berners-Lee, who was not an engineer but a software developer.
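To see what “a protocol that sits on top of the basic protocols” looks like in practice, here is a minimal sketch in Python that writes out the short text message a browser sends when it asks for a page. The build_request function is my own illustration, and real browsers add many more lines to the request. The text itself would then be handed to TCP/IP for delivery.

```python
from urllib.parse import urlsplit


def build_request(url):
    """Return the text of a minimal HTTP/1.1 GET request for `url`."""
    parts = urlsplit(url)
    path = parts.path or "/"      # an empty path means the site's front page
    lines = [
        f"GET {path} HTTP/1.1",   # what we want, and the protocol version
        f"Host: {parts.netloc}",  # which website on that server
        "Connection: close",      # hang up once the answer is sent
        "",                       # a blank line marks the end of the request
        "",
    ]
    return "\r\n".join(lines)     # HTTP lines end with carriage return + newline


request_text = build_request("http://matei.org")
print(request_text)
```

Notice that the request is nothing more than a few lines of agreed-upon text. That agreement is the protocol.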
What is an IP address? What is a web address? What is the difference between a web and an IP address?
An equally important characteristic of the Internet is that all its computers need to make themselves known to the network with a unique address (and name). This means that every single computer on the net, be it a router, a server (a hub that serves information), or a client (a computer used strictly for surfing the web), has an IP address. Until recently, this was an address in the format xxx.xxx.xxx.xxx, where each xxx stands for any number between 0 and 255. Why 255? Because, in a binary universe, the numbers used by the protocol have a set length of 8 binary digits, which means that they are 8 bits long. There are thus 2^8 = 256 possible values, from 0 to a maximum of 255 (11111111 in binary). Also, the total number of IP addresses available in this system is 2^(8*4) = 2^32 = 4,294,967,296, about 4 billion. Since there are about 4 billion individuals connected to the Internet and many millions of organizations that own hundreds or thousands of computers, we were just about to run out of IP addresses when a new system, IPv6, was introduced. The format of a new IP address is xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx (or, in a concrete example, something like 2001:0db8:0000:0000:0000:ff00:0042:8329), where each group is, in fact, a 16-bit number written as four hexadecimal digits. This allows for 2^128 addresses, which means that we can have about 3.4×10^38 (roughly a 3 followed by 38 zeros) devices simultaneously connected to the Internet. Just to give you an idea of how huge this name space is, with this new system we could give a number to every single atom on the surface of the earth and have many more left to spare.
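The arithmetic above is easy to check for yourself. The following Python sketch is only an illustration: dotted_to_int is a helper I made up, and 192.0.2.1 is an address reserved for examples, not a real website. The ipaddress module, on the other hand, is part of Python’s standard library.

```python
import ipaddress


def dotted_to_int(address):
    """Turn an xxx.xxx.xxx.xxx address into the single 32-bit number it encodes."""
    parts = [int(p) for p in address.split(".")]
    if len(parts) != 4 or any(not 0 <= p <= 255 for p in parts):
        raise ValueError(f"not a valid IPv4 address: {address!r}")
    value = 0
    for part in parts:
        value = value * 256 + part  # each group contributes 8 more bits
    return value


max_group = 2**8 - 1    # the largest value that fits in 8 bits
ipv4_total = 2**32      # number of addresses in the old system
ipv6_total = 2**128     # number of addresses in the new system (IPv6)

example_v4 = dotted_to_int("192.0.2.1")
v6 = ipaddress.ip_address("2001:0db8:0000:0000:0000:ff00:0042:8329")
short_form = v6.compressed  # same address with the runs of zeros squeezed out

print(max_group, ipv4_total, f"{ipv6_total:.1e}", example_v4, short_form)
```

The last line of the sketch also shows a convenience of the new system: long runs of zeros in an IPv6 address can be abbreviated, so the same address can be written in a much shorter form.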
Since computers are uniquely identifiable on the Internet (and they need to be so, since otherwise, you would not be able to send and receive information) privacy is not a real option in this realm. Or, at least, strict privacy, in the sense that you can do whatever, whenever, without fear of being watched. Whenever you browse a website from the “privacy” of your home, the IP address of your computer is recorded by the site you visit. Sometimes this information is stored, sometimes, not. Most times it is. Furthermore, the company or organization that serves as your Internet provider can personally identify you. Thus, there is no hiding on the Internet… (Of course, there is TOR, an anonymization program, but all TOR does is to mask your IP address. An IP address, albeit fake, is still assigned to you.)
The fact that your computer is identified by an IP address might strike you as odd, since you have known for some time that on the Internet we request Web addresses that have URLs (http://matei.org) and thus that computers are identified through words. This is partially true. To make the story complete, here is how things really are: each physical machine that is connected to the Internet, regardless of its function (client or server), has an IP address. Some of these addresses are permanently tethered to a machine; some are dynamically assigned. Servers have dedicated numbers. Client computers, like your own, get a new IP address each time they are turned on (this does not mean that the company that assigns the number does not keep track of who got it…).
Does the Internet have a “phonebook”? Yes, it is called DNS
In the case where a computer is used to serve information (is loaded with web pages and applications), the company posts on a Domain Name Server the information that this IP address hosts a given web address expressed in words (http://matei.org, http://purdue.edu, etc.). This information is then disseminated through the Internet, to all Domain Name Servers out there (which are like phone books). So, when you look up a web address, your computer calls on the local network “address book” to see at what IP address it is located. If the local DNS server does not have that information, it asks a more authoritative DNS server, probably a root server (maintained by a central clearinghouse, called ICANN), which has a complete list of addresses. This reveals two interesting things. First, the Internet needs a central organization and a central phone book system to function. When you look up an address, you don’t just go around the web ringing every single bell. Second, everything is pretty simple to find, including information about websites that engage in all kinds of activities, such as terrorism. Try a tool like http://ipfingerprints.com to find out the IP address of any web address.
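The “local phone book first, more authoritative phone book second” logic can be sketched as follows in Python. Everything here is a simplification of mine: the LocalResolver class is hypothetical, the authoritative server is reduced to a plain dictionary, and the IP addresses are reserved example numbers, not the real addresses of these sites.

```python
# Hypothetical stand-in for the more authoritative phone book.
# The IP addresses are documentation-only examples, not the real ones.
AUTHORITATIVE = {
    "matei.org": "192.0.2.10",
    "purdue.edu": "198.51.100.20",
}


class LocalResolver:
    """A toy local DNS server: answer from memory, otherwise ask upstream."""

    def __init__(self, upstream):
        self.upstream = upstream
        self.cache = {}            # the local "address book"
        self.upstream_queries = 0  # how often we had to ask higher up

    def lookup(self, name):
        if name in self.cache:     # already known locally
            return self.cache[name]
        self.upstream_queries += 1
        address = self.upstream.get(name)
        if address is not None:
            self.cache[name] = address  # remember it for next time
        return address             # None means nobody knows this name


resolver = LocalResolver(AUTHORITATIVE)
first = resolver.lookup("purdue.edu")   # not cached yet, so we ask upstream
second = resolver.lookup("purdue.edu")  # answered from the local address book
print(first, second, resolver.upstream_queries)
```

If you want to perform a real lookup from a computer connected to the Internet, Python’s standard library offers socket.gethostbyname("purdue.edu"), which returns the IP address that the actual DNS system reports.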
Discussion and presentation questions
- Who was Paul Baran? Explain what he did, why, and to what effect. What was his relationship with the military and with AT&T?
- What did the first version of the Internet do? Why was it created? Who were the most important actors involved in its invention?
- Why is the Internet different from other types of networks? Why are its core protocols good at certain things (name them), but not at others (name them as well)?
- Why is the Internet loosely hierarchical, and what does this mean? Who are some of the important players involved in the Internet infrastructure? Do they literally “own” the Internet?
- Why is the Internet like the car-road system and not like the railroad-train system? What is the long term consequence of this design choice?
- What did the Internet sleuthing exercise teach you about the structure of the Internet? How can the practical skills you picked up during that exercise help you in your future work as a communication professional, especially as a journalist or PR professional?
NOTE: I wrote this introduction for the Purdue students enrolled in some of my classes, especially COM 251 Information, Technology, Society and COM 435 Emerging Communication Technologies. If you have questions or if you want to help me improve this introduction, please leave a comment.