
Lecture 18: The Internet and the Web

Computer Literacy 1h Thursday 4/11/2004

Lecture Overview

Topics covered

  • The origin of the Internet
  • The Internet and Network organisation
  • The Web: client-server model
  • The Web: caches, proxies and cookies
  • Peer-to-peer networks

The aim is to understand how network organisation affects network applications.

The Internet

In the 1960s, when computers were operated using punch cards and magnetic tape, they were viewed mainly as number crunchers. Few people saw the potential of computers as communication devices. The U.S. military were nervous because of the Cold War; in particular, they were nervous that their communication networks could be knocked out by strategic attacks. They funded a research project to establish a small network of computers that could communicate with each other even if some were destroyed. The result of the project was a computer network called ARPANET. An important principle of the network was that all the computers on the network were equal in their ability to communicate: there was no central authority controlling the flow of information, as any such central component would make the whole network vulnerable to attack. An important step in allowing the network to operate was the invention of the packet-switching model of information transfer, discussed in detail in Lecture 8 (Communications 1). You should recognise the acronym ARPANET, but need not remember that it stands for Advanced Research Projects Agency NETwork.

ARPANET quickly developed into a network that connected many military and university sites. It was disbanded in 1990, but by this time its success had led to the establishment of many similarly connected networks, many LANs (local area networks) and WANs (wide area networks) (discussed in Lecture 8). As these networks grew in number and themselves became connected, the Internet emerged.

Network organisation

The Internet today is a global network of computer networks. Individual networks may be run by universities, governments, businesses, individuals, etc. The network of connected networks is known as the Internet. Edinburgh University has a local area network (LAN) called EdLAN. EdLAN is connected to a metropolitan area network (MAN) called EaStMAN, which serves communication between LANs in the Edinburgh and Stirling area. EaStMAN is itself connected to the Joint Academic Network (JANET), which provides high-speed connections between all UK higher and further education organisations and research councils. All these networks are being developed to provide faster connections and greater coverage: work on SuperJANET5 is underway, which will include the Spark initiative in Scotland, connecting schools.

The Internet is growing faster than anyone can keep track of. True to the spirit of ARPANET, there are no central controlling authorities on the Internet; it is globally self-regulated by volunteers, the IETF (Internet Engineering Task Force), and locally regulated by governments. Some networks have remained independent of the Internet: one is Fidonet, begun in 1984, which links computers via a phone-line-based messaging system.

Within the Internet, different networks can have different levels of access. Intranets are networks with access restricted to an organisation or group. This can be achieved through security measures, such as only allowing computers with authorised IP addresses to pass data on the network. To understand how this method works, and indeed to understand how Internet and network services work, it is helpful to have some terms that describe the structure of networks in general:

  • Bridges connect LANs of the same type. Different LANs can have different packet-switching technologies, and bridges connect compatible networks. (More technically, they connect networks with common network-layer addresses in the OSI 7-layer model of communications; see the Lecture 14 notes.)
  • Gateways connect networks of different types, for instance a LAN and a WAN. Because of the short distances separating the nodes of a LAN, data transmission rates are high. In contrast, data transfer rates between the more distant nodes of a WAN are slower because of the larger distances between the nodes. Gateways are devices, sometimes computers, that solve these problems of mismatched speeds and technologies.
  • Routers are devices that decide where packets should go next on their journey in order to reach their destination in the network. This routing decision is most often achieved using software that implements routing protocols (a small sketch of such a decision appears after this list).
  • Switches are devices that perform the same role as routers but using dedicated hardware: they are usually faster but less flexible.
  • Repeaters receive signals (e.g. packets in a packet-switching network) and amplify them to ensure successful transmission.
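To make the routing decision concrete, here is a minimal Python sketch of the idea. It is only an illustration, not how production routers are built: the table entries and gateway names are invented for the example. The rule shown, longest-prefix matching, picks the most specific route that contains the destination address.

    import ipaddress

    # Hypothetical routing table: (network prefix, next hop). The prefixes
    # and gateway names are made up for this example.
    ROUTING_TABLE = [
        (ipaddress.ip_network("129.215.0.0/16"), "edlan-gateway"),
        (ipaddress.ip_network("129.0.0.0/8"), "man-gateway"),
        (ipaddress.ip_network("0.0.0.0/0"), "default-gateway"),
    ]

    def next_hop(destination):
        # A packet's next hop is given by the most specific matching prefix
        # (longest-prefix matching), as decided by the routing software.
        addr = ipaddress.ip_address(destination)
        matches = [(net, hop) for net, hop in ROUTING_TABLE if addr in net]
        net, hop = max(matches, key=lambda entry: entry[0].prefixlen)
        return hop

    print(next_hop("129.215.24.1"))   # -> edlan-gateway (most specific match)
    print(next_hop("212.58.244.20"))  # -> default-gateway (no specific route)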

A convenient way to establish an intranet is to control the flow of information at the network gateway. An elaboration of the intranet concept is the extranet. Extranets are networks that allow access only to authorised users outside the organisation. The most common application is in e-commerce: the extranet allows access to selected databases and information for the customers of the business.

The Web

The World Wide Web (the Web) is one method of communication on the Internet. The Web uses the client-server model of network applications. In this model, information is stored on a network device called a server. In order to access the information, an application is run on another machine in the network, the client machine. In the case of the Web, Web pages are stored on Web servers, and the applications used to display the pages (browsers) are run on the computer of the user, the client. This organisation is useful because the specific demands of storing large amounts of data, receiving requests for data, finding and transmitting the data, and possibly performing computationally intensive operations on the data can all be handled by dedicated machines. One disadvantage is that if there is a problem with the server, access to the information on the server is interrupted for all users.

For the Web to work, it is essential that Web clients and servers can communicate. Three standards ensure that this happens:

  • HTTP. Hypertext Transfer Protocol. Defines how the Web browser and Web page server exchange information.
  • URL. Uniform Resource Locator. Specifies a unique address for each Web page. Each URL has three components: the protocol used (e.g. http), the server name, and the path to the file on the server. E.g. in the URL http://www.inf.ed.ac.uk/teaching/courses/cl1, http is the protocol used, www.inf.ed.ac.uk is the name of the server and /teaching/courses/cl1 is the path of the cl1 page (so the file is in the directory courses/, which is in the directory teaching/ on the server). A short sketch showing these components being extracted, and an HTTP request being made, appears below.

  • HTML. Hypertext Markup Language. A method of encoding information so that it can be displayed by different devices.

The Web was invented by Tim Berners-Lee, who now heads the World Wide Web Consortium, which develops and regulates the three standards.
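As a minimal illustration of the URL and HTTP standards together, the following Python sketch splits the example URL above into its three components and then performs the browser's side of the HTTP exchange. It assumes the server is reachable; the exact reply depends on the server.

    from urllib.parse import urlsplit
    import http.client

    url = "http://www.inf.ed.ac.uk/teaching/courses/cl1"
    parts = urlsplit(url)
    print(parts.scheme)   # 'http' -- the protocol used
    print(parts.netloc)   # 'www.inf.ed.ac.uk' -- the name of the server
    print(parts.path)     # '/teaching/courses/cl1' -- the path on the server

    # The browser's side of the HTTP exchange: request the page and read
    # the server's reply (e.g. "200 OK" if the page is found).
    conn = http.client.HTTPConnection(parts.netloc)
    conn.request("GET", parts.path)
    response = conn.getresponse()
    print(response.status, response.reason)
    conn.close()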

The client-server model used by the Web means that it takes the same amount of time and resources to visit a Web page the second time as it did the first. An efficient way to make the second visit quicker is to make and store a copy of the page when you first visit it. This is the idea behind a Web cache, the place where copies of visited Web pages are kept. HTTP has a set of rules which determine whether a page can be stored in a cache or not. Web caches can be either client-side or server-side. With a client-side cache, the copies are stored on the hard drive of the user, or on the computers of the ISP (Internet service provider). This provides faster access to individuals, and frees bandwidth in the local network, allowing other users access. Server-side caches, meanwhile, are caches operating on the Web servers providing the Web pages; they allow the servers to keep copies of the most recently visited Web pages. Popular pages will therefore be found more quickly, increasing the speed of access and reducing the server workload.
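Here is a deliberately simplified sketch of the client-side caching idea in Python: keep a copy of each page the first time it is fetched, and serve the copy on repeat visits. A real browser cache also obeys HTTP's rules about which pages may be cached and for how long, while this version caches everything forever.

    import urllib.request

    cache = {}  # URL -> copy of the page already downloaded

    def get_page(url):
        if url in cache:
            return cache[url]          # cache hit: no network traffic needed
        with urllib.request.urlopen(url) as response:
            page = response.read()     # cache miss: fetch from the Web server
        cache[url] = page              # keep a copy for the next visit
        return page

    # The second call is answered from the local copy, not the network.
    get_page("http://www.inf.ed.ac.uk/teaching/courses/cl1")
    get_page("http://www.inf.ed.ac.uk/teaching/courses/cl1")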

Normally, when your browser tries to display a Web page, it makes a direct connection between your computer and the server on which the Web page is stored. It is also possible to make an indirect connection, by connecting to a Web proxy server first and using a connection from this server to the target server holding the Web page. A common reason for having a Web proxy is to manage access to Web resources. For instance, access to the Web from the public computing labs in the University is achieved through a proxy server. The computers in the open labs are all automatically configured to use the University proxy server. This potentially allows the network managers to prevent the machines from accessing inappropriate Web sites. It also means that Web users outside of the University network (e.g. on a home dial-up or broadband connection) can access the University intranet by connecting to the University's proxy server. A common application of proxy servers is a Web proxy cache, a cache located on the proxy server to speed up access and free bandwidth as above. If you have an external connection, you may want to use the proxy server: copies of Web pages visited regularly on the University network will be stored in the Web proxy cache, and will therefore be faster to access. To set your proxy, follow the instructions at: http://webhelp.ucs.ed.ac.uk/docs/proxycache

A cookie is a packet of information sent to a browser from a server when the user visits a Web site on that server. The browser then sends the cookie back each time it visits the server again. Cookies are useful because they allow Web sites to keep track of their users. This can allow the Web site to:

  • Remember your login name and password for the site
  • Remember personal details you may have entered
  • Maintain a “shopping basket” on a commercial site
  • Keep track of your use of the website

Cookies can last for just one session (e.g. for a shopping basket) or stay on the computer indefinitely. They can be time-saving and helpful for the user. However, they can potentially present security issues. For instance, other sites may identify which Web sites you have visited by reading your cookies. You can decide whether or not to accept cookies:

  • Internet Explorer. Tools > Internet Options > Privacy tab
  • Mozilla. Tools > Cookie Manager
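The cookie round trip can be illustrated with Python's standard library, which provides a cookie store that plays the role of the browser. The URL below is a placeholder: a real site that sets cookies would populate the jar on the first visit.

    import http.cookiejar
    import urllib.request

    jar = http.cookiejar.CookieJar()   # plays the browser's cookie store
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))

    # First visit: any cookies the server sends are stored in the jar.
    opener.open("http://www.example.com/")
    for cookie in jar:
        print(cookie.name, "=", cookie.value)

    # Second visit: the stored cookies are automatically sent back, which
    # is how the site can recognise a returning user.
    opener.open("http://www.example.com/")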

Peer-to-peer networks

In a peer-to-peer (P2P) network, there are no servers. Instead, each client (peer) communicates directly with other clients. For this to work, each peer must be able to perform the function of a server as well as that of a client. Each client must be able to send and receive data, and to perform all the routing in between. Issues relevant to P2P networks include:

  • Cost. The initial cost of a P2P network is lower, because no expensive servers have to be bought.
  • Reliability. The ARPANET was a P2P network. Because no single node in the network manages communication between the other nodes, if one node breaks down the network continues to function.
  • Scalability. P2P networks do not, in general, scale well. That is, the performance of the network relative to its size decreases as the network gets bigger. P2P clients in large networks have to spend a lot of time querying the network to identify the location of the message target.

One way to transfer files is using a peer-to-peer file-sharing network. Gnutella is an example of a P2P file-sharing network. With Gnutella, each member of the network (node) keeps track of about 5 other members of the network. This creates a virtual network conceptually overlaid onto the Internet. Routing software connects the node sending a request with the node where the file is located, and a copy of the file is transferred back to the sender. A small sketch of this kind of query forwarding appears below. Successfully routing requests for files is a difficult task, and many "P2P" networks use some servers to make the task easier. For instance, Napster uses a client-server model for performing searches, with a centralised list of network members and files on the server computers, and uses P2P connections and protocols to transfer the files. The breakthrough of Napster was to use P2P protocols for the transfers between peers, and this led to it being called a P2P network, even though this is only partly true. Napster can be understood to have a centralised list of members and files, whereas Gnutella has a distributed list of members (the list is spread out across the network).
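The following toy model (a sketch under simplifying assumptions, not Gnutella's actual protocol) shows how a query can be forwarded from neighbour to neighbour until a copy of the file is found. The time-to-live (ttl) limit illustrates the scalability issue above: without it, every query would ripple across the whole network.

    class Peer:
        def __init__(self, name, files):
            self.name = name
            self.files = set(files)
            self.neighbours = []       # each node tracks a handful of others

        def search(self, filename, ttl=4, seen=None):
            # Return the name of a peer holding the file, or None.
            seen = seen if seen is not None else set()
            if self.name in seen or ttl == 0:
                return None            # already asked, or hop limit reached
            seen.add(self.name)
            if filename in self.files:
                return self.name       # found: the file can be copied back
            for peer in self.neighbours:
                found = peer.search(filename, ttl - 1, seen)
                if found:
                    return found
            return None

    # A tiny three-node network: alice knows bob, bob knows carol.
    alice, bob, carol = Peer("alice", []), Peer("bob", []), Peer("carol", ["song.mp3"])
    alice.neighbours = [bob]
    bob.neighbours = [carol]
    print(alice.search("song.mp3"))    # -> 'carol', reached in two hops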

Freenet is a P2P network in which the identity of members and information about file content are protected by security measures. This protects users from the potential privacy problems of allowing P2P network software onto your computer to communicate with other users. It also makes routing messages a much more complex task, and the network is correspondingly slow and has only limited search capabilities. The power of the approach is that it is very difficult to censor. This power can be used positively, in support of freedom of speech, and negatively, to communicate illegal and harmful information. Is this a good thing?

P2P networks have become synonymous with file sharing due to the infamy of networks like Napster and the legal issues that surround them. It is important to remember that file sharing is just one use. Other successful applications that run on P2P networks include:

  • Payment schemes like PayPal, used on eBay.
  • Voice over Internet Protocol (VoIP) services such as Skype, which allow you to have phone conversations over the Internet (at Internet prices: free if you have unlimited-access broadband).
  • Spam detection. A clever idea is Vipul's Razor, a P2P spam detector. Standard spam (unwanted email; the name comes from a Monty Python "Spam" sketch) is usually identified by its contents. Vipul's Razor instead identifies messages which reach large numbers of computers at once: this increases the probability that they are spam. A sketch of this idea appears after this list.
  • Instant messaging, which uses a combination of P2P and client-server networks similar to Napster (see the lecture on collaborative computing).
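The following sketch captures the idea behind Vipul's Razor; it is illustrative only, and the real protocol and its thresholds differ. Each peer reports a fingerprint of the messages it receives, and a message whose fingerprint is reported by many peers at once is flagged as probable spam.

    import hashlib
    from collections import Counter

    reports = Counter()   # fingerprint -> number of peers reporting it

    def fingerprint(message):
        # A stable digest of the message contents.
        return hashlib.sha256(message.encode()).hexdigest()

    def report(message):
        # Called by each peer that receives the message.
        reports[fingerprint(message)] += 1

    def looks_like_spam(message, threshold=100):
        # A message reaching many computers at once is probably spam.
        return reports[fingerprint(message)] >= threshold

    # Simulate the same message arriving at 150 peers at once.
    for _ in range(150):
        report("BUY NOW!!! Amazing offer!!!")
    print(looks_like_spam("BUY NOW!!! Amazing offer!!!"))  # -> True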