- #PYTHON 3.5 DOWNLOAD PROGRAM USING SOCKET NOT USING URLLIB CODE#
- #PYTHON 3.5 DOWNLOAD PROGRAM USING SOCKET NOT USING URLLIB FREE#
The findall regular expression method will give us a list of all of the strings that match our regular expression, returning only the link text between the double quotes. When we run the program and input a URL at the “Enter” prompt, it lists the links found on that page. Regular expressions work very nicely when your HTML is well formatted and predictable.
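Below is a minimal sketch of the link-extraction program this paragraph describes. It assumes Python 3 and the standard re, ssl, and urllib.request modules. It is not the book's original listing: the function names extract_links and fetch_links and the sample HTML are hypothetical. The SSL setup is also an assumption. The sketch uses ssl.create_default_context(), which leaves certificate verification enabled.

```python
import re
import ssl
import urllib.request

# Parentheses mark the part of each match that findall returns: only the URL
# between the double quotes, not the surrounding href="..." text.
LINK_PATTERN = re.compile(r'href="(http[s]?://.+?)"')


def extract_links(html):
    """Return every http/https link value found in href="..." attributes."""
    return LINK_PATTERN.findall(html)


def fetch_links(url):
    """Download a page and extract its links (requires network access)."""
    # Assumption: a default SSL context; the original program's settings are not shown.
    ctx = ssl.create_default_context()
    with urllib.request.urlopen(url, context=ctx) as response:
        data = response.read()  # read() returns bytes, not a str
    return extract_links(data.decode('utf-8', errors='replace'))


if __name__ == '__main__':
    # Hypothetical HTML used only to exercise the pattern offline.
    sample = ('<a href="http://www.example.com/page1.htm">one</a> '
              '<a href="https://www.example.com/page2.htm">two</a> '
              '<a href="/relative.htm">three</a>')
    for link in extract_links(sample):
        print(link)
```

The relative link in the sample is skipped because the pattern requires the value to start with http:// or https://.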
Perhaps the easiest way to show how the HTTP protocol works is to write a very simple Python program that makes a connection to a web server and follows the rules of the HTTP protocol to request a document and display what the server sends back. The program creates its socket with mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM). The web server will respond with some header information about the document and a blank line followed by the document content.
Retrieving an image over HTTP
In the above example, we retrieved a plain text file which had newlines in the file, and we simply copied the data to the screen as the program ran. We can use a similar program, again built on the socket library, to retrieve an image using HTTP. Instead of copying the data to the screen as the program runs, we accumulate the data in a bytes object, trim off the headers, and then save the image data to a file.
We can construct a well-formed regular expression to match and extract the link values from the above text as follows: href="http[s]?://.+?"
Our regular expression looks for strings that start with “href="http://” or “href="https://”, followed by one or more characters (.+?), followed by another double quote. The question mark after the [s] indicates to search for the string “http” followed by zero or one “s”. The question mark added to .+? indicates that the match is to be done in a “non-greedy” fashion instead of a “greedy” fashion. A non-greedy match tries to find the smallest possible matching string, and a greedy match tries to find the largest possible matching string.
We add parentheses to our regular expression to indicate which part of our matched string we would like to extract. The resulting program, which searches for link values within the page at a URL the user inputs, imports urllib.request, urllib.parse, and urllib.error. The ssl library allows this program to access web sites that strictly enforce HTTPS. The read method returns the HTML source code as a bytes object instead of returning an HTTPResponse object.
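The difference between greedy and non-greedy matching, and the effect of adding parentheses, can be seen on a tiny example with Python's re module. The HTML fragment and the variable names here are made up for illustration. Because both links sit on one line, a greedy .+ runs past the first closing quote.

```python
import re

# Hypothetical HTML fragment with two links on a single line.
text = '<a href="http://a.example/x">A</a> <a href="https://b.example/y">B</a>'

# Greedy: .+ takes as much as possible, so the single match ends at the LAST quote.
greedy = re.findall(r'href="http[s]?://.+"', text)

# Non-greedy: .+? stops at the FIRST closing quote, so each link matches separately.
non_greedy = re.findall(r'href="http[s]?://.+?"', text)

# Parentheses: findall returns only the captured group, i.e. the URL itself.
links = re.findall(r'href="(http[s]?://.+?)"', text)

print(greedy)      # one long match spanning both links
print(non_greedy)  # two matches, each including href="..."
print(links)       # ['http://a.example/x', 'https://b.example/y']
```

The [s]? part is what lets the same pattern accept both the http:// and the https:// link.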
If you take a look around page 36 of RFC2616, you will find the syntax for the GET request. To request a document from a web server, we make a connection to the server on port 80 and then send a request line that starts with GET, where the second parameter is the web page we are requesting and the third is the protocol version, and then we also send a blank line.
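The request format described here, together with the blank line that separates the response headers from the body, can be sketched with the socket module. This is a hedged sketch, not the book's program. The names build_get_request, split_response, fetch, and save_body are hypothetical helpers. The sketch sends the full URL as the second parameter, as described above, and assumes HTTP/1.0. Some servers also expect a Host header, which this sketch omits.

```python
import socket


def build_get_request(url):
    """Return the request bytes: a GET line followed by the required blank line."""
    # The first \r\n ends the request line; the second \r\n is the blank line.
    return 'GET {} HTTP/1.0\r\n\r\n'.format(url).encode('ascii')


def split_response(raw):
    """Split raw response bytes into (headers, body) at the first blank line."""
    pos = raw.find(b'\r\n\r\n')
    if pos == -1:  # no blank line found: treat everything as headers
        return raw, b''
    return raw[:pos], raw[pos + 4:]


def fetch(host, url, port=80):
    """Send a GET request and accumulate every byte until the server closes."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(build_get_request(url))
        chunks = []
        while True:
            data = sock.recv(512)
            if not data:  # empty bytes means the server closed the connection
                break
            chunks.append(data)
    return b''.join(chunks)


def save_body(raw, path):
    """Write only the body (for example, image data) to a file; return the headers."""
    headers, body = split_response(raw)
    with open(path, 'wb') as handle:
        handle.write(body)
    return headers
```

As a usage example, raw = fetch('www.example.com', 'http://www.example.com/') followed by print(split_response(raw)[1].decode()) would display a text page. For an image, save_body(raw, 'picture.jpg') would store it. The host, URL, and file name are placeholders. Accumulating into a list of bytes chunks and joining once at the end avoids repeatedly copying a growing bytes object.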
There are many documents that describe these network protocols. The Hypertext Transfer Protocol is described in RFC 2616. This is a long and complex 176-page document with a lot of detail. If you find it interesting, feel free to read it all.
While many of the examples in this book have focused on reading files and looking for data in those files, there are many different sources of information when one considers the Internet. In this chapter we will pretend to be a web browser and retrieve web pages using the Hypertext Transfer Protocol (HTTP). Then we will read through the web page data and parse it.
The network protocol that powers the web is actually quite simple, and Python has a built-in library called socket which makes it very easy to make network connections and retrieve data over those sockets in a Python program.
A socket is much like a file, except that a single socket provides a two-way connection between two programs. You can both read from and write to the same socket. If you write something to a socket, it is sent to the application at the other end of the socket. If you read from the socket, you are given the data which the other application has sent.
But if you try to read a socket when the program on the other end of the socket has not sent any data, you just sit and wait. If the programs on both ends of the socket simply wait for some data without sending anything, they will wait for a very long time, so an important part of programs that communicate over the Internet is to have some sort of protocol.
A protocol is a set of precise rules that determine who is to go first, what they are to do, what the responses to that message are, who sends next, and so on. In a sense the two applications at either end of the socket are doing a dance and making sure not to step on each other’s toes.
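The two-way behavior of a socket, and the waiting that happens when nothing has been sent, can be tried locally without a web server. This sketch uses socket.socketpair(), which creates two already-connected sockets inside one process. They stand in for the two programs at either end of a connection. The function names are hypothetical. The fixed order "left writes first, then right replies" acts as a tiny protocol in the sense described above.

```python
import socket


def two_way_exchange():
    """Write on each end of a connected socket pair and read on the other."""
    # socketpair() returns two connected sockets, standing in for two programs.
    left, right = socket.socketpair()
    with left, right:
        left.sendall(b'Hello from left')      # left goes first...
        heard_by_right = right.recv(1024)
        right.sendall(b'Hello from right')    # ...then right answers
        heard_by_left = left.recv(1024)
    return heard_by_right, heard_by_left


def read_before_anything_is_sent(timeout=0.5):
    """Show that reading a socket with no data waits (here, until a timeout)."""
    left, right = socket.socketpair()
    with left, right:
        right.settimeout(timeout)  # without a timeout, recv() would wait indefinitely
        try:
            right.recv(1024)
        except socket.timeout:
            return 'still waiting when the timeout expired'
    return 'data arrived'


if __name__ == '__main__':
    print(two_way_exchange())
    print(read_before_anything_is_sent())
```

The timeout exists only so the demonstration finishes. A real program avoids the indefinite wait by following the protocol, reading only when the other side is expected to have sent something.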