
Exercises

urllib Module and Files. Update the friends3.py script so that it stores names and corresponding number of friends into a 2-column text file on disk and continues to add names each time the script is run.

EXTRA CREDIT: Add code to dump the contents of such a file to the Web browser (in HTML format). Additional EXTRA CREDIT: Create a link that clears all the names in this file.

urllib Module. Write a program that takes a user-input URL (either a Web page or an FTP file, e.g., http://www.python.org or ftp://ftp.python.org/pub/python/README) and downloads it to your machine with the same filename (or a modified name similar to the original if it is invalid on your system). Web pages (HTTP) should be saved as .htm or .html files, and FTP'd files should retain their extensions.
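Here is one possible way to start (a minimal sketch in Python 2, matching the book's examples; the filename fix-up rules here are deliberately simple and are assumptions you will want to refine):

import os
import urllib
import urlparse

url = raw_input('Enter a URL: ')
path = urlparse.urlparse(url)[2]            # the path portion of the URL
fname = os.path.basename(path)
if not fname:                               # e.g., "http://www.python.org/"
    fname = 'index.html'
ext = os.path.splitext(fname)[1].lower()
if url[:4] == 'http' and ext not in ('.htm', '.html'):
    fname = fname + '.html'                 # HTTP downloads saved as HTML
urllib.urlretrieve(url, fname)
print 'saved', url, 'as', fname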

urllib Module. Rewrite the grabWeb.py script of Example 11.2, which downloads a Web page and displays the first and last non-blank lines of the resulting HTML file, so that it uses urlopen() instead of urlretrieve() and processes the data directly (as opposed to downloading the entire file before processing it).
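A sketch of the rewrite using urlopen(); the URL is hard-coded here purely for illustration:

import urllib

f = urllib.urlopen('http://www.python.org')     # a file-like object, no temp file
lines = f.readlines()
f.close()

nonblank = [line for line in lines if line.strip()]
if nonblank:
    print 'first non-blank line:', nonblank[0],
    print 'last non-blank line: ', nonblank[-1],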

URLs and Regular Expressions. Your browser may save your favorite Web site URLs as a "bookmarks" HTML file (Netscape browsers do this) or as a set of .URL files in a "favorites" directory (Microsoft browsers do this). Find your browser's method of recording your "hot links" along with where and how they are stored. Without altering any of the files, strip out the URLs and the names of the corresponding Web sites (if given), produce a 2-column list of names and links as output, and store this data in a disk file. Truncate site names or URLs as needed to keep each line of output within 80 columns.
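A minimal sketch for the Netscape-style bookmarks.html case; the file location and the exact markup are assumptions, so adjust the path and the pattern for your own browser:

import re

BOOKMARKS = 'bookmarks.html'        # adjust to your browser's actual file

# Netscape-style bookmark files store links as <A HREF="url" ...>name</A>
pat = re.compile(r'<A HREF="([^"]+)"[^>]*>([^<]+)</A>', re.I)

data = open(BOOKMARKS).read()
out = open('hotlinks.txt', 'w')
for url, name in pat.findall(data):
    out.write('%-40.40s %-39.39s\n' % (name.strip(), url))   # <= 80 columns
out.close()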

URLs, urllib Module, Exceptions, and REs. As a follow-up to the previous problem, add code to your script to test each of your favorite links. Report back a list of dead links (and their names), i.e., Web sites that are no longer active or Web pages that have been removed. Only output and save to disk the links that are still valid.
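One way to test a single link, sketched with urllib2, which (unlike urllib.urlopen()) raises exceptions for failed requests; whether you also count redirects or timeouts as "dead" is up to you:

import urllib2

def is_dead(url):
    "Return 1 if the URL cannot be fetched, 0 otherwise."
    try:
        f = urllib2.urlopen(url)
        f.close()
        return 0
    except (urllib2.HTTPError, urllib2.URLError, IOError):
        return 1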

Error Checking. The friends3.py script reports an error if no radio button was selected to indicate the number of friends. Update the CGI script to also report an error if no name (i.e., a blank or whitespace-only entry) is given.

EXTRA CREDIT: We have so far explored only server-side error checking. Explore JavaScript programming and implement client-side error checking by creating JavaScript code to check for both error situations so that these errors are stopped before they reach the server.
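For the server-side half of this exercise, the check itself can be quite small; this sketch assumes the form field is named 'person' (check friends3.py for the actual field name):

import cgi

form = cgi.FieldStorage()
who = ''
if form.has_key('person'):
    who = form['person'].value.strip()

if not who:
    # no name (or a whitespace-only name) was submitted -- report the error
    print 'Content-type: text/html\n'
    print '<H3>ERROR: please enter a (non-blank) name.</H3>'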

Problems 19-7 to 19-10 below pertain to Web server access log files and Regular Expressions. Web servers (and their administrators) generally have to maintain an access log file (usually logs/access_log under the main Web server directory) which tracks requests for files. Over a period of time, such files get large and need to be either archived or truncated. Why not save only the pertinent information and delete the files to conserve disk space? The exercises below are designed to give you some practice with REs and show how they can be used to help archive and analyze Web server data.

19-7. Count how many of each type of request (GET vs. POST) exist in the log file (a starting sketch follows this list of problems).

19-8. Count the successful page/data downloads: display all links that resulted in a return code of 200 (OK [no error]) and how many times each link was accessed.

19-9. Count the errors: show all links that resulted in errors (return codes in the 400s or 500s) and how many times each link was accessed.

19-10. Track IP addresses: for each IP address, output a list of each page/data file downloaded and how many times that link was accessed.
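As a starting point for 19-7, here is a sketch that tallies GET versus POST requests; the log file name and the Common Log Format pattern are assumptions, so adjust them to match your server's logs:

import re

LOGFILE = 'access_log'                       # assumed name/location
# Common Log Format request field: "GET /path HTTP/1.0" followed by a status code
pat = re.compile(r'"(GET|POST) (\S+) [^"]*" (\d{3})')

counts = {'GET': 0, 'POST': 0}
for line in open(LOGFILE).readlines():
    m = pat.search(line)
    if m:
        counts[m.group(1)] = counts[m.group(1)] + 1

print 'GET requests: ', counts['GET']
print 'POST requests:', counts['POST']

The second and third groups (the requested link and the status code) are captured as well, so the same loop can be extended to cover problems 19-8 through 19-10.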

Simple CGI. Create a "Comments" or "Feedback" page for a Web site. Take user feedback via a form, process the data in your script, and return a "thank you" screen.

Simple CGI. Create a Web guestbook. Accept a name, an e-mail address, and a journal entry from a user and log it to a file (format of your choice). Like the previous problem, return a "thanks for filling out a guestbook entry" page. Also provide a link that allows users to view the guestbook.
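A bare-bones guestbook sketch; the form field names, the log file format, and the viewer script name (viewgb.py) are all assumptions:

#!/usr/bin/env python
import cgi
import time

form = cgi.FieldStorage()
name = form.has_key('name') and form['name'].value or '(no name)'
email = form.has_key('email') and form['email'].value or '(no e-mail)'
entry = form.has_key('entry') and form['entry'].value or ''

# append one line per entry; the format is your choice
log = open('guestbook.txt', 'a')
log.write('%s|%s|%s|%s\n' % (time.ctime(), name, email, entry))
log.close()

print 'Content-type: text/html\n'
print '<H3>Thanks for filling out a guestbook entry, %s!</H3>' % name
print '<A HREF="viewgb.py">view the guestbook</A>'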

Web Browser Cookies and Web Site Registration. Update your solution to Exercise 13-4 so that your user-password information pertains to Web site registration instead of a simple text-based menu system.

EXTRA CREDIT: Familiarize yourself with setting Web browser cookies and maintain a login session for 4 hours from the last successful login.

Stock Quote Information. There are many online services that allow users to look up stock price quote information. A few of these sites, such as Yahoo!, allow users to download such data in a comma-delimited spreadsheet format. Become familiar with one of these sites and learn how to download stock price information onto your local hard drive. Create a Python application that not only performs the download but can also read, parse, and display the saved data for a specified set of stock ticker symbols.

EXTRA CREDIT: Integrate your solution to the previous problem with user registration and individual portfolios, using the classes created for your solution to Exercise 13-13.
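For the main exercise, once a comma-delimited quotes file has been saved locally, parsing it for a set of tickers might look like this sketch (the column layout, symbol first and price second, is only an assumption about whatever format your chosen service provides):

TICKERS = ('AAPL', 'IBM', 'YHOO')             # the symbols you care about

for line in open('quotes.csv').readlines():
    fields = line.strip().split(',')
    if len(fields) < 2:
        continue
    symbol = fields[0].replace('"', '').upper()
    if symbol in TICKERS:
        print '%-6s last price: %s' % (symbol, fields[1])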

Stock Quote Information. Update your solution to the previous problem so that it bypasses downloading the information to a local file: open a connection directly to the Web server, parse the stock data as it streams down to your application, and display the information on the screen.

NOTE

Python on the Windows 32-bit platform contains connectivity to the Component Object Model (COM), a Microsoft interfacing technology that allows objects to talk to one another (or, at a higher level, applications to talk to one another) without any language or format dependence. You can read all about COM in Hammond and Robinson. The combination of Python and COM presents a unique opportunity to create Python scripts that can talk to applications such as Word or Excel.

Stock Quotes and Excel/COM Programming (Windows). Familiarize yourself with COM programming in Python, then use your solution to the previous problem to create a new application that downloads stock quote information and transfers the data directly to an Excel spreadsheet. You may choose to have the user manually invoke the Python script to update the data, or, if you have a direct connection to the Internet, have your script update the data periodically during the business day. Merge elements of your solutions to the previous problems by providing automatically updating Excel spreadsheets for multiple portfolios.
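A minimal Excel/COM sketch using the Python for Windows (win32all/pywin32) extensions; the quote data is faked here and the cell layout is arbitrary:

from win32com.client import Dispatch

quotes = [('AAPL', 20.50), ('IBM', 110.25)]    # pretend these were downloaded

xl = Dispatch('Excel.Application')             # launch (or attach to) Excel
xl.Visible = 1
book = xl.Workbooks.Add()
sheet = book.Worksheets(1)

sheet.Cells(1, 1).Value = 'Symbol'
sheet.Cells(1, 2).Value = 'Price'
row = 2
for symbol, price in quotes:
    sheet.Cells(row, 1).Value = symbol
    sheet.Cells(row, 2).Value = price
    row = row + 1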

Multithreaded COM Programming (Windows). Update your solution to the previous problem so that the data downloads happen "concurrently" using multiple threads.
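A sketch of just the threading piece; fetch_quote() is a hypothetical stand-in for whatever download routine your previous solution uses:

import threading

def fetch_quote(symbol):
    # hypothetical stand-in: replace with your actual download/parse code
    print 'fetching', symbol

symbols = ['AAPL', 'IBM', 'YHOO']
threads = []
for sym in symbols:
    t = threading.Thread(target=fetch_quote, args=(sym,))
    threads.append(t)
    t.start()

for t in threads:          # wait for all of the downloads to finish
    t.join()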

Web Database Application. Think of a database schema you want to provide as part of a Web database application. For this multi-user application, you want to give everyone read access to the entire contents of the database, but perhaps write access only to each individual's own entry. One example might be an "address book" for your family and relatives. Each family member, once successfully logged in, is presented with a Web page with several options: add an entry, view my entry, update my entry, remove or delete my entry, and view all entries (the entire database).

Design a UserEntry class and create a database entry for each instance of this class. You may use any solution created for a previous problem to implement the registration framework. Finally, you may use any type of storage mechanism for your database, either a relational database such as MySQL or one of the simpler Python persistent storage modules such as anydbm or shelve.
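If you choose shelve, the storage layer can be as simple as this sketch (the UserEntry attributes shown are assumptions about your own class design):

import shelve

class UserEntry:
    def __init__(self, name, email, phone):
        self.name = name
        self.email = email
        self.phone = phone

db = shelve.open('addrbook')                   # instances are pickled to disk
entry = UserEntry('wesley', 'wesley@example.com', '800-555-1212')
db[entry.name] = entry                         # add (or update) an entry

for name in db.keys():                         # "view all entries"
    e = db[name]
    print e.name, e.email, e.phone
db.close()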

Electronic Commerce Engine. Use the classes created for your solution to Exercise 13-11 and add some product inventory to create a potential electronic commerce Web site. Be sure your Web application also supports multiple customers and provides registration for each user.

Dictionaries and the cgi Module. As you know, the cgi.FieldStorage() method returns a dictionary-like object containing the key-value pairs of the submitted CGI variables. You can use methods such as keys() and has_key() on such objects. In Python 1.5, a get() method was added to dictionaries which returns the value of the requested key, or a default value for a non-existent key. FieldStorage objects do not have such a method. Let's say we grab the form in the usual manner:

form = cgi.FieldStorage()

Add a similar get() method to the class definition in cgi.py (you can rename it mycgi.py or something like that) so that code which looks like this:

if form.has_key('who'):
    who = form['who'].value
else:
    who = '(no name submitted)'

… can be replaced by a single line which makes forms even more like a dictionary:

who = form.get('who', '(no name submitted)')

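One way to do this without editing the standard library at all is to subclass FieldStorage in your own mycgi.py, a sketch:

# mycgi.py -- FieldStorage with a dictionary-style get() method
import cgi

class FieldStorage(cgi.FieldStorage):
    def get(self, key, default=''):
        if self.has_key(key):
            return self[key].value
        return default

# usage in a CGI script:
#     import mycgi
#     form = mycgi.FieldStorage()
#     who = form.get('who', '(no name submitted)')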
Creating Web Servers. Our code for myhttpd.py in Section 19.7 is only able to read HTML files and return them to the calling client. Add support for plain text files with the ".txt" ending. Be sure that you return the correct MIME type of "text/plain".

EXTRA CREDIT: add support for JPEG files ending with either ".jpg" or ".jpeg" and having a MIME type of "image/jpeg".
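The heart of both changes is a mapping from file extension to MIME type. This self-contained sketch is not the book's myhttpd.py code, just the same idea in miniature:

import os
from BaseHTTPServer import HTTPServer, BaseHTTPRequestHandler

MIME_TYPES = {
    '.html': 'text/html', '.htm': 'text/html',
    '.txt': 'text/plain',
    '.jpg': 'image/jpeg', '.jpeg': 'image/jpeg',   # EXTRA CREDIT
}

class MyHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        ext = os.path.splitext(self.path)[1].lower()
        if MIME_TYPES.has_key(ext):
            try:
                data = open(self.path[1:], 'rb').read()   # strip leading '/'
            except IOError:
                self.send_error(404, 'File Not Found: %s' % self.path)
                return
            self.send_response(200)
            self.send_header('Content-type', MIME_TYPES[ext])
            self.end_headers()
            self.wfile.write(data)
        else:
            self.send_error(403, 'Unsupported file type: %s' % self.path)

if __name__ == '__main__':
    HTTPServer(('', 8080), MyHandler).serve_forever()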

Advanced Web Clients. Update the crawl.py script in Section 19.3 to also download links that use the "ftp:" scheme. All "mailto:" links are already ignored by crawl.py. Add support to ensure that it also ignores "telnet:", "news:", "gopher:", and "about:" links.

Advanced Web Clients. The crawl.py script in Section 19.3 only downloads .html files via links found in Web pages at the same site and does not handle/save images, which are also valid "files" for those pages. It also does not handle servers that are susceptible to URLs missing the trailing slash ( / ). Add a pair of classes to crawl.py to deal with these problems. A My404UrlOpener class should subclass urllib.FancyURLopener and consist of a single method, http_error_404(), which determines whether a 404 error was reached using a URL without a trailing slash. If so, it adds the slash and retries the request (once only). If it still fails, a real 404 error should be returned. You must set urllib._urlopener to an instance of this class so that urllib uses it.

Create another class called LinkImageParser which derives from htmllib.HTMLParser. This class should contain a constructor to call the base class constructor as well as initialize a list for the image files parsed from Web pages. The handle_image() method should be overridden to add image filenames to the image list (instead of discarding them like the current base class method does).
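Skeletons of the two classes; the retry handling in http_error_404() is one possible approach, not the only one, and the detail that urllib hands the error handler a URL with its "http:" prefix already stripped is an observation about CPython's urllib that you should verify against your own version:

import urllib, htmllib, formatter

class My404UrlOpener(urllib.FancyURLopener):
    def http_error_404(self, url, fp, errcode, errmsg, headers, data=None):
        if url[-1:] != '/':
            # retry once with the trailing slash added; re-attach the scheme
            # since urllib usually strips it before calling error handlers
            return self.open('http:' + url + '/')
        # still failing with the slash: fall back to normal 404 handling
        return urllib.FancyURLopener.http_error_default(
            self, url, fp, errcode, errmsg, headers)

urllib._urlopener = My404UrlOpener()     # make urllib use our opener

class LinkImageParser(htmllib.HTMLParser):
    def __init__(self, formatter):
        htmllib.HTMLParser.__init__(self, formatter)
        self.images = []                 # image files parsed from the page

    def handle_image(self, src, alt, *args):
        self.images.append(src)          # keep the filename instead of discarding it

parser = LinkImageParser(formatter.NullFormatter())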

