Kjetil's Information Center: A Blog About My Projects

Python Web Proxy

Here is another simple proxy server written in Python, this time a regular web proxy. I tried to make it as simple as possible, using only common Python modules. This means it works directly on sockets instead of using the more sophisticated HTTP modules.

It works on most pages, but some pages show problems. Most notably, I have seen that CSS stylesheets are not always transferred, so the layout can look strange. I have not tested it against "rich" web pages that use AJAX and the like, so I have no idea how those will behave. HTTPS does not work, since it requires more advanced mechanisms in the proxy, including support for the "CONNECT" request, which is not present here.
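
For reference, HTTPS through a proxy starts with the browser sending a request of the form "CONNECT host:port HTTP/1.1". The proxy opens a TCP connection to that host, answers with a 200 status, and from then on only relays bytes in both directions. A minimal sketch of the parsing step could look like the following. It is not part of the proxy in this post, it is written for Python 3, and the names `parse_connect` and `CONNECT_OK` are made up for illustration. It also assumes a plain host name or IPv4 address (no bracketed IPv6 literals).

```python
import re

# Reply a proxy sends once the tunnel to the target host is open.
CONNECT_OK = b"HTTP/1.1 200 Connection established\r\n\r\n"

def parse_connect(data):
    """Return (host, port) from a CONNECT request, or None if it is not one."""
    match = re.match(rb"^CONNECT ([A-Za-z0-9.-]+):(\d+) HTTP/[.0-9]+\r\n", data)
    if match is None:
        return None
    return match.group(1).decode("ascii"), int(match.group(2))
```

After a successful parse, a proxy would connect to the returned host and port, send `CONNECT_OK` to the browser, and then copy data back and forth without looking at it.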

Then again, this is meant as an emergency solution and/or a way to observe the basic requirements of a web proxy server.

Enjoy:

#!/usr/bin/python
# Written for Python 2 (uses the print statement and the urlparse module).

from threading import Thread
import socket
import re
import urlparse
import time

class ClientConnection(Thread):
    def __init__(self, client_socket):
        Thread.__init__(self)
        self.client_socket = client_socket

    def _extract_host(self, data):
        match = re.match(r"^(?:GET|POST|HEAD) (.*?) (HTTP/[.0-9]*)", data)
        if match:
            url = urlparse.urlparse(match.group(1))
            # NOTE: Alternate port numbers not handled.
            return url.netloc
        else:
            return None

    def _try_recv(self, sock):
        try:
            return sock.recv(4096, socket.MSG_DONTWAIT)
        except socket.error:
            # Nothing to read right now (or the socket failed).
            return None

    def run(self):
        try:
            data = self.client_socket.recv(4096)
        except socket.error:
            self.client_socket.close()
            return

        if len(data) == 0:
            self.client_socket.close()
            return

        host = self._extract_host(data)
        port = 80
        if host is None:
            self.client_socket.close()
            return

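        # A hack, but it works: ask the server to close the connection after
        # the response. Only the first match (the request line) is changed,
        # so an "HTTP/1.1" string elsewhere in the request is left alone.
        data = data.replace("HTTP/1.1", "HTTP/1.1\r\nConnection: close", 1)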

        print "Open: %s" % (host)
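        self.server_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            self.server_socket.connect((host, port))
            self.server_socket.sendall(data)
        except socket.error:
            # Could not reach the server; give up on this request.
            self.server_socket.close()
            self.client_socket.close()
            return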

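        # Relay data in both directions until one side closes the
        # connection or nothing has arrived for about 20 seconds.
        timeout = 0
        while timeout <= 200:
            active = False

            data = self._try_recv(self.server_socket)
            if data is not None:
                if len(data) == 0:
                    break  # Server closed the connection.
                try:
                    self.client_socket.sendall(data)
                except socket.error:
                    break
                active = True

            data = self._try_recv(self.client_socket)
            if data is not None:
                if len(data) == 0:
                    break  # Client closed the connection.
                try:
                    self.server_socket.sendall(data)
                except socket.error:
                    break
                active = True

            if active:
                timeout = 0
            else:
                # Only wait (and count towards the timeout) when idle.
                time.sleep(0.1)
                timeout += 1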

        print "Close: %s" % (host)
        self.server_socket.close()
        self.client_socket.close()

class ProxyServer(object):
    def __init__(self, port):
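        self.sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        # Allow a quick restart without "Address already in use" errors.
        self.sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        self.sock.bind(('', port))
        self.sock.listen(64)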

    def loop(self):
        while True:
            (client_socket, client_address) = self.sock.accept()
            connection = ClientConnection(client_socket)
            connection.start()

if __name__ == "__main__":
    ps = ProxyServer(8080)
    ps.loop()
          
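
The note in `_extract_host` about alternate port numbers could probably be addressed by splitting the URL into host name and port before connecting. Here is a rough sketch of that idea, written for Python 3 (where `urlparse` has moved to `urllib.parse`) and not tested together with the proxy above. The helper name `split_host_port` is made up for this example.

```python
from urllib.parse import urlsplit

def split_host_port(url, default_port=80):
    """Return (host, port) for an absolute URL, or None if it has no usable host."""
    try:
        parts = urlsplit(url)
        port = parts.port  # raises ValueError for a malformed port
    except ValueError:
        return None
    if parts.hostname is None:
        return None
    # Fall back to the default when the URL names no port.
    return parts.hostname, port or default_port
```

In `run`, the returned pair would take the place of the host from `_extract_host` and the fixed `port = 80`.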

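
The "Connection: close" hack in `run` inserts the header with a string replacement on the raw request. A slightly more careful variant, sketched here for Python 3 under the made-up name `force_connection_close`, edits only the header section and drops any "Connection" or "Proxy-Connection" header the browser already sent. It assumes the complete header section is present in the data; otherwise the request is returned unchanged.

```python
def force_connection_close(request):
    """Return the request bytes with exactly one 'Connection: close' header."""
    head, sep, body = request.partition(b"\r\n\r\n")
    if not sep:
        # Header section is incomplete; leave the data untouched.
        return request
    lines = head.split(b"\r\n")
    request_line, headers = lines[0], lines[1:]
    # Drop headers that would ask the server to keep the connection open.
    unwanted = (b"connection", b"proxy-connection")
    kept = [h for h in headers
            if h.split(b":", 1)[0].strip().lower() not in unwanted]
    kept.append(b"Connection: close")
    return b"\r\n".join([request_line] + kept) + sep + body
```

Unlike the one-line replacement, this leaves the request body alone and also works for HTTP/1.0 request lines.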

Topic: Scripts and Code, by Kjetil @ 28/05-2012