## Google PageRank matrix calculator (graphically)

**May 19, 2004**

2 comments Mathematics

Some time ago I wrote about the Google PageRank algorithm in Python. It's a matrix algorithm for calculating the PageRank values for every page in a web. All you have to do is define which pages link to which and the algorithm calculates the PageRank of every page for you.

Now I'm going to try to illustrate it in practise for those of you who don't know what to do with a [Python script](/plog/blogitem-040321-1/PageRank.py).

**Start calculating!**

See the gallery of previous calculations.

The purpose of this simple script is to convert the web matrix that you entered into a directed graph showing the approximated PageRank value for every node.
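To give a feel for the matrix-to-graph step, here is a minimal sketch that turns a 0/1 link matrix into Graphviz DOT text. It is not the script behind the calculator (which uses pydot); the function name `web_to_dot` and the letter labels are my own assumptions, and the PageRank values are simply passed in rather than calculated.

```python
def web_to_dot(web, ranks=None):
    """Return DOT source for a 0/1 link matrix.

    Row i lists the pages that page i links to. Pages are named A, B, C, ...
    (an assumption of this sketch, so it only handles up to 26 pages).
    """
    names = [chr(ord("A") + i) for i in range(len(web))]
    lines = ["digraph web {"]
    # One node per page, optionally labelled with its PageRank value.
    for i, name in enumerate(names):
        label = name if ranks is None else "%s (%.3f)" % (name, ranks[i])
        lines.append('    %s [label="%s"];' % (name, label))
    # One edge per non-zero matrix entry.
    for i, row in enumerate(web):
        for j, linked in enumerate(row):
            if linked:
                lines.append("    %s -> %s;" % (names[i], names[j]))
    lines.append("}")
    return "\n".join(lines)


# Four pages linking to each other in a ring, each with PageRank 0.25.
web = ((0, 1, 0, 0),
       (0, 0, 1, 0),
       (0, 0, 0, 1),
       (1, 0, 0, 0))
dot_source = web_to_dot(web, ranks=[0.25, 0.25, 0.25, 0.25])
print(dot_source)
```

The printed text can be saved to a file and rendered with the Graphviz `dot` command.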

You can use this to see graphically how the PageRank algorithm works. You might want to know what the effect is of being linked to by one very popular page, or of being linked to by several not so popular pages. It's up to you to draw your own conclusions.

The input is limited in size (to save my poor computer) and the graphs aren't beautiful. (Thanks Ero Carrera for pydot which made this possible)

## Two done three to go

**May 15, 2004**

0 comments Mathematics

Today I had my second exam. This was Mathematical Methods. Something I had thought was going to be harder than it was. Next week on Monday I have Differential Equations; C++ on Tuesday and lastly Bottom Up Computing And Discrete Mathematics on Wednesday. So my whole BSc degree is all over on Wednesday!! Yippi!

Today after the exam we had a few drinks but I left earlier than most people, feeling both drunk (3 drinks :) and sleepy. Taking an exam plus all the pre-anxiety really tires you out. This weekend is going to be about studying. Sigh.

## Zurich tram service problem

**May 11, 2004**

0 comments Mathematics

This is a little thought problem I learned during the Quantum Mechanics course I'm taking. It was (supposedly) A. Einstein who thought about it first when he was working in Zurich.

As we all know, the tram service in Zurich runs like clockwork; all day every day. Or at least, let's assume so. This question appeared on one of my exercise sheets:

There is a tram service in the city of Zurich between two terminals A and B such that a tram (there is only one tram in service) leaves every five minutes from the terminals and makes the journey from A to B (B to A) in exactly five minutes. There is a passenger who comes to a stop C located 1/4 of the distance AB from A at completely random times. He takes always the first tram regardless of the direction in which it arrives. He keeps all his tickets and after a year he counts how many times he travelled in the directions AB and BA. Explain why he finds that he travelled many more times in one direction than the other. What is the ratio of the number of journeys from A to B to the number of journeys from B to A?

If you can't be asked to do the simple calculation, at least figure out which direction he travelled in more often. It's not a hard problem to understand so anybody can do it without any math skills. It's just a little mind-boggling at first.
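If you'd rather check your answer numerically, here is a small simulation sketch. It rests on my reading of the exercise: the single tram leaves A at time 0, turns around at each terminal without waiting, and so repeats the same pattern every 10 minutes. The names `first_tram_direction` and `simulate` are just for illustration.

```python
import random

CYCLE = 10.0            # minutes for one round trip A -> B -> A
PASS_TOWARDS_B = 1.25   # tram passes C heading for B (a quarter of 5 minutes)
PASS_TOWARDS_A = 8.75   # tram passes C heading for A (5 + three quarters of 5)


def first_tram_direction(arrival):
    """Direction ('AB' or 'BA') of the first tram to reach C after `arrival`."""
    t = arrival % CYCLE
    if PASS_TOWARDS_B < t <= PASS_TOWARDS_A:
        return "BA"     # the next tram at C is on its way back to A
    return "AB"         # otherwise the next tram at C is heading for B


def simulate(trips, seed=0):
    """Count the directions of `trips` journeys started at random times."""
    rng = random.Random(seed)
    counts = {"AB": 0, "BA": 0}
    for _ in range(trips):
        counts[first_tram_direction(rng.uniform(0.0, CYCLE))] += 1
    return counts


counts = simulate(100000)
print(counts)
print("BA journeys per AB journey:", counts["BA"] / float(counts["AB"]))
```

Moving the two `PASS_...` constants lets you try other positions of stop C along the line.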

## My dissertation report

**April 8, 2004**

2 comments Web development, Mathematics

Now I have finished and submitted my dissertation. A great relief. The journey through it has been really interesting and I'm very pleased with it.

The title is: **Building a web application for an on-line mathematics journal** and the abstract reads:

*"This project is about how to build an on-line journal for mathematics. This
was done using the web application platform Zope and the programming
language Python. It is now possible for people to register as members on
the site and upload papers and write descriptive text for these papers that
can be used in various abstraction methods. The report describes what technology
techniques were used to accomplish this and the object structure that
was applied. We will conclude by listing the shortcomings of the delivered
web application and aspects that can be improved and some suggestions to possible solutions to this."*

Feel free to download it. Any feedback is welcomed. The target audience kept in mind when writing this is fellow university students. It's probably too basic for a Zope developer or any senior web programmer, but you'll need some web development knowledge.

I recommend it to people who want to learn more about web application development and to novice Zope developers. The report is 24 pages long and 700 KB to download.

Read more about the actual application that was developed here

## Google PageRank algorithm in Python

**March 21, 2004**

27 comments Python, Mathematics

There are many articles on the net about how the PageRank algorithm works that all copy from the original paper written by the very founders of Google, Larry Page and Sergey Brin. Google itself also has a very good article that explains it with no formulas or numerical explanations. Basically PageRank is like social networks. If you're mentioned by someone important, your importance increases and the people you mention get upped as well.

We recently had a coursework in discrete mathematics to calculate PageRank values for all web pages in a web matrix. To be able to do this you have to do many simplifications and you're limited in terms of complexity to keep it possible to do "by hand". I wrote a little program that calculates the PageRank for any web with no simplifications. The outcome is that I can quickly calculate the PageRank values for each page.

Here's how to use it:

```
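from PageRank import PageRanker

# Row i lists the pages that page i links to (1 = link, 0 = no link).
web = ((0, 1, 0, 0),   # A -> B
       (0, 0, 1, 0),   # B -> C
       (0, 0, 0, 1),   # C -> D
       (1, 0, 0, 0))   # D -> A

pr = PageRanker(0.85, web)
pr.improve_guess(100)   # run 100 iterations
print(pr.getPageRank())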
```

Think of the rows of the matrix as pages A to D, and of the columns as the pages each row links to. In the above example it means that A has a link to B, B has a link to C, C has a link to D, and D has a link to A.
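If reading the matrix this way feels awkward, a few lines of Python can spell the links out. This is only an illustrative helper (the name `describe_links` is made up for this sketch and is not part of the PageRank script):

```python
def describe_links(web):
    """List the links in a 0/1 web matrix as strings like 'A -> B'.

    Row i is the linking page, column j the page it links to.
    """
    names = [chr(ord("A") + i) for i in range(len(web))]
    return ["%s -> %s" % (names[i], names[j])
            for i, row in enumerate(web)
            for j, linked in enumerate(row)
            if linked]


web = ((0, 1, 0, 0),
       (0, 0, 1, 0),
       (0, 0, 0, 1),
       (1, 0, 0, 0))
links = describe_links(web)
print(links)   # ['A -> B', 'B -> C', 'C -> D', 'D -> A']
```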

The PageRank values when you run 100 iterations with no random jumps are:

```
[ 0.25 0.25 0.25 0.25 ]
```

Pretty obvious, isn't it? One complication with the PageRank algorithm is that even if every page has an outgoing link, you don't always cover *everything* by just following links. That's why you sometimes need to start over again from a randomly selected webpage. This is why we use 0.85 in the above example. Qualitatively, that means there's a 15% chance that you start over on a random webpage and iterate from there.
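To make the random restart concrete, here is a rough pure-Python sketch of the usual textbook iteration. It is not the code in PageRank.py, and the function name `pagerank`, its parameters and the handling of pages without outgoing links are my own assumptions, so its numbers for other webs need not match what the script prints.

```python
def pagerank(web, damping=0.85, iterations=100):
    """Approximate PageRank for a 0/1 link matrix by repeated updating.

    Row i lists the pages that page i links to. With probability `damping`
    the surfer follows a link, otherwise jumps to a random page.
    """
    n = len(web)
    ranks = [1.0 / n] * n                       # initial guess: all pages equal
    for _ in range(iterations):
        new_ranks = [(1.0 - damping) / n] * n   # share from random jumps
        for i, row in enumerate(web):
            out_links = sum(row)
            if out_links == 0:
                # Assumption: a page with no links spreads its rank evenly.
                for j in range(n):
                    new_ranks[j] += damping * ranks[i] / n
            else:
                # Each linked page gets an equal share of page i's rank.
                for j, linked in enumerate(row):
                    if linked:
                        new_ranks[j] += damping * ranks[i] / out_links
        ranks = new_ranks
    return ranks


cycle = ((0, 1, 0, 0),
         (0, 0, 1, 0),
         (0, 0, 0, 1),
         (1, 0, 0, 0))
cycle_ranks = pagerank(cycle)
print(cycle_ranks)
```

For the four-page ring every page ends up with a quarter of the total whatever `damping` is, because the web is perfectly symmetric. Setting `damping=1.0` switches the random jumps off entirely.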

Let's have a "more complex" web model:

```
web = ((0, 1, 0, 0),   # A -> B
       (0, 0, 1, 0),   # B -> C
       (0, 0, 0, 1),   # C -> D
       (1, 1, 0, 0))   # D -> A and B
```

Running the algorithm again we find:

```
[ 0.14285725 0.28571447 0.28571419 0.2857141 ]
```

Notice how page B has the same PageRank as C and D even though page B has two links coming in to it. This is because it spreads its popularity to other pages. It also matters that the initial guess treats every page as equal.

Enough said, download the script yourself and make sure you have Python and the numarray Python module installed.

## Finished the bulk of my dissertation

**March 15, 2004**

1 comment Mathematics

This weekend I finished the bulk of my dissertation, which is a web application for academic staff to publish their academic papers online. The first target audience is staff at City University Mathematics Department, to whom I will deliver the project. The idea is that doctors and professors can submit their scientific papers on this web application. They enter some meta data about the paper such as title, abstract and co-authors and lastly upload the PDF or Word document that actually is the paper. Over time the administrator will compile "Issues", which are basically bundles of papers published together with a little comment.

Registration is open to anyone but requires moderation. I.e. you can't log in straight after you have registered. Also all papers that registered members submit will need moderation too. This is done by a group of people (currently only me) who have administrator access.

My dissertation is about the computer science in building a web application. I.e. planning, data structures, algorithms, design, content management etc. It has "nothing" to do with mathematics even though I'm an undergraduate student of the mathematics department.

Please do go and **visit the site** and use it to help me get it as good as possible. There might still be bugs or spelling mistakes that need to be taken out. When you register I will moderate you, so please avoid rude words and try to make your example data as real as possible.

Now the last thing for me to do is to write the actual report about the project. That is what I will submit, but my grade will be based on the judgement of the web application. When I have finished my report I hope to show it on my web page too, if I'm allowed.