RealURL
1.3.2. Page resolve methods, how does it work?
Let's first take a look at the simple case, where everything goes smoothly and nothing goes wrong:
When you type a URL (or click on one), it is looked up in the so-called URL-cache. Assuming it is found, we then know the page-ID that belongs to the path, so the correct page can be generated. And we're done.
Now, some things can go wrong here: first of all, the paths generated by RealURL only contain a..z, 0..9 and underscores ('_'), so it's a good idea to strip all unwanted characters from the URL before we look it up. So we do that :)
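As a rough illustration of the clean-up step described above, here is a minimal Python sketch (RealURL itself is PHP; this is not its actual code). The function names clean_path and lookup are hypothetical, and the sketch assumes that the slash is kept as the separator between path segments and that upper-case letters are lowercased rather than dropped.

```python
import re

# Anything outside a..z, 0..9 and '_' is unwanted inside a path segment.
_UNWANTED = re.compile(r"[^a-z0-9_]")


def clean_path(path: str) -> str:
    """Strip unwanted characters from every segment of a URL path."""
    # Assumption: upper-case letters are lowercased instead of removed.
    segments = path.lower().split("/")
    cleaned = [_UNWANTED.sub("", segment) for segment in segments]
    # Drop segments that are empty after cleaning (leading or doubled slashes).
    return "/".join(segment for segment in cleaned if segment)


def lookup(url_cache: dict, path: str):
    """Look the cleaned path up in a dict standing in for the URL-cache."""
    return url_cache.get(clean_path(path))
```

With a cache of {"products/webshop": 3}, a request for "/Products/Web-Shop!/" is cleaned to "products/webshop" before the lookup, so it still finds page-ID 3.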
Furthermore, if the URL isn't found in the URL-cache (e.g. when the cache is cleared), we have to search for it in the database. This is done by first examining the domain-part, and then searching for every 'directory' in the URL until we reach the destination-page. The language-part (if present) is also translated to a language-ID. This result is cached, so we can use it later on.
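The fallback search can be pictured with the following Python sketch. It is only an illustration under stated assumptions: the dictionaries PAGES, DOMAINS and LANGUAGES are hypothetical stand-ins for the database tables, the language-part is assumed to be the first segment of the path, and the function names are made up for this example.

```python
# Hypothetical stand-ins for the database tables.
# PAGES: page-ID -> (parent page-ID, path segment derived from the page-title)
PAGES = {
    1: (0, ""),            # root page of the site
    2: (1, "products"),
    3: (2, "webshop"),
    4: (1, "contact"),
}
DOMAINS = {"www.example.com": 1}   # domain-part -> root page-ID
LANGUAGES = {"en": 0, "nl": 1}     # language-part -> language-ID

url_cache = {}


def find_child(parent_id, segment):
    """Return the ID of the child of parent_id that matches segment, or None."""
    for page_id, (pid, seg) in PAGES.items():
        if pid == parent_id and seg == segment:
            return page_id
    return None


def resolve(domain, path):
    """Resolve (domain, path) to (page-ID, language-ID), or None if not found."""
    key = (domain, path)
    if key in url_cache:
        return url_cache[key]
    # First examine the domain-part to find where the search starts.
    page_id = DOMAINS.get(domain)
    if page_id is None:
        return None
    segments = [s for s in path.split("/") if s]
    # Assumption: the language-part, if present, is the first segment.
    lang_id = None
    if segments and segments[0] in LANGUAGES:
        lang_id = LANGUAGES[segments.pop(0)]
    # Then search for every 'directory' until the destination-page is reached.
    for segment in segments:
        page_id = find_child(page_id, segment)
        if page_id is None:
            return None
    url_cache[key] = (page_id, lang_id)   # cache the result for later requests
    return url_cache[key]
```

A language-ID of None here means the URL carried no language-part, which is the case handled by the default-language function described next.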
If the language wasn't given in the URL, a function is called to figure out which language is most appropriate. I created some code which looks up the IP-address in the IP2Country-database (a table), which I imported into the TYPO3-database. I might create a separate extension for this, but for now you can uncomment the code if you want to use it. Look for getDefaultLangID().
When the page-ID is found, another check is done (after TYPO3 has loaded all kinds of information about the page): it checks whether the requested URL corresponds to the 'real' URL of the page. The two might differ because the page-title of the page, or of one of its parents, has changed. Or one might have typed the URL of a page that is a shortcut to the real page. In those cases, the user is redirected to the real/official/new URL of the page, and in the case of a changed page-title, the old URL is marked as 'expiring'.
This makes it possible to change the page-title of a page (and thus its URL), but still be able to reach that page through the old URL, which will still be used by e.g. Google.
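A minimal Python sketch of this check, for illustration only: the function name check_real_url, the title_changed flag and the layout of the cache entries are all assumptions, not RealURL's actual API.

```python
def check_real_url(requested, real, title_changed, cache):
    """Return the URL to redirect to, or None if no redirect is needed.

    cache maps a URL to a dict like {"page_id": 10, "expiring": False}
    (a hypothetical layout for the URL-cache).
    """
    if requested == real:
        return None                      # the request already uses the real URL
    # Only a changed page-title marks the old URL as expiring;
    # a shortcut page simply redirects.
    if title_changed and requested in cache:
        cache[requested]["expiring"] = True
    return real
```

The caller would send an HTTP redirect to the returned URL whenever it is not None.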
A problem arises when you create another page with the same title as a page that existed before, because that URL still points to the other page. Therefore, if RealURL notices that you typed an expiring URL, it searches the database as if the URL had not been found in the cache. If a page is found this way, that new URL + page-ID is cached instead. If no page is found, the cached result will be used.
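The handling of expiring URLs could be sketched in Python as follows. The cache layout, the function name resolve_with_expiry and the search_database callback are hypothetical and only serve to illustrate the order of the checks.

```python
def resolve_with_expiry(path, cache, search_database):
    """Resolve path to a page-ID, giving freshly found pages priority over
    expiring cache entries.

    cache maps a path to {"page_id": ..., "expiring": ...} (assumed layout);
    search_database(path) returns a page-ID or None.
    """
    entry = cache.get(path)
    if entry is not None and not entry["expiring"]:
        return entry["page_id"]          # normal cache hit
    # Cache miss or expiring URL: search the database as if nothing was cached.
    found = search_database(path)
    if found is not None:
        # A page now owns this URL: cache the new URL + page-ID instead.
        cache[path] = {"page_id": found, "expiring": False}
        return found
    # Nothing found: fall back on the (expiring) cached result, if any.
    return entry["page_id"] if entry is not None else None
```

So an old URL keeps working until a new page claims the same path, at which point the new page wins.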
The other way around is much simpler: when a URL has to be generated for a certain page, it is looked up in the same URL-cache. If it isn't already there, the URL is created by building the so-called RootLine for the page, filtering every page-title so that it contains only a..z, 0..9 and underscores, and finally caching the result. This process does take languages into account, so if you're browsing to the Dutch version of a page, you'll get a Dutch URL to it (using the Dutch page-titles).
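To make the generation step concrete, here is a small Python sketch. The PAGES dictionary and all function names are hypothetical. The sketch also assumes that spaces become underscores, that the root page itself contributes no segment, and that a page without a translated title falls back on its default-language title; a language-part in the URL is left out.

```python
import re

# Hypothetical page records: page-ID -> (parent page-ID, {language-ID: page-title})
PAGES = {
    1: (0, {0: "Home"}),
    2: (1, {0: "Products", 1: "Producten"}),
    3: (2, {0: "Web Shop", 1: "Web Winkel"}),
}

url_cache = {}


def filter_title(title):
    """Reduce a page-title to a..z, 0..9 and underscores."""
    # Assumption: spaces turn into underscores, everything else is dropped.
    title = title.lower().replace(" ", "_")
    return re.sub(r"[^a-z0-9_]", "", title)


def rootline(page_id):
    """Return the page-IDs from the root page down to page_id."""
    line = []
    while page_id in PAGES:
        line.append(page_id)
        page_id = PAGES[page_id][0]      # step up to the parent page
    line.reverse()
    return line


def generate_url(page_id, lang_id=0):
    """Build (and cache) the path for a page in the given language."""
    key = (page_id, lang_id)
    if key not in url_cache:
        segments = []
        for pid in rootline(page_id)[1:]:        # skip the root page itself
            titles = PAGES[pid][1]
            # Fall back on the default-language title if no translation exists.
            segments.append(filter_title(titles.get(lang_id, titles[0])))
        url_cache[key] = "/".join(segments)
    return url_cache[key]
```

With language-ID 1 standing for Dutch, generate_url(3, 1) builds the path from the Dutch page-titles, while generate_url(3) uses the default-language titles.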