Consider URLs of the form host [":" port] abs_path ["?" query], where abs_path is of the form path "/" filename ["." extension], i.e., it consists of the filesystem path, the filename and the file extension. query is an optional collection of parameters, to be passed as input to a resource that is actually an executable program, e.g. a CGI script.

On the one side, a number of normalizations must be performed on URLs in order to remove irrelevant syntactic differences (e.g., the host can be given either in IP format or in domain-name format, with both forms denoting the same host). On the other side, some web server programs adopt non-standard formats for passing parameters. The web server program behind the site under analysis is one of them. For instance, the file name 1,3478,|DX,00 contains a code for the local web site (1), a web page id (3478) and its specific parameters (DX). This form has been designed for efficient machine processing: the web page id is a key for a database table where the page template is found, while the parameters allow for retrieving the web page content from some other table. Unfortunately, it is a nightmare when mining clickstreams of URLs: syntactic features of URLs are of little help, and we need some semantic information, or ontology [5,13], assigned to URLs.

At best, we can expect that an application-level log is available, i.e. a log of accesses to semantically relevant objects. An example of an application-level log is one recording that the user entered the site from the home page, then visited a sport page with news on a soccer team, and so on. This would require a system module monitoring user steps at a semantic level of granularity. In the ClickWorld project such a module is called Click Observe. However, the module is itself a deliverable of the project, and it was not available for collecting data at the beginning of the project.

Therefore, we decided to extract both syntactic and semantic information from URLs via a semi-automatic approach, which consists in reverse-engineering URLs starting from the web site designer's description of the meaning of each URL path, web page id and web page parameters. Using a Perl script driven by the designer's description, we extracted the following information from the original URLs (a sketch of this extraction step is given after the list):
- the local web server, which provides us with some spatial information about user interests;
- a first-level classification of URLs, and a second-level classification depending on the first-level one: e.g., a URL classified as shopping may be further classified as book shopping or pc shopping, and so on;
- parameter information, further detailing the three-level classification above: e.g., a URL classified as book shopping may have the ISBN book code as a parameter.
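The following Perl fragment is a minimal sketch of this extraction step, not the actual project script. The mapping tables %SITE_OF_CODE, %FIRST_LEVEL and %SECOND_LEVEL are hypothetical placeholders standing for the designer-provided description; the file-name layout follows the example quoted above (site code, page id and parameters packed into a comma-separated string); and the normalization shown is limited to case and default-port differences (resolving IP versus domain-name variants would require an additional lookup table, omitted here).

#!/usr/bin/perl
# Sketch of the semi-automatic URL reverse-engineering step.
# All mapping tables are hypothetical placeholders: in the project they
# were compiled from the web site designer's description of URL paths,
# web page ids and web page parameters.
use strict;
use warnings;

my %SITE_OF_CODE = ( '1'    => 'main site'     );  # site code -> local web server
my %FIRST_LEVEL  = ( '3478' => 'shopping'      );  # page id   -> first-level class
my %SECOND_LEVEL = ( '3478' => 'book shopping' );  # page id   -> second-level class

sub classify_url {
    my ($url) = @_;

    # Remove irrelevant syntactic differences: lower-case the scheme
    # and host, and drop the default port 80.
    $url =~ s{^(http://[^/]+)}{\L$1};
    $url =~ s{^(http://[^/:]+):80(?=/|$)}{$1};

    # Isolate the file name, i.e. the last component of abs_path,
    # stripping a trailing ".htm"/".html" extension and any query string.
    my ($fname) = $url =~ m{/([^/?]+?)(?:\.html?)?(?:\?.*)?$};
    return undef unless defined $fname;

    # Non-standard parameter passing: the file name packs a site code,
    # a web page id and the page parameters, e.g. "1,3478,|DX,00".
    my ($site_code, $page_id, $params) = split /,/, $fname;
    return undef unless defined $page_id;
    my @params = grep { length } split /\|/, ($params // '');

    return {
        server => $SITE_OF_CODE{$site_code},
        first  => $FIRST_LEVEL{$page_id},
        second => $SECOND_LEVEL{$page_id},
        params => \@params,   # e.g. an ISBN code for book-shopping pages
    };
}

# Example on the file name quoted in the text ("host" is a placeholder).
my $info = classify_url('http://host/path/1,3478,|DX,00.html');
printf "%s / %s / %s\n", @{$info}{qw(server first second)};

Once the designer-provided tables are available, classifying the logged URLs reduces to string normalization plus table lookups, which is what makes the approach semi-automatic: the tables are compiled by hand, while the scan of the log is not.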