(Attachment 0, v 0.2) Comments by Leonardo Boselli

PANEUROPEAN TRANSPORT SCHEDULES PROJECT

This is the first proposal for developing an integrated service for delivering information on public transport schedules (railway, ferry, bus, plane) via the Internet. I draw here on my experience in running the Italian Transport page, which has the full railway schedules and some bus and ferry lines. The program I use is available to people interested in the project. It runs under WinNT and Linux without modifications (ANSI C, CGI interface).

I give here my impressions of the problems I have faced or that have been reported to me:

1. A first problem is knowing IF the service exists. If you want to go from A to B, you must know whether a public service exists at A and at B; the second problem is knowing how to get the information. This is not a great deal of work, since it does not have to be updated frequently, but it needs a lot of contributions: for every town in Europe there should be a page telling which carriers offer services there, with a link on how to get the information. In the event that the schedules cannot be put online, the page should give a telephone number and at least the other places that are connected and the kind of service available (whether it is daily or not, whether there are many runs a day, or whether it is a "frequent" service for which schedules are not of great interest, as in urban services).

This is also useful for small towns that have connections with only a nearby node, or for places in or around a large town, where it is usually more convenient to take a train downtown and then a city bus if no direct train service fits. It also avoids putting online the urban schedules, which, owing to traffic conditions, usually indicate only the mean waiting time and length of run.
This problem could be solved, IMHO, by having for every region a series of province files containing tables of the localities, each with a link to the carrier that serves the place, if there is only one, or otherwise a list and a link to the global information service to do a global search. This should be done in a standard way, so that a remote program could also query the files and give the 'local' user a consistent interface, while leaving programmers free in their choice of human interface.

A question: what should the format of this file be? I suggest this one:

   Name_of_Place [PROVINCE COUNTRY] CODE.PC.CY (....)

The province and country should be the standard abbreviations, to help the user see whether it is the place he wanted. The trailing (....) could be two things: a link to a page giving information on the site, useful for smaller towns where there is only one possible connection, that is, where the answer would be: take this bus and go to XYZ, then search from there. It could also serve, in larger towns, as a link to a page on metropolitan services (which usually run not by timetable but by frequency).

2. The retrieving interface: In my service for trains it is sufficient to enter part of the name of the place to get the most likely match. This is the opposite of the other approach, which requires an exact match of the name or a choice from a subset. There is an alternate page for my service that uses this approach (thanks to Alessandro Zorat), so one can choose. On this problem, comments from users and human communication experts are welcome. (See http://www.cs.unitn.it/~zorat/orari.html)

3. The algorithm to search services: There are different approaches. Some use fixed tables that must be rebuilt at every schedule change; this is the easiest, but it is quite limited: it uses a lot of space, and some odd routes could be missed. Others use a table of routes that gives the routing for any source/destination pair, so the program searches for services through those places.
This second approach is usually the fastest, but it is easy to miss some services that go via a longer route but have better services or connection times.

The third approach is the one I have used: to work on packing the database and optimizing the search algorithm, so as to do a brute-force search on ALL the reasonable routing possibilities. It uses little disk space (all the 10000+ Italian trains with 3200 stations fit in 800k), but as the number of stations grows, the search time increases. It is now less than 1 sec, but the time is roughly proportional to the number of trains present, so, faced with adding bus services, I am thinking of a hybrid mode to keep search time (and server load) low. This is the area for programmers' ideas ....

By subdividing regional services I would load only the regions of origin and destination plus the interregional services; for international services, the region of origin, the country of origin, the international table, the country of destination and the region of destination. To avoid missing some connections, it would be advisable to build manually (it does not have to be updated frequently) a table that gives, for every pair of origin/destination regions, the databases to be taken into account (as an example, for going from Spain to Denmark, France and Germany would also be loaded). This list of modules to load should be the only manually made table.

4. A way to code places: The CODE.PC.CY at point 1 should be the code used for searching the database. There could be more than one for larger cities with several stations. I chose the code this way: CODE is a number from 1 to 4095 for stations with international services, 4097 to 8191 for stations with interregional services, and 8193 to 12287 for stations with only regional services. PC represents the province (codes above 8192 repeat) and CY is the country code (codes above 4096 repeat too). A problem to be settled at the beginning is assigning a CY code to every country, and codes down to the provinces.
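
The CODE ranges above can be sketched in a few lines. This is only an illustration in Python (the actual program is ANSI C); the names `tier_of`, `parse_place_code` and `lookup_key` are hypothetical, and so are the example abbreviations FI and IT. The sketch assumes one reading of the remark that codes above 4096 and above 8192 repeat: an international code identifies a station on its own, an interregional code needs CY as well, and a regional code needs both PC and CY.

```python
from typing import NamedTuple, Optional, Tuple


class PlaceCode(NamedTuple):
    code: int      # the CODE part
    province: str  # the PC part
    country: str   # the CY part


def tier_of(code: int) -> Optional[str]:
    """Return the service tier for a CODE, or None if it is out of range."""
    if 1 <= code <= 4095:
        return "international"
    if 4097 <= code <= 8191:
        return "interregional"
    if 8193 <= code <= 12287:
        return "regional"
    return None  # 0, 4096, 8192 and anything above 12287 are not assigned


def parse_place_code(text: str) -> PlaceCode:
    """Split a 'CODE.PC.CY' string and check the CODE range."""
    code_text, province, country = text.split(".")
    code = int(code_text)
    if tier_of(code) is None:
        raise ValueError(f"CODE {code} is outside the assigned ranges")
    return PlaceCode(code, province, country)


def lookup_key(place: PlaceCode) -> Tuple:
    """Smallest key that identifies the station (assumed interpretation)."""
    tier = tier_of(place.code)
    if tier == "international":
        return (place.code,)
    if tier == "interregional":
        return (place.code, place.country)
    return (place.code, place.province, place.country)
```

For example, `lookup_key(parse_place_code("5000.FI.IT"))` gives `(5000, "IT")`, because under this reading an interregional code is unique only within its country.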
Assigning these codes is not easy, since the countries and regions should be CONVEX, that is, a journey inside the country or inside the province should lie entirely inside that unit. Border stations should belong to the subdivision above; some peculiar routes should be taken into account by promoting some stations. This is, however, a violation of the spirit of the algorithm, which claims that no route has to be set manually. Besides this, a small country would have no province subdivision, and small contiguous countries would share a common database, if feasible (with fewer than 4000 services it is better to have a single table). The same goes for smaller provinces, especially because of the convexity constraint. For the railway stations I used a number from 10 to 3199. If you have another way of coding, tell the people!

5. Entering data: Most of the publicly available timetables are in a non-machine-readable format. It would help if many people took care of entering the data for their own region (best of all, everyone would look after a single line, so that every change is noticed immediately: if something changes on a line you use daily, you soon notice, even just by reading a notice on a pole; if you copy a timetable for somewhere 1500 km away, you are unlikely to be so up to date).

6. Updating data: There are two issues: the first is updating a master database, that is, the site that holds the tables for a certain area; the second is updating remote mirror sites. For the first issue, it should be possible to send an update via e-mail. This could be done in different ways: a random helper could simply send the new data to a coordinator, who in turn would apply the change to the main database with a password-protected message. Major changes would be made by rewriting the database. I wrote a program in VB, but many approaches are possible, and this could be the area where the programmers' imagination should have the most room. The distributed database should have an expiry date written on it, as well as a last-update stamp.

6bis. In my implementation (the proposed .5 format; the current one is slightly different) the datafiles should have this format:

A header, to be defined, saying mainly what the database is, the date it was built and other information useful for the mirrors.

Then, for each train, a header (to speed processing, every record is two 16-bit words):

   0 - services
      The 0 serves to sync the reading, which can be sequential.
      services is an array of bits that gives information such as:
         Urban service | Interurban bus service | Regional train
         Long distance train | Night train | Ferry | Plane
         Bicycle transport | Disabled passenger accessible
         Reservable | Advance reservation not required

   Len_of_train - Len_of_comment
      The first word is the number of data records (roughly the number of stops); the second can be 0 or a number. If 0, it is a train without any particularity (not even a name!). If not, it is the number of bytes that follow carrying the notes. The sign bit of the second word, if 1, signifies that the train is periodic; then the 12 words after the number and the comment give the periodicity (every bit is the nth day of a month). Otherwise the service is daily.

   Number_of_the_train
      1st word literal part, 2nd word number.

   [if len_of_comment]
      The bytes that form the string, padded to doublewords with nulls.

   [if len_of_comment & 0x8000]
      12 * periodicity

   (Len_of_train) * Station - time
      station is the code of the station, ORed with 0xc000 for a time of transit (stop and go), 0x8000 for a departure and 0x4000 for an arrival.
      time is the time in minutes after midnight (a train spanning two days appears twice, once for the first day with numbers above 1440 and once for the second day, from the first station after midnight).

To speed searches, after loading, the trains that do not run on the chosen day are removed, and hence also the periodicity fields; all other headers are kept separately.
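
As an illustration of the Station - time records described above, here is a minimal sketch in Python (the actual program is ANSI C). The function names are hypothetical. The byte order (little-endian) and the word order within the record (station first, then time) are assumptions, since the format description does not fix them. Note that every stop word carries at least one of the two flag bits, so a first word of 0 can only be a train header, which is what makes the sync described above work.

```python
import struct

DEPARTURE = 0x8000              # flag: departure time
ARRIVAL = 0x4000                # flag: arrival time
TRANSIT = DEPARTURE | ARRIVAL   # 0xc000: time of transit (stop and go)
CODE_MASK = 0x3FFF              # the 14 bits left for the station code


def pack_stop(station: int, kind: int, minutes: int) -> bytes:
    """Build one two-word record: (station | flags, minutes after midnight)."""
    if not 0 < station <= CODE_MASK:
        raise ValueError("station code does not fit in 14 bits")
    if kind not in (DEPARTURE, ARRIVAL, TRANSIT):
        raise ValueError("kind must be DEPARTURE, ARRIVAL or TRANSIT")
    if not 0 <= minutes <= 0xFFFF:
        raise ValueError("time does not fit in a 16-bit word")
    # "<HH" = two unsigned 16-bit words, little-endian (assumed byte order)
    return struct.pack("<HH", station | kind, minutes)


def unpack_stop(record: bytes):
    """Return (station code, kind flags, minutes) from a 4-byte record."""
    word, minutes = struct.unpack("<HH", record)
    return word & CODE_MASK, word & TRANSIT, minutes


def format_time(minutes: int) -> str:
    """Show minutes after midnight as HH:MM, marking times above 1440."""
    day, rest = divmod(minutes, 1440)
    text = f"{rest // 60:02d}:{rest % 60:02d}"
    return text + (f" (+{day}d)" if day else "")
```

A departure from station 1234 at 08:15 would be `pack_stop(1234, DEPARTURE, 495)`, and a time of 1470 on the first-day copy of a train spanning two days is shown by `format_time` as 00:30 on the following day.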
In my current implementation, header, comment and timetables are in separate files; in the proposed format, owing to the need to share databases, all data are put in a single file.

7. Multisite: To avoid a heavy load on a single site, or blocking the whole system in the event of network problems, it is advisable to have a few mirror sites. If the databases are properly built, a couple of MB here and there should not be a heavy burden (the full database for the Italian railways fits in about 600k). The main [European] database (with all connections between sites below 4096) would be on every participating site; the local databases could be only on local sites, or loaded on request, in a way similar to how the DNS service is organized: a single primary on which the changes are made, whose maintainer is responsible for the updates, then a few secondaries that update regularly, based on the "expiration date" present in the database, and a certain number of caching servers.

A request for a schedule could then be made either by asking a particular "help" from a remote server or by loading the remote database on the fly and then operating as a secondary until it expires. (This is my first idea; if someone is smarter, tell us!) A smarter idea would be to notify the master when such copies are made and then receive the updates until the expiry date. For minor updates, simply a "difference" file could be sent. A protocol for this is something that someone could study.

The choice between loading the remote database and issuing a query would be based on load: a site with ample bandwidth and disk space and/or a large number of queries could copy the remote database, while a machine on a slow connection could simply issue a remote query, in the worst case even via an e-mail message.

8. Options: The user should be able to query not only the sites of departure and arrival but also other options, such as a particular service, avoiding the use of bus or plane, or requiring the quickest or the earliest service (my current implementation gives ALL services, but in a remote query this could be harder to achieve). A standard should be used for impossible routes (such as someone wanting "Only day train" on a route that needs a bus).

9. The format of a query: I have ideas, but I think it is premature to talk about them. You can see the query format I use by making a request at www.dicea.unifi.it/ie.htm and looking at what appears on the address line of the browser, but the format for interchanging queries in a cooperative mode will have to be set after choosing a preferred method for multiple lookups and query passing.

Write comments, suggestions, flames and so on to leo@dicnet.ing.unifi.it (there is a list for conferencing; if you want to be subscribed, please tell me).

Leonardo Boselli (11/11/96)