Article URLs week: Day 5
Throughout Article URLs week, I’ve tried to examine a good mix of news sites, although the ones I chose are hardly any kind of random sample. For the last set of reviews today, I’ve picked a few sites that I’m fairly fond of, either from having worked there or just for liking the way they’re designed.
- phillyBurbs.com: C+
http://www.phillyburbs.com/pb-dyn/news/113-07312003-134071.html
I was an intern at this site two years ago (back when it had a different URL scheme). I don’t like the pb-dyn garbage, the date is in the wrong order, and this URL is not hackable.
- naplesnews.com: B-
http://www.naplesnews.com/03/08/naples/d959969a.htm
The year should be 2003, not 03. The day of the month is missing, and a slug would be more enlightening than the ID number.
- seattlepi.com: B
http://seattlepi.nwsource.com/transportation/133306_runway01.html
Transportation is a nice specific section to post an article into rather than just a section like “local.” But I’m confused by the decision to use an ID number and a slug, and the day of the month by itself isn’t very enlightening as to when the article was published.
- LJWorld.com: B+
http://www.ljworld.com/section/citynews/story/140693
Very few other news sites have eliminated the unneeded characters of a filename extension (i.e. .html), something I didn’t even mark as redundant in reviewing all the other URLs. LJWorld gets bonus points for that and for hackability. But there are still some unnecessary pieces, and there’s no date or slug.
- TCPalm.com: D+
http://www.tcpalm.com/tcp/pj_local_news/article/0,1651,TCP_1121_2151279,00.html
It’s only fair to conclude with a review of the fairly awful URLs at the site I work for now. There’s a lot of unneeded garbage in addition to the main ID number. The section name cries out to be replaced with a hierarchical and hackable pressjournal/local version, but we can only do a single level. And I’d much prefer dates and slugs to the long numbers.
That’s it for my week of capsule reviews. I started out with a few ideas about the “best” way for news sites to implement principles for good URLs. I’ve thought it through more over the past few days, and tomorrow I’ll conclude the week with those recommendations.
In the meantime, there are a lot more news sites out there than the 26 I covered. If you have some favorite awful or outstanding article URLs of your own, follow Jason’s lead and post them here.
Nathan, as the man who wrote seattlepi.com's CMS, I can answer your question about the ID and slug. The slug has always come to the web site that way from the newsroom, in this case runway01. When we used to run Pantheon Builder (and who doesn't remember that program), we always had problems with files overwriting each other. When I wrote our new CMS, I simply attached an ID to the slug to ensure each file had a unique name. The ID is actually handy to have when you need quick access to editing a file. Every article has its publication date in it, so I really didn't see a need to attach that to the URL.
I think the BBC deserve a mention for good hackable URLs:
http://news.bbc.co.uk/1/hi/world/middle_east/3118737.stm
Although the ID number isn't very informative (no slug or date) you can hack up to middle_east/ and then again up to world/. The 1/hi bit is unnecessary though.
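As a rough sketch of what "hacking up" this URL means, the snippet below trims one path segment at a time and lists the resulting parent URLs. The function name `parent_urls` is made up for illustration, and whether each parent actually serves a useful page depends entirely on the site.

```python
from urllib.parse import urlsplit, urlunsplit


def parent_urls(url):
    """Return the URLs reached by trimming one path segment at a time.

    Hypothetical helper for illustration; it only manipulates the string
    and does not check that the site answers at each parent URL.
    """
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    parents = []
    # Drop the last segment repeatedly, keeping a trailing slash each time.
    for end in range(len(segments) - 1, -1, -1):
        path = "/" + "/".join(segments[:end])
        if not path.endswith("/"):
            path += "/"
        parents.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return parents


if __name__ == "__main__":
    for parent in parent_urls(
        "http://news.bbc.co.uk/1/hi/world/middle_east/3118737.stm"
    ):
        print(parent)
```

The first two results are the middle_east/ and world/ levels mentioned above; a URL scheme is hackable to the extent that each of those parents is a real page.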
Incidentally, is TCPalm.com running Vignette? The commas give it away.
Mike: Yes, I, too, remember Pantheon Builder. Those were the days. I presume when you mention files overwriting each other you mean files from different days would overwrite each other, but that the slug coming from the newsroom would at least be unique within each day? My thinking vis-a-vis dates was just that if you’re already adding six digits to the URL to guarantee uniqueness, as long as you can guarantee the slug’s uniqueness within each day, why not make those six digits a year and month? That could add some meaning to the URL without sacrificing length or uniqueness.
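To make the suggestion concrete, here is a minimal sketch of swapping the six-digit ID for a six-digit year and month. The function name `dated_article_path` and the August 2003 publication date are assumptions for illustration; this is not how seattlepi.com's CMS builds its URLs.

```python
from datetime import date


def dated_article_path(section, slug, published):
    """Build a path whose six-digit prefix is YYYYMM instead of an ID.

    Hypothetical helper: it assumes the newsroom slug is unique within
    the period covered by the prefix, which is the open question here.
    """
    prefix = published.strftime("%Y%m")  # e.g. "200308", still six digits
    return f"/{section}/{prefix}_{slug}.html"


if __name__ == "__main__":
    # The publication date below is assumed, not taken from the article.
    print(dated_article_path("transportation", "runway01", date(2003, 8, 1)))
```

The result is the same length as the current 133306_runway01.html form, but the prefix now tells a reader when the story ran.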
Simon: I shouldn’t have been entirely American-centric in my series; the BBC definitely has pretty nice URLs. And yes, we run Vignette at work. In addition to the commas, the other dead giveaway is the HTML comment (“<!-- Vignette V/5 Sat Aug 02 20:00:23 2003 -->”) it inserts all over the place.
Unfortunately, there is no absolute guarantee that the newsroom will have unique slugs. The chance of a duplicate is extremely small, but it exists (especially across departments), so I decided against it.
The problem with Pantheon, if I remember correctly, was mostly when you had the same slug more than once. It would attach a 1, then a 2, then a 3, etc. each time a new story came through. This was okay until you deleted a story somewhere in the middle. Pantheon then got confused about where it should number from, and I remember having stories overwrite other stories (or publishing to URLs that used to belong to something else).
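The failure described above can be reproduced with a toy model. This is only one plausible way such a bug could arise, assuming the suffix is derived from a count of existing files; it is not Pantheon Builder's actual logic, and the names `naive_name`, `id_name`, and the "council" slug are invented for the example.

```python
import itertools


def naive_name(files, slug):
    """Count-based numbering (hypothetical, not Pantheon's real code)."""
    used = [n for n in files if n == slug or n.startswith(slug + "-")]
    return slug if not used else f"{slug}-{len(used)}"


def id_name(counter, slug):
    """Prefix an ID that is never reused, in the spirit of 133306_runway01."""
    return f"{next(counter)}_{slug}"


def demo():
    files = {}
    for story in ("first", "second", "third"):
        files[naive_name(files, "council")] = story
    del files["council-1"]  # delete the story in the middle
    clash = naive_name(files, "council")
    overwritten = clash in files  # the "new" name already belongs to a story
    counter = itertools.count(133306)
    ids = [id_name(counter, "council") for _ in range(3)]
    return clash, overwritten, ids


if __name__ == "__main__":
    print(demo())
```

After the deletion, the count-based scheme hands out a name that is still in use, so the next story would overwrite an existing one. A monotonically increasing ID avoids that because deleting a story never frees its number.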
I would have to recommend Salon.com as having one of the best systems of URLs of any news site in the world:
http://www.salon.com/news/feature/2003/08/15/dust/index.html
http://www.salon.com/ent/feature/2003/08/15/open_range/index.html
http://www.salon.com/opinion/conason/2003/08/14/arnold/index.html
http://www.salon.com/news/sports/col/kaufman/2003/08/15/friday/index.html
The URLs certainly are descriptive, and beautifully hierarchical. And you can drop the "index.html" from the main story page, and they still work. If they were hackable, I would nominate Salon.com for an A or A+.
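One way to see how much these URLs say on their own is to pull the section, date, and slug straight out of the path. The pattern below is a sketch inferred from the four examples above, not a documented Salon.com scheme, and `parse_story_url` is a made-up name.

```python
import re
from datetime import date
from urllib.parse import urlsplit

# Inferred from the sample URLs: /<section path>/<YYYY>/<MM>/<DD>/<slug>/
# with an optional trailing index.html.
STORY_PATH = re.compile(
    r"^/(?P<section>.+)/(?P<y>\d{4})/(?P<m>\d{2})/(?P<d>\d{2})/"
    r"(?P<slug>[^/]+)(?:/(?:index\.html)?)?$"
)


def parse_story_url(url):
    """Split a story URL into section, publication date, and slug.

    Returns None when the path does not fit the assumed pattern.
    """
    match = STORY_PATH.match(urlsplit(url).path)
    if match is None:
        return None
    return {
        "section": match["section"],
        "date": date(int(match["y"]), int(match["m"]), int(match["d"])),
        "slug": match["slug"],
    }


if __name__ == "__main__":
    print(parse_story_url(
        "http://www.salon.com/news/sports/col/kaufman/2003/08/15/friday/index.html"
    ))
```

Because the "index.html" part is optional in the pattern, the shortened form of each URL parses to the same result.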