Article URLs week: Recommendations
During Article URLs week, I’ve been thinking about the merits of various news sites’ URL schemes. There were a bunch I thought were fairly good and a few very good ones, but I didn’t find any I thought were 100 percent perfect. Today, I’m going to try to derive a suggestion for what that ideal news site article URL might be from the principles I used to judge them during the week.
- To make URLs permanent, either a date or a unique numeric ID needs to be included.
- To make URLs readable, I prefer using the date and a descriptive slug as opposed to the numeric ID.
- To make URLs hierarchical, a section should be included and made hackable. Dates should be used in hierarchical year/month/day order.
- To make URLs brief and clean, no redundant parts or “garbage” CMS parameters should be included.
The only remaining thing, then, is deciding which order these elements should come in. The slug definitely belongs at the end, because that’s the most specific part of the URL, but is the section or the date more general? Should date come before section, as at nytimes.com:
Or should section come before date, as at ClarionLedger.com:
After much thought, I’ve decided I think the section should come before the date. That makes the URL hackable to the section while still permitting hacking to a date within the section. The alternative would make it harder to hack to the current contents of a particular section — which is why I don’t like it as much — even as it would make hacking to the entire contents of the site on a certain date easier. This is obviously a debatable point, and sort of moot anyway until more sites do have hackable URLs.
My suggestion for the “ideal” news site URL, then, is www.site.com/section/YYYY/MM/DD/slug, where:
YYYY/MM/DD is a numeric date in year/month/day order and
slug is a short descriptive name for the story.
And in a well-designed content management system, the URL would be hackable to:
www.site.com/section/YYYY/MM/DD/, a list of all articles in that section on that day
www.site.com/section/YYYY/MM/, a list of all articles in that section in that month, grouped by day
www.site.com/section/YYYY/, a list of the articles-by-month pages
www.site.com/section/, the current index page for the section
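As a rough illustration of the hierarchy above, here is a minimal Python sketch that builds an article URL from its parts and then lists the pages a reader would reach by trimming one segment at a time. The function names, the "politics" section and the "arnold-runs" slug are made up for the example and are not taken from any particular CMS.

```python
from datetime import date


def article_url(host: str, section: str, published: date, slug: str) -> str:
    """Build host/section/YYYY/MM/DD/slug from its parts."""
    return f"{host}/{section}/{published:%Y/%m/%d}/{slug}"


def hackable_parents(url: str) -> list[str]:
    """List the index pages reached by cutting segments off the end of the URL."""
    host, section, *rest = url.split("/")
    parents = []
    # rest is [YYYY, MM, DD, slug]; drop the slug first, then day, month, year.
    for depth in range(len(rest) - 1, -1, -1):
        parents.append("/".join([host, section, *rest[:depth]]) + "/")
    return parents


url = article_url("www.site.com", "politics", date(2003, 8, 2), "arnold-runs")
for parent in hackable_parents(url):
    print(parent)
```

Each printed URL corresponds to one of the four index pages listed above, from the day page down to the section index.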
There are plenty of good news sites out there using somewhat different URL schemes than what I propose, and at those sites there’s no reason to switch to a new system if the URLs are readable and hierarchical enough. But my main objective this week was to draw attention to sites using the awful URLs I graded as Ds or Fs. The top priority for those sites’ CMS developers ought to be creating better URLs, using either the scheme I’ve proposed here or any system that would improve readability, brevity, cleanliness, hierarchy and permanence.
The section part of the URL may cause problems if an article relates to more than one section. Also, the taxonomy (section hierarchy) may change after the next redesign, so permanence of the URLs is not guaranteed.
Marek, I don’t see multiple sections as a real problem, because I think the way most news sites have solved that is to pick one section (often the more specific of the two) as the “canonical” section for the URL, linking to it but not publishing it “under” any other sections. So if I have an article that belongs in the “environment” and “local news” sections, “environment” would get listed in the URL. In any event, I think on most news sites the number of articles that actually do get listed in multiple sections is fairly small.
You’re right that the only failsafe way to achieve permanence is to dump everything into folders by date at the top level of the hierarchy; that way you could be adding or removing sections all the time. But even with sections at the top level as I suggest, as long as the appropriate redirects are set up nothing ever needs to be broken. Web servers can fairly easily be programmed to recognize /oldsection/YYYY/MM/DD/article as valid even if /oldsection/ no longer exists.
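One way to picture that redirect logic is the small Python sketch below. It assumes a hand-maintained table of retired section names (the names "metro", "local", "lifestyle" and "living" are invented) and returns the path a server could redirect to. A real site would more likely express this in its web server's rewrite rules, so treat this only as an outline of the idea.

```python
import re
from typing import Optional

# Hypothetical table of retired section names and their replacements.
RENAMED_SECTIONS = {"metro": "local", "lifestyle": "living"}

# Matches /section/YYYY/MM/DD/slug and captures the section separately.
ARTICLE_PATH = re.compile(r"^/(?P<section>[^/]+)/(?P<rest>\d{4}/\d{2}/\d{2}/[^/]+)$")


def redirect_target(path: str) -> Optional[str]:
    """Return the new path for an article under a retired section, else None."""
    match = ARTICLE_PATH.match(path)
    if match is None:
        return None  # not an article URL in the expected shape
    new_section = RENAMED_SECTIONS.get(match.group("section"))
    if new_section is None:
        return None  # section still exists (or is unknown); nothing to redirect
    return f"/{new_section}/{match.group('rest')}"


print(redirect_target("/metro/2003/08/02/arnold-runs"))
```

Because only the section segment is rewritten, the date and slug survive unchanged and old bookmarks keep working.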
I agree on the order you described...but for a slightly different reason. There are plenty of pages in each section that aren't necessarily related to any publish date, but they should still be hackable. For instance, a regularly updated page called "write your congressman" could sneak itself in at www.site.com/section/write_congress.html.
To make URLs permanent, either a date or a unique numeric ID needs to be included.
Pray you, define permanent for me. I haven’t the ghost of an idea what you mean.
Unenlightened, Tim Berners-Lee discusses permanence much better than I ever can in his article, "Cool URIs don't change," where he specifically recommends publication dates as an organizing principle to prevent URLs from needing to change.
permanent urls are a fine point - in theory i have to add...
working at a news-producing site i have learned that editors tend to "update" stories, rewriting parts of an article... i don't favor this kind of practice, but still it does happen.
another point: even if you just fix a spelling error in an article (e.g. in a headline from late evening - you have to change it in the morning!)... at least our cms does not give a chance to distinguish between a "major update" to an article and a "small bugfix".
this would also cause enormous problems for urls containing the date... you could 1. change the url, which is bad news for all those bookmarks, newsletter-urls (and news.google.com) or you could 2. have a new article with a new url - which would add confusion, with two nearly identical articles under different urls.
date could be fine, but its trickier than it seems.
btw: at kurier.at we try to stick with short urls using an id, e.g. http://kurier.at/chronik/343950.php (sample article)
Thomas, I’m glad your site does make the corrections! Some sites don’t, and that’s bad. My thought was to basically keep the most current (up-to-date with all corrections) version available at the existing URL, which would be based on the original creation date. That way subsequent updates wouldn’t alter the URL, as they would if the URL were based on a modification date. I completely agree with you, though, that neither of those options you mention is acceptable.
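A minimal Python sketch of that idea follows: the URL is derived only from the creation date, so revising the article changes its body and its modification date but never its address. The Article class and its field names are assumptions made for illustration, not the data model of any real CMS.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Article:
    section: str
    slug: str
    created: date                    # fixed at first publication
    body: str
    modified: Optional[date] = None  # updated on every correction

    def url(self) -> str:
        # Only the creation date is used, so edits never move the article.
        return f"/{self.section}/{self.created:%Y/%m/%d}/{self.slug}"

    def revise(self, new_body: str, when: date) -> None:
        self.body = new_body
        self.modified = when


story = Article("chronik", "flood", date(2003, 8, 1), "first version")
before = story.url()
story.revise("corrected version", date(2003, 8, 2))
print(before == story.url())
```

Whether a correction is a small fix or a major update, the single URL always serves the latest version.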
I'm about to recode our entertainment site's news pages (and yes, the old URLs will still work) and I was wondering what your recommendation for slug length would be.
Looking at some of our recent headlines:
Schwarzenegger to run for governor
Toronto Film Festival's gala list grows longer
I imagine the slugs should be something like:
What would you suggest would be the maximum number of words?
Ian, your site looks like it has some pretty nice URLs already. Three words sounds like a good limit on slug length; you have some good examples there of how that would work. My personal preference is usually just to use a single word that’s the subject of the story — e.g. “arnold,” “gigli” or “tiff.” Your site probably covers all of those topics repeatedly, though, and it would probably get very confusing without the “what is the subject doing this time” part of your sample slugs. It looks to me like you have the exact right idea for what would work well on your site.
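For a sense of how a word limit might be applied mechanically, here is a small Python sketch that turns a headline into a slug of at most three words. The make_slug function, its parameters and the choice of a hyphen as separator are all assumptions for illustration. Taking the first few words is a naive rule, and an editor picking the subject by hand will usually produce a better slug.

```python
import re


def make_slug(headline: str, max_words: int = 3, separator: str = "-") -> str:
    """Lowercase the headline and join its first max_words words."""
    # Drop apostrophes so "Festival's" becomes "festivals" rather than two words.
    cleaned = headline.lower().replace("'", "").replace("’", "")
    # Every remaining run of letters or digits counts as one word.
    words = re.findall(r"[a-z0-9]+", cleaned)
    return separator.join(words[:max_words])


print(make_slug("Schwarzenegger to run for governor"))
# schwarzenegger-to-run
print(make_slug("Toronto Film Festival's gala list grows longer"))
# toronto-film-festivals
```

The output shows the limit of the naive rule: "toronto-film-festivals" loses the "gala list" part that a hand-written slug would probably keep.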
Hmmm...here's an interesting note re: slugs. If you want your site's slugs to do well in google, use hyphens instead of underscores.
Apparently google views arnold-runs as two keywords while arnold_runs is viewed as one keyword:
Thanks for dedicating an entire week to proper URL architecture. In all honesty, I had never been to your web site and only found it tonight from a post at dive into mark to the permanent link for each of your articles of the week; I even found myself "hacking" your URL to get to the next day in the series. This worked great, until the last day of the series (August 2nd).
After reading such a great discussion, I wanted to check out what else you had written since the 2nd, and following your advice again, I hacked your URL to August 3rd. To my surprise, I arrived at a giant "404 Error" message. This definitely wasn't what I expected, and if I were any one of the 95% of the web-reading population I wouldn't know what a 404 error was. Thankfully, you do suggest looking in your right hand navigation and provide a link to your home page.
I'm glad your site handles the error gracefully, but instead of telling users to look elsewhere for the content they're looking for, why don't you provide them with the content they want? For example, you didn't post to your site August 3rd and that's fine, but you could instead indicate to the user that (1) a post was not made to the site August 3rd, however, (2) there were other posts made in the month of August 2003 and (3) present a list of the posts made that month.
This would help users, like me, who have never visited your site, and were not directly linked to it, find additional content immediately rather than having to navigate to another section of the site to find out that you didn't post again until August 5th.
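The fallback described here could look roughly like the Python sketch below. It assumes posts are stored in a dictionary keyed by date, and the resolve_day function, the response fields and the post titles are all invented for the example. It keeps the 404 status for an empty day but attaches the other posts from the same month, which is one possible design among several.

```python
from datetime import date


def resolve_day(posts_by_day: dict[date, list[str]], requested: date) -> dict:
    """Return the day's posts, or a 404 that lists the rest of that month."""
    titles = posts_by_day.get(requested)
    if titles:
        return {"status": 200, "posts": titles}
    # Nothing on that day: gather every post from the same year and month.
    same_month = {
        day: day_titles
        for day, day_titles in sorted(posts_by_day.items())
        if (day.year, day.month) == (requested.year, requested.month)
    }
    return {
        "status": 404,
        "message": f"No posts on {requested.isoformat()}.",
        "same_month": same_month,
    }


posts = {
    date(2003, 7, 30): ["An older post"],
    date(2003, 8, 2): ["Recommendations"],
    date(2003, 8, 5): ["A later post"],
}
result = resolve_day(posts, date(2003, 8, 3))
print(result["message"], sorted(result["same_month"]))
```

A page built from this result can tell the reader that nothing was posted on August 3rd and link straight to the August 2nd and August 5th entries.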
Another thing, when using any kind of CMS that knows how to create a static site from database content, like Movable Type, wouldn't it be best to have it automatically generate "forward" and "back" links to each of your posts? This would definitely prevent the need to hack the URL.
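Generating those links is straightforward once the posts are sorted by date. The Python sketch below is not how Movable Type does it; the neighbor_links function and the sample URLs are assumptions used only to show the idea.

```python
from datetime import date
from typing import Optional


def neighbor_links(posts: list[tuple[date, str]]) -> dict[str, dict[str, Optional[str]]]:
    """Map each post URL to the URLs of the posts just before and after it."""
    ordered = sorted(posts)  # oldest first; tuples sort by date, then URL
    links = {}
    for i, (_, url) in enumerate(ordered):
        links[url] = {
            "back": ordered[i - 1][1] if i > 0 else None,
            "forward": ordered[i + 1][1] if i + 1 < len(ordered) else None,
        }
    return links


posts = [
    (date(2003, 8, 5), "/2003/08/05/later-post"),
    (date(2003, 8, 1), "/2003/08/01/earlier-post"),
    (date(2003, 8, 2), "/2003/08/02/recommendations"),
]
print(neighbor_links(posts)["/2003/08/02/recommendations"])
```

Because the links skip over days with no posts, a reader on the August 2nd entry would be taken directly to August 5th without guessing at URLs.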
If you're going to discuss URL usability, then I think it's necessary to also discuss what should happen when a URL is not hackable. What do you think?