Northeast blackout unplugs 10 sites
Disasters can sometimes be journalism’s finest hour, as news organizations deliver information of vital importance and high interest, perhaps even rebuilding some public trust. Or they can be journalism’s worst hour, as news organizations themselves are affected and don’t rise to the heroics of offering facts and stability.
My friend Chris Trucksess e-mailed to report that OregonLive.com is down as a result of the New York blackout, replaced by a ridiculous “blackout blog” that I’ve archived here. It’s joined by the nine other Web sites hosted on the same content management system: NJ.com, AL.com, MLive.com, Cleveland.com, Nola.com, Syracuse.com, PennLive.com, MassLive.com and SILive.com. There are a number of reasons this is awful journalism from these already shoddy sites:
- Sites shouldn’t go down like this. Backup power and redundant facilities are just costs of doing business in the online world. Did those systems fail, or were they not even in place?
- Sites elsewhere in the country definitely shouldn’t go down like this. Readers in Oregon or Alabama or Louisiana don’t expect their “local” news sites to in fact be hosted in the Northeast. This is a real danger of putting all your eggs in one basket hosting-wise.
- The content is terrible. I think blogs ought to have a place on news sites, but the reverse-chronological structure is totally inappropriate for the front page. And the best these sites can offer are four AP stories, links to other news organizations’ coverage and unenlightening personal blogs?
- There was no explanation of the problem. The page now contains an “editor’s note” that says “we ... are working to restore our sites,” but it’s not prominent enough and wasn’t there a little while ago when I archived the page. That’s an inexcusable omission, especially for readers outside the Northeast.
- The blog’s timestamps are wrong. Perhaps the entries are being written every few minutes as the timestamps claim, but they only seem to be posted to the server and made publicly available in chunks every hour or so.
Overall, this is amateurish and irresponsible.
And you thought PDFs were bad...
Nytimes.com is linking to an Excel file of statistics from one of its stories today.
An Excel file has all the usability problems of PDF files for online reading. It also requires special software to view, but unlike Adobe’s PDF reader, that software is not free.
I applaud the Times for thinking to offer raw data in a form that might inspire some readers to do some number-crunching of their own. But an ordinary tab-delimited text file could serve that purpose and be readable without expensive software many people don’t own.
I believe news sites have a journalistic duty to make their reporting accessible to everyone. That means distributing it via open standards, not proprietary file formats.
URLs: Nouns and verbs
I was hoping to lay off URLs for a while after writing about them for a whole week in which one of my recommendations was to make article URLs as short as possible.
But coincidentally Steve Yelvington has an excellent reminder that some developers go too far, making URLs too simple. “A URI, or URL, should point directly to a resource,” or a piece of content, Yelvington notes. If, in order to uniquely identify the content, additional information from cookies, form variables or session variables is required, “it’s broken. The URL isn’t meaningful any more. It can’t be shared.”
Using different terminology, I think Yelvington’s argument is this: All nouns on the Web need to be bookmarkable, although verbs don’t need to be. A listing of airline flights matching my search criteria is a noun that I might want to bookmark — but the page that bills my credit card is a verb that I don’t want to return to. Problems happen when developers think they’re programming a verb but forget that the verb produces a noun. For example, searching is a verb, but the resulting list of results is a noun.
Calling directory assistance is a verb. Once I’m done, though, I get a noun — a phone number. If your Web site makes me call directory assistance (i.e. navigate or search) every time without ever giving me a unique phone number (i.e. URL) I can save to use next time, I’m going to get awfully frustrated.
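To make the noun/verb distinction concrete, here is a minimal sketch of a search whose result list gets its own bookmarkable URL. The `/flights` path and the parameter names are made up for illustration; the only point is that everything needed to rebuild the result list lives in the URL itself, not in cookies or session state.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def search_url(base, criteria):
    """Return a URL that fully identifies a list of search results (the noun)."""
    # Sort the keys so identical criteria always produce the identical URL.
    query = urlencode(sorted(criteria.items()))
    return f"{base}?{query}"


def criteria_from_url(url):
    """Recover the search criteria from the URL alone, with no cookies or session."""
    pairs = parse_qs(urlsplit(url).query)
    return {key: values[0] for key, values in pairs.items()}


# Hypothetical flight search; the path and field names are assumptions.
url = search_url(
    "http://www.site.com/flights",
    {"from": "PDX", "to": "MSY", "date": "2003-08-14"},
)
print(url)
print(criteria_from_url(url))
```

Searching is still a verb, but the address it produces can be saved, e-mailed or linked to like any other noun.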
Luckily, I think most news sites get this principle.
Article URLs week: Recommendations
During Article URLs week, I’ve been thinking about the merits of various news sites’ URL schemes. There were a bunch I thought were fairly good and a few very good ones, but I didn’t find any I thought were 100 percent perfect. Today, I’m going to try to derive a suggestion for what that ideal news site article URL might be from the principles I used to judge them during the week.
- To make URLs permanent, either a date or a unique numeric ID needs to be included.
- To make URLs readable, I prefer using the date and a descriptive slug as opposed to the numeric ID.
- To make URLs hierarchical, a section should be included and made hackable. Dates should be used in hierarchical year/month/day order.
- To make URLs brief and clean, no redundant parts or “garbage” CMS parameters should be included.
The only remaining thing, then, is deciding which order these elements should come in. The slug definitely belongs at the end, because that’s the most specific part of the URL, but is the section or the date more general? Should date come before section, as at nytimes.com:
Or should section come before date, as at ClarionLedger.com:
After much thought, I’ve decided I think the section should come before the date. That makes the URL hackable to the section while still permitting hacking to a date within the section. The alternative would make it harder to hack to the current contents of a particular section — which is why I don’t like it as much — even as it would make hacking to the entire contents of the site on a certain date easier. This is obviously a debatable point, and sort of moot anyway until more sites do have hackable URLs.
My suggestion for the “ideal” news site URL, then, is:
YYYY/MM/DD is a numeric date in year/month/day order and
slug is a short descriptive name for the story.
And in a well-designed content management system, the URL would be hackable to:
www.site.com/section/YYYY/MM/DD/, a list of all articles in that section on that day
www.site.com/section/YYYY/MM/, a list of all articles in that section in that month, grouped by day
www.site.com/section/YYYY/, a list of the articles-by-month pages
www.site.com/section/, the current index page for the section
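As a rough sketch of what “hackable” means here, the snippet below builds an article URL and then lists the pages a reader would reach by chopping off one path segment at a time. It assumes the article URL is simply the day URL with the slug appended; the function names, the `news` section and the sample slug are hypothetical.

```python
from datetime import date


def article_url(section, published, slug, host="www.site.com"):
    """Build a section/YYYY/MM/DD/slug URL (assumed layout, for illustration)."""
    return f"{host}/{section}/{published:%Y/%m/%d}/{slug}"


def hackable_parents(url):
    """List the URLs reached by removing one trailing segment at a time,
    stopping at the section index page."""
    host, section, *rest = url.split("/")
    parents = []
    # rest is [YYYY, MM, DD, slug]; drop the slug, then DD, then MM, then YYYY.
    for keep in range(len(rest) - 1, -1, -1):
        parents.append("/".join([host, section, *rest[:keep]]) + "/")
    return parents


url = article_url("news", date(2003, 8, 2), "sample-story")
print(url)
for parent in hackable_parents(url):
    print(parent)
```

Each printed parent corresponds to one of the list pages above, which is why a CMS using this scheme needs to generate a real page at every level.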
There are plenty of good news sites out there using somewhat different URL schemes than what I propose, and at those sites there’s no reason to switch to a new system if the URLs are readable and hierarchical enough. But my main objective this week was to draw attention to the sites like those using the awful URLs I graded as Ds or Fs. The top priority for those sites’ CMS developers ought to be creating better URLs, using either the scheme I’ve proposed here or any system that would improve readability, brevity, cleanliness, hierarchy and permanence.
The section part of the URL may cause problems if an article relates to more than one section. Also, the taxonomy (section hierarchy) may change after the next redesign, so permanence of the URLs is not guaranteed.
Marek, I don’t see multiple sections as a real problem, because I think the way most news sites have solved that is to pick one section (often the more specific of the two) as the “canonical” section for the URL, linking to it but not publishing it “under” any other sections. So if I have an article that belongs in the “environment” and “local news” sections, “environment” would get listed in the URL. In any event, I think on most news sites the number of articles that actually do get listed in multiple sections is fairly small.
You’re right that the only failsafe way to achieve permanence is to dump everything into folders by date at the top level of the hierarchy; that way you could be adding or removing sections all the time. But even with sections at the top level as I suggest, as long as the appropriate redirects are set up nothing ever needs to be broken. Web servers can fairly easily be programmed to recognize /oldsection/YYYY/MM/DD/article as valid even if /oldsection/ no longer exists.
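Here is one way that redirect logic might look, written as plain Python rather than as any particular Web server’s configuration. The `RETIRED_SECTIONS` table and the `newsection` name are assumptions for the sake of the example.

```python
import re

# Hypothetical map of retired section names to their replacements.
RETIRED_SECTIONS = {"oldsection": "newsection"}

# Matches /section/YYYY/MM/DD/article
ARTICLE_PATH = re.compile(
    r"^/(?P<section>[^/]+)/(?P<rest>\d{4}/\d{2}/\d{2}/[^/]+)$"
)


def resolve(path):
    """Return (status, path): a 301 and the new location for articles filed
    under a retired section, otherwise 200 and the path unchanged."""
    match = ARTICLE_PATH.match(path)
    if match and match["section"] in RETIRED_SECTIONS:
        new_section = RETIRED_SECTIONS[match["section"]]
        return 301, f"/{new_section}/{match['rest']}"
    return 200, path


print(resolve("/oldsection/2003/08/02/article"))
print(resolve("/newsection/2003/08/02/article"))
```

Because the date and article name carry over untouched, one table entry per retired section is enough to keep every old article link working.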
I agree on the order you described...but for a slightly different reason. There are plenty of pages in each section that aren't necessarily related to any publish date, but they should still be hackable. For instance, a regularly updated page called "write your congressman" could sneak itself in at www.site.com/section/write_congress.html.
To make URLs permanent, either a date or a unique numeric ID needs to be included.
Pray you, define permanent for me. I haven’t the ghost of an idea what you mean.
Unenlightened, Tim Berners-Lee discusses permanence much better than I ever can in his article, "Cool URIs don't change," where he specifically recommends publication dates as an organizing principle to prevent URLs from needing to change.
permanent urls are a fine point - in theory i have to add...
working at a newsproducing site i have learned that editors tend to "update" stories, rewriting parts of an article... i dont favor this kind of practice, but still it does happen.
another point: even if you fix a spelling error in an article (e.g. in the headline from late evening - you have to change it in the morning!)... at least our cms does not give a chance to distinguish between a "major update" to an article and a "small bugfix".
this would also cause enormous problems to url's containing the date... you could 1. change the url, which is bad news for all those bookmarks, newsletter-urls (and news.google.com) or you could 2. have a new article with a new url - which would enlarge confusion by two identical articles (almost same content) under different urls.
date could be fine, but its trickier than it seems.
btw: at kurier.at we try to stick with short urls using an id (e.g.: http://kurier.at/chronik/343950.php, a sample article)
Thomas, I’m glad your site does make the corrections! Some sites don’t, and that’s bad. My thought was to basically keep the most current (up-to-date with all corrections) version available at the existing URL, which would be based on the original creation date. That way subsequent updates wouldn’t alter the URL, as they would if the URL were based on a modification date. I completely agree with you, though, that neither of those options you mention is acceptable.
I'm about to recode our entertainment site's news pages (and yes, the old URLs will still work) and I was wondering what your recommendation for slug length would be.
Looking at some of our recent headlines:
Schwarzenegger to run for governor
Toronto Film Festival's gala list grows longer
I imagine should be something like:
What would you suggest would be the maximum number of words?
Ian, your site looks like it has some pretty nice URLs already. Three words sounds like a good limit on slug length; you have some good examples there of how that would work. My personal preference is usually just to use a single word that’s the subject of the story — e.g. “arnold,” “gigli” or “tiff.” Your site probably covers all of those topics repeatedly, though, and it would probably get very confusing without the “what is the subject doing this time” part of your sample slugs. It looks to me like you have the exact right idea for what would work well on your site.
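For what it’s worth, a three-word limit is easy to automate. The sketch below is just one possible approach, not what any site discussed here actually runs: the stop-word list is hypothetical, and the separator is a parameter because sites differ on hyphens versus underscores.

```python
import re

# Hypothetical stop-word list; a real site would tune this to its headlines.
STOP_WORDS = {"a", "an", "the", "to", "for", "of", "in", "on", "and"}


def make_slug(headline, max_words=3, separator="-"):
    """Turn a headline into a short slug of at most max_words words."""
    # Lowercase, drop possessive 's (straight or curly apostrophe),
    # then keep only runs of letters and digits.
    text = re.sub(r"['’]s\b", "", headline.lower())
    words = re.findall(r"[a-z0-9]+", text)
    kept = [word for word in words if word not in STOP_WORDS]
    return separator.join(kept[:max_words])


print(make_slug("Schwarzenegger to run for governor"))
print(make_slug("Toronto Film Festival's gala list grows longer"))
```

An editor would probably still want to override the result by hand, since the first three meaningful words of a headline are not always the best description of the story.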
Hmmm...here's an interesting note re: slugs. If you want your site's slugs to do well in google, use hyphens instead of underscores.
Apparently google views arnold-runs as two keywords while arnold_runs is viewed as one keyword:
Thanks for dedicating an entire week to proper URL architecture. In all honesty, I had never been to your web site and only found it tonight from a post at dive into mark linking to the permanent link for each of your articles of the week; I even found myself "hacking" your URL to get to the next day in the series. This worked great, until the last day of the series (August 2nd).
After reading such a great discussion, I wanted to check out what else you had written since the 2nd, and following your advice again, I hacked your url to August 3rd. To my surprise, I arrived at a giant "404 Error" message. This definitely wasn't what I expected, and if I were any one of the 95% of the web-reading population, I wouldn't know what a 404 error was. Thankfully, you do suggest looking in your right hand navigation and provide a link to your home page.
I'm glad your site handles the error gracefully, but instead of telling the user to look elsewhere for the content they're looking for, why don't you provide them with the content they want? For example, you didn't post to your site August 3rd and that's fine, but you could instead indicate to the user that (1) a post was not made to the site August 3rd, however, (2) there were other posts made in the month of August 2003 and (3) present a list of the posts made that month.
This would help users, like me, who have never visited your site, and were not directly linked to it, find additional content immediately rather than having to navigate to another section of the site to find out that you didn't post again until August 5th.
Another thing, when using any kind of CMS that knows how to create a static site from database content, like Movable Type, wouldn't it be best to have it automatically generate "forward" and "back" links to each of your posts? This would definitely prevent the need to hack the URL.
If you're going to discuss URL usability, then I think it's necessary to also discuss what should happen when a URL is not hackable. What do you think?
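The fallback suggested in this comment could be sketched roughly as follows. This is not how Movable Type or any other CMS actually does it; the function names and the in-memory `posts_by_day` dictionary are stand-ins, and the post titles are placeholders.

```python
from datetime import date


def resolve_day(requested, posts_by_day):
    """Return ("day", posts) if something was posted on the requested date,
    ("month", {day: posts}) listing that month's posts if not,
    or ("missing", {}) when the whole month is empty."""
    if requested in posts_by_day:
        return "day", posts_by_day[requested]
    same_month = {
        day: titles
        for day, titles in sorted(posts_by_day.items())
        if (day.year, day.month) == (requested.year, requested.month)
    }
    if same_month:
        return "month", same_month
    return "missing", {}


def neighbors(current, posts_by_day):
    """Return the (previous, next) posting dates for "back" and "forward" links."""
    days = sorted(posts_by_day)
    i = days.index(current)
    previous = days[i - 1] if i > 0 else None
    following = days[i + 1] if i + 1 < len(days) else None
    return previous, following


# Placeholder archive: posts on August 2 and August 5, nothing in between.
posts = {
    date(2003, 8, 2): ["placeholder post A"],
    date(2003, 8, 5): ["placeholder post B"],
}
print(resolve_day(date(2003, 8, 3), posts))
print(neighbors(date(2003, 8, 2), posts))
```

With something like this behind the date URLs, a hacked URL for an empty day would show the rest of the month instead of a bare 404, and every post would know which dates to link back and forward to.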
Article URLs week: Day 5
Throughout Article URLs week, I’ve tried to examine a good mix of news sites, although the ones I chose are hardly any kind of random sample. For the last set of reviews today, I’ve picked a few sites that I’m fairly fond of, either from having worked there or just for liking the way they’re designed.
- phillyBurbs.com: C+
I was an intern at this site two years ago (back when it had a different URL scheme). I don’t like the pb-dyn garbage, the date is in the wrong order, and this URL is not hackable.
- naplesnews.com: B-
The year should be 03. The day of the month is missing, and a slug would be more enlightening than the ID number.
- seattlepi.com: B
Transportation is a nice specific section to post an article into rather than just a section like “local.” But I’m confused by the decision to use an ID number and a slug, and the day of the month by itself isn’t very enlightening as to when the article was published.
- LJWorld.com: B+
Very few other news sites have eliminated the unneeded characters of a filename extension (i.e. .html), something I didn’t even mark as redundant in reviewing all the other URLs. LJWorld gets bonus points for that and for hackability. But there are still some unnecessary pieces, and there’s no date or slug.
- TCPalm.com: D+
tcp/pj_local_news/article/0,1651,TCP_1121_2151279,00.html
It’s only fair to conclude with a review of the fairly awful URLs at the site I work for now. There’s a lot of unneeded garbage in addition to the main ID number. The section name cries out to be replaced with a hierarchical and hackable pressjournal/local version, but we can only do a single level. And I’d much prefer dates and slugs to the long numbers.
That’s it for my week of capsule reviews. I started out with a few ideas about the “best” way for news sites to implement principles for good URLs. I’ve thought it through more over the past few days, and tomorrow I’ll conclude the week with those recommendations.
In the meantime, there are a lot more news sites out there than the 26 I covered. If you have some favorite awful or outstanding article URLs of your own, follow Jason’s lead and post them here.
Nathan, as the man who wrote seattlepi.com's CMS, I can answer your question about the ID and slug. The slug has always come to the web site that way from the newsroom, in this case runway01. When we used to run Pantheon Builder (and who doesn't remember that program), we always had problems with files overwriting each other. When I wrote our new CMS, I simply attached an ID to it to ensure each file had a unique name. The ID is actually handy to have when you need quick access to editing a file. Every article has a date published in it, so I really didn't see a need to attach that onto the URL.
I think the BBC deserve a mention for good hackable URLs:
Although the ID number isn't very informative (no slug or date) you can hack up to middle_east/ and then again up to world/. The 1/hi bit is unnecessary though.
Incidentally, is TCPalm.com running Vignette? The commas give it away.
Mike: Yes, I, too, remember Pantheon Builder. Those were the days. I presume when you mention files overwriting each other you mean files from different days would overwrite each other, but that slug coming from the newsroom would at least be unique within each day? My thinking vis-a-vis dates was just that if you’re already adding six digits to the URL to guarantee uniqueness, as long as you can guarantee the slug’s uniqueness within each day, why not make those six digits a year and month? That could add some meaning to the URL without sacrificing length or uniqueness.
Simon: I shouldn’t have been entirely American-centric in my series; the BBC definitely has pretty nice URLs. And yes, we run Vignette at work. In addition to the commas, the other dead giveaway is the HTML comment (“<!-- Vignette V/5 Sat Aug 02 20:00:23 2003 -->”) it inserts all over the place.
Unfortunately, there is no absolute guarantee that the newsroom will have unique slugs. The chance of a duplicate is extremely small, but it exists (especially across departments), so I decided against it.
The problem with Pantheon if I remember correctly was mostly when you had the same slug more than once. It would attach a 1, then a 2, then a 3, etc. each time a new story came through. This was okay until you deleted a story somewhere in the middle. Pantheon then got confused on where it should number from and I remember having stories overwrite other stories (or publishing to URLs that used to belong to something else).
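To illustrate the kind of failure described here, the sketch below numbers duplicate slugs by counting the files that already share the slug. This is only a guess at the general pattern, not Pantheon's actual logic, and the hyphen and underscore separators are assumptions. It also shows why prefixing a never-reused ID avoids the problem.

```python
import itertools


def count_based_name(slug, existing):
    """Name a file by counting the files that already share the slug.
    This breaks once a file in the middle of the sequence is deleted."""
    same = [
        name for name in existing
        if name == slug or name.startswith(slug + "-")
    ]
    return slug if not same else f"{slug}-{len(same)}"


def id_based_name(slug, ids):
    """Name a file with an ID that is never reused, so names cannot collide."""
    return f"{next(ids)}_{slug}"


files = {}
for body in ("first story", "second story", "third story"):
    files[count_based_name("runway01", files)] = body

# Delete the story in the middle, then publish a fourth one.
del files["runway01-1"]
fourth = count_based_name("runway01", files)
print(fourth, "already exists:", fourth in files)

ids = itertools.count(1)
print([id_based_name("runway01", ids) for _ in range(3)])
```

After the deletion, the count-based scheme hands the new story a name that still belongs to the third story, so publishing it would overwrite that file. The ID-based names stay unique no matter what gets deleted.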
I would have to recommend Salon.com as having one of the best systems of URLs of any news site in the world:
The URLs certainly are descriptive, and beautifully hierarchical. And you can drop the "index.html" from the main story page, and they still work. If they were hackable, I would nominate Salon.com for an A or A+.
One of the things I really noticed was the lack of 'ads.' It was rather nice. I might pay for something like that each month. ;)