Flat Vs Hierarchical information storage

Really interesting blog post by Tom Evslin (creator of Microsoft Exchange Server) on how information systems have evolved in a comparatively short period of time from being Hierarchical to Flat:

The WorldWideWeb is where Moore’s Law met Metcalfe’s Law. Information management – the way we find out what we want to know – went from hierarchical to flat in just a few years as a result. We now assume – usually correctly – that we can find any particular piece of data from a railroad schedule in Estonia to a quote by an Argentine novelist on the Web within minutes of wanting it….

…we all assumed that most people would approach information through the categories they assigned the information to….To put it mildly, we were all wrong!

People don’t think hierarchically – at least most people don’t. We think in terms of associations….

When we were working on our first ELN projects (in the mid-90s), categorization and hierarchy were on everyone’s mind. Before a scientist created an experiment we had them fill out all sorts of metadata about it, and we’d have day-long meetings as part of the implementation where records management, library services, IT, and the user representatives would thrash out what they needed to have for each project. We’d be trying to keep the amount of metadata down to a few elements (at some point the users just put anything into the field because they’ve had enough of filling in silly boxes) but there was still a huge amount of pressure to add more. Then, once we’d figured out all the metadata, we’d get into yet more meetings about how the information should be presented. An awful lot of pain for everyone involved….

Fast forward to today. Our ELN can still capture and track metadata, and show the content of the system in different ways (e.g. you can drill down by project). But it is much less of a “thing”. I guess it comes up only about 50% of the time, and when it does customers only really want a few items – generally, what they need to implement a records management process and maybe make the list of documents pretty. Sure, we’ve improved the product so the whole metadata issue is less hassle – we can now extract most of the metadata transparently rather than bugging the user, and we’ve got a more open framework to manage it. But even so, metadata is less on people’s minds.

I’m not saying metadata isn’t important, because it is. But it isn’t as big a thing as it was. Partly that’s because we’ve got better tools (primarily more CPU/Memory/Disk), partly because we’re comfortable using Google and know that full text search really does work even on large bodies of content, but also because people realize that acquiring metadata isn’t cost-free. So the tradeoffs have changed – and most importantly, we’re all members of the Google generation.

The great thing is that ELNs are becoming much more lightweight. Less disruptive to the existing processes (but still delivering huge amounts of benefit), and cheaper/quicker to install too (because we’re not spending 2 days in meetings to figure out how to configure the thing).

Thanks Google 🙂

“Feeling Secure”

Bruce Schneier quotes “Confessions of a Master Jewel Thief”:

Nothing works more in a thief’s favor than people feeling secure. That’s why places that are heavily alarmed and guarded can sometimes be the easiest targets. The single most important factor in security – more than locks, alarms, sensors, or armed guards – is attitude. A building protected by nothing more than a cheap combination lock but inhabited by people who are alert and risk-aware is much safer than one with the world’s most sophisticated alarm system whose tenants assume they’re living in an impregnable fortress.

I get uncomfortable when I hear people placing too much faith in technology – partly because the technology often lets us down, and partly because when we trust in the technology we let our guard down. That’s why I prefer simple, boring systems for Patent Evidence. Sexy technology (like Digital Signatures, Encryption, third parties) is useful in its place, but only as part of an overall system where the system owners understand that they – and only they – are responsible for the system’s integrity. Also, bear in mind that a lot of the technology people think they want to use for Patent Evidence systems is actually aimed at different problems (e.g. commerce) and is potentially being misapplied in the patent area.

I was with a customer the other week and they wanted to implement a particular technique which didn’t really do much for the system’s integrity and actually made it a whole load more complicated (and expensive). “Why do you want to do this?” I asked. “Because it makes us feel comfortable,” they said. Well, the customer is the ultimate arbiter (it is after all their system) so if they want it, they can have it – but me, I’d rather everyone involved in running this system felt very uncomfortable, because then they’ll keep an eye on it.

Sometimes to be “safe” you have to feel unsafe. Only the paranoid survive etc. 🙂

SHA-1 Broken

From Bruce Schneier’s web log: SHA-1 has been broken. In simple terms, a Hash algorithm takes a document and generates a fingerprint for it. If the document changes, so should the fingerprint. See here for a more detailed explanation.
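To make the fingerprint idea concrete, here is a minimal sketch using Python's standard hashlib module. The fingerprint helper and the two sample "documents" are made up for illustration and are not taken from any ELN product. It only shows that a one-character change to a document produces a different fingerprint; it says nothing about how hard it is to forge a matching one, which is what the SHA-1 result is about.

```python
import hashlib


def fingerprint(document: bytes, algorithm: str = "sha1") -> str:
    """Return the hex digest (the "fingerprint") of a document.

    `fingerprint` is a hypothetical helper name; `algorithm` is any
    name that hashlib accepts, such as "sha1" or "sha256".
    """
    return hashlib.new(algorithm, document).hexdigest()


# Two invented lab-notebook entries that differ by a single character.
original = b"Experiment 42: compound X dissolved at 37 C."
altered = b"Experiment 42: compound X dissolved at 38 C."

print(fingerprint(original))
print(fingerprint(altered))

# The fingerprints differ, so the change is detectable.
print(fingerprint(original) == fingerprint(altered))

# The same document under a different algorithm gives a longer fingerprint.
print(fingerprint(original, "sha256"))
```

Taking the algorithm as a parameter is a deliberate choice in this sketch: it keeps the fingerprinting step separate from any one algorithm, so the same code can be pointed at a different hash when the current one wears out.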

Hashes are an important part of any Digital Signature scheme, and SHA-1 is one of the more popular (and until now well-respected) Hash algorithms. Any flaw found in the hashing algorithm is a serious problem for any ELN system that uses Digital Signatures to prove a record has not been changed. As Jon Callas, PGP’s CTO, puts it: ‘It’s time to walk, but not run, to the fire exits. You don’t see smoke, but the fire alarms have gone off.’

So this isn’t a crisis – if you were using digital signatures yesterday, they haven’t suddenly become worthless today. But it is a reminder that nothing lasts forever, especially in Cryptography. Flaws are found in algorithms, computers get faster, and soon it costs a mere $1m to forge the Digital Signature on the document that proves you have the rights to a Blockbuster drug. Picking a “better” algorithm or longer key isn’t going to help – erosion of Cryptographic tools is a fact of life.

Any Evidence system needs to have a series of controls and capabilities to show that a document has been unaltered. Digital Signatures are one of the tools you would use in such a system (we use them in our PatentSafe product) but they shouldn’t be the only thing you rely on. Indeed PatentSafe has a whole series of checks & balances in it and if you run the system properly, even if it turned out the signature algorithm was worthless, you’d still be able to use your records in court.

Unfortunately, there are some vendors who are relying on Cryptography alone to prove authenticity. Listening to their marketing, they’re treating the technology as a silver bullet – “Buy this magic, and all your concerns are over”. This has always made me intensely uncomfortable, especially given the timescales over which our customers expect to be able to use the records they put into our systems. Fortunately, given the level of concern over Electronic Records for patents, few customers have actually implemented such systems – the laggards have been proven correct!

It is possible to do perfectly safe, effective, Electronic Records systems for Patent Evidence Creation and Preservation. Unfortunately, it is a lot harder than just implementing some Cryptographic magic, as the demise of SHA-1 shows. My problem is that it is much easier (and more seductive) to talk about apparently sexy technology rather than the real (but less exciting) issues around good electronic records systems 🙁