Trust? ==> 0 or 100 ... or perhaps 42 ?
When I ask the Google voice search:
"What time is it in Mountain View?"
Why do I trust the answer returned?
And why, when I ask:
"Who invented the world?"
does the answer bring tears of laughter to my eyes?
You know the answer should have been:
The level of trust we have in information depends on the trust we have in the original provider, in the whole chain of intermediate transmissions and/or transformations, and finally in our own sensory input.
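To make the chain idea a bit more tangible, here is a toy sketch. It rests on assumptions of mine that the argument itself does not require: every link (provider, each intermediate step, our own senses) gets a trust value between 0 and 1, the links fail independently, and so the values simply multiply. The function name `chain_trust` and the numbers are made up for illustration.

```python
from math import prod

def chain_trust(link_trusts):
    """Trust in information that passed through every link of a chain.

    Assumption: each value is the probability (0..1) that a link passes
    the information on faithfully, and links fail independently.
    """
    for t in link_trusts:
        if not 0.0 <= t <= 1.0:
            raise ValueError(f"trust value out of range: {t}")
    return prod(link_trusts)

# Hypothetical chain: original provider, one intermediary, my own eyes.
chain = [0.99, 0.95, 0.90]
print(round(chain_trust(chain), 3))  # 0.846
```

Under these assumptions the chain as a whole is never more trustworthy than its weakest link, and every extra hop can only lower the result.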
I like to make a distinction between carbon-based and silicon-based producers (although silicon is perhaps a bit too imprecise).
Starting with the latter, we can separate them into two categories: the ones providing raw data (e.g. a surveillance video camera) and the ones providing interpreted data (a temperature sensor, a smoke detector or a smart collision detector camera in a tunnel). The second category can be seen as a combination of a raw sensor and one or more chained interpretation services (software in most cases).
As sensors are relatively cheap, multiplying the number of sensors can often validate the produced information. In the case of interpreted data, we should of course prefer different interpretation services, to avoid or detect errors in a specific one.
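A minimal sketch of validation by redundancy could look like the following. It assumes several sensors measure the same quantity, takes the median as the consensus value and flags every reading that deviates from it by more than a tolerance. The function name `validate_readings`, the tolerance and the temperatures are illustrative choices, not a reference to any particular product.

```python
from statistics import median

def validate_readings(readings, tolerance):
    """Return (consensus, suspects) for redundant sensor readings.

    consensus: the median of all readings.
    suspects:  indexes of readings further than `tolerance` from it.
    """
    if not readings:
        raise ValueError("need at least one reading")
    consensus = median(readings)
    suspects = [i for i, r in enumerate(readings)
                if abs(r - consensus) > tolerance]
    return consensus, suspects

# Four hypothetical temperature sensors in the same room; one is broken.
temps = [21.4, 21.6, 21.5, 35.0]
consensus, suspects = validate_readings(temps, tolerance=1.0)
print(suspects)  # [3]
```

The median is used instead of the mean so that a single wild reading cannot drag the consensus along with it.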
Taking a helicopter view, what is a search engine? Input: the Internet (text, images, video); services: indexing, filtering; output: references related to a query. Or, as in this example: input: YouTube still frames; output: cats and faces.
Thus the frontier between traditional sensors of our physical world and sensors of our digital worlds (a crawl bot or a collision detector in a virtual world) is perhaps not as clean as it might appear initially.
That leaves the last element, the human producer, before getting to the essence of this post (sorry for the long introduction). Yes, I know that dogs are also producers (of ...).
In some sense we are a bunch of sensors operating since our birth, a bunch of input interpretation services called the brain, a bunch of restitution services called memory (recent neurological research shows that different kinds of input are stored in different parts of our memory) and finally a bunch of production services (voice, body language and the fingers used to type this post; dictation is not good enough yet).
All of them are subject to errors.
I forgot one. The only one which doesn't make any errors (by definition): creation services.
Trustworthy? Make up your own mind.
Apparently there is a strong need for some kind of validation (a term silently introduced above), especially for human-produced information. At least if we feel the need to think about whether to trust it or not.
My ideal information world?
OK +Google Glass (or some other device or app), show me how reliable this information is and why.
- independence of original sources
- trustworthiness of the independent sources
- chain of dependent sources
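How could these three criteria be combined? The sketch below is one possible reading, not a description of any existing service. It assumes we already know who cites whom (the `cites` mapping), follows those links back to the sources that cite nobody, and treats only those as independent. Their trust values are then combined with a "noisy-OR" rule, an assumption of mine: the information is doubted only if every independent source is wrong. All names and numbers are hypothetical.

```python
def original_sources(source, cites):
    """Follow 'cites' links back to the sources that cite nobody."""
    seen, roots, stack = set(), set(), [source]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited; also protects against cycles
        seen.add(current)
        upstream = cites.get(current, [])
        if upstream:
            stack.extend(upstream)
        else:
            roots.add(current)
    return roots

def reliability(reporters, cites, trust):
    """Return (independent_sources, score) for a piece of information.

    Assumption: independent sources err independently, so the doubt
    that remains is the product of their individual doubts.
    """
    roots = set()
    for reporter in reporters:
        roots |= original_sources(reporter, cites)
    doubt = 1.0
    for root in roots:
        doubt *= 1.0 - trust[root]
    return roots, 1.0 - doubt

# Four reporters, but only two independent origins behind them.
cites = {"blog_a": ["wire"], "blog_b": ["blog_a"], "tweet": ["wire"]}
trust = {"wire": 0.8, "paper": 0.9}
roots, score = reliability(["blog_a", "blog_b", "tweet", "paper"], cites, trust)
print(sorted(roots), round(score, 2))  # ['paper', 'wire'] 0.98
```

The point of the example is the "why": three of the four reporters collapse into a single origin, so the count of independent sources is two, not four.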
Big data? Sure!
Done? Nope! (Only partially, and only for recent productions, e.g. retweets and re-shares.)
Now it's time to use the magic word "Semantics".
One thing that is necessary is establishing the identity of the producer. Note that for humans this is not necessarily an identification of the physical person; an avatar is fine. Nothing wrong with having multiple identities. There is only a need for enough data to establish some meaningful (initial) state of trustworthiness, authority or whatever term you would like to use.
Next comes the meaning (read: semantics) of the information produced. Machine understanding of it is getting better and better, but there is still much room for improvement (or perhaps we should not go for the 10% improvement but for the 10x solution, i.e. come up with an entirely new way). This is needed to trace back when and by whom the same information was given. Copy-paste of text is easy to detect, but we are still in the infancy of detecting copy-paste of meaning.
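To illustrate the "easy" half of that claim, here is a small sketch of one common way to spot copied text: compare overlapping word sequences ("shingles") with the Jaccard similarity. The function names, the shingle size of three words and the sample sentences are my own choices for illustration.

```python
def shingles(text, k=3):
    """Set of all runs of k consecutive words in the text (lower-cased)."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b, k=3):
    """Share of word shingles two texts have in common (0.0 to 1.0)."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 0.0  # both texts shorter than k words: nothing to compare
    return len(sa & sb) / len(sa | sb)

original = "the quick brown fox jumps over the lazy dog"
copied = "The quick brown fox jumps over the lazy dog"
paraphrase = "a fast dark fox leaps above a sleepy hound"

print(jaccard(original, copied))      # 1.0
print(jaccard(original, paraphrase))  # 0.0
```

The literal copy scores 1.0, while the paraphrase, which says the same thing in other words, scores 0.0. That gap is exactly the copy-paste of meaning that this kind of surface comparison cannot see.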
And there is a facilitating role we all have: cite your sources! This is common in scientific publications. But elsewhere? We will have another facilitating role, but that will be the subject of another post.
Why this post now and here?
How many books have been scanned and/or are available online? How many articles are online? And blog posts? Not to mention all the rest. The source data is there. And so are the (not yet perfect) techniques.
Computing power is also available.
What are we waiting for?