Globally, poor Data Literacy is a major inhibitor to the successful exploitation of data as a valuable asset.
When describing and managing information assets and data capabilities, language becomes incredibly important, especially today.
In 2017, there is some emerging and legitimate debate on the subtle, but important difference between using the term DATA versus the term INFORMATION.
This discussion is beginning to bubble up among professionals globally due to social and business changes, where business people are increasingly taking over leadership of data assets and leading commercially and legally minded professionals in innovative discussions. This is a good trend, but it does create some re-posturing of language.
Even professional bodies need to choose between the terms DATA and INFORMATION thoughtfully.
Take, for example, the respected professional body IAIDQ (International Association for Information and Data Quality), formerly found at www.IAIDQ.org, which after 10 years has changed its name to IQ International (Information Quality International), now found at www.IQINT.org. After a decade of pumping the “Data Quality” theme, this well respected organisation has dropped the D-bomb to help its relevance with a growing, non-technical audience (amongst other reasons).
Will other well respected, thought-leading data professional organisations do the same, such as the Enterprise Data Management Council (www.EDMCouncil.org), which is over a decade old, or the Data Management Association (www.DAMA.org), which has been going for 30+ years? What about CMMI’s Data Maturity Model (www.cmmiinstitute.com/data-management-maturity), and others?
Enter the INFORMATION brigade, essentially coming at the same topics using slightly different, but semantically equivalent language, such as AIIM, the Association for Information and Image Management (www.AIIM.org), Data Manifesto (dataleaders.org), Data Governance Australia (www.DataGovernanceAUS.com.au), Information Governance ANZ (www.InfoGovANZ.com), and probably the more interesting Information Governance Initiative (www.IGInitiative.com). These groups are gaining rapid momentum and membership, largely from the Records Management, Privacy, Legal, eDiscovery and Document Management (unstructured data) professions. These specialised, information-intensive verticals are legitimate siblings to our traditional “data engineering” family pedigree.
According to Google Trends, global interest in the term “Information Governance” peaked in 2011, and search requests today (May 2017) are 50% of the global interest in the term “Data Governance”. Interest in the term “Data Governance” has in fact tripled in the same period (the last 6 years). In the Google numbers, Data Governance is being boosted by “Master Data Governance”, which has no credible “information” equivalent. So an understanding of both language sets is needed to detect difference, true meaning and credibility.
I have found it beneficial to carefully review the language from all these groups to harvest what is old (and rebranded), what is a duplicated overlap, and what is genuinely a new concept, like governing the explosion in Data Monetization (selling, renting or leasing data).
To take the language point further, let’s think about Data Models (just one of about 136+ data maturity model examples). Think about Entity:Attribute Relationship Modelling, E:R diagrams, Conceptual Data Models, Logical Data Models and Physical Data Models. These commonly used methods are what all information systems are built on regardless of underlying technology, and they define business rules between different data elements (can a customer hold multiple accounts? can an account have joint customers?). Data Professionals have read books, attended classes and learnt or taught these skills globally in almost every language for 3+ decades. I think it would be splitting hairs to argue that replacing the word “Data” with the word “Information” in the context of these well oiled disciplines suddenly makes these disciplines new, different or invalid. So some maturity is required to know when using the terms DATA vs INFORMATION changes meaning, value or purpose.
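To make the business-rule point concrete, here is a minimal Python sketch of the two questions above (can a customer hold multiple accounts, can an account have joint customers). The class names `Customer`, `Account` and `Holdings`, and all identifiers and sample values, are hypothetical and chosen only for illustration; a real logical or physical model would carry far more detail.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Customer:
    customer_id: str  # hypothetical identifier
    name: str


@dataclass(frozen=True)
class Account:
    account_id: str  # hypothetical identifier


@dataclass
class Holdings:
    """Association entity: a many-to-many link between customers and accounts."""

    links: set[tuple[str, str]] = field(default_factory=set)

    def link(self, customer: Customer, account: Account) -> None:
        # One (customer, account) pair per holding; duplicates collapse in the set.
        self.links.add((customer.customer_id, account.account_id))

    def accounts_of(self, customer: Customer) -> set[str]:
        # Business rule 1: a customer may hold multiple accounts.
        return {a for c, a in self.links if c == customer.customer_id}

    def holders_of(self, account: Account) -> set[str]:
        # Business rule 2: an account may have joint customers.
        return {c for c, a in self.links if a == account.account_id}


# Illustrative data only.
alice = Customer("C1", "Alice")
bob = Customer("C2", "Bob")
joint = Account("A1")
savings = Account("A2")

holdings = Holdings()
holdings.link(alice, joint)
holdings.link(bob, joint)
holdings.link(alice, savings)

print(sorted(holdings.accounts_of(alice)))
print(sorted(holdings.holders_of(joint)))
```

Renaming `Holdings` to something containing the word “Information” would not change a single rule in this sketch, which is the point of the paragraph above.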
Another word game might be taking the DAMA DMBOK (Data Management Body of Knowledge, 450 pages of it, and 3 decades of global IP) and replacing every instance of the word DATA with the word INFORMATION. Would that make the DMBOK different? More valuable? Less valuable? I don’t think so. However, this document (and its notable expert contributors) has spawned an entire industry discipline and an array of peer group permutations, which I find rather exciting. At least people are thinking about this stuff.
All these organisations carry varying degrees of momentum, pedigree and influence. Some possess and share significant, irreplaceable Intellectual Property around wholesome data/information management, like the EDM Council (www.EDMcouncil.org), which has been monitoring early data-specific legislation on Capitol Hill for its Banking/Financial system members and providing them with Data Maturity and Assessment Models developed collaboratively by the membership. They are bilingual (Data + Information), understand the semantic difference between the terms “information” and “data”, and have not blinked or slowed down in their research or the value delivered to members for more than a decade.
Ironically, within each of these industry camps, disagreement can erupt internally around what people mean by all kinds of terms, such as “Data Lake”, “Virtual Data Warehouse”, “Logical Data Warehouse”, “Data Ownership”, “Information Policy”, “Information Governance” or “Information Management”. Even within these groups there can be room for disagreement and refinement on language.
- Do we all mean the same thing? At a macro level the answer is yes, but at a micro level the answer is a resounding and emphatic no.
- Can we learn from each other? Yes.
- Do these meanings overlap? Most of the time, yes they do.
- Does the industry have a language problem? I believe yes, but it is contextual, semantic and mostly unproductive, and it is solved via education and literacy, not false standardisation.
What is the solution? I believe the answer lies in a thoughtful and clever blend of terms that satisfies Business and Legal people, along with Data Managers and Data Engineers as well. For now, let’s call it the “united nations language of Information and Data”.
The main language and communication issue around information and data nomenclature is simple. Agree on a mutual understanding of any “information/data” term with whomever you are addressing. If you achieve that alone, you will be on your way to creating value and solving problems. Ambiguity is NOT your friend.
To be very provocative, over the last 20 years I have grown to dislike the terms “Information Governance” and “Data Governance”; they are so tired and passé. I prefer business people describing and naming what they want, like “Data Health” or “Data Assurance”, and then parking proven governance disciplines into those branded language containers. Ironically, the most successful Data/Information Governance programs I have seen did not use that “governance” terminology, but I do appreciate why we need some basic industry language, hence the emergence of industry groups.
Put simply, each working organisation on the planet, managing its own information assets, needs to define its own vocabulary to clarify “what it means” when referring to Information and Data topics, risks, capabilities and objects, and embrace its own “living” vocabulary, rather than blindly accepting terms from vendors or narrow special interest groups.
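As one possible illustration of a “living” vocabulary, the sketch below keeps an organisation’s agreed terms in a small Python registry. Everything here is an assumption made for the example: the `Term` and `Vocabulary` classes, the `owner` field, and the sample definition of “Data Health” are hypothetical, not drawn from any standard or vendor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    name: str
    definition: str
    owner: str  # hypothetical: the business role accountable for the meaning


class Vocabulary:
    """An organisation's own glossary of Information and Data terms."""

    def __init__(self) -> None:
        self._terms = {}

    def define(self, term: Term) -> None:
        # The latest agreed definition replaces the old one: the vocabulary is "living".
        self._terms[term.name.lower()] = term

    def lookup(self, name: str) -> Term:
        # An undefined term is surfaced loudly instead of being guessed at.
        try:
            return self._terms[name.lower()]
        except KeyError:
            raise KeyError(f"'{name}' has no agreed meaning yet") from None


# Illustrative entry only; the wording of the definition is invented.
vocab = Vocabulary()
vocab.define(
    Term(
        name="Data Health",
        definition="Agreed fitness-for-purpose checks on customer records.",
        owner="Head of Customer Operations",
    )
)

print(vocab.lookup("data health").owner)
```

The design choice worth noting is that `lookup` fails on an unknown term: it forces the conversation about meaning to happen before the term is used.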
To bring this debate to a close, I believe that this is where this all lands: The classic Data (physical), Information (abstract), Knowledge (intangible asset) value pyramid.
The reality is that when dealing with data engineering topics we do (and have for 40+ years), and always will, refer to granular data as “Data”. I don’t see that language changing anytime soon among Data Engineers and Software (Technical) Professionals.
The word “Information” is a higher level of abstraction which can blur lines between raw data, data combinations, data formats, data objects and data delivery methods, and can carry rich business meaning, such as “Database”, “Spreadsheet”, “Brochure”, “Contract”, “Product Disclosure”, “Song”, “Movie Credits”, “Colour Range”, “Product Catalog”, “Product List”, “Customer”, “Customer List”, “Patient History”, etc.