In the last 5 years we’ve been doing much more Text Analytics (TA) than ever before (and we would say that Text Analytics and Text Mining are one and the same).
In a research context, practitioners often ask us how the methodology relates to tried and tested verbatim coding. The first answer is that TA is automated rather than manual. There is, though, an initial setup stage which is only partly automated and which requires the same coding skills, to make sure that the automated coding (often called “categorising” in TA speak) is curated and accurate. And if it is an ongoing project, someone needs to keep an eye on it to review ongoing accuracy and to look out for the emergence of new themes.
Like verbatim coding, the object of the exercise is to turn unstructured data (i.e. open-ended text) into structured data (i.e. numbers), e.g. how many respondents mentioned my brand and, as per the next point, how many did so in a positive or negative sense. These are numbers that we can analyse in the same way as, and often in combination with, the structured data we collect. As an example, we’ve been working with a car company to link the themes and sentiment coming through their ‘Voice of the Customer’ data to future revenue.
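To make the "text in, numbers out" idea concrete, here is a minimal Python sketch of keyword-based categorising. It is not how any particular TA tool works: the code frame, the keywords and the sample responses are all made up for illustration, and real tools rely on far richer linguistic resources.

```python
import re
from collections import Counter

# Hypothetical code frame: theme name -> keywords that signal it.
CODE_FRAME = {
    "staff": ["staff", "agent", "advisor"],
    "price": ["price", "cost", "expensive"],
    "delivery": ["delivery", "delivered", "shipping"],
}


def categorise(verbatim, code_frame=CODE_FRAME):
    """Return the set of themes whose keywords appear in one verbatim."""
    words = set(re.findall(r"[a-z']+", verbatim.lower()))
    return {
        theme
        for theme, keywords in code_frame.items()
        if words.intersection(keywords)
    }


def theme_counts(verbatims, code_frame=CODE_FRAME):
    """Count how many respondents mentioned each theme (once per respondent)."""
    counts = Counter()
    for verbatim in verbatims:
        counts.update(categorise(verbatim, code_frame))
    return counts


# Invented open-ended responses, one per respondent.
responses = [
    "The staff were helpful but the price was too high",
    "Delivery took three weeks",
    "Too expensive, and the delivery was late",
]

print(theme_counts(responses))
```

The resulting counts can sit alongside closed-question data in the same data file, which is what makes the combined analysis possible.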
TA tools also specialise in identifying sentiment. Usually this is simply positive/neutral/negative sentiment, but some tools offer more granularity/valence.
Linking text and sentiment
We usually look to associate/link the sentiment with another concept/term, e.g.
“The staff in the contact centre were so polite”
could/should translate to a positive data point for the contact centre staff.
This is often called Text Link Analysis (TLA).
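As a rough illustration of the linking idea, the Python sketch below attaches a sentiment label to any concept mentioned in the same sentence. The tiny concept and sentiment lexicons are hypothetical, and same-sentence co-occurrence is a simplifying assumption; commercial tools use proper linguistic parsing rather than this shortcut.

```python
import re

# Hypothetical mini-lexicons, for illustration only.
CONCEPTS = {
    "contact centre staff": ["staff"],
    "website": ["website", "site"],
}
SENTIMENT = {"polite": 1, "helpful": 1, "rude": -1, "slow": -1, "confusing": -1}


def link_sentiment(text):
    """Return (concept, label) pairs for sentences containing both."""
    links = []
    for sentence in re.split(r"[.!?]+", text.lower()):
        words = re.findall(r"[a-z']+", sentence)
        # Net sentiment of the sentence from the lexicon scores.
        score = sum(SENTIMENT.get(word, 0) for word in words)
        if score == 0:
            continue  # no sentiment (or it cancels out), so nothing to link
        label = "positive" if score > 0 else "negative"
        for concept, triggers in CONCEPTS.items():
            if any(trigger in words for trigger in triggers):
                links.append((concept, label))
    return links


print(link_sentiment("The staff in the contact centre were so polite"))
```

Run on the example sentence, this yields a single positive data point for the contact centre staff. A sentence that mixes praise and criticism would defeat this sketch, which is one reason real-world accuracy falls short of 100%.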
It isn’t perfect … but it can be refined/improved
When it comes to sentiment in particular, no software we are aware of gets it right all the time. You will often hear that 80% accuracy is good for a given piece of data, and that anything close to 90% is exceptional.
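One simple way to put a number on accuracy, assuming you have a sample of verbatims that a person has coded by hand, is to count how often the tool agrees with the human coder. The Python sketch below does just that; the labels are invented and the function name is our own, not part of any TA product.

```python
def sentiment_accuracy(human_labels, tool_labels):
    """Share of verbatims where the tool's label matches the human coder's."""
    if len(human_labels) != len(tool_labels):
        raise ValueError("Both lists must cover the same verbatims")
    if not human_labels:
        raise ValueError("Need at least one coded verbatim")
    matches = sum(h == t for h, t in zip(human_labels, tool_labels))
    return matches / len(human_labels)


# Invented labels for five verbatims, in the same order in both lists.
human_coded = ["positive", "negative", "neutral", "positive", "negative"]
tool_coded = ["positive", "negative", "positive", "positive", "neutral"]

print(f"Accuracy: {sentiment_accuracy(human_coded, tool_coded):.0%}")
```

Re-running a check like this after each refinement of the setup shows whether the changes are actually helping.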
There are a lot of choices when it comes to software, in part because it is relatively new and in part because it is an area that touches on some of the hottest topics in tech including AI (Artificial Intelligence).
Most of our experience to date has been with what might be described as “traditional” TA tools.
Broadly speaking the traditional tools use one or both of:
A. Natural Language Processing (NLP)
B. Built-in Resources that look for recognised terms and patterns in the data
Just some of the many options worth considering are:
From the tried-and-tested/traditional camp:
The Text Analytics modules from (the usual suspects) IBM/SPSS and SAS
Lexalytics is another well-established offering but, unlike the big analytics vendors, their focus is TA and they offer a SaaS platform
Kapiche is an interesting twist on the theme. It offers cloud-based self service text analytics for the business/research user (as opposed to the “data scientist” as we say these days).
In the cutting edge/AI camp:
A number of MR industry veterans, including Pat Molloy, are behind Digital Taxonomy. With an MR focus we feel it ought to be in the consideration set for anyone in the industry embarking on a TA journey.
Beyond that there are a plethora of – usually cloud-based – APIs/Services that offer
Some are from the big players like IBM and Microsoft e.g.
IBM/Watson offers services like the Tone Analyzer:
And Microsoft have a service that uses Deep Learning algorithms to detect emotion
In our next newsletter I’m hoping to be able to report back on our experiences with some of the newer/more cutting edge tools and methods, so stay tuned!