I’m Mary Beth Ainsworth,
the Global Product Marketing Manager for Text Analytics. And joining me today
is Simran Bagga, the Principal Product Manager
for Text Analytics here at SAS. Simran, thanks for
speaking with us today. SIMRAN BAGGA: Glad to be here. MARY BETH AINSWORTH:
Many people may not realize that text is the largest
human-generated data source, and it grows exponentially
on a daily basis. Just think about all of the
emails and text messages, social media postings, chats,
and even online reviews that we create every day. SIMRAN BAGGA: That’s right. Organizations are
generating and storing large amounts of data,
the majority of which is unstructured content. To truly unlock the value,
uncover emerging trends, and spot new
opportunities for action, there’s really only one
solution, text analytics. MARY BETH AINSWORTH:
Today, Simran is going to show us SAS
Visual Text Analytics, which is a comprehensive
text-analytics solution that combines natural-language
processing, machine learning, and linguistic rules. SIMRAN BAGGA: Visual
Text Analytics provides an end-to-end analytics
framework enabling users to prepare data, dynamically
explore and visualize text, as well as
build and deploy a variety of
text-analytics models. Let’s take a look at Visual
Text Analytics in action. In this specific use
case, I’m working with text reviews
from apartment rentals where customers have provided
their feedback, likes and dislikes, and also a rating
score related to the stay. My goal is to quickly uncover
insights and commonly occurring themes, understand
associated sentiment, and then build a
categorization model that classifies future reviews. In the Data Preparation window,
I can transform or subset my data. Here I only want to
evaluate the experience of rentals that cost over $200. After preparing my
data, I can choose to build text models
directly, but here I first want to explore the
text before I build the models. I’m using the text
topics exploration. Here I’m asking text
topics to be generated from the comments field, along
with the sentiment information. Topics provide more context
than just terms alone. So I can see without having
to read thousands of documents that customers are talking
about walking distance from restaurants,
feeling like home, or maybe unhappy about automatic
cancellation of reservations. Text topics are
discovered from your data using natural-language
processing, or NLP, and machine learning. NLP output feeds
into machine learning to identify these themes. MARY BETH AINSWORTH: Simran, NLP
is the most foundational aspect of text analytics. Can you highlight
some NLP capabilities and show how users
can modify topics for more-relevant results? SIMRAN BAGGA: Absolutely. NLP essentially helps
the machine understand the concept of a language. And in Visual Text Analytics
we provide native support for 30 languages. SAS’s NLP capabilities
include text parsing and contextual extraction. Entities, noun groups, as well
as relationships and facts relevant to the business
are valuable output from NLP that feeds into topic analysis
and categorization models. I’ve created a text
analytics project where I can enhance
the output from NLP to generate relevant topics
and categorization models. Inside the project, I
can see data attributes and the best-practice pipeline
that ships with Visual Text Analytics. The two nodes
following the data node give the user more
control over NLP output. I can choose to include
predefined concepts. These predefined
concepts are standardized across all languages. I can also add my
own custom concepts that allow me to extract
business-specific information. For the purpose
of this analysis, I want to extract any type
of room amenities or hotel amenities that my customers
have mentioned in their reviews. In the text parsing node,
I can see all the terms recommended to keep and drop. The user has complete
control over what terms they want to keep for analysis. I can select a
term like location and see other terms that are
associated with it in the term map. Here we see that location
is associated with the terms great location, good location,
and also oftentimes it’s referred to with
the word perfect. The term great also
appears very frequently in my document collection. Clicking on show
similarity scores will return a
sorted list of terms that are similar to great. This can be useful in
creation of synonyms or linguistic rules. MARY BETH AINSWORTH:
These automated insights generated by NLP
can significantly reduce the manual effort
that goes into reading and understanding text. The user doesn’t need to have
a preconceived notion of what they’re looking for. SIMRAN BAGGA: Right. Visual Text Analytics
provides capabilities to leverage machine
learning, enable automation, and reduce time to solution. For example, Visual
Text Analytics provides a domain-independent
sentiment model that displays sentiment
scores in the user interface. This can be a
helpful aid for users trying to evaluate topics or
build concept or categorization rules. This allows users to see
relevance of sentiment scores. And users can refine
parameters to make the resulting topic set even
more meaningful and useful. You can select a topic to
see documents and terms that match that topic. You can also merge topics
that are similar in nature. For example, here I see “walk,
few minutes, common station” and “walking distance
to restaurants”. Both of these topics talk about
walking distance to something. As a user, I want to
merge these topics and create one combined topic. You can also select a single
term or a set of terms and then request a new
topic to be created. And you can also select
any topics of interest and promote them to categories. Here I’m selecting two positive
topics, “great apartment stay place” and “walking distance
from restaurants”, and two negative topics, one that
talks about noise and a second that talks about automatic
cancellation of reservations, and promoting these
topics to my categories. This will allow the user to
capture future reviews that represent these topics. Opening the category node
shows me these promoted topics and the rules that
it has automatically generated that I can
further operationalize. I can also rearrange these
topics in my taxonomy as I see fit. As you can see, I’ve
created two broad categories for positive aspects
and negative aspects. And I have captured “great
apartment walking distance” as well as automatic
cancellation and noise problems into this taxonomy. Visual Text Analytics enables
collaboration among users by allowing them to save
their best-practice nodes and pipelines in the toolbox
that can be reused or shared with others. MARY BETH AINSWORTH: Let’s
talk about deployment. Now that these text
models are built, users have a few options
to operationalize them. Why don’t you tell us
about these options? SIMRAN BAGGA: Sure. Model deployment can vary
based on the use case. And Visual Text
Analytics provides the much-needed flexibility
in deployment options. Users may simply want to
visualize the output of a text model, use the text
analytics output as input into other advanced
analytic techniques, or deploy the analysis
into an operational system. Here I have an example
of a report that uses the output from a
categorization model, enabling decision makers
to visually understand key areas of concern. They can also
analyze and visualize relevant factors that lead to
specific outcomes of interest. In this example, I
have a decision tree that was generated using a
combination of structured data and topics derived from
unstructured text, which gives me quick
insights into factors that contribute to review
ratings given by customers. Visual Text Analytics also
enables model deployment in Hadoop, in
batch, and in stream to support edge analytics. The goal is to
reduce time to action and minimize data movement. MARY BETH AINSWORTH:
Simran, thanks so much for sharing SAS Visual
Text Analytics with us today. SIMRAN BAGGA: My pleasure. MARY BETH AINSWORTH:
Here at SAS we say that data without
analytics is potential yet to be realized. If you have text data, the
power of running text analytics and gaining insights on
that data is incredible. Thanks for watching.
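
Note: the workflow shown in the demo (subset the reviews, look at terms, then apply category rules to new reviews) can be roughly illustrated in plain Python. This is only a sketch under stated assumptions, not SAS code and not the Visual Text Analytics API: the sample reviews, the field names `price` and `comment`, and the keyword rules in `CATEGORY_RULES` are all hypothetical stand-ins for the data and the automatically generated rules mentioned in the transcript.

```python
import re
from collections import Counter

# Hypothetical sample data; the demo's actual data set is not shown.
reviews = [
    {"price": 250, "comment": "Great location, walking distance to restaurants."},
    {"price": 320, "comment": "Too much noise at night, could not sleep."},
    {"price": 120, "comment": "Cheap and cheerful."},
    {"price": 210, "comment": "Host cancelled: automatic cancellation of my reservation."},
]

# Hypothetical keyword rules standing in for topics promoted to categories.
CATEGORY_RULES = {
    "Positive/Walking distance": {"walking", "distance"},
    "Negative/Noise": {"noise", "noisy"},
    "Negative/Automatic cancellation": {"cancellation", "cancelled"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def categorize(comment):
    """Return the sorted names of all categories whose keywords appear."""
    tokens = set(tokenize(comment))
    return sorted(name for name, keywords in CATEGORY_RULES.items()
                  if tokens & keywords)


# Data preparation: keep only rentals that cost over $200.
expensive = [r for r in reviews if r["price"] > 200]

# Simple exploration: term frequencies across the remaining comments.
term_counts = Counter(t for r in expensive for t in tokenize(r["comment"]))

# Categorization: apply the rules to each remaining review.
for r in expensive:
    print(categorize(r["comment"]), "<-", r["comment"])
```

Keyword matching is far simpler than the NLP and machine learning described in the demo; it is used here only to make the subset, explore, and categorize steps concrete.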