How to leverage Data Set, Data Flow and NLP Text Analyzer in a PEGA Application?

Sandeep Pamidamarri
5 min read · Sep 27, 2022

In this post, let’s leverage the Data Set, Data Flow and Text Analyzer rules with a real-world example in a PEGA application. An organisation can read data from many different sources. Incoming data streams include social media platforms (Facebook, Twitter, YouTube, etc.), a file system, AWS Kinesis, Apache Kafka, database tables and more.

Each incoming data stream is configured as a Data Set instance. Data flows are data pipelines that read, transform, filter, update, merge or migrate data. The source of a data flow can be a data set or application-specific data retrieved using a report definition. The Text Analyzer NLP model reads the incoming text to identify sentiment and topics and to extract entities.

Take a scenario: an organization would like to understand the sentiment of each customer complaint. Assume that all customer complaints across all platforms are consolidated in a JSON file and placed in a file system.

Step 1: Configure a simple case to capture the customer complaint

I won’t go into the details of how to create a case. It is a simple case that captures the complaint info and the sentiment.

The Escalate stage executes only if the sentiment is negative.
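The stage condition can be pictured as a one-line check. This is only a Python sketch of the idea, not Pega configuration; the function name `should_escalate` and the assumption that the sentiment arrives as a text label such as "Negative" are mine.

```python
def should_escalate(sentiment: str) -> bool:
    """Return True when the complaint should enter the Escalate stage."""
    # Assumption: the sentiment is a label such as "Positive" or "Negative".
    # Normalise case and whitespace so "Negative" and " negative " both match.
    return sentiment.strip().lower() == "negative"
```

In the case itself, the same check would sit on the condition that guards the Escalate stage.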

Sample case user interface to capture two fields.

Step 2: Configure a Data Set that points to the file system

As mentioned, the assumption is that all customer complaints are consolidated into a JSON file and placed in the file system.

Sample JSON

{"customerComplaint": "I am happy with the service"}
{"customerComplaint": "I am not happy with the service"}

In this example, I am using a localhost system, where the default store repository points to the temp folder. I created a folder named CustomerComplaintFiles and placed the JSON file in it.
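If you want to generate a test file instead of typing it by hand, a small script can do it. The sketch below is Python, runs outside Pega, and rests on assumptions: the file name `complaints.json` is hypothetical, and the one-JSON-object-per-line layout is my choice for the sketch, so adjust it to whatever your data set expects.

```python
import json
import tempfile
from pathlib import Path

# The two sample complaints from this post.
complaints = [
    {"customerComplaint": "I am happy with the service"},
    {"customerComplaint": "I am not happy with the service"},
]


def write_complaint_file(base_dir: Path, records: list[dict]) -> Path:
    """Write the records into <base_dir>/CustomerComplaintFiles and return the file path."""
    folder = base_dir / "CustomerComplaintFiles"
    folder.mkdir(parents=True, exist_ok=True)
    target = folder / "complaints.json"  # hypothetical file name
    with target.open("w", encoding="utf-8") as handle:
        for record in records:
            # Assumption: one JSON object per line.
            handle.write(json.dumps(record) + "\n")
    return target


if __name__ == "__main__":
    # Uses the operating system temp folder as a stand-in for the repository root.
    print(write_complaint_file(Path(tempfile.gettempdir()), complaints))
```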

Create a data set as below and select “File System - File” as the source.

Configure the remaining data set fields as below and select “JSON” as the file type.

Step 3: Configure a Data flow

Configure the data flow as below. We will go through all the shapes involved.

a. Source: Select the previously configured data set as the data source.

Note: Because the customer complaint class is the primary class of the data flow, define properties in that class that match the JSON fields read by the data set. These properties give access to the fields in the sample JSON file. PEGA does the mapping automatically.
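To picture the name-based mapping, here is a rough Python analogy. It is not how Pega implements the mapping; the `CustomerComplaint` dataclass and the `map_record` helper are hypothetical, and silently ignoring unmatched fields is a choice made for this sketch only.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class CustomerComplaint:
    # The property name matches the JSON field name, which is what drives the mapping.
    customerComplaint: str = ""


def map_record(raw_json: str) -> CustomerComplaint:
    """Copy JSON fields onto the class properties that share their names."""
    data = json.loads(raw_json)
    known = {f.name for f in fields(CustomerComplaint)}
    # Sketch-only choice: fields without a matching property are dropped.
    return CustomerComplaint(**{k: v for k, v in data.items() if k in known})
```

For example, `map_record('{"customerComplaint": "I am happy with the service"}')` yields an object whose `customerComplaint` property holds the complaint text.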

b. Data Transform: Populates the inputs required by the next shape, the NLP text analyzer.

The highlighted property, pyText, is the input to the NLP text analyzer.
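Conceptually, this shape performs a single "set" step. A minimal Python sketch of that step, assuming the record is a plain dictionary and using a hypothetical helper name `prepare_analyzer_input`:

```python
def prepare_analyzer_input(record: dict) -> dict:
    """Copy the complaint text into pyText, the input read by the text analyzer."""
    prepared = dict(record)  # leave the incoming record untouched
    prepared["pyText"] = record.get("customerComplaint", "")
    return prepared
```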

c. Text Analyzer

In this example, I am using the Pega out-of-the-box (OOTB) text analyzer, pyDefaultInteractionAPITextAnalyzer.

Observe the input and output of the text analyzer rule form.

The outcome of the NLP analysis is stored in the “pyNLPOutcome” page property.
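To make the input and output contract concrete, here is a toy stand-in in Python. It is a keyword counter with simple negation handling, not Pega's NLP model, and the `sentiment` key inside `pyNLPOutcome` is a hypothetical name; the real page structure should be checked on the rule form.

```python
import re

POSITIVE = {"happy", "good", "great", "satisfied"}
NEGATIVE = {"unhappy", "bad", "poor", "terrible"}
NEGATORS = {"not", "never", "no"}


def toy_sentiment(text: str) -> str:
    """Classify text as positive, negative or neutral by counting keywords."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for index, token in enumerate(tokens):
        polarity = (token in POSITIVE) - (token in NEGATIVE)
        # A negator directly before a keyword flips its polarity ("not happy").
        if polarity and index > 0 and tokens[index - 1] in NEGATORS:
            polarity = -polarity
        score += polarity
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def analyze(record: dict) -> dict:
    """Read pyText and store the result under pyNLPOutcome."""
    analyzed = dict(record)
    # "sentiment" is a hypothetical key name used only in this sketch.
    analyzed["pyNLPOutcome"] = {"sentiment": toy_sentiment(record.get("pyText", ""))}
    return analyzed
```

With the two sample complaints, this toy version labels the first one positive and the second one negative, which is the behaviour we expect from the real analyzer in this scenario.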

d. Destination: Create Case

The destination of the data flow creates the customer complaint case.

As mentioned, the case is simple and has two fields:

  1. Complaint info: populated from the JSON file
  2. Sentiment: the outcome of the NLP text analyzer
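Putting the four shapes together, the data flow behaves like a small pipeline: read a record, pass it through each shape, then hand it to the destination. The Python sketch below mirrors that order under stated assumptions: every function name is hypothetical, `stub_analyzer` is a trivial placeholder for the text analyzer, and "creating a case" is reduced to appending a dictionary to a list.

```python
import json
from typing import Callable, Iterable, Iterator

Record = dict
Shape = Callable[[Record], Record]


def read_source(lines: Iterable[str]) -> Iterator[Record]:
    """Source shape: parse one JSON object per non-empty line."""
    for line in lines:
        if line.strip():
            yield json.loads(line)


def set_py_text(record: Record) -> Record:
    """Data transform shape: copy the complaint text into pyText."""
    return {**record, "pyText": record["customerComplaint"]}


def stub_analyzer(record: Record) -> Record:
    """Placeholder for the text analyzer shape; NOT a real sentiment model."""
    words = record["pyText"].lower().split()
    label = "negative" if "not" in words else "positive"
    return {**record, "pyNLPOutcome": {"sentiment": label}}


def run_data_flow(source: Iterable[Record], shapes: list[Shape],
                  destination: Callable[[Record], None]) -> int:
    """Push every source record through the shapes, then to the destination."""
    processed = 0
    for record in source:
        for shape in shapes:
            record = shape(record)
        destination(record)
        processed += 1
    return processed


cases: list[dict] = []  # stands in for the created case instances


def create_case(record: Record) -> None:
    """Destination shape: keep only the two case fields."""
    cases.append({
        "complaintInfo": record["customerComplaint"],
        "sentiment": record["pyNLPOutcome"]["sentiment"],
    })


sample_lines = [
    '{"customerComplaint": "I am happy with the service"}',
    '{"customerComplaint": "I am not happy with the service"}',
]
processed = run_data_flow(read_source(sample_lines),
                          [set_py_text, stub_analyzer], create_case)
```

Keeping each shape as a separate function reflects how the data flow canvas works: a shape can be swapped or reordered without touching the others.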

Step 4: Execute the data flow

We can execute the data flow in three different modes: single case processing, real-time processing and batch processing.

Let’s execute the data flow in batch processing mode.

Note: Select “BackgroundProcessing” as the service instance name for executing the data flow.

Click Start to begin the processing.

See the CustomerComplaint case instances.

Open the case and see the prepopulated case-specific fields.

Woohoo! Congratulations :) You have now successfully learned to leverage the Data Set, Data Flow and Text Analyzer rule forms in a PEGA application.



Sandeep Pamidamarri

Digital Transformation Leader | Pega Lead Solution Architect | Pega Certified Data Scientist | Pega Customer Service | Pega Sales Automation | AWS Cloud