Hospital Call Center Reporting

Hospital call centers have various (real time and historical) reporting needs that are very helpful for call center managers. We recommend Amazon Connect contact center software to our healthcare customers because of its ease of setup, its scaling capabilities, and its ability to integrate with and leverage the huge AWS ecosystem. Let’s see how to get real time call center analytics with AWS.

In this particular post, let’s look at creating real time call center analytics and reports using various AWS capabilities.

We are assuming that you have already started using Amazon Connect Contact Center software.

First, make sure you have turned on data streaming.

Hospital Call Center Real Time Reporting – data streaming

If you do not have an Amazon Kinesis stream set up (we assume you do not), create a new Kinesis data stream.

Let’s start with the contact trace records (CTRs) first. Create a Kinesis stream for them.

Hospital Call Center Real Time Reporting – create Kinesis stream

It’s a good idea to start with one shard and scale up from there. For our call processing needs, we chose 4 shards.
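You can sanity-check a shard count with simple arithmetic. The per-shard ingest limits below (1 MB/s or 1,000 records/s) are standard Kinesis figures, but the call volumes are hypothetical, so treat this as a sizing sketch:

```python
import math

# Rough shard sizing for a CTR stream. Each Kinesis shard ingests up to
# 1 MB/s or 1,000 records/s, whichever limit is hit first.
peak_ctrs_per_sec = 50      # hypothetical peak call-completion rate
avg_ctr_kb = 2              # CTRs are small JSON documents

mb_per_sec = peak_ctrs_per_sec * avg_ctr_kb / 1024
shards_needed = max(
    math.ceil(peak_ctrs_per_sec / 1000),  # records/s limit
    math.ceil(mb_per_sec / 1),            # MB/s limit
)
print(shards_needed)  # 1
```

Even one shard comfortably handles this volume; we over-provisioned to 4 for headroom.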

Now, go back to your Amazon Connect Call Center console and associate this data stream to handle your call center CTR records.

Hospital Call Center Real Time Reporting – use Kinesis stream

Your next step is to create an application to process this data. You can see that there is a “producer” and a “consumer” section in the Amazon Kinesis stream that you just created. Your Amazon Connect hospital call center call records will be the “producers” of data, since you have tied this newly created Kinesis stream to the Amazon Connect hospital call center console.

On this screen, you will be associating Consumers to the Amazon Kinesis stream that you created just now. This “consumer” will consume all the data that your Amazon Connect hospital call center Contact Trace Records (CTRs) generate.

Keep in mind that here, you have the choice to bypass the Amazon Kinesis Data Firehose delivery stream entirely. You can choose to associate an Amazon Kinesis Data Analytics application directly to analyze your hospital call center data instead (there are some pros and cons, discussed in another post).

Let’s say that you did choose to use an Amazon Kinesis Data Analytics application instead of Amazon Kinesis Data Firehose. Here, you will find an option to choose SQL or Apache Flink.

Your IT team may not yet have developed the skills to manage Apache Flink, but even if they do not know Flink, they most certainly know SQL. Go ahead and choose SQL.

Hospital Call Center Real Time Reporting – create Kinesis Data Analytics

We advise clients to ALWAYS use tags, as they allow you to filter data and associate costs back to various departments for chargeback purposes. Even if you are not asked to help with cost allocation at the time you create this setup, it is always better to take care of tags now rather than later.

Hospital Call Center Real Time Reporting – create Kinesis Data Analytics tags

On the next screen, all you have to do is associate the data stream you created with this analytics application.

Hospital Call Center Real Time Reporting – connect to Kinesis Data Analytics

Should you want to do any preprocessing with AWS Lambda, this is where you would choose the Lambda function. In this post, however, we assume that you have not created any Lambda functions to preprocess the call center CTR records.

One GREAT thing about using Amazon Kinesis Data Analytics for your hospital call center is that it allows you to do SCHEMA DISCOVERY. This in itself is a big time saver – do not underestimate that.

Hospital Call Center Real Time Reporting – Kinesis Data Analytics schema discovery

However, for Amazon Kinesis Data Analytics to discover the schema, you need to wait a bit while your call center agents make and receive calls. This generates enough CTRs for Kinesis Data Analytics to do a proper schema discovery.

Discovery starts from the current moment (NOW) and will first show you a single-column schema.

Hospital Call Center Real Time Reporting – Kinesis Data Analytics edit schema

If you really want to, you can go ahead with a one-column schema; however, it is going to make data discovery painful later on. We suggest that you click the EDIT SCHEMA button and create a schema of your own.

For this, first look at the raw data and choose JSON as the format.

Hospital Call Center Real Time Reporting - Kinesis Data Analytics edit schema as JSON

Next, copy the output and head over to a JSON linter tool (e.g. jsonlint.com) to clean and format it a bit. Now you have enough data to create your own columns. Here is a sample CTR record:

{
     "AWSAccountId": "90290668xxxx",
     "AWSContactTraceRecordFormatVersion": "2017-03-10",
     "Agent": {
         "ARN": "arn:aws:connect:us-east-1:90290668xxxx:instance/xxxea4c7-514d-43d1-bdb9-173e51b1e729/agent/xxx8523a-7207-45e5-a76f-935694a2faf4",
         "AfterContactWorkDuration": 0,
         "AfterContactWorkEndTimestamp": null,
         "AfterContactWorkStartTimestamp": null,
         "AgentInteractionDuration": 0,
         "ConnectedToAgentTimestamp": null,
         "CustomerHoldDuration": 0,
         "HierarchyGroups": null,
         "LongestHoldDuration": 0,
         "NumberOfHolds": 0,
         "RoutingProfile": {
             "ARN": "arn:aws:connect:us-east-1:90290668xxxx:instance/xxxea4c7-514d-43d1-bdb9-173e51b1e729/routing-profile/xxxe2ae3-6507-4e53-b5b3-c083ccbadd2a",
             "Name": "xxx Calling Routing Profile"
         },
         "Username": "nh-tanvi"
     },
     "AgentConnectionAttempts": 0,
     "Attributes": {},
     "Channel": "VOICE",
     "ConnectedToSystemTimestamp": null,
     "ContactDetails": {},
     "ContactId": "xxx52ed7-585d-4434-9405-e3ddf07b9528",
     "CustomerEndpoint": {
         "Address": "+1xxx5432200",
         "Type": "TELEPHONE_NUMBER"
     },
     "DisconnectReason": "CUSTOMER_DISCONNECT",
     "DisconnectTimestamp": "2021-02-16T19:02:27Z",
     "InitialContactId": null,
     "InitiationMethod": "OUTBOUND",
     "InitiationTimestamp": "2021-02-16T19:02:15Z",
     "InstanceARN": "arn:aws:connect:us-east-1:90290668xxxx:instance/xxxea4c7-514d-43d1-bdb9-173e51b1e729",
     "LastUpdateTimestamp": "2021-02-16T19:03:34Z",
     "MediaStreams": [{
         "Type": "AUDIO"
     }],
     "NextContactId": null,
     "PreviousContactId": null,
     "Queue": {
         "ARN": "arn:aws:connect:us-east-1:90290668xxxx:instance/xxxea4c7-514d-43d1-bdb9-173e51b1e729/queue/xxx57bef-0a79-4e33-ac23-80aa3eb02e9e",
         "DequeueTimestamp": null,
         "Duration": 0,
         "EnqueueTimestamp": null,
         "Name": "NISOS Calling Q"
     },
     "Recording": null,
     "Recordings": null,
     "References": [],
     "SystemEndpoint": {
         "Address": "+1xxx9002523",
         "Type": "TELEPHONE_NUMBER"
     },
     "TransferCompletedTimestamp": null,
     "TransferredToEndpoint": null
 }
Hospital Call Center Real Time Reporting – Kinesis Data Analytics verify schema results

Once you have added all the columns flattened from the JSON output of your Amazon Connect CTR records, you can start using this as your “real time” query engine. If there are no errors, you will see the following results.
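For intuition, the “flattening” of a nested CTR into columns works along the lines of this hypothetical Python helper (the console does this for you when you define the schema; the helper is only an illustration):

```python
import json

def flatten(record, parent=""):
    """Flatten nested CTR JSON into dotted column names."""
    flat = {}
    for key, value in record.items():
        name = f"{parent}.{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat

# A trimmed-down CTR record like the sample above
ctr = json.loads('{"Agent": {"Username": "nh-tanvi", "NumberOfHolds": 0}, "Channel": "VOICE"}')
print(flatten(ctr))
# {'Agent.Username': 'nh-tanvi', 'Agent.NumberOfHolds': 0, 'Channel': 'VOICE'}
```

Each dotted name becomes one column in the schema editor.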

Hospital Call Center Real Time Reporting – Kinesis Data Analytics verify schema results

And once your healthcare call center agents start making and receiving calls, you will see the data as well. Of course, your data might contain identifiers that map to reference data, or phone numbers mapped to contact records in your medical CRM. You can very easily connect your reference data as a CSV in S3 and then query it in real time.

Hospital Call Center Real Time Reporting – Kinesis Data Analytics – connect reference data

Here, click on the Connect reference data button. Before you can connect to a reference CSV in S3, you need to upload your file to Amazon S3. Once you upload the file, you can finish this section.

Hospital Call Center Real Time Reporting – Kinesis Data Analytics – connect reference data

Even here, you can discover the schema as before.

Hospital Call Center Real Time Reporting – Kinesis Data Analytics – connect reference data

Once it finishes discovering the schema (in our case, from the NPI database), you will see the outcome below, and you can also edit the schema.

Hospital Call Center Real Time Reporting – Kinesis Data Analytics – verify reference data

We recommend that you edit the schema and give the columns names that make sense for your data. These are the names you will refer to when running SQL queries.

Hospital Call Center Real Time Reporting – Kinesis Data Analytics – edit reference data schema

Be careful about having commas (or the delimiter of your choice) in columns of the CSV you have uploaded to S3.

Hospital Call Center Real Time Reporting – Kinesis Data Analytics – edit reference data schema

It’s always a good idea to double-check your CSV data by running Query with S3 Select on the file first, verifying the data shown both formatted and raw, and then proceeding to use this file as your Amazon Kinesis Data Analytics reference data.

Hospital Call Center Real Time Reporting – Kinesis Data Analytics – verify with S3 Select

Now that you have set up reference data and real time data capture in your hospital call center analytics application using Amazon Kinesis Data Analytics, you can proceed with querying this data.

Hospital Call Center Real Time Reporting – Kinesis Data Analytics – add reference data

From time to time, you might see the following message, but it should not concern you. It only means that no CTRs are available at that minute – that’s all.

Hospital Call Center Real Time Reporting – Kinesis Data Analytics – add reference data

For our example here, we created a continuous filter like below.

Hospital Call Center Real Time Reporting – Kinesis Data Analytics – add continuous filter

It does the following:

  1. Creates a stream (CTR_DESTINATION_SQL_STREAM) that holds one of the columns we created before, called DATA. This DATA column holds the entire CTR record for us. Ideally, you would select specific columns, but for brevity’s sake, we chose DATA. CTR_DESTINATION_SQL_STREAM is where your CTR record output is recorded.
  2. Creates a pump (CTR_STREAM_PUMP) that selects the DATA column from the source stream you created before – namely, SOURCE_SQL_STREAM_001 – and inserts it into CTR_DESTINATION_SQL_STREAM.
  3. Allows you to continuously query the CTR_DESTINATION_SQL_STREAM table with a normal SQL query to get the real time analytics data you are seeking.
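The steps above can be sketched in Kinesis Data Analytics SQL roughly like this (a sketch using the stream, pump, and column names from this post; your source stream name and column width may differ):

```sql
-- Destination stream holding the raw CTR record
CREATE OR REPLACE STREAM "CTR_DESTINATION_SQL_STREAM" ("DATA" VARCHAR(32000));

-- Pump that continuously copies DATA from the source stream
CREATE OR REPLACE PUMP "CTR_STREAM_PUMP" AS
  INSERT INTO "CTR_DESTINATION_SQL_STREAM"
  SELECT STREAM "DATA"
  FROM "SOURCE_SQL_STREAM_001";
```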

Wait a few seconds and, if your agents are making calls, data will start showing up like below.

Hospital Call Center Real Time Reporting – Kinesis Data Analytics – verify continuous filter results

New results are added every 2-10 seconds. 

At this point, you can also connect an in-application stream to a Kinesis stream, or to a Firehose delivery stream, to continuously deliver SQL results to AWS destinations. The destination could be a Lambda function, another Firehose delivery stream, a Kinesis stream, etc. This way, you can write applications that read from any of these destinations to show you the desired charts.

Do keep in mind that the limit is three destinations for each application.

Let’s say that you want to store the outputs of your continuous queries in Amazon S3 as CSV files. 

For this, you would first create a bucket/destination to hold your CSV files.

Next, go ahead and create a new Kinesis Firehose stream.

Hospital Call Center Real Time Reporting – Kinesis Data Analytics – connect to firehose delivery stream

And for the source, use Direct PUT.

Hospital Call Center Real Time Reporting – Kinesis Data Analytics – connect to firehose delivery stream

For the next step, you can choose to transform the record, but we are not discussing that in this post. Instead, disable transformation and proceed.

In the next step, choose the folder you created in Amazon S3.

We always recommend compressing the files you put in S3. In this case, you can choose Snappy (a pretty good choice). You can also choose to skip compression on this page.

Wait a bit and the Amazon Kinesis Data Firehose stream will be created and ready for you.

Hospital Call Center Real Time Reporting - Kinesis Data Analytics - connect to firehose delivery stream

Next, go back to Amazon Kinesis, connect to the destination you just created (the Amazon Kinesis Data Firehose delivery stream), choose the in-application stream you created in the steps before, and proceed.

Hospital Call Center Real Time Reporting – Kinesis Data Analytics – connect to firehose delivery stream

You are now ready to go. As the continuous filter runs and produces results (i.e. your call center agents are working), these results will show up in the Amazon S3 bucket you created before.

Once you have your results in the Amazon S3 bucket, you can query it in your web application using Amazon Athena as well (super cheap to use).

Now, let’s achieve something similar with Amazon QuickSight – a powerful application for real time business intelligence. What’s more, it charges you for usage ONLY.

In this case, let’s go back to the Amazon Kinesis Data Stream you had created before. You can also create a brand new data stream if you want to experiment with these two methods. 

For our purposes, let’s create a brand new Amazon Kinesis data stream. 

Once you create a Kinesis data stream, you will see both producers and consumers in your data stream like below.

Hospital Call Center Real Time Reporting – Kinesis producers and consumers

Next, let’s create a consumer that’s an Amazon Kinesis Data Firehose (the button that says “Process with delivery stream”).

Hospital Call Center Real Time Reporting – Kinesis Data Firehose

So, effectively, you are taking the stream of data that your Amazon Connect contact center is producing (note that Amazon Connect produces multiple logs – agent logs, contact flow logs, contact trace record logs, etc.) and storing it for further analysis. Your analysis could be real time or historical – each has to be treated differently.

What you want to do is flatten those JSON files that the Amazon Connect contact center CTR records produce.

Again, in this case, you can choose not to “Transform source records with AWS Lambda”. In that case, you are sending the raw data stream logs to Amazon S3.

If you choose not to transform source records with AWS Lambda at the Firehose level itself, you can flatten those files once they land in the S3 bucket. You will have to do this with an AWS Lambda function that is triggered when a file arrives in the Amazon S3 bucket.

For the moment, let’s send these CTR log files to Amazon S3.

In your Amazon Kinesis Firehose delivery stream, make sure that you select the right Kinesis data stream for contact trace records.

Hospital Call Center Real Time Reporting – Kinesis contact trace records

That’s the same one you are going to use in your Amazon Kinesis Firehose delivery stream.

Hospital Call Center Real Time Reporting – Kinesis contact trace records

This will log to a bucket of your choice. You can choose prefixes – e.g. fh_logs/ and fh_error_logs/ for your logs and error logs. The slash is needed if those are going to be directories. If you want to simply prepend a prefix you can choose fh_logs_ and fh_error_logs_ OR you can leave this entirely blank.

Place some test calls and answer your own calls to test this out.

Confirm that you do receive log files in the bucket you created. E.g.

Hospital Call Center Real Time Reporting - kinesis to Amazon S3

You can quickly query any of those files with Amazon S3 Select to see the contents of the file.

Hospital Call Center Real Time Reporting – query with S3 Select

Then choose JSON as both the input and output format.

Hospital Call Center Real Time Reporting – query with S3 Select

Note that this is not a flat format; rather, it is nested JSON structured data.

Hospital Call Center Real Time Reporting – query with S3 Select

So, your next step is to flatten these files. To do this, you will create a Lambda function and associate it with a trigger: when a file is added to your S3 bucket via a PUT operation, the function flattens the file and writes it back to the same Amazon S3 bucket.

You can use any language you are comfortable with. 

Here’s a great blogpost with the complete code.

Hospital Call Center Real Time Reporting – flatten the files

All this does is take the nested directory structure that Amazon Kinesis puts the CTR records in within Amazon S3 and flatten it into the flatfiles directory. That’s about it.
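The core of that Lambda is just a key-mapping step. Here is a minimal Python sketch of the idea (the flatfiles/ prefix is the one used in this post; a real handler would additionally call boto3’s copy_object with the mapped key):

```python
def flatten_key(key: str) -> str:
    """Map a nested Firehose-style key like '2021/02/16/19/ctr-records-1.gz'
    to a flat key under the flatfiles/ directory."""
    return "flatfiles/" + key.replace("/", "-")  # collapse the date folders

print(flatten_key("2021/02/16/19/ctr-records-1.gz"))
# flatfiles/2021-02-16-19-ctr-records-1.gz
```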

This makes it much easier for AWS Glue to crawl the flatfiles directory and make any schema discoveries and schema updates if and when needed.

Whoa, wait – AWS Glue? For what?

Simple – we are going to use AWS Glue to discover a schema from the giant JSON that the Amazon Connect call center CTR records are in. Schema changes can and do occur – that’s why this is not a one-time thing. Keep in mind that Amazon Connect CTR attributes allow you to create multiple custom fields based on whatever call flow you have designed. So, when this happens, you need to capture this information and flatten it out.

These CTR attributes are just key-value pairs, so instead of traversing a giant list of key-value pairs, you can use a Glue crawler to discover the updated schema and update your database table with the new schema as well.

That’s why we use AWS Glue here.

This doesn’t mean that you run the AWS Glue crawler all the time. You can trigger the crawler via events: based on CTR attribute change events, you trigger the AWS Glue crawler, which discovers and updates the schema and database tables. That’s it.

So, back to the Lambda function you created just now. Next step is to make sure that it triggers on appropriate events – i.e. when a file lands in the CTR bucket you created in S3.

Hospital Call Center Real Time Reporting – trigger for the lambda

Once you have done this, make some calls to and from your number. You will see CTR records being created in the S3 bucket you created above. You will also see an additional folder with files inside it.

Hospital Call Center Real Time Reporting – validate the S3 files

Your next step is to create an AWS Glue Crawler. A crawler connects to a data store, progresses through a prioritized list of classifiers to determine the schema for your data, and then creates metadata tables in your data catalog. 

Go ahead and create an AWS Glue crawler.

Hospital Call Center Real Time Reporting – AWS Glue crawler set up

See above for the configuration that you would need to create.

Keep in mind that your Glue crawler really does not need to crawl ALL the files for eternity. It just needs to crawl the latest files to find schema changes, so make sure you set a lifecycle policy on your flatfiles folder to something short – e.g. 1 or 2 days (there are LOTs of files, albeit small ones).
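A lifecycle configuration along these lines handles that expiry (a sketch; the rule ID is made up, and the prefix matches the flatfiles folder used in this post):

```json
{
  "Rules": [
    {
      "ID": "expire-flatfiles",
      "Filter": { "Prefix": "flatfiles/" },
      "Status": "Enabled",
      "Expiration": { "Days": 2 }
    }
  ]
}
```

You can apply this from the S3 console’s Management tab or via the put-bucket-lifecycle-configuration API.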

Run the crawler after creating it. If all goes well, the table will show up in AWS Glue (and you can view the data / contents using AWS Athena). 

If there are any errors, check CloudWatch for details.

Hospital Call Center Real Time Reporting – AWS Glue crawler set up

If you do not like the schema that AWS Glue has inferred for you, you can edit it as well.

Hospital Call Center Real Time Reporting – AWS Glue crawler edit schema

Next, go to Amazon Athena and find this database and the table.

In all probability, it will look something like this.

Hospital Call Center Real Time Reporting – AWS Glue crawler edit schema

Note that this table still has nested JSONs as various columns.

You could store the files in this format as well, but we recommend transforming these files from the JSON format they are currently in into Parquet format.

If you convert the files to Parquet format, storage is much better optimized AND, on top of that, Athena queries become far easier to write (much like regular SQL, rather than the UNNEST commands typical of PrestoDB).

One HUGE benefit of using Amazon Athena is that you only pay for the queries you run 🙂 We LOVE it.

Keep in mind that you want to store call logs for an extended period of time for historical analysis and reporting. Meanwhile, you want the latest data (maybe the last 15-30 minutes) for “real time” reporting as well. Read more on call center reporting here.

After you have data in the flatfiles folder, you can use AWS Glue to catalog the data and transform it into Parquet format inside a separate folder. In this case, the AWS Glue job performs the ETL that transforms the data from JSON to Parquet format. However, do keep in mind that these jobs cannot be scheduled to run more often than every 5 minutes. That is a limitation you need to be aware of and work around.

Considering that this is not a real time stock trading application, you can get away with processing data every 5 mins.

Great, so your next step would be to (again) use another AWS Glue crawler to crawl through this parquet directory and create a table for you.

You have already created a crawler before, so this shouldn’t be a problem for you.

Go ahead and create a Glue Job to transform your JSON files in the flattened directory to parquet format.

Hospital Call Center Real Time Reporting – AWS Glue Job create

We would advise you to use AWS Glue Studio to visually create and monitor jobs.

Hospital Call Center Real Time Reporting – AWS Glue Job create

And use these parameters to drive the job.

Hospital Call Center Real Time Reporting – AWS Glue Job create

Click on Advanced Properties dropdown and fill in the details. We tend to collocate our work logically in a single bucket, but you can choose to do this however it works for you.

You will notice that the AWS Glue job is ready to be run. You can click on the Script tab to fill in the details. 

Hospital Call Center Real Time Reporting – AWS Glue Job create

The entire script is available on the AWS Blog.

Keep in mind that this script does a few things:

  1. It waits for the first AWS Glue crawler to finish running.
  2. It then uses relationalize to flatten the nested JSON attributes of the data.
  3. It removes fields with null values.
  4. It deletes repeated records using distinct.
  5. It also turns column names to lowercase – this follows Athena best practices (inherited from Hive).
  6. Finally, the resulting data is sent to the target.

Go ahead and run this first crawler, then run the job to try it out. If there are errors, you will find them in the AWS CloudWatch error logs (see link).

If there are no errors, you will find the ctr files now available in the parquet/ctr bucket you had created.

Hospital Call Center Real Time Reporting – AWS Glue Job validate parquet outputs
Hospital Call Center Real Time Reporting – AWS Glue Job validate parquet table

Next up, you can create a trigger, schedule it to run every 5-10 minutes, and have it kick off the AWS Glue crawler.

Hospital call center reporting – edit AWS Glue trigger

This trigger is going to kick off the parquet transformation job you had created.

Hospital call center reporting – AWS Glue trigger your crawler

Keep in mind that the script in that transformation job starts the source crawler at this line: glue.start_crawler(Name=sourcecrawler). The sourcecrawler value is what you fed the script; in our case, it was “nengage-ccp-crawler”.

Schedule it every 5-10 minutes, for example: 0/10 13-22 ? * MON-FRI *

Keep in mind that the cron schedule is in UTC (hence the 13-22).
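To sanity-check the UTC window, convert it to your local zone. For example, assuming US Eastern in winter (UTC-5; shift by one hour during DST):

```python
from datetime import datetime, timezone, timedelta

eastern = timezone(timedelta(hours=-5))  # EST (fixed offset for illustration)
start_utc = datetime(2021, 2, 16, 13, 0, tzinfo=timezone.utc)
end_utc = datetime(2021, 2, 16, 22, 0, tzinfo=timezone.utc)
print(start_utc.astimezone(eastern).strftime("%H:%M"),
      end_utc.astimezone(eastern).strftime("%H:%M"))
# 08:00 17:00 – so 13-22 UTC roughly covers the business day
```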

Verify that your crawler is running.

Hospital call center reporting – AWS Glue trigger your crawler

This crawler, as you recall, crawls the flatfiles folder in your S3 bucket to discover changes to schema (if any).

Typically, you would see this output if you haven’t made any changes.

Hospital call center reporting – AWS Glue trigger your crawler

After running this crawler, the transform job next updates the parquet files in your destination path – /parquet/ctr within the S3 bucket you had created.

Hospital call center reporting – AWS Glue trigger your crawler

After this, it kicks off the results crawler – which in our case was the “nengage-ccp-parquet-crawler”. This crawler, if you recall, crawls through the parquet files created by AWS Glue job and infers schema for the parquet formatted CTR data.

So, if there were any updates to the schema, you would see the updates reflected in the parquet table as well.

So now, you have:

  1. Captured the CTR logs of your Amazon Connect contact center instance via an Amazon Kinesis data stream (real time).
  2. Created an Amazon Kinesis Data Firehose delivery stream to take the Kinesis stream of CTR records and deliver the files to an Amazon S3 bucket of your choice.
  3. Created a Lambda function that takes the files from the deep folder structure in the Amazon S3 bucket and moves them to a flat directory called flatfiles.
  4. Created an AWS Glue crawler to take those files, infer the schema, and create a table for you. This table is accessible to you in Amazon Athena. In our case, that table was called ccp_flatfiles.
  5. Created an AWS Glue job that catalogs the data and transforms it into Parquet format. After cataloging, AWS Glue places this data inside a separate folder in Amazon S3 – e.g. parquet/ctr.
  6. Created another Glue crawler to take the Parquet files, infer the schema, and create a table for you. This table is also accessible to you in Amazon Athena.