Project Overview
A client receives 100M+ transactions a year from external users of their flagship product.
The client's existing reporting processes were ad hoc, complicated, and time-consuming, owing to the well-defined but highly complex schema their transactions use and the wide variety of secondary data sources required to derive business meaning from them.
The client's business analysts wanted a flexible, easy-to-use reporting tool that could provide predefined and ad-hoc queries and visualizations over large and growing data sets. Their IT department wanted a scalable, secure, cloud-hosted solution that allowed long-term storage.
We developed an end-to-end big data solution hosted entirely on the Amazon Web Services (AWS) cloud. Data is piped through a collector micro-service, parsed into a data lake made up of numerous S3 buckets, and then automatically copied into a Redshift data warehouse using Lambda. Tableau Server and Tableau Desktop are used to host and present 50+ detailed visual reports to the business analysts.
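As a rough illustration of the Lambda-driven load (not the client's actual code), a handler triggered by an S3 put event can issue a Redshift COPY through the Redshift Data API. All bucket, cluster, table, and role names below are placeholders.

```python
# Hypothetical Lambda handler: when a parsed file lands in the data-lake
# bucket, issue a Redshift COPY so the warehouse stays in sync.
# Cluster, database, table, and IAM role names are illustrative only.
import boto3

redshift_data = boto3.client("redshift-data")

COPY_SQL = """
    COPY analytics.transactions
    FROM 's3://{bucket}/{key}'
    IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy-role'
    FORMAT AS CSV GZIP;
"""

def handler(event, context):
    # S3 put event delivered by the bucket's Lambda trigger
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        redshift_data.execute_statement(
            ClusterIdentifier="reporting-cluster",   # placeholder
            Database="reporting",                    # placeholder
            DbUser="loader",                         # placeholder
            Sql=COPY_SQL.format(bucket=bucket, key=key),
        )
```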
Project Goals & Objectives
The business goal was to produce reports on the functionality and health of the client’s flagship product, identify capability or process gaps, and provide longer-term reporting that could assist with help desk and IT operations. The reporting helps identify common patterns in end-user submission practices, procedures, and errors.
The IT goal was to use AWS cloud hosting and services wherever possible, and to scalably store a large and growing volume of transactions for years to come.
Project Results
The solution we provided involved:
- Regular client workshops;
- Frequent prototyping;
- AWS infrastructure setup and hardening;
- Detailed architecture and design work, including collaboration with third-party organizations;
- Data processing and analysis;
- Custom development;
- Deployment;
- Ongoing testing and support.
We produced the end-to-end solution in approximately 12 weeks of elapsed time.
One of our major success factors was the considerable time we spent working closely with the principal business analyst to gain a deep understanding of their business model, goals and objectives, and data schema.
Once we had that shared understanding, we identified their desired reports and processes using mock reports. Instead of working with their raw data formats, we mocked up the "desired data schema" that would facilitate their ongoing reporting, then worked backwards to map their actual raw data to that desired state.
By keeping the myriad complex data and technology issues "under the hood", and by optimizing the solution's performance for reading (reporting) rather than writing (data intake), we ensured that the end-user reporting and analysis experience was never compromised, regardless of the state of the raw data.
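To make that read-optimization concrete, the sketch below shows the kind of denormalized, analyst-facing Redshift table such a design implies, with distribution and sort keys chosen around the most common report filters. The columns and names are hypothetical, not the client's schema.

```python
# Illustrative only: a read-optimized (denormalized) reporting table in
# Redshift. Column, key, cluster, and database names are hypothetical.
import boto3

redshift_data = boto3.client("redshift-data")

CREATE_REPORTING_TABLE = """
    CREATE TABLE IF NOT EXISTS analytics.transactions (
        transaction_id   BIGINT,
        submitted_at     TIMESTAMP,
        submitter_org    VARCHAR(256),
        status           VARCHAR(64),
        error_code       VARCHAR(64)
    )
    DISTKEY (submitter_org)      -- co-locate rows analysts group by
    SORTKEY (submitted_at);      -- most reports filter on date ranges
"""

redshift_data.execute_statement(
    ClusterIdentifier="reporting-cluster",  # placeholder
    Database="reporting",                   # placeholder
    DbUser="admin",                         # placeholder
    Sql=CREATE_REPORTING_TABLE,
)
```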
Another major success factor was our decision to use a variety of best-of-breed server and serverless technologies, including AWS S3, Lambda, Redshift, EC2, Tableau Server, and Tableau Desktop. These ensured the solution was built on well-supported, feature-rich big data technology.
Where required, we developed code and scheduled PowerShell scripts as "glue" to integrate the various components. However, we limited custom coding and used industry standards such as SQL wherever possible, to support future upgrades.
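The production glue was scheduled PowerShell; purely to illustrate the pattern, the Python sketch below shows a typical nightly step that runs standard SQL maintenance after the day's loads. Cluster, database, table, and user names are placeholders.

```python
# Hypothetical nightly "glue" step: reclaim space and refresh planner
# statistics after the day's COPY loads. All names are placeholders.
import boto3

redshift_data = boto3.client("redshift-data")

for statement in ("VACUUM analytics.transactions;",
                  "ANALYZE analytics.transactions;"):
    redshift_data.execute_statement(
        ClusterIdentifier="reporting-cluster",  # placeholder
        Database="reporting",                   # placeholder
        DbUser="maintenance",                   # placeholder
        Sql=statement,
    )
```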
Finally, we acknowledged up front that business queries would change over time (as they have). To accommodate this, we designed the system to handle a wide variety of future data sets and foreseeable reporting needs. In particular, we built our data warehouse using the radical MAD schema approach (see "Going MAD with Tableau and AWS Redshift").