Deciphering The Way Data Is Tested - Automate The Movement, Transformation & Visualization Of Data

Rajni Singh

Senior Manager, Nagarro

What is the quality of data?


Is it good enough to be collected, consumed, and interpreted for business usage?


And how should we use this data?


Many more questions arise when a tester is involved in testing applications with big data, AI, IoT, and analytical solutions.


Ambiguity has always been a key challenge for testers – be it the ambiguous definition of requirements or unstable test environments. But testing a big data workflow adds a completely new level of uncertainty to a tester’s life with modern technologies.


Data validation is simply verifying the correctness of data. The big data testing pipeline consists of horizontal workflows where data transformations occur continuously, managing a series of steps that process and transform the data. The obtained result can be stored in a database for analysis (machine learning models, BI reports) or act as an input to other workflows.
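To make the idea of a validation step concrete, here is a minimal sketch in Python of row-level data validation inside such a pipeline. The record fields (`sensor_id`, `temperature`) and the accepted temperature range are hypothetical placeholders, not details from the session:

```python
# Minimal sketch of a row-level validation step in a data pipeline.
# Field names and the accepted range are illustrative assumptions.
def validate_record(record):
    """Return a list of validation errors; an empty list means the record is valid."""
    errors = []
    if not record.get("sensor_id"):
        errors.append("missing sensor_id")
    temp = record.get("temperature")
    if temp is None:
        errors.append("missing temperature")
    elif not -40.0 <= temp <= 125.0:  # plausible sensor range (assumption)
        errors.append(f"temperature out of range: {temp}")
    return errors

def validate_batch(records):
    """Split a batch into valid records and rejected (record, errors) pairs."""
    valid, rejected = [], []
    for record in records:
        errs = validate_record(record)
        if errs:
            rejected.append((record, errs))
        else:
            valid.append(record)
    return valid, rejected
```

In a streaming engine the same check would run per micro-batch, with rejected records routed to a quarantine store for later analysis.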


This session provides solutions to the challenges faced while testing data for an application (with big data, IoT, a mesh of devices, artificially intelligent algorithms, and data analytics), such as:

  1. Lack of technical expertise and coordination
  2. Heterogeneous data format
  3. Inadequacy of data anomaly identification
  4. Huge data sets and a real-time stream of data
  5. Understanding the data sentiment
  6. Continuous testing and monitoring


The research employed open-source solutions for the implementation. Apache Kafka was used to gather batch data and streaming data (sensor/log data). Apache Spark Streaming consumed the data from Kafka in real time and carried out the validations in the Spark engine. Further along the workflow, the data was stored in Apache Cassandra and then fed through Elasticsearch and Logstash to generate real-time reports and graphs in Kibana. The proposed tool is generic as well as highly configurable, so it can incorporate any open-source tool you require for streaming, processing, or storing the data. The system includes configuration files where every detail of each dependent tool is recorded and can be modified according to your needs.
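The configuration-driven design described above can be sketched as a simple stage-to-settings mapping with override support. All hostnames, ports, topic names, and keyspaces below are hypothetical placeholders, not the tool's actual configuration schema:

```python
# Sketch of a configuration-driven pipeline setup: each dependent tool's
# details live in one mapping and can be overridden without touching
# pipeline code. All connection details are illustrative assumptions.
PIPELINE_CONFIG = {
    "ingest":  {"tool": "kafka", "bootstrap_servers": "localhost:9092", "topic": "sensor-events"},
    "process": {"tool": "spark", "master": "local[*]", "batch_interval_s": 5},
    "store":   {"tool": "cassandra", "hosts": ["localhost"], "keyspace": "data_quality"},
    "report":  {"tool": "elasticsearch", "url": "http://localhost:9200", "index": "dq-metrics"},
}

def stage_settings(stage, overrides=None):
    """Merge user overrides onto the default settings for one pipeline stage."""
    if stage not in PIPELINE_CONFIG:
        raise KeyError(f"unknown pipeline stage: {stage}")
    settings = dict(PIPELINE_CONFIG[stage])
    settings.update(overrides or {})
    return settings
```

Swapping in a different storage or reporting tool then amounts to editing one entry in the mapping, which is what makes the pipeline pluggable.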


This solution aims to analyze various key performance indicators (KPIs) for big data, such as data health checks, downtime, and time-to-market, as well as throughput and response time. The tool can be considered a pluggable solution that can efficiently drive big data testing and uplift data quality for further usage.
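Two of the KPIs named above, throughput and response time, can be computed from per-record processing measurements. The functions below are a minimal illustrative sketch (the nearest-rank percentile method and the default percentile are assumptions, not the tool's actual metric definitions):

```python
import math

# Sketch of two KPI calculations for a data pipeline: throughput and
# response-time percentile. Method choices here are illustrative assumptions.
def throughput(record_count, window_seconds):
    """Records processed per second over a measurement window."""
    return record_count / window_seconds if window_seconds > 0 else 0.0

def response_time_percentile(latencies_ms, pct=95):
    """Return the pct-th percentile latency in ms (nearest-rank method)."""
    if not latencies_ms:
        return None
    ordered = sorted(latencies_ms)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]
```

In a live pipeline these values would be pushed to the reporting store on every micro-batch so that Kibana dashboards can chart them in real time.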


Benefits & Takeaways: Attend this session to understand the basic needs of future application testing:

  1. Understanding of data and importance of data quality
  2. Why automation is an essential strategy for data testing
  3. Vertical continuous flow for data and the horizontal flow of data in the pipeline
  4. Potential solution demo with an implemented use case for real-time experience
  5. Generic code will be shared with attendees for enhancement
  6. KPIs considered for data validation

Don’t miss your chance to get a ticket

  • What you Get

    A ticket to Discover SUMMIT 2021, Sep 23-24, 2021: a two-full-day pass from 11:00 AM to 07:30 PM IST for one person.

  • Bulk Purchase

    If you want to buy tickets in bulk, get in touch with us at


INR 1499/- for Indian delegates
USD 49/- for International delegates



Online Testing Conference

  • Date:

    September 23 - 24, 2021

  • Venue:

    Online Conference

  • Contact:
    +91 98230 64054
