StreamSets Data Collector is unusual in its ability to monitor data pipelines for data quality issues and alert users to them. Cleverly, it’s able to do this either in real time, by sitting in the data pipeline thanks to its in-memory architecture, or in batch, when sitting on Hadoop. With the release of v1.2, it now supports Hadoop distributions from Cloudera, MapR and Hortonworks. It’s also certified with the MapR Converged Data Platform, including extended support for MapR Streams. What the company lacks so far is any proprietary technology to sell over and above its open-source StreamSets Data Collector, something we expect it to address later this year.
Download the full 451 Research report on how StreamSets Data Collector Focuses on Data Quality of Data Ingestion.