At SIFR, we believe that big data and distributed computing are among the key technologies that have powered the AI revolution. Training a good machine learning model requires a lot of data, and distributed computing technologies such as Hadoop and Spark give us the ability to process extremely large datasets quickly.

Big data is a collection of data from traditional and digital sources, both inside and outside your company, that serves as a source for ongoing discovery and analysis.

We work with large datasets daily, which also means designing, building and implementing big data solutions with technologies such as:

  • Computation engines: Spark, Hadoop
  • NoSQL databases: HBase, Cassandra, MongoDB
  • Messaging: Kafka
  • Stream processing: Flink, Spark Streaming