Author: Marcin Bernaś

Big Data nowadays

Big Data is a relatively new area of research that merges fields such as cloud computing, data science and Artificial Intelligence. Its definition was proposed in 2012 [1] as the processing of a large Volume of data of great Variety, taking into consideration its Velocity (the 3V model). The definition has since evolved into 5V [2], which additionally takes into account Veracity (treated as the quality of the captured data) and Value (its usefulness).

Over the last decade a Big Data processing pattern has been established, consisting of the following stages: ingest (data collection), store (managing and storing data, also in real time), process (transforming data), analyse (extracting vital information) and insight (data consumption in the form of information or data for further applications).
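The five stages above can be sketched as a chain of functions. This is only a didactic, in-memory illustration (all function names and the sample sensor data are hypothetical); in a real deployment each stage would be backed by dedicated tooling such as Kafka, HDFS or Spark.

```python
# A minimal sketch of the ingest -> store -> process -> analyse -> insight
# pattern. Everything here is hypothetical and runs in memory; real systems
# replace each stage with distributed tools.

def ingest():
    """Collect raw records (here: hard-coded sensor-like readings)."""
    return ["sensor1,21.5", "sensor2,19.0", "sensor1,22.5"]

def store(raw_records):
    """Persist records in a structured form (here: a list of tuples)."""
    rows = []
    for record in raw_records:
        sensor, value = record.split(",")
        rows.append((sensor, float(value)))
    return rows

def process(rows):
    """Group measurements by sensor."""
    grouped = {}
    for sensor, value in rows:
        grouped.setdefault(sensor, []).append(value)
    return grouped

def analyse(grouped):
    """Extract vital information: the average reading per sensor."""
    return {s: sum(v) / len(v) for s, v in grouped.items()}

def insight(averages):
    """Consume the result, e.g. flag sensors whose average exceeds 20.0."""
    return [s for s, avg in averages.items() if avg > 20.0]

if __name__ == "__main__":
    print(insight(analyse(process(store(ingest())))))  # ['sensor1']
```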

The process usually starts with data collection (data meeting the 5V definition). The data are typically ingested as logs (e.g., Flume), bulk data (e.g., Sqoop), messages (e.g., Kafka) or dataflows (e.g., NiFi). Then the large data sets are processed by a computing engine in batches (e.g., MapReduce) or as streams (e.g., Flink, Spark, Storm). Data (structured or not) are analysed using machine learning methods (e.g., Caffe, TensorFlow, Python), statistical approaches (SparkR, R) and then visualized (e.g., Tableau, GraphX). It is worth keeping in mind that the created solution is constantly changing and should be updated (e.g., with Oozie, Kepler, Apache NiFi). The obtained data can be managed by various solutions, e.g., Apache Falcon, Apache Atlas, Apache Sentry, Apache Hive. Data security is also an important issue (e.g., Apache Metron or Apache Knox), as are new technologies that change the ways and types of data transfer, like InfiniBand or 5G.
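The batch model mentioned above can be illustrated with a word-count example written in plain Python. This is only a single-machine sketch of the map/shuffle/reduce phases; real MapReduce jobs run distributed across a cluster on frameworks such as Hadoop.

```python
from collections import defaultdict

# A minimal, single-machine sketch of the MapReduce batch model:
# the map phase emits (key, value) pairs, the shuffle groups them
# by key, and the reduce phase aggregates each group.

def map_phase(document):
    """Emit (word, 1) for every word in the input document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Group emitted values by key, as the framework would do."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Sum the counts for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}

docs = ["big data is big", "data has velocity"]
pairs = [pair for doc in docs for pair in map_phase(doc)]
counts = reduce_phase(shuffle(pairs))
print(counts["big"], counts["data"])  # 2 2
```

Stream engines such as Flink or Spark apply a similar grouping-and-aggregation logic, but incrementally, over unbounded sequences of records instead of a finite batch.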

Big Data is now over 10 years old and is reaching new heights, thanks to its wide adoption and to the companies delivering new tools. Looking at the summary [3], the number of technologies and solutions is overwhelming.

In our research we look for the competencies required by the international and local markets. Based on our analysis and current trends [4], we identified the classic open-source Hadoop, Spark and Storm solutions as well as technologies that are gaining popularity. Our research focuses on open-source solutions that can be used on dedicated infrastructure or on Big Data cloud services provided by leading platforms like AWS, Microsoft Azure or Google BigQuery.

We also keep in mind that the market is constantly flooded with new mechanisms and pipelines that aim to tackle Big Data in a simpler and more unified way, i.e., solutions that simplify Big Data analysis and make it easier to use. Several of them [3] illustrate the current trends:

• visual analytical tools that let users focus on data analytics using simple calculations or a point-and-click approach, while providing support for big data storage, real-time management and security; examples are Arcadia Enterprise 4.0, AtScale 5.0 or Dataguise DgSecure 6.0.5;
• frameworks that allow applications to be built on top of Big Data using DevOps capabilities and big data transformation support. They make it possible to use well-known languages such as R, Python or SQL; examples are Attunity Compose 3.0, Cazena Data Science Sandbox as a Service or Lucidworks Fusion 3. Some solutions, like the Couchbase suite, are aimed at web, mobile and Internet of Things (IoT) applications;
• solutions that help to provide data as a service for applications. They build on pipelines like Microsoft Azure or the Hadoop ecosystem and turn them into an information platform (Paxata Spring ’17, Pentaho 7.0 or Qubole Data Service).

[1] Wu, X., Zhu, X., Wu, G.-Q. and Ding, W. (2014) Data Mining with Big Data. IEEE Transactions on Knowledge and Data Engineering, 26, 97-107.
[2] Nagorny, K., Lima-Monteiro, P., Barata, J. and Colombo, A.W. (2017) Big Data Analysis in Smart Manufacturing. International Journal of Communications, Network and System Sciences, 10, 31-58.
[3] The Big data technology map:
[4] Cui, Y., Kara, S. and Chan, K.C. (2020) Manufacturing Big Data Ecosystem: A Systematic Literature Review. Robotics and Computer-Integrated Manufacturing, 62, 101861.
[5] Article online:

Big Data specialists needed?

According to three popular Polish job portals [1][2][3], the Polish market is in need of IT specialists. There are a lot of job opportunities for programmers (3988/830/3570 offers), web developers (2625/98/355 offers) and data analysts (668/356/115 offers). Recently, new job opportunities for Big Data specialists have been appearing as well (335/48/22 offers). On average, around 5% of the Polish IT market is waiting for new employees with Big Data skills, and offers requiring particular Big Data skills (languages, data analysis and machine learning) can be found even more often. What is more, according to the Hays report [4], the average salary of a Big Data engineer in Poland is 25% higher than that of a programmer and equals on average 17 thousand PLN. What skills are needed? The report will be available soon on this site as part of this project.



[1] [access 22.03.2021]

[2] [access 22.03.2021]

[3] [access 22.03.2021]

[4] Hays report 2021: