Month: March 2021

Big Data nowadays

Big Data is a relatively new area of research that merges fields such as cloud computing, data science, and Artificial Intelligence. An early definition was proposed in 2012 [1]: a large Volume of data of great Variety, processed with attention to its Velocity (hence "3V"). The definition has since evolved into 5V [2], which additionally takes into account Veracity (the quality of the captured data) and Value (its usefulness).

Over the last decade, an established Big Data processing pattern has emerged, consisting of the following stages: ingest (data collection), store (managing and storing data, also in real time), process (transforming the data), analyse (extracting vital information), and insight (consuming the results as information or as input for further applications).
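
As a rough illustration, the sketch below maps these five stages onto plain Python functions; every name and record in it is a toy placeholder rather than a real connector, and in production each stage is handled by the dedicated tools discussed below.

# A toy, framework-free sketch of the five-stage pattern (all data invented).

def ingest():
    """Collect raw records (in practice: Flume, Sqoop, Kafka, NiFi)."""
    return [{"user": "a", "clicks": 3}, {"user": "b", "clicks": 7}]

def store(records):
    """Persist raw data (in practice: HDFS, HBase, object storage)."""
    return list(records)  # stand-in for a durable store

def process(records):
    """Clean and transform the data (in practice: MapReduce, Spark, Flink)."""
    return [r for r in records if r["clicks"] > 0]

def analyse(records):
    """Extract the vital information (in practice: ML or statistics)."""
    return sum(r["clicks"] for r in records) / len(records)

def insight(result):
    """Consume the result (dashboards, downstream applications)."""
    print(f"average clicks per user: {result}")

insight(analyse(process(store(ingest()))))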

The process usually starts with data collection (data meeting the 5V definition). Data are typically ingested as logs (e.g., Flume), bulk data (e.g., Sqoop), messages (e.g., Kafka), or dataflows (e.g., NiFi). The large data sets are then processed by a computing engine, either in batches (e.g., MapReduce) or as streams (e.g., Flink, Spark, Storm). The data (structured or not) are analysed using machine learning methods (e.g., Caffe, TensorFlow, Python), statistical approaches (SparkR, R), and then visualized (e.g., Tableau, GraphX). It is worth keeping in mind that such a solution is constantly changing, so its workflows should be kept up to date (e.g., Oozie, Kepler, Apache NiFi). The resulting data can be governed by various solutions, e.g., Apache Falcon, Apache Atlas, Apache Sentry, or Apache Hive. Data security is another important issue (e.g., Apache Metron or Apache Knox), as are new technologies that change the ways and types of data, such as InfiniBand or 5G.
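
To make the batch/stream distinction concrete, here is a minimal PySpark sketch using one of the engines named above; the input path, output path, Kafka broker address, and topic name are hypothetical, and the streaming half assumes the Spark Kafka connector package is available.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("ibig-demo").getOrCreate()

# Batch: read already-collected log files in bulk and aggregate them once.
logs = spark.read.json("hdfs:///data/logs/*.json")  # hypothetical input path
(logs.groupBy("host").count()
     .write.mode("overwrite")
     .parquet("hdfs:///data/requests_per_host"))  # hypothetical output path

# Stream: consume the same kind of records continuously from a Kafka topic
# and maintain a running count that updates as new events arrive.
events = (spark.readStream.format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical broker
          .option("subscribe", "events")                     # hypothetical topic
          .load())
(events.groupBy("topic").count()
       .writeStream.outputMode("complete")
       .format("console")
       .start()
       .awaitTermination())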

Big Data is now more than 10 years old and is reaching new heights, thanks to its wide adoption and to the companies delivering new tools. Looking at the landscape summary [3], the number of technologies and solutions is overwhelming.

In our research we look for the competencies required by the international and local markets. Based on our analysis and current trends [4], we identified the classic open-source Hadoop, Spark, and Storm solutions, as well as technologies that are gaining popularity. Our research focuses on open-source solutions that can run on dedicated infrastructure or on Big Data cloud services offered by leading platforms such as AWS, Microsoft Azure, or Google's BigQuery.

We also keep in mind that the market is constantly flooded with new mechanisms and pipelines meant to tackle Big Data in a simpler and more unified way. Vendors tend to create solutions that simplify Big Data analysis and make it easier to use. Several solutions [3] illustrate the current trends:

• visual analytics tools that let users focus on data analysis through simple calculations or a point-and-click approach, while providing support for Big Data storage, real-time management, and security; examples include Arcadia Enterprise 4.0, AtScale 5.0, and Dataguise DgSecure 6.0.5;
• frameworks for building Big Data applications with DevOps capabilities and support for Big Data transformations, which let developers use familiar languages such as R, Python, or SQL; examples include Attunity Compose 3.0, Cazena Data Science Sandbox as a Service, and Lucidworks Fusion 3. Some solutions, such as the Couchbase suite, target web, mobile, and Internet of Things (IoT) applications;
• solutions that help provide data as a service for applications; they build on pipelines such as Microsoft Azure or the Hadoop ecosystem and turn them into an information platform (Paxata Spring ’17, Pentaho 7.0, or Qubole Data Service).

References
[1] Wu, X., Zhu, X., Wu, G.-Q. and Ding, W. (2014) Data Mining with Big Data. IEEE Transactions on Knowledge and Data Engineering, 26, 97-107.
https://doi.org/10.1109/TKDE.2013.109
[2] Nagorny, K., Lima-Monteiro, P., Barata, J. and Colombo, A.W. (2017) Big Data Analysis in Smart Manufacturing: A Review. International Journal of Communications, Network and System Sciences, 10, 31-58.
[3] The Big Data technology map (2020 Data and AI Landscape by Matt Turck): http://mattturck.com/wp-content/uploads/2020/09/2020-Data-and-AI-Landscape-Matt-Turck-at-FirstMark-v1.pdf
[4] Cui, Y., Kara, S. and Chan, K.C. (2020) Manufacturing Big Data Ecosystem: A Systematic Literature Review. Robotics and Computer-Integrated Manufacturing, 62, 101861.
[5] Online article: https://www.readitquik.com/articles/digital-transformation/10-big-data-advances-that-are-changing-the-game/

Big Data specialists needed?

According to three popular Polish job portals [1] [2] [3], the Polish market is in need of IT specialists. There are plenty of job opportunities for programmers (3988/830/3570 offers), web developers (2625/98/355 offers), and data analysts (668/356/115 offers). Recently, new opportunities have been appearing for Big Data specialists (335/48/22 offers). On average, this means about 5% of the Polish IT job market is waiting for new employees with Big Data skills, and the particular Big Data skills (languages, data analysis, and machine learning) appear in offers even more often. What is more, according to the Hays report [4], the average salary for a Big Data engineer in Poland is 25% higher than for programmers and amounts to about 17 thousand PLN. What skills are needed? A report prepared within this project will be available on this site soon.

References

[1] praca.pl [accessed 22.03.2021]

[2] pracuj.pl [accessed 22.03.2021]

[3] jobs.pl [accessed 22.03.2021]

[4] Hays report 2021: hays.com

Research on the expectations and knowledge of Big Data issues

Over the past months, the project team (project no. 2020-1-PL01-KA203-082197 “Innovations for Big Data in a Real World”) has conducted research on the expectations and knowledge of Big Data issues among students and lecturers.

Information was also collected from employers on the competencies required of Big Data specialists. Graduates were monitored as well.

The results of the research are being compiled and their conclusions will be presented soon.

The same research is being carried out in the partner countries, i.e. Bulgaria, Ukraine, and Serbia.

Start!

Welcome to the Erasmus+ Innovations for Big Data in a Real World
(iBIG World) project website!

The Department of Computer Science and Automatics has launched a new Erasmus+ project entitled Innovations for Big Data in a Real World (iBIG World). The project aims to bring together HEIs and businesses in order to address the required competencies and a compatible job profile. This collaboration will provide innovative solutions for developing Big Data experts. The project’s learning framework is based on IEEE guidelines for Big Data in Machine Learning.

The project is carried out by a consortium of four universities: the University of Bielsko-Biała (Poland), the University of Library Studies and Information Technology (Bulgaria), the University of Niš (Serbia), and the Taras Shevchenko National University of Kyiv (Ukraine). The consortium is coordinated by Professor Vasyl Martsenyuk.

Find out more about the project: http://ibigworld.ni.ac.rs/

Contact the project team: erasmusibigdata@ath.edu.pl

iBIGWorld on ICICT’2021

The results of the iBIGWorld project were presented at the 6th International Congress on Information and Communication Technology (ICICT’2021), February 25-26, London.

The work was co-funded by the European Union’s Erasmus+ Programme for Education under a KA2 grant (project no. 2020-1-PL01-KA203-082197 “Innovations for Big Data in a Real World”).
The conference was held via the Zoom digital platform.
The work considers ML problems in medical applications and presents a minimax approach to developing ML models that are resistant to aleatoric and epistemic uncertainties.
The main methods applied are linear regression, SVM, and random forest as learners, PCA for dimension reduction, and cross-validation as the resampling strategy.
The proposed approach is presented with the help of a flowchart that covers the basic steps of ML model development under uncertainty: import and primary processing of the clinical data, statement of the task, the resampling strategy including dimension reduction, the choice of methods (learners), tuning of their parameters, and comparison of the models on the basis of the minimax criterion.
The work is relevant to the iBIGWorld project, since it will be used in the development of tutorials for Big Data Analytics.
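
As a loose illustration of those flowchart steps (not the paper’s actual code or data), a scikit-learn sketch could look as follows; the synthetic data set, the learner settings, and the use of the worst cross-validation fold error as the minimax criterion are assumptions made here for brevity.

from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# 1. Import and primary processing: synthetic stand-in for the clinical data.
X, y = make_regression(n_samples=200, n_features=30, noise=10.0, random_state=0)

# 2. Choice of methods (learners), each behind PCA-based dimension reduction.
learners = {
    "linear regression": LinearRegression(),
    "SVM": SVR(C=1.0),
    "random forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

# 3. Resampling strategy: 5-fold cross-validation; record each learner's
#    worst (largest) per-fold mean squared error.
worst_case = {}
for name, learner in learners.items():
    model = make_pipeline(StandardScaler(), PCA(n_components=10), learner)
    fold_errors = -cross_val_score(model, X, y, cv=5,
                                   scoring="neg_mean_squared_error")
    worst_case[name] = fold_errors.max()

# 4. Minimax comparison: pick the model with the smallest worst-case error.
best = min(worst_case, key=worst_case.get)
print({name: round(err, 1) for name, err in worst_case.items()})
print("minimax choice:", best)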