Detecting Inconsistencies in Survey Data with Apache Spark

In this project, analysis methods are applied to survey data to determine whether the results contain any anomalies. By anomalous data we mean a data point or data set that does not fit the model we built from previously available data. In short, these are values that differ from what is expected. The amount of data in the world is growing rapidly day by day. However, these data …

Analyzing Big Security Logs in Cluster with Apache Spark

Abstract. Cyber security is the major concern in today’s highly networked environment, and logging is the primary way of tracking compliance with security policies. However, analyzing the massive amount of logs has become a “Big Data” problem. Apache Spark is one of the latest and most notable incarnations of Data Flow Models …

Playlist Generation via Vector Representation of Songs

Abstract. This study proposes a song recommender system. The architecture is based on a distributed, scalable big data framework. The recommender system analyzes the songs a person listens to most and recommends a list of songs as a playlist. To realize the system, we use the Word2vec algorithm to create vector representations of songs. Word2vec algorithm is …

Sleep Stage Classification: Scalability Evaluations of Distributed Approaches

Processing and analyzing massive clinical data is resource-intensive and time-consuming with traditional analytic tools. Electroencephalography (EEG) is one of the major technologies for detecting and diagnosing various brain disorders, and it produces huge volumes of data to process. In this study, we propose a big data framework to diagnose sleep disorders by classifying …