A scalable distributed query framework for unstructured big clinical data: A case study on diabetic records

Abstract: Unstructured data makes up close to 80% of the information in the healthcare industry and is growing exponentially. Analyzing and querying this type of data is inefficient with traditional relational database technologies. In this paper, we propose a distributed and scalable big data framework for querying and analyzing unstructured clinical data. The framework is …

Hadoop ecosystem

Apache Oozie: Workflow Scheduler for Hadoop

Oozie is a workflow scheduler system for managing Apache Hadoop jobs. Oozie Workflow jobs are Directed Acyclic Graphs (DAGs) of actions. Oozie Coordinator jobs are recurrent Oozie Workflow jobs triggered by time (frequency) and data availability. Oozie is integrated with the rest of the Hadoop stack, supporting several types …
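
To make the idea of a workflow as a DAG of actions more concrete, here is a minimal sketch in Python. It does not use Oozie or any Oozie API. It only illustrates how a scheduler can derive a valid run order from action dependencies. The action names and the `execution_order` helper are hypothetical and chosen purely for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical workflow: each action maps to the set of actions
# that must finish before it can start.
workflow = {
    "ingest_records": set(),
    "extract_fields": {"ingest_records"},
    "build_index": {"extract_fields"},
    "run_report": {"extract_fields"},
    "notify": {"build_index", "run_report"},
}


def execution_order(dag):
    """Return one valid order in which the actions can run.

    Raises graphlib.CycleError if the dependencies contain a cycle,
    which is why a workflow has to be acyclic to be schedulable.
    """
    return list(TopologicalSorter(dag).static_order())


order = execution_order(workflow)
print(order)
```

Because `build_index` and `run_report` depend only on `extract_fields`, either may come first in the returned order. A real scheduler could run such independent actions in parallel.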