Building a Sentiment Analysis Application

Introduction

For this project, you will play the part of a Big Data Application Developer who combines the skills of a Data Engineer and a Data Scientist, using several Big Data technologies provided by Hortonworks DataFlow (HDF) and Hortonworks Data Platform (HDP) to build a Real-Time Sentiment Analysis Application. For the application, you will learn to acquire tweet data from Twitter’s Decahose API and send the tweets to the Kafka topic “tweets” using NiFi. Next, you will learn to build a Spark Machine Learning model that classifies the data as happy or sad, and to export the model to HDFS. Before building the model, however, Spark requires the training data to be in the form of a feature array, so you will first do some data cleansing with SparkSQL. Once the model is built, you will use Spark Structured Streaming to load the model from HDFS, pull in tweets from the Kafka topic “tweets”, add a sentiment score to each tweet, and stream the data to the Kafka topic “tweetsSentiment”. A second NiFi flow, which you build right after finishing the first one, ingests data from the Kafka topic “tweetsSentiment” and stores it in HBase. With Hive and HBase integration, you will run queries to verify that the data was stored successfully and to show the sentiment score of each tweet.
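
To make the data flow above concrete, here is a minimal pure-Python sketch of the enrichment step: read a tweet JSON message, attach a sentiment score, and emit the enriched message. It does not use Spark, Kafka, or NiFi. The in-memory lists only stand in for the topics “tweets” and “tweetsSentiment”, the keyword rule in `score_text` is a hypothetical placeholder for the trained model, and the output field name `sentiment` is an assumption rather than the tutorial's actual schema.

```python
import json

# Hypothetical stand-in for the trained model. The tutorial loads a Spark ML
# model from HDFS; this keyword rule only illustrates the data flow.
HAPPY_WORDS = {"happy", "great", "love"}
SAD_WORDS = {"sad", "awful", "hate"}


def score_text(text):
    """Return 1 (happy) or 0 (sad) by comparing toy keyword counts."""
    words = text.lower().split()
    happy = sum(word in HAPPY_WORDS for word in words)
    sad = sum(word in SAD_WORDS for word in words)
    return 1 if happy >= sad else 0


def enrich(raw_message):
    """Parse one tweet JSON message, attach a score, and re-serialize it."""
    tweet = json.loads(raw_message)
    # "sentiment" is an assumed field name, not taken from the tutorial.
    tweet["sentiment"] = score_text(tweet.get("text", ""))
    return json.dumps(tweet)


if __name__ == "__main__":
    # In-memory lists stand in for the Kafka topics.
    tweets_topic = [
        json.dumps({"id": 1, "text": "I love this great day"}),
        json.dumps({"id": 2, "text": "what an awful sad morning"}),
    ]
    tweets_sentiment_topic = [enrich(message) for message in tweets_topic]
    for message in tweets_sentiment_topic:
        print(message)
```

In the real application this per-message transformation would run inside a Spark Structured Streaming query, with the model's prediction replacing `score_text`.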

Big Data Technologies used to develop the Application:

Goals and Objectives

  • Learn to create a Twitter Application using Twitter’s Developer Portal to get KEYS and TOKENS for connecting to Twitter’s APIs
  • Learn to create a NiFi Dataflow Application that integrates Twitter’s Decahose API to ingest tweets, perform some preprocessing, and store the data in the Kafka topic “tweets”
  • Learn to create a NiFi Dataflow Application that ingests the Kafka Topic “tweetsSentiment” to stream sentiment tweet data to HBase
  • Learn to build a SparkSQL Application to clean the data and get it into a suitable format for building the sentiment classification model
  • Learn to build a SparkML Application to train and validate a sentiment classification model using Gradient Boosting
  • Learn to build a Spark Structured Streaming Application that streams tweet data from Kafka topic “tweets” on HDP to Kafka topic “tweetsSentiment” on HDF, attaching a sentiment score to each tweet based on the output of the classification model
  • Learn to visualize the tweet sentiment score by using Zeppelin’s Hive interpreter mapping to the HBase table
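
The cleaning goal above asks for data in a format suitable for model training, which in practice means a fixed-length numeric feature array per tweet. The following is a minimal sketch of one common way to get there, the hashing trick, written in plain Python. Spark ML provides `HashingTF` for a similar purpose, but this sketch is not the tutorial's pipeline: the vector size, the tokenization rules, and the use of MD5 as a stable hash are all assumptions chosen for illustration.

```python
import hashlib
import re
from typing import List

NUM_FEATURES = 16  # hypothetical size, kept small for readability


def tokenize(text: str) -> List[str]:
    """Lowercase the text, drop URLs, and keep alphabetic tokens."""
    text = re.sub(r"https?://\S+", " ", text.lower())
    return re.findall(r"[a-z']+", text)


def bucket(token: str, num_features: int = NUM_FEATURES) -> int:
    """Map a token to a stable index in [0, num_features)."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_features


def featurize(text: str, num_features: int = NUM_FEATURES) -> List[float]:
    """Build a term-frequency vector by counting tokens per hash bucket."""
    vector = [0.0] * num_features
    for token in tokenize(text):
        vector[bucket(token, num_features)] += 1.0
    return vector


if __name__ == "__main__":
    print(tokenize("Loving it! http://t.co/abc"))
    print(featurize("happy happy day"))
```

Different tokens can collide in the same bucket; a larger vector size reduces collisions at the cost of memory.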

Prerequisites

Outline

The tutorial series consists of the following tutorial modules:

1. Application Development Concepts: You will be introduced to sentiment fundamentals: sentiment analysis, ways to perform the data analysis, and the various use cases.

2. Setting up the Development Environment: You will create a Twitter Application in Twitter’s Developer Portal for access to KEYS and TOKENS. You will then write shell code and perform Ambari REST API calls to set up a development environment.

3. Acquiring Twitter Data: You will build a NiFi Dataflow to ingest Twitter data, preprocess it, and store it in the Kafka topic “tweets”. The second NiFi Dataflow you will build ingests the enriched sentiment tweet data from Kafka topic “tweetsSentiment” and streams the content of the flowfile to HBase.

4. Cleaning the Raw Twitter Data: You will create a Zeppelin notebook and use Zeppelin’s Spark Interpreter to clean the raw Twitter data in preparation for creating the sentiment classification model.

5. Building a Sentiment Classification Model: You will create a Zeppelin notebook and use Zeppelin’s Spark Interpreter to build a sentiment classification model that classifies tweets as Happy or Sad, then export the model to HDFS.

6. Deploying a Sentiment Classification Model: You will create a Scala IntelliJ project in which you develop a Spark Structured Streaming application that streams the data from Kafka topic “tweets” on HDP, adds a sentiment score to the tweet JSON data, and streams the result into Kafka topic “tweetsSentiment” on HDF.

7. Visualizing Sentiment Scores: You will use Zeppelin’s JDBC Hive Interpreter to run SQL queries against the NoSQL HBase table “tweets_sentiment” for visual insight into tweet sentiment scores.
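
As a rough illustration of the final module, the sketch below counts happy and sad tweets from a list of rows, which is roughly the result a `GROUP BY` query over the sentiment column would return. It is plain Python rather than Hive SQL, and the row layout, the `sentiment` key, and the convention that 1 means happy and 0 means sad are assumptions, not the actual schema of “tweets_sentiment”.

```python
from collections import Counter


def sentiment_counts(rows):
    """Count rows per sentiment label, similar to a GROUP BY over the score."""
    counts = Counter()
    for row in rows:
        # Assumed convention: 1 means happy, anything else means sad.
        label = "happy" if row["sentiment"] == 1 else "sad"
        counts[label] += 1
    return dict(counts)


if __name__ == "__main__":
    # Hypothetical rows standing in for records read from the HBase table.
    rows = [
        {"id": 1, "sentiment": 1},
        {"id": 2, "sentiment": 0},
        {"id": 3, "sentiment": 1},
    ]
    print(sentiment_counts(rows))
```

A result of this shape maps directly onto a bar or pie chart in a notebook.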
