Tweet analysis service

This lesson demonstrates how Redis can be used as the foundation for an analytics-style back-end service. The application showcased here consists of multiple services which ingest tweets in real time, store them in Redis and allow the tweet information to be queried. SET and HASH data structures are used, along with the reliable queue pattern built on Redis LISTs.

Real-time tweet stats service

The service allows users to define a set of tweet keywords/terms they are interested in, e.g. since you're reading this, you might be interested in redis, nosql, database, golang etc. It ingests tweets in real time and lets users query those tweets with criteria based on the keywords/terms they had previously defined. It is then possible to

  • find all the tweets with a specific keyword e.g. tweets which contain the word redis

  • find all the tweets with a combination of keywords e.g. tweets which contain both golang and nosql, tweets which contain database or redis

  • use the above criteria with an added date filter e.g. tweets posted on 15th May 2018 which contain the terms redis, golang and nosql

The application comprises three distinct microservices, each responsible for a specific piece of the overall functionality

  • Tweets ingestion microservice - ingests real time tweets from the Twitter Streaming API and pushes them to Redis

  • Tweets processor microservice - uses the reliable consumer pattern to process each tweet and save it to Redis for further analysis

  • Tweets statistics microservice - exposes a REST API for users to be able to query tweet data

Technical stack

The sample app uses Go, Redis and Docker. The Go services use the gin framework for the REST APIs, the go-redis client for Redis, and the go-twitter and oauth1 libraries for the Twitter Streaming API.

Schema

  • tweets - a Redis LIST which stores raw tweets in JSON format

  • tweet:[id] - a HASH containing tweet details like tweet ID, tweeter, date, tweet text and the matched keywords e.g. tweet:987654321

  • keyword_tweets:[term] - a SET which stores tweets (only the IDs) for a specific keyword e.g. keyword_tweets:java

  • keyword_tweets:[term]:[date] - another SET which stores tweets (only the IDs) for a specific keyword posted on a specific date e.g. keyword_tweets:nosql:2018-05-10
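
As an illustration (the tweet ID comes from the HASH example above; the keywords and date are made up), a single tweet with ID 987654321 containing the keywords redis and nosql, posted on 2018-05-10, would result in the following keys:

tweet:987654321                       (HASH with the tweet details)
keyword_tweets:redis                  (SET containing the ID 987654321)
keyword_tweets:nosql                  (SET containing the ID 987654321)
keyword_tweets:redis:2018-05-10       (SET containing the ID 987654321)
keyword_tweets:nosql:2018-05-10       (SET containing the ID 987654321)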

Implementation details

Let's look at some of the internal details. You can grab the source code from GitHub.

Tweet Ingestion service

The code package structure for this service is as follows

├── lcm
│   └── api.go
├── main.go
└── service
    └── tweets-listener.go

The bulk of this service is implemented in the Start function (in service/tweets-listener.go). Its task is to tap into the Twitter Streaming API and fetch tweets based on the user-defined keyword(s).

The list of keywords can be provided using the TWITTER_TRACKED_TERMS environment variable (comma separated) and defaults to trump,realDonaldTrump,redis,golang,database,nosql

Each tweet is serialized to JSON and stored in the tweets LIST in Redis using LPUSH.

Redis as a Queue

In addition to the traditional use cases of adding and querying data, Redis LISTs are heavily used as queues. This is made possible by commands like LPUSH and RPUSH, which allow you to add data to the head and tail of the LIST respectively (note that these are constant time operations irrespective of the list size!), and then extract (and remove) items, probably in another process/program, using LPOP and RPOP. LISTs also support blocking variations of these operations, i.e. BLPOP and BRPOP. These commands block for the specified time period (or indefinitely if it is set to 0) waiting for an item to appear in the queue and then pop it out. Another great property is that if multiple programs/processes are involved, each of them receives a unique item from the queue. This means that the processing workload can be distributed among multiple processes and can easily be scaled horizontally
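
Here is a minimal, self-contained sketch of this pattern using the go-redis client (the same library used by the services in this lesson); the queue name jobs and the payload job-1 are purely illustrative:

package main

import (
    "fmt"
    "time"

    "github.com/go-redis/redis"
)

func main() {
    client := redis.NewClient(&redis.Options{Addr: "localhost:6379"})
    defer client.Close()

    // producer: LPUSH adds an item at the head of the LIST
    client.LPush("jobs", "job-1")

    // consumer: BRPOP blocks (here for up to 5 seconds) until an item
    // appears at the tail of the LIST, then pops and returns it
    result, err := client.BRPop(5*time.Second, "jobs").Result()
    if err != nil {
        fmt.Println("no item received:", err)
        return
    }
    // BRPop returns a [list name, value] pair
    fmt.Println("popped", result[1], "from", result[0])
}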

Bootstrap

The entry point to the service is main.go, which sets up the REST API routes for the tweets listener lifecycle manager

func main() {
    router := gin.Default()
    router.GET("tweets/producer", lcm.StartServiceHandler)
    router.DELETE("tweets/producer", lcm.StopServiceHandler)

    router.Run()
}

Tweet Ingestion Service Lifecycle Manager (LCM)

A convenient REST API can be used to start/stop the tweet ingestion service, thanks to the StartServiceHandler and StopServiceHandler handler functions in lcm/api.go, which allow the user to start and stop the service using HTTP GET and DELETE respectively, e.g. to start the tweets ingestion service, send an HTTP GET to /tweets/producer, and a DELETE to the same URI to stop it

func StartServiceHandler(c *gin.Context) {
    fmt.Println("StartServiceHandler API invoked")
    if service.GetTweetsListenerStatus() {
        alreadyRunning := "Tweets listener service is already running!"
        fmt.Println(alreadyRunning)
        c.Writer.WriteString(alreadyRunning)
        return
    }
    err := service.Start()
    if err != nil {
        c.String(500, err.Error())
        return
    }
    c.Writer.WriteString("Started Tweets listener")
}

func StopServiceHandler(c *gin.Context) {
    fmt.Println("StopServiceHandler API invoked")
    if !service.GetTweetsListenerStatus() {
        notRunning := "Tweets listener service is not running!"
        fmt.Println(notRunning)
        c.Writer.WriteString(notRunning)
        return
    }
    service.Stop()
    c.Writer.WriteString("Stopped Tweets listener")
}

The handler methods delegate the work to the core implementation, which is part of service/tweets-listener.go

As mentioned earlier, the Start() function is the workhorse of the service

It starts by connecting to Redis

redisServer := getFromEnvOrDefault("REDIS_HOST", "localhost")
redisPort := getFromEnvOrDefault("REDIS_PORT", "6379")
redisClient := redis.NewClient(&redis.Options{Addr: redisServer + ":" + redisPort})
_, pingErr := redisClient.Ping().Result()
if pingErr != nil {
    fmt.Println("could not connect to Redis due to " + pingErr.Error())
    return pingErr
}

.. followed by Twitter

consumerKey := os.Getenv("TWITTER_CONSUMER_KEY")
consumerSecret := os.Getenv("TWITTER_CONSUMER_SECRET")
accessToken := os.Getenv("TWITTER_ACCESS_TOKEN")
accessSecret := os.Getenv("TWITTER_ACCESS_TOKEN_SECRET")

config := oauth1.NewConfig(consumerKey, consumerSecret)
token := oauth1.NewToken(accessToken, accessSecret)
httpClient := config.Client(oauth1.NoContext, token)
twitterClient := twitter.NewClient(httpClient)

Then, a *twitter.Stream is created with the specified parameters, which include the tweet filtering criteria based on the keywords provided by the user

trackedTerms := os.Getenv("TWITTER_TRACKED_TERMS")

trackedTermsSlice := strings.Split(trackedTerms, ",")
params := &twitter.StreamFilterParams{
    Track:         trackedTermsSlice,
    StallWarnings: twitter.Bool(true),
}
var err error
stream, err := twitterClient.Streams.Filter(params)
if err != nil {
    return err
}

We then define a twitter.SwitchDemux and register a handler function that defines what to do when a Tweet is received. In this case, the implementation pushes the tweet to Redis in JSON format

demux := twitter.NewSwitchDemux()
demux.Tweet = func(tweet *twitter.Tweet) {
    fmt.Println(tweet.Text)
    matches := getMatchedTerms(tweet.Text)
    date := formatTweetDate(tweet.CreatedAt)
    tweetInfoByte, marshalErr := json.Marshal(tweetInfo{TweetID: strconv.Itoa(int(tweet.ID)), Tweeter: tweet.User.Name, Tweet: tweet.Text, Terms: matches, CreatedDate: date})
    if marshalErr != nil {
        fmt.Println("failed to marshal TweetInfo to JSON", marshalErr)
    } else {
        go func() {
            _, lpushErr := redisClient.LPush(redisTweetListName, string(tweetInfoByte)).Result()
            if lpushErr != nil {
                fmt.Println("failed to push tweet info to Redis", lpushErr)
            }
        }()
    }
}

The JSON serialization happens in two steps: the tweet data is first converted to a tweetInfo struct, which is then converted to a JSON string using json.Marshal (a sample payload is shown after the struct definition below)

type tweetInfo struct {
    TweetID     string   `json:"tweetID"`
    Tweeter     string   `json:"tweeter"`
    Tweet       string   `json:"tweet"`
    Terms       []string `json:"terms"`
    CreatedDate string   `json:"createdDate"`
}
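
For illustration, here is what a serialized tweetInfo payload might look like (the values are made up; the field names come from the struct tags above):

{
    "tweetID": "987654321",
    "tweeter": "someuser",
    "tweet": "exploring redis and nosql databases",
    "terms": ["redis", "nosql"],
    "createdDate": "19-06-2018"
}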

The Twitter stream listener is started as a separate goroutine

go demux.HandleChan(stream.Messages)

.. and another goroutine is started to make sure that the stream is closed when this service is stopped via the REST API. It blocks on the apiStopChannel and closes the stream as well as the Redis connection.

go func() {
    fmt.Println("Waiting for listener to stop")
    <-apiStopChannel
    fmt.Println("Listener stop request")
    active = false

    stream.Stop()
    fmt.Println("Listener stopped...")

    redisClient.Close()
    fmt.Println("Redis connection closed...")
}()

Tweet Consumer service

It is responsible for consuming the tweets enqueued in the Redis LIST (tweets) by the Tweet Ingestion service. It blocks and waits for tweets to appear, processes them and saves the results back to Redis so that they are ready for further analysis. This is done in a reliable manner, thanks to the BRPOPLPUSH command

Reliable processing

It is possible that the data obtained by the *POP commands specified above is received but not processed for some reason, e.g. if the consumer application crashes. This can lead to messages/events/data getting lost. There is a reliable alternative using RPOPLPUSH or its blocking variation BRPOPLPUSH. It simply picks up the data from the tail of the source queue and puts it in a destination queue of your choice (which, by the way, can be the same as the source queue!). What's special about this process is that it is atomic and reliable in nature, i.e. the transfer operation from one queue to another will either happen successfully or fail

  • if the transfer is successful, you will have your data safely backed up in another queue

  • if the pop-and-push fails, you can handle this in your application and retry it (your data/event is still safe and under your control)

  • once the processing is done, the data can be removed (with LREM) from the back-up list

  • even if the consumer process (which executed the transfer) fails to process the data or crashes, it is always possible to pick the data up from the back-up list, put it back on the original queue and let the processing cycle begin once again
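
Such a recovery step is not shown in the sample app; a minimal sketch could look like the following. It assumes the same go-redis client and the tweetRedisListName/tweetsProcessorListName constants used in tweets-consumer.go, and that it runs while no consumer is active:

func requeueLeftovers(client *redis.Client) {
    for {
        // move one item from the tail of the back-up list back to the head of the source queue
        item, err := client.RPopLPush(tweetsProcessorListName, tweetRedisListName).Result()
        if err == redis.Nil {
            // back-up list is empty - nothing left to recover
            return
        }
        if err != nil {
            fmt.Println("failed to re-queue item", err)
            return
        }
        fmt.Println("re-queued", item)
    }
}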

The logic is contained within tweets-consumer.go

BRPOPLPUSH is used in an infinite for loop for continuous tweet consumption

func main() {

    redisServer := getFromEnvOrDefault("REDIS_HOST", "localhost")
    redisPort := getFromEnvOrDefault("REDIS_PORT", "6379")
    client = redis.NewClient(&redis.Options{Addr: redisServer + ":" + redisPort})
    _, pingErr := client.Ping().Result()
    if pingErr != nil {
        fmt.Println("could not connect to Redis due to " + pingErr.Error())
        return
    }
    defer client.Close()

    for {
        tweetJSON, err := client.BRPopLPush(tweetRedisListName, tweetsProcessorListName, 0*time.Second).Result()
        if err != nil {
            fmt.Println("failed to push tweet info to "+tweetsProcessorListName, err.Error())
        } else {
            go process(tweetJSON) //done in a different goroutine
        }
    }

}

The process function ensures data is stored in HASHes and SETs to make it ready for analysis/query

  • it de-serializes the tweet information from its JSON form into a Go struct (tweet)

      type tweet struct {
          TweetID     string   `json:"tweetID"`
          Tweeter     string   `json:"tweeter"`
          Tweet       string   `json:"tweet"`
          Terms       []string `json:"terms"`
          CreatedDate string   `json:"createdDate"`
      }
  • uses HMSET to store the tweet details in a HASH whose naming format is tweet:[tweetID]

  • also stores the tweet ID in two separate SETs using SADD

    • keyword_tweets:[term] e.g. if a tweet contains the keywords redis and nosql, its ID will be added to both the SETs keyword_tweets:redis and keyword_tweets:nosql

    • keyword_tweets:[term]:[created_date] (e.g. keyword_tweets:java:2018-05-10)

    • this is repeated for all the matched keywords/terms

Since the logic involves multiple distinct calls to Redis (one HMSET plus two SADDs per matched term), pipelining is used to make this more efficient: the commands are sent in a single batch, reducing several network round trips to one.

func process(tweetJSON string) {
    var tweetObj tweet
    unmarshalErr := json.Unmarshal([]byte(tweetJSON), &tweetObj)
    if unmarshalErr != nil {
        fmt.Println("failed to unmarshal tweet JSON", unmarshalErr)
        return
    }
    if len(tweetObj.Terms) == 0 {
        return
    }
    hashName := "tweet:" + tweetObj.TweetID

    // batch the HMSET and SADD commands in a single pipeline
    pipe := client.Pipeline()
    pipe.HMSet(hashName, tweetObj.toMap())

    for _, term := range tweetObj.Terms {
        // SET of tweet IDs for this keyword
        set1Name := redisSetNamePrefix + term
        pipe.SAdd(set1Name, tweetObj.TweetID)

        // SET of tweet IDs for this keyword on the tweet's date
        set2Name := redisSetNamePrefix + term + ":" + tweetObj.CreatedDate
        pipe.SAdd(set2Name, tweetObj.TweetID)
    }

    _, pipeErr := pipe.Exec()

    if pipeErr != nil {
        fmt.Println("Pipeline execution error " + pipeErr.Error())
    } else {
        fmt.Println("Stored tweet data for analysis")
        // processed successfully - remove the entry from the back-up list
        _, lRemErr := client.LRem(tweetsProcessorListName, 0, tweetJSON).Result()
        if lRemErr != nil {
            fmt.Println("unable to delete entry from list " + lRemErr.Error())
        }
    }
}

Once the tweet information is stored (i.e. processed successfully), the entry is removed from the back-up list using LREM

Tweets statistics service

This module exposes a REST API to extract tweet information based on a few user-specified criteria - tweets-stats-api.go is where all the logic resides

The first criterion allows the user to search tweets based on keywords/terms

  • you can specify one or more keywords (keywords query parameter), and,

  • choose whether you want to apply the AND or OR criteria on top of it (op query parameter)

The second criterion adds the date dimension (along with the keyword), using the date query parameter. The logic is the same as for the previous API. The difference is the SETs which are queried - these are the ones in which date-wise keyword information is stored, i.e. keyword_tweets:[keyword]:[date] e.g. keyword_tweets:java:2018-05-10

func main() {
    router := gin.Default()
    router.GET("tweets", findTweetsHandler)
    router.Run()
}

findTweetsHandler is the function that serves as the entry point of the REST API. It extracts the required information from the URL, i.e. keyword(s), operation (AND or OR) and date, and passes the work on to a more generic findTweets function

func findTweetsHandler(c *gin.Context) {
    fmt.Println("request URL", c.Request.URL.String())

    keywords := c.Query("keywords")

    if keywords == "" {
        c.Status(400)
        c.Writer.WriteString("keywords query parameter cannot be empty")
        return
    }

    fmt.Println("searching for tweets with keywords", keywords)

    date := c.Query("date")

    if date != "" {
        fmt.Println("searching for tweets on", date)
    }
    operation := c.Query("op")

    if operation != "" {
        fmt.Println("applying operation", operation)
    }

    tweets, err := findTweets(keywords, operation, date)

    if err != nil {
        c.Status(500)
        c.Writer.WriteString("Unable to fetch tweets due to " + err.Error())
        return
    }

    c.JSON(200, tweets)
}    

The first part of the findTweets function calculates the SETs from which the tweet info needs to be extracted

var sets []string
keywordsSlice := strings.Split(keywords, ",")
for _, keyword := range keywordsSlice {
    var setName string
    if date == "" {
        setName = setNamePrefix + keyword
    } else {
        setName = setNamePrefix + keyword + ":" + date
    }
    sets = append(sets, setName)
}

In the case of a single keyword, a single SET needs to be queried, e.g. keyword_tweets:redis. In the case of multiple keywords with the AND criterion, the intersection of all the SETs is calculated (SINTERSTORE), e.g. keyword_tweets:java AND keyword_tweets:nosql. For multiple keywords with the OR criterion, the union of all the SETs is calculated (SUNIONSTORE), e.g. keyword_tweets:redis OR keyword_tweets:nosql OR keyword_tweets:database

switch operation {
case "":
    fmt.Println("no operation specified")
    tempSetName = sets[0]
    deleteSet = false
case "AND":
    fmt.Println("AND operation specified")
    // store the intersection of the keyword SETs in a temporary SET
    tempSetName, _ = generateRandomString(10)
    client.SInterStore(tempSetName, sets...)
case "OR":
    fmt.Println("OR operation specified")
    // store the union of the keyword SETs in a temporary SET
    tempSetName, _ = generateRandomString(10)
    client.SUnionStore(tempSetName, sets...)
}

The resulting SET contains only the tweet IDs, which are obtained using SMEMBERS

tweetIDs, smembersErr := client.SMembers(tempSetName).Result()
if smembersErr != nil {
    return nil, errors.New("Unable to find members of set " + tempSetName)
}

The details of each tweet are then obtained by using the tweet IDs from the SET to query the corresponding HASHes (e.g. tweet:987654321) with HGETALL

var tweets []map[string]string
for _, tweetID := range tweetIDs {
    tweetInfoHashName := hashNamePrefix + tweetID
    tweetInfoMap, hgetallErr := client.HGetAll(tweetInfoHashName).Result()
    if hgetallErr != nil {
        fmt.Println("unable to fetch info for tweet ID", tweetID)
    }
    tweets = append(tweets, tweetInfoMap)
}

These details are serialized and returned to the user in JSON format

Docker setup

Here is the docker-compose.yml for the application

version: '3'
services:
    redis:
        image: redis
        container_name: redis
        ports:
            - '6379:6379'
    tweets-ingestion-service:
        build: tweets-producer
        environment:
            - REDIS_HOST=redis
            - REDIS_PORT=6379
            - PORT=9090
            - TWITTER_TRACKED_TERMS=trump,realDonaldTrump,redis,golang,database,nosql
            - TWITTER_CONSUMER_KEY=s3cr3t
            - TWITTER_CONSUMER_SECRET=s3cr3t
            - TWITTER_ACCESS_TOKEN=s3cr3t
            - TWITTER_ACCESS_TOKEN_SECRET=s3cr3t
        ports:
            - '8080:9090'
        depends_on:
            - redis
    tweets-consumer:
        build: tweets-consumer
        depends_on:
            - redis
        environment:
            - REDIS_HOST=redis
            - REDIS_PORT=6379
    tweets-stats-service:
        build: tweets-stats
        environment:
            - REDIS_HOST=redis
            - REDIS_PORT=6379
            - PORT=9090
        ports:
            - '8081:9090'
        depends_on:
            - redis

It defines four services

  • redis - this is based on the Docker Hub Redis image

  • tweets-ingestion-service - it is based on a custom Dockerfile

    FROM golang as build-stage
    WORKDIR /go/
    RUN go get -u github.com/go-redis/redis && go get -u github.com/gin-gonic/gin && go get -u github.com/dghubble/oauth1 && go get -u github.com/dghubble/go-twitter/twitter
    COPY src/ /go/src
    RUN cd /go/src && CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -o tweets-producer

    FROM scratch
    COPY --from=build-stage /go/src/tweets-producer /
    CMD ["/tweets-producer"]

A multi-stage build process is used, wherein one image is used for building our Go app and a different image serves as the base for running it: the golang Docker Hub image handles the build process, which results in a single (Linux) binary. Since the binary has all its dependencies packed in, all we need is a minimal image for running it, so the lightweight scratch image is used for this purpose

  • tweets-consumer and tweets-stats-service - the image creation recipes for these services are the same as for tweets-ingestion-service

Test drive

  • Install curl, Postman or any other HTTP tool to interact with the REST endpoints of the service

  • Get the project - git clone https://github.com/abhirockzz/practical-redis.git

  • cd practical-redis/real-time-twitter-analysis

Create Twitter app

You need to have a Twitter account in order to create a Twitter App. Browse to https://apps.twitter.com/ and click Create New App to start the process

Update docker-compose.yml

After creating the app, update docker-compose.yml with credentials associated with the app you just created (check the Keys and Access Tokens tab)

  • TWITTER_CONSUMER_KEY - populate this with the value in Consumer Key (API Key) field on the app details page

  • TWITTER_CONSUMER_SECRET - populate this with the value in Consumer Secret (API Secret) field on the app details page

  • TWITTER_ACCESS_TOKEN - populate this with the value in Access Token field on the app details page

  • TWITTER_ACCESS_TOKEN_SECRET - populate this with the value in Access Token Secret field on the app details page

To start the services, run docker-compose up --build; use docker-compose down -v to stop them

Replace DOCKER_IP with the IP address of your Docker instance, which you can obtain using docker-machine ip. On Linux or Mac, this might be localhost. The port (8080 or 8081 in this case) is the one specified in docker-compose.yml

Start the tweet ingestion service

curl http://DOCKER_IP:8080/tweets/producer

to stop the service, use curl -X DELETE http://DOCKER_IP:8080/tweets/producer

The ingestion service should now start consuming relevant tweets (as per the keywords specified by the TWITTER_TRACKED_TERMS environment variable). You can check Redis to confirm the creation of the HASHes and SETs created by the consumer service
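
For example, you can connect to the Redis container with redis-cli and inspect a few keys (the key names follow the schema described earlier; the tweet ID below is just an example):

docker exec -it redis redis-cli
127.0.0.1:6379> LLEN tweets
127.0.0.1:6379> KEYS keyword_tweets:*
127.0.0.1:6379> SMEMBERS keyword_tweets:redis
127.0.0.1:6379> HGETALL tweet:1009098347016269825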

Execute queries using stats service

As the tweets continue to flow and get processed, you can query the stats service to get tweet data for relevant keywords and dates

  • Get all tweets with a keyword (e.g. redis) - e.g. to search for tweets with the keyword redis, execute curl http://DOCKER_IP:8081/tweets?keywords=redis. You should receive an HTTP 200 response along with a JSON payload (if there are tweets which contain the keyword)

      {
          "tweets": [
              {
                  "tweeter": "tosadvsr",
                  "tweet": "@gregyoung do you use something to keep a state as your events come in for quick reports access, like stats.d or a redis abstraction? would love to dm if you have a free moment please inbox.",
                  "created_date": "19-06-2018",
                  "tweet_id": "1009098347016269825",
                   "terms": [
                      "redis"
                  ]
              },
              {
                  "tweeter": "totalcloudio",
                  "tweet": "RT @awswhatsnew: Amazon ElastiCache for Redis announces support for Redis 4.0 with caching improvements and better memory management for hi…",
                  "created_date": "19-06-2018",
                  "tweet_id": "1009098444487589888",
                   "terms": [
                      "redis"
                  ]
              },
              ...........
          ]
      }
  • Get tweets with multiple keywords using the OR operator - e.g. to search for tweets with the keywords java or database, execute curl "http://DOCKER_IP:8081/tweets?keywords=java,database&op=OR" (quote the URL, since it contains &). You should receive an HTTP 200 response along with a JSON payload (if there are tweets which contain the keywords)

          {
              "tweets": [
                  {
                      "tweeter": "ciphertxt",
                      "tweet": "Maven: Deploy Java Apps to Azure with Tomcat on Linux https://t.co/GbSsNx0YWM #Azure",
                      "created_date": "19-06-2018",
                      "tweet_id": "1009105478985617408",
                       "terms": [
                          "java"
                      ]
                  },
                  {
                      "tweeter": "ITJobs_Az",
                      "tweet": "Sr. java Backend developer: Sr. java Backend developer Ref No.: 18-24863 Location: Tempe, Arizona Role Sr. JAVA Back End Developer Responsibilities – Hands-on skills in troubleshooting and debugging complex software – 5+ years of experience in designing… https://t.co/7YeVbLSTcR",
                      "created_date": "19-06-2018",
                      "tweet_id": "1009103998241001473",
                       "terms": [
                          "java"
                      ]
                  },
                  {
                      "tweeter": "armaninspace",
                      "tweet": "Artificial Intelligence Takes on Large-Scale Database Management #Artificial_Intelligence #Database https://t.co/wNZVRJtAOn",
                      "created_date": "19-06-2018",
                      "tweet_id": "1009109659083595781",
                       "terms": [
                          "database"
                      ]
                  },
                  .........
              ]
          }
  • Get tweets with multiple keywords using the AND operator - e.g. to search for tweets with the keywords java and database, execute curl "http://DOCKER_IP:8081/tweets?keywords=java,database&op=AND". You should receive an HTTP 200 response along with a JSON payload (if there are tweets which contain both keywords)

      {
          "tweets": [
              {
                  "tweeter": "Internships_KE",
                  "tweet": "RT @droid254: Cellulant is looking for PHP and Java developers Database : MySQL",
                  "created_date": "19-06-2018",
                  "tweet_id": "1009103084381929474",
                  "hashtags": [
                      "database",
                      "java"
                  ]
              },
              {
                  "tweeter": "lewis_sawe",
                  "tweet": "RT @droid254: Cellulant is looking for PHP and Java developers Database : MySQL",
                  "created_date": "19-06-2018",
                  "tweet_id": "1009105251306176512",
                  "hashtags": [
                      "database",
                      "java"
                  ]
              }
          ]
      }
  • Get tweets for a specific keyword on a date - e.g. to search for tweets with the keyword nosql on 20 June 2018, execute curl "http://DOCKER_IP:8081/tweets?keywords=nosql&date=20-06-2018". You should receive an HTTP 200 response along with a JSON payload (if there are tweets on the specified date which contain the keyword)

    {
        "tweets": [
            {
                "tweeter": "Arieleit",
                "tweet": "RT @code__tutorials: Learn How Python Works with NoSql Database MongoDB: PyMongo\n\n☞ https://t.co/70CXrFtj38\n\n#python #MongoDB https://t.co/…",
                "created_date": "20-06-2018",
                "tweet_id": "1009253195275816960",
                 "terms": [
                    "database",
                    "nosql"
                ]
            },
            {
                "tweeter": "launchjobs",
                "tweet": "What makes NoSQL databases especially relevant today is that they are particularly well suited for working with large sets of distributed . #Tech #Data \nhttps://t.co/UL8nK4Gs7L",
                "created_date": "20-06-2018",
                "tweet_id": "1009253269116473345",
                 "terms": [
                    "nosql"
                ]
            },
            {
                "tweeter": "NoSQLDigest",
                "tweet": "RT @CastIrony: @thoward37 @mattly Depends which NoSQL db you're looking at of course, but generally aggregation (including complex grouped…",
                "created_date": "20-06-2018",
                "tweet_id": "1009253361604939777",
                 "terms": [
                    "nosql"
                ]
            },
            {
                "tweeter": "NoSQLDigest",
                "tweet": "RT @nycallday247: WHICH NOSQL DATASTORE?! ORACLE'S JOURNEY ASSESSING CASSANDRA, SCYLLA, REDIS...\nAaron Stockton, Principal Software Enginee…",
                "created_date": "20-06-2018",
                "tweet_id": "1009253413878525952",
                 "terms": [
                    "nosql"
                ]
            },
            {
                "tweeter": "NoSQLDigest",
                "tweet": "RT @RedisLabs: Calling all #Minneapolis Redis and NoSQL developers! Join #RedisLabs for a #REDWorkshop on June 21 and explore the full powe…",
                "created_date": "20-06-2018",
                "tweet_id": "1009253457792937986",
                 "terms": [
                    "redis",
                    "nosql"
                ]
            },
            {
                "tweeter": "NoSQLDigest",
                "tweet": "RT @BigDataBatman: What is the formula that makes Eclipse #JNoSQL a tool for polyglot persistence, Batman on the NoSQL world?… https://t.co…",
                "created_date": "20-06-2018",
                "tweet_id": "1009253623157571585",
                 "terms": [
                    "nosql"
                ]
            },
            {
                "tweeter": "NoSQLDigest",
                "tweet": "RT @mikegreiling: @jakecodes @postgresql right? no more need for nosql document-oriented DBs like mongodb when you can have the best of bot…",
                "created_date": "20-06-2018",
                "tweet_id": "1009253739998294016",
                 "terms": [
                    "nosql"
                ]
            },
            {
                "tweeter": "NoSQLDigest",
                "tweet": "RT @e4developer: @nicolas_frankel @springunidotcom I am not sure that I agree with relational databases being simpler than nosql (and nosql…",
                "created_date": "20-06-2018",
                "tweet_id": "1009259418909732864",
                 "terms": [
                    "nosql"
                ]
            },
            {
                "tweeter": "NoSQLDigest",
                "tweet": "RT @chuckcalio: @IBMPowerSystems \n  Running NoSQL and encountering high costs due to 'node sprawl' ? IBM Power Systems and #scylla can help…",
                "created_date": "20-06-2018",
                "tweet_id": "1009259635058982912",
                 "terms": [
                    "nosql"
                ]
            },
            {
                "tweeter": "NoSQLDigest",
                "tweet": "RT @Calista_Redmond: Extend your data architectures with @MongoDB Open Source based NoSQL data store @IBMAnalytics https://t.co/oG3j3l27m9",
                "created_date": "20-06-2018",
                "tweet_id": "1009259739027394560",
                 "terms": [
                    "nosql"
                ]
            },
            {
                "tweeter": "NoSQLDigest",
                "tweet": "RT @h_feddersen: Should all companies start to invest in NoSQL databases? https://t.co/VUp0Q8hzDn via @h_feddersen",
                "created_date": "20-06-2018",
                "tweet_id": "1009259917125955584",
                 "terms": [
                    "nosql"
                ]
            }
        ]
    }
  • Get tweets for multiple keywords on a date using the OR operator - e.g. to search for tweets with the keywords java or database on 20 June 2018, execute curl "http://DOCKER_IP:8081/tweets?keywords=java,database&op=OR&date=20-06-2018". You should receive an HTTP 200 response along with a JSON payload (if there are tweets on the specified date which contain the keywords)

      {
          "tweets": [
                      {
                  "tweeter": "MakotoTheKnight",
                  "tweet": "For context: NXT was released a long time ago (November '15) with initial Linux support.  A good start; an olive branch in the right direction given that Linux support had always kind of \"existed\" with the Java client, and it would be reasonable to see that continue.",
                  "created_date": "20-06-2018",
                  "tweet_id": "1009249404107157509",
                   "terms": [
                      "java"
                  ]
              },
              {
                  "tweeter": "stetayen",
                  "tweet": "RT @shani_o: One platform (LinkedIn) created a  scrapable database of people's work and education histories while two other platforms (Gith…",
                  "created_date": "20-06-2018",
                  "tweet_id": "1009249413490003968",
                   "terms": [
                      "database"
                  ]
              },
              ......
          ]
      }
  • Get tweets for multiple keywords on a date using the AND operator - e.g. to search for tweets with the keywords java and database on 19 June 2018, execute curl "http://DOCKER_IP:8081/tweets?keywords=java,database&op=AND&date=19-06-2018". You should receive an HTTP 200 response along with a JSON payload (if there are tweets on the specified date which contain both keywords)

      {
          "tweets": [
              {
                  "tweeter": "Internships_KE",
                  "tweet": "RT @droid254: Cellulant is looking for PHP and Java developers Database : MySQL",
                  "created_date": "19-06-2018",
                  "tweet_id": "1009103084381929474",
                  "hashtags": [
                      "database",
                      "java"
                  ]
              },
              {
                  "tweeter": "lewis_sawe",
                  "tweet": "RT @droid254: Cellulant is looking for PHP and Java developers Database : MySQL",
                  "created_date": "19-06-2018",
                  "tweet_id": "1009105251306176512",
                  "hashtags": [
                      "database",
                      "java"
                  ]
              },
              .........
          ]
      }

Scale out

To increase the number of instances of tweets-consumer (e.g. from 1 to 2), use docker-compose scale tweets-consumer=2.

An additional container will start up and the tweet processing workload will now be distributed between the two instances. To make this easier to confirm, you can check the logs of the tweets-consumer service in isolation with docker-compose logs tweets-consumer; you should see logs from the tweets-consumer_2 container (in addition to tweets-consumer_1), which was created as a result of the scale-out
