# Tweet analysis service

This lesson demonstrates how Redis can be used as the foundation for an analytics-like back-end service. The application showcased here consists of multiple services which ingest tweets in real time, store them in Redis and allow the tweet information to be queried. The `SET` and `HASH` data structures are leveraged along with the `reliable queue` pattern using Redis `LIST`s.

## Real time tweets stats service

The service allows users to define a few **tweet keywords/terms** they are interested in, e.g. since you're reading this, you might be interested in redis, nosql, database, golang etc. It ingests tweets in real time and allows users to query those tweets with criteria based on the keywords/terms they previously defined. It is then possible to:

* find all the tweets with a specific keyword e.g. tweets which contain the word `redis`
* find all the tweets with a combination of keywords e.g. tweets which contain both `golang` **and** `nosql`, tweets which contain `database` **or** `redis`
* use the above criteria with an added date filter e.g. tweets posted on `15th May 2018` which contain the terms `redis`, `golang` **and** `nosql`

The application comprises three distinct microservices, each of which is responsible for a specific piece of the overall functionality:

* **Tweets ingestion microservice** - ingests real time tweets from the [Twitter Streaming API](https://developer.twitter.com/en/docs/tutorials/consuming-streaming-data.html) and pushes them to Redis
* **Tweets processor microservice** - uses the **reliable consumer pattern** to process each tweet and saves it to Redis for further analysis
* **Tweets statistics microservice** - exposes a REST API for users to be able to query tweet data

### Technical stack

The sample app uses

* [go-twitter](https://github.com/dghubble/go-twitter) to tap into the Twitter Streaming API
* [gin-gonic](https://github.com/gin-gonic/gin) for REST based services
* [go-redis](https://github.com/go-redis/redis) as the Redis Go client
* [Docker](https://www.docker.com/)
  * [pre-built Docker image](https://hub.docker.com/_/redis/) for Redis
  * [Docker Compose](https://docs.docker.com/compose/) to run the solution with a single command

### Schema

* `tweets` - a Redis `LIST` which stores raw tweets in JSON format
* `tweet:[id]` - a `HASH` containing tweet details like tweet ID, tweeter, date, tweet text and the matched keywords e.g. `tweet:987654321`
* `keyword_tweets:[term]` - a `SET` which stores tweets (only the IDs) for a specific keyword e.g. `keyword_tweets:java`
* `keyword_tweets:[term]:[date]` - another `SET` which stores tweets (only the IDs) for a specific keyword posted on a specific date, e.g. `keyword_tweets:nosql:10-05-2018`
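The key naming scheme can be captured in a few helper functions. This is a minimal sketch: the function and constant names are hypothetical (the services in the repository use their own constants, such as `redisSetNamePrefix`), and the date is passed through unchanged, so it has to be in the same format the producer writes (the test-drive queries later in this lesson use `DD-MM-YYYY`, e.g. `20-06-2018`).

```go
package main

import "fmt"

// Hypothetical key prefixes mirroring the schema above.
const (
	tweetListKey     = "tweets"
	tweetHashPrefix  = "tweet:"
	keywordSetPrefix = "keyword_tweets:"
)

// tweetHashKey returns the HASH key holding a tweet's details.
func tweetHashKey(tweetID string) string {
	return tweetHashPrefix + tweetID
}

// keywordSetKey returns the SET key holding IDs of tweets that matched term.
func keywordSetKey(term string) string {
	return keywordSetPrefix + term
}

// keywordDateSetKey returns the SET key holding IDs of tweets that matched
// term on a given date; the date string is used verbatim.
func keywordDateSetKey(term, date string) string {
	return keywordSetPrefix + term + ":" + date
}

func main() {
	fmt.Println(tweetListKey)                             // tweets
	fmt.Println(tweetHashKey("987654321"))                // tweet:987654321
	fmt.Println(keywordSetKey("java"))                    // keyword_tweets:java
	fmt.Println(keywordDateSetKey("nosql", "20-06-2018")) // keyword_tweets:nosql:20-06-2018
}
```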

## Implementation details

Let's look at some of the internal details. You can grab the [source code from GitHub](https://github.com/abhirockzz/practical-redis/tree/master/real-time-twitter-analysis).

### Tweet Ingestion service

The code package structure for this service is as follows

```
├── lcm
│   └── api.go
├── main.go
└── service
    └── tweets-listener.go
```

The bulk of this service is implemented in the `Start` function (in `service/tweets-listener.go`). Its task is to tap into the Twitter Streaming API and fetch tweets based on the user-defined keyword(s).

> The list of keywords can be provided (comma separated) using the `TWITTER_TRACKED_TERMS` environment variable and defaults to `trump,realDonaldTrump,redis,golang,database,nosql`

Each tweet is serialized to JSON and stored in the `tweets` `LIST` in Redis using `LPUSH`.

**Redis as a Queue**

In addition to the traditional use cases of adding and querying data, Redis `LIST`s are heavily used as queues. `LPUSH` and `RPUSH` add items to the head and tail of a `LIST` respectively (a constant-time operation per item, irrespective of the list size!), while `LPOP` and `RPOP` remove and return items from the head and tail, typically in another process/program.

`LIST`s also support blocking variants of the pop operations, i.e. `BLPOP` and `BRPOP`. These commands wait up to the specified timeout (or indefinitely if it is set to `0`) for an item to appear in the list and then pop it. Another great property is that when multiple programs/processes pop from the same list, each item is delivered to only one of them. This means that the processing workload can be distributed among multiple processes and easily scaled out horizontally.

**Bootstrap**

The entry point to the service is `main.go`, which sets up the REST API routes for the tweets listener lifecycle manager:

```
func main() {
    router := gin.Default()
    router.GET("tweets/producer", lcm.StartServiceHandler)
    router.DELETE("tweets/producer", lcm.StopServiceHandler)

    router.Run()
}
```

**Tweet Ingestion Service Lifecycle Manager (LCM)**

A convenient REST API can be used to start and stop the tweet ingestion service, thanks to the `StartServiceHandler` and `StopServiceHandler` handler functions in `lcm/api.go`, which respond to HTTP `GET` and `DELETE` respectively. To start the tweets ingestion service, send an HTTP `GET` to `/tweets/producer`; to stop it, send a `DELETE` to the same URI.

```
func StartServiceHandler(c *gin.Context) {
    fmt.Println("StartServiceHandler API invoked")
    if service.GetTweetsListenerStatus() {
        alreadyRunning := "Tweets listener service is already running!"
        fmt.Println(alreadyRunning)
        c.Writer.WriteString(alreadyRunning)
        return
    }
    err := service.Start()
    if err != nil {
        c.String(500, err.Error())
        return
    }
    c.Writer.WriteString("Started Tweets listener")
}

func StopServiceHandler(c *gin.Context) {
    fmt.Println("StopServiceHandler API invoked")
    if !service.GetTweetsListenerStatus() {
        notRunning := "Tweets listener service is not running!"
        fmt.Println(notRunning)
        c.Writer.WriteString(notRunning)
        return
    }
    service.Stop()
    c.Writer.WriteString("Stopped Tweets listener")
}
```

The handler methods delegate the work to the core implementation, which is part of `service/tweets-listener.go`.

As mentioned earlier, the `Start()` function is the workhorse of the service.

It starts by connecting to Redis:

```
redisServer := getFromEnvOrDefault("REDIS_HOST", "localhost")
    redisPort := getFromEnvOrDefault("REDIS_PORT", "6379")
    redisClient := redis.NewClient(&redis.Options{Addr: redisServer + ":" + redisPort})
    _, pingErr := redisClient.Ping().Result()
    if pingErr != nil {
        fmt.Println("could not connect to Redis due to " + pingErr.Error())
        return pingErr
    }
```

... followed by Twitter, using the OAuth1 credentials read from environment variables:

```
consumerKey := os.Getenv("TWITTER_CONSUMER_KEY")
consumerSecret := os.Getenv("TWITTER_CONSUMER_SECRET")
accessToken := os.Getenv("TWITTER_ACCESS_TOKEN")
accessSecret := os.Getenv("TWITTER_ACCESS_TOKEN_SECRET")

config := oauth1.NewConfig(consumerKey, consumerSecret)
token := oauth1.NewToken(accessToken, accessSecret)
httpClient := config.Client(oauth1.NoContext, token)
twitterClient := twitter.NewClient(httpClient)
```

Then, a `*twitter.Stream` is created with the specified parameters, which include the tweet filtering criteria based on the keywords provided by the user

```
trackedTerms := os.Getenv("TWITTER_TRACKED_TERMS")

    trackedTermsSlice := strings.Split(trackedTerms, ",")
    params := &twitter.StreamFilterParams{
        Track:         trackedTermsSlice,
        StallWarnings: twitter.Bool(true),
    }
    var err error
    stream, err := twitterClient.Streams.Filter(params)
    if err != nil {
        return err
    }
```

We then define a `twitter.SwitchDemux`, which routes each incoming stream message to a handler based on its type; its `Tweet` field is the function that defines what to do when a tweet is received. In this case, the implementation pushes the tweet to Redis in JSON format

```
demux := twitter.NewSwitchDemux()
demux.Tweet = func(tweet *twitter.Tweet) {
    fmt.Println(tweet.Text)
    matches := getMatchedTerms(tweet.Text)
    date := formatTweetDate(tweet.CreatedAt)
    tweetInfoByte, marshalErr := json.Marshal(tweetInfo{TweetID: strconv.FormatInt(tweet.ID, 10), Tweeter: tweet.User.Name, Tweet: tweet.Text, Terms: matches, CreatedDate: date})
    if marshalErr != nil {
        fmt.Println("failed to marshal TweetInfo to JSON", marshalErr)
    } else {
        go func() {
            _, lpushErr := redisClient.LPush(redisTweetListName, string(tweetInfoByte)).Result()
            if lpushErr != nil {
                fmt.Println("failed to push tweet info to Redis", lpushErr)
            }
        }()
    }
}
```

The JSON serialization happens in two steps: the tweet data is first converted to a `tweetInfo` struct, which is then converted to a JSON string using `json.Marshal`

```
type tweetInfo struct {
    TweetID     string   `json:"tweetID"`
    Tweeter     string   `json:"tweeter"`
    Tweet       string   `json:"tweet"`
    Terms       []string `json:"terms"`
    CreatedDate string   `json:"createdDate"`
}
```

The Twitter stream listener is started in a separate goroutine

```
go demux.HandleChan(stream.Messages)
```

... and another goroutine is started to make sure that the stream is closed when the service is stopped via the REST API. It blocks on `apiStopChannel` and then closes the stream as well as the Redis connection.

```
go func() {
    fmt.Println("Waiting for listener to stop")
    <-apiStopChannel
    fmt.Println("Listener stop request")
    active = false

    stream.Stop()
    fmt.Println("Listener stopped...")

    redisClient.Close()
    fmt.Println("Redis connection closed...")
}()
```
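`active` is written by the stop goroutine above while the HTTP handlers read it through `GetTweetsListenerStatus`; unless that access is synchronized elsewhere in the repository, concurrent requests can race. Below is a hypothetical sketch of the same start/stop lifecycle with the flag and the stop channel guarded by a `sync.Mutex`. The `listener` type and its methods are illustrative names, and the Twitter stream and Redis client are replaced by a placeholder goroutine that simply waits for the stop signal.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// listener models the ingestion service lifecycle: at most one running
// instance, started and stopped from concurrent HTTP handlers.
type listener struct {
	mu     sync.Mutex
	active bool
	stop   chan struct{} // closed to request shutdown (like apiStopChannel)
	done   chan struct{} // closed by the worker once cleanup has finished
}

// Start launches the background worker unless it is already running.
func (l *listener) Start() error {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.active {
		return errors.New("tweets listener service is already running")
	}
	l.stop = make(chan struct{})
	l.done = make(chan struct{})
	l.active = true
	go func(stop <-chan struct{}, done chan<- struct{}) {
		defer close(done)
		<-stop
		// real code: stream.Stop() and redisClient.Close() would go here
	}(l.stop, l.done)
	return nil
}

// Stop signals the worker and waits until its cleanup is complete.
func (l *listener) Stop() error {
	l.mu.Lock()
	defer l.mu.Unlock()
	if !l.active {
		return errors.New("tweets listener service is not running")
	}
	close(l.stop)
	<-l.done
	l.active = false
	return nil
}

// Active reports whether the listener is running; safe for concurrent use.
func (l *listener) Active() bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	return l.active
}

func main() {
	var l listener
	fmt.Println(l.Start())  // <nil>
	fmt.Println(l.Start())  // tweets listener service is already running
	fmt.Println(l.Active()) // true
	fmt.Println(l.Stop())   // <nil>
	fmt.Println(l.Active()) // false
}
```

Because `Stop` waits for the worker's cleanup before returning, a subsequent `Start` cannot overlap with the previous run.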

### Tweet Consumer service

This service is responsible for consuming the tweets enqueued in the Redis `LIST` (`tweets`) by the Tweet Ingestion service. It blocks and waits for tweets to appear, then processes them and saves them back to Redis so that they are ready for further analysis. This is done reliably, thanks to the `BRPOPLPUSH` command

**Reliable processing**

It is possible that the data obtained by the `*POP` commands described above is received but never processed, e.g. if the consumer application crashes. This can lead to messages/events/data getting lost. A reliable alternative is `RPOPLPUSH` or its blocking variant `BRPOPLPUSH`. It pops an item from the tail of the source list and pushes it onto the head of a destination list of your choice (which, by the way, can be the same as the source list!). What's special about this operation is that it is **atomic** and **reliable**: the transfer from one list to the other either happens completely or not at all

* if the transfer is successful, your data is safely backed up in another list
* if the `pop-and-push` fails, you can handle this in your application and retry it (your data/event is still safe and under your control)
* once the processing is done, the data can be removed from the back-up list with `LREM`
* even if the consumer process which executed the transfer fails to process the data or crashes, it's always possible to pick up the data from the back-up list and put it back into the original queue, and the processing cycle begins once again

The logic is contained within `tweets-consumer.go`

`BRPOPLPUSH` is used in an infinite `for` loop for continuous tweet consumption; with a timeout of `0` it blocks until a tweet is available

```
func main() {

    redisServer := getFromEnvOrDefault("REDIS_HOST", "localhost")
    redisPort := getFromEnvOrDefault("REDIS_PORT", "6379")
    client = redis.NewClient(&redis.Options{Addr: redisServer + ":" + redisPort})
    _, pingErr := client.Ping().Result()
    if pingErr != nil {
        fmt.Println("could not connect to Redis due to " + pingErr.Error())
        return
    }
    defer client.Close()

    for {
        tweetJSON, err := client.BRPopLPush(tweetRedisListName, tweetsProcessorListName, 0*time.Second).Result()
        if err != nil {
            fmt.Println("failed to push tweet info to "+tweetsProcessorListName, err.Error())
        } else {
            go process(tweetJSON) //done in a different goroutine
        }
    }

}
```

The `process` function ensures data is stored in `HASH`es and `SET`s to make it ready for analysis/query

* it de-serializes the tweet information from its JSON form into a Go struct (`tweet`)

  ```
    type tweet struct {
        TweetID     string   `json:"tweetID"`
        Tweeter     string   `json:"tweeter"`
        Tweet       string   `json:"tweet"`
        Terms       []string `json:"terms"`
        CreatedDate string   `json:"createdDate"`
    }
  ```
* uses `HMSET` to store the tweet details in a `HASH` whose naming format is `tweet:[tweetID]`
* also stores the tweet info in two separate `SET`s using `SADD`
  * `keyword_tweets:[term]` e.g. if a tweet contains the keywords `redis` and `nosql`, it will be stored in both the `SET`s `keyword_tweets:redis` and `keyword_tweets:nosql`
  * `keyword_tweets:[term]:[created_date]` (e.g. `keyword_tweets:java:10-05-2018`)
  * this is repeated for all the matched keywords/terms

Since the logic involves several distinct calls to Redis (one `HMSET` plus two `SADD`s for each matched term), pipelining is used to make this more efficient: the `HMSET` and `SADD` commands are queued and sent as a single batch, so they share one network round trip instead of each needing its own.

```
func process(tweetJSON string) {
    var tweetObj tweet
    unmarshalErr := json.Unmarshal([]byte(tweetJSON), &tweetObj)
    if unmarshalErr != nil {
        // the entry stays in the processing list, so it is not lost
        fmt.Println("failed to convert JSON to tweet", unmarshalErr)
        return
    }
    fmt.Println("converted JSON to tweet", tweetObj)
    if len(tweetObj.Terms) == 0 {
        return
    }
    hashName := "tweet:" + tweetObj.TweetID
    pipe := client.Pipeline()
    // queue HMSET for the tweet:[tweetID] HASH
    pipe.HMSet(hashName, tweetObj.toMap())

    for _, term := range tweetObj.Terms {
        // keyword_tweets:[term]
        set1Name := redisSetNamePrefix + term
        pipe.SAdd(set1Name, tweetObj.TweetID)

        // keyword_tweets:[term]:[date]
        set2Name := redisSetNamePrefix + term + ":" + tweetObj.CreatedDate
        pipe.SAdd(set2Name, tweetObj.TweetID)
    }

    // send all queued commands to Redis in a single round trip
    _, pipeErr := pipe.Exec()

    if pipeErr != nil {
        fmt.Println("Pipeline execution error " + pipeErr.Error())
    } else {
        fmt.Println("Stored tweet data for analysis")
        // acknowledge: remove the processed entry from the processing list
        _, lRemErr := client.LRem(tweetsProcessorListName, 0, tweetJSON).Result()
        if lRemErr != nil {
            fmt.Println("unable to delete entry from list " + lRemErr.Error())
        }
    }
}
```

Once tweet information is stored (processed successfully), the entry from the list is removed using `LREM`

### Tweets statistics service

This module exposes a REST API to extract tweet information based on a few user-specified criteria; all the logic resides in `tweets-stats-api.go`

The first criterion allows users to search tweets based on keywords/terms

* you can specify one or more keywords (`keywords` query parameter), and,
* choose whether you want to apply the `AND` or `OR` criteria on top of it (`op` query parameter)

The second criterion adds the `date` dimension (along with the keyword) using the `date` query parameter. The logic is the same as before; the difference is the `SET`s which are queried, i.e. the ones in which date-wise keyword information is stored: `keyword_tweets:[keyword]:[date]`, e.g. `keyword_tweets:java:10-05-2018`

```
func main() {
    router := gin.Default()
    router.GET("tweets", findTweetsHandler)
    router.Run()
}
```

`findTweetsHandler` is the function that serves as the entry point of the REST API. It extracts the required information from the URL query parameters, i.e. keyword(s), operation (`AND` or `OR`) and date, and passes the work on to a more generic `findTweets` function

```
func findTweetsHandler(c *gin.Context) {
    fmt.Println("request URL", c.Request.URL.String())

    keywords := c.Query("keywords")

    if keywords == "" {
        c.Status(400)
        c.Writer.WriteString("keywords query parameter cannot be empty")
        return
    }

    fmt.Println("searching for tweets with keywords", keywords)

    date := c.Query("date")

    if date != "" {
        fmt.Println("searching for tweets on", date)
    }
    operation := c.Query("op")

    if operation != "" {
        fmt.Println("applying operation", operation)
    }

    tweets, err := findTweets(keywords, operation, date)

    if err != nil {
        c.Status(500)
        c.Writer.WriteString("Unable to fetch tweets due to " + err.Error())
        return
    }

    c.JSON(200, tweets)
}    
```

The first part of `findTweets` function calculates the `SET`s from which the tweet info needs to be extracted

```
var sets []string
keywordsSlice := strings.Split(keywords, ",")
for _, keyword := range keywordsSlice {
    var setName string
    if date == "" {
        setName = setNamePrefix + keyword
    } else {
        setName = setNamePrefix + keyword + ":" + date
    }
    sets = append(sets, setName)
}
```

In case of a single keyword, only a single `SET` needs to be queried, e.g. `keyword_tweets:redis`. In case of multiple keywords with the `AND` criteria, the intersection of all the `SET`s is calculated (`SINTERSTORE`), e.g. `keyword_tweets:java` AND `keyword_tweets:nosql`. For multiple keywords with the `OR` criteria, the union of all the `SET`s is calculated (`SUNIONSTORE`), e.g. `keyword_tweets:redis` OR `keyword_tweets:nosql` OR `keyword_tweets:database`. In both cases the result is stored in a temporary `SET` with a randomly generated name

```
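switch operation {
case "":
    // single keyword: only the first keyword's SET is used
    fmt.Println("no operation specified")
    tempSetName = sets[0]
    deleteSet = false // not a temporary set, so it must not be deleted later
case "AND":
    fmt.Println("AND operation specified")
    tempSetName, _ = generateRandomString(10)
    // SINTERSTORE: IDs present in every keyword SET
    if err := client.SInterStore(tempSetName, sets...).Err(); err != nil {
        return nil, err
    }
case "OR":
    fmt.Println("OR operation specified")
    tempSetName, _ = generateRandomString(10)
    // SUNIONSTORE: IDs present in any keyword SET
    if err := client.SUnionStore(tempSetName, sets...).Err(); err != nil {
        return nil, err
    }
default:
    return nil, errors.New("unsupported operation " + operation + ", use AND or OR")
}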
```

The resulting `SET` contains only tweet IDs, which are fetched using `SMEMBERS`; the details of each tweet live in the corresponding `tweet:[tweet_id]` `HASH`

```
tweetIDs, smembersErr := client.SMembers(tempSetName).Result()
if smembersErr != nil {
    return nil, errors.New("Unable to find members of set " + tempSetName)
}
```

The details of each tweet are then obtained by looking up its `HASH` (e.g. `tweet:987654321`) using `HGETALL`

```
var tweets []map[string]string
for _, tweetID := range tweetIDs {
    tweetInfoHashName := hashNamePrefix + tweetID // e.g. tweet:987654321
    tweetInfoMap, hgetallErr := client.HGetAll(tweetInfoHashName).Result()
    if hgetallErr != nil {
        fmt.Println("unable to fetch info for tweet ID", tweetID)
        continue // skip this tweet instead of appending an empty result
    }
    tweets = append(tweets, tweetInfoMap)
}
```

These details are serialized and returned to the user in JSON format
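One detail worth knowing here: `tweets` is declared as `var tweets []map[string]string`, so when no tweet IDs are found it remains `nil`, and Go's `encoding/json` (which gin's `c.JSON` uses by default) encodes a nil slice as `null` rather than `[]`. The sketch below shows the difference, with a hypothetical `toJSON` helper that normalizes the empty case; note also that map keys are emitted in sorted order.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// toJSON marshals the tweet maps, emitting [] instead of null when empty.
func toJSON(tweets []map[string]string) (string, error) {
	if tweets == nil {
		tweets = []map[string]string{}
	}
	b, err := json.Marshal(tweets)
	return string(b), err
}

func main() {
	var none []map[string]string
	raw, _ := json.Marshal(none)
	fmt.Println(string(raw)) // null

	s, _ := toJSON(none)
	fmt.Println(s) // []

	s, _ = toJSON([]map[string]string{{"tweeter": "alice", "tweet_id": "1"}})
	fmt.Println(s) // [{"tweet_id":"1","tweeter":"alice"}]
}
```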

## Docker setup

Here is the `docker-compose.yml` for the application

```
version: '3'
services:
    redis:
        image: redis
        container_name: redis
        ports:
            - '6379:6379'
    tweets-ingestion-service:
        build: tweets-producer
        environment:
            - REDIS_HOST=redis
            - REDIS_PORT=6379
            - PORT=9090
            - TWITTER_TRACKED_TERMS=trump,realDonaldTrump,redis,golang,database,nosql
            - TWITTER_CONSUMER_KEY=s3cr3t
            - TWITTER_CONSUMER_SECRET=s3cr3t
            - TWITTER_ACCESS_TOKEN=s3cr3t
            - TWITTER_ACCESS_TOKEN_SECRET=s3cr3t
        ports:
            - '8080:9090'
        depends_on:
            - redis
    tweets-consumer:
        build: tweets-consumer
        depends_on:
            - redis
        environment:
            - REDIS_HOST=redis
            - REDIS_PORT=6379
    tweets-stats-service:
        build: tweets-stats
        environment:
            - REDIS_HOST=redis
            - REDIS_PORT=6379
            - PORT=9090
        ports:
            - '8081:9090'
        depends_on:
            - redis
```

It defines four services

* `redis` - this is based off the Docker Hub Redis image
* `tweets-ingestion-service` - it is based on a custom Dockerfile

  ```
  FROM golang as build-stage
  WORKDIR /go/
  RUN go get -u github.com/go-redis/redis && go get -u github.com/gin-gonic/gin && go get -u github.com/dghubble/oauth1 && go get -u github.com/dghubble/go-twitter/twitter
  COPY src/ /go/src
  RUN cd /go/src && CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -o tweets-producer

  FROM scratch
  # scratch ships no CA certificates; copy them so HTTPS calls to the Twitter API can be verified
  COPY --from=build-stage /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
  COPY --from=build-stage /go/src/tweets-producer /
  CMD ["/tweets-producer"]
  ```

A [multi-stage build process](https://docs.docker.com/develop/develop-images/multistage-build/) is used, wherein one image is used for building the Go app and another is used as the base for running it. The `golang` [Docker Hub image](https://hub.docker.com/_/golang/) is used for the build process, which results in a single (Linux) binary. Since the binary has all its dependencies packed in, all we need is a minimal image to run it, and thus the lightweight `scratch` [image](https://hub.docker.com/_/scratch/) is used for this purpose

* `tweets-consumer` and `tweets-stats-service` - the image creation recipes for these services are the same as for `tweets-ingestion-service`

## Test drive

* Install [curl](https://curl.haxx.se/), [Postman](https://www.getpostman.com/) or any other HTTP tool to interact with the REST endpoints of the service
* Get the project - `git clone https://github.com/abhirockzz/practical-redis.git`
* `cd practical-redis/real-time-twitter-analysis`

**Create Twitter app**

You need to have a Twitter account in order to create a Twitter App. Browse to `https://apps.twitter.com/` and click `Create New App` to start the process

**Update docker-compose.yml**

After creating the app, update `docker-compose.yml` with credentials associated with the app you just created (check the `Keys and Access Tokens` tab)

* `TWITTER_CONSUMER_KEY` - populate this with the value in `Consumer Key (API Key)` field on the app details page
* `TWITTER_CONSUMER_SECRET` - populate this with the value in `Consumer Secret (API Secret)` field on the app details page
* `TWITTER_ACCESS_TOKEN` - populate this with the value in `Access Token` field on the app details page
* `TWITTER_ACCESS_TOKEN_SECRET` - populate this with the value in `Access Token Secret` field on the app details page

To start the services, run `docker-compose up --build`; use `docker-compose down -v` to stop them

> Replace `DOCKER_IP` with the IP address of your Docker instance, which you can obtain using `docker-machine ip`. On Linux or Mac, this might be `localhost`. The port (`8080` or `8081` in this case) is the one specified in `docker-compose.yml`

**Start the tweet ingestion service**

`curl http://DOCKER_IP:8080/tweets/producer`

> to stop the service, use `curl -X DELETE http://DOCKER_IP:8080/tweets/producer`

The ingestion service should now start consuming relevant tweets (as per the keywords specified by the `TWITTER_TRACKED_TERMS` environment variable). You can check Redis to confirm that the consumer service has created the `HASH`es and `SET`s

**Execute queries using stats service**

As the tweets continue to flow and get processed, you can query the stats service to get tweet data for relevant keywords and dates

* Get all tweets with a keyword - e.g. to search for tweets with the keyword `redis`, execute `curl "http://DOCKER_IP:8081/tweets?keywords=redis"`. You should receive an HTTP `200` response along with the JSON payload (if there are tweets which contain the keyword)

  ```
    {
        "tweets": [
            {
                "tweeter": "tosadvsr",
                "tweet": "@gregyoung do you use something to keep a state as your events come in for quick reports access, like stats.d or a redis abstraction? would love to dm if you have a free moment please inbox.",
                "created_date": "19-06-2018",
                "tweet_id": "1009098347016269825",
                 "terms": [
                    "redis"
                ]
            },
            {
                "tweeter": "totalcloudio",
                "tweet": "RT @awswhatsnew: Amazon ElastiCache for Redis announces support for Redis 4.0 with caching improvements and better memory management for hi…",
                "created_date": "19-06-2018",
                "tweet_id": "1009098444487589888",
                 "terms": [
                    "redis"
                ]
            },
            ...........
        ]
    }
  ```
* Get tweets with multiple keywords using the `OR` operator - e.g. to search for tweets with the keywords `java` **or** `database`, execute `curl "http://DOCKER_IP:8081/tweets?keywords=java,database&op=OR"` (quote the URL so that the shell does not interpret `&`). You should receive an HTTP `200` response along with the JSON payload (if there are tweets which contain the keywords)

  ```
        {
            "tweets": [
                {
                    "tweeter": "ciphertxt",
                    "tweet": "Maven: Deploy Java Apps to Azure with Tomcat on Linux https://t.co/GbSsNx0YWM #Azure",
                    "created_date": "19-06-2018",
                    "tweet_id": "1009105478985617408",
                     "terms": [
                        "java"
                    ]
                },
                {
                    "tweeter": "ITJobs_Az",
                    "tweet": "Sr. java Backend developer: Sr. java Backend developer Ref No.: 18-24863 Location: Tempe, Arizona Role Sr. JAVA Back End Developer Responsibilities – Hands-on skills in troubleshooting and debugging complex software – 5+ years of experience in designing… https://t.co/7YeVbLSTcR",
                    "created_date": "19-06-2018",
                    "tweet_id": "1009103998241001473",
                     "terms": [
                        "java"
                    ]
                },
                {
                    "tweeter": "armaninspace",
                    "tweet": "Artificial Intelligence Takes on Large-Scale Database Management #Artificial_Intelligence #Database https://t.co/wNZVRJtAOn",
                    "created_date": "19-06-2018",
                    "tweet_id": "1009109659083595781",
                     "terms": [
                        "database"
                    ]
                },
                .........
            ]
        }
  ```
* Get tweets with multiple keywords using the `AND` operator - e.g. to search for tweets with the keywords `java` **and** `database`, execute `curl "http://DOCKER_IP:8081/tweets?keywords=java,database&op=AND"`. You should receive an HTTP `200` response along with the JSON payload (if there are tweets which contain the keywords)

  ```
    {
        "tweets": [
            {
                "tweeter": "Internships_KE",
                "tweet": "RT @droid254: Cellulant is looking for PHP and Java developers Database : MySQL",
                "created_date": "19-06-2018",
                "tweet_id": "1009103084381929474",
                "hashtags": [
                    "database",
                    "java"
                ]
            },
            {
                "tweeter": "lewis_sawe",
                "tweet": "RT @droid254: Cellulant is looking for PHP and Java developers Database : MySQL",
                "created_date": "19-06-2018",
                "tweet_id": "1009105251306176512",
                "hashtags": [
                    "database",
                    "java"
                ]
            }
        ]
    }
  ```
* Get tweets for a specific keyword on a date - e.g. to search for tweets with the keyword `nosql` on 20 June, 2018, execute `curl "http://DOCKER_IP:8081/tweets?keywords=nosql&date=20-06-2018"`. You should receive an HTTP `200` response along with the JSON payload (if there are tweets on the specified date which contain the keyword)

```
    {
        "tweets": [
            {
                "tweeter": "Arieleit",
                "tweet": "RT @code__tutorials: Learn How Python Works with NoSql Database MongoDB: PyMongo\n\n☞ https://t.co/70CXrFtj38\n\n#python #MongoDB https://t.co/…",
                "created_date": "20-06-2018",
                "tweet_id": "1009253195275816960",
                 "terms": [
                    "database",
                    "nosql"
                ]
            },
            {
                "tweeter": "launchjobs",
                "tweet": "What makes NoSQL databases especially relevant today is that they are particularly well suited for working with large sets of distributed . #Tech #Data \nhttps://t.co/UL8nK4Gs7L",
                "created_date": "20-06-2018",
                "tweet_id": "1009253269116473345",
                 "terms": [
                    "nosql"
                ]
            },
            {
                "tweeter": "NoSQLDigest",
                "tweet": "RT @CastIrony: @thoward37 @mattly Depends which NoSQL db you're looking at of course, but generally aggregation (including complex grouped…",
                "created_date": "20-06-2018",
                "tweet_id": "1009253361604939777",
                 "terms": [
                    "nosql"
                ]
            },
            {
                "tweeter": "NoSQLDigest",
                "tweet": "RT @nycallday247: WHICH NOSQL DATASTORE?! ORACLE'S JOURNEY ASSESSING CASSANDRA, SCYLLA, REDIS...\nAaron Stockton, Principal Software Enginee…",
                "created_date": "20-06-2018",
                "tweet_id": "1009253413878525952",
                 "terms": [
                    "nosql"
                ]
            },
            {
                "tweeter": "NoSQLDigest",
                "tweet": "RT @RedisLabs: Calling all #Minneapolis Redis and NoSQL developers! Join #RedisLabs for a #REDWorkshop on June 21 and explore the full powe…",
                "created_date": "20-06-2018",
                "tweet_id": "1009253457792937986",
                 "terms": [
                    "redis",
                    "nosql"
                ]
            },
            {
                "tweeter": "NoSQLDigest",
                "tweet": "RT @BigDataBatman: What is the formula that makes Eclipse #JNoSQL a tool for polyglot persistence, Batman on the NoSQL world?… https://t.co…",
                "created_date": "20-06-2018",
                "tweet_id": "1009253623157571585",
                 "terms": [
                    "nosql"
                ]
            },
            {
                "tweeter": "NoSQLDigest",
                "tweet": "RT @mikegreiling: @jakecodes @postgresql right? no more need for nosql document-oriented DBs like mongodb when you can have the best of bot…",
                "created_date": "20-06-2018",
                "tweet_id": "1009253739998294016",
                 "terms": [
                    "nosql"
                ]
            },
            {
                "tweeter": "NoSQLDigest",
                "tweet": "RT @e4developer: @nicolas_frankel @springunidotcom I am not sure that I agree with relational databases being simpler than nosql (and nosql…",
                "created_date": "20-06-2018",
                "tweet_id": "1009259418909732864",
                 "terms": [
                    "nosql"
                ]
            },
            {
                "tweeter": "NoSQLDigest",
                "tweet": "RT @chuckcalio: @IBMPowerSystems \n  Running NoSQL and encountering high costs due to 'node sprawl' ? IBM Power Systems and #scylla can help…",
                "created_date": "20-06-2018",
                "tweet_id": "1009259635058982912",
                 "terms": [
                    "nosql"
                ]
            },
            {
                "tweeter": "NoSQLDigest",
                "tweet": "RT @Calista_Redmond: Extend your data architectures with @MongoDB Open Source based NoSQL data store @IBMAnalytics https://t.co/oG3j3l27m9",
                "created_date": "20-06-2018",
                "tweet_id": "1009259739027394560",
                 "terms": [
                    "nosql"
                ]
            },
            {
                "tweeter": "NoSQLDigest",
                "tweet": "RT @h_feddersen: Should all companies start to invest in NoSQL databases? https://t.co/VUp0Q8hzDn via @h_feddersen",
                "created_date": "20-06-2018",
                "tweet_id": "1009259917125955584",
                "terms": [
                    "nosql"
                ]
            }
        ]
    }
```

* Get tweets for multiple keywords on a date using the `OR` operator - e.g. to search for tweets with the keywords `java` **or** `database` on 20 June 2018, execute `curl "http://DOCKER_IP:8081/tweets?keywords=java,database&op=OR&date=20-06-2018"` (quote the URL, otherwise the shell interprets each `&` and cuts the query short). You should receive an HTTP `200` response along with the JSON payload (if there are tweets on the specified date which contain the keywords).

  ```
    {
        "tweets": [
            {
                "tweeter": "MakotoTheKnight",
                "tweet": "For context: NXT was released a long time ago (November '15) with initial Linux support.  A good start; an olive branch in the right direction given that Linux support had always kind of \"existed\" with the Java client, and it would be reasonable to see that continue.",
                "created_date": "20-06-2018",
                "tweet_id": "1009249404107157509",
                "terms": [
                    "java"
                ]
            },
            {
                "tweeter": "stetayen",
                "tweet": "RT @shani_o: One platform (LinkedIn) created a  scrapable database of people's work and education histories while two other platforms (Gith…",
                "created_date": "20-06-2018",
                "tweet_id": "1009249413490003968",
                "terms": [
                    "database"
                ]
            },
            ......
        ]
    }
  ```
* Get tweets for multiple keywords on a date using the `AND` operator - e.g. to search for tweets with the keywords `java` **and** `database` on 19 June 2018, execute `curl "http://DOCKER_IP:8081/tweets?keywords=java,database&op=AND&date=19-06-2018"` (again, quote the URL so the shell does not interpret the `&` characters). You should receive an HTTP `200` response along with the JSON payload (if there are tweets on the specified date which contain the keywords).

  ```
    {
        "tweets": [
            {
                "tweeter": "Internships_KE",
                "tweet": "RT @droid254: Cellulant is looking for PHP and Java developers Database : MySQL",
                "created_date": "19-06-2018",
                "tweet_id": "1009103084381929474",
                "hashtags": [
                    "database",
                    "java"
                ]
            },
            {
                "tweeter": "lewis_sawe",
                "tweet": "RT @droid254: Cellulant is looking for PHP and Java developers Database : MySQL",
                "created_date": "19-06-2018",
                "tweet_id": "1009105251306176512",
                "hashtags": [
                    "database",
                    "java"
                ]
            },
            .........
        ]
    }
  ```

**Scale out**

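To increase the number of `tweets-consumer` instances (e.g. from 1 to 2), use `docker-compose scale tweets-consumer=2`. On Docker Compose releases where the `scale` command is deprecated, the equivalent is `docker-compose up -d --scale tweets-consumer=2`.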

An additional container will start, and the tweet processing workload will then be distributed between the two instances. To confirm this, check the logs of the `tweets-consumer` service in isolation with `docker-compose logs tweets-consumer`. You should see logs from the `tweets-consumer_2` container (created by the scale-out) in addition to those from `tweets-consumer_1`.
