# Extending Redis with Redis Modules

This chapter covers **Redis Modules**

* Starts off with a brief introduction to Redis Modules
* Explores the [ReBloom module](https://github.com/RedisLabsModules/rebloom), which provides a Bloom Filter as a native Redis data structure

## Redis Modules 101

Redis Modules are components that allow us to extend the Redis feature set without changing the core source code. Module libraries need to be loaded into Redis. There are a few options

* (preferred way) use the `loadmodule` configuration directive in `redis.conf` e.g. `loadmodule custommodule.so`
* start Redis server with the `loadmodule` argument e.g. `redis-server --loadmodule custommodule.so`
* you can also use the `MODULE LOAD` command to achieve the same result at runtime e.g. `MODULE LOAD custommodule.so`

For example, in order to build the **ReJSON** module you can

* get the source code - `git clone https://github.com/RedisLabsModules/rejson.git`
* build (using `make`) the module - resulting in `rejson.so`
* and then *load* the module using one of the methods mentioned above

> The list of available (open source) Redis modules can be [found here](https://redis.io/modules)

## Bloom Filter

It is a *probabilistic* data structure. Simply put, it can be used to check whether an element is present in a set or not. It is probabilistic because, although it never gives a *false negative*, it might return a *false positive*, i.e. a Bloom Filter will tell you about the absence of an element with *100% accuracy*, but there is a small (configurable) probability of error when it reports that an element is present. It is both fast and memory efficient

> it is possible to tune the probability of getting a false positive by changing the *size* of the Bloom filter (the *more* space we allocate, the *lower* the likelihood of false positives)
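The size/accuracy trade-off can be estimated with the textbook Bloom filter formulas. The sketch below is illustrative only: the function names are made up for this example, and ReBloom's internal sizing may differ. For `n` expected elements and a target false-positive probability `p`, it computes the number of bits `m = -n·ln(p) / (ln 2)²`, the number of hash functions `k = (m/n)·ln 2`, and the resulting estimate `p ≈ (1 - e^(-kn/m))^k`.

```go
package main

import (
	"fmt"
	"math"
)

// optimalBits estimates the number of bits m needed to hold n elements
// with a target false-positive probability p: m = -n*ln(p) / (ln 2)^2.
func optimalBits(n int, p float64) int {
	return int(math.Ceil(-float64(n) * math.Log(p) / (math.Ln2 * math.Ln2)))
}

// optimalHashes estimates the number of hash functions k for m bits and
// n elements: k = (m/n) * ln 2, rounded to the nearest integer (at least 1).
func optimalHashes(m, n int) int {
	k := int(math.Round(float64(m) / float64(n) * math.Ln2))
	if k < 1 {
		return 1
	}
	return k
}

// falsePositiveRate approximates p = (1 - e^(-k*n/m))^k.
func falsePositiveRate(m, n, k int) float64 {
	return math.Pow(1-math.Exp(-float64(k)*float64(n)/float64(m)), float64(k))
}

func main() {
	n := 1000
	for _, p := range []float64{0.1, 0.01, 0.001} {
		m := optimalBits(n, p)
		k := optimalHashes(m, n)
		fmt.Printf("target p=%.3f -> bits=%d hashes=%d estimated p=%.4f\n",
			p, m, k, falsePositiveRate(m, n, k))
	}
}
```

Each tenfold reduction in the target error rate costs roughly 4.8 extra bits per element, which is why a Bloom filter stays compact even at low error rates.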

Its two fundamental operations are

* adding an element
* checking if an element exists

> it's not possible to remove an element from a Bloom Filter
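To make the two operations concrete, here is a toy in-memory Bloom filter. It is a minimal sketch for illustration, not how ReBloom is implemented: the `bloomFilter` type, its methods, and the FNV-based double hashing are all choices made for this example.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// bloomFilter is a toy, in-memory Bloom filter (illustrative only).
type bloomFilter struct {
	bits   []bool
	hashes int
}

func newBloomFilter(size, hashes int) *bloomFilter {
	return &bloomFilter{bits: make([]bool, size), hashes: hashes}
}

// positions derives the bit positions for an item using double hashing:
// position_i = (h1 + i*h2) mod size.
func (b *bloomFilter) positions(item string) []int {
	h := fnv.New64a()
	h.Write([]byte(item))
	sum := h.Sum64()
	h1 := uint32(sum)
	h2 := uint32(sum>>32) | 1 // force odd so that the step is never zero
	size := uint32(len(b.bits))
	pos := make([]int, b.hashes)
	for i := 0; i < b.hashes; i++ {
		pos[i] = int((h1 + uint32(i)*h2) % size)
	}
	return pos
}

// Add sets every bit position derived from the item.
func (b *bloomFilter) Add(item string) {
	for _, p := range b.positions(item) {
		b.bits[p] = true
	}
}

// Exists reports whether the item may have been added.
func (b *bloomFilter) Exists(item string) bool {
	for _, p := range b.positions(item) {
		if !b.bits[p] {
			return false // at least one bit is unset: definitely never added
		}
	}
	return true // all bits set: probably added (could be a false positive)
}

func main() {
	bf := newBloomFilter(1024, 3)
	bf.Add("https://example.com/article-1")
	fmt.Println(bf.Exists("https://example.com/article-1")) // true: no false negatives
	fmt.Println(bf.Exists("https://example.com/article-2")) // usually false; a false positive is possible
}
```

Because different items can share bits, clearing the bits of one item could silently remove others, which is why a plain Bloom filter does not support deletion.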

Some of its real world uses include

* reduce expensive disk access e.g. [Apache Cassandra](http://cassandra.apache.org/doc/latest/operating/bloom_filters.html)
* reduce expensive network lookups e.g. [Google Chrome browser](http://blog.alexyakunin.com/2010/03/nice-bloom-filter-application.html)
* recommendation engines e.g. [Medium](https://blog.medium.com/what-are-bloom-filters-1ec2a50c68ff) (our scenario is inspired by this)
* etc...

> [This](https://en.wikipedia.org/wiki/Bloom_filter) contains a bunch of useful information about Bloom Filters

## Articles recommendation service using **ReBloom**

[ReBloom module](https://github.com/RedisLabsModules/rebloom) provides the capabilities of a Bloom Filter via a native Redis data structure with the help of [multiple commands](https://github.com/RedisLabsModules/rebloom/blob/master/docs/Bloom_Commands.md) which it exposes e.g. `BF.ADD`, `BF.EXISTS` etc.

The scenario we will use to explore the ReBloom module is that of a recommendation engine

* The user will get article recommendations (based on their interests) when they visit `http://app-url:port/articles/`
* User can read an article from the list of presented recommendations (URLs) - `http://app-url:port/article/?url=https://towardsdatascience.com/data-science-for-startups-data-pipelines-786f6746a59a`
* Rinse and repeat......

The key is to make sure that the recommendations are not repeated i.e. if a user has read a recommended article, it should not be recommended again - this is where a Bloom Filter comes in. The end result is that our system will **never** recommend an article which has been read, although it **might miss** a recommendation (in the rare case of a false positive) by wrongly assuming that it has already been read

### Technical stack

* [go-redis](https://github.com/go-redis/redis) as the Redis Go client
* [Gorilla mux package](https://github.com/gorilla/mux) for REST endpoints
* [Docker](https://www.docker.com/)
  * [pre-built Docker image](https://hub.docker.com/r/redislabs/rebloom/) for the ReBloom module
  * [Docker Compose](https://docs.docker.com/compose/) to run the solution with a single command

### Schema

Here is a quick overview of the application specific entities and the Redis data structures they map to

* `topic:[topic-name]:articles` - various topics e.g. programming, travel etc. and (URLs of) articles associated with them are stored in a Redis `SET` e.g. `topic:creativity:articles`
* `user:[user-name]:interests` - the bunch of topics which a user has opted in for is also represented as a `SET` e.g. `user:john:interests`
* `RecommendationHits:[user-name]` - the ReBloom filter for driving recommendations for a specific user
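The naming scheme above can be captured in a few small helpers. This is only a sketch: the function names are hypothetical and not taken from the project's source, and the filter key follows the `RecommendationHits:[user-name]` pattern listed in this section.

```go
package main

import "fmt"

// topicArticlesKey returns the name of the SET holding article URLs for a topic.
func topicArticlesKey(topic string) string {
	return "topic:" + topic + ":articles"
}

// userInterestsKey returns the name of the SET holding the topics a user follows.
func userInterestsKey(user string) string {
	return "user:" + user + ":interests"
}

// recommendationFilterKey returns the name of the per-user Bloom Filter.
func recommendationFilterKey(user string) string {
	return "RecommendationHits:" + user
}

func main() {
	fmt.Println(topicArticlesKey("creativity"))  // topic:creativity:articles
	fmt.Println(userInterestsKey("john"))        // user:john:interests
	fmt.Println(recommendationFilterKey("john")) // RecommendationHits:john
}
```

Keeping key construction in one place avoids subtle mismatches between the code that writes a key and the code that reads it.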

## Implementation details

> Source code available on [Github](https://github.com/abhirockzz/practical-redis/tree/master/redis-modules/rebloom)

### Data load

As mentioned above, our sample data set consists of topics and user interests. Loading it is exposed via a REST API, which invokes the `SADD` command to get the job done

> the articles are actual entries on [Medium.com](https://medium.com), but the data is simulated as well as scaled down for ease of demonstration and to keep the focus on the concepts

```
func LoadArticlesForTopics(redisCoordinate string) {

    client := redis.NewClient(&redis.Options{Addr: redisCoordinate})
    defer client.Close()

    //add software engineering articles

    client.SAdd("topic:softwareengineering:articles", "https://medium.com/@anildash/what-if-javascript-wins-84898e5341a")
    client.SAdd("topic:softwareengineering:articles", "https://hackernoon.com/the-7-biggest-lessons-ive-learned-by-building-a-twitter-bot-59fee84a9ed9")
    client.SAdd("topic:softwareengineering:articles", "https://towardsdatascience.com/data-science-for-startups-data-pipelines-786f6746a59a")
    client.SAdd("topic:softwareengineering:articles", "https://towardsdatascience.com/universal-language-model-to-boost-your-nlp-models-d59469dcbd64")
    client.SAdd("topic:softwareengineering:articles", "https://towardsdatascience.com/designing-an-iot-solution-in-2018-7fe1356e63d6")

    ........
}
```

User interests are also populated into a Redis `SET`

* we are adding 5 users (`user-1` to `user-5`) - you can change the outer loop to add more
* randomly adding interests for each user from a pool of 5 topics

  ```
  func LoadUserInterests(redisCoordinate string) {
      topics := []string{"softwareengineering", "creativity", "programming", "productivity", "travel"}

      client := redis.NewClient(&redis.Options{Addr: redisCoordinate})
      defer client.Close()

      rand.Seed(50)

      for i := 1; i <= 5; i++ {
          setName := "user:user-" + strconv.Itoa(i) + ":interests"

          //try to add (max) 5 interests per user. not all might be added because
          //we are at the mercy of the random generator
          for c := 0; c < 5; c++ {
              topic := topics[rand.Intn(len(topics))]
              result, _ := client.SAdd(setName, topic).Result()
              if result > 0 {
                  fmt.Println("added topic " + topic + " to set " + setName)
              }
          }
      }
  }
  ```

### Generating article recommendations

This is the meat of our app. Here is the simple process for recommending articles to a specific user

* Look at all the interests of the user. This is a simple `SET` lookup using `SMEMBERS` e.g. for `user-1` it is `SMEMBERS user:user-1:interests`
* Get all the articles for the topics which the user is interested in. This is another `SMEMBERS` query e.g. if `user-1` is interested in `programming` and `travel`, we check the following `SET`s for articles - `topic:programming:articles` and `topic:travel:articles`
* Combine entries from the above `SET`s using `SUNION` and the resulting entries in the `SET` are the bunch of recommended articles

Hold on, there is more. These are just raw recommendations. Recall our original goal with regard to recommendations

> the key is to make sure that they are not repeated i.e. if a user has read a recommended article, it should not be recommended again

To make this work

* each of the raw results from the previous step is cross-checked in the Bloom Filter using `BF.EXISTS`
* if it exists, it's not included in the final set of recommendations

That's all! Fairly simple/primitive, but gets the job done. The true value will be evident when there are hundreds of topics each with 1000s of articles along with millions of users with lots of interests
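The whole flow can be sketched without a Redis server by using in-memory stand-ins. In this illustrative example (not the project's code), the `store` map plays the role of the Redis `SET`s, `union` mimics `SUNION`, and the `alreadyRead` callback stands in for the `BF.EXISTS` check. All names and URLs here are made up for the sketch.

```go
package main

import (
	"fmt"
	"sort"
)

// store is an in-memory stand-in for the Redis SETs described in the schema.
type store map[string][]string

// union mimics SUNION: it returns the distinct members of all the given sets.
func (s store) union(keys ...string) []string {
	seen := map[string]bool{}
	var out []string
	for _, k := range keys {
		for _, m := range s[k] {
			if !seen[m] {
				seen[m] = true
				out = append(out, m)
			}
		}
	}
	sort.Strings(out) // deterministic order for the example; Redis sets are unordered
	return out
}

// recommend follows the steps above. alreadyRead stands in for BF.EXISTS.
func recommend(s store, user string, alreadyRead func(user, article string) bool) []string {
	var topicKeys []string
	for _, topic := range s["user:"+user+":interests"] { // SMEMBERS equivalent
		topicKeys = append(topicKeys, "topic:"+topic+":articles")
	}
	var final []string
	for _, article := range s.union(topicKeys...) { // SUNION equivalent
		if !alreadyRead(user, article) {
			final = append(final, article)
		}
	}
	return final
}

func main() {
	s := store{
		"user:user-1:interests":      {"programming", "travel"},
		"topic:programming:articles": {"https://example.com/go", "https://example.com/redis"},
		"topic:travel:articles":      {"https://example.com/lisbon", "https://example.com/redis"},
	}
	read := map[string]bool{"user-1|https://example.com/redis": true}
	alreadyRead := func(user, article string) bool { return read[user+"|"+article] }

	fmt.Println(recommend(s, "user-1", alreadyRead))
	// [https://example.com/go https://example.com/lisbon]
}
```

The article that appears under both topics is returned only once by the union, and it is then dropped because it has already been read.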

The raw recommendations are based on `SUNION` alone (without the Bloom Filter)

```
func (recoUtil *RecommendationUtil) genRawArticleRecommendations(user string) []string {
    userInterestsSet := "user:" + user + ":interests"

    members, _ := recoUtil.redisClient.SMembers(userInterestsSet).Result()

    var recoSetArr []string
    for i := 0; i < len(members); i++ {
        topicSetName := "topic:" + members[i] + ":articles"
        recoSetArr = append(recoSetArr, topicSetName)
    }

    recos, _ := recoUtil.redisClient.SUnion(recoSetArr...).Result()
    return recos
}
```

Fine-grained recommendations combine `SUNION` and `BF.EXISTS` to avoid recommending articles that have already been read

```
func (recoUtil *RecommendationUtil) GenArticleRecommendations(user string) []string {
    //if bloom filter contains a reco, do not include in final reco
    rawRecos := recoUtil.genRawArticleRecommendations(user)

    var finalRecos []string
    for i := 0; i < len(rawRecos); i++ {
        if recoUtil.isArticleAlreadyReadByUser(user, rawRecos[i]) == 0 { //has NOT been read for SURE
            finalRecos = append(finalRecos, rawRecos[i])
        } else {
            fmt.Println("article " + rawRecos[i] + " has already been read")
        }
    }
    return finalRecos
}
```

> The actual recommendation feature is exposed via a REST API

```
func getRecommendedArticles(resp http.ResponseWriter, req *http.Request) {
    user := mux.Vars(req)["user"]
    recoUtil := reco.NewRecommendationUtil(redisCoordinate)
    defer recoUtil.CloseConn()

    recommendedArticles := recoUtil.GenArticleRecommendations(user)
    resp.Header().Set("Content-Type", "application/json")
    json.NewEncoder(resp).Encode(recommendedArticles)
}
```

### Accessing a recommended article

Once you see a list of recommended articles, you can access them via another REST API (do this in your browser) e.g. `http://192.168.99.100:8080/article/user-1/?url=https://medium.com/swlh/how-to-make-something-people-love-a8364771b7e6`

The key thing to note is that once a recommended article is accessed, it is added to a user-specific Bloom Filter e.g. `RecommendationHits-user-1`. As mentioned above, the recommendation process checks this Bloom Filter for each candidate article: if the article is absent from the filter, it has definitely not been read/accessed and can therefore be recommended

## Docker setup

The `docker-compose.yml` defines the `rebloom-redis` and `articles-recommendation-service` services

```
version: '3'
services:
    rebloom-redis:
        image: redislabs/rebloom
        container_name: rebloom-redis
        ports:
            - '6379:6379'
    articles-recommendation-service:
        build: .
        environment:
            - REDIS_HOST=rebloom-redis
            - REDIS_PORT=6379
            - PORT=9090
        ports:
            - '8080:9090'
        depends_on:
            - rebloom-redis
```

The `rebloom-redis` service is based on the [ReBloom image from Docker Hub](https://hub.docker.com/r/redislabs/rebloom/) and the `articles-recommendation-service` is built using the Dockerfile below

```
FROM golang:alpine as build-stage
WORKDIR /go/
RUN apk --no-cache add ca-certificates git
RUN go get -u github.com/go-redis/redis && go get -u github.com/gorilla/mux
COPY src/ /go/src
RUN cd /go/src && CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -o articles-recommendation-service

FROM scratch
COPY --from=build-stage /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
COPY --from=build-stage /go/src/ /
CMD ["/articles-recommendation-service"]
```

A [multi-stage build process](https://docs.docker.com/develop/develop-images/multistage-build/) is used, wherein one image is used for building our Go app and a different one serves as the base for running it. The `golang` [Dockerhub image](https://hub.docker.com/_/golang/) is used for the build process, which results in a single binary (for Linux). Since the binary has all its dependencies packed in, all we need is a minimal image for running it, and thus we use the lightweight `scratch` [image](https://hub.docker.com/_/scratch/) for this purpose

## Test drive

* Install [curl](https://curl.haxx.se/), [Postman](https://www.getpostman.com/) or any other HTTP tool to interact with the REST endpoints of the service
* Get the project - `git clone https://github.com/abhirockzz/practical-redis.git`
* `cd practical-redis/redis-modules/rebloom`
* Invoke the startup script `./run.sh` (this in turn invokes `docker-compose` commands)
* Stop the application by invoking `./stop.sh` from another terminal

> Replace `DOCKER_IP` with the IP address of your Docker instance, which you can obtain using `docker-machine ip`. The port (`8080` in this case) is the one specified in `docker-compose.yml`

**Load test data**

`curl http://DOCKER_IP:8080/load/`

You should see an HTTP `200` status response. Check Redis (using `redis-cli`) to ensure all the data has been seeded - `KEYS *`

**Get recommended articles for a user**

`curl http://DOCKER_IP:8080/<user>/articles/` e.g. `curl http://192.168.99.100:8080/user-2/articles/`

You should see a JSON (array) response (similar to below)

```
[
"https://medium.com/swlh/how-to-make-something-people-love-a8364771b7e6",
"https://towardsdatascience.com/designing-an-iot-solution-in-2018-7fe1356e63d6",
"https://towardsdatascience.com/unsupervised-learning-with-python-173c51dc7f03",
"https://towardsdatascience.com/universal-language-model-to-boost-your-nlp-models-d59469dcbd64",
"https://blog.prototypr.io/growing-an-idea-from-an-interest-to-a-product-a0757b415bbb",
"https://medium.com/@michaelpollan/medium-com-trips-aed86f968810",
"https://hackernoon.com/the-7-biggest-lessons-ive-learned-by-building-a-twitter-bot-59fee84a9ed9",
"https://medium.com/@jrodthoughts/using-deep-learning-to-understand-your-source-code-28e5c284bfda",
"https://medium.com/@evheniybystrov/react-redux-for-lazy-developers-b551f16a456f",
"https://medium.com/@anildash/what-if-javascript-wins-84898e5341a",
"https://towardsdatascience.com/data-science-for-startups-data-pipelines-786f6746a59a",
"https://medium.com/hackerpreneur-magazine/how-i-hacked-into-one-of-the-most-popular-dating-websites-4cb7907c3796",
"https://medium.com/personal-growth/walt-disney-how-to-truly-love-what-you-do-f3449c78ca65",
"https://medium.com/@shauntagrimes/challenge-yourself-to-learn-from-masters-3f99064e0f2e",
"https://medium.com/sololearn/warning-your-programming-career-b9579b3a878b"
]
```

**Read/access recommended articles**

Access the following URL (preferably using your browser) `http://DOCKER_IP:8080/<user>/article/?url=<one of the recommended article URLs>` e.g. `http://192.168.99.100:8080/user-1/article/?url=https://towardsdatascience.com/data-science-for-startups-data-pipelines-786f6746a59a`. You should see the article in your browser

**Check recommendations for the same user**

`curl http://DOCKER_IP:8080/<user>/articles/` e.g. `curl http://192.168.99.100:8080/user-1/articles/` (use the same user as in the previous step). You should see fewer recommendations (depending on how many articles you read/accessed in the previous step)

