Extending Redis with Redis Modules

This chapter covers Redis Modules. It

  • starts off with a brief introduction to Redis Modules

  • explores the ReBloom module, which provides a Bloom Filter as a native Redis data structure

Redis Modules 101

Redis Modules are components that allow users to extend the Redis feature set without changing the core source code. Module libraries need to be loaded into Redis, and there are a few ways to do this

  • (preferred way) use the loadmodule configuration directive in redis.conf e.g. loadmodule custommodule.so

  • start Redis server with the loadmodule argument e.g. redis-server --loadmodule custommodule.so

  • you can also use the MODULE LOAD command to achieve the same result at runtime e.g. MODULE LOAD custommodule.so

For example, in order to build the ReJSON module you can

  • get the source code - git clone https://github.com/RedisLabsModules/rejson.git

  • build (using make) the module - resulting in rejson.so

  • load the module using any of the methods mentioned above

The list of available (open source) Redis modules can be found here

Bloom Filter

A Bloom Filter is a probabilistic data structure. Simply put, it is used to check whether an element is present in a set. It is probabilistic because its answers are asymmetric: it never gives a false negative, but it might return a false positive. In other words, when a Bloom Filter reports that an element is absent, that answer is always correct, but when it reports that an element is present, there is a small (and configurable) probability that the answer is wrong. It is both fast and memory efficient

It is possible to tune the probability of getting a false positive by changing the size of the Bloom Filter (the more space we allocate, the lower the likelihood of false positives)
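To make the size trade-off concrete, here is a rough Go sketch of the textbook sizing formulas for a standard Bloom filter: for n expected elements and a target false positive probability p, the suggested size is m = -n ln(p) / (ln 2)^2 bits, with k = (m/n) ln 2 hash functions. The function name bloomParams is made up for illustration, and ReBloom may well size its filters differently internally.

```go
package main

import (
	"fmt"
	"math"
)

// bloomParams returns the bit-array size m and the number of hash functions k
// suggested by the standard formulas for n expected elements and a target
// false positive probability p (0 < p < 1). Hypothetical helper, for illustration only.
func bloomParams(n int, p float64) (m int, k int) {
	bits := math.Ceil(-float64(n) * math.Log(p) / (math.Ln2 * math.Ln2))
	hashes := math.Round(bits / float64(n) * math.Ln2)
	if hashes < 1 {
		hashes = 1 // always probe at least one bit
	}
	return int(bits), int(hashes)
}

func main() {
	// a lower target probability requires more bits for the same 1000 elements
	for _, p := range []float64{0.1, 0.01, 0.001} {
		m, k := bloomParams(1000, p)
		fmt.Printf("p=%.3f -> %d bits, %d hash functions\n", p, m, k)
	}
}
```

Running it for a fixed number of elements shows the required space growing as the target probability shrinks, which is the trade-off described above.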

Two of its fundamental operations include

  • adding an element

  • checking if an element exists

Note that it is not possible to remove an element from a standard Bloom Filter
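The two operations can be illustrated with a toy, in-memory Bloom filter in Go. This is only a sketch of the idea and not how ReBloom is implemented: the type and method names, the FNV-based double hashing and the sizes used in main are all assumptions chosen for brevity.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// bloomFilter is a toy filter: a bit array probed at k positions per element.
type bloomFilter struct {
	bits []bool
	k    int
}

func newBloomFilter(m, k int) *bloomFilter {
	return &bloomFilter{bits: make([]bool, m), k: k}
}

// positions derives k bit positions from two FNV hashes (double hashing).
func (b *bloomFilter) positions(item string) []int {
	h1 := fnv.New64a()
	h1.Write([]byte(item))
	base := h1.Sum64()

	h2 := fnv.New64()
	h2.Write([]byte(item))
	step := h2.Sum64() | 1 // keep the step non-zero

	m := uint64(len(b.bits))
	pos := make([]int, b.k)
	for i := range pos {
		pos[i] = int((base + uint64(i)*step) % m)
	}
	return pos
}

// Add sets the bits for item. Bits are never cleared, which is why removal is not supported.
func (b *bloomFilter) Add(item string) {
	for _, p := range b.positions(item) {
		b.bits[p] = true
	}
}

// MightContain returns false only when item was definitely never added.
// A true result can be a false positive.
func (b *bloomFilter) MightContain(item string) bool {
	for _, p := range b.positions(item) {
		if !b.bits[p] {
			return false
		}
	}
	return true
}

func main() {
	f := newBloomFilter(1024, 3)
	f.Add("https://example.com/article-1")
	fmt.Println(f.MightContain("https://example.com/article-1")) // added elements are always reported
	fmt.Println(f.MightContain("https://example.com/article-2")) // false unless a false positive occurs
}
```

Because different elements share bits, clearing the bits of one element could silently erase others, which is the reason removal is not offered.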

Some of its real world uses include

This contains a lot of useful information about Bloom Filters

Articles recommendation service using ReBloom

The ReBloom module provides the capabilities of a Bloom Filter as a native Redis data structure and exposes multiple commands to work with it e.g. BF.ADD, BF.EXISTS etc.
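As a rough illustration of how these two commands behave, the Go sketch below wraps them behind a small interface. BF.ADD replies 1 when the item is newly added and 0 when it may already have been present; BF.EXISTS replies 1 when the item may be present and 0 when it is definitely absent. The rawClient interface, the markRead and mightHaveRead helpers and the fakeClient are all hypothetical: a real implementation would wrap whatever generic command call your Redis client library offers, and the fake here uses an exact set, so it never produces false positives.

```go
package main

import (
	"errors"
	"fmt"
)

// rawClient is a hypothetical, minimal view of a Redis client:
// it sends a raw command and returns the integer reply.
type rawClient interface {
	Do(args ...interface{}) (int64, error)
}

// markRead issues BF.ADD and reports whether the item was newly added.
func markRead(c rawClient, key, url string) (bool, error) {
	n, err := c.Do("BF.ADD", key, url)
	return n == 1, err
}

// mightHaveRead issues BF.EXISTS; false means "definitely not in the filter".
func mightHaveRead(c rawClient, key, url string) (bool, error) {
	n, err := c.Do("BF.EXISTS", key, url)
	return n == 1, err
}

// fakeClient mimics the reply values of the two commands in memory (assumption: no false positives).
type fakeClient struct {
	sets map[string]map[string]bool
}

func newFakeClient() *fakeClient {
	return &fakeClient{sets: map[string]map[string]bool{}}
}

func (f *fakeClient) Do(args ...interface{}) (int64, error) {
	if len(args) != 3 {
		return 0, errors.New("expected command, key and item")
	}
	cmd, _ := args[0].(string)
	key, _ := args[1].(string)
	item, _ := args[2].(string)
	switch cmd {
	case "BF.ADD":
		if f.sets[key] == nil {
			f.sets[key] = map[string]bool{} // the filter is created on first use
		}
		if f.sets[key][item] {
			return 0, nil
		}
		f.sets[key][item] = true
		return 1, nil
	case "BF.EXISTS":
		if f.sets[key][item] {
			return 1, nil
		}
		return 0, nil
	}
	return 0, errors.New("unsupported command: " + cmd)
}

func main() {
	c := newFakeClient()
	added, _ := markRead(c, "RecommendationHits:user-1", "https://example.com/a")
	seen, _ := mightHaveRead(c, "RecommendationHits:user-1", "https://example.com/a")
	fmt.Println(added, seen)
}
```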

The scenario we will use to explore the ReBloom module is that of a recommendation engine

  • The user will get article recommendations (based on their interests) when they visit http://app-url:port/articles/

  • User can read an article from the list of presented recommendations (URLs) - http://app-url:port/article/?url=https://towardsdatascience.com/data-science-for-startups-data-pipelines-786f6746a59a

  • Rinse and repeat......

The key is to make sure that the recommendations are not repeated i.e. if a user has read a recommended article, it should not be recommended again. This is where a Bloom Filter comes in. The end result is that our system will never recommend an article which has already been read, although it might occasionally skip an unread article (in the rare case of a false positive) on the assumption that it has already been read

Technical stack

Schema

Here is a quick overview of the application specific entities and the Redis data structures they map to

  • topic:[topic-name]:articles - various topics e.g. programming, travel etc. and (URLs of) articles associated with them are stored in a Redis SET e.g. topic:creativity:articles

  • user:[user-name]:interests - the bunch of topics which a user has opted in for is also represented as a SET e.g. user:john:interests

  • RecommendationHits:[user-name] - the ReBloom filter for driving recommendations for a specific user
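The naming scheme above can be captured in a few small Go helpers. This is just a sketch: the function names are hypothetical and are not taken from the sample application, and the key formats follow the schema list above.

```go
package main

import "fmt"

// topicArticlesKey names the SET holding article URLs for a topic.
func topicArticlesKey(topic string) string {
	return "topic:" + topic + ":articles"
}

// userInterestsKey names the SET holding the topics a user has opted in for.
func userInterestsKey(user string) string {
	return "user:" + user + ":interests"
}

// recommendationHitsKey names the per-user Bloom Filter of read articles.
func recommendationHitsKey(user string) string {
	return "RecommendationHits:" + user
}

func main() {
	fmt.Println(topicArticlesKey("creativity"))
	fmt.Println(userInterestsKey("john"))
	fmt.Println(recommendationHitsKey("john"))
}
```

Keeping key construction in one place makes it harder for the data load and the recommendation code to drift apart on naming.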

Implementation details

Source code available on Github

Data load

As mentioned above, our sample data set consists of topics and user interests. The data load is exposed via a REST API, which invokes the SADD command to get the job done

The articles are actual entries on Medium.com, but the data is simulated and scaled down for ease of demonstration and to keep the focus on the concepts

func LoadArticlesForTopics(redisCoordinate string) {
    client := redis.NewClient(&redis.Options{Addr: redisCoordinate})
    defer client.Close()
    //add software engineering articles
    client.SAdd("topic:softwareengineering:articles", "https://medium.com/@anildash/what-if-javascript-wins-84898e5341a")
    client.SAdd("topic:softwareengineering:articles", "https://hackernoon.com/the-7-biggest-lessons-ive-learned-by-building-a-twitter-bot-59fee84a9ed9")
    client.SAdd("topic:softwareengineering:articles", "https://towardsdatascience.com/data-science-for-startups-data-pipelines-786f6746a59a")
    client.SAdd("topic:softwareengineering:articles", "https://towardsdatascience.com/universal-language-model-to-boost-your-nlp-models-d59469dcbd64")
    client.SAdd("topic:softwareengineering:articles", "https://towardsdatascience.com/designing-an-iot-solution-in-2018-7fe1356e63d6")
    // ........ (remaining articles and topics are elided here)
}

User interests are also populated into a Redis SET

  • we are adding 5 users (user-1 to user-5) - you can change the outer loop to add more

  • randomly adding interests for each user from a pool of 5 topics

    func LoadUserInterests(redisCoordinate string) {
        topics := []string{"softwareengineering", "creativity", "programming", "productivity", "travel"}
        client := redis.NewClient(&redis.Options{Addr: redisCoordinate})
        defer client.Close()
        rand.Seed(50)
        for i := 1; i <= 5; i++ {
            setName := "user:user-" + strconv.Itoa(i) + ":interests"
            //try to add (max) 5 interests per user. not all might be added because
            //we are at the mercy of the random generator
            for c := 0; c < 5; c++ {
                topic := topics[rand.Intn(len(topics))]
                result, _ := client.SAdd(setName, topic).Result()
                if result > 0 {
                    fmt.Println("added topic " + topic + " to set " + setName)
                }
            }
        }
    }

Generating article recommendations

This is the meat of our app. Here is the simple process for recommending articles to a specific user

  • Look at all the interests of the user. This is a simple SET lookup using SMEMBERS e.g. for user-1 it is SMEMBERS user:user-1:interests

  • Get all the articles for the topics which the user is interested in. This is another SMEMBERS query e.g. if user-1 is interested in programming and travel, we check the following SETs for articles - topic:programming:articles and topic:travel:articles

  • Combine entries from the above SETs using SUNION and the resulting entries in the SET are the bunch of recommended articles

Hold on, there is more. These are just raw recommendations. Recall, our original goal with regards to recommendations

the key is to make sure that they are not repeated i.e. if a user has read a recommended article, it should not be recommended again

To make this work

  • each of the raw results from the previous step is cross-checked in the Bloom Filter using BF.EXISTS

  • if it exists, it's not included in the final set of recommendations

That's all! Fairly simple/primitive, but gets the job done. The true value will be evident when there are hundreds of topics each with 1000s of articles along with millions of users with lots of interests
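The steps above can be tried out without a Redis server using the in-memory Go sketch below. It is only a stand-in for the real flow: the store type, its field names and the example URLs are hypothetical, plain maps replace the Redis SETs, and an exact lookup function replaces BF.EXISTS (so, unlike a real Bloom Filter, it never produces a false positive).

```go
package main

import (
	"fmt"
	"sort"
)

// store is an in-memory stand-in for the Redis SETs described in the schema.
type store struct {
	interests map[string][]string // user -> topics
	articles  map[string][]string // topic -> article URLs
}

// rawRecommendations looks up the user's interests and unions the
// article sets of those topics, dropping duplicates.
func (s store) rawRecommendations(user string) []string {
	seen := map[string]bool{}
	var out []string
	for _, topic := range s.interests[user] {
		for _, url := range s.articles[topic] {
			if !seen[url] {
				seen[url] = true
				out = append(out, url)
			}
		}
	}
	sort.Strings(out) // stable order for the demo (a Redis SET is unordered)
	return out
}

// recommendations keeps only the raw results for which
// mightHaveRead reports false, i.e. articles that are definitely unread.
func (s store) recommendations(user string, mightHaveRead func(user, url string) bool) []string {
	var out []string
	for _, url := range s.rawRecommendations(user) {
		if !mightHaveRead(user, url) {
			out = append(out, url)
		}
	}
	return out
}

func main() {
	s := store{
		interests: map[string][]string{"user-1": {"programming", "travel"}},
		articles: map[string][]string{
			"programming": {"https://example.com/go", "https://example.com/shared"},
			"travel":      {"https://example.com/trips", "https://example.com/shared"},
		},
	}
	read := map[string]bool{"https://example.com/go": true}
	alreadyRead := func(user, url string) bool { return read[url] }

	fmt.Println(s.rawRecommendations("user-1"))
	fmt.Println(s.recommendations("user-1", alreadyRead))
}
```

With a real Bloom Filter behind mightHaveRead, the final list could occasionally be shorter than expected, because a false positive would filter out an article that was never read.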

Raw recommendations are based on SUNION alone (without the Bloom Filter)

func (recoUtil *RecommendationUtil) genRawArticleRecommendations(user string) []string {
    userInterestsSet := "user:" + user + ":interests"
    members, _ := recoUtil.redisClient.SMembers(userInterestsSet).Result()
    if len(members) == 0 {
        //no interests found for this user and SUNION needs at least one key
        return nil
    }
    //build the names of the topic SETs which need to be combined
    var recoSetArr []string
    for _, topic := range members {
        recoSetArr = append(recoSetArr, "topic:"+topic+":articles")
    }
    recos, _ := recoUtil.redisClient.SUnion(recoSetArr...).Result()
    return recos
}

Fine-grained recommendations combine SUNION with BF.EXISTS to avoid recommending articles that have already been read

func (recoUtil *RecommendationUtil) GenArticleRecommendations(user string) []string {
    //if bloom filter contains a reco, do not include in final reco
    rawRecos := recoUtil.genRawArticleRecommendations(user)
    var finalRecos []string
    for i := 0; i < len(rawRecos); i++ {
        if recoUtil.isArticleAlreadyReadByUser(user, rawRecos[i]) == 0 { //has NOT been read for SURE
            finalRecos = append(finalRecos, rawRecos[i])
        } else {
            fmt.Println("article " + rawRecos[i] + " has already been read")
        }
    }
    return finalRecos
}

The actual recommendation feature is exposed via a REST API

func getRecommendedArticles(resp http.ResponseWriter, req *http.Request) {
    user := mux.Vars(req)["user"]
    recoUtil := reco.NewRecommendationUtil(redisCoordinate)
    defer recoUtil.CloseConn()
    recommendedArticles := recoUtil.GenArticleRecommendations(user)
    resp.Header().Set("Content-Type", "application/json")
    json.NewEncoder(resp).Encode(recommendedArticles)
}

Accessing a recommended article

Once you see a list of recommended articles, you can access them through another REST API (do this in your browser) e.g. http://192.168.99.100:8080/article/user-1/?url=https://medium.com/swlh/how-to-make-something-people-love-a8364771b7e6

The key thing to note is that once a recommended article is accessed, it is added to a user-specific Bloom Filter e.g. RecommendationHits-user-1. As mentioned above, the recommendation process checks this Bloom Filter for each potential recommendation: if the article is absent from the filter, it has definitely not been read/accessed and can be recommended
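The handler for this endpoint is not shown in this chapter, so the following Go sketch is only a guess at its general shape, written with the standard library alone. It assumes the /<user>/article/?url=... path form used in the test drive, assumes that the service redirects the browser to the article, and uses a hypothetical in-memory readTracker where the real service would issue BF.ADD against the user's Bloom Filter.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
	"sync"
)

// readTracker is a hypothetical stand-in for the per-user Bloom Filter.
type readTracker struct {
	mu   sync.Mutex
	read map[string]map[string]bool
}

func newReadTracker() *readTracker {
	return &readTracker{read: map[string]map[string]bool{}}
}

// markRead plays the role of BF.ADD on the user's filter.
func (rt *readTracker) markRead(user, url string) {
	rt.mu.Lock()
	defer rt.mu.Unlock()
	if rt.read[user] == nil {
		rt.read[user] = map[string]bool{}
	}
	rt.read[user][url] = true
}

func (rt *readTracker) hasRead(user, url string) bool {
	rt.mu.Lock()
	defer rt.mu.Unlock()
	return rt.read[user][url]
}

// articleHandler handles /<user>/article/?url=<article URL>: it records the
// read and then redirects to the article (the redirect is an assumption).
func articleHandler(rt *readTracker) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		parts := strings.Split(strings.Trim(r.URL.Path, "/"), "/")
		target := r.URL.Query().Get("url")
		if len(parts) != 2 || parts[1] != "article" || target == "" {
			http.Error(w, "expected /<user>/article/?url=...", http.StatusBadRequest)
			return
		}
		rt.markRead(parts[0], target)
		http.Redirect(w, r, target, http.StatusFound)
	}
}

// simulateRead issues a GET against the handler without opening a network listener.
func simulateRead(h http.Handler, target string) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, target, nil))
	return rec.Code, rec.Header().Get("Location")
}

func main() {
	rt := newReadTracker()
	code, loc := simulateRead(articleHandler(rt), "/user-1/article/?url=https://example.com/some-article")
	fmt.Println(code, loc, rt.hasRead("user-1", "https://example.com/some-article"))
}
```

Recording the read before redirecting means the article drops out of the next recommendation request, whether or not the browser finishes loading it.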

Docker setup

The docker-compose.yml defines the rebloom-redis and articles-recommendation-service services

version: '3'
services:
  rebloom-redis:
    image: redislabs/rebloom
    container_name: rebloom-redis
    ports:
      - '6379:6379'
  articles-recommendation-service:
    build: .
    environment:
      - REDIS_HOST=rebloom-redis
      - REDIS_PORT=6379
      - PORT=9090
    ports:
      - '8080:9090'
    depends_on:
      - rebloom-redis

The rebloom-redis service is based on the Rebloom image from Docker Hub and the articles-recommendation-service is built using the below Dockerfile

FROM golang:alpine as build-stage
WORKDIR /go/
RUN apk --no-cache add ca-certificates git
RUN go get -u github.com/go-redis/redis && go get -u github.com/gorilla/mux
COPY src/ /go/src
RUN cd /go/src && CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -o articles-recommendation-service
FROM scratch
COPY --from=build-stage /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
COPY --from=build-stage /go/src/ /
CMD ["/articles-recommendation-service"]

A multi-stage build process is used, wherein one image builds our Go app and a different image serves as the base for running it. The golang image from Docker Hub is used for the build stage, which results in a single binary (for Linux). Since the binary has all its dependencies packed in, all we need is a minimal image for running it, and thus we use the lightweight scratch image for this purpose

Test drive

  • Install curl, Postman or any other HTTP tool to interact with the REST endpoints of the service

  • Get the project - git clone https://github.com/abhirockzz/practical-redis.git

  • cd practical-redis/redis-modules/rebloom

  • Invoke the startup script ./run.sh (this in turn invokes docker-compose commands)

  • Stop the application by invoking ./stop.sh from another terminal

Replace DOCKER_IP in the commands below with the IP address of your Docker instance, which you can obtain using docker-machine ip. The port (8080 in this case) is the one specified in docker-compose.yml

Load test data

curl http://DOCKER_IP:8080/load/

You should see an HTTP 200 status response. Check Redis (using redis-cli) to ensure all the data has been seeded - KEYS *

Get recommended articles for a user

curl http://DOCKER_IP:8080/<user>/articles/ e.g. curl http://192.168.99.100:8080/user-2/articles/

You should see a JSON (array) response (similar to below)

[
"https://medium.com/swlh/how-to-make-something-people-love-a8364771b7e6",
"https://towardsdatascience.com/designing-an-iot-solution-in-2018-7fe1356e63d6",
"https://towardsdatascience.com/unsupervised-learning-with-python-173c51dc7f03",
"https://towardsdatascience.com/universal-language-model-to-boost-your-nlp-models-d59469dcbd64",
"https://blog.prototypr.io/growing-an-idea-from-an-interest-to-a-product-a0757b415bbb",
"https://medium.com/@michaelpollan/medium-com-trips-aed86f968810",
"https://hackernoon.com/the-7-biggest-lessons-ive-learned-by-building-a-twitter-bot-59fee84a9ed9",
"https://medium.com/@jrodthoughts/using-deep-learning-to-understand-your-source-code-28e5c284bfda",
"https://medium.com/@evheniybystrov/react-redux-for-lazy-developers-b551f16a456f",
"https://medium.com/@anildash/what-if-javascript-wins-84898e5341a",
"https://towardsdatascience.com/data-science-for-startups-data-pipelines-786f6746a59a",
"https://medium.com/hackerpreneur-magazine/how-i-hacked-into-one-of-the-most-popular-dating-websites-4cb7907c3796",
"https://medium.com/personal-growth/walt-disney-how-to-truly-love-what-you-do-f3449c78ca65",
"https://medium.com/@shauntagrimes/challenge-yourself-to-learn-from-masters-3f99064e0f2e",
"https://medium.com/sololearn/warning-your-programming-career-b9579b3a878b"
]

Read/access recommended articles

Access the following URL (preferably using your browser) http://DOCKER_IP:8080/<user>/article/?url=<one of the recommended article URLs> e.g. http://192.168.99.100:8080/user-1/article/?url=https://towardsdatascience.com/data-science-for-startups-data-pipelines-786f6746a59a. You should see the article in your browser

Check recommendations for the same user

curl http://DOCKER_IP:8080/<user>/articles/ e.g. curl http://192.168.99.100:8080/user-2/articles/. You should see fewer recommendations (depending on how many articles you read/accessed in the previous step)