Extending Redis with Redis Modules

This chapter covers Redis Modules:

It starts off with a brief introduction to Redis Modules
It then explores the ReBloom module, which provides a Bloom Filter as a native Redis data structure

Redis Modules 101

Redis Modules are components that allow us to extend the Redis feature set without changing its core source code. Module libraries need to be loaded into Redis, and there are a few ways to do this:

(preferred way) use the loadmodule configuration directive in redis.conf e.g. loadmodule custommodule.so
start Redis server with the loadmodule argument e.g. redis-server --loadmodule custommodule.so
you can also use the MODULE LOAD command to achieve the same result at runtime e.g. MODULE LOAD custommodule.so

For example, to build and use the ReJSON module you can:

get the source code - git clone https://github.com/RedisLabsModules/rejson.git
build (using make) the module - resulting in rejson.so
and then load the module using one of the methods mentioned above

The list of available (open source) Redis modules can be found here

Bloom Filter

A Bloom Filter is a probabilistic data structure. Simply put, it can be used to check whether an element is present in a set. It is probabilistic because, while it never gives a false negative, it can return a false positive: a Bloom Filter tells you that an element is absent with 100% accuracy, but there is a small (configurable) probability of error when it reports that an element is present. It is both fast and memory efficient.

it is possible to tune the probability of a false positive by changing the size of the Bloom Filter (the more space we allocate, the lower the likelihood of false positives)
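This trade-off can be made concrete with the standard sizing approximations for a classic Bloom Filter. For n expected items and a target false-positive rate p, the optimal number of bits is m = -n·ln(p)/(ln 2)² and the optimal number of hash functions is k = (m/n)·ln 2. The Go sketch below just evaluates these formulas, and the function name is hypothetical. ReBloom takes an error rate and capacity up front (via BF.RESERVE) and handles sizing internally, possibly with a different exact scheme, so treat these numbers as illustrative.

```go
package main

import (
    "fmt"
    "math"
)

// optimalBloomParams (hypothetical helper) returns the number of bits m and
// hash functions k for a classic Bloom Filter holding n items with target
// false-positive rate p, using m = -n*ln(p)/(ln 2)^2 and k = (m/n)*ln 2.
func optimalBloomParams(n int, p float64) (m int, k int) {
    ln2 := math.Ln2
    m = int(math.Ceil(-float64(n) * math.Log(p) / (ln2 * ln2)))
    k = int(math.Round(float64(m) / float64(n) * ln2))
    if k < 1 {
        k = 1
    }
    return m, k
}

func main() {
    // Tighter error rates need more bits (and more hash functions).
    for _, p := range []float64{0.1, 0.01, 0.001} {
        m, k := optimalBloomParams(1000, p)
        fmt.Printf("n=1000 p=%v -> m=%d bits, k=%d hashes\n", p, m, k)
    }
}
```

For 1000 items at a 1% error rate, this works out to roughly 9.6 bits per item and 7 hash functions. Each tenfold reduction in p costs about 4.8 more bits per item.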

Its two fundamental operations are:

adding an element
checking if an element exists

it's not possible to remove an element from a (standard) Bloom Filter, because clearing its bits could also erase other elements that happen to share them
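The following minimal in-memory Bloom Filter in Go makes the two operations and the one-sided error concrete. It is an illustrative sketch, not how ReBloom is implemented. The type name, the FNV-1a hash and the double-hashing scheme are all choices made for this example. It deliberately has no Remove method, for the reason given above.

```go
package main

import (
    "fmt"
    "hash/fnv"
)

// BloomFilter is a hypothetical, minimal Bloom Filter: m bits, k hash functions.
type BloomFilter struct {
    bits []bool
    k    int
}

func NewBloomFilter(m, k int) *BloomFilter {
    if m < 1 {
        m = 1
    }
    if k < 1 {
        k = 1
    }
    return &BloomFilter{bits: make([]bool, m), k: k}
}

// positions derives k bit indexes via double hashing, h1 + i*h2 (mod m),
// where h1 and h2 are the two halves of a 64-bit FNV-1a hash.
func (bf *BloomFilter) positions(item string) []int {
    h := fnv.New64a()
    h.Write([]byte(item))
    sum := h.Sum64()
    h1 := uint64(uint32(sum))
    h2 := uint64(uint32(sum>>32) | 1) // force h2 odd so it is never zero
    m := uint64(len(bf.bits))
    pos := make([]int, bf.k)
    for i := 0; i < bf.k; i++ {
        pos[i] = int((h1 + uint64(i)*h2) % m)
    }
    return pos
}

// Add sets the k bits for item. There is no Remove: clearing a bit could
// also "remove" other items that share it.
func (bf *BloomFilter) Add(item string) {
    for _, p := range bf.positions(item) {
        bf.bits[p] = true
    }
}

// MightContain returns false only if item was definitely never added;
// true means "probably added" (false positives are possible).
func (bf *BloomFilter) MightContain(item string) bool {
    for _, p := range bf.positions(item) {
        if !bf.bits[p] {
            return false
        }
    }
    return true
}

func main() {
    bf := NewBloomFilter(9586, 7)
    bf.Add("https://medium.com/@anildash/what-if-javascript-wins-84898e5341a")
    fmt.Println(bf.MightContain("https://medium.com/@anildash/what-if-javascript-wins-84898e5341a")) // true
    fmt.Println(bf.MightContain("https://example.com/never-added"))                                  // almost certainly false
}
```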

Some of its real-world uses include:

reducing expensive disk access, e.g. Apache Cassandra
reducing expensive network lookups, e.g. the Google Chrome browser
recommendation engines, e.g. Medium (our scenario is inspired by this)
etc.

This contains a lot of useful information about Bloom Filters

Articles recommendation service using ReBloom

The ReBloom module provides the capabilities of a Bloom Filter as a native Redis data structure, exposed through commands such as BF.ADD and BF.EXISTS.

The scenario we will use to explore the ReBloom module is that of a recommendation engine:

The user gets article recommendations (based on their interests) when they visit http://app-url:port/articles/
The user can read an article from the list of recommendations (URLs) - http://app-url:port/article/?url=https://towardsdatascience.com/data-science-for-startups-data-pipelines-786f6746a59a
Rinse and repeat...

The key is to make sure that recommendations are not repeated, i.e. if a user has read a recommended article, it should not be recommended again. This is where a Bloom Filter comes in. As a result, our system will never recommend an article that has already been read. However, it might occasionally skip an unread article (in the rare case of a false positive) on the assumption that it has already been read.

Technical stack

go-redis as the Redis Go client
Gorilla mux package for REST endpoints
Docker
- pre-built Docker image for ReBloom module
- Docker Compose to run the solution with a single command

Schema

Here is a quick overview of the application-specific entities and the Redis data structures they map to:

topic:[topic-name]:articles - various topics e.g. programming, travel etc. and (URLs of) articles associated with them are stored in a Redis SET e.g. topic:creativity:articles
user:[user-name]:interests - the topics a user has opted into are also stored in a SET e.g. user:john:interests
RecommendationHits:[user-name] - the ReBloom filter for driving recommendations for a specific user
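The key-naming scheme can be captured in a few small builders. These helper names are hypothetical, and the service may well build its keys inline. Note that the schema above writes the Bloom Filter key as RecommendationHits:[user-name], but the section on accessing articles writes it as RecommendationHits-user-1. The separator used for that key here is therefore an assumption.

```go
package main

import "fmt"

// Hypothetical key builders mirroring the schema listed above.
func topicArticlesKey(topic string) string { return "topic:" + topic + ":articles" }

func userInterestsKey(user string) string { return "user:" + user + ":interests" }

// Separator assumed from the schema line "RecommendationHits:[user-name]".
func recommendationHitsKey(user string) string { return "RecommendationHits:" + user }

func main() {
    fmt.Println(topicArticlesKey("creativity")) // topic:creativity:articles
    fmt.Println(userInterestsKey("john"))       // user:john:interests
    fmt.Println(recommendationHitsKey("john"))  // RecommendationHits:john
}
```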

Implementation details

The source code is available on GitHub

Data load

As mentioned above, our sample data set consists of topics (with their articles) and user interests. Loading it is exposed via a REST API, which uses the SADD command to get the job done

the articles are actual entries on Medium.com, but the data is simulated as well as scaled down for ease of demonstration and to keep the focus on the concepts

func LoadArticlesForTopics(redisCoordinate string) {

    client := redis.NewClient(&redis.Options{Addr: redisCoordinate})
    defer client.Close()

    //add software engineering articles

    client.SAdd("topic:softwareengineering:articles", "https://medium.com/@anildash/what-if-javascript-wins-84898e5341a")
    client.SAdd("topic:softwareengineering:articles", "https://hackernoon.com/the-7-biggest-lessons-ive-learned-by-building-a-twitter-bot-59fee84a9ed9")
    client.SAdd("topic:softwareengineering:articles", "https://towardsdatascience.com/data-science-for-startups-data-pipelines-786f6746a59a")
    client.SAdd("topic:softwareengineering:articles", "https://towardsdatascience.com/universal-language-model-to-boost-your-nlp-models-d59469dcbd64")
    client.SAdd("topic:softwareengineering:articles", "https://towardsdatascience.com/designing-an-iot-solution-in-2018-7fe1356e63d6")

    // ........
}

User interests are also stored in Redis SETs

we are adding 5 users (user-1 to user-5) - you can change the outer loop to add more

randomly adding interests for each user from a pool of 5 topics

func LoadUserInterests(redisCoordinate string) {
    topics := []string{"softwareengineering", "creativity", "programming", "productivity", "travel"}

    client := redis.NewClient(&redis.Options{Addr: redisCoordinate})
    defer client.Close()

    //fixed seed: the same "random" interests are generated on every run
    rand.Seed(50)

    for i := 1; i <= 5; i++ {
        setName := "user:user-" + strconv.Itoa(i) + ":interests"

        //try to add (max) 5 interests per user. SADD ignores duplicates, so a user
        //may end up with fewer: we are at the mercy of the random generator
        for c := 0; c < 5; c++ {
            topic := topics[rand.Intn(len(topics))]
            result, _ := client.SAdd(setName, topic).Result()
            if result > 0 {
                fmt.Println("added topic " + topic + " to set " + setName)
            }
        }
    }
}

Generating article recommendations

This is the meat of our app. Here is the (simple) process for recommending articles to a specific user:

Look up all the interests of the user. This is a simple SET lookup using SMEMBERS, e.g. for user-1 it is SMEMBERS user:user-1:interests
Map each of those topics to its article SET, e.g. if user-1 is interested in programming and travel, the relevant SETs are topic:programming:articles and topic:travel:articles
Combine the entries from these SETs using SUNION; the resulting entries are the raw set of recommended articles

Hold on, there is more. These are just raw recommendations. Recall our original goal with regard to recommendations:

the key is to make sure that they are not repeated i.e. if a user has read a recommended article, it should not be recommended again

To make this work:

each of the raw results from the previous step is cross-checked against the user's Bloom Filter using BF.EXISTS
if it (possibly) exists, it is not included in the final set of recommendations

That's all! Fairly simple, even primitive, but it gets the job done. Its real value becomes evident when there are hundreds of topics, each with thousands of articles, and millions of users with lots of interests

The raw recommendations are based on SUNION alone (no Bloom Filter involved):

func (recoUtil *RecommendationUtil) genRawArticleRecommendations(user string) []string {
    userInterestsSet := "user:" + user + ":interests"

    //SMEMBERS user:<user>:interests -> the topics this user is interested in
    members, _ := recoUtil.redisClient.SMembers(userInterestsSet).Result()

    //map each topic to its article SET, e.g. topic:travel:articles
    var recoSetArr []string
    for _, topic := range members {
        recoSetArr = append(recoSetArr, "topic:"+topic+":articles")
    }

    //SUNION needs at least one key; a user without interests gets no recommendations
    if len(recoSetArr) == 0 {
        return nil
    }

    recos, _ := recoUtil.redisClient.SUnion(recoSetArr...).Result()
    return recos
}

The final recommendations combine SUNION with BF.EXISTS to avoid recommending articles that have already been read:

func (recoUtil *RecommendationUtil) GenArticleRecommendations(user string) []string {
    //if bloom filter contains a reco, do not include in final reco
    rawRecos := recoUtil.genRawArticleRecommendations(user)

    var finalRecos []string
    for i := 0; i < len(rawRecos); i++ {
        if recoUtil.isArticleAlreadyReadByUser(user, rawRecos[i]) == 0 { //has NOT been read for SURE
            finalRecos = append(finalRecos, rawRecos[i])
        } else {
            fmt.Println("article " + rawRecos[i] + " has already been read")
        }
    }
    return finalRecos
}

The recommendation feature is exposed via a REST API:

func getRecommendedArticles(resp http.ResponseWriter, req *http.Request) {
    user := mux.Vars(req)["user"]
    recoUtil := reco.NewRecommendationUtil(redisCoordinate)
    defer recoUtil.CloseConn()

    recommendedArticles := recoUtil.GenArticleRecommendations(user)
    resp.Header().Set("Content-Type", "application/json")
    json.NewEncoder(resp).Encode(recommendedArticles)
}

Accessing a recommended article

Once you see a list of recommended articles, you can access them through another REST API (do this in your browser), e.g. http://192.168.99.100:8080/article/user-1/?url=https://medium.com/swlh/how-to-make-something-people-love-a8364771b7e6

The key thing to note is that once a recommended article is accessed, it is added to a user-specific Bloom Filter, e.g. RecommendationHits-user-1. As mentioned above, the recommendation process checks this Bloom Filter for each potential recommendation. If the article is absent from the filter, it has definitely not been read/accessed and can be recommended

Docker setup

The docker-compose.yml defines two services, rebloom-redis and articles-recommendation-service:

version: '3'
services:
    rebloom-redis:
        image: redislabs/rebloom
        container_name: rebloom-redis
        ports:
            - '6379:6379'
    articles-recommendation-service:
        build: .
        environment:
            - REDIS_HOST=rebloom-redis
            - REDIS_PORT=6379
            - PORT=9090
        ports:
            - '8080:9090'
        depends_on:
            - rebloom-redis

The rebloom-redis service is based on the ReBloom image (redislabs/rebloom) from Docker Hub, and the articles-recommendation-service is built using the Dockerfile below

FROM golang:alpine as build-stage
WORKDIR /go/
RUN apk --no-cache add ca-certificates git
RUN go get -u github.com/go-redis/redis && go get -u github.com/gorilla/mux
COPY src/ /go/src
RUN cd /go/src && CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -o articles-recommendation-service

FROM scratch
COPY --from=build-stage /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
COPY --from=build-stage /go/src/articles-recommendation-service /
CMD ["/articles-recommendation-service"]

A multi-stage build is used: one image builds the Go app, and a different image serves as the base for running it. The golang Docker Hub image is used for the build, which produces a single binary for Linux (statically linked, since CGO is disabled). That binary has all its dependencies packed in, so all we need to run it is a minimal image. We therefore use the lightweight scratch image for this purpose

Test drive

Install curl, Postman or any other HTTP tool to interact with the REST endpoints of the service
Get the project - git clone https://github.com/abhirockzz/practical-redis.git
cd practical-redis/redis-modules/rebloom
Invoke the startup script ./run.sh (this in turn invokes docker-compose commands)
Stop the application by invoking ./stop.sh from another terminal

Replace DOCKER_IP with the IP address of your Docker instance, which you can obtain using docker-machine ip. The port (8080 in this case) is the one specified in docker-compose.yml

Load test data

curl http://DOCKER_IP:8080/load/

You should see an HTTP 200 status response. Check Redis (using redis-cli) to ensure all the data has been seeded - KEYS *

Get recommended articles for a user

curl http://DOCKER_IP:8080/<user>/articles/ e.g. curl http://192.168.99.100:8080/user-2/articles/

You should see a JSON (array) response (similar to below)

[
"https://medium.com/swlh/how-to-make-something-people-love-a8364771b7e6",
"https://towardsdatascience.com/designing-an-iot-solution-in-2018-7fe1356e63d6",
"https://towardsdatascience.com/unsupervised-learning-with-python-173c51dc7f03",
"https://towardsdatascience.com/universal-language-model-to-boost-your-nlp-models-d59469dcbd64",
"https://blog.prototypr.io/growing-an-idea-from-an-interest-to-a-product-a0757b415bbb",
"https://medium.com/@michaelpollan/medium-com-trips-aed86f968810",
"https://hackernoon.com/the-7-biggest-lessons-ive-learned-by-building-a-twitter-bot-59fee84a9ed9",
"https://medium.com/@jrodthoughts/using-deep-learning-to-understand-your-source-code-28e5c284bfda",
"https://medium.com/@evheniybystrov/react-redux-for-lazy-developers-b551f16a456f",
"https://medium.com/@anildash/what-if-javascript-wins-84898e5341a",
"https://towardsdatascience.com/data-science-for-startups-data-pipelines-786f6746a59a",
"https://medium.com/hackerpreneur-magazine/how-i-hacked-into-one-of-the-most-popular-dating-websites-4cb7907c3796",
"https://medium.com/personal-growth/walt-disney-how-to-truly-love-what-you-do-f3449c78ca65",
"https://medium.com/@shauntagrimes/challenge-yourself-to-learn-from-masters-3f99064e0f2e",
"https://medium.com/sololearn/warning-your-programming-career-b9579b3a878b"
]

Read/access recommended articles

Access the following URL (preferably in your browser): http://DOCKER_IP:8080/<user>/article/?url=<one of the recommended article URLs>, e.g. http://192.168.99.100:8080/user-2/article/?url=https://towardsdatascience.com/data-science-for-startups-data-pipelines-786f6746a59a. You should see the article in your browser

Check recommendations for the same user

curl http://DOCKER_IP:8080/<user>/articles/ e.g. curl http://192.168.99.100:8080/user-2/articles/. You should see fewer recommendations (depending on how many articles you read/accessed in the previous step)


Last updated 5 years ago
