Extending Redis with Redis Modules
This chapter covers Redis Modules
Starts off with a brief introduction to Redis Modules
Explores the ReBloom module, which provides a Bloom Filter as a native Redis data structure
Redis Modules 101
Redis Modules are components whose goal is to allow you to extend the Redis feature set without changing the core source code. Module libraries need to be loaded into Redis. There are a few ways to do this
(preferred way) use the loadmodule configuration directive in redis.conf e.g. loadmodule custommodule.so
start Redis server with the loadmodule argument e.g. redis-server --loadmodule custommodule.so
you can also use the MODULE LOAD command to achieve the same result at runtime e.g. MODULE LOAD custommodule.so
For example, in order to build the ReJSON module you can
get the source code - git clone https://github.com/RedisLabsModules/rejson.git
build the module (using make) - resulting in rejson.so
and then load the module using any of the above mentioned methods
The list of available (open source) Redis modules can be found here
Bloom Filter
A Bloom Filter is a probabilistic data structure. Simply put, it can be used to check whether an element is present in a set or not. It is probabilistic because a membership check can return a false positive, although it can never return a false negative i.e. a Bloom Filter will tell you of the absence of an element with 100% accuracy, but there is a (rare) probability of an error (this is actually configurable) when it reports that an element is present. It is both fast and memory efficient
It is possible to tune the probability of getting a false positive by changing the size of the Bloom Filter (the more space we allocate, the lower the likelihood of false positives)
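To make the size versus accuracy trade-off concrete, here is a small sketch based on the standard textbook sizing formulas for a Bloom Filter: m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash functions, for n expected elements and a target false positive probability p. The function name optimalSize is made up for this illustration and says nothing about how ReBloom sizes its filters internally (ReBloom itself lets you state an error rate and a capacity via BF.RESERVE).

```go
package main

import (
	"fmt"
	"math"
)

// optimalSize returns the number of bits and hash functions suggested by the
// textbook Bloom Filter formulas for n expected elements and a target false
// positive probability p. Hypothetical helper, for illustration only.
func optimalSize(n int, p float64) (bits int, hashes int) {
	m := math.Ceil(-float64(n) * math.Log(p) / (math.Ln2 * math.Ln2))
	k := math.Round(m / float64(n) * math.Ln2)
	if k < 1 {
		k = 1
	}
	return int(m), int(k)
}

func main() {
	// A lower target error rate needs more bits for the same 1000 elements.
	for _, p := range []float64{0.1, 0.01, 0.001} {
		bits, hashes := optimalSize(1000, p)
		fmt.Printf("p=%.3f -> %d bits, %d hash functions\n", p, bits, hashes)
	}
}
```

Going from a 10% to a 1% error rate roughly doubles the space needed, which is still far less than storing the elements themselves.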
Its two fundamental operations are
adding an element
checking if an element exists
Note that it's not possible to remove an element from a Bloom Filter
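The two operations above can be sketched with a toy, in-memory Bloom Filter. This is only an illustration of the idea, not how ReBloom is implemented: the bloom type, the use of FNV hashes and the double hashing scheme are all choices made for this example.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// bloom is a toy Bloom Filter: a bit array plus k hash positions per element.
type bloom struct {
	bits []bool
	k    int
}

func newBloom(m, k int) *bloom {
	return &bloom{bits: make([]bool, m), k: k}
}

// positions derives k indexes from two FNV hashes (double hashing).
func (b *bloom) positions(item string) []int {
	h1 := fnv.New64a()
	h1.Write([]byte(item))
	a := h1.Sum64()

	h2 := fnv.New64()
	h2.Write([]byte(item))
	c := h2.Sum64() | 1 // keep the step odd so it is never zero

	m := uint64(len(b.bits))
	out := make([]int, b.k)
	for i := range out {
		out[i] = int((a + uint64(i)*c) % m)
	}
	return out
}

// Add sets every bit that the element hashes to.
func (b *bloom) Add(item string) {
	for _, p := range b.positions(item) {
		b.bits[p] = true
	}
}

// Exists reports false only if at least one bit is unset, so an added
// element is never reported as absent (no false negatives).
func (b *bloom) Exists(item string) bool {
	for _, p := range b.positions(item) {
		if !b.bits[p] {
			return false
		}
	}
	return true
}

func main() {
	read := newBloom(9586, 7)
	read.Add("https://example.com/article-1")
	fmt.Println(read.Exists("https://example.com/article-1")) // always true once added
	fmt.Println(read.Exists("https://example.com/article-2")) // false unless a false positive occurs
}
```

Because several elements can share bits, clearing the bits of one element could silently remove others, which is why a plain Bloom Filter has no remove operation.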
Some of its real world uses include
reduce expensive disk access e.g. Apache Cassandra
reduce expensive network lookups e.g. Google Chrome browser
recommendation engines e.g. Medium (our scenario is inspired by this)
etc...
This contains a bunch of useful information about Bloom Filters
Articles recommendation service using ReBloom
The ReBloom module provides the capabilities of a Bloom Filter via a native Redis data structure, with the help of multiple commands which it exposes e.g. BF.ADD, BF.EXISTS etc.
The scenario which we will use to explore the ReBloom module is that of a recommendation engine
The user will get article recommendations (based on their interests) when they visit http://app-url:port/articles/
The user can read an article from the list of presented recommendations (URLs) - http://app-url:port/article/?url=https://towardsdatascience.com/data-science-for-startups-data-pipelines-786f6746a59a
Rinse and repeat...
The key is to make sure that the recommendations are not repeated i.e. if a user has read a recommended article, it should not be recommended again - this is where a Bloom Filter comes in. The end result is that our system will never recommend an article which has been read, although it might occasionally skip an unread article (in the rare case of a false positive) because the filter reports it as already read
Technical stack
go-redis as the Redis Go client
Gorilla mux package for REST endpoints
pre-built Docker image for ReBloom module
Docker Compose to run the solution with a single command
Schema
Here is a quick overview of the application specific entities and the Redis data structures they map to
topic:[topic-name]:articles - various topics e.g. programming, travel etc. and (URLs of) articles associated with them are stored in a Redis SET e.g. topic:creativity:articles
user:[user-name]:interests - the bunch of topics which a user has opted in for is also represented as a SET e.g. user:john:interests
RecommendationHits:[user-name] - the ReBloom filter for driving recommendations for a specific user
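As a small illustration of this naming scheme, the keys could be built with helpers like the ones below. The function names are made up for this sketch and are not taken from the project source; the filter key follows the RecommendationHits:[user-name] form from the list above, and the exact separator used by the application is an assumption.

```go
package main

import "fmt"

// topicArticlesKey names the SET holding article URLs for a topic.
func topicArticlesKey(topic string) string {
	return "topic:" + topic + ":articles"
}

// userInterestsKey names the SET holding the topics a user opted in for.
func userInterestsKey(user string) string {
	return "user:" + user + ":interests"
}

// recommendationFilterKey names the per-user Bloom Filter.
// Assumption: same ":" separator as in the schema listing.
func recommendationFilterKey(user string) string {
	return "RecommendationHits:" + user
}

func main() {
	fmt.Println(topicArticlesKey("creativity"))
	fmt.Println(userInterestsKey("john"))
	fmt.Println(recommendationFilterKey("john"))
}
```

Keeping key construction in one place avoids typos in key names scattered across handlers.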
Implementation details
Source code available on GitHub
Data load
As mentioned above, our sample data set consists of topics and user interests. The data load is exposed via a REST API, which invokes the SADD command to get the job done
The articles are actual entries on Medium.com, but the data is simulated as well as scaled down for ease of demonstration and to keep the focus on the concepts
User interests are also populated into a Redis SET
we are adding 5 users (user-1 to user-5) - you can change the outer loop to add more
randomly adding interests for each user from a pool of 5 topics
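The user seeding described above can be sketched roughly as follows. This is not the project's actual loader: setStore and memStore are hypothetical stand-ins so that the example runs without a Redis server (with a real client, SAdd would issue the SADD command), and two of the five topic names are placeholders.

```go
package main

import (
	"fmt"
	"math/rand"
)

// setStore is a hypothetical stand-in for the single Redis call used here.
type setStore interface {
	SAdd(key string, members ...string)
}

// memStore keeps the sets in memory instead of in Redis.
type memStore map[string]map[string]bool

func (s memStore) SAdd(key string, members ...string) {
	if s[key] == nil {
		s[key] = map[string]bool{}
	}
	for _, m := range members {
		s[key][m] = true
	}
}

// seedInterests gives user-1 .. user-N a random, non-empty subset of topics.
func seedInterests(store setStore, users int, topics []string, seed int64) {
	rng := rand.New(rand.NewSource(seed))
	for u := 1; u <= users; u++ { // change the loop bound to add more users
		key := fmt.Sprintf("user:user-%d:interests", u)
		count := 1 + rng.Intn(len(topics))
		for _, i := range rng.Perm(len(topics))[:count] {
			store.SAdd(key, topics[i])
		}
	}
}

func main() {
	// "health" and "design" are placeholder topic names for this sketch.
	topics := []string{"programming", "travel", "creativity", "health", "design"}
	store := memStore{}
	seedInterests(store, 5, topics, 1)
	for key, set := range store {
		fmt.Println(key, "->", len(set), "interests")
	}
}
```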
Generating article recommendations
This is the meat of our app. Here is the simple process for recommending articles to a specific user
Look at all the interests of the user. This is a simple SET lookup using SMEMBERS e.g. for user-1 it is SMEMBERS user:user-1:interests
Get all the articles for the topics which the user is interested in. This is another SMEMBERS query e.g. if user-1 is interested in programming and travel, we check the following SETs for articles - topic:programming:articles and topic:travel:articles
Combine entries from the above SETs using SUNION and the resulting entries in the SET are the bunch of recommended articles
Hold on, there is more. These are just raw recommendations. Recall our original goal with regard to recommendations
the key is to make sure that they are not repeated i.e. if a user has read a recommended article, it should not be recommended again
To make this work
each of the raw results from the previous step is cross-checked in the Bloom Filter using BF.EXISTS
if it exists, it's not included in the final set of recommendations
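Putting the steps together, the recommendation flow might look like the sketch below. It is an illustration under stated assumptions rather than the project's code: the store interface and the in-memory memory type are hypothetical stand-ins for the SMEMBERS, SUNION and BF.EXISTS commands, the filter key format follows the schema section, and the in-memory "filter" is an exact set, so unlike a real Bloom Filter it never yields a false positive.

```go
package main

import (
	"fmt"
	"sort"
)

// store is a hypothetical stand-in for the three Redis commands involved.
type store interface {
	SMembers(key string) []string
	SUnion(keys ...string) []string
	BFExists(key, item string) bool
}

// memory is an in-memory stand-in so the sketch runs without Redis.
type memory struct {
	sets map[string][]string
	read map[string]map[string]bool // exact set standing in for the filter
}

func (m memory) SMembers(key string) []string { return m.sets[key] }

func (m memory) SUnion(keys ...string) []string {
	seen := map[string]bool{}
	var out []string
	for _, k := range keys {
		for _, v := range m.sets[k] {
			if !seen[v] {
				seen[v] = true
				out = append(out, v)
			}
		}
	}
	sort.Strings(out) // stable order for the example; Redis sets are unordered
	return out
}

func (m memory) BFExists(key, item string) bool { return m.read[key][item] }

// recommend unions the articles of all topics the user follows and drops
// every article that the user's filter reports as already read.
func recommend(s store, user string) []string {
	topics := s.SMembers("user:" + user + ":interests")
	keys := make([]string, 0, len(topics))
	for _, t := range topics {
		keys = append(keys, "topic:"+t+":articles")
	}
	var out []string
	for _, article := range s.SUnion(keys...) {
		if !s.BFExists("RecommendationHits:"+user, article) {
			out = append(out, article)
		}
	}
	return out
}

func main() {
	m := memory{
		sets: map[string][]string{
			"user:user-1:interests":      {"programming", "travel"},
			"topic:programming:articles": {"https://example.com/a", "https://example.com/b"},
			"topic:travel:articles":      {"https://example.com/b", "https://example.com/c"},
		},
		read: map[string]map[string]bool{
			"RecommendationHits:user-1": {"https://example.com/b": true},
		},
	}
	fmt.Println(recommend(m, "user-1")) // article b is filtered out
}
```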
That's all! Fairly simple/primitive, but gets the job done. The true value will be evident when there are hundreds of topics, each with thousands of articles, along with millions of users with lots of interests
The raw recommendation is based on SUNION alone (without the Bloom Filter)
The fine grained recommendation is based on SUNION and BF.EXISTS, to avoid recommending already read articles
The actual recommendation feature is exposed via a REST API
Accessing a recommended article
Once you see a list of recommended articles, you can access them via another REST API (do this in your browser) e.g. http://192.168.99.100:8080/article/user-1/?url=https://medium.com/swlh/how-to-make-something-people-love-a8364771b7e6
The key thing to note is that once a recommended article is accessed, it is added to a user specific Bloom Filter e.g. RecommendationHits-user-1. As mentioned above, the recommendation process checks this Bloom Filter for each potential article recommendation, so that an article's absence from the filter confirms that it has not been read/accessed and can be recommended
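The read path can be sketched as below: pull the article out of the url query parameter and add it to the user's filter. This is a hedged illustration, not the project's handler. The filter interface and memFilter type are hypothetical stand-ins for BF.ADD and BF.EXISTS, the user is passed in directly rather than parsed from the request path, and the key separator in filterKey is an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// filter is a hypothetical stand-in for the two ReBloom commands involved.
type filter interface {
	BFAdd(key, item string)
	BFExists(key, item string) bool
}

// memFilter is an exact in-memory set used in place of a real Bloom Filter.
type memFilter map[string]map[string]bool

func (f memFilter) BFAdd(key, item string) {
	if f[key] == nil {
		f[key] = map[string]bool{}
	}
	f[key][item] = true
}

func (f memFilter) BFExists(key, item string) bool { return f[key][item] }

// filterKey builds the per-user filter key (separator is an assumption).
func filterKey(user string) string { return "RecommendationHits:" + user }

// articleParam extracts the article URL from the "url" query parameter.
func articleParam(requestURL string) (string, error) {
	u, err := url.Parse(requestURL)
	if err != nil {
		return "", err
	}
	article := u.Query().Get("url")
	if article == "" {
		return "", errors.New("missing url query parameter")
	}
	return article, nil
}

// recordRead marks the requested article as read for the given user.
func recordRead(f filter, user, requestURL string) (string, error) {
	article, err := articleParam(requestURL)
	if err != nil {
		return "", err
	}
	f.BFAdd(filterKey(user), article)
	return article, nil
}

func main() {
	f := memFilter{}
	req := "http://192.168.99.100:8080/article/user-1/?url=https://medium.com/swlh/how-to-make-something-people-love-a8364771b7e6"
	article, err := recordRead(f, "user-1", req)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("marked as read:", article)
	fmt.Println("already read?", f.BFExists(filterKey("user-1"), article))
}
```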
Docker setup
The docker-compose.yml defines the rebloom-redis and articles-recommendation-service services
The rebloom-redis service is based on the Rebloom image from Docker Hub and the articles-recommendation-service is built using a Dockerfile
A multi-stage build process is used, wherein one image is used for building our Go app and a different image is used as the base for running it - the golang Docker Hub image is used for the build process, which results in a single binary (for linux). Since we have the binary with all dependencies packed in, all we need is a minimal image for running it, and thus we use the lightweight scratch image for this purpose
Test drive
Get the project -
git clone https://github.com/abhirockzz/practical-redis.git
cd practical-redis/redis-modules/rebloom
Invoke the startup script ./run.sh (this in turn invokes docker-compose commands)
Stop the application by invoking ./stop.sh from another terminal
Replace DOCKER_IP with the IP address of your Docker instance, which you can obtain using docker-machine ip. The port (8080 in this case) is the one specified in docker-compose.yml
Load test data
curl http://DOCKER_IP:8080/load/
You should see an HTTP 200 status response. Check Redis (using redis-cli) to ensure all the data has been seeded - KEYS *
Get recommended articles for a user
curl http://DOCKER_IP:8080/<user>/articles/
e.g. curl http://192.168.99.100:8080/user-2/articles/
You should see a JSON (array) response containing the recommended article URLs
Read/access recommended articles
Access the following URL (preferably using your browser) http://DOCKER_IP:8080/<user>/article/?url=<one of the recommended article URLs>
e.g. http://192.168.99.100:8080/user-1/article/?url=https://towardsdatascience.com/data-science-for-startups-data-pipelines-786f6746a59a
You should see the article in your browser
Check recommendations for the same user
curl http://DOCKER_IP:8080/<user>/articles/
e.g. curl http://192.168.99.100:8080/user-2/articles/
You should see fewer recommendations (depending on how many articles you read/accessed in the previous step)