
Dating App theory part 2 - Data and code


This is the second part in a random series I'm writing about a dating app I would create (but I'm not going to). Previously we looked at the rules of the app and how the rules help with a £50 per month budget to run the app for free. See the first part here. In this part I will look at our data requirements and some of the code that would need writing for the app.


We established in the previous part that each user would have a profile page, an inbox and then their main page would show their matches for the day. I also showed that a three-photo limit and 1000 words would be around 50Kb per user for their profile. This image restriction was mainly because of the cost of serving images, as we may see limited bandwidth with some of the proposed solutions (AWS mostly).


Data


Although we should only need 50Kb per user, of course this isn't actually true, as I still haven't worked out how I would store their matches and who they have already matched with. This is something I would need to look into. In fact I will right now, but in the meantime, here are some other proposed data budgets.


Let's say that we somehow get 10,000 actual users who log in every day and check through their matches. Let's say then that the average user logs in 3 times a day, and that's just the users who are hunting and haven't actually found a match.


Let's then say (we are saying a lot) that a user that makes a match exchanges about 10 messages with them before they move onto a better messaging app. In fact let's make this 8 just because our slow but reliable messaging is probably really off-putting. Also let's account for messages sent between people that then fizzle out, so 5 there. Finally, we shall say we get about 20 matches a day. Obviously I'm just saying these things, this wouldn't actually happen. I mean the first thing was that we have the actual 10,000 target users, so we already know we are in dreamland here. But anyway, we have these stats:


Okay so now just to get a gauge on this data, I'm going to suggest a 150Kb budget for the frontpage, a 50Kb budget for the inbox page then a 65Kb budget for the profile page including the three photos. So on their first login a user may load the frontpage, check the inbox and look at each of their three profiles, then log in again later.


In total from this one login we get 150 + 50 + (3 x 65) = 395Kb of bandwidth for the first visit of the day. Then let's say the other two possible visits would just check messages and two profiles at most. So each of the other two visits would be 330Kb of bandwidth. Overall then per day, with our app at maximum users and an average user, we get 395 + 330 + 330 = 1055Kb. So about a megabyte per user per day.
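Just to check that arithmetic, here's a rough sketch of the per-visit budget in Python. The page sizes are the ones above; the function name and the idea that every visit loads the frontpage and inbox once are my own assumptions.

```python
# Page budgets in Kb, taken from the figures above.
FRONT_PAGE_KB = 150
INBOX_KB = 50
PROFILE_KB = 65


def visit_kb(profiles_viewed: int) -> int:
    """Bandwidth for one visit: frontpage, inbox, plus each profile opened."""
    return FRONT_PAGE_KB + INBOX_KB + PROFILE_KB * profiles_viewed


first_visit = visit_kb(3)   # looks at all three daily matches
later_visit = visit_kb(2)   # checks messages and at most two profiles
per_user_per_day = first_visit + 2 * later_visit

print(first_visit, later_visit, per_user_per_day)  # 395 330 1055
```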


Then when we look at messages, they wouldn't take much to store, I wouldn't even say a Kb. Let's say we have a 1000 character limit on messages. This would translate to about a Kb per message, but nobody is writing that much. Anyway we will go with that because we can't be too tight (right?). So we expect 20 matches a day with an average of 5 messages exchanged between matches. So that would be 100Kb per day. Not much at all.


Our messages and our bandwidth requirements are different, as our messages are mainly a storage issue. But this shouldn't be a problem as they aren't very large. So taking the data from the previous post about profile restrictions, we have the following bandwidth requirements if we were at maximum capacity:


Across a 30 day month, while our profile storage should stay the same, our message data will increase, until users leave the platform. Therefore I would say 150Mb for message data is more than enough. So a 30-day month would require:


This is over the top with the calculations really. Again, this is the service working at maximum capacity in some dream world where every user is using the application every day to look for matches. It also assumes that there's 20 matches a day. I'm no whizz, but even I know off the top of my head that among 10,000 people you're not likely to get 10 matches, never mind finding 600 across a month in this 10,000. But it's a simple way to get an idea of what we require.


Also it highlights that our dating app doesn't actually have any massive requirements. I mean most platforms can handle a bit more than 300Gb of bandwidth a month, and less than 1Gb of data and image storage, with no trouble.
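To put rough numbers on the month, here's a small sketch that scales the daily figures up to 10,000 users and 30 days. It assumes decimal units (1000Kb to a Mb) and the made-up usage figures from above, so treat the output as a ballpark rather than a real requirement.

```python
USERS = 10_000
KB_PER_USER_PER_DAY = 1055   # three visits a day, from the page budgets
PROFILE_STORAGE_KB = 50      # three photos plus 1000 words per user
MESSAGE_KB = 1               # a full 1000 character message
MATCHES_PER_DAY = 20
MESSAGES_PER_MATCH = 5
DAYS = 30

# Assumption: decimal units, so 1000Kb = 1Mb and 1,000,000Kb = 1Gb.
bandwidth_gb = USERS * KB_PER_USER_PER_DAY * DAYS / 1_000_000
profile_storage_mb = USERS * PROFILE_STORAGE_KB / 1_000
message_storage_mb = MATCHES_PER_DAY * MESSAGES_PER_MATCH * MESSAGE_KB * DAYS / 1_000

print(f"bandwidth per month: {bandwidth_gb}Gb")          # 316.5Gb
print(f"profile storage:     {profile_storage_mb}Mb")    # 500.0Mb
print(f"new messages/month:  {message_storage_mb}Mb")    # 3.0Mb
```

So the bandwidth lands a little over 300Gb and the storage stays well under 1Gb, even with every number turned up to maximum.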


Code


So we have a basic idea of the sort of data we will be throwing around. Just to carry these over, we have three pages with Kb budgets:


The only other thing to mention here is that the profile page budget includes the images and text, so the page itself is actually more like 14Kb to add. Also the Inbox would need to show any actual messages too, so the page itself will be smaller. But let's just say 50Kb and assume we will be fine with the less than 1Kb messages it shows. But I'm overthinking here.


Going off this, there are a few things we need our app to do:


While we don't need to go right into the nitty gritty, even though I seem to have up to this point, the majority of these should be relatively simple. However the most complicated will be the first - Create potential matches for each user. The other difficulty is keeping track of who a user has already seen.


Before going on to those two problems though, the code for the other parts of the application should be relatively straightforward. Displaying profiles to a user and allowing them to reject or accept them is quite simple. These are CRUD calls really. I may get into them a bit more with the implementations as that can change how they work (e.g. separate frontend and backend, server-side etc).


In terms of messaging, to keep it really simple, setting a refresh on the page every, say, 10 minutes is enough. Again, this is slow messaging, we don't really expect people to reply quickly or sit waiting for a reply. So we don't need Websockets or anything fancy, just a page that simply calls the database for any new messages and displays them. To get into it a bit more, when the messages get pulled from the database, they get updated to say they have been viewed. So when the page loads it will flag those that haven't been viewed yet, but a reload will show them as viewed. I don't care about this, I would rather a message shows up but it isn't very clear it's new, than messages get lost at all.
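As a rough sketch of that "pull, then mark as viewed" idea, here's how it might look with Python's built-in sqlite3 module. The table layout, column names and the load_inbox function are all made up for illustration, and a real app would more likely sit on whatever database the hosting gives us.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical messages table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE messages ("
        " id INTEGER PRIMARY KEY,"
        " recipient TEXT NOT NULL,"
        " body TEXT NOT NULL,"
        " viewed INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def load_inbox(conn: sqlite3.Connection, recipient: str) -> list:
    """Return (body, is_new) pairs, then mark everything returned as viewed."""
    rows = conn.execute(
        "SELECT id, body, viewed FROM messages WHERE recipient = ? ORDER BY id",
        (recipient,),
    ).fetchall()
    new_ids = [(row_id,) for row_id, _, viewed in rows if not viewed]
    conn.executemany("UPDATE messages SET viewed = 1 WHERE id = ?", new_ids)
    conn.commit()
    return [(body, not viewed) for _, body, viewed in rows]


conn = open_db()
conn.execute("INSERT INTO messages (recipient, body) VALUES (?, ?)", ("ab12C", "Hi there"))

first_load = load_inbox(conn, "ab12C")    # message is flagged as new
second_load = load_inbox(conn, "ab12C")   # same message, no longer flagged
print(first_load, second_load)
```

The message is never dropped, it just stops being flagged as new after the first page load, which is the trade-off described above.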


'The Algo'


Okay so this is 'The Algorithm' essentially. Everything pretty much depends on this part. The big question is how do we match people up? I don't believe normal apps really do much 'matchmaking', they have the ability to just keep showing you profiles for you to swipe through. In fact I might even argue that they spend more time ensuring they show you the 'best' matches less, so that you pay money. And I put 'best' in quotes there because the best for them are the ones that get the most positive responses, not necessarily the best match for the user.


So our 'Algorithm' maybe just needs to do a very basic matchup to beat some of these other apps. I would call out here that it's very unlikely we have the perfect match for somebody on the app. Plus, what is a good match? Read any dating and relationship advice and they all have differing opinions on what's a good matchup. So I think we could attempt a very basic 'matchmake' and then just show other people. Or we could just show what is available anyway. I mean a user can have a look at the three they have been given for the day and just say no and wait until the next day. Hopefully by having such a small selection they consider them more.


I'm not sure here really, I think apps can try to gauge your likes and dislikes by your swiping activity but they have the benefit of large amounts of data. They can send you 200 people and then get an idea, but again, they aren't working in your best interest so they could have your perfect match somewhere there - they know exactly what you want - but they'll only tease it until you pay up.


So instead yes, we will just show them who is available. I don't think there would be any need to match on location. As I say this app would be aimed at a small area launch. I also recently found out that you can get a user's latitude and longitude through the Geolocation API in browsers (if they grant permission). So in theory you could verify that a person is signing up in the correct area of the world for the part they have selected. So location won't be part of the matchmaking.


Who saw who


The thing that will be an issue is keeping track of who a user has already seen. The only solution I can think of for this is keeping a set for each user which contains the ID of every profile they have already seen. However this would get wildly inefficient the longer a user stays on the app, plus eats into our storage requirements.
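Here's a minimal sketch of how a per-user seen set could feed the daily picks. The function name, the random choice and the three-a-day default are my assumptions, it isn't a worked out matching algorithm.

```python
import random


def pick_daily_matches(candidates, seen, per_day=3, rng=random):
    """Pick up to per_day profile IDs the user has not seen, and record them.

    candidates: IDs of everyone currently available to this user.
    seen: the user's set of already-seen IDs (updated in place).
    """
    unseen = [c for c in candidates if c not in seen]
    picks = rng.sample(unseen, min(per_day, len(unseen)))
    seen.update(picks)
    return picks


# Hypothetical IDs, just to show the seen set growing day by day.
everyone = ["aaaaa", "bbbbb", "ccccc", "ddddd", "eeeee"]
seen_by_user = {"aaaaa"}

day_one = pick_daily_matches(everyone, seen_by_user)
day_two = pick_daily_matches(everyone, seen_by_user)
print(day_one, day_two, len(seen_by_user))
```

Once the pool runs dry the function just returns fewer (or no) profiles, so the user waits until new people sign up.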


I've done more research on this but couldn't find anything that would help, so for now I would say this is the best solution. The advantage we do have here is that because of our midday to midnight restrictions, we have the morning to do these searches. So really with this solution it is just a storage issue. Let's do some maths to work out our requirements.


Okay so, we won't ever have 9999 men and 1 woman or the opposite (more on this later). But let's just say worst case scenario we have a user that's been on the app for 333 days and there are actually 9999 users that could match with them. Then let's say that each user ID is 10 alphanumeric characters (upper and lower).


Ten characters, each with 62 possible values, allows us 62 to the power of 10. Which is roughly 8.4e17. Or an 18 digit number. I'm not even attempting that one. Okay let's anyway:


839299365868340224


So that's way more combinations than we need. Let's reduce this to 6 alphanumeric characters, just lower case. That would be 36 to the power of 6. Which is approximately 2.18 x (10^9). So exactly this:


2176782336


So we have a stupid amount of combinations and this is where our security is. But if we are going to store these, why not go smaller? Say 5 characters with lower and uppercase. Which would be 62 to the power of 5. Which is 916,132,832. Still far more than 10,000 users need, and the uppercase letters save us a character.
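The number of possible IDs is the alphabet size raised to the power of the ID length, which is easy to get the wrong way round. A quick sketch to check the three options (the id_space helper is just a name I've made up):

```python
import string


def id_space(alphabet: str, length: int) -> int:
    """How many distinct IDs of this length the alphabet can produce."""
    return len(alphabet) ** length


mixed_case = string.ascii_letters + string.digits     # 62 symbols
lower_only = string.ascii_lowercase + string.digits   # 36 symbols

print(id_space(mixed_case, 10))  # 839299365868340224
print(id_space(lower_only, 6))   # 2176782336
print(id_space(mixed_case, 5))   # 916132832
```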


Anyway, each user has a 5 character alphanumeric ID. Let's say then that to store this will be 8 bytes, allowing for a line break or a comma separator. So our unlucky user has 9999 IDs wherever we store the seen set. This gives us:


8bytes x 9999 = 79992bytes = 80Kb


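So what we are saying is that if a user had seen every other profile on the app, we would somewhere be storing an extra 80Kb of data for them. At three profiles a day that would take 3,333 days, so a user who has been on the app for almost a year (333 days) would really have 999 IDs, or about 8Kb. Plus that's also assuming that every day they have gone through each of their new matches and interacted with them.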


So a better way to look at this would be maybe three months. Let's say that a user is on the app for three months on average. That's very high, I mean all things considered users will probably not be on it very long at all. But again this is all made up so let's assume maximum usage - the average user is highly interactive and has gone through their three matches per day. Over those three months they have seen 270 people. So we have:


8bytes x 270 = 2160bytes = 2.2Kb
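The same sum as a quick sketch, so the three cases can be compared side by side. It assumes 8 bytes per stored ID and three new profiles seen every single day.

```python
BYTES_PER_ID = 8       # 5 characters plus generous room for a separator
MATCHES_PER_DAY = 3


def seen_set_bytes(days_on_app: int) -> int:
    """Storage for the seen set if the user views every match, every day."""
    return BYTES_PER_ID * MATCHES_PER_DAY * days_on_app


print(seen_set_bytes(90))    # 2160 bytes, about 2.2Kb (three months)
print(seen_set_bytes(333))   # 7992 bytes, about 8Kb (almost a year)
print(BYTES_PER_ID * 9999)   # 79992 bytes, about 80Kb (seen everybody)
```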


So somewhere we need to store this data. The only little experiment I want to do here is how long it would take to parse this.


Who saw who - experiment


I want to just do a small simulation of searching for an ID within a 'large' set. For this I will use Python as it's what I know best, plus it's known as 'slow' right?


So first I'll generate a file that's a bunch of 5 character random strings separated by commas. This should add up to about 2.2Kb if we do 270 of these. Then we will grab that data and get a few timings of trying to find an ID within this set.


Using the python docs page for secrets and the recipes in there we have something like this for creating the set:


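import secrets
import string

# 62 possible characters: a-z, A-Z and 0-9
alph = string.ascii_letters + string.digits
# 270 random 5 character IDs (three profiles a day for three months)
seen = [''.join(secrets.choice(alph) for _ in range(5)) for _ in range(270)]
# Write the IDs out as one comma separated line ('w' so reruns don't append)
with open('myf', 'w') as f:
    for e in seen:
        f.write(f'{e},')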


I know it's ugly or whatever but it works, we then have a file with the set in. Now to read it in and do a couple of searches. I don't really need to write it to a file. I can just use the 'seen' variable. By the way, the file with 270 different IDs in, separated by commas, is 1.6Kb, so my estimate was off. Each entry is really 6 bytes (5 characters plus a comma) rather than the 8 I budgeted, and 270 x 6 = 1620 bytes. Anyway:


len(seen)
>>> 270
len(set(seen))
>>> 270


Now we can run a script to time how long it takes to find an entry that won't be in there. Something like this:


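from datetime import datetime

# Time a lookup for an ID that isn't in the list (the worst case, a full scan)
now = datetime.now()
if 'nofnd' in seen:
    print('Seen it')
print(datetime.now() - now)
>>> 0:00:00.000015
>>> 0:00:00.000024
>>> 0:00:00.000015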


I originally did this using time.time() but it was returning 8/9 which sounded incorrect. Just to clarify too, I did run that as a full script.


Looking at the results, it takes around 20 microseconds to find this. Let's just increase this now to 1000 entries and see the time it takes just out of interest.


>>> 0:00:00.000032
>>> 0:00:00.000034
>>> 0:00:00.000065


So next to nothing again. Okay so a quick test shows that looking through the IDs of the potential matches a user has already seen would be super fast, and even if a user was to use the app every day for a year we wouldn't have much trouble skipping users they have already seen. (Even in Python). The only other point worth noting here is we haven't considered the time it would take to read those values from, say, a database and parse them into a set. But we don't need to go that deep.
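If we did want to go a little deeper, here's a sketch using the standard library's timeit module, which averages over many runs instead of timing a single lookup. It compares a list lookup, a set lookup, and parsing a comma separated string into a set first (a stand-in for reading the value back from a database, which is my assumption about how it might be stored). I've left the timings out as they'll vary by machine.

```python
import secrets
import string
import timeit

ALPHABET = string.ascii_letters + string.digits


def make_ids(count: int) -> list[str]:
    """Generate random 5 character IDs, as in the experiment above."""
    return ["".join(secrets.choice(ALPHABET) for _ in range(5)) for _ in range(count)]


def parse_and_check(raw: str, candidate: str) -> bool:
    """Parse the stored string into a set, then test membership."""
    return candidate in set(raw.split(","))


stored = ",".join(make_ids(1000))   # roughly a year of seen profiles
as_list = stored.split(",")
as_set = set(as_list)
missing = "no-id"                   # contains '-', so it can never be generated

checks = {
    "list lookup": lambda: missing in as_list,
    "set lookup": lambda: missing in as_set,
    "parse then lookup": lambda: parse_and_check(stored, missing),
}

runs = 2_000
for label, check in checks.items():
    seconds = timeit.timeit(check, number=runs)
    print(f"{label}: {seconds / runs * 1e6:.2f} microseconds per check")
```

A set lookup doesn't have to scan every entry like the list does, so if the seen IDs were ever held in memory that would be the structure to use.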


Moderation


Okay so we have an idea of how matching would work and how we track who somebody has seen. The only other thing I would want to mention as I have been writing this is moderation.


I am aware that most of the features of this dating app appeal more to men. I don't want to get too far into it but it's widely reported that dating apps are fine for women, they usually have plenty of choice, while men rarely get matches with their swipes. This makes sense, I saw a quote somewhere that said:


"Dating for men is like searching for water in a desert, dating for women is like searching for water in a swamp"


Anyway, I would say a big thing for women is safety. They'll most likely get inappropriate messages on dating apps and so need a fast way to report this. This also leads into spam too.


Basically we would need a set of tools to quickly review new profiles, reported profiles and reported messages. While these can be simple enough to set up, they still cost bandwidth and some storage. There's also the question with spam of whether to shadow ban or just delete accounts. Deleting an account means the abuser can just create a new account. A better solution would be to make them think they are still using the app when they aren't actually having any effect. However, is the cost of keeping them on the app but not interacting higher than just deleting their account and allowing them to create a new one again?


We don't need to go too far into this, but I would say that we could shadow ban an account for a week and then delete it. So that at least some time has been taken away from an abuser and this may stop them from trying again with a new profile.
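As a sketch of that rule, the check could be as small as this. The status names and the idea of storing a shadow_banned_at timestamp on the account are assumptions of mine, nothing more.

```python
from datetime import datetime, timedelta

SHADOW_BAN_PERIOD = timedelta(days=7)


def account_status(shadow_banned_at, now):
    """Decide what to do with an account based on when it was shadow banned.

    shadow_banned_at is None for accounts that were never banned.
    """
    if shadow_banned_at is None:
        return "active"
    if now - shadow_banned_at < SHADOW_BAN_PERIOD:
        return "shadow_banned"   # still looks normal to the abuser
    return "delete"              # the week is up, remove the account


banned_on = datetime(2022, 10, 1, 12, 0)
print(account_status(None, datetime(2022, 10, 5)))        # active
print(account_status(banned_on, datetime(2022, 10, 5)))   # shadow_banned
print(account_status(banned_on, datetime(2022, 10, 9)))   # delete
```

This could run in the morning along with the matching, since nobody is using the app before midday anyway.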


So to add on the storage and bandwidth for reporting/spam, let's go with the maximum usage. Let's say we get 100 new signups a day, with around 10 messages reported a day and 3 profiles reported a day. The signups and reported profiles would need looking at, so each one is a full profile view. So we get:


Overall in a 30 day month this would be ~210Mb a month. Not a lot, so let's just say we need another gigabyte of bandwidth a month at most. So actually, I wouldn't say we need to add anything onto our estimates, because even with very busy traffic we aren't even using a Gb of bandwidth for moderation.
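Here's a rough sketch of how a figure in that region could come about. The 65Kb profile view is from earlier, but the 50Kb per reported message is purely my assumption (reviewing it in an inbox-sized page), so the result only lands near the ~210Mb figure rather than matching it exactly.

```python
PROFILE_VIEW_KB = 65      # full profile page, from the page budgets
MESSAGE_REVIEW_KB = 50    # assumption: reported message shown in an inbox-sized page

SIGNUPS_PER_DAY = 100
REPORTED_PROFILES_PER_DAY = 3
REPORTED_MESSAGES_PER_DAY = 10
DAYS = 30

daily_kb = (
    (SIGNUPS_PER_DAY + REPORTED_PROFILES_PER_DAY) * PROFILE_VIEW_KB
    + REPORTED_MESSAGES_PER_DAY * MESSAGE_REVIEW_KB
)
monthly_mb = daily_kb * DAYS / 1_000   # decimal units, 1000Kb = 1Mb

print(daily_kb)     # 7195
print(monthly_mb)   # 215.85
```

Either way it is a fraction of a gigabyte, so it disappears into the main bandwidth estimate.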


Conclusion


Okay so after looking into the data and code we need for the app, we have the following requirements:


We have also established a couple of extra rules/processes for the app, so here are all of the rules we have for the app:


Next we will look at one of the infrastructure setups, the first being bare metal.


28 Oct 2022, 12:16 p.m.
