
Now Viewing: "Unable to search this deep due to coding limitations." - How to get past search limit. (Read fifth post)

jedi1357 - Group: Moderator - Total Posts: 5778
Posted on: 06/12/21 11:38AM

Mderms said:
Yes, but while we're on the subject and there are a few threads active about this already, can someone explain to me like I'm 5 why this limitation is even a thing? I get that it's a hard-coded limit to stop the servers exploding, but why would the servers explode in the first place? Is it a hardware/software limitation? Or is it a spaghetti code problem? Why is it that similar websites to this don't have this same issue? Is it something to do with how you guys started developing the site versus how someone else started developing theirs, and at this point you are too far along to easily change the foundation of how the site works and brings up information?


It's the Apache Solr we use for searches.

We are not the only site that does this. pixiv for example limits searches to 1000 pages (60,000 posts) for the same exact reason.



PietroSoft - Group: Member - Total Posts: 2500
Posted on: 06/12/21 11:48AM

jedi1357 said:
It's the Apache Solr we use for searches.

We are not the only site that does this. pixiv for example limits searches to 1000 pages (60,000 posts) for the same exact reason.

Why not just implement the simple workaround in the backend?
Make the site ALWAYS list a maximum of 477 pages. When people reach the end, put an option at the end of the page to go beyond. When you click on it, it automatically adds the id:<blabla to the search and loads 20K more posts. There is a ton of art in the back that commoners never see because they don't know the workaround or are too lazy to use it.



Jerl - Group: The Real Administrator - Total Posts: 6708
Posted on: 06/12/21 02:12PM

Mderms said:
Yes, but while we're on the subject and there are a few threads active about this already, can someone explain to me like I'm 5 why this limitation is even a thing? I get that it's a hard-coded limit to stop the servers exploding, but why would the servers explode in the first place? Is it a hardware/software limitation? Or is it a spaghetti code problem? Why is it that similar websites to this don't have this same issue? Is it something to do with how you guys started developing the site versus how someone else started developing theirs, and at this point you are too far along to easily change the foundation of how the site works and brings up information?


Most websites directly query their SQL database for searching. This is essentially infinitely scalable, but can be slow since it isn't optimized for any particular task.
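As a rough illustration of that direct approach (the table layout and column names here are made up, not the site's actual schema), a tag search against a plain SQL database is basically a text match plus an offset:

```python
import sqlite3

# Hypothetical schema for illustration only - this is not the site's real
# table layout, just the general "query the database directly" idea.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, tag_string TEXT)")
conn.executemany(
    "INSERT INTO posts (id, tag_string) VALUES (?, ?)",
    [(1, "touhou solo"), (2, "original highres"), (3, "touhou highres")],
)

def search_page(tag, page, per_page=42):
    # The database has to filter the whole table (or an index over it) on
    # every query, so cost tracks table size far more than result depth.
    offset = (page - 1) * per_page
    return conn.execute(
        "SELECT id FROM posts WHERE tag_string LIKE ? "
        "ORDER BY id DESC LIMIT ? OFFSET ?",
        (f"%{tag}%", per_page, offset),
    ).fetchall()

print(search_page("touhou", page=1))  # -> [(3,), (1,)]
```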

To dramatically increase the speed of searches, we instead use Apache Solr, which is optimized specifically for searching text within documents. It's intended more for full-text searches on entire pages of text (imagine, for example, searching an entire database of academic papers for a given word or line of text), but since all of a post's tags are stored in a single text field in the database, it also works quite well for our purposes. It does this very quickly, but at the expense of additional server load, and that load scales up quickly with the depth into the results, just due to the way the algorithm is designed.

Directly searching the database, on the other hand, scales up with the size of the entire table. That scaling is slower than Solr's, but the table growing larger impacts all searches, regardless of depth. Solr isn't slowed down nearly as much by large tables - it only uses lots of resources for deep searches. Given that the vast majority of searches are only a few pages deep at most, Solr is drastically faster on a table of 6 million posts than a direct database query - unless someone tries to go particularly deep into a search, which we've prevented.
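For a sense of why depth is what hurts, here's a rough sketch of a Solr request for one page of results. The URL, core name, and field name are assumptions for illustration, not the site's actual setup; the relevant part is that Solr's normal pagination (start/rows) has to collect and rank everything up to start + rows before it can hand back a single page:

```python
import requests

SOLR = "http://localhost:8983/solr/posts/select"  # hypothetical core/URL

def solr_page(tags, page, per_page=42):
    # To serve page N, Solr collects and ranks the top start+rows matches
    # and throws the first `start` of them away, so a deep page costs far
    # more than a shallow one even though the query text is identical.
    params = {
        "q": tags,                       # e.g. "tag_string:touhou" (field name assumed)
        "sort": "id desc",
        "start": (page - 1) * per_page,  # page 1 -> 0, page 478 -> 20034
        "rows": per_page,
        "wt": "json",
    }
    return requests.get(SOLR, params=params).json()
```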

The "id:<" trick works because it cuts out all of the results from before that ID. Since Solr's resource usage increases with the depth into the reasults, not the deptb into the database, this essentially cuts our all of the resource usage from before that point.

This could theoretically be implemented automatically in our backend, but it isn't as simple as just grabbing the post ID at a certain depth - the search doesn't know that ID until it has already done the search. We could change our pagination to work based on post IDs instead of search depth, but then you'd only be able to move back or forward one page at a time, because it won't know the first and last IDs on other pages without searching to that depth first. That would also be a major change to both the backend and frontend that would essentially require the whole search system to be rebuilt from the ground up.

Searches with page numbers higher than the limit could automatically be redirected to the "id:" search, but it would need to complete the search right up to the limit to get that ID and then complete the new search with the ID metatag - two searches for one page load. It could fall back to a direct database search past a certain depth, but that has its own problems, and searches past that depth would suddenly become slow, even if they'd impact other users' experience less.
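To make the one-page-at-a-time limitation concrete, an ID-based ("keyset" or cursor-style) pagination scheme would look something like the sketch below (not the site's actual code). The client hands back the lowest post ID it saw rather than a page number, which is exactly why you can't jump straight to page 300: the cursor for page 299 isn't known until page 299 has actually been fetched. (Solr itself also offers a cursorMark parameter for this kind of deep paging, with the same can't-jump-to-an-arbitrary-page limitation.)

```python
def next_page(tags, cursor_id=None, per_page=42):
    # Keyset-style pagination sketch: instead of a page number, the caller
    # passes back the lowest post ID from the previous page, and we filter
    # to everything older than it.
    params = {"q": tags, "sort": "id desc", "start": 0, "rows": per_page, "wt": "json"}
    if cursor_id is not None:
        params["fq"] = f"id:[* TO {cursor_id - 1}]"
    docs = requests.get(SOLR, params=params).json()["response"]["docs"]
    new_cursor = docs[-1]["id"] if docs else None  # only yields the cursor for the *next* page
    return docs, new_cursor
```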

We could put instructions on getting around the limit on the error page so people don't have to find the answer on the forums, similar to the fringe filter's instructions, but they'd still need to use the ID metatag trick.



PietroSoft - Group: Member - Total Posts: 2500
Posted on: 06/12/21 02:21PM

Jerl said:
-A large word soup-

Was expecting to read something like that.

My idea can't be implemented, my Jerltastic dude? Limit to 477, click on something on page 477 to add the id:< to the search? Too complex?



Mderms - Group: I do edits sometimes. - Total Posts: 1313
Posted on: 06/12/21 06:37PM

If I'm understanding that wall of text right, the search won't know what the last ID on the last page will be until someone manages to dig that deep, likely because that number will always be changing and the oldest images will always be getting lost to the 20k-limit ether as new images relevant to that search get uploaded. And it's possible having that number dynamically generated might be too much strain on the servers that deep. Idk.


This has me curious though. Is 20k right at the limit of what the servers can handle reasonably before slowing down, or is there a buffer zone of sorts between that number and meltdown?



PietroSoft - Group: Member - Total Posts: 2500
Posted on: 06/12/21 07:03PM

Mderms said:
From the ancient runic text I just read, from what I can tell the search won't know what the last ID on the last page will be until someone manages to dig that deep, likely because that number will always be changing and the oldest images will always be getting lost to the 20k-limit ether as new images relevant to that search get uploaded. And it's possible having that number dynamically generated might be too much strain on the servers that deep. Idk.


This makes me curious though. Is 20k right at the limit of what the servers can handle reasonably, or is there a buffer zone of sorts between that number and meltdown?

The thing with my idea is that the server doesn't have to guess anything, and the shit doesn't happen automatically either. You only have to put a hard limit on how many pages the user can see, always 477 (you can only see 477 pages with the actual 20K limiter anyway; page 478 triggers the message), and program a special action when the user sees page 477 that adds the id:< trick, a simple thing you can click at the bottom of the page. Right now there's no hard limit on the last page, which makes the last-page button on the posts homepage useless for any tag with more than 20K posts.
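For what it's worth, the "special action on page 477" described here would roughly be a template-side check like the sketch below (function name and URL shape are made up for illustration, not site code). The page already knows the lowest post ID it's displaying, so the link just appends the metatag to the same search:

```python
from urllib.parse import quote_plus

def beyond_limit_link(tags, page, posts_on_page, max_pages=477):
    # On the last page the limit allows, offer a "keep going" link that
    # restarts the same search below the lowest post ID currently shown,
    # via the id:< metatag.
    if page < max_pages or not posts_on_page:
        return None
    lowest_id = min(post["id"] for post in posts_on_page)
    return "/posts?tags=" + quote_plus(f"{tags} id:<{lowest_id}")
```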



Jerl - Group: The Real Administrator - Total Posts: 6708
Posted on: 06/12/21 07:25PM

Mderms said:
If I'm understanding that wall of text right, the search won't know what the last ID on the last page will be until someone manages to dig that deep, likely because that number will always be changing and the oldest images will always be getting lost to the 20k-limit ether as new images relevant to that search get uploaded. And it's possible having that number dynamically generated might be too much strain on the servers that deep. Idk.


This has me curious though. Is 20k right at the limit of what the servers can handle reasonably before slowing down, or is there a buffer zone of sorts between that number and meltdown?


There's a buffer zone so that multiple users doing deep searches at the same time can't defeat it.

Without the limit, a single user trying to load the last page of some large tags could stall the server for everyone for potentially minutes.



PietroSoft - Group: Member - Total Posts: 2500
Posted on: 06/12/21 11:22PM

I guess my idea can't receive a yes or no. Or I'm just being plain ignored.

I know that I'm not a favorite here. I am not a shill for the mods; when they do something that I don't like, I say it. I got the gardener status only because I complained a lot about recaptcha and had a stupid amount of edits. I didn't even get added to the gardeners list. But I still like to help.



Mderms - Group: I do edits sometimes. - Total Posts: 1313
Posted on: 06/12/21 11:26PM

It probably isn't a simple yes or no answer. It probably *could* be done, but I'm sure there are other higher-priority things to be done at the moment.

Not shilling either, just saying.



PietroSoft - Group: Member - Total Posts: 2500
Posted on: 06/12/21 11:32PM

Mderms said:
It probably isn't a simple yes or no answer. It probably *could* be done, but I'm sure there are other higher-priority things to be done at the moment.

Not shilling either, just saying.

Well.


