Gelbooru

Ticket Information - ID: #982


ID: 0000982
Category: Feature Request
Severity: normal
Reproducibility: N/A
Date Submitted: 08/07/19 02:20AM
Updated By: chiyachan
Reporter: chiyachan
Assigned to: geltas
Resolution: Resolved
View Status: Public
Version:
Target Version: N/A
Summary: API: Tag types in Posts list
Description: Right now the API only puts the tag names in the Posts list, but not the tag types. The API should include the tag types as well.

Right now you have to make a request to /index.php?page=dapi&s=tag&q=index for every post, which is much slower and probably wastes more server resources, since a separate request has to be made for each post.
A quick look at some software that fetches tags from Gelbooru reveals that it forgoes the API entirely and scrapes the website's HTML to get the tag types; https://github.com/bakape/hydron is one example. This is a terrible solution.

The two easiest ways to add this to Gelbooru I think are as follows:

- Do what Danbooru does and have several tag fields: one with all tags, another with only artist tags, another with only character tags, another with only meta tags, etc. This option does not break compatibility with existing clients, as the original tags field is kept.

- Use type:name notation in the current tags field, so "1girl mitsuba_choco ..." would become "tag:1girl artist:mitsuba_choco ...". You could omit the tag: part for general tags to reduce bandwidth usage and just assume tag names without a type are always general. This option might break compatibility with existing clients.
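To make the second option concrete, here is a minimal Python sketch of how a client might parse such a field. The set of type prefixes is an assumption made for illustration (the ticket only names `tag:` and `artist:`), and `parse_typed_tags` is a hypothetical helper, not part of any Gelbooru client.

```python
# Assumed type prefixes; only "tag" and "artist" appear in the proposal itself.
KNOWN_TYPES = {"tag", "artist", "character", "copyright", "meta"}


def parse_typed_tags(tag_string, default_type="general"):
    """Split a space-separated 'type:name' string into (type, name) pairs.

    Tokens without a recognized prefix are treated as general tags, which
    matches the suggestion to omit 'tag:' to save bandwidth.
    """
    pairs = []
    for token in tag_string.split():
        prefix, sep, rest = token.partition(":")
        if sep and rest and prefix in KNOWN_TYPES:
            # 'tag:' is the explicit spelling of the general type.
            tag_type = default_type if prefix == "tag" else prefix
            pairs.append((tag_type, rest))
        else:
            # No known prefix: keep the whole token as a general tag name.
            pairs.append((default_type, token))
    return pairs


print(parse_typed_tags("tag:1girl artist:mitsuba_choco"))
```

One caveat of this sketch: a general tag whose own name starts with a known prefix and a colon would be misread when the `tag:` part is omitted, so a real format would need a rule for that case.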

A possible third option would be to do what Sankaku Channel does and turn the tags field into a JSON array of objects with a field for the name and a field for the type. This would probably break existing clients too much and might require more effort to develop.

I think option 1 would be the best way to go for Gelbooru.
Additional Info:
chiyachan replied at 2019-08-07 02:27:03
Forgot to mention that option 1 would have a tags field for all the general tags as well. This might not be necessary if tag names on Gelbooru are unique, though I don't know whether they are.

Jerl replied at 2019-08-08 20:21:39
Making you use the tag API instead of just putting that information in the post API actually wastes considerably LESS server resources. Here's why:

To start with, tags on a post are literally stored in the database as a single string column in the posts table with no tag type information included. This means that to get a list of posts from the database, we only need to query one table with a text search. This is extremely efficient. This is also the way that Danbooru does it; they separate them visually for you in the tag list and tags box when generating the page, but join them together again when you send any changes to them.

Tag type information is stored in a separate table. This means that to get tag type information, we must perform a query on that table too (and it's a much heavier query, since it has to look up each requested tag individually). Because that second query is needed either way, making two API requests adds only an extremely negligible amount of overhead compared with one.
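As a rough illustration of the layout described above, the following Python sketch uses an in-memory SQLite database. The table and column names are invented for the example and are not Gelbooru's actual schema; the point is only that tag names come from one table and tag types need a second lookup.

```python
import sqlite3

# Hypothetical schema: tag names live on the post row as one string,
# while tag types live in a separate table keyed by tag name.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (id INTEGER PRIMARY KEY, tags TEXT NOT NULL);
CREATE TABLE tags (name TEXT PRIMARY KEY, type TEXT NOT NULL);
INSERT INTO posts VALUES (1, '1girl mitsuba_choco');
INSERT INTO tags VALUES ('1girl', 'general'), ('mitsuba_choco', 'artist');
""")


def post_tags(post_id):
    """Query 1: read the tag string from the posts table only."""
    row = conn.execute(
        "SELECT tags FROM posts WHERE id = ?", (post_id,)
    ).fetchone()
    return row[0].split() if row else []


def tag_types(names):
    """Query 2: look up the type of every requested tag name."""
    if not names:
        return {}
    placeholders = ",".join("?" * len(names))
    rows = conn.execute(
        f"SELECT name, type FROM tags WHERE name IN ({placeholders})", names
    ).fetchall()
    return dict(rows)


names = post_tags(1)
print(names, tag_types(names))
```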

However, like I said, it actually costs us LESS resources to only give information from the posts table on the post API. This is because the vast majority of users using the post API don't access the tag API at all, which means that for the vast majority of API users, we only ever have the one query to the post database.

It gets better than that, though. We actually don't query the database AT ALL for the post API. Instead, we use Apache Solr, a standalone, dedicated full-text search server with its own index, for almost all cases where we would otherwise query the post API. Solr is incredibly fast and uses considerably fewer resources than querying the database. This means that adding tag type information to the post table adds a *considerable* amount of overhead compared to just searching Solr. As it is, the API uses just as many server resources as the entire rest of the site, so adding tag type information to the post API wouldn't just make each individual post API request slower; it would actually bog down the entire site and would likely require us to rate-limit the API much more aggressively.

I agree that scraping the API instead of using two API queries is a terrible solution. However, all they've accomplished by doing that is adding work for themselves and their software, which now has to parse HTML instead of XML. They may actually be adding considerably more overhead to the site by loading every post page. That does a separate Solr query for each individual post, versus just one query for each page of an API search, which gives you all of the tags for all of the posts. It also hits the comment, note, and user tables for each one, something that the post and tag APIs don't do at all. Because of this, unless they have comments in their code indicating that they're doing it to try and save our bandwidth, I suspect that they either don't know how to use the tag API or are doing it to try and get around our API rate limiting, not to try and save server resources.

Yes, tag names are unique.

Jerl replied at 2019-08-08 20:24:20
s/scraping the API/scraping post pages

lozertuser replied at 2019-08-08 20:38:38
TL;DR: Use the tag and post APIs to get the information you need instead of scraping HTML. It's less wasteful for us, and the output is easier for you to manage.

chiyachan replied at 2019-08-09 00:09:52
Thanks for the informative response.
From reading your post, it sounds like the best way to use the tags API is to gather all tags (sans duplicates) in the page, submit one query to the tags API per page to get their types, and possibly cache them for later. It sounds like this could result in a lot of names being passed as the names parameter, though. Is there a limit to how many names or characters can be passed? I am aware that there are limits to how long a URL can be; I'm just wondering if Gelbooru imposes any additional limits.
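
The per-page batching described above could be sketched in Python roughly as follows. Several things here are assumptions rather than confirmed behavior: the host name, that the names parameter takes space-separated tag names, and the `MAX_URL_LENGTH` value, which is only a placeholder since the real limit (if any) is exactly the open question. The helper names are hypothetical, and the sketch only builds URLs; it does not send any requests.

```python
from urllib.parse import urlencode

TAG_API = "https://gelbooru.com/index.php"  # assumed host
MAX_URL_LENGTH = 2000  # placeholder; the actual limit is not known


def unique_tags(post_tag_strings):
    """Collect tag names across a page of posts, without duplicates, in order."""
    seen = {}
    for tags in post_tag_strings:
        for name in tags.split():
            seen.setdefault(name, None)
    return list(seen)


def uncached(names, cache):
    """Keep only the names whose type is not already in the local cache."""
    return [name for name in names if name not in cache]


def build_url(names):
    """Build one tag API URL; assumes 'names' is a space-separated list."""
    query = urlencode(
        {"page": "dapi", "s": "tag", "q": "index", "names": " ".join(names)}
    )
    return f"{TAG_API}?{query}"


def tag_query_urls(names, max_len=MAX_URL_LENGTH):
    """Split names into as few URLs as possible while respecting max_len.

    A single name that is too long on its own is still emitted alone.
    """
    urls, batch = [], []
    for name in names:
        if batch and len(build_url(batch + [name])) > max_len:
            urls.append(build_url(batch))
            batch = [name]
        else:
            batch.append(name)
    if batch:
        urls.append(build_url(batch))
    return urls


page = ["1girl mitsuba_choco", "1girl solo"]
cache = {"1girl": "general"}  # types learned from earlier pages
for url in tag_query_urls(uncached(unique_tags(page), cache)):
    print(url)
```

With a cache in front, later pages only ask about tags that have not been seen before, so the names list should shrink over time.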