Updating local cache #10

Closed
opened 2026-04-09 14:53:48 +02:00 by maleszka · 6 comments
Owner

Currently, whenever all ContestNode refs are present in the cache, wbij refuses to download any data from the server. How can we detect that nodes have been added on the remote server?

In contrast to submissions, most platforms do not provide any API call for fetching "newly created contests/rounds/problems". So the question is rather: how can the user indicate that they want to refresh some subtree?

maleszka added this to the 0.1.0 milestone 2026-04-09 17:17:14 +02:00
Author
Owner

I have the following idea. We can have a wbij new or wbij update subcommand that will just try to sensibly update all data and will report to stdout all the new stuff that it downloaded. For example, we can display a list of ten recent submissions (with new ones being highlighted) and a tree of new contestnodes (tree can be formatted the same way as std.progress?).

On every wbij new call, we would refetch all contest nodes and fetch all new submissions (thankfully, this requires very few API calls, see #14), so wbij new wouldn't even be interactive: the user won't have to select which nodes to fetch!

But do we really need to refetch all contest nodes every time? I don't think so. We can be smart about it.

First of all, the majority of contests do not change at all once they are settled. We can mark most of them as "archived" and update only those contests that exhibit some kind of activity. Then we can implement wbij new simply by refetching only non-archived contests. Just in case, we can provide a wbij new -a flag that also refetches archived contests.

But how do we decide that a contest is archived?

Strategy 1: last submission date

We can consider archived every contest that hasn't received any submission in the past 6 months. This strategy is almost perfect, apart from the fact that:

  • fresh contests that don't have any submissions yet are considered archived, although in practice they are very likely to change
  • the submissions relation must be loaded and processed to decide whether a contest node is archived
  • in wbij new we would have to fetch new submissions first, then refetch nodes, and then probably fetch new submissions once again (because we may obtain new submission feeds); so we need to write some iteration logic (although that's not a huge problem)
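
A rough sketch of the check in strategy 1, just to make the first drawback concrete. The name is_archived and the constant ARCHIVE_AFTER are made up for illustration, and "6 months" is approximated as 183 days, which is an assumption.

```python
from datetime import datetime, timedelta

# Hypothetical threshold: "6 months" approximated as 183 days.
ARCHIVE_AFTER = timedelta(days=183)


def is_archived(last_submission, now):
    """Strategy 1: a contest is archived if it has no recent submission.

    `last_submission` is None for a contest that never received one.
    Such a contest counts as archived here, which is exactly the
    "fresh contest" drawback from the list.
    """
    if last_submission is None:
        return True
    return now - last_submission > ARCHIVE_AFTER


now = datetime(2026, 4, 9)
print(is_archived(datetime(2026, 3, 1), now))  # contest with recent activity
print(is_archived(datetime(2025, 1, 1), now))  # contest that went quiet
print(is_archived(None, now))                  # fresh contest, no submissions
```

Note that the caller has to know the last submission date of every contest, so the submissions relation must be loaded before this can run.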

Strategy 2: ContestNode stores last_modified attribute

Each ContestNode stores an additional date that is updated every time the node is refetched, we submit to the node, or we receive a submission connected to the node. If a node is modified, all its ancestors should be considered modified too, so the last_modified attribute is basically a maximum over the children.

It's also good, but:

  • even very old, abandoned contests with no submissions are unlikely to be considered archived, because cache drops can happen quite often (e.g. on a schema version upgrade), which bumps the last_modified date
  • we need to store an additional attribute in the contest_nodes relation, and this attribute will likely carry a lot of redundancy (since last_modified is at least the max over the children)
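
To illustrate the "maximum over children" rule, here is a minimal sketch. The Node class and its own_modified field are hypothetical and are not the actual contest_nodes schema; the sketch computes the effective date on demand instead of storing it redundantly in every ancestor.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Node:
    ref: str
    # Hypothetical field: last time this node itself was refetched or
    # received a submission.
    own_modified: datetime
    children: list = field(default_factory=list)


def last_modified(node):
    """Effective last_modified: maximum over the node and its whole subtree."""
    return max([node.own_modified] + [last_modified(c) for c in node.children])


problem = Node("contest/round/problem", datetime(2026, 4, 1))
round_ = Node("contest/round", datetime(2025, 6, 1), [problem])
contest = Node("contest", datetime(2025, 1, 1), [round_])

# The recent change to the problem makes every ancestor count as modified.
print(last_modified(contest))
```

Computing the maximum lazily avoids the redundancy mentioned in the second bullet, at the cost of walking the subtree each time.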

Strategy 3: we store last_time_selected attribute for each node

Like strategy 2, but the date is updated only when wbij selects a contest ref in the subtree (whether via fzf, a modeline, or .wbij.zon, see #3), and it is additionally set to at least the date of the last submission in the subtree.

In practice, we can implement this entirely on the front-end side and simply store a relation of all the selects we've made in the past (timestamp, contest_ref, lang), where timestamp can serve as the primary key. Maybe we could use this information in wbij stats to display, e.g., the number of solutions submitted with wbij. Potentially, it could also serve as a bonus indicator when we implement our own fuzzy search engine.

It should be noted that we can just as well use the submission info to provide both the stats and the bonuses in the fuzzy search engine. A cache-local history of selects has the disadvantage that it is disconnected from the information found on the web platform, and it does not synchronize across devices.
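
A possible shape for the selects relation, sketched under assumptions: the Select type and last_time_selected are invented names, and contest refs are assumed to be slash-separated paths so that a subtree can be matched by prefix. The real ref format may differ.

```python
from datetime import datetime
from typing import NamedTuple, Optional


class Select(NamedTuple):
    timestamp: datetime  # doubles as the primary key
    contest_ref: str     # assumed to be a slash-separated path
    lang: str


def last_time_selected(selects, subtree_ref) -> Optional[datetime]:
    """Latest select of subtree_ref itself or of anything below it."""
    prefix = subtree_ref + "/"
    times = [
        s.timestamp
        for s in selects
        if s.contest_ref == subtree_ref or s.contest_ref.startswith(prefix)
    ]
    return max(times, default=None)


history = [
    Select(datetime(2026, 1, 5, 10, 0), "platform/contest-a/p1", "cpp"),
    Select(datetime(2026, 3, 2, 18, 30), "platform/contest-a/p2", "zig"),
    Select(datetime(2025, 11, 1, 9, 0), "platform/contest-b/p1", "cpp"),
]
print(last_time_selected(history, "platform/contest-a"))
print(last_time_selected(history, "platform/contest-c"))  # never selected
```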

@kbity do you have ideas for other strategies? Which one do you think is the best?

Author
Owner

I think we can choose this design:

Every contest node will have a last_modified date attribute which records when the node was fetched and found to actually differ from what we already had in the cache.

Now, we will provide two flags for the user: --drop-cache, which ignores all loaded relations, and -u,--update [Xm,Xd,all], which runs our update routine with the given parameter X (set to 3 months by default?).
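
For illustration, the X parameter could be parsed along these lines. This is only a sketch: the function name parse_update_window is made up, the "3m" default mirrors the tentative "3 months" suggestion, and treating a month as 30 days is an assumption rather than something decided here.

```python
from datetime import timedelta
from typing import Optional


def parse_update_window(arg: str = "3m") -> Optional[timedelta]:
    """Turn "Xm", "Xd" or "all" into a look-back window.

    Returns None for "all", meaning there is no cutoff at all.
    """
    if arg == "all":
        return None
    count, unit = arg[:-1], arg[-1:]
    if not count.isdecimal() or unit not in ("m", "d"):
        raise ValueError(f"expected Xm, Xd or all, got {arg!r}")
    days_per_unit = 30 if unit == "m" else 1  # assumption: a month is 30 days
    return timedelta(days=int(count) * days_per_unit)


print(parse_update_window("3m"))
print(parse_update_window("10d"))
print(parse_update_window("all"))
```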

The update routine will do the following:

  1. catch up on all Submissions from Feeds (see #14)
  2. collect all contest nodes that had some submission or were modified in the last X months (last_modified >= today - X months)
  3. refetch these nodes and all their ancestors
  4. look for new Submission Feeds
  5. fetch submissions from new feeds
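
Steps 2 and 3 of the routine could look roughly like this. The dictionary-based representation (parent, last_modified, last_submission keyed by contest ref) and the function name nodes_to_refetch are assumptions made for the sketch, not the actual wbij data model.

```python
from datetime import datetime, timedelta


def nodes_to_refetch(parent, last_modified, last_submission, now, window):
    """Recently active nodes plus all their ancestors.

    parent: dict mapping each ref to its parent ref (None for roots)
    last_modified, last_submission: dicts ref -> datetime (refs may be absent)
    window: timedelta, or None meaning "all" (refetch everything)
    """
    if window is None:
        return set(parent)
    cutoff = now - window
    selected = set()
    for ref in parent:
        stamps = [d[ref] for d in (last_modified, last_submission) if ref in d]
        if any(t >= cutoff for t in stamps):
            # Walk up to the root; stop once we reach an already selected node,
            # since its ancestors were added earlier.
            cur = ref
            while cur is not None and cur not in selected:
                selected.add(cur)
                cur = parent[cur]
    return selected


parent = {"a": None, "a/1": "a", "a/1/x": "a/1", "b": None, "b/1": "b"}
modified = {"a/1/x": datetime(2026, 3, 20), "b/1": datetime(2024, 1, 1)}
submitted = {}
now = datetime(2026, 4, 9)
print(sorted(nodes_to_refetch(parent, modified, submitted, now, timedelta(days=90))))
```

Steps 1, 4 and 5 depend on the submission feed API (see #14) and are left out here.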

The --drop-cache and -u,--update options will be global, and wbij commands will not perform an update by default. Later we can introduce a wbij new [Xm,Xd] command that displays all contest nodes modified within some recent period of time. In the future, we will have a wbij that simply displays some kind of dashboard with new nodes and nodes with close deadlines. @kbity sounds good?

Collaborator
> 2. collect all contest nodes that had some submission or were modified in last X months (last_modified >= today - X months)

So, if I understand correctly, by tweaking X we change the threshold for considering a node to be archived? If so, then it sounds pretty reasonable 👍

Author
Owner

I mean, I would move away from the idea of marking contests as "archived". Instead, wbij would normally use cached data, but you could add the --update Xm flag and wbij would download new data (but only for contests that changed in any way in the last X months).

Author
Owner

I have already introduced the last_modified attribute, because I'm pretty sure about this one: maleszka/wbij@8c2377139f.

I wanted to introduce this change early enough that we will actually get some reasonable data to work with by the time we get here.

Author
Owner

Implemented in #19

Reference
maleszka/wbij#10