[Refdb-devel] [ refdb-Feature Requests-2872243 ] Get IDs fast

Discussion:

SourceForge.net

2009-10-03 14:36:37 UTC

Feature Requests item #2872243, was opened at 2009-10-03 16:36
Message generated for change (Tracker Item Submitted) made by bronger
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=385994&aid=2872243&group_id=26091

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: refdbd
Group: None
Status: Open
Priority: 5
Private: No
Submitted By: Torsten Bronger (bronger)
Assigned to: Markus Hoenicka (mhoenicka)
Summary: Get IDs fast

Initial Comment:
Currently, it takes 40ms per reference to get the ID of a found reference:

$ time refdbc -u refdb -w Sonne -d biblio -C getref -s ID -t ris ":ID:>0" > /dev/null
999:96 retrieved:0 failed

real 0m4.026s
user 0m0.000s
sys 0m0.004s

This is problematic for a web frontend because even if you work with aggressive caching, you have to know at least the IDs of found references.

Therefore, I request that the ID-only request be optimised.

----------------------------------------------------------------------

SourceForge.net

2009-10-04 21:36:39 UTC

Feature Requests item #2872243, was opened at 2009-10-03 16:36
Message generated for change (Comment added) made by mhoenicka

Comment By: Markus Hoenicka (mhoenicka)

Date: 2009-10-04 23:36

Message:
I've tried to track down where refdbd spends its time returning the ID
list. It looks like a lot of time is wasted on client/server messaging
because refdbd, by default, returns reference data one dataset at a time. If you
return ID lists, which consist of RIS datasets with 4 lines each, the
overhead is out of proportion. Please have a look at refdbdgetref.c as of
revision 703. There is a tunable at line 2841 which is set to default
values according to the type of query a few lines further down. The idea is
to group references before sending them to the client. This requires more
memory, but reduces the overhead of client/server messaging. I've arrived
at values of 100 for ID queries and 10 for other queries empirically,
looking only at RIS data. These values certainly depend on the speed and
memory of the machine refdbd runs on. Feel free to play with these numbers
and see if it helps. If it does, I could turn these into configurable
parameters. Using the current defaults, I've managed to reduce the time for
retrieving 100 IDs from 10.66s to 0.732s, and the time for retrieving 100
RIS datasets from 12.42s to 3.96s.
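
The grouping idea described above can be illustrated with a small sketch.
This is not the code in refdbdgetref.c; it is a hypothetical Python model
that only shows why grouping datasets cuts the number of client/server
messages. The names batch_datasets, make_id_dataset, ID_BATCH_SIZE and
DEFAULT_BATCH_SIZE are assumptions made for this example, and the 4-line
record layout is illustrative rather than refdbd's exact output.

```python
from typing import Iterable, Iterator, List

# Hypothetical constants mirroring the values quoted in the comment above.
ID_BATCH_SIZE = 100       # datasets per message for ID-only queries
DEFAULT_BATCH_SIZE = 10   # datasets per message for other queries


def batch_datasets(datasets: Iterable[str], batch_size: int) -> Iterator[str]:
    """Yield messages that each carry up to batch_size datasets."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    pending: List[str] = []
    for dataset in datasets:
        pending.append(dataset)
        if len(pending) == batch_size:
            yield "".join(pending)   # one message for the whole group
            pending.clear()
    if pending:
        yield "".join(pending)       # flush the final partial group


def make_id_dataset(ref_id: int) -> str:
    """Build a 4-line, RIS-like ID record (assumed layout, for illustration)."""
    return f"\nTY  - JOUR\nID  - {ref_id}\nER  - \n"


# 96 ID-only datasets, as in the initial report.
datasets = [make_id_dataset(i) for i in range(1, 97)]
one_by_one = list(batch_datasets(datasets, 1))
grouped = list(batch_datasets(datasets, ID_BATCH_SIZE))

# The payload is identical; only the number of messages differs.
print(len(one_by_one), len(grouped))
```

In this model the ungrouped case needs 96 messages and the grouped case
needs 1, at the cost of holding a whole group in memory before it is sent.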

----------------------------------------------------------------------

SourceForge.net

2009-10-05 16:03:14 UTC

Feature Requests item #2872243, was opened at 2009-10-03 16:36
Message generated for change (Comment added) made by bronger

Comment By: Torsten Bronger (bronger)

Date: 2009-10-05 18:03

Message:
No further configuration is necessary. In my test case, the time dropped
from 3.8 seconds to 0.16 seconds. Now, caching is real fun. Great!

----------------------------------------------------------------------

SourceForge.net

2009-10-05 20:38:30 UTC

Feature Requests item #2872243, was opened at 2009-10-03 16:36
Message generated for change (Settings changed) made by mhoenicka

Status: Closed
