[Refdb-devel] Storage locations again
Markus Hoenicka
2007-01-20 22:23:11 UTC
[ I've cc'ed and reply-to'ed the refdb devel list. A few others might
be interested too ]
Hi Markus,
I'm thinking about how to handle the AV field, and especially what to do
about digital and physical locations for the same copy. Related to this
is call numbers.
Currently, in your implementation of RIS an object can't have both a
physical and a virtual AV, right? Do you think I should require users to
specify which they mean? I.e. check a box to say "computer file" vs.
"physical copy"? Or we could move one of the fields off to a U or M.
As usual, the RIS spec is not very clear about this, but I wonder
whether we should use the L1/L2 fields for storing paths to electronic
copies. Each dataset can have an unlimited number of L1 and L2 entries
so this would not collide with a URL to the original location of a
PDF. The only difference would be a local path, something like
file:///path/to/file.pdf, instead of a URL. We'd have to figure out
something to keep the pdfroot mechanism alive though. refdbc, or a web
frontend would then have to check both the AV and the L1/L2 fields for
matching entries if the RP field claims the item is in file.
Same thing with call numbers, which aren't covered anywhere in RIS.
Refworks actually uses ER - to store them!?! We discussed this earlier
and for my personal use I was just going to choose one of the U or M
fields, but a PHP form is going to enforce the choice on whoever uses
it. Any suggestions for where that ought to go?
If at all, we should settle on one of the U fields. Frankly, I don't
have a better suggestion right now if we want to be able to export all
data to RIS. We'll have to decide whether this is indeed an important
goal, or if we should move ahead and officially support more data than
RIS can store.
Given your own work on a better storage and transport language, I think
we should be looking for solutions to these that are easily undone so
data entered into RIS using this form can be converted easily to
whatever comes next. I wouldn't mind being on the same page with you on
this.
I have no idea what your data entry code looks like right now. I
assume you have some internal representation of the data that you
receive as input, and you map that representation to either RIS or
risx. I guess that only that mapping is going to change if RefDB moves
to a richer storage model.

regards,
Markus
--
Markus Hoenicka
***@cats.de
(Spam-protected email: replace the quadrupeds with "mhoenicka")
http://www.mhoenicka.de
Daniel O'Donnell
2007-01-21 18:35:08 UTC
Post by Markus Hoenicka
[ I've cc'ed and reply-to'ed the refdb devel list. A few others might
be interested too ]
Hi Markus,
I'm thinking about how to handle the AV field, and especially what to do
about digital and physical locations for the same copy. Related to this
is call numbers.
Currently, in your implementation of RIS an object can't have both a
physical and a virtual AV, right? Do you think I should require users to
specify which they mean? I.e. check a box to say "computer file" vs.
"physical copy"? Or we could move one of the fields off to a U or M.
As usual, the RIS spec is not very clear about this, but I wonder
whether we should use the L1/L2 fields for storing paths to electronic
copies. Each dataset can have an unlimited number of L1 and L2 entries
so this would not collide with a URL to the original location of a
PDF. The only difference would be a local path, something like
file:///path/to/file.pdf, instead of a URL. We'd have to figure out
something to keep the pdfroot mechanism alive though. refdbc, or a web
frontend would then have to check both the AV and the L1/L2 fields for
matching entries if the RP field claims the item is in file.
Looking at the Reference Manager Spec, I'd say it looks like the L
series is what we want, though I'm not sure we need their level of
granularity (i.e. distinguishing between specifically PDF files vs.
specifically full text but not PDF vs. images). That would leave AV
solely for physical location, though as you say refdb itself would need,
for reasons of backward compatibility, to look for PATH on the AV field.
Post by Markus Hoenicka
Same thing with call numbers, which aren't covered anywhere in RIS.
Refworks actually uses ER - to store them!?! We discussed this earlier
and for my personal use I was just going to choose one of the U or M
fields, but a PHP form is going to enforce the choice on whoever uses
it. Any suggestions for where that ought to go?
If at all, we should settle on one of the U fields. Frankly, I don't
have a better suggestion right now if we want to be able to export all
data to RIS. We'll have to decide whether this is indeed an important
goal, or if we should move ahead and officially support more data than
RIS can store.
It's crucial for me--my library is organised by call numbers--and
probably a number of humanists. I've always recorded call number data
(sometimes multiple CNs) for books I use a lot, even before I began
LoCing my home library. What if I took the last U--U5--on the assumption
that would be least likely to conflict with any existing data?
Post by Markus Hoenicka
Given your own work on a better storage and transport language, I think
we should be looking for solutions to these that are easily undone so
data entered into RIS using this form can be converted easily to
whatever comes next. I wouldn't mind being on the same page with you on
this.
I have no idea what your data entry code looks like right now. I
assume you have some internal representation of the data that you
receive as input, and you map that representation to either RIS or
risx. I guess that only that mapping is going to change if RefDB moves
to a richer storage model.
Basically the PHP entry form is using fields corresponding to the refdb
use of RIS: so the value a user places on article title is assigned to
T1 if the type is an article. The output of the form is an RIS dataset
that is fed directly to refdb for addition to the database. The only
internal datamodelling at the moment involves elements for which fields
can repeat (e.g. like authors); in the PHP form you enter them all in
one line separated by colons.
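
A minimal Python sketch of the mapping described above, for illustration only (the form itself is PHP, and its code is not shown in this thread). The function name form_to_ris and the set of repeatable tags are assumptions; the sketch only shows how one form value per tag, plus colon-separated values for repeating fields, could be turned into a RIS dataset.

```python
def form_to_ris(ref_type, fields):
    """Build a RIS dataset string from form values.

    ref_type is the RIS reference type (e.g. "JOUR").
    fields maps a RIS tag (e.g. "T1") to the raw string typed into the form.
    """
    # Assumption: these tags may repeat, and the form collects them
    # on one line separated by colons.
    repeatable = {"AU", "KW"}

    lines = ["TY  - " + ref_type]
    for tag, raw in fields.items():
        if tag in repeatable:
            values = [v.strip() for v in raw.split(":") if v.strip()]
        else:
            # Non-repeating fields are passed through whole, so a colon
            # inside a title is not treated as a separator.
            values = [raw.strip()]
        for value in values:
            lines.append(f"{tag}  - {value}")
    lines.append("ER  - ")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(form_to_ris("JOUR", {"T1": "A title", "AU": "Smith,J.:Doe,A."}))
```

Keeping the split logic limited to the repeating tags matches the stated goal of doing as little internal modelling as possible in the front end.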

It seems to me best to do as little internal modelling in the PHP as
possible: the goal is to make a web-based front end for refdb.
--
Daniel Paul O'Donnell, PhD
Director, Digital Medievalist Project http://www.digitalmedievalist.org/
Associate Professor and Chair, Department of English
University of Lethbridge
Lethbridge AB T1K 3M4
Canada
Vox: +1 403 329-2378
Fax: +1 403 382-7191
Markus Hoenicka
2007-01-22 10:28:10 UTC
Post by Daniel O'Donnell
Looking at the Reference Manager Spec, I'd say it looks like the L
series is what we want, though I'm not sure we need their level of
granularity (i.e. distinguishing between specifically PDF files vs.
specifically full text but not PDF vs. images). That would leave AV
solely for physical location, though as you say refdb itself would need,
for reasons of backward compatibility, to look for PATH on the AV field.
I agree. The PATH: kludge was actually introduced before the L1-L4 fields were
available in RIS, so it is about time to abandon it. The PHP form should allow
entering either a full path (file:///path/to/pdf.pdf) or a relative path
(path/to/pdf.pdf), as the latter would allow using the pdfroot setting with
all its advantages (move your PDF repository without breaking your database,
access your repository from a remote computer via NFS and so on).
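
As a rough illustration of the full-path versus relative-path distinction, here is a small Python sketch. It is not RefDB code: the function name resolve_link and the way pdfroot is passed in are assumptions made for the example. It treats a file:// value as an absolute local path, leaves other URLs alone, and joins anything without a scheme onto pdfroot.

```python
import posixpath
from urllib.parse import urlparse, unquote


def resolve_link(value, pdfroot):
    """Turn the content of an L1/L2 field into something a client can open.

    value   -- the field content: a file:// URL, another URL, or a relative path
    pdfroot -- the root directory of the local PDF repository (assumed POSIX path)
    """
    parsed = urlparse(value)
    if parsed.scheme == "file":
        # Full local path: use it as-is, independent of pdfroot.
        return unquote(parsed.path)
    if parsed.scheme:
        # Any other URL (e.g. http) points at the original location.
        return value
    # Relative path: resolve against pdfroot, so the repository can be
    # moved by changing pdfroot alone.
    return posixpath.join(pdfroot, value)


if __name__ == "__main__":
    print(resolve_link("file:///path/to/pdf.pdf", "/home/me/pdfs"))
    print(resolve_link("path/to/pdf.pdf", "/home/me/pdfs"))
```

Only the relative form benefits from pdfroot; a file:// value stays tied to one machine's directory layout.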
Post by Daniel O'Donnell
It's crucial for me--my library is organised by call numbers--and
probably a number of humanists. I've always recorded call number data
(sometimes multiple CNs) for books I use a lot, even before I began
LoCing my home library. What if I took the last U--U5--on the assumption
that would be least likely to conflict with any existing data?
Come to think of it, wouldn't the AV field be a natural match for call numbers
once we've moved the links to PDFs out of the way? Or do you need to store a
call number *and* a physical location?

regards,
Markus
--
Markus Hoenicka
***@cats.de
(Spam-protected email: replace the quadrupeds with "mhoenicka")
http://www.mhoenicka.de
Daniel O'Donnell
2007-01-22 12:24:16 UTC
Post by Markus Hoenicka
Post by Daniel O'Donnell
Looking at the Reference Manager Spec, I'd say it looks like the L
series is what we want, though I'm not sure we need their level of
granularity (i.e. distinguishing between specifically PDF files vs.
specifically full text but not PDF vs. images). That would leave AV
solely for physical location, though as you say refdb itself would need,
for reasons of backward compatibility, to look for PATH on the AV field.
I agree. The PATH: kludge was actually introduced before the L1-L4 fields were
available in RIS, so it is about time to abandon it. The PHP form should allow
entering either a full path (file:///path/to/pdf.pdf) or a relative path
(path/to/pdf.pdf), as the latter would allow using the pdfroot setting with
all its advantages (move your PDF repository without breaking your database,
access your repository from a remote computer via NFS and so on).
Post by Daniel O'Donnell
It's crucial for me--my library is organised by call numbers--and
probably a number of humanists. I've always recorded call number data
(sometimes multiple CNs) for books I use a lot, even before I began
LoCing my home library. What if I took the last U--U5--on the assumption
that would be least likely to conflict with any existing data?
Come to think of it, wouldn't the AV field be a natural match for call numbers
once we've moved the links to PDFs out of the way? Or do you need to store a
call number *and* a physical location?
I can see both: a call number and information about reading room, home
or office, etc. Also what about a chapter from a collection: the book
has a call number (so I can get it again to see the other chapters), and
the chapter I photocopied has a physical location. If we can have
multiple AVs, I suppose it is not a great problem. Hard to disentangle,
though, if we decide to have a CN field later.

Lying awake thinking about this (!), I began to wonder if M was not
better than U: strictly speaking we are talking about agreeing on a
miscellaneous field, not adding a user-defined one.
--
Daniel Paul O'Donnell, PhD
Department Chair and Associate Professor of English
Director, Digital Medievalist Project http://www.digitalmedievalist.org/
Chair, Text Encoding Initiative http://www.tei-c.org/

Department of English
University of Lethbridge
Lethbridge AB T1K 3M4
Vox +1 403 329-2377
Fax +1 403 382-7191
Email: ***@uleth.ca
WWW: http://people.uleth.ca/~daniel.odonnell/
Markus Hoenicka
2007-01-22 13:19:58 UTC
Post by Daniel O'Donnell
I can see both: a call number and information about reading room, home
or office, etc. Also what about a chapter from a collection: the book
has a call number (so I can get it again to see the other chapters), and
the chapter I photocopied has a physical location. If we can have
multiple AVs, I suppose it is not a great problem. Hard to disentangle,
though, if we decide to have a CN field later.
I see. Multiple AVs or a new CN field (or do you need more than one?) are of
course doable in the database. We just have to figure out ways to export these
data to RIS. We could concatenate the values with a separator, as the AV field
is of unlimited size as per the spec. risx could be amended to allow several AV
fields (in fact, it already supports one AV per user).
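
The concatenation idea could look roughly like the following Python sketch. The separator "; " and the function names export_av and import_av are hypothetical choices for illustration; nothing in the thread settles on a particular separator, and a value that itself contains a semicolon would not survive the round trip.

```python
AV_SEPARATOR = "; "  # hypothetical separator, not specified anywhere in the thread
AV_PREFIX = "AV  - "


def export_av(values):
    """Join several availability values into a single RIS AV line."""
    cleaned = [v.strip() for v in values if v.strip()]
    if not cleaned:
        return None  # nothing to export
    return AV_PREFIX + AV_SEPARATOR.join(cleaned)


def import_av(line):
    """Split a concatenated RIS AV line back into its individual values."""
    if not line.startswith(AV_PREFIX):
        raise ValueError("not an AV line: " + repr(line))
    body = line[len(AV_PREFIX):]
    return [v.strip() for v in body.split(AV_SEPARATOR.strip()) if v.strip()]


if __name__ == "__main__":
    line = export_av(["Shelf in office", "PR 1119 .A2 v304"])
    print(line)
    print(import_av(line))
```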
Post by Daniel O'Donnell
Lying awake thinking about this (!), I began to wonder if M was not
better than U: strictly speaking we are talking about agreeing on a
miscellaneous field, not adding a user-defined one.
The thing is, there is no M field which is unused in all reference types. You'll
always face conflicts with particular reference types no matter which M field
you pick. The M fields are a leftover of the record-based schema that the
original incarnations of Reference Manager apparently used, as I've discussed
here:

http://www.mhoenicka.de/system-cgi/blog/index.php?itemid=515

regards,
Markus
--
Markus Hoenicka
***@cats.de
(Spam-protected email: replace the quadrupeds with "mhoenicka")
http://www.mhoenicka.de
Dan O'Donnell
2007-01-22 19:49:45 UTC
Post by Markus Hoenicka
Post by Daniel O'Donnell
I can see both: a call number and information about reading room, home
or office, etc. Also what about a chapter from a collection: the book
has a call number (so I can get it again to see the other chapters), and
the chapter I photocopied has a physical location. If we can have
multiple AVs, I suppose it is not a great problem. Hard to disentangle,
though, if we decide to have a CN field later.
I see. Multiple AVs or a new CN field (or do you need more than one?) are of
course doable in the database. We just have to figure out ways to export these
data to RIS. We could concatenate the values with a separator, as the AV field
is of unlimited size as per the spec. risx could be amended to allow several AV
fields (in fact, it already supports one AV per user).
Post by Daniel O'Donnell
Lying awake thinking about this (!), I began to wonder if M was not
better than U: strictly speaking we are talking about agreeing on a
miscellaneous field, not adding a user-defined one.
The thing is, there is no M field which is unused in all reference types. You'll
always face conflicts with particular reference types no matter which M field
you pick. The M fields are a leftover of the record-based schema that the
original incarnations of Reference Manager apparently used, as I've discussed
http://www.mhoenicka.de/system-cgi/blog/index.php?itemid=515
Damn. I hadn't realised. What about using a high U for right now then?
Or a separate CN which is converted to RIS AV? I'd say put them all on
AV (i.e. locations like "Shelf in office" and "PR 1119 .A2 v304"), except
that I can see reasons for wanting to be able to distinguish between
them: in printing out library cards or spine stickers (both of which I
use), for example, I would want to be able to suppress non-call-number
AVs if both existed.

I suppose we could do some kind of ordering thing as they do with date:
AV - Call Number/Physical Location
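
To make the ordering convention concrete, here is a hedged Python sketch of how a client might split such a value. The function names split_av and spine_label are invented for the example, and it assumes the first "/" separates the two parts, which would break for a call number that itself contains a slash.

```python
def split_av(value):
    """Split 'Call Number/Physical Location' into its two parts.

    Either part may be missing; a missing part is returned as None.
    Assumption: the first '/' is the separator.
    """
    call_number, _sep, location = value.partition("/")
    return call_number.strip() or None, location.strip() or None


def spine_label(value):
    """Return only the call number, e.g. for printing spine stickers."""
    call_number, _location = split_av(value)
    return call_number


if __name__ == "__main__":
    print(split_av("PR 1119 .A2 v304/Shelf in office"))
    print(spine_label("/Shelf in office"))
```

With this convention, a value holding only a physical location would need a leading slash so that it is not mistaken for a call number.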

-dan
--
Daniel Paul O'Donnell, PhD
Chair, Text Encoding Initiative <http://www.tei-c.org/>
Director, Digital Medievalist Project <http://www.digitalmedievalist.org/>
Associate Professor and Chair of English
University of Lethbridge
Lethbridge AB T1K 3M4
Vox: +1 403 329 2378
Fax: +1 403 382-7191
Homepage: http://people.uleth.ca/~daniel.odonnell/