Commit 16c4086

Add support for directional language-tagged strings from RDF 1.2.
1 parent 13164ed commit 16c4086

10 files changed

Lines changed: 205 additions & 63 deletions

README.md

Lines changed: 9 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -102,6 +102,9 @@ the 1.1 release of RDF.rb:
102102

103103
Notably, {RDF::Queryable#query} and {RDF::Query#execute} are now completely symmetric; this allows an implementation of {RDF::Queryable} to optimize queries using implementation-specific logic, allowing for substantial performance improvements when executing BGP queries.
104104

105+
## Differences between RDF 1.1 and RDF 1.2
106+
* {RDF::Literal} has an optional `direction` property for directional language-tagged strings.
107+
105108
## Tutorials
106109

107110
* [Getting data from the Semantic Web using Ruby and RDF.rb](https://semanticweb.org/wiki/Getting_data_from_the_Semantic_Web_%28Ruby%29)
@@ -400,6 +403,7 @@ from BNode identity (i.e., they each entail the other)
400403

401404
* [Ruby](https://ruby-lang.org/) (>= 2.6)
402405
* [LinkHeader][] (>= 0.0.8)
406+
* [bcp47_spec][] ( ~> 0.2)
403407
* Soft dependency on [RestClient][] (>= 2.1)
404408

405409
## Installation
@@ -481,8 +485,10 @@ This is free and unencumbered public domain software. For more information,
481485
see <https://unlicense.org/> or the accompanying {file:UNLICENSE} file.
482486

483487
[RDF]: https://www.w3.org/RDF/
484-
[N-Triples]: https://www.w3.org/TR/n-triples/
485-
[N-Quads]: https://www.w3.org/TR/n-quads/
488+
[LinkHeader]: https://github.com/asplake/link_header
489+
[bcp47_spec]: https://github.com/dadah89/bcp47_spec
490+
[N-Triples]: https://www.w3.org/TR/rdf-n-triples/
491+
[N-Quads]: https://www.w3.org/TR/rdf-n-quads/
486492
[YARD]: https://yardoc.org/
487493
[YARD-GS]: https://rubydoc.info/docs/yard/file/docs/GettingStarted.md
488494
[PDD]: https://unlicense.org/#unlicensing-contributions
@@ -496,6 +502,7 @@ see <https://unlicense.org/> or the accompanying {file:UNLICENSE} file.
496502
[SPARQL doc]: https://ruby-rdf.github.io/sparql
497503
[RDF 1.0]: https://www.w3.org/TR/2004/REC-rdf-concepts-20040210/
498504
[RDF 1.1]: https://www.w3.org/TR/rdf11-concepts/
505+
[RDF 1.2]: https://www.w3.org/TR/rdf12-concepts/
499506
[SPARQL 1.1]: https://www.w3.org/TR/sparql11-query/
500507
[RDF.rb]: https://ruby-rdf.github.io/
501508
[RDF::DO]: https://ruby-rdf.github.io/rdf-do

etc/n-triples.ebnf

Lines changed: 40 additions & 6 deletions
@@ -1,6 +1,40 @@
1-
[1] ntriplesDoc ::= triple? (EOL triple)* EOL?
2-
[2] triple ::= subject predicate object '.'
3-
[3] subject ::= IRIREF | BLANK_NODE_LABEL
4-
[4] predicate ::= IRIREF
5-
[5] object ::= IRIREF | BLANK_NODE_LABEL | literal
6-
[6] literal ::= STRING_LITERAL_QUOTE ('^^' IRIREF | LANGTAG)?
1+
ntriplesDoc ::= triple? (EOL triple)* EOL?
2+
triple ::= subject predicate object '.'
3+
subject ::= IRIREF | BLANK_NODE_LABEL | quotedTriple
4+
predicate ::= IRIREF
5+
object ::= IRIREF | BLANK_NODE_LABEL | literal | quotedTriple
6+
literal ::= STRING_LITERAL_QUOTE ('^^' IRIREF | LANGTAG )?
7+
quotedTriple ::= '<<' subject predicate object '>>'
8+
9+
@terminals
10+
11+
IRIREF ::= '<' ([^#x00-#x20<>"{}|^`\] | UCHAR)* '>'
12+
BLANK_NODE_LABEL ::= '_:' ( PN_CHARS_U | [0-9] ) ((PN_CHARS|'.')* PN_CHARS)?
13+
LANGTAG ::= "@" [a-zA-Z]+ ( "-" [a-zA-Z0-9]+ )* ('--' ('ltr'|'rtl'))?
14+
STRING_LITERAL_QUOTE ::= '"' ( [^#x22#x5C#xA#xD] | ECHAR | UCHAR )* '"'
15+
UCHAR ::= ( "\u" HEX HEX HEX HEX )
16+
| ( "\U" HEX HEX HEX HEX HEX HEX HEX HEX )
17+
ECHAR ::= ("\" [tbnrf"'])
18+
PN_CHARS_BASE ::= ([A-Z]
19+
| [a-z]
20+
| [#x00C0-#x00D6]
21+
| [#x00D8-#x00F6]
22+
| [#x00F8-#x02FF]
23+
| [#x0370-#x037D]
24+
| [#x037F-#x1FFF]
25+
| [#x200C-#x200D]
26+
| [#x2070-#x218F]
27+
| [#x2C00-#x2FEF]
28+
| [#x3001-#xD7FF]
29+
| [#xF900-#xFDCF]
30+
| [#xFDF0-#xFFFD]
31+
| [#x10000-#xEFFFF])
32+
PN_CHARS_U ::= PN_CHARS_BASE | '_'
33+
PN_CHARS ::= (PN_CHARS_U
34+
| "-"
35+
| [0-9]
36+
| #x00B7
37+
| [#x0300-#x036F]
38+
| [#x203F-#x2040])
39+
HEX ::= ([0-9] | [A-F] | [a-f])
40+
EOL ::= [#xD#xA]+
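
The revised `LANGTAG` terminal allows an optional `--ltr` or `--rtl` suffix after the language tag. A minimal sketch of how such a token could be split into its two parts, using only the Ruby standard library; `LANGTAG_PATTERN` and `parse_langtag` are hypothetical names for illustration and are not part of RDF.rb:

```ruby
# Hypothetical helper, not part of RDF.rb: mirrors the LANGTAG terminal
#   "@" [a-zA-Z]+ ( "-" [a-zA-Z0-9]+ )* ('--' ('ltr'|'rtl'))?
LANGTAG_PATTERN = /\A@([a-zA-Z]+(?:-[a-zA-Z0-9]+)*)(?:--(ltr|rtl))?\z/

# Returns a hash with :language and :direction, or nil if the token
# does not match the terminal.
def parse_langtag(token)
  match = LANGTAG_PATTERN.match(token)
  return nil unless match

  {
    language: match[1].downcase.to_sym, # downcased, as Literal#initialize does
    direction: match[2]&.to_sym         # nil when no direction suffix is present
  }
end

parse_langtag("@en")          # language :en, direction nil
parse_langtag("@ar-EG--rtl")  # language :"ar-eg", direction :rtl
parse_langtag("@en--up")      # nil, since "up" is not ltr or rtl
```

The double hyphen cannot be confused with a subtag separator, because each subtag needs at least one alphanumeric character after its single hyphen.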

lib/rdf/model/literal.rb

Lines changed: 83 additions & 31 deletions
@@ -1,4 +1,7 @@
11
# -*- encoding: utf-8 -*-
2+
3+
require 'bcp47_spec'
4+
25
module RDF
36
##
47
# An RDF literal.
@@ -9,7 +12,9 @@ module RDF
912
#
1013
# Specific typed literals may have behavior different from the default implementation. See the following defined sub-classes for specific documentation. Additional sub-classes may be defined, and will interoperate by defining `DATATYPE` and `GRAMMAR` constants, in addition to other required overrides of RDF::Literal behavior.
1114
#
12-
# In RDF 1.1, all literals are typed, including plain literals and language tagged literals. Internally, plain literals are given the `xsd:string` datatype and language tagged literals are given the `rdf:langString` datatype. Creating a plain literal, without a datatype or language, will automatically provide the `xsd:string` datatype; similar for language tagged literals. Note that most serialization formats will remove this datatype. Code which depends on a literal having the `xsd:string` datatype being different from a plain literal (formally, without a datatype) may break. However note that the `#has\_datatype?` will continue to return `false` for plain or language-tagged literals.
15+
# In RDF 1.1, all literals are typed, including plain literals and language-tagged strings. Internally, plain literals are given the `xsd:string` datatype and language-tagged strings are given the `rdf:langString` datatype. Creating a plain literal, without a datatype or language, will automatically provide the `xsd:string` datatype; similarly for language-tagged strings. Note that most serialization formats will remove this datatype. Code which depends on a literal having the `xsd:string` datatype being different from a plain literal (formally, without a datatype) may break. However, note that `#has_datatype?` will continue to return `false` for plain or language-tagged strings.
16+
#
17+
# RDF 1.2 adds **directional language-tagged strings**, which are effectively a subclass of **language-tagged strings** containing an additional **direction** component with value either **ltr** or **rtl** for Left-to-Right or Right-to-Left. This determines the general direction of a string when presented in a user agent, where it might be in conflict with the inherent direction of the leading Unicode code points. Directional language-tagged strings are given the `rdf:dirLangString` datatype.
1318
#
1419
# * {RDF::Literal::Boolean}
1520
# * {RDF::Literal::Date}
@@ -23,16 +28,23 @@ module RDF
2328
# value = RDF::Literal.new("Hello, world!")
2429
# value.plain? #=> true
2530
#
26-
# @example Creating a language-tagged literal (1)
31+
# @example Creating a language-tagged string (1)
2732
# value = RDF::Literal.new("Hello!", language: :en)
2833
# value.language? #=> true
2934
# value.language #=> :en
3035
#
31-
# @example Creating a language-tagged literal (2)
36+
# @example Creating a language-tagged string (2)
3237
# RDF::Literal.new("Wazup?", language: :"en-US")
3338
# RDF::Literal.new("Hej!", language: :sv)
3439
# RDF::Literal.new("¡Hola!", language: :es)
3540
#
41+
# @example Creating a directional language-tagged string
42+
# value = RDF::Literal.new("Hello!", language: :en, direction: :ltr)
43+
# value.language? #=> true
44+
# value.language #=> :en
45+
# value.direction? #=> true
46+
# value.direction #=> :ltr
47+
#
3648
# @example Creating an explicitly datatyped literal
3749
# value = RDF::Literal.new("2009-12-31", datatype: RDF::XSD.date)
3850
# value.datatype? #=> true
@@ -105,8 +117,14 @@ def self.datatyped_class(uri)
105117

106118
##
107119
# @private
108-
def self.new(value, language: nil, datatype: nil, lexical: nil, validate: false, canonicalize: false, **options)
109-
raise ArgumentError, "datatype with language must be rdf:langString" if language && (datatype || RDF.langString).to_s != RDF.langString.to_s
120+
def self.new(value, language: nil, datatype: nil, direction: nil, lexical: nil, validate: false, canonicalize: false, **options)
121+
if language && direction
122+
raise ArgumentError, "datatype with language and direction must be rdf:dirLangString" if (datatype || RDF.dirLangString).to_s != RDF.dirLangString.to_s
123+
elsif language
124+
raise ArgumentError, "datatype with language must be rdf:langString" if (datatype || RDF.langString).to_s != RDF.langString.to_s
125+
else
126+
raise ArgumentError, "datatype not compatible with language or direction" if language || direction
127+
end
110128

111129
klass = case
112130
when !self.equal?(RDF::Literal)
@@ -128,7 +146,7 @@ def self.new(value, language: nil, datatype: nil, lexical: nil, validate: false,
128146
end
129147
end
130148
literal = klass.allocate
131-
literal.send(:initialize, value, language: language, datatype: datatype, **options)
149+
literal.send(:initialize, value, language: language, datatype: datatype, direction: direction, **options)
132150
literal.validate! if validate
133151
literal.canonicalize! if canonicalize
134152
literal
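
The argument checks added to `RDF::Literal.new` in this hunk can be read in isolation. A minimal sketch, assuming the datatype IRIs are written out as plain strings so that no gems are needed; `check_datatype!` and the wording of its third error message are hypothetical and only stand in for the logic in the diff:

```ruby
# IRIs as plain strings so the sketch runs without the rdf gem.
RDF_LANG_STRING     = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"
RDF_DIR_LANG_STRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#dirLangString"

# Hypothetical stand-in for the checks at the top of RDF::Literal.new.
# Returns true when the combination is acceptable, raises otherwise.
def check_datatype!(language: nil, direction: nil, datatype: nil)
  if language && direction
    # Language plus direction: only rdf:dirLangString (or no datatype) is allowed.
    unless (datatype || RDF_DIR_LANG_STRING).to_s == RDF_DIR_LANG_STRING
      raise ArgumentError, "datatype with language and direction must be rdf:dirLangString"
    end
  elsif language
    # Language alone: only rdf:langString (or no datatype) is allowed.
    unless (datatype || RDF_LANG_STRING).to_s == RDF_LANG_STRING
      raise ArgumentError, "datatype with language must be rdf:langString"
    end
  elsif direction
    # A direction without a language is rejected, as in the diff's else branch.
    raise ArgumentError, "direction given without a language"
  end
  true
end

check_datatype!(language: :en)                    # true
check_datatype!(language: :en, direction: :ltr)   # true
```

Passing `direction: :ltr` with `datatype: RDF_LANG_STRING` raises `ArgumentError` in this sketch, because the combination implies `rdf:dirLangString`.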
@@ -137,18 +155,24 @@ def self.new(value, language: nil, datatype: nil, lexical: nil, validate: false,
137155
TRUE = RDF::Literal.new(true)
138156
FALSE = RDF::Literal.new(false)
139157
ZERO = RDF::Literal.new(0)
158+
XSD_STRING = RDF::URI("http://www.w3.org/2001/XMLSchema#string")
140159

141-
# @return [Symbol] The language tag (optional).
160+
# @return [Symbol] The language-tag (optional). Implies `datatype` is `rdf:langString`.
142161
attr_accessor :language
143162

163+
# @return [Symbol] The base direction (optional). Implies `datatype` is `rdf:dirLangString`.
164+
attr_accessor :direction
165+
144166
# @return [URI] The XML Schema datatype URI (optional).
145167
attr_accessor :datatype
146168

147169
##
148-
# Literals without a datatype are given either xsd:string or rdf:langString
149-
# depending on if there is language
170+
# Literals without a datatype are given either `xsd:string`, `rdf:langString`, or `rdf:dirLangString`,
171+
# depending on if there is `language` and/or `direction`.
150172
#
151173
# @param [Object] value
174+
# @param [Symbol] direction (nil)
175+
# Initial text direction.
152176
# @param [Symbol] language (nil)
153177
# Language is downcased to ensure proper matching
154178
# @param [String] lexical (nil)
@@ -163,16 +187,24 @@ def self.new(value, language: nil, datatype: nil, lexical: nil, validate: false,
163187
# @see http://www.w3.org/TR/rdf11-concepts/#section-Graph-Literal
164188
# @see http://www.w3.org/TR/rdf11-concepts/#section-Datatypes
165189
# @see #to_s
166-
def initialize(value, language: nil, datatype: nil, lexical: nil, validate: false, canonicalize: false, **options)
190+
def initialize(value, language: nil, datatype: nil, direction: nil, lexical: nil, validate: false, canonicalize: false, **options)
167191
@object = value.freeze
168192
@string = lexical if lexical
169193
@string = value if !defined?(@string) && value.is_a?(String)
170194
@string = @string.encode(Encoding::UTF_8).freeze if instance_variable_defined?(:@string)
171195
@object = @string if instance_variable_defined?(:@string) && @object.is_a?(String)
172196
@language = language.to_s.downcase.to_sym if language
197+
@direction = direction.to_s.downcase.to_sym if direction
173198
@datatype = RDF::URI(datatype).freeze if datatype
174199
@datatype ||= self.class.const_get(:DATATYPE) if self.class.const_defined?(:DATATYPE)
175-
@datatype ||= instance_variable_defined?(:@language) && @language ? RDF.langString : RDF::URI("http://www.w3.org/2001/XMLSchema#string")
200+
@datatype ||= if instance_variable_defined?(:@language) && @language &&
201+
instance_variable_defined?(:@direction) && @direction
202+
RDF.dirLangString
203+
elsif instance_variable_defined?(:@language) && @language
204+
RDF.langString
205+
else
206+
XSD_STRING
207+
end
176208
end
177209

178210
##
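
When no datatype is given and the class defines no `DATATYPE` constant, `initialize` now chooses between three defaults. A minimal sketch of that decision as a standalone function; `implied_datatype` is a hypothetical name, and the IRIs are plain strings rather than `RDF::URI` objects:

```ruby
XSD_STRING          = "http://www.w3.org/2001/XMLSchema#string"
RDF_LANG_STRING     = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"
RDF_DIR_LANG_STRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#dirLangString"

# Hypothetical helper: the default datatype implied by the tags alone.
def implied_datatype(language: nil, direction: nil)
  if language && direction
    RDF_DIR_LANG_STRING # directional language-tagged string
  elsif language
    RDF_LANG_STRING     # language-tagged string
  else
    XSD_STRING          # plain literal
  end
end

implied_datatype                                   # xsd:string
implied_datatype(language: :en)                    # rdf:langString
implied_datatype(language: :ar, direction: :rtl)   # rdf:dirLangString
```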
@@ -202,8 +234,8 @@ def literal?
202234
#
203235
# Compatibility of two arguments is defined as:
204236
# * The arguments are simple literals or literals typed as xsd:string
205-
# * The arguments are plain literals with identical language tags
206-
# * The first argument is a plain literal with language tag and the second argument is a simple literal or literal typed as xsd:string
237+
# * The arguments are plain literals with identical language-tags and directions
238+
# * The first argument is a plain literal with language-tag and the second argument is a simple literal or literal typed as xsd:string
207239
#
208240
# @example
209241
# compatible?("abc", "b") #=> true
@@ -224,19 +256,19 @@ def compatible?(other)
224256
return false unless other.literal? && plain? && other.plain?
225257

226258
# * The arguments are simple literals or literals typed as xsd:string
227-
# * The arguments are plain literals with identical language tags
228-
# * The first argument is a plain literal with language tag and the second argument is a simple literal or literal typed as xsd:string
229-
language? ?
230-
(language == other.language || other.datatype == RDF::URI("http://www.w3.org/2001/XMLSchema#string")) :
231-
other.datatype == RDF::URI("http://www.w3.org/2001/XMLSchema#string")
259+
# * The arguments are plain literals with identical language-tags and directions
260+
# * The first argument is a plain literal with language-tag and the second argument is a simple literal or literal typed as xsd:string
261+
language? || direction? ?
262+
(language == other.language && direction == other.direction || other.datatype == XSD_STRING) :
263+
other.datatype == XSD_STRING
232264
end
233265

234266
##
235267
# Returns a hash code for this literal.
236268
#
237269
# @return [Integer]
238270
def hash
239-
@hash ||= [to_s, datatype, language].hash
271+
@hash ||= [to_s, datatype, language, direction].compact.hash
240272
end
241273

242274

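
The updated compatibility rule treats language and direction as a pair: two tagged literals are compatible only when both components match. A minimal sketch using a hypothetical `PlainLiteral` struct in place of `RDF::Literal`, where an untagged value plays the role of an `xsd:string` literal:

```ruby
# Hypothetical value object, not the RDF::Literal class.
PlainLiteral = Struct.new(:value, :language, :direction, keyword_init: true) do
  # A literal counts as tagged when it carries a language.
  def tagged?
    !language.nil?
  end
end

# Mirrors the ternary in Literal#compatible? for plain literals.
def compatible?(first, second)
  if first.tagged?
    # Same language and direction, or the second argument is a simple literal.
    (first.language == second.language && first.direction == second.direction) || !second.tagged?
  else
    # A simple first argument is only compatible with a simple second argument.
    !second.tagged?
  end
end

en_ltr = PlainLiteral.new(value: "abc", language: :en, direction: :ltr)
en_rtl = PlainLiteral.new(value: "abc", language: :en, direction: :rtl)
simple = PlainLiteral.new(value: "b")

compatible?(en_ltr, en_ltr)  # true
compatible?(en_ltr, en_rtl)  # false, directions differ
compatible?(en_ltr, simple)  # true
compatible?(simple, en_ltr)  # false, the rule is not symmetric
```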
@@ -270,6 +302,7 @@ def eql?(other)
270302
self.value_hash == other.value_hash &&
271303
self.value.eql?(other.value) &&
272304
self.language.to_s.eql?(other.language.to_s) &&
305+
self.direction.to_s.eql?(other.direction.to_s) &&
273306
self.datatype.eql?(other.datatype))
274307
end
275308

@@ -290,7 +323,10 @@ def ==(other)
290323
case
291324
when self.eql?(other)
292325
true
293-
when self.language? && self.language.to_s == other.language.to_s
326+
when self.direction? && self.direction == other.direction
327+
# Literals with directions can compare if languages and directions are identical
328+
self.value_hash == other.value_hash && self.value == other.value
329+
when self.language? && self.language == other.language
294330
# Literals with languages can compare if languages are identical
295331
self.value_hash == other.value_hash && self.value == other.value
296332
when self.simple? && other.simple?
@@ -342,14 +378,18 @@ def <=>(other)
342378

343379
##
344380
# Returns `true` if this is a plain literal. A plain literal
345-
# may have a language, but may not have a datatype. For
381+
# may have a language and direction, but may not have a datatype. For
346382
# all practical purposes, this includes xsd:string literals
347383
# too.
348384
#
349385
# @return [Boolean] `true` or `false`
350386
# @see http://www.w3.org/TR/rdf-concepts/#dfn-plain-literal
351387
def plain?
352-
[RDF.langString, RDF::URI("http://www.w3.org/2001/XMLSchema#string")].include?(datatype)
388+
[
389+
RDF.langString,
390+
RDF.dirLangString,
391+
XSD_STRING
392+
].include?(datatype)
353393
end
354394

355395
##
@@ -359,19 +399,28 @@ def plain?
359399
# @return [Boolean] `true` or `false`
360400
# @see http://www.w3.org/TR/sparql11-query/#simple_literal
361401
def simple?
362-
datatype == RDF::URI("http://www.w3.org/2001/XMLSchema#string")
402+
datatype == XSD_STRING
363403
end
364404

365405
##
366-
# Returns `true` if this is a language-tagged literal.
406+
# Returns `true` if this is a language-tagged string.
367407
#
368408
# @return [Boolean] `true` or `false`
369-
# @see http://www.w3.org/TR/rdf-concepts/#dfn-plain-literal
409+
# @see https://www.w3.org/TR/rdf-concepts/#dfn-language-tagged-string
370410
def language?
371-
datatype == RDF.langString
411+
[RDF.langString, RDF.dirLangString].include?(datatype)
372412
end
373413
alias_method :has_language?, :language?
374414

415+
##
416+
# Returns `true` if this is a directional language-tagged string.
417+
#
418+
# @return [Boolean] `true` or `false`
419+
# @see https://www.w3.org/TR/rdf-concepts/#dfn-dir-lang-string
420+
def direction?
421+
datatype == RDF.dirLangString
422+
end
423+
375424
##
376425
# Returns `true` if this is a datatyped literal.
377426
#
@@ -380,7 +429,7 @@ def language?
380429
# @return [Boolean] `true` or `false`
381430
# @see http://www.w3.org/TR/rdf-concepts/#dfn-typed-literal
382431
def datatype?
383-
!plain? && !language?
432+
!plain? && !language? && !direction?
384433
end
385434
alias_method :has_datatype?, :datatype?
386435
alias_method :typed?, :datatype?
@@ -393,10 +442,13 @@ def datatype?
393442
# @return [Boolean] `true` or `false`
394443
# @since 0.2.1
395444
def valid?
396-
return false if language? && language.to_s !~ /^[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*$/
445+
BCP47.parse(language.to_s) if language?
446+
return false if direction? && !%i{ltr rtl}.include?(direction)
397447
return false if datatype? && datatype.invalid?
398448
grammar = self.class.const_get(:GRAMMAR) rescue nil
399449
grammar.nil? || value.match?(grammar)
450+
rescue BCP47::InvalidLanguageTag
451+
false
400452
end
401453

402454
##
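
The new `valid?` delegates language-tag checking to the `bcp47_spec` gem and restricts direction to `ltr` or `rtl`. A minimal sketch of the same two checks without that dependency: it reuses the regular expression removed in this hunk as a rough stand-in for a BCP 47 parser (anchored here with `\A` and `\z` instead of `^` and `$`), and `valid_tags?` is a hypothetical name:

```ruby
# Stand-in for BCP47.parse: the pre-commit pattern, which is looser than BCP 47.
LANGUAGE_PATTERN = /\A[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*\z/
VALID_DIRECTIONS = %i[ltr rtl].freeze

# Hypothetical helper: checks only the tag components of a literal.
def valid_tags?(language: nil, direction: nil)
  return false if language && language.to_s !~ LANGUAGE_PATTERN
  return false if direction && !VALID_DIRECTIONS.include?(direction)

  true
end

valid_tags?(language: :en, direction: :ltr)  # true
valid_tags?(language: :en, direction: :up)   # false
valid_tags?(language: :"en us")              # false, space is not allowed
```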
@@ -536,20 +588,20 @@ def inspect
536588

537589
##
538590
# @overload #to_str
539-
# This method is implemented when the datatype is `xsd:string` or `rdf:langString`
591+
# This method is implemented when the datatype is `xsd:string`, `rdf:langString`, or `rdf:dirLangString`
540592
# @return [String]
541593
def method_missing(name, *args)
542594
case name
543595
when :to_str
544-
return to_s if @datatype == RDF.langString || @datatype == RDF::URI("http://www.w3.org/2001/XMLSchema#string")
596+
return to_s if [RDF.langString, RDF.dirLangString, XSD_STRING].include?(@datatype)
545597
end
546598
super
547599
end
548600

549601
def respond_to_missing?(name, include_private = false)
550602
case name
551603
when :to_str
552-
return true if @datatype == RDF.langString || @datatype == RDF::URI("http://www.w3.org/2001/XMLSchema#string")
604+
return true if [RDF.langString, RDF.dirLangString, XSD_STRING].include?(@datatype)
553605
end
554606
super
555607
end

lib/rdf/ntriples.rb

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@ module RDF
1515
#
1616
# <https://rubygems.org/gems/rdf> <http://purl.org/dc/terms/title> "rdf" .
1717
#
18-
# ## RDFStar (RDF*)
18+
# ## Quoted Triples
1919
#
2020
# Supports statements as resources using `<<s p o>>`.
2121
#
