In the last post I barely scratched the surface of RDF, so in this post I want to go into more detail. The Web, as it has developed, puts a lot of focus on how things look: developers spend a lot of time thinking about how to write web apps that interact with users in a nice way. However, from my personal experience, the majority of websites don’t provide a good description of themselves that would help search engines such as Google figure out what the site is about, nor do they organize their information in a way that helps spread knowledge around.

The question is: why is this important? We have done a good job storing knowledge in relational databases, right? The point is that information and knowledge are pretty hard to predict and model. For example, you can spend a lot of time modeling a database for a specific domain, but then the business rules change, so you have to change your database model: add columns, remove columns, add tables, remove tables, and adapt your data to the new model. From my personal experience as a software developer and system admin, these changes are expensive, time-consuming and very annoying. Please note that I’m not saying relational databases are evil; I’m saying they are not that good at storing knowledge, because we cannot assume that we know everything about everything. At some point, sooner or later, our understanding of things will change, and we will find ourselves in a situation that forces us to change the database model again.

Changes are coming, so we need a structure that lets us add things without breaking what we already have in place. The structure should be able to handle new concepts and drop concepts that are no longer necessary. Yes, I am talking about concepts, not entities, columns or rows in a table; remember that I am still talking about knowledge here. Now you can say: wait a second! I add and remove rows in a relational database all the time, so what are you talking about? Let’s take a look at an example. I can say that I am a person, right? So if I am a person I should have an address, a birthday and whatever other information is relevant to say about myself. I could therefore create a table with the columns name, address and birthdate. The data model is OK, we finish our project, everybody is happy, but one day we realize that we need a new column for my favorite music or anything else, so we have to change the table.
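
To make that schema change concrete, here is a minimal sketch using Python’s built-in sqlite3 module. The table name, the column favorite_music and the sample row are my own placeholders, chosen only for illustration.

```python
import sqlite3

# In-memory database; table, columns and row values are illustrative placeholders.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT, address TEXT, birthdate TEXT)")
conn.execute(
    "INSERT INTO person VALUES (?, ?, ?)",
    ("Luiz", "placeholder address", "placeholder date"),
)


def column_names(connection, table):
    """Return the column names of a table, in declaration order."""
    rows = connection.execute(f"PRAGMA table_info({table})").fetchall()
    return [row[1] for row in rows]  # index 1 of each row is the column name


before = column_names(conn, "person")

# A new requirement arrives: it cannot be stored without altering the schema.
conn.execute("ALTER TABLE person ADD COLUMN favorite_music TEXT")
conn.execute(
    "UPDATE person SET favorite_music = ? WHERE name = ?",
    ("The Beatles", "Luiz"),
)

after = column_names(conn, "person")
print(before)
print(after)
```

In a toy database this is a single statement, but in a real system the same change usually also touches migrations, queries and application code.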

However, there is a structure called a graph, where it is very easy to add and remove things without compromising the structure itself. I don’t want to talk about graphs in detail in this post, but you can think of a graph as a set of points linked by lines. These points and lines can be anything, but since I am talking about knowledge here, they will be our data (linked data, by the way): they are the concepts in the knowledge graph. This is where RDF comes in.

RDF is a data model with two main characteristics: it identifies entities using URIs, and every statement is composed of three parts. The good thing about using URIs to identify entities, instead of names or local IDs like primary keys, is that they are universal. I mean that there are a lot of systems all around the world that already know how to deal with URIs. Even more, if the URI is a URL, it can be resolved in a web browser and could provide a nice visualization of the data it represents. The three parts that compose an RDF statement, called a triple, are the subject, the predicate and the object. I like to think of it this way: the subject is the resource, the matter we are talking about; the predicate is what I am saying about the subject, the kind of relationship; and the object is the thing the subject is related to.

To help with my explanation, let’s look at the example in the picture below. Here I have two subjects: http://example.org/luiz and http://example.org/miguel. Both subjects are linked to the object https://schema.org/Person by the predicate http://www.w3.org/1999/02/22-rdf-syntax-ns#type, which means that both luiz and miguel are persons. In addition, the subject http://example.org/luiz is linked to the object http://dbpedia.org/resource/The_Beatles by the predicate https://schema.org/LikeAction, which means that http://example.org/luiz likes http://dbpedia.org/resource/The_Beatles. So the knowledge graph below says that luiz and miguel are persons and luiz likes the Beatles. Now it should be easy to see that we can add new subjects, predicates and objects, which allows us to say more about this domain without breaking the knowledge that we already have in place.

rdf-sample
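
The same graph can be sketched in a few lines of Python as a plain set of triples. This is only an illustration of the data model, not how a real triple store works; the Triple class, the match helper and the extra fact about miguel are my own hypothetical additions.

```python
from typing import NamedTuple

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
LIKES = "https://schema.org/LikeAction"
BEATLES = "http://dbpedia.org/resource/The_Beatles"


class Triple(NamedTuple):
    subject: str
    predicate: str
    object: str


# The graph from the picture: a plain set of triples, no schema required.
graph = {
    Triple("http://example.org/luiz", RDF_TYPE, "https://schema.org/Person"),
    Triple("http://example.org/miguel", RDF_TYPE, "https://schema.org/Person"),
    Triple("http://example.org/luiz", LIKES, BEATLES),
}


def match(graph, subject=None, predicate=None, obj=None):
    """Return the matching triples, sorted; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.object == obj)
    )


size_before = len(graph)

# Adding knowledge is just adding a triple; existing triples are untouched.
# This extra fact (miguel likes the Beatles) is a hypothetical addition.
graph.add(Triple("http://example.org/miguel", LIKES, BEATLES))

people = [
    t.subject
    for t in match(graph, predicate=RDF_TYPE, obj="https://schema.org/Person")
]
print(people)
```

Nothing had to be altered to record the new fact: the set simply grew by one element.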

Now that I have talked about RDF as a model, it is time to talk about RDF representations. As I said in a previous post, there are four common representations of RDF: Turtle, RDF/XML, RDFa and JSON-LD. Turtle is the most human-friendly format, and it can be as simple as the example below. Note that the subject is on the first line, the predicate on the second and the object on the third; each IRI is written between angle brackets, and the period at the end of the third line marks the end of the triple. Turtle has some features that make it more concise; for example, when triples differ only in the object, you can separate the objects with commas.

<http://example.org/luiz>
   <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
      <https://schema.org/Person> .
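
To show the comma feature, here is a small Python sketch that writes triples out as Turtle and joins objects that share a subject and predicate. The to_turtle function is my own helper, not part of any library, and it assumes every term is a full IRI (no literals, blank nodes or prefixes). The second type, foaf:Person, is a hypothetical extra fact added only to trigger the comma form.

```python
from collections import defaultdict


def to_turtle(triples):
    """Serialize (subject, predicate, object) IRI triples as Turtle.

    Objects that share a subject and predicate are joined with commas.
    Assumes every term is an IRI, which Turtle writes between angle brackets.
    """
    grouped = defaultdict(list)
    for subject, predicate, obj in triples:
        grouped[(subject, predicate)].append(obj)

    statements = []
    for (subject, predicate), objects in grouped.items():
        object_list = ", ".join(f"<{o}>" for o in objects)
        statements.append(
            f"<{subject}>\n   <{predicate}>\n      {object_list} ."
        )
    return "\n".join(statements)


triples = [
    ("http://example.org/luiz",
     "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
     "https://schema.org/Person"),
    # Hypothetical second type, added only to show the comma form.
    ("http://example.org/luiz",
     "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
     "http://xmlns.com/foaf/0.1/Person"),
]

turtle = to_turtle(triples)
print(turtle)
```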


RDF/XML was the first RDF format. It dates from a time when everybody thought that everything should be an XML file. Expressing a graph in a tree structure such as an XML file doesn’t seem very smart, but that was the way they did it. Below you can see the same triple that we represented before: subject, predicate and object.

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:ex="http://example.org/stuff/1.0/">
    <rdf:Description rdf:about="http://example.org/luiz">
       <rdf:type rdf:resource="https://schema.org/Person"/>
    </rdf:Description>
</rdf:RDF>
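
To see that an RDF/XML document really carries the same subject, predicate and object, here is a rough Python sketch that reads one with the standard xml.etree.ElementTree module. The extract_triples function is my own helper and only understands the flat rdf:Description pattern used in this post; it is not a full RDF/XML parser.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# A small RDF/XML document stating that luiz is a schema.org Person.
DOCUMENT = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about="http://example.org/luiz">
       <rdf:type rdf:resource="https://schema.org/Person"/>
    </rdf:Description>
</rdf:RDF>
"""


def extract_triples(xml_text):
    """Extract IRI-valued triples from flat rdf:Description elements."""
    root = ET.fromstring(xml_text)
    triples = []
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        for prop in description:
            # ElementTree names look like "{namespace}local"; joining the
            # two parts gives the predicate IRI.
            namespace, local = prop.tag[1:].split("}")
            obj = prop.get(f"{{{RDF_NS}}}resource")
            if obj is not None:  # skip properties with literal values
                triples.append((subject, namespace + local, obj))
    return triples


triples = extract_triples(DOCUMENT)
print(triples)
```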


The next format I want to talk about is JSON-LD. This is the representation of RDF as JSON, the native JavaScript object notation, which makes it very appropriate for web development: it can be used as the response of a REST API or even embedded in HTML. In the example below I am describing the resource “http://example.org/luiz” and saying that it is a Person.

{
  "@context": "http://json-ld.org/contexts/person.jsonld",
  "@id": "http://example.org/luiz",
  "@type": "https://schema.org/Person"
}
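
Because JSON-LD is plain JSON, the standard json module is enough to peek at it. The sketch below turns the "@id" and "@type" keywords of a single node into rdf:type triples. It is a simplification under my own assumptions: there is no "@context", so both values must already be full IRIs, and type_triples is a hypothetical helper, not a real JSON-LD processor.

```python
import json

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# No "@context" here, so "@id" and "@type" must already be full IRIs.
DOCUMENT = """
{
  "@id": "http://example.org/luiz",
  "@type": "https://schema.org/Person"
}
"""


def type_triples(jsonld_text):
    """Turn the "@id"/"@type" keywords of one node into rdf:type triples."""
    node = json.loads(jsonld_text)
    types = node.get("@type", [])
    if isinstance(types, str):  # "@type" may be a single IRI or a list
        types = [types]
    return [(node["@id"], RDF_TYPE, t) for t in types]


triples = type_triples(DOCUMENT)
print(triples)
```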



RDFa is the last format, and it allows us to embed our RDF in an HTML page. Indeed, RDFa files are HTML files with extra markup. In RDFa, attributes are used to provide the RDF triple data, as in the example below, where the HTML tag p has the attribute vocab and the HTML tag span has the attribute property with the value name.




<p vocab="https://schema.org/">
   My name is
   <span property="name">Luiz Claudio Santos</span>
</p>
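
To give an idea of how a program can pick those attributes out of a page, here is a deliberately small Python sketch built on the standard html.parser module. The PropertyCollector class is my own illustration and the vocab value https://schema.org/ is an assumption; it only combines the last vocab seen with each property term and ignores the rest of RDFa (typeof, resource, prefixes and so on).

```python
from html.parser import HTMLParser


class PropertyCollector(HTMLParser):
    """Collect (property IRI, text) pairs from elements with a property attribute."""

    def __init__(self):
        super().__init__()
        self.vocab = ""       # last vocab attribute seen so far
        self.current = None   # property IRI whose text is being read
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if "vocab" in attributes:
            self.vocab = attributes["vocab"]
        if "property" in attributes:
            # Resolve the term against the vocabulary, e.g. "name".
            self.current = self.vocab + attributes["property"]

    def handle_data(self, data):
        if self.current is not None and data.strip():
            self.pairs.append((self.current, data.strip()))

    def handle_endtag(self, tag):
        # Simplification: any closing tag ends the current property.
        self.current = None


# The vocab value below is an assumption made for this sketch.
HTML = """
<p vocab="https://schema.org/">
   My name is
   <span property="name">Luiz Claudio Santos</span>
</p>
"""

collector = PropertyCollector()
collector.feed(HTML)
print(collector.pairs)
```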



Now that I have talked a little more about the RDF model and its formats, let’s go back to my soccer taxonomy and PoolParty. In PoolParty there is a tab called triple, where we can see the subject, predicate and object of the RDF elements in the taxonomy. Actually, almost everything in PoolParty is stored as triples, as you can see in the picture below. In this picture you can clearly see the subject “http://localhost:8086/OlympicSoccerMansTournamentRio2016/49” and a lot of predicates and objects that, combined with the subject, make up new triples.

 

neymar_triple

It was not my intention to explain every detail of RDF in this post. My intention was to make you aware of the knowledge graph behind RDF, and hopefully by now you have one question in mind: how can I ask questions of this knowledge structure that I already have in place? The short answer is: you run a SPARQL query. What SPARQL is and how to write a SPARQL query is what I will talk about in the next post of this series.
