Jan 5th, 2010, 11:16 PM
Making lists search engine friendly
After creating several hundred records in an application generated by ROO, I noticed that paging isn't implemented in a way that lets a search engine index it. For example, the URL for page 2 looks as follows:
Google, for example, will typically only index "http://localhost:8080/sample/blogs", ignoring everything after the question mark. All the pages are therefore considered to be the same URL, and records from other pages may not be indexed.
May I suggest that pages be part of the URL. Thus, page 1 will be "http://localhost:8080/sample/blogs", page 2 "http://localhost:8080/sample/blogs/page/2", page 3 "http://localhost:8080/sample/blogs/page/3" and so forth. This will enable the search engine to crawl through each page and each record in the page.
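To make the suggested scheme concrete, here is a minimal sketch in plain Java. The class and method names (PagePaths, pathFor, pageFrom) are made up for illustration and are not part of ROO; the only thing taken from the post is the URL shape, where page 1 is the bare list URL and later pages get a "/page/N" suffix.

```java
// Hypothetical helper, not a ROO API: maps page numbers to path-style URLs and back.
final class PagePaths {
    private PagePaths() {}

    /** Page 1 is the bare collection path; later pages get a "/page/N" suffix. */
    static String pathFor(String base, int page) {
        if (page < 1) {
            throw new IllegalArgumentException("page must be >= 1");
        }
        return page == 1 ? base : base + "/page/" + page;
    }

    /** Inverse of pathFor: returns 1 for the bare collection path. */
    static int pageFrom(String base, String path) {
        if (path.equals(base)) {
            return 1;
        }
        String prefix = base + "/page/";
        if (!path.startsWith(prefix)) {
            throw new IllegalArgumentException("not a page path: " + path);
        }
        // Throws NumberFormatException if the suffix is not a number.
        return Integer.parseInt(path.substring(prefix.length()));
    }
}
```

With this, PagePaths.pathFor("/sample/blogs", 3) gives "/sample/blogs/page/3", and every page has its own crawlable address with no query string.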
There is really no reason to put the number of records per page (the size parameter) in the URL. Instead, it should be managed the same way the theme and language are managed, so that every time I visit the list page it honors whatever I chose before. Currently it always resets to 10, which negates the feature.
One additional requirement is to let the title and description of the page reflect what is displayed in the list. Thus, if the page itself is shown by a search engine, the title may actually read "Sample > Blogs > Page 4" instead of just "Sample". This is also important for accessibility.
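As a rough illustration of the title requirement, the sketch below builds the string from its parts. ListPageTitle and its parameters are hypothetical names, and always appending the page number (even for page 1) is an assumption, not something the post specifies.

```java
// Hypothetical helper: composes a list-page title such as "Sample > Blogs > Page 4".
final class ListPageTitle {
    private ListPageTitle() {}

    static String of(String application, String entityPlural, int page) {
        // Assumption: the page number is always shown, including for page 1.
        return String.join(" > ", application, entityPlural, "Page " + page);
    }
}
```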
Jan 6th, 2010, 04:55 PM
Please feel free to log an enhancement request at https://jira.springsource.org/browse/ROO.
Jan 8th, 2010, 04:12 PM
We actually made a conscious decision not to make page and size part of the URI path. We want to follow the REST pattern as much as we can, which means that only resources should be referenced in the URI. So 'blogs' would be exactly that: a resource. Page and size, on the other hand, are not considered resources and are therefore optional URI parameters. I think others do this similarly as well.
I can see the dilemma when it comes to search engines, so I am open to suggestions here. I guess there are tradeoffs for both sides.
However, remembering size settings via a cookie similar to what we are doing for themes and locales is something that should be corrected so we are more consistent.
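One way the size could be remembered is sketched below in plain Java, without the servlet API so it stays self-contained. The names (PageSizePreference, resolve, MAX_SIZE) and the upper bound of 100 are assumptions for illustration; only the default of 10 comes from this thread. The idea is that an explicit request value wins, then the value remembered in the cookie, then the default.

```java
// Hypothetical helper: picks the page size from the request, the cookie, or a default.
final class PageSizePreference {
    static final int DEFAULT_SIZE = 10;  // the current default mentioned in this thread
    static final int MAX_SIZE = 100;     // assumed upper bound, not from the thread

    private PageSizePreference() {}

    /** requested = raw "size" request value, remembered = raw cookie value; either may be null. */
    static int resolve(String requested, String remembered) {
        Integer fromRequest = parse(requested);
        if (fromRequest != null) {
            return fromRequest;
        }
        Integer fromCookie = parse(remembered);
        return fromCookie != null ? fromCookie : DEFAULT_SIZE;
    }

    /** Returns null for missing, non-numeric, or out-of-range values. */
    private static Integer parse(String raw) {
        if (raw == null) {
            return null;
        }
        try {
            int n = Integer.parseInt(raw.trim());
            return (n >= 1 && n <= MAX_SIZE) ? n : null;
        } catch (NumberFormatException e) {
            return null;
        }
    }
}
```

The caller would then write the resolved value back to the cookie, so the next visit to the list page without a size parameter keeps the earlier choice instead of falling back to 10.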
Jan 8th, 2010, 06:36 PM
Components of URLs, such as "en_us" in "http://localhost/travel/en_us", are often used to filter information. Page information is similar. It's done to keep URLs simple, clean, human readable and searchable. However, many of those components may be optional and may not fit the recommended URI schemes for REST applications, so we need to be pragmatic about it. You may very well find more requests to filter data based on components in a URI, and they may not fit the guidelines for REST.
Rails does page numbers in the URI by default. We do it in ASP.NET MVC too. Maybe there should be a way to "disable" it in ROO for applications that don't need to be searchable.
Jan 11th, 2010, 08:20 PM
Sorry for the slight delay but I would like to pick up this discussion again.
I see your point about being pragmatic and have no major objections about changing some of our URI conventions to make them more search engine friendly.
I am, however, still not quite convinced this should apply to the list use case. Let me explain. Say you have 1000 records for the Product entity and want to display them in a paginated list view. If I page through it, at some stage I will arrive at page 198 or so with a list size of, say, 5. The current URI for this is:
What you are suggesting is to use
which is search engine friendly. Now how do you handle the situation when several of the products have been removed from the list? The search engine would still point to the same page but either there will be no results shown any more or (even worse) the items displayed will be completely different.
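The effect can be shown with a small plain-Java sketch of offset-based paging. OffsetPaging and its methods are hypothetical, and the products are stood in for by consecutive integer ids; the numbers (1000 records, page 198, size 5) are the ones from the example above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of offset-based paging over an in-memory list.
final class OffsetPaging {
    private OffsetPaging() {}

    /** Returns the items on the given 1-based page, or an empty list past the end. */
    static <T> List<T> page(List<T> items, int page, int size) {
        if (page < 1 || size < 1) {
            throw new IllegalArgumentException("page and size must be >= 1");
        }
        int from = (page - 1) * size;
        if (from >= items.size()) {
            return new ArrayList<>();
        }
        int to = Math.min(from + size, items.size());
        return new ArrayList<>(items.subList(from, to));
    }

    /** Stand-in for product records: consecutive ids from firstId to lastId inclusive. */
    static List<Integer> ids(int firstId, int lastId) {
        List<Integer> result = new ArrayList<>();
        for (int id = firstId; id <= lastId; id++) {
            result.add(id);
        }
        return result;
    }
}
```

With ids 1 to 1000, page 198 at size 5 holds ids 986 to 990. If the first three products are removed, the same page holds 989 to 993, and if the first twenty are removed the page is empty, because the offset of 985 now lies beyond the 980 remaining records.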
This would make more sense if we had pagination for business keys (all products that start with a letter between A and D) rather than simply using the collection size. So I think it makes sense to make the URIs for the Roo finders search engine friendly:
I think this would make sense.
Jan 11th, 2010, 10:54 PM
I'd like to follow the YAGNI principle.
The original problem is that we need to allow a search engine to index all the entities. To solve it, we can add the page as part of the URI. However, adding the size to fetch is redundant. Whether there are 5 items per page or 100, the search engine will eventually get through them. We can solve the problem without adding the size to the URI. Instead, we should use a default, which could be hardcoded. I was only suggesting that the page number be part of the URI, not the size. To be honest, even adding those "size" links to the list page just complicates it and reduces usability. I always remove them. I'd rather add them when I need them.
I don't think the last URI you gave will be of any use to a search engine. It's likely to return a subset of the data that the default list view already returns, so the same records would be crawled twice. I don't believe it benefits the application in terms of search ranking at all.
The example I gave may actually open a can of worms, so I'd rather discuss filtering as a different topic. It doesn't always help with searching. I'm starting to think paging may be a special case. Perhaps we should just focus on enabling a search engine to crawl through each page (security permitting) so it's searchable.