Enhancement #512

Enhancement #286: Make kune googleable/searchable using hash bangs #! instead of # in hashs

Page static generation via htmlunit perf issues

Added by Pablo Ojanguren over 6 years ago. Updated about 6 years ago.

Status: New
Start date: 03/04/2013
Priority: Normal
Due date:
Assignee: Pablo Ojanguren
% Done: 0%
Category: Server side
Target version: Unplanned
Resolution:
Tags:

Description

Generating static HTML pages for crawlers with htmlunit sometimes degrades server performance. We want to avoid this by controlling how and when htmlunit processes are executed.

Associated revisions

Revision 2be1421b
Added by Vicente J. Ruiz Jurado about 6 years ago

Added some MBean to Search Servlet (wip)

Revision 5b3965e2
Added by Vicente J. Ruiz Jurado about 6 years ago

Added some mbean methods to SearchEngineServletFilter (related to #512 #286 and #70)

History

#1 Updated by Pablo Ojanguren over 6 years ago

The htmlunit process is controlled from this servlet filter class:

cc.kune.core.server.searcheable.SearchEngineServletFilter

The init method sets the thread configuration:

  public void init(final FilterConfig filterConfig) throws ServletException {
    this.filterConfig = filterConfig;
    cache = new Cache();
    executor = Executors.newFixedThreadPool(THREADS);
  }
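One way to control how many htmlunit renders run at once could be to replace the unbounded queue behind Executors.newFixedThreadPool with a bounded one, so that extra requests are refused instead of piling up. This is only a rough sketch: BoundedRenderPool, its method names and the sizes used are hypothetical and are not part of the kune code.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper (not in kune): fixed-size pool with a bounded work queue. */
class BoundedRenderPool {
  private final ThreadPoolExecutor executor;

  BoundedRenderPool(final int threads, final int queueSize) {
    // Same number of core and max threads as a fixed pool, but the queue is
    // bounded and AbortPolicy rejects work once threads and queue are full.
    executor = new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS,
        new ArrayBlockingQueue<Runnable>(queueSize),
        new ThreadPoolExecutor.AbortPolicy());
  }

  /** Returns a Future for the task, or null when the pool and queue are both full. */
  <T> Future<T> trySubmit(final Callable<T> task) {
    try {
      return executor.submit(task);
    } catch (final RejectedExecutionException e) {
      return null; // caller decides what to answer to the crawler
    }
  }

  /** Interrupts running tasks and drops queued ones, e.g. from the filter's destroy(). */
  void shutdown() {
    executor.shutdownNow();
  }
}
```

With something like this, the filter would call trySubmit for each crawler request and treat a null result as "too busy right now".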

We have to settle the following questions before we can work out a solution:

  • WHEN should the server avoid launching htmlunit? Current server load %, current memory usage %...
  • WHAT should the server do in that case? Respond with HTTP 404? HTTP 500? I think crawlers are very sensitive to this!
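For the WHEN question, a possible starting point is a small gate that looks at the system load average and the JVM heap usage before htmlunit is launched. This is only a sketch under assumptions: the class RenderLoadGate, its method names and any threshold values are hypothetical, and heap usage is used here as a stand-in for "mem usage %".

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

/** Hypothetical helper (not in kune): decides whether a render may start. */
class RenderLoadGate {
  private final double maxLoadPerCpu;   // e.g. 1.5, illustrative only
  private final double maxHeapFraction; // e.g. 0.8, illustrative only

  RenderLoadGate(final double maxLoadPerCpu, final double maxHeapFraction) {
    this.maxLoadPerCpu = maxLoadPerCpu;
    this.maxHeapFraction = maxHeapFraction;
  }

  /** Pure decision. A negative load means "not available" and is ignored. */
  boolean allows(final double loadPerCpu, final double heapFraction) {
    final boolean loadOk = loadPerCpu < 0 || loadPerCpu <= maxLoadPerCpu;
    return loadOk && heapFraction <= maxHeapFraction;
  }

  /** Same decision using the values measured right now. */
  boolean allowsNow() {
    return allows(currentLoadPerCpu(), currentHeapFraction());
  }

  /** One-minute load average divided by CPU count, or -1 if the platform has none. */
  static double currentLoadPerCpu() {
    final OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
    final double load = os.getSystemLoadAverage(); // negative when unavailable
    return load < 0 ? -1 : load / os.getAvailableProcessors();
  }

  /** Used heap as a fraction of the maximum heap the JVM may grow to. */
  static double currentHeapFraction() {
    final Runtime rt = Runtime.getRuntime();
    final long used = rt.totalMemory() - rt.freeMemory();
    return (double) used / rt.maxMemory();
  }
}
```

The gate only covers the WHEN; what to send back to the crawler when allowsNow() returns false is still the open WHAT question.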

#2 Updated by Vicente J. Ruiz Jurado over 6 years ago

Another thing I was thinking about (and tried without success) was to keep the htmlunit WebClient open and call client.closeAllWindows() only when the servlet is destroyed. I was trying to cache and make page requests faster, but... maybe we have to try again.

By the way, as we are using the Cloudflare CDN, I've added this rule:
kune.cc/*escaped_fragment*
Cache level: Aggressive caching

#3 Updated by Vicente J. Ruiz Jurado over 6 years ago

  • Parent task set to #286

#4 Updated by Vicente J. Ruiz Jurado about 6 years ago

After our last conversation and your work on #70, I just added some MBean methods and also refactored this servlet a little. In short, once this is installed on kune.cc, we can debug this.
