canteenParser.md 1.68 KB

Canteen Parser with JSoup

JSoup is a Java library for working with real-world HTML. It provides a very convenient API for extracting and manipulating data, using the best of DOM, CSS, and jquery-like methods.

Jsoup: Java HTML Parser

Installation

Just add the following line to your gradle dependencies in your build.gradle and refresh the project.

compile "org.jsoup:jsoup:1.10.2"

Procedure

The parser collects data of the beuth university canteen in the Luxemburger Straße 9, 13353 Berlin. To get the requested data, the parser has to navigate through the single pages.

To get data for two weeks, we have to call a page for every day. So at first we have to get all dates for the current and the next week.

The other thing we need is the resource_id for the canteen we like to parse. In our case it's the 527.

If we have the dates we wish to parse and the correct resource_id for our canteen, we loop the dates and call the URL: https://www.stw.berlin/xhr/speiseplan-wochentag.html with the date and resource_id as params.

After calling:

Document doc = Jsoup.connect("https://www.stw.berlin/xhr/speiseplan-wochentag.html").data(params).userAgent("Mozilla").post();

where params represents a string map containing the resource_id and the requested date, we got an Document which we can parse to get all needed data.

The Parser is called right before a given message needs to be checked by the drools rules.

Result

As a result we got a CanteenData object containing information of dishes for the current and next week.