Using regex to extract date for valid_upto field from bank's webpage

Thomas Baumgart thb at net-bembel.de
Wed Jun 10 23:18:31 BST 2020


On Wednesday, 3 June 2020 17:15:59 CEST Prasun Kumar wrote:

> Hi mentors,
> Following up on my previous email:
> https://mail.kde.org/pipermail/kde-finance-apps/2020-June/000851.html I
> have an idea to get the date for the valid_upto field.
> Currently, the src/bankdata/CMakeLists.txt file contains the code to
> download the webpage
> https://www.bundesbank.de/de/aufgaben/unbarer-zahlungsverkehr/serviceangebot/bankleitzahlen/download-bankleitzahlen-602592
> (at lines 6-12) and uses a regex to search for 'blz-aktuell-txt-data.txt'
> to scrape the link of the file. Now, this <a> tag has the label which
> contains the date up to which the deletions were valid. We can scrape this
> date from this label using another regex and save this date as a suffix for
> the name of the file downloaded. This could then later be used to update
> the valid_upto column.
> Does this approach seem good enough?
> 
> I think REGEX MATCH should be used here as the first occurrence of
> "Bankleitzahlendateien - gültig vom" has the required date.
> 
> I am currently having a problem in working out the regex. Generally, to
> match a pattern after a string using regex, we use parentheses. But it is
> not working with CMake regex. Can anyone help me here?
> The line I have added looks like this:
> 
> >    string(REGEX MATCH \"Bankleitzahlendateien - gültig vom ([^ ]*)\"
> > FILE_DATE  \"\${DATA}\")
> 
> 
> The FILE_DATE then contains "Bankleitzahlendateien - gültig vom 09.03.2020"
> while I want only the pattern inside parentheses.
> I want to fetch the first date after "Bankleitzahlendateien - gültig vom "
> string.

Well, FILE_DATE receives the entire match, which is why it holds the full
string. To get only the text captured by the parentheses, use the
CMAKE_MATCH_<n> variables. Here's a small test:

cmake_minimum_required(VERSION 3.10)
# Sample input standing in for the label text on the web page
set(DATA "Bankleitzahlendateien - gültig vom 09.03.2020")
# FILE_DATE receives the whole match; the group in parentheses goes to CMAKE_MATCH_1
string(REGEX MATCH "^Bankleitzahlendateien - gültig vom ([^ ]*)$" FILE_DATE "${DATA}")
message("FILE_DATE: ${FILE_DATE}")
message("CMAKE_MATCH_0: ${CMAKE_MATCH_0}")
message("CMAKE_MATCH_1: ${CMAKE_MATCH_1}")

${CMAKE_MATCH_1} contains the date only.
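
On the real page the label sits somewhere in the middle of the downloaded
HTML, so the ^ and $ anchors from the test would prevent a match there. A
minimal sketch without anchors, assuming DATA holds the page content. The
HTML snippet, the second date, and the BLZ_DATE variable name are made up
for illustration and are not taken from the Bundesbank page or the existing
CMakeLists.txt:

```cmake
cmake_minimum_required(VERSION 3.10)

# Hypothetical stand-in for the downloaded page; the real markup may differ.
set(DATA "<a href=\"blz-aktuell-txt-data.txt\">Bankleitzahlendateien - gültig vom 09.03.2020 bis 07.06.2020</a>")

# No anchors: the pattern may match anywhere inside DATA.
# [0-9.]+ accepts only digits and dots, so the capture ends with the date.
string(REGEX MATCH "Bankleitzahlendateien - gültig vom ([0-9.]+)" FILE_DATE "${DATA}")

if(NOT "${FILE_DATE}" STREQUAL "")
  # CMAKE_MATCH_1 holds the first parenthesized group, i.e. the date alone.
  set(BLZ_DATE "${CMAKE_MATCH_1}")
  message("BLZ_DATE: ${BLZ_DATE}")
else()
  message(WARNING "Date label not found in DATA")
endif()
```

Saved as a script, this can be run with cmake -P. string(REGEX MATCH)
returns the first match only, which fits the wish to pick up the first
date after "Bankleitzahlendateien - gültig vom ".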

> Also, how to make CMake regex non-greedy?

I honestly don't know. The docs are sparse at that point.
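
For reference, the regex syntax listed in the CMake documentation for
string(REGEX ...) does not include lazy quantifiers such as *? or +?, so
* and + always match greedily. A common workaround is a negated character
class that cannot run past the delimiter. A small sketch with made-up input
(TEXT and IGNORED are hypothetical names):

```cmake
cmake_minimum_required(VERSION 3.10)

# Made-up input containing two quoted values.
set(TEXT "a=\"first\" b=\"second\"")

# Greedy: .* extends to the last quote in TEXT, so the group spans both values.
string(REGEX MATCH "\"(.*)\"" IGNORED "${TEXT}")
message("greedy:  ${CMAKE_MATCH_1}")

# Negated class: [^\"]* cannot cross a quote, so the group ends at the
# first closing quote and contains only the first value.
string(REGEX MATCH "\"([^\"]*)\"" IGNORED "${TEXT}")
message("bounded: ${CMAKE_MATCH_1}")
```

The ([^ ]*) group in the original pattern relies on the same idea: it stops
at the first space after the date.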

> I would appreciate some help as I am stuck here and the rest of my week's
> work requires the database.

Hope I am not too late for the game.

-- 

Regards

Thomas Baumgart

