regify utility  2.0.0-0
HTML Utilities

Functions for working with HTML content.

Functions

RUAPI alloc_chars ruHtmlEncodeText (trans_chars text)
 Return HTML compatible version of given text without any HTML wrapper tags.
 
RUAPI perm_chars ruHtmlSanitizeCustom (perm_chars html, alloc_chars *htmlCopy, alloc_chars *plainTxt, ruSet excludeTags, ruSet excludeAttrs)
 Sanitizes given HTML and optionally extracts plain text.
 
RUAPI perm_chars ruHtmlSanitize (perm_chars html, alloc_chars *htmlCopy, alloc_chars *plainTxt)
 Sanitizes given HTML and optionally extracts plain text.
 
RUAPI bool ruHtmlTestFor (trans_chars content)
 Checks whether given buffer is HTML.
 

Detailed Description

Functions for working with HTML content.

This uses https://github.com/htacg/tidy-html5

Function Documentation

◆ ruHtmlEncodeText()

RUAPI alloc_chars ruHtmlEncodeText ( trans_chars  text)

Return HTML compatible version of given text without any HTML wrapper tags.

This function simply replaces the following:

Character Sequence    Encoding
&                     &amp;
<                     &lt;
>                     &gt;
2 spaces              &nbsp;&nbsp;
\r\n                  <br/>\n
\n                    <br/>\n
\r                    <br/>\n
Parameters
text  Text to HTML encode.
Returns
HTML encoded version of given text. Caller must free.
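
The replacement table above can be illustrated with a small standalone sketch. This is not the library's implementation: the name sketchHtmlEncode is hypothetical, plain const char * and char * stand in for trans_chars and alloc_chars, and the handling of runs of spaces (pairs are replaced left to right, a lone space is kept) is an assumption the table does not spell out.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for illustration only, not the library's code.
 * Returns a newly allocated string that the caller must free, or NULL
 * if text is NULL or allocation fails. */
char *sketchHtmlEncode(const char *text) {
    if (!text) return NULL;
    /* Worst case is 6 output bytes per input byte:
     * "\n" becomes "<br/>\n" and "  " becomes "&nbsp;&nbsp;". */
    size_t len = strlen(text);
    char *out = malloc(len * 6 + 1);
    if (!out) return NULL;
    char *w = out;
    for (const char *p = text; *p; p++) {
        const char *rep = NULL;
        switch (*p) {
        case '&': rep = "&amp;"; break;
        case '<': rep = "&lt;"; break;
        case '>': rep = "&gt;"; break;
        case ' ':
            /* Assumption: only pairs of spaces are encoded. */
            if (p[1] == ' ') { rep = "&nbsp;&nbsp;"; p++; }
            break;
        case '\r':
            /* Treat \r\n as a single line break. */
            if (p[1] == '\n') p++;
            rep = "<br/>\n";
            break;
        case '\n': rep = "<br/>\n"; break;
        default: break;
        }
        if (rep) {
            size_t n = strlen(rep);
            memcpy(w, rep, n);
            w += n;
        } else {
            *w++ = *p;
        }
    }
    *w = '\0';
    return out;
}
```

As with the documented function, the result in this sketch is heap-allocated and must be freed by the caller.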

◆ ruHtmlSanitize()

RUAPI perm_chars ruHtmlSanitize ( perm_chars  html,
alloc_chars * htmlCopy,
alloc_chars * plainTxt 
)

Sanitizes given HTML and optionally extracts plain text.

This runs ruHtmlSanitizeCustom with the default sets.

Parameters
html  HTML buffer to evaluate.
htmlCopy  Where a cleaned copy will be stored, if one is needed. Caller must free.
plainTxt  Where a plain text copy will be stored. Caller must free.
Returns
Either the cleaned-up HTML or the original.

◆ ruHtmlSanitizeCustom()

RUAPI perm_chars ruHtmlSanitizeCustom ( perm_chars  html,
alloc_chars * htmlCopy,
alloc_chars * plainTxt,
ruSet  excludeTags,
ruSet  excludeAttrs 
)

Sanitizes given HTML and optionally extracts plain text.

It returns a sanitized HTML copy if excluded items were found and an htmlCopy reference was given. If a plainTxt reference was given, the extracted plain text is stored there. At least one of htmlCopy or plainTxt must be set.

Parameters
html  HTML buffer to evaluate.
htmlCopy  Where a cleaned copy will be stored, if one is needed. Caller must free.
plainTxt  Where a plain text copy will be stored. Caller must free.
excludeTags  Optional set of tags to filter out. Default:
  • applet
  • script
  • object
  • iframe
  • noframes
  • noscript
excludeAttrs  Optional set of attributes to filter out. Default:
  • onabort
  • onblur
  • onchange
  • onclick
  • ondblclick
  • onerror
  • onfocus
  • onkeydown
  • onkeypress
  • onkeyup
  • onload
  • onmousedown
  • onmousemove
  • onmouseout
  • onmouseover
  • onmouseup
  • onreset
  • onselect
  • onsubmit
  • onunload
  • javascript
  • eval
  • script
Returns
Either the cleaned-up HTML or the original.

◆ ruHtmlTestFor()

RUAPI bool ruHtmlTestFor ( trans_chars  content)

Checks whether given buffer is HTML.

This function checks the first 200 characters for <html and the last 20 for </html>.

Parameters
content  Buffer to check.
Returns
false if content is NULL or doesn't have HTML tags in it.
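
The heuristic described above can be sketched as follows. This is an illustration, not the library's code: the names sketchHtmlTestFor and findInWindow are hypothetical, const char * stands in for trans_chars, and the sketch assumes the match is case-sensitive and that both the opening and closing tag must be found.

```c
#include <stdbool.h>
#include <string.h>

/* True if needle occurs entirely within buf[0..len). */
static bool findInWindow(const char *buf, size_t len, const char *needle) {
    size_t n = strlen(needle);
    if (n > len) return false;
    for (size_t i = 0; i + n <= len; i++) {
        if (memcmp(buf + i, needle, n) == 0) return true;
    }
    return false;
}

/* Hypothetical stand-in for illustration only, not the library's code.
 * Looks for "<html" in the first 200 characters and "</html>" in the
 * last 20. Assumes a case-sensitive match and that both must be present. */
bool sketchHtmlTestFor(const char *content) {
    if (!content) return false;
    size_t len = strlen(content);
    size_t head = len < 200 ? len : 200;
    size_t tail = len < 20 ? len : 20;
    return findInWindow(content, head, "<html")
        && findInWindow(content + len - tail, tail, "</html>");
}
```

Limiting the search to two small windows keeps the check cheap on large buffers, at the cost of missing documents with a long preamble before the opening tag or trailing content after the closing tag.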