Contagged
Author: Jochen Rau
The content is related to TYPO3, a GNU/GPL CMS/Framework available from www.typo3.com
The extension was featured in one of the TYPO3 podcasts. Check out the video at http://castor.t3o.punkt.de/files/podkast_7mf_contagged.m4v to see the extension in action.
What does it do?
- auto-parsing, tagging and replacing of terms in a text
- scenarios of usage:
- auto-link to a glossary page
- auto-link to a bibliography page
- auto-replace a product number with the actual price
- auto-tag a person's name to show the corresponding address in a tool-tip and link to a list of addresses
- replace text to enforce typographic conventions (e.g. replace ``foo´´ with ,,foo´´ or remove doubled spaces)
- the term can be treated as a regular expression (see type "Regular Expression")
- any database table can be used as a data source
- fully configurable via TS (table name, mapping of fields)
- default database table (term, alternative terms, replacement, short description, long description, language of the term, link)
- full support of stdWrap to do complex mapping of fields
- Synonyms and morphologies are supported (alternative terms)
- definition of new types via TypoScript (definition, acronym, person, keyword, cities, replacements ...)
- inclusion and exclusion of branches, individual pages and individual content objects (cObj)
- content elements to be tagged are selected by using tt_content.text.20.postUserFunc = tx_contagged->main (before v0.0.1 this was tt_content.text.20.parseFunc.userFunc)
- custom link (mail,http,pid) for each term
- auto-parser takes joined words (with a dash) into account
- single and list view
- template based
- alphabetical index (or any other index like ZIP-Codes, names etc.; configurable via locallang.xml)
- links only index-chars for which a database entry exists
- missing index-chars can be added automatically
- link "back to last page" in single view for easy navigation
- multi language support
- translated content
- localized sorting (not UTF-8-safe yet)
- localized index (FE-Plugin)
- adds a "lang"-attribute, if the page language is not the same as the language of the term
- stdWrap for the term to be searched ("termStdWrap"; useful to search for already tagged text like <person>Steve Jobs</person>)
- (new) Example configuration for RealURL is included
Installation
- Import and install “tx_contagged” with the Extension Manager. Be sure to get the latest version.
- Configure the pid of the page/folder where you have stored the TypoScript configuration of the types of terms (this is normally the pid of the root page). This can also be done later in the Extension Manager by clicking on the name of the extension. If you skip this step, the selector box "type of term" in the BE will stay empty.
- Include the static templates “Content parser (contagged)” and “default CSS-styles (contagged)” in your main template. You have to select “Click here to edit whole template record” in the “Template”-module. There is a third template, "Experimental types (contagged)", with - ehrm - experimental types.
- Add a new page ("list page") where you can insert the FE-plugin (“contagged (list)”)
- Add a SysFolder to store your terms in (this can be any page, including the list page).
- Configure the extension via the “Constant editor” in the “Template”-module or in your TS Setup.
- If you want to parse the content of tt_news, also add the last line.
plugin.tx_contagged {
  listPages = 123,56
  storagePids = 123,456
  includeRootPages = 2
}
plugin.tt_news.general_stdWrap.postUserFunc = tx_contagged->main
- listPages: a comma-separated list of pids of the list pages on which you have inserted the FE-Plugin "List (contagged)"; only the first page will be linked, but the term appears on every list page you have configured (glossary, list of persons)
- storagePids: a comma-separated list of pids of the pages/folders where you have stored your terms
- includeRootPages: a comma-separated list of pids of the root pages whose branches you want to be parsed (can be the overall root page of your site)
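According to the change log (v0.0.11), storagePids can be set globally, per type, or per data source. A minimal sketch of the three levels, assuming the type key "definition" and the data source key "default" from the basic TS setup; all pids are placeholders, and how the levels take precedence over one another is not documented on this page:

```typoscript
plugin.tx_contagged {
  # global setting (placeholder pids)
  storagePids = 123,456

  # per type of term (placeholder pid)
  types.definition.storagePids = 123

  # per data source (placeholder pid)
  dataSources.default.storagePids = 456
}
```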
Please have a look at the section "Questions and Wishes" if you have any problems installing the extension. And please remember: It's still alpha.
Known problems
- block and inline views of the description (def_block and def_inline) are not bulletproof in every browser (Opera and, of course, IE6)
- sorting of the terms and index chars is not UTF8-safe (depending on locales; will be solved in PHP 6.0)
- (fixed) switching the FE-language is not recognized by the FE-plugin
- (fixed) multi byte index chars (UTF-8) are not linked correctly in the index of the FE-plugin
- (fixed) keywords are only generated from the last cObj of a page
- (fixed) bad performance of the FE-plugin
To-Do list
- better control of amount of tagged/replaced terms
- (partly done) maximum of tagged terms per cObj/page
- random
- JavaScript tool-tips to display the description
- (done) parsing the content of other extensions (e.g. tt_news)
- custom tags (e.g. to mark ambiguities or to exclude single words)
- full support of RegEx via TypoScript
- tagging a larger scope than only the matched term (via RegEx)
- backlinks and crosslinks
- frontend editing
- handling of ambiguities (e.g. „Bus“ = „large motor vehicle“ AND „set of conductors carrying data“)
- BE-module
- statistics
- markup with custom tags (" ... lorem ipsum bus dolor sit amet ... [select tag \/]")
- simple search function
- enable the administrator to decide how the parser is initiated
Questions and Wishes
Feel free to add new questions, ideas, suggestions and wishes.
- I can't select any "type of term" in the Back-End. The list is blank.
- You have to configure the pid of the page/folder where you have stored your TS-Setup (this is normally the pid of the root page as you included the "static template from extensions"). This step must be done in the extension manager (click on the name of the extension).
- Parsing content from other extensions is my favourite wish. Is there a reason why you only parse tt_content? Other taggers (like a21glossary) parse the complete parsed page before it gets outputted to the browser. Is that an idea? --Daniel
- There are several places (e.g. tt_content.text) and several ways (postUserFunc, parseFunc.userFunc, contentPostProc-all) to initiate parsing. All of them have their advantages and disadvantages. I began to program contagged because I was unsatisfied with the accuracy of the matches found by a21glossary. I found out that it's really difficult to parse the body of an HTML page by RegEx (preg_replace) if you want to avoid unwanted matches. And if you hook into "contentPostProc-all", you cannot cache the output, so every time you deliver a large page you have to parse it. On the other hand, you have no problems parsing the output of any extension. For this reason I started some tests with the hook 'contentPostProc-all' in 'tslib/class.tslib_fe.php'. I plan to enable the administrator to select which parsing method they prefer. --Jochen
- Would be nice to extend some functionality by modules. For example something like "contagged__drwiki" and "contagged__statisticview" can be created when one or more hooks exist. So it's easy to handle special wiki-features or see a statistic-chart. --Daniel
- That would be great. It's planned to create a backend module to tag terms manually and to view some statistical data. Furthermore I will add several hooks, and the extension will be refactored to the MVC pattern. The extension is already highly configurable via TS. Please tell me if you have special needs. --Jochen
Configuration
Basic TS-Setup
This is the TypoScript Setup of tx_contagged:
# encoding: iso-8859-1

# include class tx_contagged as library
includeLibs.tx_contagged = EXT:contagged/class.tx_contagged.php

# invoke the parser
lib.stdheader.stdWrap.postUserFunc = tx_contagged->main
tt_content.text.20.postUserFunc = tx_contagged->main
tt_content.bullets.20.postUserFunc = tx_contagged->main

plugin.tx_contagged {
  templateFile = {$contagged.templateFile}
  linkToListPage = {$contagged.linkToListPage}
  listPages = {$contagged.listPages}
  storagePids = {$contagged.storagePids}
  includeRootPages = {$contagged.includeRootPages}
  excludeRootPages = {$contagged.excludeRootPages}
  includePages = {$contagged.includePages}
  excludePages = {$contagged.excludePages}
  excludeTags = {$contagged.excludeTags}
  autoExcludeTags = {$contagged.autoExcludeTags}
  checkPreAndPostMatches = {$contagged.checkPreAndPostMatches}
  addTitleAttribute = {$contagged.addTitleAttribute}
  addLangAttribute = {$contagged.addLangAttribute}
  addCssClassAttribute = {$contagged.addCssClassAttribute}
  replaceTerm = {$contagged.replaceTerm}
  maxRecurrences = {$contagged.maxRecurrences}
  updateKeywords = {$contagged.updateKeywords}
  labelWrap1 = {$contagged.labelWrap1}
  labelWrap2 = {$contagged.labelWrap2}
  modifier = {$contagged.modifier}
  fieldsToSearch = {$contagged.fieldsToSearch}
  sortField = {$contagged.sortField}
  fieldsToMap = {$contagged.fieldsToMap}
  secureFields = {$contagged.secureFields}
  showOnlyMatchedIndexChars = {$contagged.showOnlyMatchedIndexChars}
  autoAddIndexChars = {$contagged.autoAddIndexChars}
  addBackLinkDescription = {$contagged.addBackLinkDescription}

  types {
    definition {
      label = Definition
      label.de = Definition
      tag = dfn
    }
    acronym {
      label = Acronym
      label.de = Kurzwort aus Anfangsbuchstaben (Beispiel: NATO)
      tag = acronym
    }
    abbrevation {
      label = Abbrevation
      label.de = Abkürzung (Beispiel: u.s.w.)
      tag = abbr
    }
  }

  dataSources {
    default {
      sourceName = tx_contagged_terms
      hasSysLanguageUid = 1
      storagePids =
      fieldsToEdit = term_main,term_alt,term_type,term_lang,term_replace,desc_short,desc_long,link,exclude
      mapping {
        uid.field = uid
        pid.field = pid
        term_main.field = term_main
        term_alt.field = term_alt
        term_type.field = term_type
        term_replace.field = term_replace
        term_lang.field = term_lang
        desc_short.field = desc_short
        desc_long.field = desc_long
        link.field = link
        exclude.field = exclude
      }
    }
  }
}
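Most values in this setup are filled from constants, so they can be changed in the Constant editor or overridden directly in your own TS Setup. A minimal sketch of such an override follows; the values are illustrative, and it is an assumption that "modifier" is the RegEx modifier mentioned in the change log for v0.1.2 and that 1 switches "updateKeywords" on:

```typoscript
plugin.tx_contagged {
  # tag each term at most twice per content object (illustrative value)
  maxRecurrences = 2

  # assumption: this is the RegEx modifier from change log v0.1.2;
  # "Uuis" activates UTF-8 support, "Uis" is meant for non-UTF-8 content
  modifier = Uuis

  # assumption: 1 switches the auto-update of page keywords on
  updateKeywords = 1
}
```

Place such overrides in a template that is loaded after the static template “Content parser (contagged)”, so that they are not overwritten by the defaults.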
Examples for Defining New Types and Datasources
You can define new types of terms and get your data from any database table you like. Check out the examples in "static/examples/setup.txt". These examples are experimental. Don't use them on production servers.
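A new type can presumably follow the same pattern as the built-in types in the basic TS setup. The sketch below uses a hypothetical type key "person" with a placeholder tag and pid; it only uses properties that are mentioned on this page (label, tag, storagePids, dontListTerms), and it is an assumption that 1 activates "dontListTerms":

```typoscript
plugin.tx_contagged.types {
  # "person" is a hypothetical type key, not taken from the shipped examples
  person {
    label = Person
    label.de = Person
    # placeholder tag used to wrap the matched term
    tag = span
    # placeholder pid of the folder holding terms of this type
    storagePids = 789
    # assumption: 1 keeps terms of this type out of the list view
    dontListTerms = 1
  }
}
```

The new type should only show up in the BE selector box "type of term" if the pid of the page holding this TypoScript is configured in the Extension Manager, as described in the installation steps.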
Example configuration for RealURL
This is just an excerpt of a configuration with the postVarSet for contagged.
$GLOBALS['TYPO3_CONF_VARS']['EXTCONF']['realurl'] = array (
    'postVarSets' => array (
        '_DEFAULT' => array (
            'char' => array (
                array (
                    'GETvar' => 'tx_contagged_pi1[index]',
                ),
            ),
            'id' => array (
                array (
                    'GETvar' => 'tx_contagged_pi1[key]',
                ),
            ),
            'searchword' => array (
                array (
                    'GETvar' => 'tx_contagged_pi1[sword]',
                ),
            ),
            'backToUid' => array (
                array (
                    'GETvar' => 'tx_contagged_pi1[backPid]',
                ),
            ),
        ),
    ),
);
Change Log
v0.1.5 2008-09-24 Jochen Rau <j.rau@web.de>
* IMP The parser is now invoked also for bullet lists and headers (if <h[1-6]> is not an excluded tag)
* IMP If maxRecurrences is set, the matches are now spread evenly over the cObj
* IMP It's not necessary anymore to set maxRecurrences to invoke the parser
v0.1.4 2008-09-23 Jochen Rau <j.rau@web.de>
* FIX Fixed typo in TS Setup
v0.1.3 2008-09-22 Jochen Rau <j.rau@web.de>
* ADD The maximum number of recurrences of a term (for a cObj) can be set by maxRecurrences in the TS Setup
v0.1.2 2008-09-09 Jochen Rau <j.rau@web.de>
* FIX In some cases no term was found; the use of the modifier "u" didn't work if the content is not UTF-8; now support for UTF-8 must be activated manually by changing the RegEx-modifier "Uis" to "Uuis"; fixes bug #1483
v0.1.1 2008-09-01 Jochen Rau <j.rau@web.de>
* ADD Added a RealUrl example configuration to "doc" folder
* IMP Non filled markers (###TITLE###) are now removed
v0.1.0 2008-08-31 Jochen Rau <j.rau@web.de>
* CHG Set status to beta
v0.0.18 and v0.0.19 2008-08-31 Jochen Rau <j.rau@web.de>
* ADD fieldsToEdit
* ADD New data source "references" added to experimental types
* CHG Cleaned up template file
* ADD Phrases can now be added by selecting them (experimental)
* ADD Terms can now be edited in the FE-list and in the content elements (you have to enable Admin Panel Editing)
* CHG Activated keywords option
* CHG Changed the way to generate links
v0.0.17 2008-02-14 Jochen Rau <j.rau@web.de>
* CHG Major revision of the parser; refactored code
* ADD Support for tt_news (be sure to add the static template of contagged after the static template of tt_news)
* FIX Link "More Details" showed list instead of extended single view if $termKey was "0"
* CHG Set autoExcludeTags = 1 to avoid nested parsing
* ADD Documentation (http://wiki.typo3.org/Contagged)
* ADD Added index for numbers "0-9"
* FIX Keywords stored in "contagged_keywords" (table "pages") are now taken from the whole page and not only from the last cObj that has been parsed
* FIX Uppercase handling of replaced term
* FIX Fixed infinite loop caused by incorrect mapping (only PHP >5.2.2)
* CHG Secured fields are now term_main,term_alt,desc_short in the standard configuration
* RMV special exclude tag "exparse" was removed
* CHG listPage changed to listPages; more than one list page per type can be defined (comma separated); the first list page will be linked (if you want)
* FIX List pages show only those types of terms that are pointing to them (listPages)
* CHG If there are alternative terms: the longest takes precedence while parsing
* CHG Fields are not htmlspecialchared by default anymore
* CHG Experimental type definitions are now stored in a separate static template (to be included as usual)
v0.0.16 2007-12-05 Jochen Rau <j.rau@web.de>
* FIX Small bugfix to avoid inaccurate parsing inside a tag
v0.0.15 2007-10-19 Jochen Rau <j.rau@web.de>
* FIX Localization of the labels (BE) now depends on the BE-user settings
* RMV Removed obsolete parameter "backendLanguage" from TS Setup
* CHG Encoding of the file EXT:static/setup.txt is now iso-8859-1
* CHG The term is linked even if there is no long description (desc_long)
* FIX Part of a joined word is no longer disappearing, if the term is replaced
* ADD Support for tx_categories; you have to define the proper storage pid of the hidden sysFolder (tx_categories is an experimental extension maintained by Mads Brunn; not in TER; see TYP3_ect on news.netfielder.de)
* FIX Keys of $termsArray are no longer overwritten, if more than one data source is configured
v0.0.14 2007-10-09 Jochen Rau <j.rau@web.de>
* IMP Better support for multibyte character sets (using t3lib_cs instead of native strlen() and substr())
* FIX Link "back to page ..." in FE-Plugin
* CHG Sorting of terms (not improved yet)
v0.0.13 2007-10-06 Jochen Rau <j.rau@web.de>
* IMP Better support for joined words (with a dash)
* IMP Quoting of the term in the RegEx
* FIX Selection of a custom template file is now working
* FIX Handling of ambiguities (like the two meanings of the word "bus")
* CHG Added <dt> as a default excludeTag
* CHG Names of some template markers (esp. Links)
* IMP Enhanced performance of the FE-plugin (refactored code)
v0.0.12 2007-09-28 Jochen Rau <j.rau@web.de>
* IMP Check if the table configured as a data source exists in the database (avoids an error message)
v0.0.11 2007-09-26 Jochen Rau <j.rau@web.de>
* CHG You have to define one or more storagePids! This can be done globally (plugin.tx_contagged.storagePids), for each type (plugin.tx_contagged.types.foo.storagePids) or for each data source (plugin.tx_contagged.dataSources.bar.storagePids).
* IMP "fieldsToMap" and "secureFields" now made available via constants editor
* CHG Moved "fieldsToMap" and "secureFields" to the root of the TS Setup (plugin.tx_contagged.)
* ADD Every type of term can be excluded from being listed (new parameter "dontListTerms")
* ADD Every type of term can be hidden in the BE (new parameter "hideSelection")
* IMP Next step towards MVC-Pattern (split tx_contagged_model into tx_contagged_model_terms and tx_contagged_model_mapper)
v0.0.10 2007-09-22 Jochen Rau <j.rau@web.de>
* ADD type "Regular Expression" (every term is treated as RegEx and matches can be replaced)
v0.0.9 2007-09-21 Jochen Rau <j.rau@web.de>
* IMP Any database table can now be configured as a data source for every single type of term (very powerful!)
* IMP Example configuration for tt_address
* ADD Keywords are now registered {register:contagged_keywords} to be inserted as "<meta>-keywords" of the page header (plugin "metatags" required)
* FIX Exclude individual cObjects (BE-field in tt_content)
* CHG Restructured code (half way to MVC-Pattern)
v0.0.8 2007-09-18 Jochen Rau <j.rau@web.de>
* ADD Added experimental support for foreign tables like tt_address (configurable through TS Setup: table name, field mapping); comment out line 378 in class.tx_contagged.php to activate
v0.0.7 2007-09-17 Jochen Rau <j.rau@web.de>
* ADD More than one char can be used as an index "char" (eg. names, ZIP-codes, cities)
* FIX Closing bracket in TS Setup
* FIX UTF-8-characters are now linked properly (auto generated index in FE-Plugin)
* IMP Cleaned up main RegEx
* IMP ALL database fields are now registered in $GLOBALS['TSFE']->register['contagged_XXX'] to be used in TS Setup (for future hooks)
v0.0.6 2007-09-13 Jochen Rau <j.rau@web.de>
* FIX Term is now displayed as <dt>TERM</dt> again (FE-Plugin)
* ADD stdWrap for the term to be searched ("termStdWrap"; useful to search for already tagged text like <person>Steve Jobs</person>)
* FIX Title-attribute will not be displayed, if the short description (desc_short) is empty
* IMP UTF-8 handling of function to prevent attributes from being parsed (e.g. <def title="don't parse this text">)
* CHG Definition of types "dfn_block" and "dfn_inline" (work in progress!)
v0.0.5 2007-09-06 Jochen Rau <j.rau@web.de>
* CHG The types "dfn_block" and "dfn_inline" for pure css tool-tips are valid but still not running in IE6
(new parameter "stripBlockTags" for replacing <p>...</p> with <br/> in long description;
thanks to Markus Timtner)
* CHG Changed stdWrap in TS configuration to preStdWrap and added postStdWrap to make an outerWrap possible
* ADD Maximum number of occurrences to be tagged can be configured for each type of term (e.g. "plugin.tx_contagged.types.dfn_block.maxOccur = 1")
* ADD Support for joined words (with a dash); new parameter "checkPreAndPostMatches"
* FIX bug in SQL-Statement (thanks to Tristan Knapp)
v0.0.4 2007-08-29 Jochen Rau <j.rau@web.de>
* IMP Better support for multibyte characters (UTF8).
v0.0.3 2007-08-28 Jochen Rau <j.rau@web.de>
* ADD New template based FE-list-plugin with index configurable through locallang.xml.
* FIX Fixed call of "userFunc".
* RMV The types "dfn_block" and "dfn_inline" are commented out but still there as an example (it seems that a pure CSS tool-tip is not bulletproof)
* ADD Exclude individual cObjects (BE-field in tt_content)
v0.0.2 2007-05-20 Jochen Rau <j.rau@web.de>
* ADD Tags can be selected to be excluded from parsing.
* ADD New special tag <exparse></exparse> to exclude content from parsing.
v0.0.1 2007-05-16 Jochen Rau <j.rau@web.de>
* CHG Changed from tt_content.text.20.parseFunc.userFunc to .postUserFunc
* CHG Changed the separator of alternative terms from '|' (pipe) to chr(10) (LF); the backend field is now multiline
* ADD Added a prefix 'contagged_' to the registered values in $GLOBALS['TSFE'] to prevent name conflicts
* FIX Some bugfixes in the type configuration and the css-class of dfn_inline
* ADD Added option 'updateKeywords': It is now possible to auto-update page keywords based on the terms found on a page
v0.0.0 2007-05-14 Jochen Rau <j.rau@web.de>
* Initial release
