07-19-2023, 11:51 PM
I am currently designing a CMS for use on my website, and I am wondering whether there are any free libraries available for creating tags based on the content.
**Example**
> I like trees. Trees are plants that have leaves. Leaves on tree can be
> multi-colored.
This would produce the tags *trees* and *leaves*.
The library should be PHP or JS.
**EDIT 1:**
I have found a simple library for half my task -
I have revised what the library should do (thanks to guidance from @NullUserException):
- Count all words (ignoring case and inflections), throw out stop words, and pick the ones with the highest frequency.
- Give words that are more specific to the genre a higher value, even though they may have a lower frequency. In the example above, *'multi-colored'* should be valued higher because it is more specific to the subject. However, it should carry a prefix indicating which subject it relates to (it would become leaves-multi-colored).
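To make the first bullet concrete, here is a rough sketch of the counting step in JS. It is not taken from any particular library: `STOP_WORDS`, `naiveStem` and `extractTags` are names made up for illustration, the stop-word list is deliberately tiny, and stripping a trailing "s" is only a crude stand-in for real inflection handling.

```javascript
// Tiny illustrative stop-word list (an assumption; real lists are much longer).
const STOP_WORDS = new Set([
  "a", "an", "and", "are", "be", "can", "have", "i", "in", "is",
  "of", "on", "that", "the", "to",
]);

// Crude inflection folding: strip a trailing "s" from words longer than 3 chars.
// This is a placeholder for a real stemmer, not a linguistic algorithm.
function naiveStem(word) {
  return word.length > 3 && word.endsWith("s") ? word.slice(0, -1) : word;
}

// Returns up to maxTags tags, most frequent first.
function extractTags(text, maxTags = 2) {
  // Lowercase, then pull out words; hyphenated words stay in one piece.
  const words = text.toLowerCase().match(/[a-z0-9]+(?:-[a-z0-9]+)*/g) || [];

  // stem -> { count, forms: Map(surface form -> count) }
  const groups = new Map();
  for (const word of words) {
    if (STOP_WORDS.has(word)) continue;
    const stem = naiveStem(word);
    const group = groups.get(stem) || { count: 0, forms: new Map() };
    group.count += 1;
    group.forms.set(word, (group.forms.get(word) || 0) + 1);
    groups.set(stem, group);
  }

  return [...groups.values()]
    .sort((a, b) => b.count - a.count) // highest frequency first
    .slice(0, maxTags)
    // Report each group by its most common surface form.
    .map((g) => [...g.forms.entries()].sort((a, b) => b[1] - a[1])[0][0]);
}

const sample =
  "I like trees. Trees are plants that have leaves. " +
  "Leaves on tree can be multi-colored.";
console.log(extractTags(sample)); // [ 'trees', 'leaves' ]
```

Grouping by stem but reporting the most common surface form is what lets "tree" and "trees" count together while the tag still comes out as *trees*.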
**EDIT 2:**
The algorithm should remove words that have fewer than 3 characters **unless** they are in capitals or otherwise formatted.
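A possible way to express that rule as a filter is sketched below, again in JS. `keepToken` and `filterShortWords` are hypothetical names, and the sketch only covers the "in capitals" exception; detecting other formatting (bold, italics and so on) would depend on how the CMS stores its markup, so it is left out.

```javascript
// Keep a token if it has 3+ characters, or if it is short but written
// entirely in capitals (e.g. an acronym such as "JS").
function keepToken(token) {
  if (token.length >= 3) return true;
  const hasLetter = /[A-Za-z]/.test(token);
  return hasLetter && token === token.toUpperCase();
}

// Applies the rule to a list of tokens that still have their original casing.
function filterShortWords(tokens) {
  return tokens.filter(keepToken);
}

console.log(filterShortWords(["JS", "is", "ok", "but", "PHP", "or", "AI", "wins"]));
// [ 'JS', 'but', 'PHP', 'AI', 'wins' ]
```

Note that this check has to run before any lowercasing, otherwise the capitals it relies on are already gone.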