Should I split documents into individual sentences, or use them as-is, to train a Brain.js text classification model? I'm wondering what the best way is to feed training data to the model.
Can I just use each document as-is, like this:

{"phrase": "First long document with up to 30 sentences", "result": {"label 1": 1}},
{"phrase": "First long document with up to 30 sentences", "result": {"label 2": 1}},
{"phrase": "Second long document with up to 30 sentences", "result": {"label 2": 1}},
etc.

Or should I split all documents into sentences, so the data looks something like this:

{"phrase": "Sentence 1 out of document 1", "result": {"label 1": 1}},
{"phrase": "Sentence 2 out of document 1", "result": {"label 2": 1}},
{"phrase": "Sentence 1 out of document 2", "result": {"label 5": 1}},
{"phrase": "Sentence X out of document X", "result": {"No labels at all": 1}},
etc.

The same question applies when using the model: should I apply it to the complete document, or split the document into separate sentences and apply the model to each one?
What's the best practice?