Although such training data can be useful for constructing accurate HMMs, collecting it requires a great deal of human effort.
To produce one million words of annotated text, roughly the amount of training data required for accurate estimates, annotators would have to label the equivalent of more than 1,500 news stories.