But my problem is that, using data it had provided itself (not me), it kept saying that a composition from 1897 was the last one a specific composer wrote, while right before and after that it listed a composition from 1898 by the same composer.
It's possible that the composition from 1897 is specifically described as "the last one" in some texts in the training dataset. LLMs are no better than the data they've been trained on. Garbage in, garbage out.
This is why the EU requirement to disclose AI training datasets is a very good thing: it makes it possible to check where exactly a model's inaccuracies are coming from.
I encountered a similar thing with Stable Diffusion. When I tried to generate a minigun, it produced something that resembled a cannon instead. Then I looked up "minigun" in the dataset SD was trained on and found that many images tagged "minigun" didn't actually contain miniguns, but cannons and towed artillery.
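For anyone curious, that kind of lookup is easy to reproduce yourself if you grab one of the public LAION metadata files. A minimal sketch, assuming a downloaded metadata parquet shard (the filename and the column names "TEXT"/"URL" are assumptions; adjust them to whatever your copy actually uses):

    import pandas as pd

    # Load one metadata shard (hypothetical filename).
    df = pd.read_parquet("laion-metadata-shard-0000.parquet")

    # Find captions mentioning "minigun" (case-insensitive), then open a few
    # of the URLs by hand to see what the images actually show.
    hits = df[df["TEXT"].str.contains("minigun", case=False, na=False)]
    print(hits[["TEXT", "URL"]].head(20))

In my case, skimming the matching URLs was enough to see that a lot of the "minigun" captions were attached to photos of cannons and towed artillery.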
It's surprising how good the results are despite the fact that the quality of captions in the training dataset is all over the place.
BTW, I also tried DALL-E and Midjourney, and they couldn't generate miniguns either. Seems like they all use similar datasets under the hood.