Dynamic XML Data Source - Weird Caching Issue
This is an open discussion with 10 replies, filed under Troubleshooting.
Update:
I managed to track this down to Gateway::exec(). Echoing $result after the curl_exec call, I get alternating results of the old XML and the new XML. That would lead me to think it's the remote server; however, curl from the command line returns the right XML every time.
Anyone know if Gateway or curl are doing any type of caching?
That's pretty crazy, but hopefully we can get to the bottom of it for you.
What's the external URL? It'll be helpful so I can try to reproduce locally and track down where the issue is occurring.
What are your specs? Symphony/Apache/PHP/OS etc. Is curl available?
Anyone know if Gateway or curl are doing any type of caching?
The Gateway class uses either CURL or sockets to connect to locations. To my knowledge, there is no caching at the Gateway level; it is only applied by the Dynamic XML DS.
I've had some strange behaviour with dynamic data-sources when the directories in manifest aren't set up properly (can't remember if it's /cache or /tmp).
I've had some strange behaviour with dynamic data-sources when the directories in manifest aren't set up properly
Orly? That's weird, I think it only ever uses the cache table in the database.
Edit: the Mutex class probably writes a lock file to /manifest/tmp, that might be it.
nickdunn, good point. It looks a bit like the lock file is not being released properly, so every other time it is force-cleared. That would be the case if every "Old XML" is equal to the last "New XML" before it; otherwise (if "Old XML" stays the same all the time) it must be something else.
Unfortunately I can't give the feed URL since it's behind a firewall.
System Specs:
- Symphony 2.2
- Apache 2.2.3
- PHP 5.3.6
- MySQL 5.0.77
- OS - RedHat 5.6
- CURL 7.15.5
I've done a bit of digging, and the Gateway is definitely using cURL to make the connection, and setting CURLOPT_FORBID_REUSE and CURLOPT_FRESH_CONNECT is making no difference.
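For anyone wanting to reproduce this outside the Gateway class, here is a minimal sketch of setting CURLOPT_FORBID_REUSE and CURLOPT_FRESH_CONNECT on a standalone handle. The function names fresh_curl_options and fetch_fresh are hypothetical, and the Cache-Control/Pragma request headers are an extra assumption that would only matter if a caching proxy sits between the server and the feed. It is not what Gateway itself does.

```php
<?php
// Hypothetical helper, not part of Symphony: cURL options that ask for a
// brand-new connection on every request and close it afterwards.
function fresh_curl_options($url)
{
    return array(
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FRESH_CONNECT  => true, // do not reuse a cached connection
        CURLOPT_FORBID_REUSE   => true, // close the connection when done
        CURLOPT_TIMEOUT        => 10,
        // Assumption: these headers only help if a proxy is in the path.
        CURLOPT_HTTPHEADER     => array('Cache-Control: no-cache', 'Pragma: no-cache'),
    );
}

// Fetches $url once; returns the body as a string, or false on failure.
function fetch_fresh($url)
{
    $ch = curl_init();
    curl_setopt_array($ch, fresh_curl_options($url));
    $body = curl_exec($ch);
    curl_close($ch);
    return $body;
}
```

Calling fetch_fresh() with the feed URL a few times in a row and comparing the results would show whether these options change anything on a given setup.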
The manifest/tmp directory (permissions 777) is always empty, so I don't think a lock file is sticking around.
As for the XML, the new XML is always the same and the old XML is always the same, and they just keep flipping back and forth. Unfortunately, after a lot of testing last night, it seems that it's flipping back and forth randomly. As in, it'll be the new one, then the old, then the new, then the old for the next few updates, then the new... and so on. What's really weird is that the longer this goes on, the less frequently I'm getting the new XML.
We're actually starting to think it's possible that it's a load balancer issue on the feed provider's end. Unfortunately it's only happening through Symphony, so we can't be sure: when I hit the same URL from the command line, I get the right XML every time.
The problem is that I dug into the Gateway, and was dumping the result of curl_exec directly, which was alternating between the 2 XML files. So, this makes me roughly 90% sure it's got to be on the provider's end. I just wish I could get a browser call to the feed, or command line call to the feed to produce the wrong XML to be sure.
The problem is that I dug into the Gateway, and was dumping the result of curl_exec directly, which was alternating between the 2 XML files. So, this makes me roughly 90% sure it's got to be on the provider's end.
Yeah, so it probably has nothing to do with mutex class, but just to be sure...
The manifest/tmp directory (permissions 777) is always empty, so I don't think a lock file is sticking around.
The Mutex class uses the system temp directory (line 150 of symphony/lib/toolkit/class.mutex.php):
if(is_null($path)) $path = sys_get_temp_dir();
You can try to change it to TMP (directory configured by Symphony, which is manifest/tmp by default) like this:
if(is_null($path)) $path = TMP;
and see if it helps anything.
So, I took another route and built my own standalone PHP page which implements a basic curl request and I'm getting the same issue. So, it doesn't look like it has anything to do with Symphony. Now 95% sure it's the provider.
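A standalone test along these lines could be sketched as below. This is not the poster's actual script, just a minimal illustration: extract_pub_date and poll_pub_dates are hypothetical names, and the code assumes the feed has the IRXML/NewsReleases shape quoted in the original post.

```php
<?php
// Hypothetical helper: returns the PubDate attribute of the <NewsReleases>
// element under the root, or null if the XML cannot be parsed.
function extract_pub_date($xml)
{
    $previous = libxml_use_internal_errors(true); // keep parse warnings quiet
    $doc = simplexml_load_string($xml);
    libxml_use_internal_errors($previous);
    if ($doc === false || !isset($doc->NewsReleases['PubDate'])) {
        return null;
    }
    return (string) $doc->NewsReleases['PubDate'];
}

// Fetches $url $times times with a basic cURL request and returns the
// PubDate seen on each attempt (null when a request or parse fails).
function poll_pub_dates($url, $times, $delaySeconds)
{
    $seen = array();
    for ($i = 0; $i < $times; $i++) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        $body = curl_exec($ch);
        curl_close($ch);
        $seen[] = ($body === false) ? null : extract_pub_date($body);
        sleep($delaySeconds);
    }
    return $seen;
}
```

Something like print_r(poll_pub_dates($feedUrl, 10, 60)), where $feedUrl is the real feed address, would list which PubDate came back on each of ten requests a minute apart.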
Thanks for the suggestions/assistance though.
So, as a further update... The issue was not on the provider's end, it was on my end.
I changed the call to Gateway::exec() to force the use of sockets and everything appears to be working fine. Somehow, even though I've tried every combination I can think of to restrict CURL from caching, PHP CURL is caching the request and serving old results.
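To compare cURL against a plain socket request outside Symphony, something like the following sketch could be used. It is not the Gateway class's own socket code; build_get_request and fetch_via_socket are hypothetical names, and the sketch assumes plain HTTP with no redirects or HTTPS.

```php
<?php
// Hypothetical helper: builds a minimal HTTP/1.0 GET request for $url.
function build_get_request($url)
{
    $parts = parse_url($url);
    $path  = isset($parts['path']) ? $parts['path'] : '/';
    if (isset($parts['query'])) {
        $path .= '?' . $parts['query'];
    }
    // Include the port in the Host header only when one was given.
    $host = $parts['host'] . (isset($parts['port']) ? ':' . $parts['port'] : '');
    return "GET {$path} HTTP/1.0\r\n"
         . "Host: {$host}\r\n"
         . "Connection: close\r\n\r\n";
}

// Sends the request over a raw socket and returns the response body,
// or false if the connection cannot be opened. Plain HTTP only.
function fetch_via_socket($url, $timeout = 10)
{
    $parts = parse_url($url);
    $port  = isset($parts['port']) ? $parts['port'] : 80;
    $fp = @fsockopen($parts['host'], $port, $errno, $errstr, $timeout);
    if ($fp === false) {
        return false;
    }
    fwrite($fp, build_get_request($url));
    $response = stream_get_contents($fp);
    fclose($fp);
    // Headers and body are separated by the first blank line.
    $split = strpos($response, "\r\n\r\n");
    return ($split === false) ? $response : substr($response, $split + 4);
}
```

Using HTTP/1.0 with Connection: close keeps the sketch short, since the server closes the connection at the end of the body and chunked decoding is not needed.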
It almost seems like CURL is making a request, getting new data, serving that data, but then is unable to overwrite its own internal cache, so when it serves up a cached version, it is serving an old result.
Unless I am badly mistaken, CURL has no cache. Why should it have one?
Are you sure that you are not tricked by an application that is scaled across different servers? Subsequent requests might give you different responses if those servers are not synchronized fast enough.
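One rough way to check the multiple-servers theory is to record which address answered each request alongside the PubDate it returned. The sketch below uses hypothetical names (observe, pub_dates_by_server). It relies on CURLINFO_PRIMARY_IP, which needs libcurl 7.19.0 or later, so it may not be available with the CURL 7.15.5 listed earlier in the thread. It also only helps if the servers have distinct addresses; behind a single load balancer address every request would report the same IP.

```php
<?php
// Hypothetical helper: fetches $url once and records the responding IP
// together with the PubDate found in the body (null if none is found).
function observe($url)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $body = curl_exec($ch);
    $ip   = curl_getinfo($ch, CURLINFO_PRIMARY_IP);
    curl_close($ch);

    $pubDate = null;
    if ($body !== false && preg_match('/PubDate="(\d{8})"/', $body, $m)) {
        $pubDate = $m[1];
    }
    return array('ip' => $ip, 'pub_date' => $pubDate);
}

// Groups observations by server IP and lists the distinct PubDate
// values each server returned, in the order they were first seen.
function pub_dates_by_server(array $observations)
{
    $grouped = array();
    foreach ($observations as $obs) {
        $ip = $obs['ip'];
        if (!isset($grouped[$ip])) {
            $grouped[$ip] = array();
        }
        if (!in_array($obs['pub_date'], $grouped[$ip], true)) {
            $grouped[$ip][] = $obs['pub_date'];
        }
    }
    return $grouped;
}
```

If one address consistently returns the old PubDate and another the new one, that points at unsynchronized servers rather than at caching on the client.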
Another possibility would be browser caching. If you debug anything in the browser (especially if you use Safari), be sure to reload/clear the cache/reload/clear the cache etc. often.
Hoping someone can help, as I'm experiencing a weird caching issue.
My site pulls in an XML feed as a Dynamic Data Source from an external URL. This works fine, however, recently the dynamic data source XML has been reverting to an earlier version. An example will help here since the XML feed has timestamps in it...
When I hit the feed URL directly, I get the following XML snippet back...
<?xml version="1.0"?>
<IRXML CorpMasterID="XXXXX">
  <NewsReleases PubDate="20110914" PubTime="06:09:06">
    /snip/
  </NewsReleases>
</IRXML>
Now, when Symphony serves up the page, looking at the XML for that data source, I see the following, which is as it should be with a PubDate of 20110914:
<reuters-news status="fresh" creation="2011-09-14T17:55:42+00:00">
  <IRXML CorpMasterID="XXXXX">
    <NewsReleases PubDate="20110914" PubTime="06:09:06">
      /snip/
    </NewsReleases>
  </IRXML>
</reuters-news>
That's all well and good, but the next few times I hit the page, after the cache has timed out (in theory when it should be pulling a new version), it's reverting back to a previous version, as can be seen in the PubDate attribute. Also note the Symphony creation timestamps.
<reuters-news status="fresh" creation="2011-09-14T18:03:01+00:00">
  <IRXML CorpMasterID="XXXXX">
    <NewsReleases PubDate="20110912" PubTime="18:09:58">
      /snip/
    </NewsReleases>
  </IRXML>
</reuters-news>
I'm not sure how that's happening. In both cases, hitting the feed URL directly gives me the correct XML (even hitting via cURL on the production server). I've tried clearing out the sym_cache table, as well as /manifest/cache and /manifest/tmp, but I still get the old XML. Each time Symphony says the XML is fresh.
The data sources are being cached for 5 minutes. When I set the cache time to 1 minute, here's what happened over the next 10 minutes:
I'm at a loss here as to what's happening.