Hi all, looking for a little advice…

I’m planning a heavily data-driven site that mashes together and analyzes data feeds from a number of sources. I’m going to be importing several dozen feeds locally on a scheduled basis, probably hourly. The results will need to be parsed to exclude unnecessary content, and then visualized by some renderer on the front end (either HTML5 Canvas, Processing.js, or a jQuery charting library), similar to what I did for http://jonasdowney.com/ephemera. (But improved.)

I had been planning to use something like Django to do this, but after such a good experience with Symphony, I’ve started to think it might be a better fit. My only concern is that the data analysis piece might be somewhat limited by what you can get out of the combination of Symphony data source output and XSLT. (I.e., with Django you can more or less directly hit the database however you want.)

Would you guys tackle this with Symphony or switch to a full-blown development framework?

Thanks, Jonas

I’m sure others like Nick could comment with more experience, but I would say that once you’re comfortable with extension authoring, you can do any PHP processing you need. Even if your sources are external to Symphony and your processing is in your own PHP code, Symphony will likely make things easy to assemble for web presentation.

However, if you won’t be leveraging XSLT in any significant way, maybe Symphony would be more of a distraction than it’s worth?

Of course you’re also talking Python vs PHP, which I’m not going to touch ;)

Maybe you can elaborate a bit on exactly what the analysis will consist of? It may just be a matter of customizing a few data sources…

I’m not exactly sure what the analysis is going to be yet, but most likely it will involve a lot of aggregating feeds together, gathering statistics (frequency of words, etc.) across several feeds, and querying for specific phrases. It’s the latter two that I’m not sure how to do with Symphony, other than spitting out the fields to XSLT and parsing it out that way.
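For what it’s worth, the two harder pieces mentioned here (word frequency across several feeds, and querying for a phrase) are small problems once the feed text is in hand. Below is a minimal sketch in plain Python, independent of Symphony or Django; the function names and the sample items are made up for illustration, and the tokenizer is deliberately naive.

```python
import re
from collections import Counter


def word_frequencies(texts):
    """Count how often each lowercased word appears across all texts."""
    counts = Counter()
    for text in texts:
        # Naive tokenizer: runs of ASCII letters and apostrophes only.
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts


def entries_with_phrase(texts, phrase):
    """Return the texts that contain the phrase, ignoring case."""
    needle = phrase.lower()
    return [text for text in texts if needle in text.lower()]


# Hypothetical sample items standing in for parsed feed entries.
feed_items = [
    "Symphony makes XSLT fun",
    "XSLT and XML feeds",
    "Feeds, feeds everywhere",
]

freq = word_frequencies(feed_items)
matches = entries_with_phrase(feed_items, "xml feeds")
```

`freq.most_common(10)` would then give the top ten words across all items. Doing this once at import time and storing the result is likely cheaper than recomputing it on every page load.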

@Andrew, didn’t mean to get into the Python/PHP debate either; I’m a PHP guy usually!

I’m using Symphony for something similar at the moment. Data comes from sections, dynamic XML and some custom datasources. I use custom datasources both for internal section queries and for the aggregation of external feeds that require authentication. I find Symphony useful for this because I can create slightly abstracted datasources and do some individualisation in XSLT. And it’s easy to output your data in different formats like XML or JSON.

importing several dozen feeds locally on a scheduled basis, probably hourly

The XML Importer extension was born out of this requirement cropping up time and again in client work. The extension lets you create “Importers”, each of which has its own execute URL so it can be added to your crontab easily. An Importer accepts an XML source (a URL) and you specify how elements of the XML are mapped onto fields within a section in Symphony. You can choose one field to be the unique “ID” field, so as to prevent duplicates or allow the editing of existing entries from the feed where IDs match.
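To make the mapping and unique-ID idea concrete, here is a rough sketch of the same pattern in Python. It is not the XML Importer’s code or API: `import_feed`, the field map, and the in-memory `store` dict are all hypothetical stand-ins for the Importer, its mapping settings, and a Symphony section.

```python
import xml.etree.ElementTree as ET


def import_feed(xml_text, item_path, field_map, unique_field, store):
    """Map each item's child elements onto fields and upsert by the unique field."""
    root = ET.fromstring(xml_text)
    for item in root.findall(item_path):
        entry = {
            field: item.findtext(tag, default="")
            for field, tag in field_map.items()
        }
        # An entry with an ID already in the store is overwritten,
        # so re-importing never creates duplicates.
        store[entry[unique_field]] = entry
    return store


# Hypothetical feed: the third item repeats the first item's guid.
SAMPLE = """<rss><channel>
<item><guid>a1</guid><title>First</title></item>
<item><guid>a2</guid><title>Second</title></item>
<item><guid>a1</guid><title>First (edited)</title></item>
</channel></rss>"""

entries = import_feed(
    SAMPLE,
    "./channel/item",
    {"id": "guid", "title": "title"},  # section field -> XML element
    "id",
    {},
)
```

After the run, `entries` holds two entries rather than three, and the one keyed `a1` carries the edited title.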

The XML Importer also allows the value of each mapped field to be run through a PHP function of your choosing (you can add your own functions) to allow an element of pre-processing before the value is saved to Symphony. This was originally developed for simple things like:

  • date processing
  • reverting incoming HTML to Markdown syntax
  • cross-referencing a value in another section (a custom lookup)

Since this lets you pre-process data, you could probably do a level of your crunching and analysis on the data as it comes in, and cache the results in the entry. That’d be faster than doing it at runtime.
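The pre-process-then-cache idea looks roughly like this when sketched in Python. In XML Importer itself the callbacks would be PHP functions; everything named here (`preprocess`, `PROCESSORS`, the `word-count` field) is an assumption for illustration only.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime


def to_iso_date(value):
    """Convert an RFC 822 date (the RSS pubDate format) to ISO 8601 in UTC."""
    return parsedate_to_datetime(value).astimezone(timezone.utc).isoformat()


def preprocess(entry, processors):
    """Return a copy of entry with each field run through its processor, if any."""
    return {
        field: processors.get(field, lambda v: v)(value)
        for field, value in entry.items()
    }


# Hypothetical per-field processors, one function per mapped field.
PROCESSORS = {"date": to_iso_date, "title": str.strip}

raw = {
    "date": "Tue, 02 Mar 2010 14:30:00 +0100",
    "title": "  Hello world  ",
    "body": "one two three",
}

clean = preprocess(raw, PROCESSORS)
# Analysis done once at import time and stored alongside the entry.
clean["word-count"] = len(clean["body"].split())
```

Fields without a processor pass through unchanged, so adding a new clean-up step only means adding one key to the table.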

If you put your text into Input or Textarea fields then you have the ability to use the MySQL REGEXP function in a Data Source for keyword matching. Alternatively you can try my Search Index extension which provides fulltext boolean search on a section via both normal Data Source filters (for searching one section) and a custom Data Source (for searching multiple sections at once).
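As a rough illustration of regex-based keyword matching at the database level, the sketch below uses SQLite as a stand-in for MySQL, since it runs without a server. SQLite has no built-in REGEXP, so the sketch registers one backed by Python’s `re` module; the table, column names and pattern are hypothetical, and MySQL’s own regex syntax differs in places (word boundaries, for example).

```python
import re
import sqlite3


def regexp(pattern, value):
    """SQLite calls this for `value REGEXP pattern`; return 1 on a match."""
    return 1 if value is not None and re.search(pattern, value, re.IGNORECASE) else 0


conn = sqlite3.connect(":memory:")
conn.create_function("REGEXP", 2, regexp)

# Hypothetical table standing in for a text field's stored values.
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO items (body) VALUES (?)",
    [("Symphony and XSLT",), ("Django templates",), ("xslt tips",)],
)

rows = conn.execute(
    "SELECT id FROM items WHERE body REGEXP ? ORDER BY id",
    (r"\bxslt\b",),
).fetchall()
```

Here `rows` contains the first and third items, matched case-insensitively on the whole word.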

XML Importer puts content into sections so you have the power of querying/filtering with a Data Source. If you need more flexible queries then it’s not too difficult to write basic SQL to join the relevant tables together and grab the data yourself. Obviously it depends on the level of complexity.
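A hand-written join of that kind might look something like the sketch below. It uses an in-memory SQLite database so it runs anywhere, and the table and column names (`entries`, `entries_data_title`, `entries_data_source`) are illustrative assumptions only, not a statement of Symphony’s actual schema; check the tables in your own install before adapting it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical layout: one entries table plus one data table per field.
conn.executescript("""
CREATE TABLE entries (id INTEGER PRIMARY KEY, section_id INTEGER);
CREATE TABLE entries_data_title (entry_id INTEGER, value TEXT);
CREATE TABLE entries_data_source (entry_id INTEGER, value TEXT);
INSERT INTO entries VALUES (1, 5), (2, 5), (3, 5);
INSERT INTO entries_data_title VALUES (1, 'A'), (2, 'B'), (3, 'C');
INSERT INTO entries_data_source VALUES (1, 'blog'), (2, 'blog'), (3, 'twitter');
""")

# List each entry's title next to the feed it came from.
listing = conn.execute("""
SELECT t.value, s.value
FROM entries e
JOIN entries_data_title t ON t.entry_id = e.id
JOIN entries_data_source s ON s.entry_id = e.id
WHERE e.section_id = ?
ORDER BY e.id
""", (5,)).fetchall()

# Aggregate: how many entries each source contributed.
per_source = conn.execute("""
SELECT s.value, COUNT(*) AS n
FROM entries e
JOIN entries_data_source s ON s.entry_id = e.id
WHERE e.section_id = ?
GROUP BY s.value
ORDER BY n DESC, s.value
""", (5,)).fetchall()
```

The second query is the sort of statistic that is awkward to compute in XSLT but is a single statement in SQL.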

Nick, many thanks for the detailed description! I’m already using (and loving) XML Importer with cron for my recent site, but I forgot about the PHP pre-processor capability, and combined with regular expressions on the data source and XSLT, I think that would cover 90% of what I’m trying to do. Once I start building this thing I’ll probably be pestering you again :)

No probs :-) If a CMS can rival the Django framework’s capabilities then we’re definitely all doing something right.

Agreed! This isn’t really a regular “web application” in the traditional sense, which I don’t think I would attempt with Symphony…but for this particular purpose I think Symphony has a leg up on most application frameworks (at least those I’ve tried) in terms of grabbing data from lots of sources. Frameworks usually hook out to some other parser, like SimplePie or Universal Feed Parser, which is fine, but it’s not nearly as integrated or slick as the XML Importer. I.e., with SimplePie, I’d still have to write all the code to get the data into the DB.
