• Hey, guest user. Hope you're enjoying NeoGAF! Have you considered registering for an account? Come join us and add your take to the daily discourse.

Any XSL gurus here?

Status
Not open for further replies.
I just can't wrap my head around a programming language that doesn't allow variables.

I have a tree like this:
Code:
<a>
  <b>
    <c>
      <value>blah</value>
      <value>foo</value>
      <value>bar</value>
      <value>foo</value>
      ...

I need to parse it and find the most commonly used value (in the tree segment above the XSL would print out "foo" for instance, because it appears twice).

I can't figure out how to do this without storing some sort of state. The only means of simulating a state I can find is recursion, which is out of the question in this situation, because there are many distinct values of Value in the full tree (400+).

Any suggestions?
 

Nerevar

they call me "Man Gravy".
I use lots of XSL, and the only way to do it (AFAIK) is through recursion. Create a string that you can parse for values, and update the string while you recurse through it. Because of the limits on variables (i.e. all variables are static), you can't really do it any other way.

Alternatively, if all the XML values are in the same level of the tree, you can probaly use base XSL / XMLpath functions to retrieve the values. I imagine you can use an xpath query to retrieve the number of all subnodes with a given name, then do a max() on the final values.

I guess it really depends on whether or not you know what values are going to be in those nodes.
 
Nerevar said:
Alternatively, if all the XML values are in the same level of the tree, you can probaly use base XSL / XMLpath functions to retrieve the values. I imagine you can use an xpath query to retrieve the number of all subnodes with a given name, then do a max() on the final values.

That's the direction I've been looking. The algorithm I've been thinking about is:

1) Get the distinct values in the sequence
2) Get the indices of the distinct values in the sequence
3) Perform a count on the indices, which will get the number
4) Somehow put the count into a list
5) Find the max of the count
6) Find the index of the count where the max is
7) Find the corresponding index in the distinct values.

(A,A,A,B,B,B,B,C,C)
1) A,B,C
2) (1,2,3),(4,5,6,7),(8,9)
3) 3,4,2
4) 3,4,2
5) 4
6) index = 2
7) distinct-values[2] = B

The problem is in steps 3-4- the only way I can find to get the number of occurances of something in a list is to get a separate list (of the indices) and count that. Due to the lack of variables, however, I have no way to put the list back together to work on the remaining steps.
 
Depending on which XML Engine you are using, i would be inclined to accomplish this using script to iterate over the nodes... of course it then depends what you want to do with the result... :--

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:extra="urn:extra-functions" exclude-result-prefixes="extra">
<xsl:eek:utput method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

<msxsl:script xmlns:msxsl="urn:schemas-microsoft-com:xslt" language="VBScript" implements-prefix="extra">
Function GetMostFrequent(nodList)
sReturn=""
For i = 0 to nodList.length -1
...
' Calculate the most frequent node value / xpath expression
...
Next
End Function
</msxsl:script>
<xsl:variable name="NodeListToTest" select="//value"/>
<xsl:variable name="Frequent"><xsl:value-of select="extra:GetMostFrequent($NodeListToTest)"/>
</xsl:variable>

...

Using script in the transform can hobble performance, so if performance is critical, it's not advised, but i've found that it works out ok.
 

Nerevar

they call me "Man Gravy".
Mister Zimbu said:
That's the direction I've been looking. The algorithm I've been thinking about is:

1) Get the distinct values in the sequence
2) Get the indices of the distinct values in the sequence
3) Perform a count on the indices, which will get the number
4) Somehow put the count into a list
5) Find the max of the count
6) Find the index of the count where the max is
7) Find the corresponding index in the distinct values.

(A,A,A,B,B,B,B,C,C)
1) A,B,C
2) (1,2,3),(4,5,6,7),(8,9)
3) 3,4,2
4) 3,4,2
5) 4
6) index = 2
7) distinct-values[2] = B

The problem is in steps 3-4- the only way I can find to get the number of occurances of something in a list is to get a separate list (of the indices) and count that. Due to the lack of variables, however, I have no way to put the list back together to work on the remaining steps.

if you can get the distinct values in the sequence, you don't have to create a list of the indices of the distinct values to find which one is the max. You simply can call :

//a/b/c/value[value() = 'A']

and it will return all the nodes that have that value. You can count the number of nodes you have at that point and just return the value of the one with the maximum number of nodes. Of course, that only works if you know exactly what it is you're querying on.

(the syntax of that might be wrong, but you get the idea).
 

maharg

idspispopd
XSLT is a pure functional programming language. Recursion is the correct way of doing iteration. Why do you feel it's unacceptable? A correct implementation of xslt will optimize the recursion as all functional interpreters do.

However, the use of an xpath query would probably achieve the same thing if your structure really is that simple.
 
And XSLT *does* have variables

XSL does have variables as in <xsl:variable>, but the problem Mister Zimbu has with them is once they are assigned, they cannot be reassigned until they go out of scope...

If you could reassign and xsl:variable, then the above problem would be easy for people used to non-functional programming.


Using script in XSLT is not the most elegant way to accomplish things, but it can make things a lot easier, especially if it's XSL 1.0, and there are no result-trees node-sets functions available.

It's simply a different way of thinking using recursive functions, and anathema to some people
 

iapetus

Scary Euro Man
Anyone suggesting using godawful non-standard MS scripting languages as part of a solution to an XSL problem deserves to be slapped.

Then slapped again for claiming that performance is the primary problem with this approach.

It's like suggesting that the best way to create a linked list in Java is to use JNI to access a COBOL.Net implementation.
 
maharg said:
XSLT is a pure functional programming language. Recursion is the correct way of doing iteration. Why do you feel it's unacceptable? A correct implementation of xslt will optimize the recursion as all functional interpreters do.

However, the use of an xpath query would probably achieve the same thing if your structure really is that simple.

I feel recursion-only is out of the question if the sheer size of your data set is too large. I have 480 distinct values to search through, which means I have to potentially recurse through a function call that many times. That overflows the stack easily (I'm using Saxon as my parser, by the way).

Javascript and whatnot is out of the question; I'm not outputting HTML, just plaintext. This script isn't meant to be run through a browser; rather, just from a command prompt.

The actual XML document itself is large and more complicated than the example I gave. It's an XML export of an iTunes song database.
 
It's like suggesting that the best way to create a linked list in Java is to use JNI to access a COBOL.Net implementation.

I never said it was the best way, just that it could be done. Just offering an out to someone who "I just can't wrap my head around a programming language that doesn't allow variables."

Nice that you care about elegance tho :D
 
One way to do this avoiding both script :lol and recursion is to use xsl:key and the generate-id function to identify the unique values, and append a count attribute. Then this nodeset can easily be processed with a max template...
 

maharg

idspispopd
Mister Zimbu said:
I feel recursion-only is out of the question if the sheer size of your data set is too large. I have 480 distinct values to search through, which means I have to potentially recurse through a function call that many times. That overflows the stack easily (I'm using Saxon as my parser, by the way).

ok, you missed my point. What you don't seem to understand is that in a pure functional language, recursion is not as expensive as it is in a procedural language. In fact, tail recursion is essentially the same thing as looping, so if you want to optimize, that's the way you want to go.

Trying to force xslt into a procedural model will give you far more performance problems than using it as it was intended. I suggest that you try and do it the natural way and optimize if it's a problem. You don't seem to have actually written anything yet, but you're already looking for ways to optimize it.

Use the Knuth, luke. I also suggest you read up on Functional Programming Languages and Tail Recursion before trying to overoptimize a program written in a functional language.

Here is an article specifically on optimization of recursive algorithms in xsl that should prove valuable as well. Saxon apparently supports (or supported) tail recursion optimization on at least self-referential functions, so you're in luck. That was in 2003, so it may even support it on all forms of tail recursion by now.

Not sure how other xsl processors deal with it. As of 2003, there were a couple of others that did, but msxml and libxslt didn't. That might have changed as well.

(note: none of this should be taken to apply to the actual problem at hand, since an xpath will probably solve it well enough. This is meta-discussion about avoiding recursion in a functional language like xslt)
 
Thanks for the help. I'm trying the recursive approach, but am getting stack overflows, despite the fact that the last call I'm making is to the function:

Code:
<xsl:template name="checkAlbum">
 <xsl:param name="numAlbums" />
 <xsl:param name="name" />
 <xsl:param name="tree" />
 <xsl:param name="fullTree />

 <xsl:choose>
  <xsl:when test="empty($tree)">
   <xsl:value-of select="$name" />
  </xsl:when>
  <xsl:otherwise>
   <xsl:variable name="occur" select="(xpath query stuff)" />
   <xsl:variable name="newName">
    <xsl:choose>
     <xsl:when test="$occur &gt; $numAlbums">
      <xsl:value-of select="$tree[1]" />
     </xsl:when>
     <xsl:otherwise>
      <xsl:value-of select="$name" />
     </xsl:otherwise>
    </xsl:choose>
   </xsl:variable>
   <xsl:call-template name="checkAlbum">
    <xsl:with-param name="name" select="$newName" />
    <xsl:with-param name="numAlbums" select="$occur" />
    <xsl:with-param name="fullTree" select="$fullTree" />
    <xsl:with-param name="tree" select="$tree" />
   </xsl:call-template>
  </xsl:otherwise>
 </xsl:choose>
</xsl:template>

Anything wrong with that? Am I not using tail recursion properly?

Please don't comment on other things (even basic functional things, like whetehr the algorithm is correct or not, and I realize I probably don't have to copy the whole tree around), I can fix them later.

Thanks.
 
Status
Not open for further replies.
Top Bottom