Tiki Importer



Mediawiki importer

Use this topic to discuss issues related to the Mediawiki importer.

Hi Rodrigo, I saw your message on IRC. I'm using Tiki 5.1 with PHP 5.1.6, and I don't know where DOMDocument should be. The start of the Mediawiki export is as follows:

<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.mediawiki.org/xml/export-0.3/ http://www.mediawiki.org/xml/export-0.3.xsd" version="0.3" xml:lang="en">

Hi tochinet,

DOMDocument is part of PHP. See http://php.net/manual/en/book.dom.php

Soon I will add a check to the code that prints a user-friendly message instead of the error when DOMDocument is not available. I have also updated the documentation; DOMDocument is now listed as a requirement.

To use the Mediawiki importer, you have to enable DOMDocument (the DOM extension) in your PHP installation. It is enabled by default in recent versions.

The importer uses DOMDocument to read the XML file, so it is not possible to use the importer without it.

Let me know if you need further assistance.


Hi tochinet,

I have updated the code, and it now checks whether DOMDocument is available and, if not, displays a user-friendly error message. The change is on trunk.
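DOMDocument is provided by PHP's DOM extension, which has been bundled since PHP 5.0. A check like the one described above could look roughly like the sketch below. The function name and message are hypothetical (not the actual code on trunk), and the type declarations assume a modern PHP (7.1 or later) for brevity.

```php
<?php
// Hypothetical helper (not the actual Tiki function): returns null when the
// DOM extension is usable, or a user-friendly error message otherwise.
function checkDomRequirement(): ?string
{
    // DOMDocument is defined by the DOM extension; if the class is missing,
    // the extension is not loaded in this PHP process (CLI or web server).
    if (class_exists('DOMDocument')) {
        return null;
    }
    return 'The Mediawiki importer needs the PHP DOM extension (DOMDocument). '
        . 'Please enable it in your PHP installation and restart the web server.';
}

$error = checkDomRequirement();
if ($error !== null) {
    echo $error, PHP_EOL; // friendly message instead of a fatal "class not found" error
}
```

Checking class_exists() up front lets the importer stop early with an explanation instead of failing on the first "new DOMDocument()".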

Thanks, Rodrigo.


Rodrigo, both servers use PHP 5.1.6. In /info.php I found the following information:

DOM/XML: enabled
DOM/XML API Version: 20031129
libxml Version: 2.6.26
HTML Support: enabled
XPath Support: enabled
XPointer Support: enabled
Schema Support: enabled
RelaxNG Support: enabled

but also --disable-dom in the "Configure Command" entry.

Any idea what the problem could be ?

According to other forums, running "yum update php-xml" solved the issue for some people, but my php-xml seems to be up to date: yum says "No package marked for updates".


Hi tochinet,

Unfortunately I don't have any experience with yum. I use Ubuntu, and the PHP packaged for it (and probably for all Debian-based distros) includes DOMDocument by default. My PHP version is 5.3.2, but as far as I know the major changes to XML support in PHP happened between versions 4 and 5, so I don't think the problem is that you are running 5.1.6. Nonetheless, it is a good idea to check that.

I was unable to find in which PHP version DOMDocument was added. If you find this information, please let me know and I will add it to the documentation.

I can't tell whether you have XML support enabled. Apparently you do, but the --disable-dom option seems to say the contrary.

Looking further in forums, --disable-dom seems to be a compile-time (configure) option, set by default. People who had a similar issue and solved it (by upgrading, which didn't work for me) still had this option set after the fix.

But, strangely enough, it worked today. There are still some quirks (some links seem to be translated in a strange way, and there are strange error messages to investigate), but at least I got something.

About these errors:

- Notice: Undefined variable: page in ./lib/tikilib.php on line 4609
- Notice: Undefined offset: 1 in ./lib/filegals/filegallib.php on line 1261
- Warning: Missing argument 13 for sendWikiEmailNotification(), called in ./lib/tikilib.php on line 4587 and defined in ./lib/notifications/notificationemaillib.php on line 172


Hi tochinet,

Maybe you managed to enable DOM but had not restarted the web server, so after the reboot, DOM was working the last time you tried. Just a guess.
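To see what the web server's PHP actually loads (the CLI and the web server can read different configuration), a small diagnostic page like the sketch below could be placed temporarily next to info.php. On Red Hat-style packages, DOM is commonly built as a shared extension shipped in the separate php-xml package, so --disable-dom in the Configure Command does not by itself mean DOM is unavailable. The helper name is hypothetical, and the sketch assumes a current PHP; php_ini_loaded_file() in particular needs PHP 5.2.4 or later, so it would not run as-is on 5.1.6.

```php
<?php
// Hypothetical diagnostic helper (not part of Tiki): reports whether the DOM
// extension is loaded in the current PHP process and which ini files it read.
function domDiagnostics(): array
{
    $scanned = php_ini_scanned_files(); // false when no extra ini directory is scanned
    return [
        'php_version' => PHP_VERSION,
        'sapi'        => PHP_SAPI, // e.g. "cli" or "apache2handler"
        'dom_loaded'  => extension_loaded('dom'),
        'domdocument' => class_exists('DOMDocument'),
        'loaded_ini'  => php_ini_loaded_file(), // false if no php.ini was loaded
        'scanned_ini' => $scanned === false
            ? []
            : preg_split('/\s*,\s*/', trim($scanned), -1, PREG_SPLIT_NO_EMPTY),
    ];
}

if (PHP_SAPI !== 'cli') {
    header('Content-Type: text/plain'); // readable output in the browser
}
print_r(domDiagnostics());
```

Opening the page through the web server and running the same file from the command line shows whether both see the same ini files and the DOM extension. The file exposes configuration details, so it is best removed afterwards.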

Since I'm working on the importer again to create the Wordpress importer, I would like to check the errors you mentioned. If the data is not confidential, could you send me a copy of the XML file you are importing? You can find my e-mail address on my user page. It will be much easier for me with the XML file.

Also, what kind of links were not correctly translated? Do you have an example? Have you checked the importer's known limitations: http://doc.tiki.org/Mediawiki%20importer#Known_issues

Thanks, Rodrigo.


Rodrigo -

I'm getting an error message as follows:

"XML file does not validate against the Mediawiki XML schema"

What am I doing wrong?

Ben


Hi Ben,

This error probably means that your XML file is not valid according to the Mediawiki DTD (http://en.wikipedia.org/wiki/Document_Type_Definition) file.

It might be a problem with your XML file or with its version.

Which version of Mediawiki are you running? Can you tell me the version of the XML file (at the top of the file, look for something like "http://www.mediawiki.org/xml/export-0.3/")?
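The export version can also be read programmatically from the namespace of the root <mediawiki> element, as in this sketch. The function name is hypothetical; it only inspects the root element and does not validate the rest of the file.

```php
<?php
// Hypothetical helper: returns the Mediawiki export version ("0.3", "0.4", ...)
// or null if the string is not a Mediawiki export.
function mediawikiExportVersion(string $xml): ?string
{
    $previous = libxml_use_internal_errors(true); // collect parse errors quietly
    $dom = new DOMDocument();
    $loaded = $dom->loadXML($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded || $dom->documentElement === null) {
        return null; // not well-formed XML
    }
    $root = $dom->documentElement;
    if ($root->localName !== 'mediawiki') {
        return null; // well-formed, but not a Mediawiki export
    }
    // The namespace looks like http://www.mediawiki.org/xml/export-0.3/
    if (preg_match('#/xml/export-([0-9.]+)/$#', (string) $root->namespaceURI, $m)) {
        return $m[1];
    }
    // Fall back to the root element's version attribute, if present.
    $version = $root->getAttribute('version');
    return $version !== '' ? $version : null;
}

echo mediawikiExportVersion(
    '<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/" version="0.3" xml:lang="en"></mediawiki>'
), PHP_EOL; // prints 0.3
```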

The importer was tested with Mediawiki 1.14 but it is very likely that it works with other versions.

Maybe they have updated their XML definition; if no structural changes have been made, it should be very easy to support the new version.

Rodrigo


Rodrigo -

I may have started out wrong here. I'm trying to pull data directly from Wikipedia, which I know is not the same as MediaWiki. From your reply, I'm wondering whether this tool only works for pulling from MediaWiki. Do you know how to make it work for pulling from Wikipedia? There is a special export function in Wikipedia, and that is what I was pulling from.

Really appreciate the help here.

Ben


Rodrigo -

In case it helps, I've attached a copy of the file I was trying to bring in from Wikipedia.

Ben


Hi Ben,

I think you forgot to attach the file. Anyway, I think I know the problem. It is OK to import a Wikipedia file (Wikipedia is just one huge Mediawiki installation).

But newer versions of Mediawiki generate a new XML file format that, as expected, does not validate against the DTD of the old one. The importer works with version 0.3 (http://www.mediawiki.org/xml/export-0.3/), and Mediawiki now uses version 0.4 (http://www.mediawiki.org/xml/export-0.4/).

Next week I will take a look at this problem, and if not much has changed from one version to the other, I should be able to fix it.

If you are a programmer you can take a look at lib/importer/tikiimporter_wiki_mediawiki.php and see if you can fix the problem by yourself.

Thanks for reporting this problem, I was not aware of this new version of the XML file.


Hi Ben,

I just added support for Mediawiki XML file version 0.4 on trunk. For more information, I'm copying below a message I sent to another developer on the Tiki devel list (he was also asking about support for version 0.4).

I suggest you try to import your Mediawiki file running Tiki trunk. Let me know if you need more information on how to get Tiki trunk running.

Cheers, Rodrigo.

Hi Jonny,

I guess I'm using schemaValidate() instead of validate() because the latter generates a "no DTD found" error. I haven't investigated why.
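A likely explanation: DOMDocument::validate() checks a document against the DTD declared in its DOCTYPE, while a Mediawiki export has no DOCTYPE and instead points to an XML Schema (for example export-0.3.xsd) through xsi:schemaLocation, so schemaValidate() is the matching call. The sketch below validates against one schema per supported export version; the schema paths and the function name are hypothetical, and the importer's real code may be organized differently.

```php
<?php
// Hypothetical map from export version to a local copy of its XSD file.
const MEDIAWIKI_SCHEMAS = [
    '0.3' => __DIR__ . '/schemas/export-0.3.xsd',
    '0.4' => __DIR__ . '/schemas/export-0.4.xsd',
];

// Returns a list of error messages; an empty list means the document is
// valid for the given export version.
function validateMediawikiExport(DOMDocument $dom, string $version, array $schemas = MEDIAWIKI_SCHEMAS): array
{
    if (!isset($schemas[$version])) {
        return ["Unsupported Mediawiki export version: $version"];
    }
    $previous = libxml_use_internal_errors(true);
    // validate() would look for a DTD in the DOCTYPE and report "no DTD found";
    // the export declares an XML Schema, so schemaValidate() is used instead.
    $valid = $dom->schemaValidate($schemas[$version]);
    $errors = array_map(function (LibXMLError $e) {
        return trim($e->message);
    }, libxml_get_errors());
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($valid) {
        return [];
    }
    return $errors === [] ? ['Document does not validate'] : $errors;
}
```

Keeping the versions in a map means supporting a new export format, as long as its structure is compatible, is mostly a matter of adding its schema file.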

See r30477; it adds support for Mediawiki XML file version 0.4. Apparently there are some validation problems in the DTD of version 0.4; see the Mediawiki bug I reported: https://bugzilla.wikimedia.org/show_bug.cgi?id=25753

Now you should be able to import the file you were using to test http://en.wikipedia.org/wiki/Special:Export/Train

I have tested it and got some weird results. A big portion of this Wikipedia article is not displayed. I have checked, and its content is apparently parsed correctly by TikiImporter_Wiki_Mediawiki::convertMarkup() (see test testConvertMarkupParserWikipediaSamplePage) and correctly added to the tiki_pages table. So the problem seems to be in the Tiki parser when the page is displayed. I haven't checked further than this.

This problem might be related to the fact that in this article (and in most Wikipedia articles) the syntax is used a lot. Text_Wiki ignores this syntax. A solution might be to change the Text_Wiki Tiki renderer to add ~np~ when rendering it.

I'm not planning to spend more time on this in the near future, but let me know if you need any help.

Cheers, Rodrigo.

- I just added support for Mediawiki XML files 0.4 on trunk.

How generally usable is trunk just now? We've got a fair-sized internal Mediawiki we'd dearly love to import into something better featured like Tiki, but it's only feasible if the conversion is automated. Obviously trunk comes with no guarantees. But does it largely work, or are there so many loose ends that we'd kick ourselves if we tried to seriously use it after converting our internal docs to it?


Hi,

Trunk is not stable and is not recommended for a production site.

I just backported support for Mediawiki XML file version 0.4 from trunk to branch 6.x, so it should be included in version 6.1.

So I suggest you either wait for version 6.1 or start your site on branches/6.x, which is much more stable than trunk (after 6.1 is released, all commits are reviewed before being added to branches/6.x).

Note that the only thing I did was change the importer to accept Mediawiki XML file version 0.4. I haven't checked what has changed between versions 0.3 and 0.4, and I did only a very basic test, importing a Wikipedia article XML file (version 0.4). Some changes might affect the importer. Please let me know if you find any issues.

Thanks, Rodrigo.



Hi,

I have created a forum for the Tiki Importer (covering Mediawiki and Wordpress support, though Wordpress support is still under development). I will keep this topic, locked, for documentation purposes.

Thanks, Rodrigo.