Saving web pages in different formats

Discussion in 'all things UNIX' started by vasa1, Nov 26, 2011.

Thread Status:
Not open for further replies.
  1. vasa1

    vasa1 Registered Member

    Joined:
    May 1, 2010
    Posts:
    4,152
    Yesterday, Ocky & I had a bit of back and forth and the current post is based on matters arising.

    Saving web pages
    This is about Firefox. I'm using Aurora (~Firefox 10).
    I've installed MAF (Mozilla Archive Format 2.0.2 {7f57cf46-4467-4c2d-adfa-0cba7c507e54}) from AMO.
    With this add-on enabled, saving can be done in six ways:
    1. Web page, complete
    2. Web page, HTML only
    3. Web Archive, MAFF zipped
    4. Web Archive, MHTML
    5. Text Files
    6. All Files

    I'm interested in comparing 1, 3, and 4.

    With my PC disconnected from the internet, I opened a "conventional" web page (stored on my hard disk as "Web page, Complete" way before installing MAF).

    I saved it in the three formats and compared the space taken. (I know space is not relevant for many people.) Option 1 has two items, an HTML file and a folder. The other two are "single" files.
    Option 1: 100.1 kB (just the html file) + 19.5 MB (70 items in the folder) ≈ 19.6 MB
    Option 3: 19.4 MB
    Option 4: 26.8 MB

    From this one example, it appears that little space is saved by choosing option 3 over option 1, but option 3 has the convenience of "seeing" and handling just one file as opposed to one file plus a folder of files (and the archive can still be unzipped). Option 4 gives just one file in the true sense of the word, but that file is larger.
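One plausible reason for the larger MHTML file: MHTML is a MIME container, and binary parts such as images are typically stored base64-encoded, which turns every 3 bytes into 4 characters. The sketch below only illustrates that overhead. Whether it fully explains the numbers above is an assumption, since the encoding MAF actually chose for each part was not checked.

```python
import base64


def base64_len(n_bytes: int) -> int:
    """Length of the base64 text for n_bytes of binary data (no line breaks)."""
    return 4 * ((n_bytes + 2) // 3)


# Sanity check against the real encoder.
sample = bytes(3000)
encoded = base64.b64encode(sample)

# Rough estimate for the 19.5 MB folder from option 1, assuming
# (hypothetically) that everything in it is stored as base64.
folder_mb = 19.5
estimate_mb = folder_mb * 4 / 3

print(f"{len(sample)} bytes -> {len(encoded)} base64 characters")
print(f"{folder_mb} MB of binary data -> about {estimate_mb:.1f} MB as base64")
```

The estimate lands in the same range as the 26.8 MB measured for option 4; MIME headers and line breaks would add a little more on top.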

    I don't have Opera installed and cannot comment on whether the Opera-generated .mht file would be superior (or not) to the .mht file generated with Firefox and the MAF add-on.
     
  2. Ocky

    Ocky Registered Member

    Joined:
    May 6, 2006
    Posts:
    2,677
    Location:
    George, S.Africa
    As per https://www.wilderssecurity.com/showpost.php?p=1978149&postcount=7 the .mht was smaller. Obviously the adblock filters were the same for html web page complete.
    Now why this is different to your findings is hard to say.
    It will be interesting to see what differences others may find.

    Pointless really, another 'test' showed the .mht quite a bit larger. o_O

    Edit: Maff is OK, but several steps instead of one click.
     
    Last edited: Nov 26, 2011
  3. guest

    guest Guest

    I can test, but I have a few questions first:

    - Should I first disable all other extensions that "edit" the webpage content (like ADB+ filters and NoScript)?

    - For which webpage(s) would you like to see my results?

    - Will this MAF extension allow me to save, for example, a YouTube page with all its contents (including the Flash or HTML5 video)?

    Thank you!
     
  4. vasa1

    vasa1 Registered Member

    1. Not necessary to disable any other extensions.
    2. It should be a page that doesn't change while you save it in the three different ways. So a static page, or a page that is unlikely to change in a few seconds, should be fine.
    3. I very much doubt that.
     
  5. vasa1

    vasa1 Registered Member

    Why do you say so? If you save as .maff once, Firefox remembers that setting and will append .maff to the suggested name the next time. So example.html will be saved as example.html.maff. Or do you mean something else?
     
  6. vasa1

    vasa1 Registered Member

    Some more on maff:
    http://www.ghacks.net/2009/11/13/save-websites-with-mozilla-archive-format/
    .............​
    http://en.wikipedia.org/wiki/Mozilla_Archive_Format
    I have no idea how this will work on YouTube or embedded audio, video o_O
    .............​
    http://maf.mozdev.org/documentation.html <<< has a lot of basic information. Worth a read.
    .............​

    Special note for Daveski: Mozilla Archive Format is an add-on for the Firefox and SeaMonkey browsers
     
    Last edited: Nov 26, 2011
  7. Ocky

    Ocky Registered Member

    $ du -ah
    28K ./Linux for newbies Opera.mht >>>>>>>>>>>>>>>(save as .mht with Opera - default)
    32K ./Linux for newbies with Maff.mht >>>>>>>>>>>(save as archive and converted to mhtml using Maff)
    4.0K ./Linux for newbies_files/belug100.css
    8.0K ./Linux for newbies_files
    28K ./Linux for newbies.html

    http://www.linfo.org/newbies.html was the page.

    The last 3 are again with Opera, but saved as web page complete. Ignore the 4.0K and 8.0K:
    du reports the disk space allocated in whole blocks, not the actual file sizes. The folder and contents are only 4.82K.
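The gap between what du prints and the real size comes from block allocation: by default du counts allocated blocks, while GNU du's --apparent-size option reports the byte count instead. A minimal sketch of the same distinction in Python, using a throwaway file (the file name is borrowed from the listing above, the content is made up):

```python
import os
import tempfile


def sizes(path: str) -> tuple[int, int]:
    """Return (apparent size, allocated size) in bytes for one file."""
    st = os.stat(path)
    # st_blocks counts 512-byte units on Linux and other POSIX systems;
    # it is not available on Windows, hence the getattr fallback to 0.
    allocated = getattr(st, "st_blocks", 0) * 512
    return st.st_size, allocated


with tempfile.TemporaryDirectory() as tmp:
    css = os.path.join(tmp, "belug100.css")
    with open(css, "wb") as f:
        f.write(b"body { color: black; }\n")  # placeholder content
    apparent, allocated = sizes(css)

print(f"apparent: {apparent} bytes, allocated: {allocated} bytes")
```

On most Linux filesystems the allocated figure for a tiny file like this is one full block, which is where a 4.0K entry for a much smaller .css file would come from.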

    That's right, but do you still need to convert to mhtml, or do you run it from the archive?

    EDIT: The Maff archive is only 10.4 KB! Pretty good.
     
  8. vasa1

    vasa1 Registered Member

    There are the two usual ways: with Aurora already open, use Ctrl+O, navigate to where the file is and open it,
    or just double-click on the maff file (wherever it is) after associating the maff extension with Aurora as we talked about in the other thread.

    I don't have to convert it to .mht. The .maff is good to go.

    Edit: double-clicking the .maff archive to open it as a web page is possible but doesn't seem advisable (at least with my current knowledge being what it is).
     
    Last edited: Nov 26, 2011
  9. vasa1

    vasa1 Registered Member

    Associating the maff extension with Aurora may not be a good idea. That's because a .maff file is actually a .zip file, and changing its association to Aurora could have unpleasant side effects.

    So if I want to open a .maff archive as a web page, I'll have to first open Aurora and proceed via Ctrl+O.
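Because a .maff file is a ZIP archive, its contents can also be inspected without touching any file association. A minimal sketch with Python's zipfile module follows; the archive here is a stand-in built in memory, and its internal layout (a folder containing index.html) is an assumption for illustration, not taken from a real MAF archive.

```python
import io
import zipfile


def list_archive(source) -> list[tuple[str, int, int]]:
    """Return (name, original size, compressed size) for each member."""
    with zipfile.ZipFile(source) as zf:
        return [(i.filename, i.file_size, i.compress_size) for i in zf.infolist()]


# Stand-in for a real .maff file: a small ZIP built in memory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("page/index.html", "<html><body>" + "hello " * 500 + "</body></html>")

buf.seek(0)
looks_like_zip = zipfile.is_zipfile(buf)  # checks the content, not the extension
buf.seek(0)
entries = list_archive(buf)

print(looks_like_zip)
for name, size, packed in entries:
    print(f"{name}: {size} bytes, {packed} bytes compressed")
```

For a file on disk, the path can be passed instead of the buffer, for example list_archive("example.html.maff").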
     
  10. guest

    guest Guest

    Surprisingly hard criteria, lol. I decided on this one: http://www.marilynvossavant.com/

    My results:

    Option 1:
    HTML Document file named Marilyn vos Savant _ Home.htm
    Size: 14.8 KB (15,222 bytes)
    Size on disk: 16.0 KB (16,384 bytes)
    Folder (with 42 files) named Marilyn vos Savant _ Home_files
    Size: 178 KB (183,110 bytes)
    Size on disk: 284 KB (290,816 bytes)

    Option 3:
    MAFF file named Marilyn vos Savant _ Home.maff
    Size: 169 KB (173,818 bytes)
    Size on disk: 172 KB (176,128 bytes)

    Option 4:
    MHTML Document file named Marilyn vos Savant _ Home.mht
    Size: 262 KB (268,550 bytes)
    Size on disk: 264 KB (270,336 bytes)
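The "Size on disk" figures for the three single files match what rounding each size up to whole 4 KiB clusters would give. The sketch below checks that; the 4096-byte cluster size is an assumption (it was not stated in the post), chosen because it reproduces the reported numbers.

```python
CLUSTER = 4096  # assumed cluster size in bytes; not stated in the post


def size_on_disk(size: int, cluster: int = CLUSTER) -> int:
    """Round a file size up to a whole number of clusters."""
    return (size + cluster - 1) // cluster * cluster


# (size, reported size on disk) in bytes, copied from the results above.
reported = {
    "Marilyn vos Savant _ Home.htm": (15_222, 16_384),
    "Marilyn vos Savant _ Home.maff": (173_818, 176_128),
    "Marilyn vos Savant _ Home.mht": (268_550, 270_336),
}

matches = {
    name: size_on_disk(size) == on_disk
    for name, (size, on_disk) in reported.items()
}
for name, ok in matches.items():
    print(f"{name}: {'matches' if ok else 'differs'}")
```

The folder in option 1 holds 42 files that are presumably each rounded up separately, which would explain why its overhead (178 KB versus 284 KB on disk) is proportionally much larger than for the single-file formats.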

    Going to test now. ;)
     
  11. vasa1

    vasa1 Registered Member

    Thanks for testing!

    Since you're on Windows, how will you open the .maff archive? In Ubuntu, double-clicking on the .maff file opens an archive manager by default since the archive is basically a .zip file. I can change things to make Aurora open the .maff file as a web page when I double-click on the .maff file but that may not be a good idea. So I'll have to use Ctrl+O (or Alt+F, O) to open the .maff file from within the browser.
    How are you going to do that? And are you hoping to actually download embedded content (audio/video)? I doubt it's possible, but if you manage it, that will be wonderful!
     
  12. vasa1

    vasa1 Registered Member

  13. guest

    guest Guest

    I opened the MAFF file with my locally installed Portable Firefox. I just manually selected the PortableFirefox.exe, as I did with the others.

    Except for the MAFF file, the others can be opened by default with Internet Explorer with no loss of functionality (the saved webpage displays normally). But if I try to open the MAFF file with Internet Explorer, it asks me to save the MAFF file or find a program online to open it, lol.

    The extension is unable to save videos from YouTube, whether Flash or HTML5, for offline playback. I tried every option with different YouTube videos (some in Flash, some in HTML5).
     
    Last edited by a moderator: Nov 26, 2011
  14. guest

    guest Guest

    Oh, and Windows doesn't recognize the MAFF file by default. But 7-Zip was able to open the MAFF file here as if it was a ZIP file.
     
    Last edited by a moderator: Nov 26, 2011
  15. vasa1

    vasa1 Registered Member

    Makes sense. So the way to handle .maff would be to open it from within Firefox and not from the file manager.

    As for the other stuff, we'll just have to be content with available add-ons for saving videos.
     
  16. vasa1

    vasa1 Registered Member

    I tried "maffing" a YouTube video (http://www.youtube.com/watch?v=5qgigImnw4Q) and this page from gHacks. In both cases, I could see that a .swf file (Shockwave Flash file) was present in the archive. Both just seem to be the video container (I'm not sure of the correct term). Both were 211.7 kB. So it's no go, just as guest found.
     