Java POI: Converting Word to HTML

Author: 喂---出来啦
Source: 51数据库
2020-06-05

1. How to convert Word to HTML with Apache POI

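Java can read, merge, and otherwise process Word files with Apache POI, an open-source project for reading and writing Microsoft OLE2 compound documents such as Excel and Word from Java. Version 3.5 (the latest when this was written) brought many improvements, including support for the OOXML formats introduced with Office 2007, such as xlsx, docx, and pptx. The examples below extract body text and comments. For .doc files:

import java.io.FileInputStream;
import org.apache.poi.hwpf.extractor.WordExtractor;

// Get the .doc text extractor
WordExtractor doc = new WordExtractor(new FileInputStream(filePath));

// Extract the .doc body text
String text = doc.getText();

// Extract the .doc comments
String[] comments = doc.getCommentsText();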

For Word 2007 (.docx) files:

// POI 3.x package; POI 4.0 moved this class to org.apache.poi.ooxml
import org.apache.poi.POIXMLDocument;
import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFComment;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

// Get the .docx text extractor
XWPFWordExtractor docx = new XWPFWordExtractor(POIXMLDocument.openPackage(filePath));

// Extract the .docx body text
String text = docx.getText();

// Extract the .docx comments (getDocument() has to be cast to XWPFDocument)
XWPFComment[] comments = ((XWPFDocument) docx.getDocument()).getComments();

for (XWPFComment comment : comments) {
    comment.getId();     // comment ID
    comment.getAuthor(); // comment author
    comment.getText();   // comment text
}
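Because POI uses different classes for the OLE2 (.doc) and OOXML (.docx) formats, a program that accepts both has to decide which extractor to use. The following is a minimal sketch of one way to make that decision from the file header instead of the extension. `WordFormatSniffer` and `WordFormat` are hypothetical names, not part of POI. A ZIP header only shows that the file is a ZIP container, so a match is a hint that the file may be .docx, not proof.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

/** Hypothetical helper (not part of POI): guesses the Word container format from the file header. */
final class WordFormatSniffer {

    enum WordFormat { OLE2_DOC, OOXML_DOCX, UNKNOWN }

    // OLE2 compound documents (.doc, .xls, ...) start with these eight bytes.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // OOXML files (.docx, .xlsx, ...) are ZIP archives, which start with "PK" 0x03 0x04.
    private static final byte[] ZIP_MAGIC = { 'P', 'K', 3, 4 };

    /** Classifies a header that has already been read into memory. */
    static WordFormat detect(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) {
            return WordFormat.OLE2_DOC;   // candidate for WordExtractor
        }
        if (startsWith(header, ZIP_MAGIC)) {
            return WordFormat.OOXML_DOCX; // candidate for XWPFWordExtractor
        }
        return WordFormat.UNKNOWN;
    }

    /** Reads up to eight bytes from the file and classifies them. */
    static WordFormat detect(Path file) throws IOException {
        byte[] header = new byte[8];
        int filled = 0;
        try (InputStream in = Files.newInputStream(file)) {
            while (filled < header.length) {
                int n = in.read(header, filled, header.length - filled);
                if (n < 0) {
                    break; // file is shorter than eight bytes
                }
                filled += n;
            }
        }
        return detect(Arrays.copyOf(header, filled));
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
                && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }
}
```

A caller would branch on the result and construct either the .doc extractor or the .docx extractor, treating `UNKNOWN` as an unsupported upload.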

2. How to convert Word to HTML in a Java program

// Open the Word file (wordacc is the Documents collection)
Dispatch wordfile = Dispatch.invoke(
        wordacc,
        "Open",
        Dispatch.Method,
        new Object[] { ls_word, new Variant(false), new Variant(true) },
        new int[1]).toDispatch();

// Write the generated HTML (8 is the HTML save format)
Dispatch.invoke(wordfile, "SaveAs", Dispatch.Method,
        new Object[] { ls_html, new Variant(8) }, new int[1]);

// Close the document without saving changes
Variant f = new Variant(false);
Dispatch.call(wordfile, "Close", f);

Executing

Dispatch.invoke(wordfile, "SaveAs", Dispatch.Method,
        new Object[] { ls_html, new Variant(8) }, new int[1]);

throws the exception below, although the path is definitely correct. Are there any requirements on where jacob.jar, jacob-1.14-x86.dll, and jacob-1.14-x64.dll have to be placed?

com.jacob.com.ComFailException: Invoke of: SaveAs
Source: Microsoft Word
Description: This is not a valid file name.
Try the following:
* Check the path to make sure it was typed correctly.
* Select a file from the list of files and folders.
    at com.jacob.com.Dispatch.invokev(Native Method)
    at com.jacob.com.Dispatch.invokev(Dispatch.java:858)
    at com.jacob.com.Dispatch.invoke(Dispatch.java:502)
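The thread never confirms what caused this "not a valid file name" error. One possible cause, stated here as an assumption, is that the value passed to SaveAs is a relative path, names only a folder, or points into a folder that does not exist. Word runs as a separate process, so a path that resolves inside the JVM may not resolve the same way for Word. The sketch below checks the target before it is handed to JACOB. `SaveAsPathCheck` and `toSaveAsTarget` are hypothetical names, and the code uses only the JDK.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical pre-flight check for the file name passed to Word's SaveAs. */
final class SaveAsPathCheck {

    /**
     * Returns an absolute, normalized file path, or throws IllegalArgumentException
     * if the target is a directory or its parent folder does not exist.
     * Paths.get itself throws InvalidPathException (an IllegalArgumentException)
     * for names containing characters the file system rejects.
     */
    static String toSaveAsTarget(String rawPath) {
        Path target = Paths.get(rawPath).toAbsolutePath().normalize();
        if (Files.isDirectory(target)) {
            throw new IllegalArgumentException("Target is a folder, not a file: " + target);
        }
        Path parent = target.getParent();
        if (parent == null || !Files.isDirectory(parent)) {
            throw new IllegalArgumentException("Output folder does not exist: " + parent);
        }
        return target.toString();
    }
}
```

In the snippet from the question, `ls_html` would be passed through `toSaveAsTarget` first, so a bad target fails in Java with a readable message rather than inside the COM call.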

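3. How to convert a Word document to an HTML document with Java

For online learning, this is presumably a B/S (browser/server) application. If you want to convert Word content to HTML with its styling preserved, there are two approaches.

One is to manually save the Word file as HTML before uploading it, upload the HTML file to the server, and let the browser open it.

The other is to use an editor control that converts the content of the uploaded Word file into HTML code. The better controls available at the moment are FCKeditor and eWebEditor. The former is free. The latter has a free lite edition, and its commercial edition supports uploading a Word file directly and converting it to HTML, but that edition is paid.

I really don't have any source code. I only recently started on a similar project myself and am passing on what I have just learned.

eWebEditor homepage:

FCKeditor homepage: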

4. How to convert a Word document directly to HTML in Java

JACOB is a bridge between Java and COM on Windows; it lets a Java program call COM components.

If your JDK is 1.4, you need the JNI library from JACOB 1.9 for it to run properly. Earlier versions have some problems under JDK 1.4.

package com;

import com.jacob.activeX.*;
import com.jacob.com.*;

/**
 * Title: Word-document-to-HTML class
 * Description:
 * Copyright: () 2002
 * @author 舵手
 * @version 1.0
 */
public class WordtoHtml {

    /**
     * Document conversion function.
     * @param docfile  absolute path and file name of the Word document (including the extension)
     * @param htmlfile absolute path and file name of the converted HTML file (without the extension)
     */
    public static void change(String docfile, String htmlfile) {
        ActiveXComponent app = new ActiveXComponent("Word.Application"); // start Word
        try {
            app.setProperty("Visible", new Variant(false)); // keep Word invisible
            Dispatch docs = app.getProperty("Documents").toDispatch();
            Dispatch doc = Dispatch.invoke(docs, "Open", Dispatch.Method,
                    new Object[] { docfile, new Variant(false), new Variant(true) },
                    new int[1]).toDispatch(); // open the Word file
            Dispatch.invoke(doc, "SaveAs", Dispatch.Method,
                    new Object[] { htmlfile, new Variant(8) },
                    new int[1]); // save in HTML format to a temporary file
            Variant f = new Variant(false);
            Dispatch.call(doc, "Close", f);
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            app.invoke("Quit", new Variant[] {});
        }
    }

    public static void main(String[] strs) {
        WordtoHtml.change("c:\\a\\运输管理调度系统总体方案.doc", "c:\\a\\t");
    }
}

5. Java: what ways are there to convert Word, Excel, and PDF to HTML?

Notes on converting Word/Excel/PDF files to HTML in Java

During project development, the requirements called for converting various documents to HTML or another format that is easy to display on the web. The approaches are summarized below.

I. Converting Word and Excel to HTML with Jacob

"JACOB is a Java-COM middleware. With this component you can call COM components and Win32 libraries from a Java application."

First download the Jacob package. JDK 1.5 and later need Jacob 1.9 (not yet tested on JDK 1.6), which differs little from the earlier Jacob 1.7.

1. After unpacking the archive, add Jacob.jar to Libraries.
2. Put Jacob.dll under "WINDOWS\SYSTEM32".

Note: when the web server is started from an IDE, the system cannot find Jacob.dll. For example, when Tomcat is started from MyEclipse, the dll file has to be copied into "jre\bin" under the MyEclipse installation directory. When Jacob.dll has not been loaded, the usual error message is "java.lang.UnsatisfiedLinkError: no jacob in java.library.path".

Create a new class:

import com.jacob.activeX.ActiveXComponent;
import com.jacob.com.Dispatch;
import com.jacob.com.Variant;

public class JacobUtil
{
    public static final int WORD_HTML = 8;

    public static final int WORD_TXT = 7;

    public static final int EXCEL_HTML = 44;

    /**
     * Word to HTML
     * @param docfile  full path of the Word file
     * @param htmlfile path where the converted HTML is stored
     */
    public static void wordToHtml(String docfile, String htmlfile)
    {
        ActiveXComponent app = new ActiveXComponent("Word.Application"); // start Word
        try
        {
            app.setProperty("Visible", new Variant(false));
            Dispatch docs = app.getProperty("Documents").toDispatch();
            Dispatch doc = Dispatch.invoke(
                    docs,
                    "Open",
                    Dispatch.Method,
                    new Object[] { docfile, new Variant(false),
                            new Variant(true) }, new int[1]).toDispatch();
            Dispatch.invoke(doc, "SaveAs", Dispatch.Method, new Object[] {
                    htmlfile, new Variant(WORD_HTML) }, new int[1]);
            Variant f = new Variant(false);
            Dispatch.call(doc, "Close", f);
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }
        finally
        {
            app.invoke("Quit", new Variant[] {});
        }
    }

    /**
     * Excel to HTML
     * @param xlsfile  full path of the Excel file
     * @param htmlfile path where the converted HTML is stored
     */
    public static void excelToHtml(String xlsfile, String htmlfile)
    {
        ActiveXComponent app = new ActiveXComponent("Excel.Application"); // start Excel
        try
        {
            app.setProperty("Visible", new Variant(false));
            Dispatch excels = app.getProperty("Workbooks").toDispatch();
            Dispatch excel = Dispatch.invoke(
                    excels,
                    "Open",
                    Dispatch.Method,
                    new Object[] { xlsfile, new Variant(false),
                            new Variant(true) }, new int[1]).toDispatch();
            Dispatch.invoke(excel, "SaveAs", Dispatch.Method, new Object[] {
                    htmlfile, new Variant(EXCEL_HTML) }, new int[1]);
            Variant f = new Variant(false);
            Dispatch.call(excel, "Close", f);
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }
        finally
        {
            app.invoke("Quit", new Variant[] {});
        }
    }
}
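The "no jacob in java.library.path" error means the JVM searched the folders listed in the `java.library.path` system property and did not find the native library. A small diagnostic can show which folders are searched and whether the DLL is in any of them. This is a sketch using only the JDK. `NativeLibraryLocator` is a hypothetical name, and "jacob.dll" is the file name used in these notes (other JACOB releases ship differently named DLLs, so the name is passed in).

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;
import java.util.regex.Pattern;

/** Hypothetical diagnostic: looks for a native library file on a library search path. */
final class NativeLibraryLocator {

    /** Returns the first existing file named fileName in the folders of libraryPath. */
    static Optional<Path> find(String libraryPath, String fileName) {
        for (String dir : libraryPath.split(Pattern.quote(File.pathSeparator))) {
            if (dir.isEmpty()) {
                continue;
            }
            try {
                Path candidate = Paths.get(dir, fileName);
                if (Files.isRegularFile(candidate)) {
                    return Optional.of(candidate);
                }
            } catch (InvalidPathException e) {
                // Skip malformed entries instead of failing the whole search.
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String libraryPath = System.getProperty("java.library.path", "");
        String dllName = args.length > 0 ? args[0] : "jacob.dll";
        System.out.println("java.library.path = " + libraryPath);
        System.out.println(find(libraryPath, dllName)
                .map(p -> "found: " + p)
                .orElse("not found: " + dllName));
    }
}
```

Running this inside the same server process that fails (for example from a test servlet) matters, because an IDE-launched Tomcat may use a different `java.library.path` than a command-line JVM.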
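While I was looking for a conversion component, I found that NetEase (网易) had also reposted a Jacob usage guide, but it contained a fairly serious error: String htmlfile = "C:\\AA"; specifies only a folder. The correct form is String htmlfile = "C:\\AA\\xxx.html";

That more or less covers Word/Excel to HTML conversion, and it should be clear by now :)

II. Converting PDF to HTML with XPDF

1. Download the latest version of xpdf. Address: I downloaded xpdf-3.02pl2-win32.zip.
2. Download the Chinese support package. I downloaded xpdf-chinese-simplified.tar.gz.
3. Download the pdftohtml package. Address: / I downloaded pdftohtml-0.39-win32.tar.gz.
4. Unpack and set up
1) First unpack xpdf-3.02pl2-win32.zip. The unpacked contents can be trimmed as needed: if you only need conversion to txt, the other exe files can be deleted and only pdftotext.exe kept, and so on.
2) Then unpack xpdf-chinese-simplified.tar.gz into the directory where xpdf-3.02pl2-win32.zip was unpacked.
3) Unpack pdftohtml-0.39-win32.tar.gz and put pdftohtml.exe into the directory where xpdf-3.02pl2-win32.zip was unpacked.
4) Directory structure:

+---[X:\xpdf]
    |-------the exe files used for the various conversions
    |
    |-------xpdfrc
    |
    +------[X:\xpdf\xpdf-chinese-simplified]
           |
           |
           +-------the many character files needed during conversion

xpdfrc: this file declares the paths of the character sets used in conversion.

5) Edit the xpdfrc file (originally named sample-xpdfrc) so that its content is:

#----- begin Chinese Simplified support package
cidToUnicode Adobe-GB1 xpdf-chinese-simplified\Adobe-GB1.cidToUnicode
unicodeMap ISO-2022-CN xpdf-chinese-simplified\ISO-2022-CN.unicodeMap
unicodeMap EUC-CN xpdf-chinese-simplified\EUC-CN.unicodeMap
unicodeMap GBK xpdf-chinese-simplified\GBK.unicodeMap
cMapDir Adobe-GB1 xpdf-chinese-simplified\CMap
toUnicodeDir xpdf-chinese-simplified\CMap
fontDir C:\WINDOWS\Fonts
displayCIDFontTT Adobe-GB1 C:\WINDOWS\Fonts\simhei.ttf
#----- end Chinese Simplified support package

6.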
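The notes break off before showing how the converter is called from Java. The following is a rough sketch of one way to launch it, under these assumptions: pdftohtml.exe sits in the xpdf folder as described in step 3, it accepts the basic form `pdftohtml <PDF-file> <html-file>` with no extra options, and the relative paths in xpdfrc are meant to be resolved from the xpdf folder, which is why that folder is used as the working directory. `PdfToHtmlRunner` is a hypothetical name.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

/** Hypothetical launcher for the pdftohtml.exe placed in the xpdf folder. */
final class PdfToHtmlRunner {

    /** Builds the command line: executable, input PDF, output HTML name. */
    static List<String> buildCommand(File xpdfDir, File pdf, File htmlOut) {
        return Arrays.asList(
                new File(xpdfDir, "pdftohtml.exe").getPath(),
                pdf.getPath(),
                htmlOut.getPath());
    }

    /** Runs the converter and returns its exit code. */
    static int run(File xpdfDir, File pdf, File htmlOut)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(xpdfDir, pdf, htmlOut))
                .directory(xpdfDir) // assumption: xpdfrc's relative paths resolve from here
                .inheritIO()        // show the tool's messages on the Java console
                .start();
        return process.waitFor();
    }
}
```

Passing the arguments as a list rather than one concatenated string keeps paths with spaces intact. A non-zero exit code should be treated as a failed conversion.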

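6. POI Word-to-HTML: how to show the final state of tracked changes

The implementation is as follows:

import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.converter.PicturesManager;
import org.apache.poi.hwpf.converter.WordToHtmlConverter;
import org.apache.poi.hwpf.usermodel.Picture;
import org.apache.poi.hwpf.usermodel.PictureType;
import org.w3c.dom.Document;

public class Word2Html {

    public static void main(String argv[]) {
        try {
            // Word input path, HTML output path
            convert2Html("D:/doctohtml/1.doc", "D:/doctohtml/1.html");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static void writeFile(String content, String path) {
        FileOutputStream fos = null;
        BufferedWriter bw = null;
        try {
            File file = new File(path);
            fos = new FileOutputStream(file);
            bw = new BufferedWriter(new OutputStreamWriter(fos, "utf-8"));
            bw.write(content);
        } catch (FileNotFoundException fnfe) {
            fnfe.printStackTrace();
        } catch (IOException ioe) {
            ioe.printStackTrace();
        } finally {
            try {
                if (bw != null)
                    bw.close();
                if (fos != null)
                    fos.close();
            } catch (IOException ie) {
            }
        }
    }

    public static void convert2Html(String fileName, String outPutFile)
            throws TransformerException, IOException, ParserConfigurationException {
        HWPFDocument wordDocument = new HWPFDocument(new FileInputStream(fileName));
        // WordToHtmlUtils.loadDoc(new FileInputStream(inputFile));
        WordToHtmlConverter wordToHtmlConverter = new WordToHtmlConverter(
                DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument());
        wordToHtmlConverter.setPicturesManager(new PicturesManager() {
            public String savePicture(byte[] content, PictureType pictureType,
                    String suggestedName, float widthInches, float heightInches) {
                // image path that appears in the image tags of the HTML
                return "d:/doctohtml/" + suggestedName;
            }
        });
        wordToHtmlConverter.processDocument(wordDocument);

        // save pictures
        List pics = wordDocument.getPicturesTable().getAllPictures();
        if (pics != null) {
            for (int i = 0; i < pics.size(); i++) {
                Picture pic = (Picture) pics.get(i);
                try {
                    // where the images from the Word file are stored
                    pic.writeImageContent(new FileOutputStream(
                            "D:/doctohtml/" + pic.suggestFullFileName()));
                } catch (FileNotFoundException e) {
                    e.printStackTrace();
                }
            }
        }

        Document htmlDocument = wordToHtmlConverter.getDocument();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        DOMSource domSource = new DOMSource(htmlDocument);
        StreamResult streamResult = new StreamResult(out);
        TransformerFactory tf = TransformerFactory.newInstance();
        Transformer serializer = tf.newTransformer();
        serializer.setOutputProperty(OutputKeys.ENCODING, "utf-8");
        serializer.setOutputProperty(OutputKeys.INDENT, "yes");
        serializer.setOutputProperty(OutputKeys.METHOD, "html");
        serializer.transform(domSource, streamResult);
        out.close();
        // decode with the same charset the serializer wrote, not the platform default
        writeFile(new String(out.toByteArray(), "utf-8"), outPutFile);
    }
}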
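The last stage of this conversion, turning a DOM `Document` into HTML text, does not depend on POI and can be tried on its own. The sketch below builds a tiny stand-in document with the JDK's own XML classes and serializes it the same way. `DomToHtml` and `sampleDocument` are hypothetical names; in the real flow the document would come from `wordToHtmlConverter.getDocument()`. When bytes written as UTF-8 are turned back into a `String`, decoding with an explicit charset avoids depending on the platform default encoding.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Hypothetical demo of the DOM-to-HTML serialization step, using only the JDK. */
final class DomToHtml {

    /** Builds html > body > p with the given text, standing in for the converter output. */
    static Document sampleDocument(String paragraphText) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element html = doc.createElement("html");
            Element body = doc.createElement("body");
            Element p = doc.createElement("p");
            p.setTextContent(paragraphText);
            body.appendChild(p);
            html.appendChild(body);
            doc.appendChild(html);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Serializes the document as HTML, writing and decoding the bytes as UTF-8. */
    static String serialize(Document doc) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            serializer.setOutputProperty(OutputKeys.METHOD, "html");
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serialize(sampleDocument("hello")));
    }
}
```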
