HOW TO CONVERT ALL ARTICLES ON A WEBSITE INTO A SINGLE PDF FILE FOR ARCHIVING

23/04/2021

Admin

[tintuc]

THIS ARTICLE EXPLAINS HOW TO SAVE AN ENTIRE WEBSITE CONTAINING DRUG INFORMATION (FOR EXAMPLE WWW.THUOC.NET.VN) AS A SINGLE PDF FILE.

System requirements:

A machine running Linux (for example Ubuntu)
Programs such as wkhtmltopdf, wget, ... installed
If running on Windows through a browser, install JupyterHub (message us for further instructions)
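
Before running the commands in this guide, it can help to confirm that the required tools are on the PATH. The following is a minimal sketch and not part of the original instructions: the function name require_cmd is hypothetical, and gs is included only because the merge step later in this article calls it.

```shell
#!/bin/sh
# Hypothetical helper: report whether a command is available on the PATH.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "Missing required tool: $1" >&2
    return 1
}

# Check every tool this guide relies on and count the ones that are absent.
missing=0
for tool in wget wkhtmltopdf gs; do
    require_cmd "$tool" || missing=$((missing + 1))
done
echo "Tools missing: $missing"
```

If the count is not zero, install the tools named on stderr before continuing.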

Save a list of Web pages as PDF file

· First install the wkhtmltopdf conversion tool (this tool requires a desktop environment; source):

· sudo apt install wkhtmltopdf

· Then create a file that contains the URLs of the target web pages, one per line. Let's call this file url-list.txt and place it in ~/Downloads/PDF/. For example, its content could be:

· https://askubuntu.com/users/721082/tarek

· https://askubuntu.com/users/566421/pa4080

· Then run the following command, which generates one PDF file per URL in the directory where the command is executed:

· while read -r i; do wkhtmltopdf "$i" "$(echo "$i" | sed -e 's/https\?:\/\///' -e 's/\//-/g' ).pdf"; done < ~/Downloads/PDF/url-list.txt

The result of this command - executed within the directory ~/Downloads/PDF/ - is:

~/Downloads/PDF/$ ls -1 *.pdf

askubuntu.com-users-566421-pa4080.pdf

askubuntu.com-users-721082-tarek.pdf
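
The sed call inside the loop is what turns each URL into a file name: it strips the http:// or https:// scheme and then replaces every remaining slash with a hyphen. As a rough illustration, the same transformation can be wrapped in a function. The name url_to_pdf_name is hypothetical, and this sketch uses sed -E with an anchored pattern instead of the GNU-specific \? of the original command.

```shell
#!/bin/sh
# Hypothetical helper: derive a PDF file name from a URL the same way the
# loop does: drop the scheme, turn every "/" into "-", then append ".pdf".
url_to_pdf_name() {
    printf '%s\n' "$1" | sed -E -e 's#^https?://##' -e 's#/#-#g' -e 's#$#.pdf#'
}

url_to_pdf_name "https://askubuntu.com/users/721082/tarek"
url_to_pdf_name "https://askubuntu.com/users/566421/pa4080"
```

For the two example URLs this prints the same two file names as the directory listing above. Note that a URL ending in a slash yields a name ending in "-.pdf", exactly as the original command would.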

· Merge the output files with the following command, executed in the same directory (source):

· gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -dPDFSETTINGS=/prepress -sOutputFile=merged-output.pdf $(ls -1 *.pdf)

The result is:

~/Downloads/PDF/$ ls -1 *.pdf

askubuntu.com-users-566421-pa4080.pdf

askubuntu.com-users-721082-tarek.pdf

merged-output.pdf
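
One thing to watch for: once merged-output.pdf exists, a second run of the merge command would match it with *.pdf and feed it back in as an input. A possible way around this, sketched under the assumption that file names contain no spaces or newlines, is to build the input list explicitly. The function name list_merge_inputs is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the PDFs in the current directory, one per line,
# skipping the merge target so that a second run does not feed it back in.
list_merge_inputs() {
    output_name="$1"
    for f in *.pdf; do
        [ -e "$f" ] || continue                 # no PDFs: the glob stays literal
        [ "$f" = "$output_name" ] && continue   # skip the merge target itself
        printf '%s\n' "$f"
    done
}

# Example use (assumes Ghostscript is installed):
# gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -dPDFSETTINGS=/prepress \
#    -sOutputFile=merged-output.pdf $(list_merge_inputs merged-output.pdf)
```

The files are merged in the shell's glob order, which is alphabetical by file name in most locales.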

Save an entire Website as PDF file

· First we must create a file (url-list.txt) that contains a URL map of the site. Run these commands (source):

· TARGET_SITE="https://www.yahoo.com/"

· wget --spider --force-html -r -l2 "$TARGET_SITE" 2>&1 | grep '^--' | awk '{ print $3 }' | grep -v '\.\(css\|js\|png\|gif\|jpg\)$' > url-list.txt

· Then go through the steps from the section above.
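
To make the pipeline easier to follow: grep '^--' keeps the lines of wget's log that start with "--", awk prints the third field of each of those lines (the URL), and the final grep -v is meant to drop URLs that end in a static-asset extension. The last stage can be tried on its own, as in this sketch. The function name filter_page_urls and the sample URLs are made up for illustration, and grep -E is used here in place of the escaped basic-regex syntax.

```shell
#!/bin/sh
# Hypothetical helper: read URLs on stdin and drop those ending in a
# static-asset extension, mirroring the grep -v stage of the pipeline.
filter_page_urls() {
    grep -v -E '\.(css|js|png|gif|jpg)$'
}

# Sample input made up for illustration; real input would come from wget.
printf '%s\n' \
    "https://example.com/" \
    "https://example.com/about" \
    "https://example.com/style.css" \
    "https://example.com/logo.png" | filter_page_urls
```

Only the first two sample URLs pass the filter. Because the pattern is anchored at the end of the line, a page such as https://example.com/js-guide is kept.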

Create a script that will Save an entire Website as PDF file (recursively)

· To automate the process we can bring everything together in a script file.

· Create an executable file, called site-to-pdf.sh:

· mkdir -p ~/Downloads/PDF/

· touch ~/Downloads/PDF/site-to-pdf.sh

· chmod +x ~/Downloads/PDF/site-to-pdf.sh

· nano ~/Downloads/PDF/site-to-pdf.sh

· The script content is:

· #!/bin/sh

· TARGET_SITE="$1"

· wget --spider --force-html -r -l2 "$TARGET_SITE" 2>&1 | grep '^--' | awk '{ print $3 }' | grep -v '\.\(css\|js\|png\|gif\|jpg\|txt\)$' > url-list.txt

· while read -r i; do wkhtmltopdf "$i" "$(echo "$i" | sed -e 's/https\?:\/\///' -e 's/\//-/g' ).pdf"; done < url-list.txt

· gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -dPDFSETTINGS=/prepress -sOutputFile=merged-output.pdf $(ls -1 *.pdf)

Copy the above content and in nano use: Shift+Insert for paste; Ctrl+O and Enter for save; Ctrl+X for exit.

· Usage:
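
The script reads the target site from its first argument ($1) and writes its output to the current directory, so a run would presumably look like: cd ~/Downloads/PDF/ && ./site-to-pdf.sh "https://www.yahoo.com/". The script itself does not validate that argument. A small guard could be added for that; the following is only a sketch, and the function name check_target and the message text are hypothetical.

```shell
#!/bin/sh
# Hypothetical guard: accept only arguments that look like an http(s) URL.
check_target() {
    case "$1" in
        http://*|https://*)
            return 0 ;;
        *)
            echo "Usage: ./site-to-pdf.sh <http(s) URL of the site>" >&2
            return 1 ;;
    esac
}

# In site-to-pdf.sh this could sit right after the shebang line:
# check_target "$1" || exit 1
```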


Convert multiple PHP files to one PDF (recursively)

· First install the package enscript, which is a 'regular file to pdf' conversion tool:

· sudo apt update && sudo apt install enscript

· Then run the following command. It generates a file called output.pdf in the directory where the command is executed, containing the content of all PHP files within /path/to/folder/ and its sub-directories:

· find /path/to/folder/ -type f -name '*.php' -exec printf "\n\n{}\n\n" \; -exec cat "{}" \; | enscript -o - | ps2pdf - output.pdf

· Example from my system:

· find /var/www/wordpress/ -type f -name '*.php' -exec printf "\n\n{}\n\n" \; -exec cat "{}" \; | enscript -o - | ps2pdf - output.pdf
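
The find command above prints each file's path as a heading and then the file's content, and the combined stream is what enscript and ps2pdf turn into a PDF. The same idea can be written as a function, sketched here with the hypothetical name concat_php_sources. This variant passes the path to printf as an argument instead of embedding {} in the format string, and sorts the paths so the order is predictable. It assumes file names contain no newlines.

```shell
#!/bin/sh
# Hypothetical helper: print every *.php file under a directory, each preceded
# by its path, in sorted order. The path is passed to printf as an argument so
# that a "%" in a file name is not read as a format specifier.
concat_php_sources() {
    find "$1" -type f -name '*.php' | sort | while IFS= read -r file; do
        printf '\n\n%s\n\n' "$file"
        cat "$file"
    done
}

# Example use (assumes enscript and ps2pdf are installed):
# concat_php_sources /path/to/folder/ | enscript -o - | ps2pdf - output.pdf
```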

*** See the original article
